Testing a draft of `.replaceRegexp`

Hello,

I am attaching a draft of .replaceRegexp. Could anyone check it before I make a PR?

Method:

+ String {
	replaceRegexp { arg findRegexp, replace;
		var founds, replaced;
		founds = this.findRegexp(findRegexp);
		founds = if(findRegexp[0] == $^) {
			founds.collect { |array| if (array[0] == 0) {array} {} }
		} {
			founds
		};
		while { founds.includes(nil) } { founds.remove(nil) };
		founds = founds.asSet.asArray.sort({ |a, b| a[0] < b[0] });
		replaced = this;
		if(founds.size > 0) {
			founds.reverse.do { |idx_str|
				var foundIndex, foundString;
				#foundIndex, foundString = idx_str;
				replaced = if (foundIndex > 0) {
					var lastString = replaced[foundIndex + foundString.size ..];
					lastString = if(lastString != nil) { lastString } { "" };
					replaced[0 .. foundIndex - 1] ++ replace ++ lastString
				} {
					replace ++ replaced[foundString.size ..]
				}
			}
		} {
			replaced
		};
		^replaced
	}
}
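The core of the draft is to collect every match position first and then splice the replacement in from the last match backwards, so that replacing one match never shifts the offsets of the matches before it. Below is a minimal C++ sketch of that same technique. It assumes std::regex from the standard library as the engine, so it is not the boost.regex-based primitive discussed later, and `spliceReplace` is a hypothetical name.

```cpp
#include <cassert>   // for the checks in the usage notes
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper mirroring the draft: find all matches first, then
// replace them from the end so earlier offsets stay valid.
std::string spliceReplace(const std::string& input, const std::regex& re,
                          const std::string& replacement) {
    std::vector<std::pair<std::size_t, std::size_t>> matches;  // (offset, length)
    for (std::sregex_iterator it(input.begin(), input.end(), re), end;
         it != end; ++it) {
        const auto& m = (*it)[0];
        matches.emplace_back(
            static_cast<std::size_t>(std::distance(input.begin(), m.first)),
            static_cast<std::size_t>(m.length()));
    }
    std::string result = input;
    // Walk backwards: replacing a later match never moves an earlier one.
    for (auto m = matches.rbegin(); m != matches.rend(); ++m) {
        result.replace(m->first, m->second, replacement);
    }
    return result;
}

// Usage (expected values from the draft's test code):
//   assert(spliceReplace("012qW567<>?,. /", std::regex("[0-9]"), "") == "qW<>?,. /");
//   assert(spliceReplace("012qW567{}|[`~]", std::regex("[0-9]{2}"), "") == "2qW7{}|[`~]");
```

The standard iterator already yields non-overlapping matches in left-to-right order, so this sketch needs no deduplication or sorting step.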

Test code:

/* Removal of all digits */
"012qW567<>?,. /".replaceRegexp("[0-9]", "") // qW<>?,. /
"012qW567<>?,. /".replaceRegexp("\\d", "")   // qW<>?,. /

/* Removal of pairs of adjacent digits */
"012qW567{}|[`~]".replaceRegexp("[0-9]{2}", "") // 2qW7{}|[`~]
"012qW567{}|[`~]".replaceRegexp("\\d{2}", "")   // 2qW7{}|[`~]

/* Removal of runs of three or more adjacent digits */
"0q12W345;6789 :'\"\\\(\)".replaceRegexp("\\d{3,}", "")   // 0q12W; :'"\()

/* Removal of three to five adjacent digits at a time */
"0q12W345{6789}|12345[123456\\1234567~]".replaceRegexp("\\d{3,5}", "")   // 0q12W{}|[6\67~]

/* Removal of every character that is not a digit */
"123qWe456!@£$%^&*".replaceRegexp("\\D", "")    // 123456
"123qWe456!@£$%^&*".replaceRegexp("[^\\d]", "") // 123456
"123qWe456!@£$%^&*".replaceRegexp("[^0-9]", "") // 123456

/* Removal of the single digit at the beginning of the string */
"123qWe456!@£$%^&*".replaceRegexp("^[0-9]", "") // 23qWe456!@£$%^&*
"123qWe456!@£$%^&*".replaceRegexp("^\\d", "")   // 23qWe456!@£$%^&*

/* Removal of the run of digits at the beginning of the string */
"123qWe456!@£$%^&*".replaceRegexp("^[0-9]+", "") // qWe456!@£$%^&*
"123qWe456!@£$%^&*".replaceRegexp("^\\d+", "")   // qWe456!@£$%^&*

/* Removal of the single digit at the end of the string */
"123QWE456rty789".replaceRegexp("\\d$", "")            // 123QWE456rty78

/* Removal of the run of digits at the end of the string */
"123QWE456rty789".replaceRegexp("\\d+$", "")           // 123QWE456rty

/* Removal of every digit preceded by a non-digit */
":123QWE456rty789".replaceRegexp("(?<=\\D)\\d", "") // :23QWE56rty89

/* Removal of every digit not preceded by a non-digit */
":123QWE456rty789".replaceRegexp("(?<!\\D)\\d", "") // :1QWE4rty7

/* Removal of every digit followed by a non-digit */
"432:123QWE456rty789".replaceRegexp("\\d(?=\\D)", "") // 43:12QWE45rty789

/* Removal of every digit not followed by a non-digit */
"432:123QWE456rty789".replaceRegexp("\\d(?!\\D)", "") // 2:3QWE6rty

/* Removal of all lowercase letters */
"123qWe456RTY!@£$%^&*".replaceRegexp("[a-z]", "") // 123W456RTY!@£$%^&*
"123qWe456RTY!@£$%^&*".replaceRegexp("\\l", "")   // 123W456RTY!@£$%^&*

/* Removal of every character that is not a lowercase letter */
"123qWe456RTY!@£$%^&*".replaceRegexp("\\L", "") // qe*

/* Removal of all uppercase letters */
"123qWe456RTY!@£$%^&*".replaceRegexp("[A-Z]", "") // 123qe456!@£$%^&*
"123qWe456RTY!@£$%^&*".replaceRegexp("\\u", "")   // 123qe456!@£$%^&*

/* Removal of every character that is not an uppercase letter */
"123qWe456RTY!@£$%^&*".replaceRegexp("\\U", "")   // WRTY

It should be a C++ primitive that uses boost.regex’s replace; you could use stripRtf and findRegexp as starting points.
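To make the suggestion concrete: with a library regex engine, replace-all is a single call. The sketch below uses std::regex_replace from the C++ standard library as a stand-in. boost.regex offers boost::regex_replace in a similar shape, but its default Perl syntax differs from std::regex's ECMAScript grammar in places. The wrapper name `replaceAll` is hypothetical, and this is not the actual SC primitive interface.

```cpp
#include <cassert>   // for the checks in the usage notes
#include <regex>
#include <string>

// Hypothetical wrapper: replace every non-overlapping match of `pattern`
// in `input` with `replacement`, in one left-to-right pass.
std::string replaceAll(const std::string& input, const std::string& pattern,
                       const std::string& replacement) {
    return std::regex_replace(input, std::regex(pattern), replacement);
}

// Usage, mirroring some of the draft's test lines:
//   assert(replaceAll("012qW567<>?,. /", "\\d", "") == "qW<>?,. /");
//   assert(replaceAll("0q12W345{6789}|12345[123456\\1234567~]", "\\d{3,5}", "")
//          == "0q12W{}|[6\\67~]");
```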

Thank you for your guidance.

  • In my draft, I basically used the `.findRegexp` method, which seems to be backed by one of the primitives you mentioned, and then processed its results to replace the matched strings. As far as I have tested it, it seems to be correct and functional. However, I am not sure it will work in every case…
  • I cannot figure out how to use .stripRtf to replace text with a regular expression. If you mean functionality that replaces text read from an RTF file using a regular expression, I think this should be user-side code, e.g. a = File(path, "r"); s = a.readAllString.stripRTF; a.close; s.replaceRegexp(…). Of course, it would be convenient to have this as a single method. Is that what you mean?

I have one more question, to avoid confusion! Is the C++ primitive just a reference to start from, or is modifying the C++ primitives unavoidable in order to implement .replaceRegexp? Modifying C++ primitives is currently beyond my capabilities… Wouldn’t it be enough to implement this feature by modifying SC class files, if my method draft works correctly?

This is unreasonable when there is a perfectly good library function for a complex task that you can use from C++, and that doesn’t require additional review, testing or maintenance. If you are unable to write the primitive, I would suggest leaving it as a task for someone else. You can always supply your code as a Quark.

You would want to write a new primitive, and the two I listed would be good as examples because they handle the two related tasks of (1) using a function from boost.regex and (2) performing replacement operations on a string.

Maybe interesting, although I’m a bit late to let you know: a quark happens to exist which implements a replaceRegex (and some other operations as well).

If I’m not mistaken, there is a chapter on writing language primitives in The SuperCollider Book. Maybe we can just share it?

It’s a pity that this is somewhat intrusive to the project, since there isn’t any kind of language plugin mechanism.

@VIRTUALDOG
Thanks for the detailed explanation! I now better understand the importance of writing primitives using the C++ library, and also why there are so many Quarks with similar functionality.

@shiihs
Thank you for your Quark! I have tested it with my test code. It also gave me the opportunity to review the problems in my method draft.

Your method returns the same result for the following two cases:

/* Removal of the single digit at the beginning of the string */
"123qWe456!@£$%^&*".replaceRegex("^[0-9]", "") // expected: 23qWe456!@£$%^&*  actual: qWe456!@£$%^&*
"123qWe456!@£$%^&*".replaceRegex("^\\d", "")   // expected: 23qWe456!@£$%^&*  actual: qWe456!@£$%^&*

/* Removal of the run of digits at the beginning of the string */
"123qWe456!@£$%^&*".replaceRegex("^[0-9]+", "") // qWe456!@£$%^&*
"123qWe456!@£$%^&*".replaceRegex("^\\d+", "")   // qWe456!@£$%^&*
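The gap between the expected and the actual result above is the difference between applying an anchored pattern once and applying it repeatedly. In a single left-to-right pass, ^ only matches at the start of the original input, so ^[0-9] removes one digit. If the pattern is re-applied to the shrinking result, each pass sees a new start and the whole run disappears. The C++ sketch below uses std::regex as a stand-in engine, and `replaceOnce` and `replaceUntilStable` are hypothetical names; it reproduces both outputs. That the quark behaves like the second function is only a guess based on its output, not on its code.

```cpp
#include <cassert>   // for the checks in the usage notes
#include <regex>
#include <string>

// One pass: '^' can only match at the start of the original input.
std::string replaceOnce(const std::string& s, const std::regex& re) {
    return std::regex_replace(s, re, "");
}

// Hypothetical repeated application until nothing changes; with an anchored
// pattern, every pass removes the new first character if it still matches.
std::string replaceUntilStable(std::string s, const std::regex& re) {
    for (;;) {
        std::string next = std::regex_replace(s, re, "");
        if (next == s) return s;
        s = next;
    }
}

// Usage:
//   std::regex firstDigit("^[0-9]");
//   assert(replaceOnce("123qWe456!@", firstDigit) == "23qWe456!@");
//   assert(replaceUntilStable("123qWe456!@", firstDigit) == "qWe456!@");
```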

The following returns an error:

/* Removal of every digit preceded by a non-digit */
":123QWE456rty789".replaceRegex("(?<=\\D)\\d", "") // expected: :23QWE56rty89
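If an engine has no lookbehind, a capture group plus a back-reference in the replacement can often emulate it: match the preceding character as well and write it back. The C++ sketch below uses std::regex, whose ECMAScript grammar has no lookbehind, to reproduce the expected result of this test that way. Boost's Perl syntax does support (?<=...), so a boost.regex-based primitive should not need the workaround. The lookahead tests from the draft's test code, by contrast, work directly. All function names here are hypothetical.

```cpp
#include <cassert>   // for the checks in the usage notes
#include <regex>
#include <string>

// Emulates "(?<=\D)\d" -> "": capture the preceding non-digit as group 1 and
// write it back with $1, so only the digit after it disappears.
std::string removeDigitAfterNonDigit(const std::string& s) {
    return std::regex_replace(s, std::regex("(\\D)\\d"), "$1");
}

// Lookahead is part of std::regex's ECMAScript grammar, so these work as written.
std::string removeDigitBeforeNonDigit(const std::string& s) {
    return std::regex_replace(s, std::regex("\\d(?=\\D)"), "");
}

std::string removeDigitNotBeforeNonDigit(const std::string& s) {
    return std::regex_replace(s, std::regex("\\d(?!\\D)"), "");
}

// Usage (expected values from the draft's test code):
//   assert(removeDigitAfterNonDigit(":123QWE456rty789") == ":23QWE56rty89");
//   assert(removeDigitBeforeNonDigit("432:123QWE456rty789") == "43:12QWE45rty789");
//   assert(removeDigitNotBeforeNonDigit("432:123QWE456rty789") == "2:3QWE6rty");
```

The emulation works here because every match ends in a digit, so the consumed non-digit is never the one a later match needs; it is not a general replacement for lookbehind.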

The following cases return unexpected results on Windows (my method draft has the same problem):

/* Removal of all uppercase letters */
"123qWe456RTY!@£$%^&*".replaceRegex("\\u", "")   // expected: 123qe456!@£$%^&*
// actual on Windows: 123qe456!@�$%^&* (my method has the same problem on Windows)

/* Removal of every character that is not an uppercase letter */
"123qWe456RTY!@£$%^&*".replaceRegex("\\U", "")   // expected: WRTY
// actual on Windows: WRTY� (my method has the same problem on Windows)

These oddities on Windows discourage me from continuing with this approach.
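One plausible explanation for the Windows results (an assumption, not verified against either implementation) is that the regex runs over bytes rather than characters. In UTF-8, “£” is the two bytes 0xC2 0xA3, and in the Windows-1252 code page byte 0xC2 is “Â”, an uppercase letter. A \u or \U class that consults such a locale would treat only half of “£” as uppercase. Removing or keeping just that half leaves an invalid UTF-8 sequence, which is displayed as �. The C++ sketch below only demonstrates the byte-level part, and `isValidUtf8` is a hypothetical, simplified checker.

```cpp
#include <cassert>   // for the checks in the usage notes
#include <cstddef>
#include <string>

// Simplified UTF-8 check: validates lead/continuation byte structure only
// (it does not reject overlong encodings or surrogates).
bool isValidUtf8(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;
        if (c < 0x80) extra = 0;                 // 0xxxxxxx: ASCII
        else if ((c & 0xE0) == 0xC0) extra = 1;  // 110xxxxx: 2-byte sequence
        else if ((c & 0xF0) == 0xE0) extra = 2;  // 1110xxxx: 3-byte sequence
        else if ((c & 0xF8) == 0xF0) extra = 3;  // 11110xxx: 4-byte sequence
        else return false;                       // stray continuation byte
        if (i + extra >= s.size()) return false; // sequence cut off
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char cont = static_cast<unsigned char>(s[i + k]);
            if ((cont & 0xC0) != 0x80) return false;  // must be 10xxxxxx
        }
        i += extra + 1;
    }
    return true;
}

// "£" spelled as explicit UTF-8 bytes, independent of the source encoding.
const std::string kPound = "\xC2\xA3";

// Usage:
//   std::string dropped = "!@" + kPound;
//   dropped.erase(2, 1);             // drop only the 0xC2 byte
//   assert(!isValidUtf8(dropped));   // a lone 0xA3 is rendered as �
```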

The “Writing Primitives” page in the SuperCollider 3.12.2 Help is also a resource.

By the way, neither of these SC implementations takes match groups or match format syntax into account (e.g. s/ (\d)/\1/ in sed; Boost documents its version as “Perl Format String Syntax” in the 1.84.0 docs), which is a major part of regex substitution functionality.
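A small illustration of what match groups and format strings add: in the replacement string, $1 inserts the first capture group and $& the whole match. This sketch uses std::regex_replace with its default ECMAScript format rules as a stand-in. Boost's Perl format syntax offers more than this, so treat it only as an illustration of the idea. `dropSpaceBeforeDigit` mirrors the sed example s/ (\d)/\1/, and both function names are hypothetical.

```cpp
#include <cassert>   // for the checks in the usage notes
#include <regex>
#include <string>

// " (\d)" -> "$1": drop a space that precedes a digit, keeping the digit
// through capture group 1 (the idea behind s/ (\d)/\1/ in sed).
std::string dropSpaceBeforeDigit(const std::string& s) {
    return std::regex_replace(s, std::regex(" (\\d)"), "$1");
}

// "$&" re-inserts the whole match, here wrapping every run of digits in brackets.
std::string bracketNumbers(const std::string& s) {
    return std::regex_replace(s, std::regex("\\d+"), "[$&]");
}

// Usage:
//   assert(dropSpaceBeforeDigit("track 1 and take 2") == "track1 and take2");
//   assert(bracketNumbers("bar 12 beat 3") == "bar [12] beat [3]");
```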

Something interesting about Scheme as an extension language: in Guile, a shared library is linked into the running Guile image only when it is required, which keeps resource usage low and adds flexibility.

I’m sure the developers have discussed this for sclang, and it must be challenging to do something like this.

I wrote an implementation of rational numbers that you reviewed, but you said a primitive would be better. OK; at the time, I even compiled a version using the Boost library’s implementation, but it is a little intimidating to modify source files that are so central to the project. I never completed that project.

The quark is good enough for me, but people tend to write their own improvised implementations rather than use a quark. Maybe that’s a trust issue.

Language plugins would be great, although nobody has taken the effort for it. But let’s stick to the main topic please. (:

Sorry about that. I agree with you: Boost’s regex is good. The point I wanted to contribute to the conversation is that if everything must be implemented as a primitive, there should at least be an effort to improve the extension mechanism (in the style of Scheme). Otherwise, things just get stuck.

I tried to write the primitive with help from Gemini, but gave up.

It is a shame: I want to implement this feature, but personally I could only do so in sclang (even though I think my draft works almost correctly), whereas in this case it should be implemented as a primitive…

So unless someone in the development group, or a user who actively wants this feature, can handle C++, this feature won’t be implemented in the official SuperCollider build. Instead, several quarks will be written by different users (currently at least two…). I think this is not user-friendly, and even less so for musicians…

It might be better for a user like me to ask advanced users with C++ programming skills for help, and to donate some money or send a gift, via Amazon for example. (I think learning C++ is too far removed from music. Am I wrong?)

The philosophy of Open Source is ideal, but the state of SC development is not ideal when an ordinary user wants a new feature, unless all users have a similar level of programming skill…

Anyway, I would like to include this feature as an extension method in one of the class files of a Quark I intend to publish, but how should I write the corresponding documentation for this method in String.schelp?

It does not cover the majority of Boost’s (or any standard library’s) regex-replace functionality, and it would be bad if this became the core library’s regex replace. A simple example of a replacement that doesn’t work is "a".replaceRegexp(".", "$0"), which should return "a".

Until Rust takes over, C and C++ are the standard languages for writing audio applications. If you aren’t going to learn them, that’s fine, but then you shouldn’t be surprised if you have difficulty contributing to a codebase that is already overwhelmingly C++ code.

The good news is that there are many, many things you can write in sclang which do not require C++ primitives and which would solve open issues. Primitives are mainly needed for good performance, or for functionality that is best provided by a C or C++ library. So you can always try one of those sclang tasks instead. Or you could take a course on C++, since you have already identified it as an in-demand skill for this project.

Look at the Quark linked above: https://github.com/shimpe/scstringext/tree/master

I had a go at it; it seems to work as expected.

@jordan Thank you for your kindness!

@VIRTUALDOG I had decided not to investigate learning more languages, but … I should reconsider… Anyway, thanks for the clarification. You made me understand the whole scene more deeply!

Just to chime in…

This took me about 2.5 hours, which, given that it was the first time I’ve touched the language side of SC, wasn’t too bad.

The real issue isn’t learning the language, it’s reading the existing code and figuring out all the unspoken rules… I kept getting a nasty bug because I assumed the char* of SuperCollider strings was null-terminated (which I think is a reasonable assumption), yet it isn’t… I still haven’t found exactly where this is stated in the code or documentation…

It is things like that, along with many of the older conventions, that make this code very challenging for a beginner. The only solution is for someone to clean it up, but that requires that the writer understand all the code in the first place… I don’t think there is anyone with enough knowledge of this code to perform such a task (might be wrong though!).

I’m not too familiar with Rust, but I imagine most of the code in the VM would have to be unsafe, so it is perhaps not a good match here, though it definitely would be on the audio side of things.

Yes, it’s not super complicated in principle, but for the reasons you mentioned, it is in practice. The C language itself isn’t really the problem.

A post was split to a new topic: Sclang extensibility

Thanks @jordan !

Getting used to a new codebase always involves a learning curve. I don’t think it is documented, and it probably should be, since null termination would be a reasonable assumption. In SC, Strings aren’t null-terminated because they are defined as a type of array. So in C++ terms they are more like std::vector<char> (which isn’t null-terminated) than std::string (which is).
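Because SC string data is not null-terminated, a primitive has to carry the length explicitly instead of calling strlen. One way to do that, sketched below under that assumption, is to give the regex an iterator range [data, data + size) and collect the result with std::back_inserter, using the standard std::regex_replace overload that takes an output iterator and an input range. `replaceInBuffer` is hypothetical and does not reflect how SC actually lays out its string objects.

```cpp
#include <cassert>   // for the checks in the usage notes
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical primitive-style helper: `data` points at `size` bytes that
// are NOT null-terminated, so the length must be passed in explicitly.
std::string replaceInBuffer(const char* data, std::size_t size,
                            const std::regex& re, const std::string& fmt) {
    std::string out;
    std::regex_replace(std::back_inserter(out), data, data + size, re, fmt);
    return out;
}

// Usage: only the first 6 bytes are considered; no terminator is ever read.
//   const char buf[] = {'a', 'b', 'c', '1', '2', '3', 'X', 'Y'};
//   assert(replaceInBuffer(buf, 6, std::regex("\\d"), "") == "abc");
```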

I also found that symbols are null-terminated, which really confused me because they store their size as well; but the size is a uint8_t and doesn’t include the null terminator.


I just had a quick look and the uint8_t causes at least one bug…

// works, produces a file with 255 'a's
a = ('a'!255).reduce('++').asString.asSymbol
f = File("~/tmp/test.txt".standardizePath, "w")
f.write(a)
f.close


// does not work, produces an empty file
a = ('a'!256).reduce('++').asString.asSymbol
f = File("~/tmp/test.txt".standardizePath, "w")
f.write(a)
f.close

There are also a few other places this occurs, although most primitives call strlen on the char*.

Definitely off topic now, but I will make a gh issue.
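The 255 vs 256 difference is what you would expect if the length lives in an 8-bit unsigned field: converting 256 to uint8_t wraps around to 0 (the value is reduced modulo 256), while strlen on the null-terminated characters still sees all 256. A minimal C++ illustration, assuming only what is described above; `TinySymbol` and `makeSymbol` are hypothetical and are not SC's actual symbol structure.

```cpp
#include <cassert>   // for the checks in the usage notes
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical stand-in for a symbol record whose length is a uint8_t.
struct TinySymbol {
    std::string name;     // the characters (c_str() is null-terminated)
    std::uint8_t length;  // stored size, excluding the terminator
};

TinySymbol makeSymbol(const std::string& s) {
    // Conversion to uint8_t keeps only the low 8 bits: 256 becomes 0.
    return TinySymbol{s, static_cast<std::uint8_t>(s.size())};
}

// Usage:
//   assert(makeSymbol(std::string(255, 'a')).length == 255);
//   assert(makeSymbol(std::string(256, 'a')).length == 0);   // wrapped
//   assert(std::strlen(makeSymbol(std::string(256, 'a')).name.c_str()) == 256);
```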