Issues with SC Regexp engine

I have written some regexps using the Boost (boost.org) syntax, and they fail in SC although they work fine in online regexp testers (e.g. https://regex101.com/).

Some cases:

matching a “(”

Escaping a “(” as \( is treated as a group start rather than as a literal “(” character.
The character has to be protected a second time by wrapping it in a character class: [\(]. This seems abnormal.

"hello (how low)".findRegexp("\(.+\)"); // doesn't find "(how low)"
"hello (how low)".findRegexp("[\(].+[\)]"); // working fine

look behind with alternation symbol “|”

In “hello (how low) hello”, a pattern that finds an “h” at the beginning of the string or preceded by a “(” should be written like this, but it throws an error:

"hello (how low) hello".findRegexp("(?<=^|\()h");  // throws an error

It is definitely the “|” alternation inside the lookbehind that SC has trouble with. If I remove it, the error goes away (but the results are only partial):

"hello (how low) hello".findRegexp("(?<=[\(])h"); // remove the "|" condition, no error, but partial 
"hello (how low) hello".findRegexp("(?<=^)h"); // remove the "|" condition, no error, but partial 

A “|” alternation inside an ordinary group (capturing or non-capturing) also works without error, but it does not give the expected result:

"hello (how low) hello".findRegexp("(^|[\(])h"); // replace the lookbehind with a standard match: no error, but not the expected result
"hello (how low) hello".findRegexp("(?:^|[\(])h"); // replace the lookbehind with a non-capturing match: no error, but not the expected result

Is SC using the latest version of the Boost library ?
The documentation linked in the SC help file is for an older version (1.69), while there is a newer version (1.78).

If you need a literal backslash in a SC string (as in C++), you’ll need to escape it with a preceding backslash. This makes for confusing/awkward regex strings, unfortunately, and complicates copying/pasting to/from test sites like regex101.com.

Try:

"(how low)".findRegexp("\\(.+\\)");

If you print the SC string, you’ll see it actually only contains single backslashes.

"\\(.+\\)".postcs

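A quick way to see the difference between the two spellings is to compare the string sizes. This is only a small sketch, assuming sclang drops a backslash placed in front of a character that has no special escape meaning, which is what the behaviour described above suggests:

```supercollider
// One backslash: the string literal is expected to consume it,
// so only "(" would reach the regex engine.
"\(".size.postln;

// Two backslashes: the literal keeps one of them,
// so the regex engine receives a backslash followed by "(".
"\\(".size.postln;

// If the assumption holds, the single-backslash literal equals a bare "(".
("\(" == "(").postln;
```
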
Great. Thanks. This is good to know.

But the second issue remains. The look behind with alternation symbol “|”.

// This combination is throwing an error
"hello (how low)".findRegexp("(?<=^|\\()h") // "h" preceded by a start of line **or** a "(" :: KO

// All the individual parts are working fine
"hello (how low)".findRegexp("(^|\\()h") // a start of line **or** a "(" followed by a "h" :: OK
"hello (how low)".findRegexp("(?:^|\\()h")  // a start of line **or** a "(" followed by a "h" (alt) :: OK
"hello (how low)".findRegexp("(?<=^)h") // "h" preceded by a start of line :: OK
"hello (how low)".findRegexp("(?<=\\()h") // "h" preceded by a "(" :: OK
"hello (how low)".findRegexp("(?<=x|\\()h") // "h" preceded by a "(" **or** a "x" :: OK

So, which is the Boost library version used by SC ? The latest ?

I’m not sure which Boost version is used with SC (I doubt there have been many changes in Boost regex recently, but I may be wrong), but note that SC uses the ECMAScript “flavour” of regexes. You can also set the regex engine in the regex101.com settings; ECMAScript is not the default there, I don’t think.

Update: looks like SC uses Boost version 1.74, since 2020.

I just checked my regexp with the ECMAScript flavour, and it works fine on regex101.com.
So there is definitely an issue with the version of the Boost library used by SC.
Not a big deal, because I can work around it by using two regexps, joining their results and re-ordering them by index. Nevertheless, it would be worth including the newer version of the Boost library in the next SC release.
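
For reference, the two-regexp workaround described above could look roughly like this. It is a sketch rather than a tested recipe: the variable names (atStart, afterParen, merged) are made up for illustration, and it relies on findRegexp returning [index, matchedString] pairs.

```supercollider
(
var str = "hello (how low) hello";

// First pattern: "h" at the very start of the string.
var atStart = str.findRegexp("^h");

// Second pattern: "h" preceded by a literal "(" (note the doubled backslash).
var afterParen = str.findRegexp("(?<=\\()h");

// Concatenate both result lists and sort them on the match index,
// which is the first element of each [index, matchedString] pair.
var merged = (atStart ++ afterParen).sort({ |a, b| a[0] < b[0] });

merged.postln;
)
```

Each of the two patterns is one that already ran without error in the tests above, so the problematic alternation inside the lookbehind is avoided entirely.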

Note on boost: we use boost 1.74.0 for macOS and Windows binaries (probably since SC 3.11, but I’m not sure).
I think that Linux packages typically use system-provided boost libraries (whether that’s the case or not might be distribution dependent, I’m not sure) so we don’t have control over this.

Upgrading boost might also have OS compatibility consequences, but I haven’t looked whether that’s actually the case between the version we currently use and the newest one. That is to say that upgrading boost might not be straightforward for a couple reasons, but again, we’d need to look into it.

Are we sure that this is due to the version of boost and not something else?