Scparco: quark for making text parsers using parser combinators in sclang

shiihs · May 28, 2023, 8:42pm

Many who’ve tried to define a domain-specific language for music specification or live coding have felt the need to parse text in sclang. Attempts to use regexes quickly reveal their limitations and can result in headache-worthy code that is hard to debug, read and extend. (Been there, done that!)

Here’s an attempt at making it easier to build text parsers: a GPLv3 quark providing parser combinators. Parser combinators let you construct a parser for a complex specification by combining many smaller parsers. The classes come with small examples, and the readme file on GitHub demonstrates the first non-trivial parsing exercise in any parsing course: parsing (and evaluating) a nested mathematical expression.
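To make the idea concrete for readers who haven't met parser combinators before, here is a minimal sketch in Python. It is not the scparco API: every name in it (`regex`, `sequence`, `choice`, `many`, `fmap`, `lazy`, `token`, `left_assoc`, `evaluate`) is a hypothetical helper invented for this illustration. It only shows the general technique of building a nested-expression evaluator out of small parsers; see the readme for the actual sclang version.

```python
# Illustrative parser-combinator sketch; NOT the scparco API.
# A parser is a function (text, pos) -> (value, new_pos), or None on failure.
import operator
import re

def regex(pattern):
    """Parser that matches a regular expression at the current position."""
    compiled = re.compile(pattern)
    def parse(text, pos):
        m = compiled.match(text, pos)
        return (m.group(0), m.end()) if m else None
    return parse

def sequence(*parsers):
    """Run parsers one after another; succeed only if all of them succeed."""
    def parse(text, pos):
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse

def choice(*parsers):
    """Return the result of the first parser that succeeds."""
    def parse(text, pos):
        for p in parsers:
            result = p(text, pos)
            if result is not None:
                return result
        return None
    return parse

def many(parser):
    """Apply a parser zero or more times and collect the values."""
    def parse(text, pos):
        values = []
        while True:
            result = parser(text, pos)
            if result is None:
                return values, pos
            value, pos = result
            values.append(value)
    return parse

def fmap(parser, fn):
    """Transform the value produced by a parser."""
    def parse(text, pos):
        result = parser(text, pos)
        if result is None:
            return None
        value, new_pos = result
        return fn(value), new_pos
    return parse

def lazy(thunk):
    """Defer looking up a parser until parse time (needed for recursion)."""
    return lambda text, pos: thunk()(text, pos)

def token(pattern):
    """Match a pattern and skip any whitespace that follows it."""
    return fmap(sequence(regex(pattern), regex(r"\s*")), lambda v: v[0])

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def left_assoc(operand, op_pattern):
    """Parse 'operand (op operand)*' and fold the result left to right."""
    tail = many(sequence(token(op_pattern), operand))
    def fold(v):
        acc, rest = v
        for op, rhs in rest:
            acc = OPS[op](acc, rhs)
        return acc
    return fmap(sequence(operand, tail), fold)

# Grammar: expr := term (('+'|'-') term)*, term := factor (('*'|'/') factor)*
number = fmap(token(r"\d+(\.\d+)?"), float)
parens = fmap(sequence(token(r"\("), lazy(lambda: expr), token(r"\)")),
              lambda v: v[1])
factor = choice(number, parens)
term = left_assoc(factor, r"[*/]")
expr = left_assoc(term, r"[+-]")

def evaluate(text):
    """Parse and evaluate a whole string; reject unparsed trailing input."""
    result = sequence(regex(r"\s*"), expr)(text, 0)
    if result is None or result[1] != len(text):
        raise ValueError("could not parse: " + text)
    return result[0][1]
```

With these definitions, `evaluate("2 * (3 + 4) - 5")` yields `9.0`. The point of the design is that operator precedence and nesting come from how the small parsers are combined, not from one large regex.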

The library is very new (I literally started it yesterday, based on a JavaScript tutorial) and as a result it hasn’t been used in any real-life project yet, so expect bugs and/or grave limitations or omissions. At the same time, it feels like a promising approach and invites further experimentation.

https://github.com/shimpe/scparco

The .schelp documentation for the library is generated from comments embedded in the code using the whelk tool.

https://github.com/shimpe/whelk

jamshark70 · May 29, 2023, 1:49pm

Nice.

It happens that I wrote (last year) an article about parsing techniques in sclang – which is still in preparation (factors outside my control), and now already out of date, since it should probably refer readers to this quark.

In my live coding dialect, I use regexes to look ahead and identify syntactic elements, but I end up parsing them character by character – for exactly the reason you mention: awareness of context.

hjh

shiihs · May 31, 2023, 12:31pm

Hi James, thanks for reading the announcement and for your reaction!

The fact that you wrote an article about parsers confirms my suspicion that I’m not the only one wishing for better support for parsing in sclang. There may still be advantages to more bare-metal, regex-based techniques for specifications that aren’t too complex. E.g. I haven’t really tried to quantify the memory usage of the declarative parsers, but I suspect that in some cases it may be much higher than with a more direct approach.

In the meantime, I’ve added some classes that show how to parse binary data (bit by bit if desired), so the system can also be used to parse things like sysex messages or sound file headers.

I’m not sure whether your last sentence implied that the parser-combinator-based approach leads to parsers that are not context-aware? They can be made context-aware using their “chain” method.

An example showing context awareness has been added to the GitHub readme. It uses context awareness to extract a manufacturer’s ID from a binary sysex message. If the first byte of the ID is not 0x00, that byte itself is the manufacturer’s ID. If the first byte is 0x00, two more bytes follow that contain the manufacturer’s ID. This is easily handled in the example.
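For readers who want to see the shape of such a context-dependent step without opening the readme, here is a rough Python sketch of the same rule. It is not the scparco API or the readme’s code: `byte`, `exact_byte`, `sequence`, `fmap`, `succeed` and `chain` are hypothetical helpers written for this illustration, and the `chain` shown here is only assumed to behave like the quark’s method of the same name (run a parser, then choose the next parser from the value it produced).

```python
# Illustrative sketch; NOT the scparco API. All helper names are hypothetical.
# A parser is a function (data, pos) -> (value, new_pos), or None on failure.

def byte():
    """Parser that consumes a single byte and returns it as an int."""
    def parse(data, pos):
        return (data[pos], pos + 1) if pos < len(data) else None
    return parse

def exact_byte(expected):
    """Parser that consumes one byte and fails unless it equals `expected`."""
    def parse(data, pos):
        if pos < len(data) and data[pos] == expected:
            return expected, pos + 1
        return None
    return parse

def sequence(*parsers):
    """Run parsers one after another; succeed only if all of them succeed."""
    def parse(data, pos):
        values = []
        for p in parsers:
            result = p(data, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse

def fmap(parser, fn):
    """Transform the value produced by a parser."""
    def parse(data, pos):
        result = parser(data, pos)
        if result is None:
            return None
        value, new_pos = result
        return fn(value), new_pos
    return parse

def succeed(value):
    """Parser that consumes nothing and always returns `value`."""
    return lambda data, pos: (value, pos)

def chain(parser, choose_next):
    """Run `parser`, then pick the next parser based on the value it parsed."""
    def parse(data, pos):
        result = parser(data, pos)
        if result is None:
            return None
        value, pos = result
        return choose_next(value)(data, pos)
    return parse

def manufacturer_id_rest(first):
    """Decide how to continue after seeing the first ID byte."""
    if first != 0x00:
        # One-byte ID: nothing more to read.
        return succeed((first,))
    # Extended ID: 0x00 followed by two more bytes.
    return fmap(sequence(byte(), byte()), lambda v: (0x00, v[0], v[1]))

manufacturer_id = chain(byte(), manufacturer_id_rest)

# A sysex message starts with the status byte 0xF0, then the manufacturer ID.
sysex_header = fmap(sequence(exact_byte(0xF0), manufacturer_id),
                    lambda v: v[1])
```

Calling `sysex_header(bytes([0xF0, 0x41, 0x10]), 0)` gives `((0x41,), 2)`, while `sysex_header(bytes([0xF0, 0x00, 0x20, 0x29, 0x02]), 0)` gives `((0x00, 0x20, 0x29), 4)`: how many bytes the second step consumes depends on what the first step read, which is exactly the kind of context a plain regex struggles to express.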

jamshark70 · May 31, 2023, 1:49pm

Not at all. It’s regexes that are very hard to make context-aware.

hjh