Hey,
tl;dr - I’m thinking of writing a reference grammar for sclang using the parser-generator tool ANTLR, which can generate parsers in a bunch of target languages, including C++, JavaScript, and Python. We might also be able to add a module so it can generate a parser in SuperCollider itself. Please tell me if you’d be interested in using it and/or helping to maintain and develop it.
Longer form:
I’ve seen at least a few discussions of ongoing projects or project proposals that might benefit from having a pre-made, well-supported parser:
- Automatic sclang code formatters
- Interpreter pre-processing sclang frontends
- Development tooling (like LSP)
I’ve been thinking about developing some kind of generic, reusable parser for sclang for a while now. I was inspired by all the neat language analysis tools that grew up around clang, the LLVM frontend for the C family of languages. Those tools exist largely because clang provided a usable frontend parser for C++, a language that is notoriously difficult to parse.
The obvious place to start seemed to be reusing the parser inside sclang. Long ago I wrote (and then abandoned) a SuperCollider PR to expose the parse tree that the sclang interpreter builds during compilation. The problem with this approach is that sclang, in the interest of compilation speed, transforms its parse tree somewhat while building it. So the result of parsing in sclang is a tree that is ready for the next stage of compilation but no longer exactly represents the input source, which makes the use cases above harder to build on top of it.
For example, the sclang parser expands syntax shortcuts into their underlying meaning (think performList syntax shortcuts or generator expressions) and also does the first pass of dead code deletion (creating all those PyrDropNode objects), among other things.
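To make the problem concrete, here is a toy Python sketch of why a lowered tree is a poor fit for something like a code formatter. It is only an illustration: the Call node, lower(), and format_call() are hypothetical and are not sclang's real data structures. It models the performList shortcut, where a call whose last argument is splatted, such as obj.method(a, *rest), means the same thing as obj.performList(\method, a, rest).

```python
from dataclasses import dataclass

# Hypothetical node type for a toy sclang-like tree; not sclang's real structures.
@dataclass
class Call:
    receiver: str
    selector: str
    args: list[str]
    splat: bool = False  # True if the last argument was written as *arg


def lower(call: Call) -> Call:
    """Desugar obj.sel(a, *rest) into obj.performList(\\sel, a, rest),
    the way a lowering parser might while building its tree."""
    if not call.splat:
        return call
    return Call(call.receiver, "performList", ["\\" + call.selector] + call.args)


def format_call(call: Call) -> str:
    """Print a call node back out as source text."""
    args = list(call.args)
    if call.splat:
        args[-1] = "*" + args[-1]
    return f"{call.receiver}.{call.selector}({', '.join(args)})"


source_tree = Call("obj", "method", ["a", "rest"], splat=True)
print(format_call(source_tree))         # obj.method(a, *rest)
print(format_call(lower(source_tree)))  # obj.performList(\method, a, rest)
```

Both trees mean the same thing to the compiler, but a formatter handed only the lowered tree can print just the second form; getting back what the user actually typed requires a tree that hasn't been lowered yet.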
My next attempt was to prop up Hadron’s parser as a possible “official reference parser” for the community, but as I continue my development work on Hadron, a few problems with this approach have come up for me:
- I think an official parser is useful enough to the community that I don’t necessarily want to couple it to the fate of a project as experimental and uncertain as Hadron.
- Hadron is written in C++, so any consumer of the parse tree data has to be able to interop with C++ (or call out to an external binary), which limits its usefulness.
- I’d like to remove some of the design constraints on Hadron’s parser and follow sclang’s lead by lowering the parse tree during construction.
- During the bootstrapping phase of Hadron compilation, I need a reference parser. Compiler writing is full of fun chicken-and-egg paradoxes like this.
I’ve looked through several different parser generators, and I think ANTLR looks the most promising. It supports the target languages I’ve seen discussed most often here, and I think we could also coerce it into generating an SC parser. That’s right: we could distribute a Quark that contains an ANTLR-generated sclang parser that takes an input string and produces a parse tree made of sclang objects.
I haven’t started work on this yet, but I’m thinking about starting soon. I thought I’d ask for feedback and gauge interest first. So what do y’all think?