Code Formatter Development for SuperCollider

Yes, that’s correct. More accurately, I would say that the sclang interpreter is designed for two distinct use patterns: Ahead-of-Time (AOT) compilation of the class library, and Just-In-Time (JIT) compilation of interpreter code. So the sclang parser grammar, which I take as the authoritative grammar of the SuperCollider language, was never designed to mix class definitions and interpreter code in the same input string and would consider that input invalid.

I think with some minor changes it could be possible to mix these two use modes, but the more I work on the compiler the more I feel that the ability to define (or redefine) classes at runtime adds a lot of complexity to the compiler, requires the programmer to keep a lot of state in their mind while working, and may not add proportionate value to the language when weighed against that.

I’ve merged the PR and have started work on the parser JSON dump. I should have some statistics around parsing once I’ve ironed out some of the kinks around class library compilation.

I don’t understand this question. I’m working on a PR right now that will build a compiled C++ binary that runs on macOS (I can add Linux and/or Windows as needed). When run, it will take a --sourceFile command line flag and will treat the input as a class file if the file has the “.sc” extension. On lexing/parsing success it will write a JSON stream containing the parse tree of the file to stdout. On failure it may write a probably not very helpful error message to stderr, and it will exit with a nonzero exit code. Is that useful? Or is there some other flow of data here you’d need support for?
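
For anyone who wants to script against that flow, here is a minimal Python sketch of a caller. Only the --sourceFile flag, the JSON on stdout, and the nonzero exit code on failure come from the description above. The helper name parse_file, the command argument (the argv prefix that launches the binary), and the assumption that stdout holds exactly one JSON document are mine.

```python
import json
import subprocess


def parse_file(command, source_path):
    """Run the parser binary on one file and return (tree, error).

    `command` is a list holding the argv prefix that launches the binary.
    It is a hypothetical placeholder; substitute the real path.
    Returns (tree, None) on success and (None, error_text) on failure.
    """
    result = subprocess.run(
        list(command) + ["--sourceFile", source_path],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # Failure: any error message arrives on stderr.
        return None, result.stderr.strip()
    # Success: assume stdout holds a single JSON document.
    return json.loads(result.stdout), None
```

Returning the error text instead of raising keeps a batch run over many files going when one of them fails to parse.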

I have a proof-of-concept of the JSON dump of the parse tree working. I took an example input from above, slightly modified to make it compile in sclang:

(
var y = { |a, b, c| var d; q; d = a * b; d=a*b*d; "foo".postln; a = d*b; c = d*a; d = a &b|c ; c + d; };


y = { |a, b, c| var d; q; d = a * b;




c + d; };


y = { |a, b, c| var d, q; var x;  q; d = a * b;




c + d; };

y = { |a, b, c| var d;
    d = a * b;
        d = a * b * d;
      a = d*b; c = d*a; d = a &b|c ; c + d; };
)

Unfortunately the resulting JSON dump exceeds the character limit for this post, so I’ve posted it as a gist here.

Each object is a dictionary with two keys: _className, which is the name of the object, and _identityHash, which uniquely identifies that object. There are a few cycles in this object graph, but only through the tail member of each parse node, which is a member you can ignore. Additional references to the same object produce a dictionary with a single key, _reference, which has the same value as the _identityHash key in the referenced object. Symbols are encoded as strings, floats and integers get their normal values, and nil is encoded as JSON null. I’m going to add some additional code for serialization of specialized container objects, notably Arrays and their RawArray cousins, but otherwise this is mostly working as intended, so I thought I’d share some sample output.
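
To make the encoding concrete, here is a minimal Python sketch of how a consumer might turn the _reference stubs back into shared objects. It relies only on the three keys described above. The function name resolve_references and the sample document (including the ExampleNode class name and the body and again members) are made up for illustration and are not taken from the real dump.

```python
import json


def resolve_references(root):
    """Replace each {"_reference": h} stub with the object whose
    _identityHash is h, so repeated references share one dict."""
    by_hash = {}

    def is_stub(value):
        return isinstance(value, dict) and set(value) == {"_reference"}

    def index(node):
        # First pass: record every full object by its identity hash.
        if isinstance(node, dict):
            if "_identityHash" in node:
                by_hash[node["_identityHash"]] = node
            for value in node.values():
                index(value)
        elif isinstance(node, list):
            for item in node:
                index(item)

    def link(node):
        # Second pass: walk the original JSON tree and swap stubs in place.
        # Substituted objects are not walked again, so cycles are safe.
        if isinstance(node, dict):
            entries = list(node.items())
        elif isinstance(node, list):
            entries = list(enumerate(node))
        else:
            return
        for key, value in entries:
            if is_stub(value):
                node[key] = by_hash[value["_reference"]]
            else:
                link(value)

    index(root)
    link(root)
    return root


# Hypothetical sample; real dumps use Hadron's own class and member names.
sample = json.loads("""
{"_className": "ExampleNode", "_identityHash": 1,
 "body": {"_className": "ExampleNode", "_identityHash": 2,
          "tail": {"_reference": 1}},
 "again": {"_reference": 2}}
""")
graph = resolve_references(sample)
```

After resolution, graph["again"] and graph["body"] are the same Python object, and graph["body"]["tail"] points back at the root. That is the kind of cycle the tail member introduces, so skip tail when walking the result recursively.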

Next up is a load-time optimization I’ve been meaning to add for a while, which should hopefully make the dump-diag binary that produced this JSON run faster. I should have statistics on successful parsing by then, and can add the needed code to the parser to parse close to 100% of the extant sclang code.

Tangentially, for Python/C++ interop I overheard some colleagues at work discussing CLIF, which generates Python wrappers around C++ objects for direct usage of a C++ library in Python. It uses LLVM as a dependency and seems really complex and powerful.

Wasn’t checking the threads because of Memorial Day - got a lot to catch up on!
I’ll take a deep dive later and come back to the thread. Looks like a lot of great things are happening.

Cool, no rush on my part. I gathered some statistics for parsing classes. I added a --doesItParse flag to dump-diag, which prints either YES or NO: filename to stdout, and then ran these commands:

% find ../../third_party/supercollider -name '*.sc' -print0 | xargs -n1 -0 -I file ./dump-diag --doesItParse --sourceFile 'file' | wc -l
     469
% find ../../third_party/supercollider -name '*.sc' -print0 | xargs -n1 -0 -I file ./dump-diag --doesItParse --sourceFile 'file' | grep YES | wc -l
     406

So the parser currently parses 406/469 of the .sc files in the supercollider repository, or about 87%.
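
The two pipelines above can also be folded into a single pass over the --doesItParse output. This is a small Python sketch that assumes each output line begins with YES or NO, which is my reading of the format and not a documented guarantee. The helper names tally and percent are made up.

```python
def tally(lines):
    """Count (passed, total) from --doesItParse output lines.

    Assumes every non-empty line starts with YES or NO.
    """
    total = 0
    passed = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        total += 1
        if line.startswith("YES"):
            passed += 1
    return passed, total


def percent(passed, total):
    """Pass rate as a percentage; 0.0 when nothing was counted."""
    return 100.0 * passed / total if total else 0.0


# The figures quoted above: 406 of 469 files parse.
print(f"{percent(406, 469):.1f}%")  # prints 86.6%
```

Unlike grep YES, the startswith check does not count a file whose name merely contains YES.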

I don’t have a statistic for .scd files yet, because some file is causing a crash on parse. I think I’m going to commit this PR, then start a PR to get the parser to 100% of the .sc and .scd files within the supercollider repository.

I’ve merged the PR with the missing parser functionality. Hadron now parses every “valid” sclang file in the supercollider repository with one exception. By “valid,” I mean every input file that sclang also parses. In supercollider/testsuite/classlibrary/TestMethod.sc, Hadron returns a parse failure on a test input that used to crash sclang when parsed.

Hadron doesn’t parse some .scd files in the examples/ directory. The common problem is that they have multiple blocks designed to be run independently instead of running the file as a whole. I spot-checked several of them, but with no automated means of determining which ones sclang can parse, I didn’t want to spend the time going through each one.

I think I’m still missing some corner cases, but I’m generally satisfied that Hadron can parse “most” valid sclang input. I’m going to focus on my previous development project of bringing the rest of Hadron’s compilation artifacts into sclang-accessible data structures, a lead-up project to an interactive debugger for sclang.

If you encounter bugs, have questions, or have specific feature requests, please reach out!

Cheers

I’m working on a PR that does two things of interest to this thread:

a) Introduce a HadronDeserializer class that can consume the JSON generated by the dump-diag tool and convert it back to HadronParseNode objects in sclang. So, that class can serve as an example of deserializing the JSON in other languages, or you can use the code directly to work with the parse node objects in sclang if you wish.

b) Add code to convert a tree of HadronParseNode objects to a Graphviz DOT file, which allows you to visualize the parse trees. I’ve continually found this helpful when developing Hadron, so I am porting the current Python implementation to sclang now that I have access to the data structures there.
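
As a rough illustration of (b) for readers working outside sclang, here is a minimal Python sketch that walks the JSON form of a parse tree and emits Graphviz DOT text. It is not the Hadron implementation. It relies only on the _className, _identityHash, and _reference keys described earlier in the thread, it skips the tail member as suggested there, and the function name to_dot is made up.

```python
def to_dot(root):
    """Render a JSON parse tree (nested dicts) as Graphviz DOT text."""
    lines = ["digraph ParseTree {"]
    seen = set()

    def visit(node):
        node_id = node["_identityHash"]
        if node_id in seen:
            return
        seen.add(node_id)
        name = node["_className"]
        # Quote the IDs so that negative hashes remain valid DOT.
        lines.append(f'  "{node_id}" [label="{name}"];')
        for key, value in node.items():
            # Skip the cyclic tail member and plain scalar values.
            if key == "tail" or not isinstance(value, dict):
                continue
            target = value.get("_reference", value.get("_identityHash"))
            if target is None:
                continue
            lines.append(f'  "{node_id}" -> "{target}" [label="{key}"];')
            if "_identityHash" in value:
                visit(value)

    visit(root)
    lines.append("}")
    return "\n".join(lines)
```

Write the returned string to a file such as tree.dot and render it with dot -Tsvg tree.dot -o tree.svg. Children stored inside JSON arrays are not followed in this sketch.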

For example, here’s a lightly-edited version of the Integer method factors presented as a code block:

(
var factors = { |num|
		var array, prime;
		if(num <= 1) { ^[] }; // no prime factors exist below the first prime
		num = num.abs;
		// there are 6542 16 bit primes from 2 to 65521
		6542.do {|i|
			prime = i.nthPrime;
			while { (num mod: prime) == 0 }{
				array = array.add(prime);
				num = num div: prime;
				if (num == 1) {^array}
			};
			if (prime.squared > num) {
				array = array.add(num);
				^array
			};
		};
		// because Integer is 32 bit, and we have tested all 16 bit primes,
		// any remaining number must be a prime.
		array = array.add(num);
		^array
	};
factors(23).postln;
)

And the accompanying parse tree visualization:

Sorry for being out of the loop on this for a while. School and work murdered me. I’ll catch up on your work @lnihlen and see where this is at!

Excited - looks like there’s a lot of great progress.

Hey, welcome back!

I’ve been moving in a different direction with Hadron’s parser recently, and would no longer advise its use for general-purpose parsing. Instead, you might want to check out Sparkler, which is an ANTLR grammar for SuperCollider. I’ve got it generating C++ parsers, but it can also generate JavaScript, Python, Java, and other language parsers. We’re also looking for volunteers to write an sclang parser generator plugin for ANTLR so we can parse sclang in sclang. The ANTLR grammar is complete to the best of my knowledge, and can successfully parse everything I’ve tested it with, including all the sclang code in the supercollider repository.

Cheers

That’s fantastic! I’ll download it and see if I can’t get it to work. Parsing the whole of SC code is a great start. I’m also going to try the latest version of the tree-sitter library to see if they’ve worked out some of the null pointer exceptions that we were hitting a few months ago.

For anyone checking this thread, I just posted a link to an auto-formatter based on Sparkler: https://scsynth.org/t/sclang-auto-indent-tool/7342