Feed Me Weird Things - building a SClang corpus

Hey,

For validating, testing, and benchmarking Hadron’s language frontend, I am trying to build as large a collection of existing SuperCollider code, old and new, as possible. There are more details in my blog post, but tl;dr: please get in touch if you can send me links to open source projects containing SuperCollider code, if you have private code you’d like to donate, or if you’d like to help collate and coordinate the collection.

It occurs to me that, with SuperCollider itself getting a new compiler, this project might benefit more than just Hadron, in case that helps with motivation.

So far I have about 1.5 million lines of code. Things I’ve already done:

  • downloaded all the quarks
  • included the SC Class Library
  • searched GitHub for projects containing SuperCollider code

TBD:

  • go through various awesome SuperCollider lists
  • ?? your suggestions here
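For anyone wanting to replicate the “downloaded all the quarks” step above, a minimal sketch along these lines could bulk-clone the repos. It assumes a Quarks-style directory file with `Name=git-url[@tag]` lines (as in the community quarks directory); the file name and paths here are hypothetical.

```python
# Sketch: bulk-clone quark repositories for a corpus.
# ASSUMPTION: the directory file uses "Name=git-url[@tag]" lines, and
# URLs are https-style (a "git@host:" URL would trip the '@' split below).
import subprocess
from pathlib import Path


def parse_directory(text: str) -> dict[str, str]:
    """Map quark name -> git URL, dropping any '@tag' refspec suffix."""
    quarks = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("//"):  # skip blanks and comments
            continue
        name, _, url = line.partition("=")
        quarks[name.strip()] = url.strip().split("@")[0]
    return quarks


def clone_all(quarks: dict[str, str], dest: Path) -> None:
    """Shallow-clone every quark we don't already have."""
    dest.mkdir(parents=True, exist_ok=True)
    for name, url in quarks.items():
        target = dest / name
        if not target.exists():
            subprocess.run(["git", "clone", "--depth", "1", url, str(target)])


if __name__ == "__main__":
    sample = "MyQuark=https://github.com/someone/MyQuark.git@tags/v1.0\n"
    print(parse_directory(sample))
    # -> {'MyQuark': 'https://github.com/someone/MyQuark.git'}
```

Shallow clones keep the corpus download small, since only the current state of each quark is needed for parsing.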

Thanks


with SuperCollider itself getting a new compiler, this project might benefit more than just Hadron

This is news to me; can you provide a link? (Unless I’m misreading, and the new compiler is Hadron?)

A lot is happening in PRs on GitHub these days; I find myself combing through them every few weeks just to keep up with the language. This is what I’m referring to:

So you have my stuff, then :+1:

Another thing I’ll do, when I have a moment, is write up a script in my live coding dialect that can run automatically using only my publicly available instruments. This dialect can incur some pretty deep stacks (of course I’ll make sure to include some more complex, nested expressions in the script), and it likely beats up the garbage collector pretty hard too.

I was going to offer to try out Hadron with my LC environment (still willing to do that) but this would let you include it in a test suite.

Hope that’s helpful –
hjh

Thanks @jamshark70! I appreciate your contributions here.

One important point that might save you some time: the corpus, as a whole, does not compile successfully. There are a ton of name collisions and plenty of outright broken code in there. So I’m not planning to ever get the entire corpus to compile, and I’m not going to run code from the corpus in that context.

All I need is syntactically valid code, meaning code that parses correctly. So this is a test suite for the language frontend only, i.e. the lexer and parser. It has other uses, but those are the requirements. We really can’t expect more from a body of code this large; the ecosystem of SC code was never designed to work entirely together. The same would be true of any programming language: imagine trying to compile all known extant C++ code into one project! Disaster! :slight_smile:
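A parse-only corpus sweep like the one described above might be sketched as follows. Note that `parse_ok` is a hypothetical placeholder for the real lexer+parser entry point (a balanced-brace check stands in for it here); the point is only the shape of the sweep, which records parse success or failure and never executes anything.

```python
# Sketch of a frontend-only corpus sweep: walk the corpus, feed each
# file to the parser, and count successes/failures. Nothing is run.
from pathlib import Path


def parse_ok(source: str) -> bool:
    """HYPOTHETICAL stand-in: a real frontend would lex and parse here.
    This placeholder only checks that curly braces are balanced."""
    return source.count("{") == source.count("}")


def sweep(corpus_root: Path) -> tuple[int, int]:
    """Return (parsed, failed) counts over all .sc/.scd files."""
    parsed = failed = 0
    for path in list(corpus_root.rglob("*.sc")) + list(corpus_root.rglob("*.scd")):
        try:
            # errors="replace" keeps the sweep going on odd encodings
            ok = parse_ok(path.read_text(errors="replace"))
        except OSError:
            ok = False
        parsed, failed = parsed + ok, failed + (not ok)
    return parsed, failed
```

Keeping the sweep read-only means name collisions and broken code elsewhere in the corpus can never affect a given file’s result.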

I would love to include some examples of your work in the language benchmark suite I’m building, to make sure that the performance characteristics you rely on are maintained. That’s a discussion for another day.

And thanks again for your generous offer to take Hadron for a spin in your LC environment. I shall endeavour to make Hadron ready for such an experiment, and should the time come, will definitely look you up.

Put another way, code submissions are not expected to successfully run as-is when submitted to the corpus. But they need to be working on your computer in order to be useful. Does that make sense?

Here you go:

Most of that works.

I hope things are well, Lucile.

Sam

Ah, OK, so my idea is premature for now. Sure, no problem; I’ll be happy to kick the tires when the time comes.

hjh

Thanks @Sam_Pluta, I’ve added both of these to the corpus. I am well, hope you are too! :slight_smile:

How cool! Count me in: claude-collider/ClaudeCollider at main · jeremyruppel/claude-collider · GitHub

Are you only looking for ‘good’ code? I’ve got some examples of bad/unusual code I’ve written based on the oddities I’ve seen in the lexer and parser source code.

I’d love to see those, although they’d probably be better as integration or unit tests for Hadron’s lexer and parser. Please do share!

I don’t have it in a repo, but here is some code:

Foo {
	// unicode silently dropped
	*barαΩ { ^\bang }
}

Foo.bar == \bang;
// zero width space
" asdf\​".quote == "\" asdf​\""

(
var a = 2;
a​pow:4 == 16;
)
(
// whitespace (or anything non ASCII) allowed between ~ and identifier
~αΩa = 1;
~a == 1;
)

(
~    a = 5;
~a == 5;
)

(
~  
/* even */
// comments
a = 2;
~a == 2;
)
// setters
(
var a = ();
foo_(a, 1);
a.foo == 1;
)

(
var a = ( _: 1);
(a _: 10) == 1;
)

(
var a = ();
(foo_:)(a, 1);
a.foo == 1;
)
// weird stuff with radix
+ Integer {
	doesNotUnderstand { |selector ...args, kwargs|
		 ^args[0] 
	}
}

(
var b = 10;
// send the message 'pi' to 11ra1 (111).
11ra1pi:b == 10;
)

2r1pi:1 == 1;
0r + 1 == 1;
{ | a 0rbar 2| a + bar }.() == 2;
{|aΩ2cΩ3|ΩaΩ+ΩcΩ}.() == 5;
// hex doesn't care about the prefix
123451243xA == 0xA;
#  [ 1 ] == #[1];
// weirdness of backtick ref.
(
var a;
r = `a = 1;
a == 1;
)

(
var a;
r = `a = 1;
r.value == 1;
)

(
var a, b;
`a = b = 1;
(a == 1) && (b == 1)
)

(
var a, b;
`a = b = 1;
r.value == 1;
)

( 
var a, b, r;
r = `#a, b = [1, 2];
r.value == [1, 2]
)

( 
var a, b, r;
r = `#a, b = [1, 2];
(a == 1) && (b == 2);
)

(
var c1, c2, a1, a2;
c1 = CollStream();
c2 = CollStream();

(`1.pow(2)).storeOn(c1);
a1 = c1.collection;

( (`1).pow(2) ).storeOn(c2);
a2 = c2.collection;

a1 == a2 
)
(
var c = $
;
c.asInteger == Char.nl 
)

// Note, if you wrap this in brackets, this fails! There is NOT a space after the $
// Requires shift+enter
32.asAscii == $

(
Char.nl == $
)

// nl is defined in Object
Char.nl.asString == nl;
// capped at 4, same for b
1ssssssssss == 1ssss;
// Can set interpreter vars in a closed function.
(
a = nil;
#{ a = 10 }.();
a == 10
)