Feed Me Weird Things - building a SClang corpus

Hey,

For validating, testing, and benchmarking Hadron’s language frontend, I am trying to build as large a collection of existing SuperCollider code, old and new, as possible. There are more details in my blog post, but tl;dr: please get in touch if you can send me links to open source projects containing SuperCollider code, if you have private code you’d like to donate, or if you’d like to help collate and coordinate the collection.

It occurs to me that, with SuperCollider itself getting a new compiler, this project might benefit more than just Hadron, in case that helps with motivation.

So far I have about 1.5 million lines of code. Things I’ve already done:

  • downloaded all the quarks
  • included the SC Class Library
  • searched GitHub for projects containing SuperCollider code

TBD:

  • go through various awesome SuperCollider lists
  • ?? your suggestions here
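For anyone wanting to replicate the “downloaded all the quarks” step above, a minimal sketch along these lines could bulk-clone the repos. It assumes a Quarks-style directory file with `Name=git-url[@tag]` lines (as in the community quarks directory); the file name and paths here are hypothetical.

```python
# Sketch: bulk-clone quark repositories for a corpus.
# ASSUMPTION: the directory file uses "Name=git-url[@tag]" lines, and
# URLs are https-style (a "git@host:" URL would trip the '@' split below).
import subprocess
from pathlib import Path


def parse_directory(text: str) -> dict[str, str]:
    """Map quark name -> git URL, dropping any '@tag' refspec suffix."""
    quarks = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("//"):  # skip blanks and comments
            continue
        name, _, url = line.partition("=")
        quarks[name.strip()] = url.strip().split("@")[0]
    return quarks


def clone_all(quarks: dict[str, str], dest: Path) -> None:
    """Shallow-clone every quark we don't already have."""
    dest.mkdir(parents=True, exist_ok=True)
    for name, url in quarks.items():
        target = dest / name
        if not target.exists():
            subprocess.run(["git", "clone", "--depth", "1", url, str(target)])


if __name__ == "__main__":
    sample = "MyQuark=https://github.com/someone/MyQuark.git@tags/v1.0\n"
    print(parse_directory(sample))
    # -> {'MyQuark': 'https://github.com/someone/MyQuark.git'}
```

Shallow clones keep the corpus download small, since only the current state of each quark is needed for parsing.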

Thanks


with SuperCollider itself getting a new compiler, this project might benefit more than just Hadron

This is news to me; can you provide a link? (Unless I’m misreading, and the new compiler is Hadron?)

A lot is happening in PRs on GitHub these days; I find myself combing through them every few weeks just to keep up with the language. This is what I’m referring to:

So you have my stuff, then :+1:

Another thing I’ll do, when I have a moment, is write up a script in my live coding dialect that can run automatically using only my publicly available instruments. This dialect can incur some pretty deep stacks (of course I’ll make sure to include some more complex, nested expressions in the script), and it likely beats up the garbage collector pretty hard too.

I was going to offer to try out Hadron with my LC environment (still willing to do that) but this would let you include it in a test suite.

Hope that’s helpful –
hjh

Thanks @jamshark70! I appreciate your contributions here.

One important point that might save you some time: the corpus, as a whole, does not compile successfully. There are a ton of name collisions and plenty of outright broken code in there. So I’m not planning to ever get the entire corpus to compile, and I’m not going to run code from the corpus in that context.

All I need is syntactically valid code, meaning code that parses correctly. So this is a test suite for the language frontend only, i.e. the lexer and parser. It has other uses, but those are the requirements. We really can’t expect more from a body of code this large; the ecosystem of SC code was never designed to work entirely together. The same would be true of any programming language: imagine trying to compile all known extant C++ code into one project! Disaster! :slight_smile:
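A parse-only corpus sweep like the one described above might be sketched as follows. Note that `parse_ok` is a hypothetical placeholder for the real lexer+parser entry point (a balanced-brace check stands in for it here); the point is only the shape of the sweep, which records parse success or failure and never executes anything.

```python
# Sketch of a frontend-only corpus sweep: walk the corpus, feed each
# file to the parser, and count successes/failures. Nothing is run.
from pathlib import Path


def parse_ok(source: str) -> bool:
    """HYPOTHETICAL stand-in: a real frontend would lex and parse here.
    This placeholder only checks that curly braces are balanced."""
    return source.count("{") == source.count("}")


def sweep(corpus_root: Path) -> tuple[int, int]:
    """Return (parsed, failed) counts over all .sc/.scd files."""
    parsed = failed = 0
    for path in list(corpus_root.rglob("*.sc")) + list(corpus_root.rglob("*.scd")):
        try:
            # errors="replace" keeps the sweep going on odd encodings
            ok = parse_ok(path.read_text(errors="replace"))
        except OSError:
            ok = False
        parsed, failed = parsed + ok, failed + (not ok)
    return parsed, failed
```

Keeping the sweep read-only means name collisions and broken code elsewhere in the corpus can never affect a given file’s result.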

I would love to include some examples of your work in the language benchmark suite I’m building, to make sure that the performance characteristics you rely on are maintained. That’s a discussion for another day.

And thanks again for your generous offer to take Hadron for a spin in your LC environment. I shall endeavour to make Hadron ready for such an experiment, and should the time come, will definitely look you up.

Put another way, code submissions are not expected to successfully run as-is when submitted to the corpus. But they need to be working on your computer in order to be useful. Does that make sense?

Here you go:

Most of that works.

I hope things are well, Lucile.

Sam

Ah, OK, so my idea is premature for now. Sure, no problem; I’ll be happy to kick the tires when the time comes.

hjh

Thanks @Sam_Pluta, I’ve added both of these to the corpus. I am well, hope you are too! :slight_smile:

How cool! Count me in: claude-collider/ClaudeCollider at main · jeremyruppel/claude-collider · GitHub

Are you only looking for ‘good’ code? I’ve got some examples of bad/unusual code I’ve written based on the oddities I’ve seen in the lexer and parser source code.

I’d love to see those, although they’d probably be better as integration or unit tests for Hadron’s lexer and parser. Please do share!

I don’t have it in a repo, but here is some code:

Foo {
	// unicode silently dropped
	*barαΩ { ^\bang }
}

Foo.bar == \bang;
// zero width space
" asdf\​".quote == "\" asdf​\""

(
var a = 2;
a​pow:4 == 16;
)
(
// whitespace (or anything non ASCII) allowed between ~ and identifier
~αΩa = 1;
~a == 1;
)

(
~    a = 5;
~a == 5;
)

(
~  
/* even */
// comments
a = 2;
~a == 2;
)
// setters
(
var a = ();
foo_(a, 1);
a.foo == 1;
)

(
var a = ( _: 1);
(a _: 10) == 1;
)

(
var a = ();
(foo_:)(a, 1);
a.foo == 1;
)
// weird stuff with radix
+ Integer {
	doesNotUnderstand { |selector ...args, kwargs|
		 ^args[0] 
	}
}

(
var b = 10;
// send the message 'pi' to 11ra1 (111).
11ra1pi:b == 10;
)

2r1pi:1 == 1;
0r + 1 == 1;
{ | a 0rbar 2| a + bar }.() == 2;
{|aΩ2cΩ3|ΩaΩ+ΩcΩ}.() == 5;
// hex doesn't care about the prefix
123451243xA == 0xA;
#  [ 1 ] == #[1];
// weirdness of backtick ref.
(
var a;
r = `a = 1;
a == 1;
)

(
var a;
r = `a = 1;
r.value == 1;
)

(
var a, b;
`a = b = 1;
(a == 1) && (b == 1)
)

(
var a, b;
`a = b = 1;
r.value == 1;
)

( 
var a, b, r;
r = `#a, b = [1, 2];
r.value == [1, 2]
)

( 
var a, b, r;
r = `#a, b = [1, 2];
(a == 1) && (b == 2);
)

(
var c1, c2, a1, a2;
c1 = CollStream();
c2 = CollStream();

(`1.pow(2)).storeOn(c1);
a1 = c1.collection;

( (`1).pow(2) ).storeOn(c2);
a2 = c2.collection;

a1 == a2 
)
(
var c = $
;
c.asInteger == Char.nl 
)

// Note, if you wrap this in brackets, this fails! There is NOT a space after the $
// Requires shift+enter
32.asAscii == $

(
Char.nl == $
)

// nl is defined in Object
Char.nl.asString == nl;
// capped at 4, same for b
1ssssssssss == 1ssss;
// Can set interpreter vars in a closed function.
(
a = nil;
#{ a = 10 }.();
a == 10
)