Help with preprocessor

I’d like to create an operator with the preprocessor, |>, that does function piping, but I’m having trouble with line breaks in the code and with breaking the string up into expressions.

This is what I want the syntax to be…

foo |> _.mod(2),,, 2343); 

…and it should turn into…

[ _.mod(2),... etc ... ].inject(foo, {|n,f| f.(n)});

I’m really stuck with breaking the string up into expressions, so that …

foo |> 
     _.mod(2),,, 2343); 

… can be processed as one statement.

The challenge here is that the comma-separated expressions can be arbitrarily complex. The example assumes partial-application syntax, which restricts the expression to one and only one method call, but there is nothing to stop someone from writing a { |x| ... function } and then there is no limit on bracket depth.

My opinion is that there’s no real substitute for a suite of functions to scan through the main SC syntax elements: bracketed groups, string/symbol literals (mind escape characters!), comments. If you want to handle every kind of input a user can throw at it, eventually you’ll end up in pretty much this place.

CollStream is very helpful for keeping track of the position within the code string and passing this state around the various functions.


Ah, so there are no syntax parsing/lexer functions already available? I suppose this makes sense given we don’t have access to the AST.

What does CollStream do exactly? Is it supposed to be short for Collection Stream? (there is no help doc).

If you are defining custom syntax that differs from sclang syntax, then a parser for sclang will throw an error on the custom syntax. So IMO this is not the right direction to go.

As it is now, custom syntax = you’re on your own.

But it is possible, with some effort and practice, to write a parser in sclang classes that generates a syntax tree.

c = CollStream("Hello, World");
-> H
-> e

-> e  (I think, not at the computer now)

Different functions can consume characters from the same stream and advance through the code.
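As a rough illustration of that idea, here is a minimal sketch in which two small functions share one CollStream. The helper names readWord and skipSpaces are made up for this example, and it assumes CollStream's next, peek and pos behave as described above (next consumes a character, peek looks at it without consuming).

```supercollider
(
var stream = CollStream("foo |> bar");

// hypothetical helper: consume alphanumeric characters and return them as a String
var readWord = { |str|
	var word = "";
	while { str.peek.notNil and: { str.peek.isAlphaNum } } {
		word = word ++ str.next;
	};
	word
};

// hypothetical helper: advance past any run of spaces
var skipSpaces = { |str|
	while { str.peek == $  } { str.next };
};

readWord.(stream).postln;   // foo
skipSpaces.(stream);
stream.pos.postln;          // 4, the index where "|>" starts
)
```

Because the position lives in the stream object, neither helper has to return or receive an index.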


Thanks for the quick response!

Do you perhaps know of a Quark that generates a syntax tree? Would be really useful! Particularly since all the behaviour I need is already being done by the language somewhere…

I’ve managed to get it to work - it’s a bit dumb and will break pretty easily…

this.preProcessor = { |code|
	var openBracket =   { |c| [ ${, $(, $[ ].includes(c) };
	var closeBracket =  { |c| [ $}, $), $] ].includes(c) };
	var funcSeperator = { |c| [ $,, $;]     .includes(c) };
	var parseSingleInstance = {
		|in, operator|
		var operatorEnd = operator + 2; // given size of operator "|>"
		var operatorBegin = operator - 1; // white space before
		var operandStart = {
			// go back until second space from operator, beginning of line, opening bracket, or function separator
			var searching = true;
			var pos = operatorBegin - 2; 
			var result;
			while {searching} {
				if( (in[pos] == $ ) || (pos <= 0) || openBracket.(in[pos]) || funcSeperator.(in[pos]), 
					{ result = pos; searching = false; },
					{ pos =  pos - 1; })
			};
			max(result, 0)
		}.value;
		var preOperand = if(operandStart > 0, {in[0..operandStart]}, {""});
		var operand = in[(operandStart+1)..operatorBegin];
		var funcSpecEnd = {
			var pos = operatorEnd;
			var indent = 0;
			var searching = true;
			// go until end of expression or indent level goes negative
			while { (indent >= 0) && searching && (pos < in.size) } {
				// track bracket depth at the current character
				indent = case( 
					{ openBracket.(in[pos])  }, { indent + 1 },
					{ closeBracket.(in[pos]) }, { indent - 1 },
					{ indent }
				);
				// stop at a top-level semicolon or when the enclosing bracket closes
				case(
					{(indent == 0) && (in[pos] == $;)}, { searching = false },
					{ indent < 0 }, { searching = false; pos = pos - 1 },
					{ pos = pos + 1 }
				);
			};
			if(in[pos] == $;, pos - 1, pos);
		}.value;
		var funcSpecWithoutSemi = if(in[funcSpecEnd] == $;, funcSpecEnd - 1, funcSpecEnd);
		var funcSpec = in[operatorEnd..funcSpecWithoutSemi];
		var posFuncSpec = if(funcSpecEnd != (in.size - 1), {in[(funcSpecEnd+1)..(in.size-1)]}, {""});
		var replacement = format("[%].inject(%, {|n,f| f.(n)})", funcSpec.asString, operand);
		format("% % %", preOperand, replacement, posFuncSpec).stripWhiteSpace;
	};
	var str = code.stripWhiteSpace;
	var hasOperatorsLeft = str.find("|>");
	// rewrite one "|>" at a time until none are left
	while {hasOperatorsLeft.isNil.not} {
		str = parseSingleInstance.(str, hasOperatorsLeft);
		hasOperatorsLeft = str.find("|>");
	};
	str // the preprocessor must return the rewritten code string
};

But stuff like this now works, which I think is a much more musical way to think about using UGens - also avoids pre-declaring variables and too many variables that won’t be used.

SynthDef(\pipeTest, {
	var sig = ( 200 |>,, 350),, 50),, thresh: -15.dbamp, slopeAbove: 1/4),,, 1))
	);, sig);

You can also write normal functions in there …


x = 1;

f = {|v| v / 2 };

j = x |>  {|v| "J"; v}, _ + 1,  _ * 2,  {|a| a*1; a*[9, 2, 4, 2, 14]; {{}}; a + 1},  f.(_);

y = (x |> _+1, 
	{|v| "Y"; v},
	_ * 2, 
	{|a| a*1; a*[9, 2, 4, 2, 14]; {{}}; a + 1}, 
) * 2;

h = { |v| v |> _ - 1, _ / 2 };

z = x |> _ * 100, _ + y, h;

[x, j, y, z].asString.warn; // WARNING: [ 1, 2.5, 5.0, 52.0 ]

g = { 200 |>, _ * -15.dbamp }.play;


… but if you mention "|>" in a string it will break. Comments containing brackets and the like will also break everything, and so will more complex operands (the operand really needs to be a variable name and nothing else).

Anyway, because of the weird edge cases that I can’t be bothered to fix, I’m gonna stick with a function call, which is a shame.

~pipe = { |operand ...funcs| funcs.inject(operand, {|n,f| f.(n)}) };
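For reference, a small usage sketch of that function (the numbers are arbitrary): the first argument is the operand and every later argument is applied to the running result in order.

```supercollider
~pipe = { |operand ...funcs| funcs.inject(operand, { |n, f| f.(n) }) };

~pipe.(1, _ + 1, _ * 2).postln;            // 4, i.e. (1 + 1) * 2
~pipe.(10, { |v| v / 2 }, _ - 1).postln;   // 4.0, i.e. (10 / 2) - 1
```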

This is true. In practice, if you’re going to insert nonstandard syntax in the middle of standard syntax, then your scanner needs to be aware of all syntactic elements that contain arbitrary characters as data rather than operations. Those are: comments (two types) and character/string/symbol literals.

  • IIRC, /* comments are /* nestable */ */ so these comments need to be handled in a recursive function.
  • Literals don’t nest, but they need to handle escaped characters.
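To make those two points concrete, here is an untested sketch of a scanner that reports the positions of |> only where it appears as code. All the names (~skipBlockComment, ~skipQuoted, ~findPipes) are hypothetical, it tracks comment nesting with a depth counter rather than a recursive function, and it does not try to handle every corner of sclang syntax.

```supercollider
(
// assumes the opening "/*" has already been consumed
~skipBlockComment = { |stream|
	var depth = 1, ch;
	while { (depth > 0) and: { (ch = stream.next).notNil } } {
		if((ch == $/) and: { stream.peek == $* }) { stream.next; depth = depth + 1 };
		if((ch == $*) and: { stream.peek == $/ }) { stream.next; depth = depth - 1 };
	};
};

// assumes the opening quote has been consumed; a backslash escapes the next character
~skipQuoted = { |stream, quote|
	var ch;
	while { (ch = stream.next).notNil and: { ch != quote } } {
		if(ch == $\\) { stream.next };
	};
};

// returns the indices of every "|>" outside comments and literals
~findPipes = { |code|
	var stream = CollStream(code), found = List.new, ch, pos;
	while { (ch = stream.next).notNil } {
		pos = stream.pos - 1; // index of the character just read
		case
		{ (ch == $/) and: { stream.peek == $* } } { stream.next; ~skipBlockComment.(stream) }
		{ (ch == $/) and: { stream.peek == $/ } } {
			// line comment: skip to the end of the line
			while { stream.peek.notNil and: { stream.peek != $\n } } { stream.next };
		}
		{ ch == $" } { ~skipQuoted.(stream, $") }
		{ ch == $' } { ~skipQuoted.(stream, $') }
		{ ch == $$ } { if(stream.next == $\\) { stream.next } } // character literal
		{ (ch == $|) and: { stream.peek == $> } } { stream.next; found.add(pos) };
	};
	found.asArray
};
)

// only the first "|>" is code; the others sit in a string and in comments
~findPipes.("a |> b; \"x |> y\" /* |> */ // |>").postln;
```

A preprocessor could then rewrite only at the reported positions instead of calling str.find("|>") on the raw text.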

These are solvable problems, but I suspect what often happens is that one starts with the idea that it should be “easy” to convert xyz syntax, and then discovers that xyz syntax needs to be ignored in some contexts. So robust solutions haven’t entered into SC culture.

Your idea about a quark to provide some parsing functions is not a bad one. At present, I’m not sure I have time to do it myself (I imagine maintenance could be a bit time intensive; also it would duplicate some of the functionality of a project of lnihlen’s, though that project wouldn’t support nonstandard syntax).

However… what about something like this? (Not tested, but it might work.)

+ Object {
    |> { |functionArray|
        ^functionArray.inject(this, { |n, f| f.(n) });
    }
}

Then you could write foo |> [_.mod(2),,, 2343)]; – the only difference being the brackets.



My dumber solution is to define +Function { => {|a| ^a.(this)}}

So I write

{,500) => ,0,0.1) }.play

the gotcha is that you can’t call methods on the last item in the chain, you need for example to write:

4 => (_ + 5) => _.postln

but totally get your motivation, there really needs to be a good way to write flows L->R, T->Bottom rather than inner->outer other than littering mental space with variable names


Your “littering” = my self-documenting code.

In seriousness… descriptive variable names in a SynthDef are often a complete replacement for comments.


sure, but most of the time these stages are a part of a multistep process, say I just wanna correct the sound…

var sigA = ...
var filter1 = ...
var filter2 = ...
var comp1 = ...

All I really want to know is the name of the corrected sound and how it was corrected, naming each stage just makes things harder to read as it invites the users to pull out one of the intermediary stages and use that. Sure you could do…

var sigA = {
   var filter1...

… but the immediately invoked function is quite a lot for a newcomer and takes a lot of time to write for little benefit.


This works!! I didn’t realise you could define your own operators? I mean I guess that makes sense with smalltalk(?) Or is |> already an operator, I was looking for a list and couldn’t find one.
Thank you!


I think the way it works is that any method can be used as an operator as long as it’s followed by a colon →

5 rrand: 8

but if you use only some subset of non-alphanumeric characters (it’s in the help somewhere!), then your method can only be used as an operator (and needs no colon)
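A couple of quick examples of the colon form, using only built-in methods; each line should be equivalent to the ordinary method call shown in its comment.

```supercollider
(5 max: 8).postln;                            // 8, same as 5.max(8)
([1, 2, 3] collect: { |x| x * 2 }).postln;    // [ 2, 4, 6 ], same as [1, 2, 3].collect({ |x| x * 2 })
```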


So I think this comes down to two options - excluding writing some quark to parse code properly.


	var res = [200,220] 
	|>, 350)
	|>	( _ * (-15.dbamp) )            // these extra brackets are necessary 
	|>	( _ *[2,2.1]) );


	var res = [200,220] |> [,, 350),
		_ * -15.dbamp,
		_ *[2,2.1])

On the one hand, James’ has auto indent and fewer brackets, but Michael’s functions similarly to how other languages use the pipe operator (F#, JavaScript, Elixir… etc). I think Michael’s is superior if you don’t need the extra brackets, otherwise James’ is better. So I’m just gonna add one using |> and another using =>>.
Thank you both, never knew you could abuse operators like this.

just FYI you can simplify slightly: multiplication etc works without needing to use the operator (on compilation the multiplication becomes a binary op UGen blah blah). Also no need for the initial variable.

	|>, 350)
	*  -15.dbamp        

Ohh thanks! That gets a little harder to read in some places though.
I added the variable to make it easier to see how it might read in the context of other code.

This is quite nice :grin:

The thing that always threw me about chain-style usage in SC is multiple inputs. The ChucK language lets you do things like:

SinOsc a => dac;
110.0 => a.freq;
(some other ugen) => a.phase;

I haven’t actually used ChucK – I’m guessing there’s a phase input. In any case, the principle is that you name the parameter when establishing the connection.

A SC => as defined above allows one and only one connection. If you need more, as in a more complex UGen like TGrains, the only choices I can think of are:

// 1.
var graindur = ...;
var grainrate = ...;
var center = ...;
... etc. =>, _, bufnum, grainrate, center, graindur)

// 2. => { |trig|, trig, bufnum,
		... complex expression for grainrate...,
		... complex expression for center...,
		... complex expression for graindur...

… where the first might be questioned on the grounds of too much namespace clutter, and the second loses the whole point of chaining in the first place.

We’re a bit hamstrung in SC because we have to have all of the inputs before instantiating a UGen. A ChucK-style idiom would be prohibitively difficult to implement in the preprocessor. This might be one reason why my own SynthDef coding style has evolved in the other direction: toward more naming. We’ve got a couple of “no” votes on that in this thread, but my feeling (speaking just for myself) is that descriptive names, used consistently, patiently lay out the signal flow.


with an operator overload you can define an adverb to further modify your operator…

|> {|val, adverb|
    if (adverb == <whatever>) {
    } {
    }
}

you can say things like

|>.a ...
|>.b ...
|>.x ...

so you wouldn’t have to define a whole other overload for a variation of your functionality
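A minimal sketch of how that could look, assuming the single-function flavour of |> and an invented \post adverb that prints the intermediate result (both are illustration only, and like any class extension this has to live in a .sc file and be compiled with the class library):

```supercollider
// in an extension file, e.g. a hypothetical PipeOperator.sc
+ Object {
	|> { |func, adverb|
		var result = func.(this);
		// adverb is nil for a plain "|>" and a Symbol for "|>.post"
		if(adverb == \post) { result.postln };
		^result
	}
}
```

With that in place, something like 3 |>.post (_ + 1) |> (_ * 2) should post the intermediate 4 and return 8, while a plain |> stays silent.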


In this case I wouldn’t use chaining - the first solution seems perfect but with the impulse declared as a variable. But if you wanted to send the result through a series of filters to ‘correct’ the sound or reduce the number of channels or pan it or some other simple adjustment then I think chaining begins to make sense.
If you were to add an LPF on the end of TGrains, would you create a new variable? And what if there were several filters in a row that all ultimately did the same thing? I honestly don’t think anything would be more descriptive than…

grains |>, 16000) |>, 45) |>, \  |>, _);

Assuming the reader knows what LPF, HPF and Pan2 do (they all have pretty good names already). As opposed to…,, 16000), 45), \;


var grains = ...
var grains_lpf =
var grains_hpf = // what would you even call this sound? grains_lpf_hpf?
var panned =,;, panned);

The other option would be the ability to declare variables anywhere, which would simplify some things. Or to actually use mutable variables sig = ....; sig = f.(sig) .... , but this doesn’t meet your strongly named requirement, and all of the criticisms of piping would still apply, but you’d have to write an extra variable name that means nothing more than ‘the current sound I’m working on right now’ or ‘some signal’.

I don’t think multi argument piping should be done… but…

f = {|a,b,c| postf("a: %, b: %, c: %\n", a, b, c) };
1 |>  { |a| 
	3 |> { |b| 
		4 |> f.(a, b, _)
	}
};

… which has a striking resemblance to something in Haskell that they solved with do notation.


That’s cool! I am far more familiar with C++ so I would have never thought of doing this.

Tbh I do exactly this, but pretty much always for sig (taking it as an informal convention whereby sig is always the main signal path). A separate variable for each link in a chain of filters would be needlessly pedantic, agreed about that.