A more structured approach to DSLs

DSLs are currently implemented with the preprocessor, which operates directly on the text of the code. Languages are structured text; by manipulating the raw text through code, the structure is lost. This means things like syntax highlighting, formatting and indentation, documentation lookup, autocomplete, and others will fail. It is also currently very difficult to use multiple DSLs at once.

I’m proposing we make DSLs structured: a token in the parser that the tooling either doesn’t look into (like a string) or, when an appropriate DSL class is given, can use to highlight and validate the DSL code.

Here are some initial ideas for how this could look syntactically.

// DSL class
DrumL {
   highlight {}
   parse {}
}
// delimited
DrumL #( ... dsl code ... )#

// until semicolon
DrumL #> .... dsl code ....;

The dsl code would basically be a string.

var str = #> ... dsl code ...;
str.postln;
DrumL(str)

Some other ideas for DSL delimiters…

DSLRegister(1, DrumL);
DSLRegister(2, SomeOtherLanguage);

#1 ... dsl code ... ;
~result_of_drum_l = #1 ...dsl code...;

~result_of_someother = #2 ... dsl code ...;


// multiline
#1 ... dsl code ... 
... more dsl code ...
... etc. ...
;

// or 
#1 ... dsl code ... 
#1 ... more dsl code ...
#1 ... etc. ...
;
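
As a rough illustration of the registration idea, here is a minimal Python sketch (Python only because the proposed syntax does not parse in sclang yet). The names `dsl_register`, `run_dsl` and the toy `drum_l` handler are all hypothetical stand-ins for DSLRegister, the `#1 ... ;` form and DrumL. The only point is that the DSL body arrives as an opaque string and the slot number selects who interprets it.

```python
from typing import Callable, Dict

# Hypothetical registry: slot number -> function that receives the raw DSL text.
_dsl_registry: Dict[int, Callable[[str], object]] = {}


def dsl_register(slot: int, handler: Callable[[str], object]) -> None:
    """Stand-in for the proposed DSLRegister(1, DrumL)."""
    _dsl_registry[slot] = handler


def run_dsl(slot: int, body: str) -> object:
    """What `#1 ... dsl code ... ;` might desugar to: look up the slot, pass the text."""
    if slot not in _dsl_registry:
        raise KeyError(f"no DSL registered in slot {slot}")
    return _dsl_registry[slot](body.strip())


def drum_l(body: str) -> list:
    """Toy handler standing in for DrumL: one step per character, 'o' means a hit."""
    return [ch == "o" for ch in body if not ch.isspace()]


dsl_register(1, drum_l)
result = run_dsl(1, " o.o. ")  # the text between `#1` and `;`
```

Using an unregistered slot raises an error here; whether the real thing should fail at compile time or at run time is left open.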

@jamshark70 has already raised the point about brevity and the need to be able to type this quickly. Is this new suggestion better? If people have other ideas, do say!


I don’t want to talk about parsing/DSL implementation here, only how it can be scoped. Later it is definitely worth discussing LSP & IDE integration, but let’s wait on that.

Great idea.

I think the first suggestion: DrumL #( ... )# seems quick to type: the class will autocomplete (and the delimiters could autopair). Seems nicer than ; termination as we can write ~this = DrumL #( ...)#;. It is also nice to be able to jump to the implementation (as opposed to the #1 idea).

another idea might be to define the languages as methods of a DS class: DS.drum #(...)#

final thought re: the registration idea: I wonder if there would be any use in being able to define DSLs as defs like DSLdef(\drumx, {…}), or is the code simply likely to be too long for this to be practical?

This sounds awesome.

I’m in favor of brevity almost at all costs, so the first multiline example looks better to me.

Maybe it’s also possible to use an end delimiter with the briefer syntax? That way the DSL can use semicolons freely and it is easier to mix in with other code

#1 ... Code ... ;

~something = code ... #1( ... )# more code;

~other = #1(

)#

Or, probably more difficult to implement? But I think more fun to use: Let the user define their own starting and ending delimiters

DrumL.register("\\", "$");

~something = whatever.(\\ ... DSL ... $, anotherArg)

Using binary operators like this won’t work because the lexer (unlike the parser) is stateless and can’t look at the rest of the code to figure out if it is a dsl string, and therefore a delimiter, or normal code, and therefore a binary operator.

The reason I suggested #1 is because it is completely invalid syntax right now, which means there is a nice little hole where this functionality can slot right in.


Interesting, as when I suggested this to @jamshark70 he said it was too verbose. Is this a nvim/scide split?


DSLRegister(1, DrumL);
~something = code ... #1( ... )# more code;

I like this. The hashes are ugly, but it is the only hole in the syntax I can think of right now.

I should also mention I don’t do live coding, so getting people’s perspectives before making a change like this is really important.

quick Q - do you imagine any symbol being able to be registered? ie DSLRegister(\dl, DrumL) then #dl( ... )# – I just know I am not going to remember which DSL I am associating with which number!

re typing speed I think autocompletion should work equally efficiently in the ide?

So I only suggested the numbers as a way to save typing.

If you wanted a name and don’t mind typing it, then DrumL#( ... )# should work. These two systems could live side by side.

One note, #dl is already valid syntax… what isn’t valid (and therefore something this could use) is #dl(, without spaces. This isn’t too much of a problem, but might confuse people as #dl ( ... )# will fail.
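
To make the whitespace point concrete, here is a small Python sketch of how a scanner might recognise the opener. It assumes the opener is `#`, then a name or number, then `(` with nothing in between, and that `)#` closes the region; `find_dsl_regions` is a made-up helper, not anything in sclang. It also ignores SC strings and comments, which a real lexer would have to skip.

```python
import re

# Assumed opener: '#', a name or number, then '(' with no whitespace in between.
OPENER = re.compile(r"#([A-Za-z_]\w*|\d+)\(")


def find_dsl_regions(source: str):
    """Return (tag, body) pairs for every #tag( ... )# region in source."""
    regions = []
    pos = 0
    while True:
        m = OPENER.search(source, pos)
        if m is None:
            break
        # The body is opaque text: scan for the closing ')#' without parsing it.
        end = source.find(")#", m.end())
        if end == -1:
            raise SyntaxError(f"unterminated #{m.group(1)}( region")
        regions.append((m.group(1), source[m.end():end]))
        pos = end + 2
    return regions


print(find_dsl_regions("~a = #dl( o.o. )# + #1(xx)#;"))
print(find_dsl_regions("#dl ( o.o. )#"))  # space before '(' so no region is found
```

The decision is made from the characters right after the `#` alone, so the scanner needs no knowledge of the surrounding code.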

I think the best approach for my usage would be to use a token to mark a region within the document, which can contain multiple statements. Those statements should be possible to execute all as a group, or individually, e.g.

// standard SC syntax here

#(
hihat = "\fork(" \ins("-", 1..2, 1)|||")::\ins(".", 8, 0.5)";

// increase activity
hihat = "\fork(" \ins("-", 1..2, 1)|||")::\ins(".", 8, 0.5)::\ins(".", 3..5, 0.25)";

kick = "oooo";
)#

Marking every separate DSL statement with a multi-character token is, I suspect, one of those things where you think “that’s not too bad” but then when you’re on stage, it’s likely to get old fast. Starting from an empty document, it takes a good 10 minutes to get a reasonable texture going. Time moves really fast on stage. Currently I’ve got /xyz = "something" – a single keystroke to announce nonstandard syntax. Turning that into shift-3-releaseShift-1-space will be a drag – avoiding repetition would be important for live coders.

Within delimiters, the single keystroke goes away! But only if the area within delimiters is not necessarily treated as a code block. (If there’s a class for the DSL, the class could be responsible for handling the difference between “evaluate selection or line” and “evaluate selection, line or region.”) I just mention that because there seemed to be an assumption that there should be a delimiter per statement. Doesn’t have to be. The delimiters could just mark a region that the IDE tokenizer would pass to user code.

None of the above discounts the idea of using the delimiters inline within standard SC syntax, btw.

I also note the irony that the SC book ed. 2 literally just came out, with my new chapter on the preprocessor, and within two months of it hitting stores, we’re discussing how to invalidate that chapter! (Not fully invalidate it – the parsing tips in it would still be valid – but it’s a bit amusing nonetheless.)

hjh

I second this: it should be possible to easily delimit an entire document and then execute parts of it

for example, I think it would be cool to do

#(

stuff stuff stuff
more different things
all execute together as a block

these other things
all execute together
as a different block

)#

and be able to execute each line separately with shift-enter, or use a special keystroke (say alt-enter) to execute the blocks separated by empty lines, maybe shift-alt-enter to execute the whole block between #( and )#
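
A rough Python sketch of what an editor would have to do for those three keystrokes, assuming `#(` and `)#` each sit on their own line and regions are not nested. `enclosing_region` and `select_code` are hypothetical names, and the scope names are only labels for the three cases above.

```python
def enclosing_region(lines, cursor):
    """Return (start, end) line indices of the #( ... )# pair around cursor."""
    start = end = None
    for i in range(cursor - 1, -1, -1):      # walk up to the nearest delimiter
        s = lines[i].strip()
        if s == ")#":
            break                            # a region closed above us
        if s == "#(":
            start = i
            break
    for i in range(cursor + 1, len(lines)):  # walk down to the nearest delimiter
        s = lines[i].strip()
        if s == "#(":
            break                            # a new region opens below us
        if s == ")#":
            end = i
            break
    if start is None or end is None or lines[cursor].strip() in ("#(", ")#"):
        raise ValueError("cursor is not inside a #( ... )# region")
    return start, end


def select_code(lines, cursor, scope):
    """Pick the text to send: one line, a blank-line paragraph, or the region."""
    start, end = enclosing_region(lines, cursor)
    if scope == "line":
        return lines[cursor]
    if scope == "region":
        return "\n".join(lines[start + 1:end])
    if scope == "paragraph":
        if not lines[cursor].strip():
            return ""                        # cursor sits on a separator line
        top = bottom = cursor
        while top - 1 > start and lines[top - 1].strip():
            top -= 1
        while bottom + 1 < end and lines[bottom + 1].strip():
            bottom += 1
        return "\n".join(lines[top:bottom + 1])
    raise ValueError(f"unknown scope: {scope}")
```

The selected text would then be handed to whatever interprets the DSL; that part is deliberately left out.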

Question that just occurs to me, looking at this: what if you want SC code embedded in your DSL? Like for example you want a quicker synthdef,

#(sd \mysynth
  -> mic(0)
  -> pedal(1)
  var mix = mic + pedal;
  Pan2.ar(mix, \pan.kr(0))
)#

how does the highlighter know that some of this should be interpreted as SC code (and autocomplete etc)? or does it default to SC highlighting unless told otherwise?

because it would be annoying if say

#(
../.(\\.'..2. Cwd.i 8
  / (  .'..2.   R.i   .
)#

were treated as sclang, but for me it’s worse if everything between delimiters is always assumed to be something else.

(I’m also now thinking about automatic tabbing etc which could get really bad for multiline custom syntax, but I would miss if it’s just never available between delimiters)

Do you have ideas about this sort of issue?

This means the IDE needs to keep track of where these blocks begin and end; that’s okay. What won’t work here is if you have multiple DSLs.

I suggested DLang # ...code... #; because it includes the language interpreter and code all in one statement. The IDE needs both to be able to check that the DSL is valid, and the language is extended in a way similar to trailing block syntax. This means you could also do

DLang(...) # ... #.something(...);
//Similar to
fork { ... }.something

Thinking about it… we could just use DLang # .... dsl code ... #, so long as there are spaces after the hash.

We could have a default dsl that you can register, this would achieve what you want.


It can’t. The point I’m suggesting is to resolve this by placing clear boundaries around what is and isn’t a DSL. Everything inside the delimiters is the DSL. Whether the DSL decides to allow certain parts to be SC expressions is up to it.

DSL code would be like a string; you’re asking, what if we want numbers inside the string?

Use snippets and other auto complete solutions. Type ‘sd’ and have the whole structure appear.


Having strict boundaries around what is and isn’t a DSL is going to be less flexible than using the preprocessor. This isn’t about removing the preprocessor, just taking the key parts of its behaviour and making them safe, easy to use, and work well with language tooling… heck, we could even do autocomplete in the DSL and jump to documentation.

It could be if it’s, e.g.:

#Cll(

// then inside here doesn't need any prefixes at all
// so the typing inconvenience disappears

)#

I’d also argue against nesting these tags – too many permutations.

Hm, so we have contradictory requirements: the ability to embed custom syntax into expressions that are otherwise standard SC, and the ability to write whole custom expressions without having to tag each one individually. That’s a design problem that will take some care.

I understand the reason for this suggestion, but again, it’s quite important for my use case that it not be necessary to tag each and every individual expression, so I think I wouldn’t use this approach for myself.

hjh

orgmode has something like this - put sc code between delimiters #+BEGIN_SRC supercollider and #+END_SRC - and it is highlighted correctly and scnvim will interpret it… (indentation doesn’t quite work sadly…)

…as it is, being able to send only lines, selections or blocks is suboptimal…

In scnvim I have added alongside sendBlock and sendLineOrSelection commands to send certain objects - for example anything constructed by my P class…

P.method( 
   lots of nested code here
   ....
)

so a distinct command to send DSL block seems helpful!

re: sending blocks within a DSL block - I would be afraid to add too many new commands - but a command to select a paragraph (which could later be sent) - if this is not already part of the IDE! - might be useful anyhow! (In nvim all of these kinds of things are more or less built in - you can select paragraphs - or with tree-sitter syntactically meaningful regions…)

!!

could folks embed Tidal code?

blocks of Lua or Python ?

also pinging @davidgranstrom who might have a thought about making sure that this would play nice with scnvim…

Yes you’re right those might be better solutions to that case. I do still think it’s reasonable to want bits of synthesis code (for example) inside a DSL for whatever reason. Which as you say is up to the DSL to evaluate correctly, but if there’s no easy way to highlight / autocomplete it correctly then it’s not as useful.

Quick note about splices since we’re borrowing from quasiquotation - they’re basically the holes where you stick dynamic values into your templates.

In Haskell, quasiquotation looks like:

[sql| SELECT * FROM users WHERE age > 21 |]
[json| {"name": "synth", "freq": 440} |]

In SuperCollider, quasiquotation could look like this:

#drum( 
    kick = "o o o o"
    hihat = "- ${~density.choose} - -"  // <-- unquote
)#

The ${...} bit gets evaluated and pasted into your pattern. It’s what makes these templates actually useful instead of static strings.

@jamshark70 - unquotes/splices might actually help with typing speed! You could prep templates and just change the dynamic bits:

~beat = { |vel| 
    #drum( kick = "o ${vel} o o" )# 
};
~beat.("O");  // quick changes during performance

On @Eric_Sluyter’s concerns: unquote/splices may be the answer, making the boundaries explicit:

#synthdef(
    name: coolSynth
    signal: ${  // SC code starts here
        var sig = SinOsc.ar(freq);
        sig * EnvGen.kr(Env.perc, doneAction: 2)
    }
)#

On implementation, the parser would need to:

  • Find ${...} patterns
  • Parse the SC expression inside
  • Decide when to evaluate (compile time vs runtime sort of thing)
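
The first two steps could be sketched roughly like this in Python. `split_splices` is a hypothetical helper; it only counts braces, so a `{` or `}` inside an SC string or char literal within the splice would confuse it, and it leaves the third step (when to evaluate) completely open.

```python
def split_splices(template: str):
    """Split DSL text into ("text", s) and ("code", s) segments around ${...}."""
    segments = []
    literal_start = 0
    i = 0
    while i < len(template):
        if template.startswith("${", i):
            # Scan to the matching '}' so nested SC functions stay intact.
            depth = 1
            j = i + 2
            while j < len(template) and depth > 0:
                if template[j] == "{":
                    depth += 1
                elif template[j] == "}":
                    depth -= 1
                j += 1
            if depth != 0:
                raise SyntaxError("unterminated ${ splice")
            if i > literal_start:
                segments.append(("text", template[literal_start:i]))
            segments.append(("code", template[i + 2:j - 1]))
            literal_start = i = j
        else:
            i += 1
    if literal_start < len(template):
        segments.append(("text", template[literal_start:]))
    return segments


print(split_splices('kick = "o ${vel} o o"'))
```

Each "code" segment would then go to the SC parser, and each "text" segment to the DSL.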

Just thinking out loud - unquotes/splices seem pretty clever to make this whole thing work.

EDIT: For reference on quasiquotation and its terminology: https://docs.racket-lang.org/reference/quasiquote.html

The key insight from Racket is that quasiquotation is really about controlled evaluation—you’re building data structures where some parts are literal and others are computed.

For SuperCollider’s DSLs, this suggests:

  • Clear rules about what’s evaluated when
  • Consistent splice positions across all DSLs
  • Optimization opportunities for static parts (to save memory)
  • Error messages that show which quote level you’re in

The Racket documentation also shows why unquote outside quasiquote is an error - SC should similarly detect ${...} outside a #dsl()#

Hmm, this is an f string right?

DrumL d"o x o { "ox".choose }"

This is probably quite easy to implement as it is just an array of strings… ["o x o ", "\"ox\".choose" ], where odd indices are to be evaluated.
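
A quick Python sketch of that array idea, with made-up names (`split_d_string`, `render`) and a caller-supplied `evaluate` standing in for the SC interpreter. Unlike the array written out above, it keeps the whitespace inside the braces and always ends on a (possibly empty) literal part, so that odd indices are reliably the code parts.

```python
def split_d_string(body: str):
    """Return alternating parts: even indices are literal text, odd are code."""
    parts = [""]
    depth = 0
    for ch in body:
        if depth == 0:
            if ch == "{":
                depth = 1
                parts.append("")      # open a code part (odd index)
            else:
                parts[-1] += ch
        else:
            if ch == "{":
                depth += 1            # nested brace stays inside the code part
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    parts.append("")  # back to literal text (even index)
                    continue
            parts[-1] += ch
    if depth != 0:
        raise SyntaxError("unterminated { in d-string")
    return parts


def render(parts, evaluate):
    """Join the parts, passing odd-indexed (code) parts through `evaluate`."""
    return "".join(
        str(evaluate(p.strip())) if i % 2 else p
        for i, p in enumerate(parts)
    )


parts = split_d_string('o x o { "ox".choose }')
print(parts)
print(render(parts, lambda code: "x"))  # pretend the code evaluated to "x"
```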

Perhaps it is better to just use ‘d-strings’ rather than something with a hash, or other special character?

The reason why I’m hesitant about evaluating sections of the dsl rather than the whole thing, is because that’s an ide feature, not a language one and every ide would have to implement this in their own way.

You make a good point, and it makes sense. Just one limitation: it works with string-like DSLs, great for drum patterns, but what about more structured DSLs? I think both approaches are valid for different contexts.

EDIT: I think many languages implement that idea, isn’t it called string interpolation? See Ruby, which has a nice syntax for that: String interpolation | Ruby for Beginners

 "Interpolation works in double-quoted strings: #{1 + 2}."

After letting this digest for a while, I think we have a few features here.

String interpolation

f"1 2 3 { 2 + 2 } 4 5"

I think using Python’s syntax here is a good idea as it is familiar to many. It is technically a breaking change because it will make the symbol f" a token in the lexer, when it would previously have been a variable identifier followed by a string delimiter.

Interpolating DSL strings

Basically a cross between string interpolation and a block, with a ‘trailing DSL string’ syntax, like trailing block.

MyLang d" ... dsl code goes { "h" + "ere" } ... ";

MyLang(*someArgs) d" ... code ...";

This means ‘{’ and ‘"’ can’t be part of the DSL itself unless escaped with a backslash. So maybe we could do something else; options have been discussed above, but I don’t mind this.

DSL regions

This is an IDE feature, which means each IDE has to implement it itself. I think magic comments, along the lines of C#’s region directive, should be used for this.

//! dsl begin MyLang

Blah blah
Blah

//! dsl end

These cannot be nested and cannot be included inside other SC code, as the content isn’t SC code.
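
For what an IDE would have to do with these comments, here is a minimal Python sketch. It assumes the exact `//! dsl begin Name` / `//! dsl end` spelling shown above, each on its own line; `find_regions` is a hypothetical helper. The no-nesting rule is enforced by refusing a second `begin` before an `end`.

```python
import re

BEGIN = re.compile(r"^//!\s*dsl\s+begin\s+(\w+)\s*$")
END = re.compile(r"^//!\s*dsl\s+end\s*$")


def find_regions(lines):
    """Return (language, first_body_line, end_line) tuples, 0-based, end exclusive."""
    regions = []
    current = None  # (language, index of the begin comment) while inside a region
    for i, line in enumerate(lines):
        begin = BEGIN.match(line.strip())
        if begin:
            if current is not None:
                raise SyntaxError(f"line {i + 1}: dsl regions cannot be nested")
            current = (begin.group(1), i)
        elif END.match(line.strip()):
            if current is None:
                raise SyntaxError(f"line {i + 1}: dsl end without begin")
            regions.append((current[0], current[1] + 1, i))
            current = None
    if current is not None:
        raise SyntaxError(f"line {current[1] + 1}: dsl begin without end")
    return regions
```

Everything between the returned indices would be skipped by the SC highlighter and handed to whatever tooling is registered for that language name.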