Code Formatter Development for SuperCollider

All,

Based on my past threads and the conversation with @madskjeldgaard on the Tree-Sitter thread, I have started work on creating a code formatter for SuperCollider code.

The idea is for it to function similarly to the Black formatter for Python, clang-format for C++, and gofmt for Go: the formatter does not live inside an editor, but is its own command-line process that an editor can then call through a plugin.

The organization of the code is fairly straightforward - the sclang_format script is a wrapper around the format functions defined in the format_rules, and it currently takes in both a code file and a treesitter language object. This will likely change in the future, but it’s good for now.
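
To make that layout concrete, here is a minimal sketch of the wrapper-plus-rules idea, assuming each rule is simply a function from source text to source text. The names `Rule`, `strip_trailing_whitespace`, and `apply_rules` are hypothetical and not taken from the repo, and the tree-sitter language object that the real script takes is left out for brevity.

```python
from typing import Callable, List

# Hypothetical rule signature: source text in, rewritten source text out.
Rule = Callable[[str], str]


def strip_trailing_whitespace(code: str) -> str:
    """Example rule: remove spaces and tabs at the end of every line."""
    return "\n".join(line.rstrip() for line in code.split("\n"))


def apply_rules(code: str, rules: List[Rule]) -> str:
    """Wrapper: run each rule in order, feeding the output of one into the next."""
    for rule in rules:
        code = rule(code)
    return code
```

Keeping every rule behind the same signature means a new rule can be added to the list without touching the wrapper.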

This doesn’t even work right now, but I’m going to be hacking on it and would love to get people involved to start the development. Hopefully it won’t take too long, but the devil is always in the details.

Repo is here:

Woohoo this is super exciting!

Added the formatting for the rule

Add spaces within curly brackets {}

Turns this:

( ( 1..3 ) collect: {|x| x + 1} bubble: 0         )
~rhA_dur = [        1, 1,1.5,0.25,1.75, 0.5, 0.5, 0.5, 0.5, 0.5         ];
( (         1..3) collect: { |x| x + 1 } bubble: 0         )
(( 1..3 ) collect: { |x| x + 1 } bubble: 0         )

into:

((1..3) collect: { |x| x + 1 } bubble: 0)
~rhA_dur = [1, 1,1.5,0.25,1.75, 0.5, 0.5, 0.5, 0.5, 0.5];
((1..3) collect: { |x| x + 1 } bubble: 0)
((1..3) collect: { |x| x + 1 } bubble: 0)
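
For anyone following along, the effect of this rule can be approximated with plain regular expressions. This is only a sketch under that assumption: the real formatter works from the tree-sitter parse tree, the function names below are hypothetical, and a regex version will also rewrite brackets that appear inside strings or comments.

```python
import re


def pad_braces(code: str) -> str:
    """Put exactly one space after "{" and before "}" when both are on one line."""
    # Only touch "{" when something follows it on the same line.
    code = re.sub(r"\{[ \t]*(?=\S)", "{ ", code)
    # Only touch "}" when something precedes it, so leading indentation survives.
    code = re.sub(r"(?<=\S)[ \t]*\}", " }", code)
    return code


def tighten_brackets(code: str) -> str:
    """Remove padding just inside "(", ")", "[" and "]"."""
    code = re.sub(r"([(\[])[ \t]+", r"\1", code)
    code = re.sub(r"(?<=\S)[ \t]+([)\]])", r"\1", code)
    return code


def format_line(code: str) -> str:
    """Apply both rules, braces first."""
    return tighten_brackets(pad_braces(code))
```

Running `format_line` over the first input line above gives `((1..3) collect: { |x| x + 1 } bubble: 0)`.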

@madskjeldgaard - I noticed that the tree-sitter parser returns an error if an argument list has spaces inside the pipes. As in, the line below returns an error when I try to get the tree:

((1..3) collect: { | x | x + 1 } bubble: 0)

I don’t think that’s necessarily wrong code, so I suspect something is wrong on my end. Do you expect that to parse?

Yes it’s a bug unfortunately.

Okay cool, thanks! I’ll put a note in the code that this is a “to-fix” and put a link to the GitHub issue. Once it’s resolved, I can add the code check in.

Added binary operator spacing.
Turns:

( ( 1..3 ) collect: {|x| x + 1} bubble: 0         )
~rhA_dur = [        1, 1,1.5,0.25,1.75, 0.5, 0.5, 0.5, 0.5, 0.5         ];
( (         1..3) collect:{ |x| x      +1 } bubble: 0         )
(( 1..3 ) collect: { |x| x+1 }bubble:0         )

into

((1..3) collect: { |x| x + 1 } bubble: 0)
~rhA_dur = [1, 1,1.5,0.25,1.75, 0.5, 0.5, 0.5, 0.5, 0.5];
((1..3) collect: { |x| x + 1 } bubble: 0)
((1..3) collect: { |x| x + 1 } bubble: 0)

Added handling for commas:

Now the above example turns into:

((1..3) collect: { |x| x + 1 } bubble: 0)
~rhA_dur = [1, 1, 1.5, 0.25, 1.75, 0.5, 0.5, 0.5, 0.5, 0.5];
((1..3) collect: { |x| x + 1 } bubble: 0)
((1..3) collect: { |x| x + 1 } bubble: 0)
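
As a rough illustration of the comma rule (not the repo's implementation, which works on the parse tree), a single regex gets most of the way there. The function name is hypothetical, and this naive version would also touch commas inside strings and comments.

```python
import re


def normalize_commas(code: str) -> str:
    """Drop whitespace before a comma and leave exactly one space after it."""
    # The lookahead skips commas at the end of a line, so no trailing space is added.
    return re.sub(r"[ \t]*,[ \t]*(?=\S)", ", ", code)
```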

@madskjeldgaard - it looks like at this point the low-hanging, proof-of-concept stuff is done, and the harder stuff, such as indentation and whitespace between blocks, is now up for debate.

I was going to start from the indentation logic within the SC codebase, but I know there are some errors in there. I’d like to have a discussion with someone to start formalizing a ruleset and thinking about edge cases.

Stuff like:

  • What happens when we hit maximum width? Should maximum width be 80 characters? I think this is the absolute hardest one, as it may require a few passes to smooth out the text, and it’s going to require some understanding of the structure and of when things should be broken apart or joined together.
  • Are tabs really the best character to use for indentation?
  • What really counts as a new level of indentation?
  • Is there anything we can take from existing Smalltalk editors?
  • Are there cases (like with argument lists) where deprecated code should be replaced in favor of the preferred approach?

This is a hard one in a technical sense - but this has never been a feature of sc code formatting before, and all major IDEs have some kind of line wrap functionality, so I think the priority is quite low relative to everything else. Probably something to keep in mind while implementing, but definitely not something to lose sleep over now.

Because of, uh, people’s strong feelings about this, I think both spaces and tabs would have to be an option. But this should be an utterly minor code difference.

There are some solid precedents and probably only a minority of cases that could be considered controversial based on sclang norms in the classlib etc. I’ve got some examples from a previous attempt, I can try to find and post them tomorrow.

Unlikely - sclang’s syntax is still quite different from other varieties of smalltalk - if anything, sclang is ALMOST a sub/superset of something like javascript or typescript, so there might be more to be gained by stealing from code formatters on that side.

Any substantive change to the actual parsed result feels like it’s a HARD problem - maybe something to experiment with, but probably not put anywhere near an “official” formatter. These kinds of “code fixing” linters can be occasionally frustrating even in languages like Python, and those have had 1000x better support and dev energy than we’ll ever get…

Awesome - thanks for the comments!

RE: Maximum Width
I feel like maximum width is a feature that code-formatters generally adopt. Clang-Format has it, Black has it, and while gofmt doesn’t, someone wrote a tool to do it. I think the reasoning is strong (Cleaner go code with golines | Benjamin Yolken) and in my professional life, having a consistent coding format, including line length, allows for easier reading and comprehension of a codebase (especially during code reviews). I’ll fiddle with it, but I think it’s going to be the last thing to tackle, after the remaining technical problems are sorted out.

RE: Spacing Characters
I actually did already put a tab/spaces option in the script, but I think that I may just remove that and add tabs because of the coding guidelines already in place. Can’t have a disagreement if there’s no option to have a disagreement, right ? :slight_smile:

RE: Indentation
Thank you! I definitely want to see what those precedents look like, and what the potential controversial pieces of code are.

RE: Code Fixing
Well said - perhaps that can be flagged with a warning about ‘non-standard’ language features.

Which Python dependencies does this have?

Should just be treesitter, argparse, and logging.

I’ll create a pipenv and a shell script to run it this morning.

Added a Pipfile and a shell script to run it.
Looks like the two dependencies are argparse and tree_sitter.

Got a proof of concept working in vim - click the link below for an animation of writing a file and the formatter being applied.

This is just a hack on top of vim-filetype-formatter, but it works. I’ll make a self-contained plugin in the flavor of black-vim by the time this is done.

@scztt - when you’ve got time, I’d like to pick your brain about the work you did previously. I’d like to know which indentation cases you enumerated and what issues you faced.

Woohoo that’s really great!

Today I added formatting for assignment operators (putting a space before and after) and normalizing argument lists to the pipe format with spaces.

I think the case of how to indent long arrays of data is going to be a bit of a problem:

~long_data = [1, 1, 1.5, 0.25, 1.75, 0.5, 0.5, 0.5, 0.5, 0.5, 1, 1, 1.5, 0.25, 1.75, 0.5, 0.5, 0.5, 0.5, 0.5, 1, 1, 1.5, 0.25, 1.75, 0.5, 0.5, 0.5, 0.5, 0.5, 1, 1, 1.5, 0.25, 1.75, 0.5, 0.5, 0.5, 0.5, 0.5, 1, 1, 1.5, 0.25, 1.75, 0.5, 0.5, 0.5, 0.5, 0.5, ]

I don’t think the right answer is to allow lines to be of arbitrary length (I’m sticking to this for now!) but I also don’t think the right solution is to have each element on its own line, as you would with python.

My gut tells me that something along the lines of ‘two groups of 4’ would be easily digestible and cover most standard use cases. So something like this:

~long_data = [1,   1,   1.5, 0.25,   1.75, 0.5,  0.5,  0.5, 
              0.5, 0.5, 1,   1,      1.5,  0.25, 1.75, 0.5, 
....

But I also think that could cause some issues, and computing the offsets would be a pain in the butt. It may come down to just limiting the data to x characters and having it overflow onto the next line. Have either of you run into or thought about this case, or is there a standard practice around it?

I think a good place to start would be to let everything fall out of the two rules -

  1. Add a line break where the line gets longer than settings['maxLineLength'] characters (assuming some setting here…)
  2. Indent continued statements / open arrays 1 step from the indent level they were started at.
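
A minimal sketch of how those two rules could fall out for a single statement, assuming whitespace has already been normalized by earlier rules and that breaking at any space is acceptable (it is not safe for strings that contain spaces). `wrap_statement`, `max_len`, `indent_unit`, and `tab_width` are hypothetical names; `max_len` stands in for the `settings['maxLineLength']` value mentioned above.

```python
from typing import List


def wrap_statement(line: str, max_len: int = 80,
                   indent_unit: str = "\t", tab_width: int = 4) -> List[str]:
    """Greedily break one statement so that no output line exceeds max_len."""
    stripped = line.lstrip()
    words = stripped.split()
    if not words:
        return [line]
    base_indent = line[: len(line) - len(stripped)]
    # Rule 2: continuation lines sit one step deeper than the line they started on.
    continuation_indent = base_indent + indent_unit

    def width(text: str) -> int:
        # Count a tab as tab_width columns when measuring line length.
        return len(text.expandtabs(tab_width))

    lines: List[str] = []
    current = base_indent + words[0]
    for word in words[1:]:
        # Rule 1: start a new line when adding the next word would pass max_len.
        if width(current) + 1 + len(word) > max_len:
            lines.append(current)
            current = continuation_indent + word
        else:
            current += " " + word
    lines.append(current)
    return lines
```

A single word longer than `max_len` is left as-is in this sketch rather than split.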

Doing array alignment is a great nice-to-have, but it’s a monster to figure out how to get it right…

~long_data = [1, 0000000.001, 0.01324234234, 2, 3, Rest(1), [ a, b, c ], 10.collect { |i| i.pow(3) } ]

If we wanted a truly useful formatter for array / matrix data, I would maybe consider going in the direction of something non-automatic and fairly configurable. For example, imagine if the formatting rule was:

  1. For a line with array data that has a specific comment after it, e.g. // align
  2. Remember the column position for every array element on that line…
  3. And align all future elements of that array to the column positions from step 2.

As an algorithm this is easy-ish to get right, and the programmer can basically set the column positions themselves to do things like “grouped” spacing as your example showed. Honestly this would be more powerful than formatters I’ve used in other languages, and wouldn’t annoy users by formatting things they don’t want formatted. Using a comment could open the door to having inline options as well - for example:

a = [
     1.0,      0.03,    0.005,   10.3,       // align(decimal)
     0.1,     10.0,   123.0,      0.0001
];
b = [
       4,      10,     100,       // align(right)
     123,       3,    2342 
];
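
Here is a rough sketch of the plain `// align` variant of that idea: read the start column of each element from the marked line, then lay out later rows against those columns. It assumes flat arrays of simple literals (nested arrays or strings containing commas or `//` would confuse it), the function names are hypothetical, and the `align(decimal)` and `align(right)` options are not implemented.

```python
import re
from typing import List


def element_columns(template_line: str) -> List[int]:
    """Return the start column of each element on the line marked with // align."""
    body = template_line.split("//", 1)[0]  # ignore the trailing comment
    return [match.start() for match in re.finditer(r"[^\s,]+", body)]


def align_row(elements: List[str], columns: List[int]) -> str:
    """Place each element at its remembered column, left-aligned."""
    out = ""
    for index, element in enumerate(elements):
        target = columns[index] if index < len(columns) else 0
        # If the previous element overran its slot, fall back to a single space.
        start = max(target, len(out) + 1) if out else target
        out = out.ljust(start) + element
        if index < len(elements) - 1:
            out += ","
    return out
```

With the marked line `    1, 10, 100, // align`, the remembered columns are 4, 7, and 11, and a following row of `2, 3, 4` comes out as `    2, 3,  4`.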

I would give anything if the same ruleset could also auto-format my Events like:

~event = (
    degree: [3, 5],                 // align(colon)
    octave: 3,
       amp: 0.6,
    finish: {
                ~legato = rrand(0.5, 2)
            },
  callback: {
                "note played".postln
            }
);
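
For the simple single-line entries, the colon alignment could be sketched like this. It assumes it is only handed the top-level `key: value` lines of one Event and leaves anything else untouched, so the re-indented function bodies in the example above are out of scope. `align_colons` and its `indent` parameter are hypothetical names.

```python
import re
from typing import List

# A top-level Event entry such as "degree: [3, 5]," (assumed form).
KEY_VALUE = re.compile(r"^\s*(\w+)\s*:\s*(.*)$")


def align_colons(lines: List[str], indent: str = "    ") -> List[str]:
    """Right-align the keys so that every colon lands in the same column."""
    parsed = [KEY_VALUE.match(line) for line in lines]
    key_width = max((len(m.group(1)) for m in parsed if m), default=0)
    aligned = []
    for line, match in zip(lines, parsed):
        if match is None:
            aligned.append(line)  # not a key/value line: leave it alone
        else:
            key, value = match.group(1), match.group(2)
            aligned.append(indent + key.rjust(key_width) + ": " + value)
    return aligned
```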