Code Formatter Development for Supercollider

Personally I would much prefer a code formatter that makes my code look less like western notation than before :stuck_out_tongue: On a more serious note, I think there are distinct scenarios for both ways of writing where one of them may be more appropriate (Iā€™m thinking fractions versus decimal numerals), and favoring one over the other is introducing unecessary bias that doesnā€™t make for more consistent or easily readable code (as opposed to ā€œproductiveā€ bias when e.g. choosing spaces over tabs).

1 Like

Works for me! It was a potentially contentious thought and it changes things but I figured I would float it.

Iā€™m leaning towards the idea of just treating the arrays as data and expecting the developer to use tools on their side to give it semantic meaning.

Most of the code I see doesnā€™t give formatting of the numbers in arrays any thought and thereā€™s so many possibilities underlining what the data means that Iā€™m leaning towards it being a foolā€™s errand to try and handle it

Iā€™ll write it with that in mind and see if it ends up being gibberish.

Great to hear. Plaese feel free to post issues to the tree-sitter-supercollider repo if you find any bugs in the parsing. Thereā€™s still edge cases that might not be covered correctly.

And yeah just generally - treesitter is such a cool tool!

2 Likes

This should roughly do it:

(
~unformatted = "{currentEnvironment.keysValuesDo({ |key, value| key.postln; value.postln }) }";
~formatted = "{currentEnvironment.keysValuesDo({ |key, value| key.postln; value.postln; }) }";

~deepCompare = {
	|a, b, depth=0|
	"comparing a=%, b=%".format(a, b).postln;
	
	if (depth > 1000) {
		Exception("Recursion problem...").throw;
	};
	
	if (a.class != b.class) {
		false
	} {
		if (a.slotSize == 0) {
			(a == b)
		} {
			(0..(a.slotSize - 1)).detect({
				|slotIndex|
				if (a.slotKey(slotIndex).postln == \sourceCode) {
					false
				} {
					~deepCompare.(a.slotAt(slotIndex), b.slotAt(slotIndex), depth+1).not
				}
			}).isNil
		}
	}
};

~deepCompare.(
	~unformatted.compile.def,
	~formatted.compile.def
).postln;
)

Basically, compile your two source codes and then do a recursive comparison of the FunctionDef, considering every slot EXCEPT for sourceCode (since obviously this differs). I believe any difference here should be considered a possible functional difference, and this should be off-limits for the formatter. Recursion might be a problem here, Iā€™m not sure - if the recursion depth error gets hit, this might need some amendments to resolve the case of nested data structures.

Itā€™s worth noting that changing rationals to floats or vice versa does, as James suggested, violate the ā€œno functional changesā€ rule. This also catches changes like e.g. | argument=1 | and | argument=(1) | which are also not functionally equivalent because of sclang obscurities.

Thanks! That should be a good place to start.

Floats to rationals makes sense that it changes the underlying calculation, but itā€™s really surprising that ā€œ1ā€ is different than ā€œ(1)ā€.

With that in mind, do you know offhand if there is any difference between:

ā€œ{}ā€ and ā€œ({})ā€ and ā€œarg x =1;ā€ and ā€œ|x=1|ā€ or ā€œarg xā€ and ā€œ|x|ā€ ?

I doubt those last things are different, but the real test is just to compare the FunctionDefs.

The difference between a=something and a=(something) in an arg or var is - the former (no parentheses) is only for primitive values, float, int, bool etc - these are stored as part of the FunctionDef itself (as a prototype Frame). The latter (parentheses) actually constructs the value at runtime when the function is called. Since Object-like things have constructors that need to be called, this is the only way you can have non-primitive default values.

Thereā€™s also a difference between these two for arg specifically, related to how nil arguments are handled:

({ |a=10| a.postln }).(nil); // nil
({ |a=(10)| a.postln }).(nil) // 10

This is a rather unlovely part of the language, butā€¦ it is what it is I guess.

2 Likes

Thatā€™s wild! Every language has its quirks. :slight_smile:
Cue the mind blown gif.

Google ā€œpython mutable default argumentsā€ if you want to see something even more horrifyingā€¦

1 Like

Forgive me if this a bit blurry - at a bar thinking about this after a long week with work. Iā€™m also typing this on my phone for my review later, so feel free to ignore this post. :slight_smile:

My m.o. for each of the rules so far is to do a query and then process the results of the query in reverse order so that the string can be processed in place without having to rebuild the tree.

Tree indenting in this code will happen after all of the other formatting rules have happened, so we can safely assume all spacing, normalization, etc. has already happened.

If we are processing for indenting then we need to do a first pass of the tree and pull out each of the code blocks with their ā€œlevelā€ attached, and then as we process each of the elements in a code block that are on the same level, we can tab indent them to the appropriate level, and if we hit another code block, we handle that recursively.

I think itā€™s fair to say that if we blow out the stack that the code is too complicated and we can handle a crash.

For each of the code blocks, I think the following should apply:

  • If the code block is a single line that goes over the 80 character limit, we know that itā€™s going to be broken up, so we need to mark that itā€™s going to be bounded by parenthesis after the transformation.
  • If the code block is a single line that does not go over the 80 character limit, then we wonā€™t touch it. The existing transformations are sufficient.
  • If the code block goes over the 80 character limit and somehow isnā€™t bound by parenthesis, weā€™ll mark it as needing to be bound by parenthesis. (I have no idea how likely this is.)

Non-Class Code:

  • Arrays that go over the limit will be split as close to the 80 character limit as possible, with the next line(s) having an extra level of indentation. There is no accommodation given to meter and no data within the array is changed.

  • Multiple line functions are broken out by statement with the parameter list on the same level as the function definition with one statement each on subsequent lines. Functions are formatted K&R style and this should be maintained.

Need to continue to think and will update as I work my way through - any thoughts about non class cases that need to be handled ?

Why do we need to introduce parentheses? A newline and a space are semantically equivalent in sclang - adding a newline doesnā€™t require any extra parens to preserve the old meaning of the code.

Maybe a minor point, but ā€” it seems like adding a line break because of the column limit shouldnā€™t be a special case or require any special tracking - it should behave exactly the same as if the programmer had introduced a line break at that point themselves. ā€œthe next line having an extra level of indentationā€ is just the same extra level that would would get if any single statement in sclang is broken up across multiple lines - this should fall naturally out of the indentation algorithm, right?

I believe indentation can basically be handled in a single pass top-to-bottom - so, you SHOULD be able to have an algorithm that looks like:

  1. Apply indentation to the current line, based on the current indentation stack depth.
  2. If the lineLength > columnLimit, insert a \n at an appropriate place.
  3. Continue to the next line and start again at #1 (if we added a \n, that partial line will be the next line)

I would expect that this would be identical for class and non-class source code

Iā€™m not sure the ā€œone statement per lineā€ really makes sense in the context of sclang (but maybe Iā€™m misunderstanding you?) - itā€™s common to have functions with a single top-level statement that are nonetheless 50+ lines long.

For the parenthesis, I was thinking in terms of the editor and making sure that if you were able to evaluate the block as one line previously, then if it were split across multiple lines, you could still evaluate with the same keystroke.

i.e.

{....}

to

({
  .....
})

Perhaps thatā€™s foolish, but part of the exercise is to make editing easier, and with a live-coding language like this, I feel like we need to keep the editor in mind, and splitting up lines has always been a bugbear of mine when Iā€™m tinkering.

Theoretically - yes. I wanted to make sure I mentioned it, though. I was thinking out loud, specifcally about continuation lines and how different formatters manage that.
I was reviewing Drew Devaultā€™s ~sircmpwn/cstyle - C style guide - sourcehut git and I was thinking specifically about the continuation section and this piece:

if (is_valid_foobar(possible_foobar)
                && get_number_of_foobars(possible_foobar) > 10
                && is_application_ready()) {
        /* Do work */
}
int
do_foobar_work(struct foobar *foo, struct foobar *bar,
                struct foobar *baz, struct foobar *bet)
{
        /* Do work */
}
Incorrect:

if (is_valid_foobar(possible_foobar)
        && get_number_of_foobars(possible_foobar) > 10
        && is_application_ready()) {
        /* Do work */
}
int
do_foobar_work(struct foobar *foo, struct foobar *bar,
        struct foobar *baz, struct foobar *bet)
{
        /* Do work */
}
  • This visually distinguishes new scope from its parent.

I think that makes a lot of sense, but I want to see how it works in practice. I was making that note more for myself.

The problem with doing top down is that as soon as you insert something into the text, the tree and its underlying text become disjoint. We donā€™t want to rebuild the tree after every operation, so the reason why Iā€™m going back to front is to maintain the ā€˜start positionā€™ in every node and maintain the semantics around what each node means. Otherwise, yes, that point is true.

I was thinking logical statement. So a 1 + 1; 2 +2; 3 + 3; would be split onto multiple lines. A gigantic megasynth like

{0.1*Resonz.ar(GravityGrid.ar(Impulse.kr(LFNoise0.kr([0.2,0.3],70,80)),[11.2,12.5], LFNoise0.kr(5.4,0.4),LFNoise0.kr(10,0.8), b.bufnum),SinOsc.ar(0.03,0,300,500),SinOsc.ar(0.03,0,0.3,0.4))}.play

Iā€™m still not exactly sure how I want to handle, if thatā€™s what youā€™re referring to.

Those big one-liners are somewhat critical to development of SC code, but can get unweildy both to think about and to write - any thoughts on how we can go about making those easier to read, edit, and evaluate ?

Did some transforms by hand here, applying some of the rules that I came up with.

I think the rules of ā€œdouble indentingā€ make some of those longer lines really legible.

{ 0.1 * Resonz.ar(GravityGrid.ar(Impulse.kr(LFNoise0.kr([0.2, 0.3], 70, 80)), [11.2, 12.5], LFNoise0.kr(5.4, 0.4), LFNoise0.kr(10, 0.8), b.bufnum), SinOsc.ar(0.03, 0, 300, 500), SinOsc.ar(0.03, 0, 0.3, 0.4)) }.play;

becomes:

({ 0.1 
   * Resonz.ar(
      GravityGrid.ar(
          Impulse.kr(LFNoise0.kr([0.2, 0.3], 70, 80)), 
          [11.2, 12.5], 
          LFNoise0.kr(5.4, 0.4), 
          LFNoise0.kr(10, 0.8), 
          b.bufnum), 
      SinOsc.ar(0.03, 0, 300, 500), 
      SinOsc.ar(0.03, 0, 0.3, 0.4)
  )}.play;
)

An idea: It would be really cool if especially the more controversial rules but in reality all rules could be toggled on/off or configured in a config file local to the project and / or global to the system it is being used on. Would allow many styles and per project specificities.

Not sure if this was mentioned already but thought it worth mentioning here

I want to avoid that for now. I feel like if we can get to a place where itā€™s 95 percent good enough for everyone, the benefit of having code thatā€™s consistent regardless of who wrote it will be better for the community than allowing for ideosyncracies. Itā€™ll also greatly simplify development.

I may be proven wrong in the future, but Iā€™m still working with Black as a model for the design in mind.

Thanks for this!
Do you intend to implement this feature for SC-IDE later? I often forget a white space before a closing curly brace, after a comma, and before and after operators. So, it would be practical to have a code formatter also in SC-IDE since SC-IDE only correct indentation.

Makes sense.
I would gently caution against premature optimization here (just in case you happen to find yourself deep in a rabbit-hole while solving this problem). Pragmatically - when I connect this formatter to the sclang language server, it will almost for sure re-format the entire file each time. Building a pipeline for diff-based incremental reformatting would bring some relatively heavy-weight technical requirements, and I simply donā€™t think this would be a priority for a while, if ever (clang-format does not run incrementally though it has a much harder job, and the user experience for C++ formatting is totally acceptable). I would hate to have you pull off an incredible feat of engineering to have ultra-fast incremental updates, only to not use it. :slight_smile:.
Also worth checking how much work TreeSitter is doing here - parsing is more expensive than formatting, and I thought I recall one of the benefits of TreeSitter is that fast incremental updates are somewhat built in?

Yes, those are more the cases Iā€™m thinking about. Single-statement SynthDefs are not really my style, but some people write this way (and, I guess I do as well for certain kinds of cases). The more common one is using a single long statement when building GUIā€™s declaratively. For example, these are all one statement, and the code would be pretty disrupted if the line breaks were changed in substantial ways:

Oh these are great examples ! Thanks. Iā€™ll take a gander at those today.

Wrt premature optimization - not going too far down that rabbit hole, but I tend to think in terms of ā€œdoing less workā€ bc my job requires me to write efficient cpp. The treesitter is great and super fast, but if you can do the changes without doing additional work, all the better.

My goal in terms of performance is to keep it somewhat reasonable, around 200-400 ms on my desktop and about a second on my netbook.

No, not going to do this for the ide - different codebases and itā€™d be significantly more work since thereā€™s no treesitter. The goal is to do this for non scide tools and allow them to work the tool into their pipeline.

1 Like

Update: still working on this - trying to figure out the best way to traverse the tree and do the indentation - itā€™d be easy if we had pointers to the nodes in the string (because then it wouldnā€™t matter if we moved things around!), but alas I decided to use python. :slight_smile:

Itā€™s getting there, but itā€™s still messy. Iā€™ll post something when I get it working. Just wanted to keep everyone up to date that itā€™s still in flight.

1 Like