Code Formatter Development for Supercollider

bovil43810 · May 12, 2022, 5:11pm

Personally I would much prefer a code formatter that makes my code look less like western notation than before On a more serious note, I think there are distinct scenarios for both ways of writing where one of them may be more appropriate (I’m thinking fractions versus decimal numerals), and favoring one over the other is introducing unecessary bias that doesn’t make for more consistent or easily readable code (as opposed to “productive” bias when e.g. choosing spaces over tabs).

john-d-murphy · May 12, 2022, 5:54pm

Works for me! It was a potentially contentious thought and it changes things but I figured I would float it.

I’m leaning towards the idea of just treating the arrays as data and expecting the developer to use tools on their side to give it semantic meaning.

Most of the code I see doesn’t give formatting of the numbers in arrays any thought and there’s so many possibilities underlining what the data means that I’m leaning towards it being a fool’s errand to try and handle it

I’ll write it with that in mind and see if it ends up being gibberish.

madskjeldgaard · May 12, 2022, 8:10pm

Great to hear. Plaese feel free to post issues to the tree-sitter-supercollider repo if you find any bugs in the parsing. There’s still edge cases that might not be covered correctly.

And yeah just generally - treesitter is such a cool tool!

scztt · May 13, 2022, 11:06am

This should roughly do it:

(
~unformatted = "{currentEnvironment.keysValuesDo({ |key, value| key.postln; value.postln }) }";
~formatted = "{currentEnvironment.keysValuesDo({ |key, value| key.postln; value.postln; }) }";

~deepCompare = {
	|a, b, depth=0|
	"comparing a=%, b=%".format(a, b).postln;
	
	if (depth > 1000) {
		Exception("Recursion problem...").throw;
	};
	
	if (a.class != b.class) {
		false
	} {
		if (a.slotSize == 0) {
			(a == b)
		} {
			(0..(a.slotSize - 1)).detect({
				|slotIndex|
				if (a.slotKey(slotIndex).postln == \sourceCode) {
					false
				} {
					~deepCompare.(a.slotAt(slotIndex), b.slotAt(slotIndex), depth+1).not
				}
			}).isNil
		}
	}
};

~deepCompare.(
	~unformatted.compile.def,
	~formatted.compile.def
).postln;
)

Basically, compile your two source codes and then do a recursive comparison of the FunctionDef, considering every slot EXCEPT for sourceCode (since obviously this differs). I believe any difference here should be considered a possible functional difference, and this should be off-limits for the formatter. Recursion might be a problem here, I’m not sure - if the recursion depth error gets hit, this might need some amendments to resolve the case of nested data structures.

It’s worth noting that changing rationals to floats or vice versa does, as James suggested, violate the “no functional changes” rule. This also catches changes like e.g. | argument=1 | and | argument=(1) | which are also not functionally equivalent because of sclang obscurities.

john-d-murphy · May 13, 2022, 12:16pm

Thanks! That should be a good place to start.

Floats to rationals makes sense that it changes the underlying calculation, but it’s really surprising that “1” is different than “(1)”.

With that in mind, do you know offhand if there is any difference between:

“{}” and “({})” and “arg x =1;” and “|x=1|” or “arg x” and “|x|” ?

scztt · May 13, 2022, 1:20pm

I doubt those last things are different, but the real test is just to compare the FunctionDefs.

The difference between a=something and a=(something) in an arg or var is - the former (no parentheses) is only for primitive values, float, int, bool etc - these are stored as part of the FunctionDef itself (as a prototype Frame). The latter (parentheses) actually constructs the value at runtime when the function is called. Since Object-like things have constructors that need to be called, this is the only way you can have non-primitive default values.

There’s also a difference between these two for arg specifically, related to how nil arguments are handled:

({ |a=10| a.postln }).(nil); // nil
({ |a=(10)| a.postln }).(nil) // 10

This is a rather unlovely part of the language, but… it is what it is I guess.

john-d-murphy · May 13, 2022, 1:36pm

That’s wild! Every language has its quirks.
Cue the mind blown gif.

scztt · May 13, 2022, 4:22pm

Google “python mutable default arguments” if you want to see something even more horrifying…

john-d-murphy · May 14, 2022, 1:09am

Forgive me if this a bit blurry - at a bar thinking about this after a long week with work. I’m also typing this on my phone for my review later, so feel free to ignore this post.

My m.o. for each of the rules so far is to do a query and then process the results of the query in reverse order so that the string can be processed in place without having to rebuild the tree.

Tree indenting in this code will happen after all of the other formatting rules have happened, so we can safely assume all spacing, normalization, etc. has already happened.

If we are processing for indenting then we need to do a first pass of the tree and pull out each of the code blocks with their “level” attached, and then as we process each of the elements in a code block that are on the same level, we can tab indent them to the appropriate level, and if we hit another code block, we handle that recursively.

I think it’s fair to say that if we blow out the stack that the code is too complicated and we can handle a crash.

For each of the code blocks, I think the following should apply:

If the code block is a single line that goes over the 80 character limit, we know that it’s going to be broken up, so we need to mark that it’s going to be bounded by parenthesis after the transformation.
If the code block is a single line that does not go over the 80 character limit, then we won’t touch it. The existing transformations are sufficient.
If the code block goes over the 80 character limit and somehow isn’t bound by parenthesis, we’ll mark it as needing to be bound by parenthesis. (I have no idea how likely this is.)

Non-Class Code:

Arrays that go over the limit will be split as close to the 80 character limit as possible, with the next line(s) having an extra level of indentation. There is no accommodation given to meter and no data within the array is changed.
Multiple line functions are broken out by statement with the parameter list on the same level as the function definition with one statement each on subsequent lines. Functions are formatted K&R style and this should be maintained.

Need to continue to think and will update as I work my way through - any thoughts about non class cases that need to be handled ?

scztt · May 14, 2022, 11:49am

Why do we need to introduce parentheses? A newline and a space are semantically equivalent in sclang - adding a newline doesn’t require any extra parens to preserve the old meaning of the code.

Maybe a minor point, but — it seems like adding a line break because of the column limit shouldn’t be a special case or require any special tracking - it should behave exactly the same as if the programmer had introduced a line break at that point themselves. “the next line having an extra level of indentation” is just the same extra level that would would get if any single statement in sclang is broken up across multiple lines - this should fall naturally out of the indentation algorithm, right?

I believe indentation can basically be handled in a single pass top-to-bottom - so, you SHOULD be able to have an algorithm that looks like:

Apply indentation to the current line, based on the current indentation stack depth.
If the lineLength > columnLimit, insert a \n at an appropriate place.
Continue to the next line and start again at #1 (if we added a \n, that partial line will be the next line)

I would expect that this would be identical for class and non-class source code

I’m not sure the “one statement per line” really makes sense in the context of sclang (but maybe I’m misunderstanding you?) - it’s common to have functions with a single top-level statement that are nonetheless 50+ lines long.

john-d-murphy · May 14, 2022, 1:15pm

For the parenthesis, I was thinking in terms of the editor and making sure that if you were able to evaluate the block as one line previously, then if it were split across multiple lines, you could still evaluate with the same keystroke.

i.e.

{....}

to

({
  .....
})

Perhaps that’s foolish, but part of the exercise is to make editing easier, and with a live-coding language like this, I feel like we need to keep the editor in mind, and splitting up lines has always been a bugbear of mine when I’m tinkering.

Theoretically - yes. I wanted to make sure I mentioned it, though. I was thinking out loud, specifcally about continuation lines and how different formatters manage that.
I was reviewing Drew Devault’s ~sircmpwn/cstyle - C style guide - sourcehut git and I was thinking specifically about the continuation section and this piece:

if (is_valid_foobar(possible_foobar)
                && get_number_of_foobars(possible_foobar) > 10
                && is_application_ready()) {
        /* Do work */
}

int
do_foobar_work(struct foobar *foo, struct foobar *bar,
                struct foobar *baz, struct foobar *bet)
{
        /* Do work */
}

Incorrect:

if (is_valid_foobar(possible_foobar)
        && get_number_of_foobars(possible_foobar) > 10
        && is_application_ready()) {
        /* Do work */
}

int
do_foobar_work(struct foobar *foo, struct foobar *bar,
        struct foobar *baz, struct foobar *bet)
{
        /* Do work */
}

This visually distinguishes new scope from its parent.

I think that makes a lot of sense, but I want to see how it works in practice. I was making that note more for myself.

The problem with doing top down is that as soon as you insert something into the text, the tree and its underlying text become disjoint. We don’t want to rebuild the tree after every operation, so the reason why I’m going back to front is to maintain the ‘start position’ in every node and maintain the semantics around what each node means. Otherwise, yes, that point is true.

I was thinking logical statement. So a 1 + 1; 2 +2; 3 + 3; would be split onto multiple lines. A gigantic megasynth like

{0.1*Resonz.ar(GravityGrid.ar(Impulse.kr(LFNoise0.kr([0.2,0.3],70,80)),[11.2,12.5], LFNoise0.kr(5.4,0.4),LFNoise0.kr(10,0.8), b.bufnum),SinOsc.ar(0.03,0,300,500),SinOsc.ar(0.03,0,0.3,0.4))}.play

I’m still not exactly sure how I want to handle, if that’s what you’re referring to.

john-d-murphy · May 14, 2022, 1:37pm

Those big one-liners are somewhat critical to development of SC code, but can get unweildy both to think about and to write - any thoughts on how we can go about making those easier to read, edit, and evaluate ?

john-d-murphy · May 14, 2022, 3:04pm

Did some transforms by hand here, applying some of the rules that I came up with.

gist.github.com

https://gist.github.com/john-d-murphy/ffe006813c69bee8c6a16a33bc7bb742

gravitygrid.sc

//GravityGrid comes out with some interesting noisy sounds- WARNING, loud, aliasing! You'll probably want to filter the outputs, maybe with RLPF, Slew et al.

// All Transforms Done by Hand

// Transform 1
{ Pan2.ar(0.05 * GravityGrid.ar(Impulse.kr(2), MouseX.kr(0.1, 10, 'exponential'), LFNoise0.kr(2, 0.8), LFNoise0.kr(2, 0.8)), 0.0) }.play;

({ Pan2.ar(
      0.05 
      * GravityGrid.ar(

This file has been truncated. show original

I think the rules of “double indenting” make some of those longer lines really legible.

{ 0.1 * Resonz.ar(GravityGrid.ar(Impulse.kr(LFNoise0.kr([0.2, 0.3], 70, 80)), [11.2, 12.5], LFNoise0.kr(5.4, 0.4), LFNoise0.kr(10, 0.8), b.bufnum), SinOsc.ar(0.03, 0, 300, 500), SinOsc.ar(0.03, 0, 0.3, 0.4)) }.play;

becomes:

({ 0.1 
   * Resonz.ar(
      GravityGrid.ar(
          Impulse.kr(LFNoise0.kr([0.2, 0.3], 70, 80)), 
          [11.2, 12.5], 
          LFNoise0.kr(5.4, 0.4), 
          LFNoise0.kr(10, 0.8), 
          b.bufnum), 
      SinOsc.ar(0.03, 0, 300, 500), 
      SinOsc.ar(0.03, 0, 0.3, 0.4)
  )}.play;
)

madskjeldgaard · May 14, 2022, 5:58pm

An idea: It would be really cool if especially the more controversial rules but in reality all rules could be toggled on/off or configured in a config file local to the project and / or global to the system it is being used on. Would allow many styles and per project specificities.

Not sure if this was mentioned already but thought it worth mentioning here

john-d-murphy · May 14, 2022, 7:53pm

I want to avoid that for now. I feel like if we can get to a place where it’s 95 percent good enough for everyone, the benefit of having code that’s consistent regardless of who wrote it will be better for the community than allowing for ideosyncracies. It’ll also greatly simplify development.

I may be proven wrong in the future, but I’m still working with Black as a model for the design in mind.

prko · May 15, 2022, 6:35am

Thanks for this!
Do you intend to implement this feature for SC-IDE later? I often forget a white space before a closing curly brace, after a comma, and before and after operators. So, it would be practical to have a code formatter also in SC-IDE since SC-IDE only correct indentation.

scztt · May 15, 2022, 11:37am

Makes sense.
I would gently caution against premature optimization here (just in case you happen to find yourself deep in a rabbit-hole while solving this problem). Pragmatically - when I connect this formatter to the sclang language server, it will almost for sure re-format the entire file each time. Building a pipeline for diff-based incremental reformatting would bring some relatively heavy-weight technical requirements, and I simply don’t think this would be a priority for a while, if ever (clang-format does not run incrementally though it has a much harder job, and the user experience for C++ formatting is totally acceptable). I would hate to have you pull off an incredible feat of engineering to have ultra-fast incremental updates, only to not use it. .
Also worth checking how much work TreeSitter is doing here - parsing is more expensive than formatting, and I thought I recall one of the benefits of TreeSitter is that fast incremental updates are somewhat built in?

Yes, those are more the cases I’m thinking about. Single-statement SynthDefs are not really my style, but some people write this way (and, I guess I do as well for certain kinds of cases). The more common one is using a single long statement when building GUI’s declaratively. For example, these are all one statement, and the code would be pretty disrupted if the line breaks were changed in substantial ways:

github.com

scztt/ServerView.quark/blob/master/ServerView.sc#L292


      
          				{ Server.killAll }),
          			ServerViewAction("Close Server View", 27.asAscii.asSymbol,
          				{ parent.window.rememberPosition(\ServerView); parent.close() }),
          
          		];
          
          		^actions
          	}
          
          	menuActions {
          		^[
          			MenuAction("Query nodes", {
          				server.queryAllNodes(true)
          			}).shortcut_("n"),
          
          			MenuAction("Level meters", {
          				server.tryPerform(\meter)
          			}).shortcut_("l"),
          
          			MenuAction("Tree view", {
          				if(server.serverRunning) { TreeSnapshotView(server).autoUpdate().front() }

github.com

scztt/ServerView.quark/blob/master/ServerView.sc#L755


      
          			{
          				muteButton.value_(flag.binaryValue);
          			}.defer
          		})
          		.put(\ampRange, {|changer, what, min, max|
          			volSpec = [min, max, \db].asSpec;
          			volumeSlider.value_(volSpec.unmap(server.volume.volume));
          		});
          		volSpec = [server.volume.min, server.volume.max, \db].asSpec;
          
          		view = View().layout_(
          			VLayout(
          				HLayout(
          					StaticText().font_(this.font(9)).string_("VOL:  "),
          					volumeSlider = (Slider()
          						.action_({ |v| server.volume_(volSpec.map(v.value).round(0.1)) })
          						.orientation_(\horizontal)
          						.maxHeight_(height)
          						.thumbSize_(height - 5)
          						.value_(volSpec.unmap(server.volume.volume))
          					),

github.com

scztt/ServerView.quark/blob/master/ServerView.sc#L87


      
          		widgetLayout.margins = 3;
          		widgetLayout.spacing = 1;
          		view.onClose_(this.onClose(_));
          		this.populateView();
          	}
          
          	populateView {
          		containers = List();
          		widgets = List();
          
          		([ServerSelectorWidget] ++ this.class.widgets).do {
          			|wClass, i|
          			var container, widget;
          			widget = wClass.new(this.server, this).view();
          			widgets.add(widget);
          			container = View().layout_(
          				VLayout(
          					widget,
          					if (i != this.class.widgets.size()) {
          						UserView().fixedHeight_(10).drawFunc_({
          							|v|

john-d-murphy · May 15, 2022, 1:16pm

Oh these are great examples ! Thanks. I’ll take a gander at those today.

Wrt premature optimization - not going too far down that rabbit hole, but I tend to think in terms of “doing less work” bc my job requires me to write efficient cpp. The treesitter is great and super fast, but if you can do the changes without doing additional work, all the better.

My goal in terms of performance is to keep it somewhat reasonable, around 200-400 ms on my desktop and about a second on my netbook.

john-d-murphy · May 15, 2022, 1:18pm

No, not going to do this for the ide - different codebases and it’d be significantly more work since there’s no treesitter. The goal is to do this for non scide tools and allow them to work the tool into their pipeline.

john-d-murphy · May 19, 2022, 11:38am

Update: still working on this - trying to figure out the best way to traverse the tree and do the indentation - it’d be easy if we had pointers to the nodes in the string (because then it wouldn’t matter if we moved things around!), but alas I decided to use python.

It’s getting there, but it’s still messy. I’ll post something when I get it working. Just wanted to keep everyone up to date that it’s still in flight.