Emacs - syntax highlighting colors

It is possible to retain the current design while improving the Tree-sitter grammar: fixing minor bugs and raising its coverage from maybe 70% to 90% (just a guess).

It is a language design decision. Ruby, for example, has a minimal ts grammar. Haskell also features layers of code de-sugaring.

Just looking at the colors in your screenshot, it is possible to see the same color in curve: and amp:, even when they have different semantics. The same happens when a method using the syntax method: is treated as an operator.

In sclang all binary operators are evaluated left-to-right (so 2 + 3 * 4 = (2 + 3) * 4), and the only “tighter” binding is that message sends (method chains) group before binary operators.
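A minimal sketch of that rule, written in Python rather than sclang so it can be run anywhere. It is a toy evaluator (the name `eval_left_to_right` is made up for illustration) that gives every binary operator the same precedence and folds left to right. It only handles numbers, `+ - * /` and parentheses, and it is not how sclang or the tree-sitter grammar is actually implemented.

```python
import re

# One token per match: a number, an operator, or a parenthesis.
TOKEN = re.compile(r"\s*(\d+(?:\.\d+)?|[-+*/()])")

OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
}


def tokenize(src):
    """Split a tiny arithmetic expression into tokens."""
    src = src.strip()
    tokens, pos = [], 0
    while pos < len(src):
        match = TOKEN.match(src, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def eval_left_to_right(src):
    """Evaluate with no operator precedence: operators fold left to right."""
    tokens = tokenize(src)
    pos = 0

    def primary():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = expression()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise SyntaxError("expected ')'")
            pos += 1
            return value
        if tok in OPS or tok == ")":
            raise SyntaxError(f"unexpected token {tok!r}")
        return float(tok)

    def expression():
        nonlocal pos
        value = primary()
        # Every operator has the same precedence, so just keep folding.
        while pos < len(tokens) and tokens[pos] in OPS:
            op = tokens[pos]
            pos += 1
            value = OPS[op](value, primary())
        return value

    result = expression()
    if pos != len(tokens):
        raise SyntaxError("trailing tokens")
    return result
```

With this, `eval_left_to_right("2 + 3 * 4")` gives 20.0, while `eval_left_to_right("2 + (3 * 4)")` gives 14.0, since only parentheses change the grouping.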

To actually take advantage of a TS grammar, we need to think about those things. Regular expressions are good enough for syntax highlighting.

In other words, Tree-sitter is a next-generation GPS system designed for cities with regular grid patterns. SuperCollider is more like Venice: beautiful and functional, but following its own logic that doesn’t map neatly onto standards. The developers need to decide how to design something that does not force SuperCollider into a strange worldview but instead adapts Tree-sitter tech to grow in line with SuperCollider’s nature.

My intuition is that a “layered semantics” approach would be appropriate, while using ts grammar just for syntax highlighting is not a big step forward.

Instead, find the use cases where a TS grammar can excel, while keeping it as simple and maintainable as possible:

a) identifying syntactic structures quickly and accurately - but don’t ask it to understand SuperCollider’s semantic peculiarities. In other words, the ts grammar should be deliberately dumb about semantics, focusing only on structural patterns.

b) Contexts: special meanings in different places (like inside a SynthDef, a Pattern, JITLib, a DSL, etc).

c) The best design decisions will be those that embrace this uniqueness, where different people can work on various parts of a design while modernizing SC.

cc: @madskjeldgaard @jamshark70 @julian


I thought my fix was working. Feel free to polish it, just tell me what you changed so I can learn from you. :slight_smile:

Just a heads up: The Treesitter code for SC has undergone a lot of bug fixes recently.

Also, it would be great to have an emacs setup in the wiki for TS-SC if anyone feels like contributing it!


yo @madskjeldgaard

That’s good news! I tested it using the code from yesterday’s GitHub repository, just for the record. If that’s the correct repository, it appears that the operator bug is still present.

Yeah, please open up an issue if you see bugs or missing things in the grammar. It’s got most things covered now and parses most of what I test with it without issues but we constantly discover weird things in the language we didn’t know about, hah.

Yes! thank you and the other supercollider tree-sitters very much, great work!
I noticed that issues I was having with the ! duplication statement went away.
Even though I managed to create a kind of working ts mode I am not sure I am the right person to write a useful set-up guide, yet.

yo @madskjeldgaard

This line of code offers a great opportunity for constructive improvement. Again, while it currently has some bugs related to operators, methods, and other details that seriously compromise the integrity of the tree structure, the explanations and fixes mentioned earlier provide a path for fixing them. In other words, it is currently broken.

freq = \freq.kr(440) * (Env.perc(..., curve: -1).ar * 48 * \bend.kr(1)).midiratio;

IMO, insisting on a comprehensive semantic framework may face complex challenges, particularly concerning key components, such as Patterns and JITLib. Do you think this is maintainable?

I am keen on collaborating. I will submit the issues and a patch for them.


Are you sure this is valid SC syntax? The ellipsis in Env.perc doesn’t run here.

Also why is my post being flagged for being against community guidelines when I’m just asking for people to open github issues ?_?

I used the original code; I just abbreviated it for brevity. The errors are described in detail in the text above.

Ah okay. With the latest TS-SC, I can parse the following:

(
{
var freq = \freq.kr(440) * (Env.perc(0.01, curve: -1).ar * 48 * \bend.kr(1)).midiratio;
}
)

as

(source_file [0, 0] - [5, 0]
  (code_block [0, 0] - [4, 1]
    (function_block [1, 0] - [3, 1]
      (variable_definition [2, 0] - [2, 86]
        name: (variable [2, 0] - [2, 8]
          (local_var [2, 0] - [2, 8]
            name: (identifier [2, 4] - [2, 8])))
        value: (function_call [2, 11] - [2, 86]
          (receiver [2, 11] - [2, 76]
            (binary_expression [2, 11] - [2, 76]
              left: (function_call [2, 11] - [2, 24]
                (receiver [2, 11] - [2, 16]
                  (literal [2, 11] - [2, 16]
                    (symbol [2, 11] - [2, 16]
                      (identifier [2, 12] - [2, 16]))))
                (method_call [2, 16] - [2, 24]
                  name: (method_name [2, 17] - [2, 19])
                  (parameter_call_list [2, 20] - [2, 23]
                    (argument_calls [2, 20] - [2, 23]
                      (unnamed_argument [2, 20] - [2, 23]
                        (literal [2, 20] - [2, 23]
                          (number [2, 20] - [2, 23]
                            (integer [2, 20] - [2, 23]))))))))
              right: (code_block [2, 27] - [2, 76]
                (function_call [2, 28] - [2, 75]
                  (receiver [2, 28] - [2, 69]
                    (binary_expression [2, 28] - [2, 69]
                      left: (binary_expression [2, 28] - [2, 61]
                        left: (function_call [2, 28] - [2, 56]
                          (receiver [2, 28] - [2, 31]
                            (class [2, 28] - [2, 31]))
                          (method_call [2, 31] - [2, 53]
                            name: (method_name [2, 32] - [2, 36])
                            (parameter_call_list [2, 37] - [2, 52]
                              (argument_calls [2, 37] - [2, 41]
                                (unnamed_argument [2, 37] - [2, 41]
                                  (literal [2, 37] - [2, 41]
                                    (number [2, 37] - [2, 41]
                                      (float [2, 37] - [2, 41])))))
                              (argument_calls [2, 43] - [2, 52]
                                (named_argument [2, 43] - [2, 52]
                                  name: (identifier [2, 43] - [2, 48])
                                  name: (unary_expression [2, 50] - [2, 52]
                                    right: (literal [2, 51] - [2, 52]
                                      (number [2, 51] - [2, 52]
                                        (integer [2, 51] - [2, 52]))))))))
                          (method_call [2, 53] - [2, 56]
                            name: (method_name [2, 54] - [2, 56])))
                        right: (literal [2, 59] - [2, 61]
                          (number [2, 59] - [2, 61]
                            (integer [2, 59] - [2, 61]))))
                      right: (literal [2, 64] - [2, 69]
                        (symbol [2, 64] - [2, 69]
                          (identifier [2, 65] - [2, 69])))))
                  (method_call [2, 69] - [2, 75]
                    name: (method_name [2, 70] - [2, 72])
                    (parameter_call_list [2, 73] - [2, 74]
                      (argument_calls [2, 73] - [2, 74]
                        (unnamed_argument [2, 73] - [2, 74]
                          (literal [2, 73] - [2, 74]
                            (number [2, 73] - [2, 74]
                              (integer [2, 73] - [2, 74])))))))))))
          (method_call [2, 76] - [2, 86]
            name: (method_name [2, 77] - [2, 86])))))))

Seems alright to me but yeah be free to open an issue if you see any bugs :slight_smile:

No, the parse tree is not correct! This demonstrates exactly the bugs that I tried to identify in the discussion.

Look at the curve: -1 parameter:

(named_argument [2, 43] - [2, 52]
  name: (identifier [2, 43] - [2, 48])
  name: (unary_expression [2, 50] - [2, 52]  

The value -1 is incorrectly stored under name: instead of value:. The tree-sitter grammar has both fields defined as name:, so the argument's name and its value cannot be told apart by field.

You see name: twice.
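This kind of problem can be spotted mechanically in the printed tree. Below is a rough sketch in Python (all function names are made up for illustration) that reads an S-expression in the format shown above and lists nodes whose direct children reuse a field name. Note that tree-sitter does allow a field to label several children, so a hit is only a hint that something deserves a look, not proof of a bug. The sketch assumes balanced input and does no error recovery.

```python
import re
from collections import Counter

# Tokens in the printed tree: ranges, "field:" labels, parens, node types.
TOKEN = re.compile(
    r"""
      (?P<range>\[\d+,\s*\d+\]\s*-\s*\[\d+,\s*\d+\])
    | (?P<field>\w+):
    | (?P<open>\()
    | (?P<close>\))
    | (?P<name>\w+)
    """,
    re.VERBOSE,
)


def tokenize(text):
    """Return (kind, value) pairs; whitespace between tokens is skipped."""
    return [(m.lastgroup, m.group(m.lastgroup)) for m in TOKEN.finditer(text)]


def parse(tokens):
    """Build nested dicts: type, span, and a list of (field, child) pairs."""
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos][0] != "open":
            raise SyntaxError("expected '('")
        pos += 1
        kind, node_type = tokens[pos]
        if kind != "name":
            raise SyntaxError("expected a node type")
        pos += 1
        span = None
        if tokens[pos][0] == "range":
            span = tokens[pos][1]
            pos += 1
        children = []
        while tokens[pos][0] != "close":
            field = None
            if tokens[pos][0] == "field":
                field = tokens[pos][1]
                pos += 1
            children.append((field, node()))
        pos += 1  # consume ')'
        return {"type": node_type, "span": span, "children": children}

    return node()


def repeated_fields(node):
    """List (node type, field name, count) wherever a field is reused."""
    counts = Counter(field for field, _ in node["children"] if field)
    found = [(node["type"], f, n) for f, n in counts.items() if n > 1]
    for _, child in node["children"]:
        found.extend(repeated_fields(child))
    return found


# The named_argument subtree from the parse output in this thread.
SAMPLE = """
(named_argument [2, 43] - [2, 52]
  name: (identifier [2, 43] - [2, 48])
  name: (unary_expression [2, 50] - [2, 52]
    right: (literal [2, 51] - [2, 52]
      (number [2, 51] - [2, 52]
        (integer [2, 51] - [2, 52])))))
"""

print(repeated_fields(parse(tokenize(SAMPLE))))
```

For the sample, the result is a single entry reporting that `named_argument` uses the field `name` twice.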

The .midiratio is being applied to the entire expression instead of just the parenthesized part:

* `receiver`: `[2, 11] - [2, 76]` = `\freq.kr(440) * (Env.perc(0.01, curve: -1).ar * 48 * \bend.kr(1))`
* `method_call midiratio`: `[2, 76] - [2, 86]`

The parser is incorrectly including \freq.kr(440) * as part of the receiver for .midiratio

The parse tree shows:

  • receiver is the entire binary expression \freq.kr(440) * (...)
  • method_call for midiratio is applied to that whole receiver

The grammar treats parentheses as just another expression wrapper, not as a method binding boundary
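To make the expected grouping concrete, here is a toy recursive-descent parser in Python. It is only a sketch: the grammar, the tuple node shapes (`"binary"`, `"call"`, `"group"`) and the shortened variable names are invented for illustration, method arguments are left out, and it is not the tree-sitter-supercollider grammar. The point is the rule order: a postfix `.method` attaches to the primary directly before it (which may be a parenthesized group), and only then are binary operators folded left to right.

```python
import re

TOKEN = re.compile(r"\s*([A-Za-z_]\w*|\d+|[-+*/().])")
BINARY_OPS = {"+", "-", "*", "/"}


def tokenize(src):
    """Split the input into names, integers, operators, dots and parens."""
    src = src.strip()
    tokens, pos = [], 0
    while pos < len(src):
        match = TOKEN.match(src, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(src):
    """Return a nested-tuple tree for a tiny sclang-like expression."""
    tokens = tokenize(src)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tok

    def primary():
        tok = take()
        if tok == "(":
            inner = expression()
            if take() != ")":
                raise SyntaxError("expected ')'")
            return ("group", inner)
        if tok in BINARY_OPS or tok in {".", ")"}:
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok

    def postfix():
        # Message sends bind to the primary right before them.
        node = primary()
        while peek() == ".":
            take()
            name = take()
            if not name.isidentifier():
                raise SyntaxError("expected a method name")
            node = ("call", node, name)
        return node

    def expression():
        # All binary operators share one precedence level, left to right.
        node = postfix()
        while peek() in BINARY_OPS:
            op = take()
            node = ("binary", op, node, postfix())
        return node

    tree = expression()
    if peek() is not None:
        raise SyntaxError("trailing tokens")
    return tree


print(parse("freq * (env * 48).midiratio"))
```

Here the receiver of `midiratio` is the `("group", ...)` node alone, and the outer `*` takes that call as its right operand, which is the shape the real parse tree should have for the line under discussion.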

The tree-sitter grammar should identify syntactic structures, but it currently gets many things wrong.

The tree-sitter-supercollider grammar needs basic corrections, such as the handling of parentheses and operators, and in my opinion it should also be simplified. It would be unmanageable to keep going and try to also deal with semantics.

It would be like using a lexer for type checking: wrong tool for something with such semantic complexity.

Right now, relying on sclang’s own understanding of the code (direct querying using introspection) would be way more reliable.

And also, more realistic. The language in which the semantic analyzer would be written is an open question; it could be sclang or something else. It is more a question of convenience and of how it will affect performance.

Ah okay, I get it. Thanks for clarifying


@madskjeldgaard

Let’s start a thread to discuss these issues and determine the best design. One thing is clear to me: it will work if different people become interested in different components of this.

The tree-sitter could be completed in a few weeks, if we’re optimistic.

With good testing tools, we could advance with confidence once we get things correct. Currently, neither parentheses nor operators are accurate. People should not use it yet.

I’ve restored that post; it looks fine to me.

hjh


I think this is an exaggeration, and I would politely disagree.

I don’t know how it works in Emacs because I don’t use that editor but in Neovim it works great and I use it every day with no issues. Yes, it still needs work and I’m very happy to take contributions as I said but I’d politely disagree that it’s unusable. I use it for both syntax highlighting and context aware LSP sort of features like renaming variables, folding/unfolding collections etc. every day with few or no issues. I’ll admit I don’t sit here and read the parsed syntax tree from TS on a daily basis but for the tasks I need it for it works.

The hard thing about making the TS grammar for SC is that there is no official spec for the language anywhere. This TS syntax grammar is, as far as I know, one of the closest things to a spec we have right now. As an example: You mention the parenthesis as a “method binding boundary”. I’m not sure where you got this from, and I’ve never heard of it before, but if it’s a meaningful distinction that could help make things more clear then I’m happy to talk about that, especially if you make a pull request or open an issue. Bear in mind, I’m a composer, not a computer scientist, so distinctions like these may not be perfect in my code.

I also don’t understand your worries about SC being a special case where a TS syntax would not be able to be written — JITLIB and patterns are just SuperCollider code. The special problems arise if people start doing their own mini programming languages in SC with the preprocessor hacks but I don’t see a lot of people doing that, and I’m also not sure this is an issue for TS. There are now TS syntaxes for virtually all programming languages and I don’t see how SC should be that much different. I do like your idea of simplifying the grammar though, and if it’s practically possible then I’d love to see it done. I am not personally sure how though, and I don’t have a lot of time to put into this myself for the moment, but it sounds like you have great ideas and I’m happy to see concrete solutions/PRs/issues as mentioned.

Also for tree-sitter discussions, please let’s continue here: Tree-sitter support for SuperCollider


I watched some videos lately regarding tree-sitter and I still don’t understand what it’s good for.

I’m a composer too! Just like you, homie.

It was ad-hoc terminology, an attempt to explain once more what I was trying to say: syntax (parentheses) should determine method binding scope, which matters especially given SC’s unusual left-to-right evaluation and the challenges of implementing its grammar in tree-sitter.

I did not “get it from” anywhere; I simply mean syntactic disambiguation regarding associativity and precedence rules.

I wrote about it too, and we share the same opinion. The thing is, one aspect is the scope of the core parser in C++, another is how it appears in user code within the IDE.

One option would be to treat it really like a “community grammar and spec”, mirroring the core parser, and to build one layer above it that would de-sugar user code into this spec.

In a way, the C++ parser is the only spec we got (hehe), and the tree-sitter grammar could become a formal, maintained mirror of it.

Example:

Base TS grammar (mirrors C++ parser):

  • Binary operators (all left-to-right)
  • Method calls, parentheses, brackets
  • Basic syntactic structures

De-sugaring/semantic layer:

  • Pattern recognition (Pbind, Pseq behaviors)
  • SynthDef context understanding
  • JITLib proxy transformations
  • (add all examples I wrote about before here)
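A very rough Python sketch of how the two layers could stay separate. Everything here is hypothetical: the `Node` shape, the `CONTEXTS` table and the `annotate` pass are invented for illustration and are not part of tree-sitter or of any existing SC tooling. The base tree knows nothing about SynthDef or patterns; a second pass walks it and tags nodes with the context they sit in.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Purely structural node, as the base grammar would produce it."""
    kind: str                      # e.g. "call", "symbol", "function"
    text: str = ""                 # class or method name, symbol text, ...
    children: List["Node"] = field(default_factory=list)
    context: Optional[str] = None  # filled in later by the semantic layer


# Illustrative lookup only; a real table would need much more care.
CONTEXTS = {"SynthDef": "synthdef", "Pbind": "pattern", "Ndef": "jitlib"}


def annotate(node, inherited=None):
    """Second layer: propagate a context label down from known entry points."""
    ctx = inherited
    if node.kind == "call" and node.text in CONTEXTS:
        ctx = CONTEXTS[node.text]
    node.context = ctx
    for child in node.children:
        annotate(child, ctx)
    return node


# Structural tree for something shaped like SynthDef(\sine, { SinOsc... })
tree = Node("call", "SynthDef", [
    Node("symbol", "\\sine"),
    Node("function", "", [Node("call", "SinOsc")]),
])

annotate(tree)
print(tree.children[1].children[0].context)
```

After the pass, the inner `SinOsc` call carries the label `synthdef` even though the base tree treated it as an ordinary call, so the grammar can stay simple while the context knowledge lives in a layer that can be maintained separately.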