Tree-sitter support for SuperCollider

This looks like just the ticket. Good work. Excited to see the end result. I think this deserves a separate thread at some point, whenever you feel like it.

Once the script is finished I can try and get it running in neovim

Thanks! I’ll make a separate thread shortly.

Aaak, Treesitter is crashing nvim on certain files (see the issue here: “How to find a segfault in a grammar? (SuperCollider) (select_smaller_error symbol:ERROR, over_symbol:ERROR)”, Discussion #1723, tree-sitter/tree-sitter on GitHub).

Anyone got a workaround for now?

The only workaround I know is to disable ts. It’s extremely difficult for me to fix this bug solo, so if anyone can help, please do. I’ve found that it doesn’t happen if the function passed to .collect etc. has parentheses around it. See here

I’ve got cycles today as I’m traveling.
Getting my head around the grammar queries and I’m able to pick things out.

I can try parsing this with gdb and see if I can find the culprit; if it’s a segfault I’d gather it’s looking for a paired piece of data that doesn’t exist.

That would be awesome John, thanks. I did try it in GDB but I didn’t have treesitter itself installed with debug symbols, so it was pretty uninformative haha.

Yeah, I see the stacktrace - I was surprised where it was hitting - but I need to rebuild with debugging symbols.
I can reliably get it to crash on that code, though, which is good. :slight_smile:

I was working on a netbook from 2013 today, though, so rebuilding treesitter wasn’t in the cards; I’m back at my home computer tomorrow so I’ll be able to check.

I also got the queries and string replacing working for the code formatter - I’ve got one more thing to do to confirm the proof of concept and then I’ll start a thread with the repo and look for pull requests.

@madskjeldgaard

Stack trace details:

lib/src/parser.c

    1541      // If there were no parse actions for the current lookahead token, then
    1542      // it is not valid in this state. If the current lookahead token is a
    1543      // keyword, then switch to treating it as the normal word token if that
    1544      // token is valid in this state.
    1545      if (
 >  1546        ts_subtree_is_keyword(lookahead) &&
    1547        ts_subtree_symbol(lookahead) != self->language->keyword_capture_token
    1548      ) {
    1549        ts_language_table_entry(self->language, state, self->language->keyword_capture_token, &table_entry);
    1550        if (table_entry.action_count > 0) {
    1551          LOG(
    1552            "switch from_keyword:%s, to_word_token:%s",
    1553            TREE_NAME(lookahead),
    1554            SYM_NAME(self->language->keyword_capture_token)
    1555          );
    1556
    1557          MutableSubtree mutable_lookahead = ts_subtree_make_mut(&self->tree_pool, lookahead);
    1558          ts_subtree_set_symbol(&mutable_lookahead, self->language->keyword_capture_token, self->language);
    1559          lookahead = ts_subtree_from_mut(mutable_lookahead);
    1560          continue;
    1561        }
    1562      }

Some GDB output:

(gdb) bt
#0  ts_parser__advance (allow_node_reuse=<optimized out>, version=<optimized out>, self=0x5555560a73f0) at lib/src/parser.c:1546
#1  ts_parser_parse (self=0x5555560a73f0, old_tree=<optimized out>, input=...) at lib/src/parser.c:1904
#2  0x000055555567c9ef in tree_sitter_cli::parse::parse_file_at_path ()
#3  0x00005555555d365c in tree_sitter::run ()
#4  0x00005555555cefba in tree_sitter::main ()
#5  0x00005555555da0e3 in std::sys_common::backtrace::__rust_begin_short_backtrace ()
#6  0x00005555555daef9 in std::rt::lang_start::_$u7b$$u7b$closure$u7d$$u7d$::h11ad262672dee719 ()
#7  0x0000555555894a51 in std::rt::lang_start_internal ()
#8  0x00005555555d6322 in main ()
(gdb) f 1
#1  ts_parser_parse (self=0x5555560a73f0, old_tree=<optimized out>, input=...) at lib/src/parser.c:1904
(gdb) p input
$1 = {payload = 0x7fffffffb760, read = 0x55555563ae60 <_ZN11tree_sitter6Parser10parse_with4read17hd615fe985a8ffc5eE.llvm.5132543314823820277>, encoding = TSInputEncodingUTF8}
(gdb) info locals
allow_node_reuse = <optimized out>
version = 1
min_error_cost = <optimized out>
position = <optimized out>
last_position = 321
version_count = 11
result = <optimized out>
(gdb)

So my guess is that there needs to be an additional bit of grammar around that “if” section: it’s trying to match the parenthesis for it and can’t find it, but it correctly thinks that it’s valid code, so it ends up choking.

Does that help at all?

Hey John did you happen to investigate this any further?

Would love a PR if you would be interested in trying your hand at fixing it.

Hey! On my list. I’ve been buried with school and work - have a midterm this week and have had a few major releases the last few weeks. I’ll get back on this within the next week.

Ah okay, no stress John. Take it easy - thanks!!

Back on it! Summer through September were wild.
I’ll take a look at this today.

Excellent, John. Thank you!

I set this up in emacs (linux) over the weekend. I’ll add some notes here, in case it’s helpful to anyone.

Basic steps:

  1. clone the tree-sitter-supercollider repo and install/configure tree-sitter CLI
  2. compile a shared library from the src files (parser.c & scanner.c)
  3. register the grammar with the emacs tree-sitter package

Pre-requisites:

  1. node
  2. a C compiler
  3. emacs with scel and tree-sitter installed

Initial tree-sitter set up

This seems straightforward, but there are a couple of typical little traps that can waste your time, so I’ll go into detail here.

Cloning the repo and running npm install from inside the tree-sitter-supercollider directory will install the tree-sitter-cli package.

Next, configure the tree-sitter CLI with tree-sitter init-config.
This will create a directory at ~/.tree-sitter/bin
And a config file at ~/.tree-sitter/config.json

Open the ~/.tree-sitter/config.json file and edit the "parser-directories" array, so that the CLI can find the tree-sitter-supercollider directory.

Run tree-sitter generate from within the tree-sitter-supercollider directory.

tree-sitter dump-languages should now list the supercollider parser.

Try out the highlight & parse commands on the example.scd file in the directory
tree-sitter parse example.scd
tree-sitter highlight example.scd
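
A quick way to see whether the parse went cleanly is to count the ERROR nodes in the output. This is only a rough sketch: count_error_nodes is a made-up helper name, and it simply counts occurrences of "(ERROR " in whatever tree-sitter parse prints, so the summary line printed for a failed file is counted too.

```shell
# Sketch: count ERROR nodes in `tree-sitter parse` output read from stdin.
# count_error_nodes is a hypothetical helper, not part of the tree-sitter CLI.
count_error_nodes() {
    # grep exits non-zero when nothing matches, so guard it to keep the count at 0.
    { grep -o '(ERROR ' || true; } | wc -l | tr -d ' '
}

# Usage, assuming tree-sitter is on PATH and example.scd is in the current directory.
if command -v tree-sitter >/dev/null 2>&1 && [ -f example.scd ]; then
    n=$(tree-sitter parse example.scd | count_error_nodes)
    echo "example.scd: $n ERROR node(s)"
fi
```

A count of 0 suggests the grammar handled the whole file; anything else is worth looking at in the parse tree.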

Info for systems with an existing tree-sitter binary (explicit install or a dep for other packages)

The emacs tree-sitter package requires grammars that were compiled with a version of tree-sitter before 0.20. The tree-sitter-supercollider project does specify a compatible version (^0.19.4) in the package.json.

But if there’s a tree-sitter binary already installed, there’s a chance it’s more recent (>= 0.20) and is also taking precedence over the version that’s installed locally in the project.

which tree-sitter will show which tree-sitter binary is being used.
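
To also check the version of whichever binary wins, something like the sketch below might help. It assumes that tree-sitter --version prints a string containing a MAJOR.MINOR.PATCH number (for example "tree-sitter 0.19.4 (...)"); ts_version_is_pre_020 is a made-up helper name.

```shell
# Sketch: succeed only if a tree-sitter version string is older than 0.20.
# Assumes the string contains something like "0.19.4"; the helper name is hypothetical.
ts_version_is_pre_020() {
    # Pull the first MAJOR.MINOR pair out of the string.
    ver=$(printf '%s\n' "$1" | sed -n 's/^[^0-9]*\([0-9][0-9]*\)\.\([0-9][0-9]*\).*/\1 \2/p')
    [ -n "$ver" ] || return 2          # no version number found
    major=${ver% *}
    minor=${ver#* }
    [ "$major" -eq 0 ] && [ "$minor" -lt 20 ]
}

# Usage: report which binary is on PATH and whether it looks compatible.
if command -v tree-sitter >/dev/null 2>&1; then
    echo "using: $(command -v tree-sitter)"
    if ts_version_is_pre_020 "$(tree-sitter --version)"; then
        echo "ok: older than 0.20"
    else
        echo "warning: 0.20 or newer (or unrecognised version string)"
    fi
fi
```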

I just went with installing a specific version of the npm package, globally as suggested here:

A note on the tree-sitter init-config command.

The C binary package will create a directory in ~/.config/tree-sitter, the npm package will use ~/.tree-sitter and the rust package probably does something different also with rustup etc.

If tree-sitter parse isn’t working or tree-sitter dump-languages isn’t showing the supercollider parser, then that’s a sign that the config.json file doesn’t have the correct path in "parser-directories" array.

If the parser-directories array looks good but the parser is still not showing up, that’s a sign that the wrong config.json file was edited. For example, I was editing the ~/.config/tree-sitter/config.json which belongs to the binary package, but using the npm cli package which looks in ~/.tree-sitter/config.json.
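
To avoid editing the wrong file, it can help to list which of the two config files actually exist and whether each one mentions the directory you added. A rough sketch follows; the helper names are made up, only the two locations mentioned above are checked, and the match is a plain text search rather than real JSON parsing.

```shell
# Sketch: list the tree-sitter config files that exist under a home directory.
# Only the two locations named above are checked; other installs may differ.
ts_find_configs() {
    home_dir=$1
    for cfg in "$home_dir/.tree-sitter/config.json" \
               "$home_dir/.config/tree-sitter/config.json"; do
        [ -f "$cfg" ] && printf '%s\n' "$cfg"
    done
    return 0
}

# Sketch: naive text check that a config file mentions a directory string.
# $1 = config file, $2 = the string you put in "parser-directories".
ts_config_mentions() {
    grep -qF -- "$2" "$1"
}

# Usage (the directory below is a placeholder, replace it with your own entry).
ts_find_configs "$HOME" | while IFS= read -r cfg; do
    if ts_config_mentions "$cfg" "PATH/TO/your/parser-directory"; then
        echo "$cfg: lists the directory"
    else
        echo "$cfg: does NOT list the directory"
    fi
done
```

If both files show up, the one that does not list the directory is a likely suspect.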

Create a shared library (.so) file

The emacs tree-sitter package needs the tree-sitter grammar to be compiled into a shared library. I’m not fully across the inner workings of static/dynamic linking and shared libs in C, or qualified to be offering up commands. But here is what worked for me after some searching/reading on the topic.

From within the tree-sitter-supercollider/src directory, run:
gcc -shared -o supercollider.so -fPIC scanner.c parser.c

There should now be a supercollider.so file, in the src directory.
Move it to ~/.tree-sitter/bin .

My reference for the above command is here:
http://www.microhowto.info/howto/build_a_shared_library_using_gcc.html
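
The compile and move steps can be wrapped into one small function. This is a sketch under a few assumptions: build_grammar_lib is a made-up name, the CC override is my own addition, and so is the -I flag (it just points the compiler at the src directory in case parser.c cannot find its tree_sitter/parser.h header otherwise).

```shell
# Sketch: compile the grammar into a shared library directly in the target directory.
# build_grammar_lib is a hypothetical helper; CC defaults to gcc as in the command above.
build_grammar_lib() {
    src_dir=$1      # e.g. PATH/TO/tree-sitter-supercollider/src
    dest_dir=$2     # e.g. "$HOME/.tree-sitter/bin"
    [ -f "$src_dir/parser.c" ] && [ -f "$src_dir/scanner.c" ] || {
        echo "parser.c/scanner.c not found in $src_dir (run tree-sitter generate first?)" >&2
        return 1
    }
    mkdir -p "$dest_dir" || return 1
    # -I"$src_dir" is an addition to the original command (see the note above).
    ${CC:-gcc} -shared -fPIC -I"$src_dir" \
        -o "$dest_dir/supercollider.so" \
        "$src_dir/scanner.c" "$src_dir/parser.c"
}

# Usage (paths are placeholders):
#   build_grammar_lib PATH/TO/tree-sitter-supercollider/src "$HOME/.tree-sitter/bin"
```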

Register the supercollider grammar with the emacs tree-sitter package

This step assumes that the scel and tree-sitter packages are already installed and configured in emacs. I’m not using a config framework like Doom/Spacemacs. So there might be some slight differences.

Evaluating the following line in a buffer is probably a good start. It will confirm where the emacs tree-sitter package will look for compiled grammars on the system. This should be precisely where the supercollider.so file was moved to, in the previous step.

(tree-sitter-cli-bin-directory)

Adding the following line to your config will let the tree-sitter package know to look for a grammar called ‘supercollider’ and register it with sclang-mode.

(add-to-list 'tree-sitter-major-mode-language-alist '(sclang-mode . supercollider))

If you run into any issues, checking the value of that list is a good start.

Either via ‘Describe Symbol’ C-h o tree-sitter-major-mode-language-alist or with M-x customize-option [ Return ] tree-sitter-major-mode-language-alist, which will display a GUI for the list. A list entry can be easily deleted/edited from there.

The next step is to pull in the highlights.scm file that’s located at tree-sitter-supercollider/queries/highlights.scm . The tree-sitter package will use it for syntax highlighting. Here it is alongside the line above.

(add-to-list 'tree-sitter-major-mode-language-alist '(sclang-mode . supercollider))

(defun import-supercollider-highlights ()
  (with-temp-buffer
    ;; add the actual path on your system below
    (insert-file-contents "PATH/TO/tree-sitter-supercollider/queries/highlights.scm")
    (goto-char (point-max))
    (insert "\n")
    (buffer-string)))

(defun set-supercollider-default-patterns ()
  (setq tree-sitter-hl-default-patterns (import-supercollider-highlights)))

(add-hook 'sclang-mode-hook 'tree-sitter-hl-mode)
(add-hook 'sclang-mode-hook 'set-supercollider-default-patterns)

This is the exact block of code that is working for me currently. It’s a bit naive but easy to get the gist of. You could chuck the multiple hooks into a function, but this suits me for now while I’m establishing some other sclang-mode related functionality in emacs.

Here is an example of what looks like a pretty robust approach to a more detailed config.

link to the emacs tree-sitter docs

The finer grained syntax highlighting makes a huge difference and I’m looking forward to adding some additional functionality now that I have tree-sitter set up.

Thanks very much @madskjeldgaard for all your hard work on this. I’m definitely up for helping out with the project and am slowly starting to get my head around how it works.

This is absolutely excellent. Thanks for these notes!

Excellent notes!
I’ve been running into a lot of issues with the treesitter crashing when writing classes. Not sure if it’s the neovim implementation, or the grammar, but I will report back as I work my way through the issues.

Hi, I haven’t encountered that issue yet. But a simple test looked ok here. If you let me know your steps to repro a crash, I can try it with treesitter in emacs.

It may be because I haven’t compiled the treesitter object in a while, but if memory serves, it was crashing when I was writing even simple objects. It mostly crashed on a “{” for a class with an argument list after it, and only when writing new code. (That is, if I had a class with complete functionality written, treesitter was happy, but when I was in the middle of writing, it would crash.)

From what I can tell, normal SC code is fine, and the exception issues are happening mostly with class code.

I’ll find a few spare cycles, rebuild the treesitter and then see if I can recreate.

You are right about this, John. Nothing has really changed in the grammar, but something in either Tree-sitter or nvim-treesitter seems to have changed, so that when it fails like this it does not crash the editor/treesitter itself very often. I have a vague suspicion that it is related to tree-sitter version 0.20.7 (Release v0.20.7 · tree-sitter/tree-sitter · GitHub), which came out in September, but as they still don’t publish changelogs, it’s not super easy to figure out what might have changed in this regard.

This used to cause a reproducible crash:

TestKometFaustFiles : KometTest{
    // Test if faust files compile
    test_faustFilesCompile{
        this.assert(
            KometPath.faustFilesPath.files.collect{|fff|
                if(
                "faust % > /dev/null".format(fff.fullPath).systemCmd == 0
            }.every{|bool| bool }
        )
    }
}

It used to crash neovim, but now you can actually see in tree-sitter-playground where it gets confused (and it doesn’t crash):

And running tree-sitter parse on a file that contains this (again, this used to crash):

(ERROR [0, 0] - [11, 0]
  (class [0, 0] - [0, 19])
  (class [0, 22] - [0, 31])
  (line_comment [1, 4] - [1, 34])
  (identifier [2, 4] - [2, 26])
  (variable [3, 8] - [3, 12]
    (local_var [3, 8] - [3, 12]
      name: (identifier [3, 8] - [3, 12])))
  (identifier [3, 13] - [3, 19])
  (class [4, 12] - [4, 21])
  (method_call [4, 21] - [4, 36]
    name: (method_name [4, 22] - [4, 36]))
  (method_call [4, 36] - [4, 42]
    name: (method_name [4, 37] - [4, 42]))
  (identifier [4, 43] - [4, 50])
  (parameter_list [4, 51] - [4, 56]
    (argument [4, 52] - [4, 55]
      name: (identifier [4, 52] - [4, 55])))
  (function_call [6, 16] - [7, 33]
    (receiver [6, 16] - [6, 73]
      (binary_expression [6, 16] - [6, 73]
        left: (function_call [6, 16] - [6, 68]
          (receiver [6, 16] - [6, 37]
            (literal [6, 16] - [6, 37]
              (string [6, 16] - [6, 37])))
          (method_call [6, 37] - [6, 58]
            name: (method_name [6, 38] - [6, 44])
            (parameter_call_list [6, 45] - [6, 57]
              (argument_calls [6, 45] - [6, 57]
                (unnamed_argument [6, 45] - [6, 57]
                  (function_call [6, 45] - [6, 57]
                    (receiver [6, 45] - [6, 48]
                      (variable [6, 45] - [6, 48]
                        (local_var [6, 45] - [6, 48]
                          name: (identifier [6, 45] - [6, 48]))))
                    (method_call [6, 48] - [6, 57]
                      name: (method_name [6, 49] - [6, 57])))))))
          (method_call [6, 58] - [6, 68]
            name: (method_name [6, 59] - [6, 68])))
        right: (literal [6, 72] - [6, 73]
          (number [6, 72] - [6, 73]
            (integer [6, 72] - [6, 73])))))
    (ERROR [7, 12] - [7, 13])
    (method_call [7, 13] - [7, 33]
      name: (method_name [7, 14] - [7, 19])
      (parameter_list [7, 20] - [7, 26]
        (argument [7, 21] - [7, 25]
          name: (identifier [7, 21] - [7, 25])))
      (variable [7, 27] - [7, 31]
        (local_var [7, 27] - [7, 31]
          name: (identifier [7, 27] - [7, 31]))))))
fuck.scd        1 ms    (ERROR [0, 0] - [11, 0])

This leads me to conclude that the problem is still there (and seems related to chained methods without parentheses, e.g. collect{...}.every{...}) but no longer causes segfaults (hurray!).
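
If chained brace calls really are the trigger, a crude way to find candidate spots in your own files is to grep for a closing brace followed by .name{ on the same line. This is only a sketch: find_chained_brace_calls is a made-up name, the regex is a guess at the pattern, and it will miss chains split across lines.

```shell
# Sketch: print lines where a closing brace is directly followed by `.name{`,
# e.g. `}.every{`. Calls written with parentheses, `}).every({`, do not match.
# find_chained_brace_calls is a hypothetical helper name.
find_chained_brace_calls() {
    grep -nE '\}[[:space:]]*\.[[:alpha:]_][[:alnum:]_]*[[:space:]]*\{' "$@"
}

# Usage (file name is a placeholder):
#   find_chained_brace_calls MyClass.sc
```

Wrapping the functions on the reported lines in parentheses is the workaround mentioned earlier in the thread.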

Oh fantastic! I’ll rebuild.