Sclang-lint — a static linter and LSP server for SuperCollider

A linter and LSP for sclang is something a lot of us have wanted for a long time. Here’s one:

The motivation is probably familiar to anyone here. sclang is weakly and dynamically typed, which makes it easy to write fragile code. Since classes have to be compiled before use, the write-test cycle can get long. The error messages, as we all know, are often less than helpful. In my experience a lot of these problems can be caught earlier, before you even evaluate.

sclang-lint does static analysis on the source directly, so there is no running sclang process needed at lint time. The lexer and parser are written in Python, mirroring sclang’s own lexer/parser, so it follows the real grammar rather than approximating it. On top of the AST it runs a set of rules: syntax errors, unused/shadowed/undeclared variables, var-not-at-top, reads before writes, assignment used as an if-condition, unreachable code after ^return, and more. With an optional JSON dump of your class library (generated once with sclang itself) it also checks unknown classes and methods, arity and keyword arguments, getter-vs-setter mistakes, and argument types, each with a “did you mean?” suggestion.
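
For the class-library checks, here is a minimal Python sketch of how a call could be checked for an unknown class or method, arity, and keyword names, with a “did you mean?” suffix. It assumes a hypothetical, heavily trimmed dump shape (`CLASS_LIBRARY`, mapping class names to class-side selectors and their argument names). Neither that shape nor the function names come from sclang-lint. The suggestions use the standard library’s `difflib.get_close_matches`, which may not be how the real tool ranks candidates.

```python
import difflib

# Hypothetical, heavily trimmed stand-in for a class-library JSON dump:
# class name -> class-side selectors -> declared argument names.
CLASS_LIBRARY = {
    "SinOsc": {"class_methods": {"ar": ["freq", "phase", "mul", "add"],
                                 "kr": ["freq", "phase", "mul", "add"]}},
    "Saw": {"class_methods": {"ar": ["freq", "mul", "add"],
                              "kr": ["freq", "mul", "add"]}},
}


def did_you_mean(name, candidates):
    """Return a ' (did you mean ...?)' suffix for the closest candidate, or ''."""
    close = difflib.get_close_matches(name, list(candidates), n=1, cutoff=0.6)
    return f" (did you mean '{close[0]}'?)" if close else ""


def check_call(class_name, selector, n_positional=0, keywords=()):
    """Diagnostics for a call like ClassName.selector(args..., key: value)."""
    cls = CLASS_LIBRARY.get(class_name)
    if cls is None:
        return [f"unknown class '{class_name}'" + did_you_mean(class_name, CLASS_LIBRARY)]
    params = cls["class_methods"].get(selector)
    if params is None:
        return [f"{class_name} does not understand '{selector}'"
                + did_you_mean(selector, cls["class_methods"])]
    diagnostics = []
    # Arity: more positional arguments than declared parameters.
    if n_positional > len(params):
        diagnostics.append(f"{class_name}.{selector} takes at most {len(params)} "
                           f"arguments, got {n_positional}")
    # Keyword arguments must name a declared parameter.
    for kw in keywords:
        if kw not in params:
            diagnostics.append(f"unknown keyword '{kw}' for {class_name}.{selector}"
                               + did_you_mean(kw, params))
    return diagnostics
```

For example, `check_call("SinOcs", "ar")` returns `["unknown class 'SinOcs' (did you mean 'SinOsc'?)"]`.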

It runs as an LSP server over stdio using the standard protocol, so it works in most editors. I use it in nvim alongside scnvim, but it should be straightforward to run it in VS Code or Emacs. SCIDE doesn’t speak LSP. If anyone who knows the SCIDE codebase thinks this would be worth integrating, I’d be glad to help.
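
The stdio transport is just the LSP base protocol. Each JSON-RPC message is preceded by a `Content-Length` header (the body size in bytes) and a blank line. Below is a minimal Python sketch of reading and writing that framing. It is a generic illustration, not sclang-lint’s code. A real server would pass `sys.stdin.buffer` and `sys.stdout.buffer` and dispatch the JSON-RPC methods on top.

```python
import io
import json


def write_message(stream, payload):
    """Frame one JSON-RPC message: header, blank line, UTF-8 JSON body."""
    body = json.dumps(payload).encode("utf-8")
    # Content-Length counts bytes, not characters.
    stream.write(f"Content-Length: {len(body)}\r\n\r\n".encode("ascii"))
    stream.write(body)
    stream.flush()


def read_message(stream):
    """Read one framed message from a binary stream; return None at EOF."""
    length = None
    while True:
        line = stream.readline()
        if not line:
            return None  # stream closed
        line = line.decode("ascii").rstrip("\r\n")
        if line == "":
            break  # blank line terminates the header section
        name, _, value = line.partition(":")
        if name.strip().lower() == "content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("message without Content-Length header")
    return json.loads(stream.read(length).decode("utf-8"))


if __name__ == "__main__":
    # Round-trip through an in-memory buffer instead of real stdio.
    buf = io.BytesIO()
    write_message(buf, {"jsonrpc": "2.0", "id": 1, "method": "initialize", "params": {}})
    buf.seek(0)
    print(read_message(buf)["method"])  # initialize
```

Because the editor only sees this framing on a subprocess’s pipes, the language the server is written in doesn’t matter to it.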

Thanks to Ludvig Elblaus (@ludo) for ideas and feedback along the way. We’ve been running it for several weeks and it’s genuinely changed how I work in SuperCollider, a real quality-of-life improvement.

Feedback, bug reports, and ideas for additional rules all welcome.

This is so very useful and has saved me so much time already in the short period I have had the opportunity to use it. Recompiling and restarting only to get caught on a simple typing error or missing semicolon is so annoying. Thanks again @Luc_Doebereiner for sharing!

Hey Luc - that's such a cool project, thanks for sharing!

I’m currently thinking about implementing a native sclang LSP; see “Add LSP server to sclang” (Issue #7446 in supercollider/supercollider on GitHub). It would be great to have some input from you!

The plan is to do all the document/IO scaffolding within C++ and all the responses within sclang, with some information from the parser sprinkled in.

This is very cool! I’m currently reworking sclang’s parser so it outputs a usable structure and people don’t have to keep rewriting it. I’ve recently done the same for the lexer. If you want to fuzz against it, there is a pretty simple application in the repo under langutils/sc_lexer. That’s worth doing, because the language has some very unusual oddities, e.g., 1233xA 12rapow: 2, and how newlines work in quoted symbols.

Hi Dennis, thanks!

A native LSP is exciting, and there’s stuff you can do from inside the running interpreter that can’t be done from outside, such as working with the live class library. But then, we need to restart to compile anyway. My linter consumes a JSON dump of the class library, which could be regenerated as part of library compilation. Similarly for the parser: reusing the real one means no drift when the syntax changes.

My impression is that the C++ side hasn’t seen as much movement recently. It’s a tough codebase to work on, and the contributor pool is small. The real question is probably what functionality we actually want from an LSP. Once that’s nailed down it’s easier to see where it has to live.

I went external and with Python for the opposite trade-off: it runs without an sclang process, doesn’t affect sclang, and lints files in isolation. I see it more as a basis for something like Flow for JS down the line, with type annotations or inference. I certainly wouldn’t want sclang’s performance affected by a linter or LSP. So the scope is deliberately narrower (pure static diagnostics), but for that it works really well. Anything needing live introspection is clearly better done from inside sclang.

I remember doing something similar for a Jupyter kernel, and just performing a JSON load of the whole class library is already a huge resource hog, because JSON is not a good format for loading large documents.

It’s indeed tough, but Jordan is doing amazing work right now, and there is actually a lot happening at the moment.

I don’t think a Python script can be shipped as an “official” LSP, and it also wouldn’t help with the problem the native LSP wants to tackle (easily pluggable I/O). It would make Python and its dependencies a runtime dependency of SC, which is not really nice.
So the built-in LSP is probably targeting longevity and reliability more than features.

All fair, and the “official LSP” framing might be the right one: a built-in server needs to live inside the project. However, LSP servers are almost always separate processes written in whatever language (pyright in Node, ruff in Rust, gopls in Go, etc.), and the editor doesn’t care; it just spawns a subprocess. So Python only becomes a runtime dependency of SC if you actually bundle it. If users install it separately, like every other LSP server, SC itself stays untouched. The real argument for a built-in server is project ownership and longevity rather than runtime cost, which I think is a perfectly good argument on its own.

Also, it would probably be worthwhile for SCIDE to gain LSP client support regardless of how the built-in server is implemented or shipped. Once SCIDE speaks LSP, users get to pick whatever server fits their workflow.

On the JSON: in my case it’s a one-off 14 MB load at server startup, decoded into Python dicts and kept in memory, not re-parsed per request. Editor startup is the only place you’d notice, and I don’t.
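
The load-once pattern is simple to sketch. The Python below is hypothetical: the `ClassLibrary` name, the JSON shape, and the reverse index are assumptions for illustration, not taken from sclang-lint. The dump is decoded a single time when the server starts, and per-request checks only touch in-memory dictionaries. The reverse index is one example of something worth precomputing at that point, e.g. to ask whether any class at all defines a selector when the receiver’s class is unknown.

```python
import json


class ClassLibrary:
    """In-memory view of a class-library dump, built once at server startup."""

    def __init__(self, classes):
        # classes: {class name: {"superclass": str or None, "methods": [selector, ...]}}
        self.classes = classes
        # Precomputed once: selector -> names of classes that define it directly.
        self.defined_in = {}
        for class_name, info in classes.items():
            for selector in info.get("methods", []):
                self.defined_in.setdefault(selector, set()).add(class_name)

    @classmethod
    def from_path(cls, path):
        """Decode the (possibly large) JSON file exactly once."""
        with open(path, encoding="utf-8") as f:
            return cls(json.load(f))

    @classmethod
    def from_text(cls, text):
        return cls(json.loads(text))

    def has_class(self, name):
        return name in self.classes

    def any_class_defines(self, selector):
        """Cheap per-request check: does any class define this selector?"""
        return selector in self.defined_in
```

A server would build one instance with `ClassLibrary.from_path()` at startup and hand it to every rule, so the JSON is only parsed while the editor is starting up.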

Longevity and reliability as the goal makes a lot of sense, and that’s exactly what an external tool can’t promise.

By the way, I like the way you handle doesNotUnderstand here! It’s simple but works well and doesn’t suggest an incorrect interface, e.g., for Scale.

In my mind, the main motivation for having the language server talk to the runtime is so you can ask a class what messages it responds to. This would involve some new method to report this, and I’m thinking there would be a compile-time warning if the user implements doesNotUnderstand and said method. Not having written an LSP before, I’m unsure what happens if this set changes over time…

Out of curiosity, how do you handle code that hasn’t been executed yet? I see you report when an environment variable is read but not defined in the file. Does ‘defined’ count when it is written, or when it is executed?

I think in the long run we might want to rethink how we deal with the difference, and the .scd format in general, because it’s a bizarre whitespace format that needs to work with broken code… I haven’t had a chance to play with your project, but does it handle this well?

Hi Jordan,

Thanks! The does-not-understand handling is deliberately conservative.

The linter is fully static, with no runtime interaction with sclang at all (which is a design choice I actually prefer). And for the “what messages does this class respond to” question, you don’t really need the runtime. The method set is fixed at compile time, which is exactly why the JSON dump works. My linter already knows which methods every class supports. It just reads them out of the dumped class metadata rather than asking a live interpreter. The runtime would only buy you genuinely dynamic stuff, not the static method set.
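
Because the method set is fixed at compile time, a dump that records each class’s superclass and its own selectors is enough to reconstruct everything an instance responds to statically. The Python sketch below walks the superclass chain. As one guess at what “conservative” could mean (not a description of sclang-lint’s actual rule), it stays silent when a class below Object overrides doesNotUnderstand, since such a class may handle unknown selectors at runtime. The class names are hypothetical.

```python
KNOWN, DYNAMIC, UNKNOWN = "known", "dynamic", "unknown"

# Hypothetical dump fragment: superclass plus the selectors each class defines itself.
DUMP = {
    "Object": {"superclass": None, "methods": ["postln", "doesNotUnderstand"]},
    "MyCollection": {"superclass": "Object", "methods": ["do", "collect"]},
    "MyProxy": {"superclass": "Object", "methods": ["doesNotUnderstand"]},
    "MySubProxy": {"superclass": "MyProxy", "methods": ["source"]},
}


def ancestry(class_name, dump):
    """Yield the class and then each superclass up to the root."""
    while class_name is not None:
        yield class_name
        class_name = dump[class_name]["superclass"]


def responds_to(class_name, dump):
    """Every selector an instance understands statically, inherited ones included."""
    return {sel for cls in ancestry(class_name, dump) for sel in dump[cls]["methods"]}


def overrides_dnu(class_name, dump):
    """True if some class below Object redefines doesNotUnderstand.

    Object's default implementation just reports the error, so it does not count.
    """
    return any("doesNotUnderstand" in dump[cls]["methods"]
               for cls in ancestry(class_name, dump) if cls != "Object")


def classify_send(class_name, selector, dump=DUMP):
    if selector in responds_to(class_name, dump):
        return KNOWN
    if overrides_dnu(class_name, dump):
        return DYNAMIC  # might be answered at runtime: don't report it
    return UNKNOWN
```

This only sees selectors present in the dump. Anything a doesNotUnderstand handler answers at runtime stays invisible, which is exactly the trade-off between a static dump and asking the live interpreter.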

On the environment variable question (the “globals” with ~): “defined” here means written somewhere in the file (an assignment appears in the AST), not executed. There’s no execution model at all in the linter. I don’t track evaluation order or interpreter state; I just scan the file for whether ~foo is ever assigned. That obviously doesn’t cover every case (cross-file, things you typed into the interpreter earlier, etc.). But reading an env var that’s never assigned anywhere in the same file is such a common mistake that the per-file check earns its keep, I think. It’s one of the most frequent errors I see my students (and myself) make, and the variable is virtually always assumed to live in the same file.
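
The per-file rule is easy to approximate. The Python sketch below is a deliberately crude, regex-based stand-in for the AST scan described above; the function name and message text are hypothetical. It blanks out comments and double-quoted strings and collects every ~name that is assigned with a plain `=`. It then reports uses of names that are never assigned anywhere in the file, regardless of order. It does not handle multiple assignment (`#~a, ~b = ...`), nested block comments, or character literals such as `$"`.

```python
import re

# Line comments, (non-nested) block comments, and double-quoted strings.
_NOISE = re.compile(r'//[^\n]*|/\*.*?\*/|"(?:\\.|[^"\\])*"', re.DOTALL)
_ENV_VAR = re.compile(r"~([A-Za-z]\w*)")
# "~name =" but not "~name ==".
_ENV_ASSIGN = re.compile(r"~([A-Za-z]\w*)\s*=(?!=)")


def _blank_noise(source):
    """Replace comments/strings with spaces, keeping newlines so line numbers survive."""
    return _NOISE.sub(lambda m: re.sub(r"[^\n]", " ", m.group()), source)


def unassigned_env_reads(source):
    """Return (line, message) for each ~var use whose name is never assigned in the file."""
    code = _blank_noise(source)
    assigned = {m.group(1) for m in _ENV_ASSIGN.finditer(code)}
    diagnostics = []
    for m in _ENV_VAR.finditer(code):
        name = m.group(1)
        if name not in assigned:
            line = code.count("\n", 0, m.start()) + 1
            diagnostics.append((line, f"~{name} is read but never assigned in this file"))
    return diagnostics
```

Order is ignored on purpose. In an .scd file, blocks can be evaluated in any order, so a read that appears above its assignment is not reported.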

On broken code: the parser does error recovery. It synchronizes at statement and definition boundaries, so a half-typed or syntactically broken buffer still produces useful diagnostics for the parts it can parse, rather than bailing on the whole file. That’s the case that matters most for an LSP, since the buffer is broken on basically every keystroke. Yes, the .scd whitespace weirdness you mention is real; I’m really looking forward to your rework!
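
Synchronizing at statement boundaries is classic panic-mode recovery. Below is a toy Python sketch of it. The grammar is a made-up stand-in, far simpler than sclang’s, and none of this is sclang-lint’s code. On a syntax error it records a diagnostic, skips tokens up to the next `;`, and resumes parsing, so later statements still get checked.

```python
import re

# Numbers, identifiers, or any other single non-space character.
_TOKEN = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(\S))")


class ParseError(Exception):
    pass


def tokenize(src):
    tokens = []
    for num, ident, op in _TOKEN.findall(src):
        if num:
            tokens.append(("num", num))
        elif ident:
            tokens.append(("ident", ident))
        else:
            tokens.append(("op", op))
    return tokens


def parse_program(tokens):
    """Toy grammar: stmt := operand (('=' | '+' | '*') operand)* ';'."""
    pos = 0
    statements, errors = [], []

    def peek():
        return tokens[pos] if pos < len(tokens) else ("eof", "")

    def operand():
        nonlocal pos
        kind, text = peek()
        if kind not in ("num", "ident"):
            raise ParseError(f"expected a number or name, got {text!r}")
        pos += 1
        return text

    def statement():
        nonlocal pos
        parts = [operand()]
        while peek() in (("op", "="), ("op", "+"), ("op", "*")):
            parts.append(peek()[1])
            pos += 1
            parts.append(operand())
        if peek() != ("op", ";"):
            raise ParseError(f"expected ';', got {peek()[1]!r}")
        pos += 1
        return " ".join(parts)

    while pos < len(tokens):
        start = pos
        try:
            statements.append(statement())
        except ParseError as err:
            errors.append((start, str(err)))
            # Panic mode: skip to the statement boundary and step past it.
            while pos < len(tokens) and tokens[pos] != ("op", ";"):
                pos += 1
            pos += 1
    return statements, errors
```

For `x = 1 + 2; y = * 3; z = x * y;`, the middle statement produces one error and both of the other statements still parse.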

I’m also planning a config file so rules can be toggled on/off. I think people have different tastes, and some warnings are surely matters of preference.
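
Since the config file is still only planned, its format is open. Here is a purely hypothetical Python sketch; the JSON layout, rule IDs, and function names are invented for illustration. Per-rule toggles are applied as a filter over the diagnostics the rules produce. Unknown rule names are rejected, so a typo in the config doesn’t silently do nothing.

```python
import json
from dataclasses import dataclass


@dataclass
class Diagnostic:
    rule: str     # hypothetical rule ID, e.g. "unused-variable"
    line: int
    message: str


# Hypothetical defaults: every rule on unless the user turns it off.
DEFAULT_RULES = {
    "unused-variable": True,
    "shadowed-variable": True,
    "var-not-at-top": True,
    "unassigned-env-var": True,
}


def load_rule_config(text):
    """Merge a JSON config such as {"rules": {"shadowed-variable": false}} over the defaults."""
    enabled = dict(DEFAULT_RULES)
    overrides = json.loads(text).get("rules", {}) if text.strip() else {}
    for rule, on in overrides.items():
        if rule not in enabled:
            raise ValueError(f"unknown rule in config: {rule!r}")
        enabled[rule] = bool(on)
    return enabled


def filter_diagnostics(diagnostics, enabled):
    """Drop diagnostics whose rule is switched off."""
    return [d for d in diagnostics if enabled.get(d.rule, True)]
```

Filtering after the rules run keeps each rule unaware of the config. The cost is computing diagnostics that are then thrown away, which is usually negligible for a per-file linter.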

Jordan is referring to this case:

e = (myCustomMethod: { rrand(10, 20) });

At this point, yes, it’s true that the class Event doesn’t understand myCustomMethod, but the event instance e does respond to it with a number, rather than an error. Currently there isn’t a consistent interface to find out what messages an object responds to, in contrast to asking which messages are defined in a class.

Edit: I missed it at first, but now I see that, of course, you’re aware of this. So it’s a value judgment: what’s gained in exchange for the performance cost. I use soft methods a lot, so the runtime would benefit me greatly. Others may not care.

hjh

Actually, that specific case probably isn’t worth it, as it will only realistically work with environment variables. I was thinking about doesNotUnderstand on classes, i.e., *doesNotUnderstand. A similar thing could be applied to synth names or patterns. I think Scot’s on does something like this?

Ah, so that does miss a lot of ‘doesNotUnderstand’ methods that are only created during initClass. Still, I think it’s a reasonable compromise!

I’m no Bison expert, but I think this can actually be done in the official grammar, which might help with the whitespace stuff in .scd files.

Yeah, it’s quite annoying in a way that it isn’t actually an error.

Think of a = (): because Event never throws a doesNotUnderstand error, technically it responds to every single message. So what you’re doing there isn’t technically correct, but it’s still really useful… I have gripes with the Event/Environment classes.