Hey @joslloand, thanks for your interest in Hadron.
I’ve definitely been looking at Julia for inspiration; there’s a lot there to like. And I have some thoughts about possible speedups for floating-point code in Hadron, but I’m deferring questions about optimizations like that until after the basic runtime is operational.
I’ll do my best to avoid a long digression into comparing the speeds of language implementations. It’s a complex topic that I’ve been doing some reading on recently, mostly in JavaScript land, where there has been much debate about which JS interpreter is the “fastest.” I want to be very careful when talking about Hadron vs. sclang, and only make assertions about things when they are evidence-based and relatively clear. What is clear from my reading is that it is possible to design code that will run extremely fast, and also code that will run extremely poorly, on any language implementation. The goal is to build an implementation that runs very well for a broad variety of program inputs.
My intuition suggests that for some use cases Hadron may be significantly faster than sclang. I believe this because of some design decisions Hadron has made in that direction. Along with myriad smaller decisions, there are three big design approaches that Hadron takes specifically for the speed of compiled code:
a) Type deduction. Hadron does its best to deduce types for every variable in a block of code. So if you’re using a loop counter, for example, and it starts as an Integer and only interacts with other Integers, Hadron may be able to inline all of the binop calls down to a single (or a few) machine instructions. This has a widely varying impact on compiled code. Because SuperCollider is very much a message-driven language, much type determination has to happen at runtime across method calls. So, for example, with that aforementioned loop counter: if it is inside a function that takes the number of iterations as an argument, we can’t assume that argument will always be an Integer, and so the type ambiguity creeps into the rest of the operations on the loop counter. There may be extensions to the SuperCollider language down the road to allow Hadron to determine types and optimize further, but those are for later. There’s also the realm of speculative (or profile-driven) optimization, but that’s for even further later.
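To make point (a) a bit more concrete, here’s a toy sketch (in Python, with invented names; this is not Hadron’s actual IR or pass) of how a type-deduction pass might propagate known Integer types through a tiny list of operations, inlining the ones it can prove are Integer-only and falling back to dynamic dispatch, with the ambiguity, as noted above, creeping forward into later results:

```python
# Toy type-deduction pass (illustrative only; not Hadron's real IR or names).
# Ops are (dest, op, lhs, rhs) tuples. When both operand types are known to be
# Integer, the op is lowered to a cheap inlined form; otherwise it becomes a
# generic dynamic dispatch, and the destination's type is unknown from then on.

KNOWN_INT = "Integer"
UNKNOWN = "unknown"

def deduce_types(ops, arg_types):
    """Propagate known types through a straight-line op list."""
    types = dict(arg_types)  # seed with argument types (possibly unknown)
    lowered = []
    for dest, op, lhs, rhs in ops:
        lt = types.get(lhs, KNOWN_INT if isinstance(lhs, int) else UNKNOWN)
        rt = types.get(rhs, KNOWN_INT if isinstance(rhs, int) else UNKNOWN)
        if lt == KNOWN_INT and rt == KNOWN_INT:
            lowered.append((dest, "inline_int_" + op, lhs, rhs))
            types[dest] = KNOWN_INT
        else:
            # Ambiguity creeps forward: dest's type is unknown too.
            lowered.append((dest, "dispatch_" + op, lhs, rhs))
            types[dest] = UNKNOWN
    return lowered

# A loop counter with a known Integer start: the increment can be inlined.
print(deduce_types([("i2", "add", "i", 1)], {"i": KNOWN_INT}))
# The same increment when the start came in as an untyped argument: dispatch.
print(deduce_types([("i2", "add", "i", 1)], {"i": UNKNOWN}))
```

The second call is the "iterations as an argument" situation described above: one unknown input is enough to force the generic path for everything downstream of it.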
b) Register-driven vs. stack-based. I think sclang could fairly be described as a stack-driven language, where most operations happen on values stored as Slots on the program stack. Stack-driven interpreters have a variety of advantages, but operate mostly on values that, residing on the stack, therefore reside in memory. Hadron takes pains to keep as many values as possible in CPU registers, saving values out to memory only when sending messages or when it runs out of registers and has to spill. Registers are the fastest form of storage on a computer, so the hope is that by keeping values in registers Hadron won’t face memory bandwidth limitations as often as sclang might.
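Here’s a toy contrast for point (b), again just a Python sketch and not how either implementation actually works: a stack-style evaluation of (a + b) * c, where every intermediate value round-trips through a stack (standing in for memory), next to a register-style version where intermediates live in locals (standing in for CPU registers):

```python
# Toy contrast between stack-style and register-style evaluation of (a + b) * c.
# (Illustrative only; neither matches sclang's or Hadron's actual machinery.)

def stack_eval(a, b, c):
    """Stack machine: every intermediate is pushed to and popped from a stack,
    as in a classic bytecode interpreter, so each touches memory."""
    stack = []
    stack.append(a)           # push a
    stack.append(b)           # push b
    rhs = stack.pop()
    lhs = stack.pop()
    stack.append(lhs + rhs)   # push a + b
    stack.append(c)           # push c
    rhs = stack.pop()
    lhs = stack.pop()
    return lhs * rhs          # (a + b) * c

def register_eval(a, b, c):
    """Register machine: intermediates stay in named locals (standing in for
    registers); nothing round-trips through a stack."""
    r0 = a + b
    r1 = r0 * c
    return r1

print(stack_eval(2, 3, 4), register_eval(2, 3, 4))  # → 20 20
```

Both compute the same result; the difference is how many times intermediate values would have to travel to and from memory.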
c) JIT compiling to machine code. From a certain point of view sclang is also a JIT compiler: it compiles input code down to virtual machine bytecode, which is then run on the VM interpreter. Hadron takes this a step further by generating host machine code. In theory this should be faster from an instruction cache perspective - straight-line Hadron code will not branch, whereas sclang has to go through a jump table after every opcode.
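And a last toy sketch for point (c), purely illustrative (this is not sclang’s real bytecode or dispatch loop): an interpreter that consults a jump table after every opcode, next to a function standing in for the straight-line code a JIT would emit for the same program:

```python
# Toy illustration of per-opcode dispatch overhead (not sclang's real VM).

def interp(bytecode, x):
    """Bytecode interpreter: one table lookup and one indirect call per opcode."""
    handlers = {
        "add1": lambda v: v + 1,
        "dbl":  lambda v: v * 2,
    }
    for op in bytecode:
        x = handlers[op](x)  # jump-table dispatch after every opcode
    return x

def compiled(x):
    """Stand-in for JIT output: the same program as straight-line code,
    with no dispatch branches between operations."""
    x = x + 1
    x = x * 2
    return x

print(interp(["add1", "dbl"], 5), compiled(5))  # → 12 12
```

Same answer both ways; the interpreter just pays an extra branch and lookup per operation, which is the instruction-cache and branching cost described above.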
Now for the bad news: I haven’t done comparisons yet, but Hadron is doing quite a bit more work on input code than sclang does, so intuition suggests that compilation times may be noticeably slower for Hadron than for sclang. This could be a real problem, particularly for live coding use cases, where the performer may care a great deal more that their snippets are executed quickly after being sent to the interpreter than that those snippets are heavily optimized. I want to include some options users can set to disable some optimizations, but there’s a baseline amount of processing required just to lower code from SuperCollider input down to machine code, so disabling optimizations might not be enough to make compilation speeds comparable.
Also, as I stated earlier, it’s actually easy to conceive of code for which points (a), (b), and (c) are either neutral or possibly even bad for performance. For example very branch-heavy SuperCollider code with a lot of method calls is likely to reduce the impact of all three optimization approaches.
The sclang interpreter has been worked on by a lot of smart people for a very long time, and a lot of optimizations have already been applied to it. Furthermore, it’s compiled C++ code, built ahead of time by an optimizing compiler (MSVC, Clang, or GCC) that was itself written by massive teams of compiler experts and that can take all the time it needs to generate optimized code for the sclang runtime. It’s a tall order to try to build something that can beat that, and I have a lot of respect for things like the Second System Effect that might come into play here as well. So I’m hopeful, and I have some ideas for design approaches that may have merit. But we’re a long way from even being able to characterize Hadron’s runtime performance, and even further from any sort of credible claim of it being “faster.”
Sorry for the long screed! I want to ensure expectations are being set correctly here, and also to show appropriate respect to the work of the developers of sclang. They’ve built a great piece of software, no doubt about it.