Hadron Update October 2023

While I'm actively developing Hadron, I will publish monthly(ish) updates on the Hadron website, cross-posted to the scsynth forum. The hope is to drive engagement and volunteerism.

We’ve Moved!

After hearing the news of GitHub's involvement with ICE, I looked into leaving the platform. The recent rebranding of GitHub to center on its Large Language Model product, Copilot, provided a second big push to go.

I’m moving all Hadron-related projects to a self-hosted GitLab instance I set up at Solitary Bees. Please ping me if you’d like me to send you an account invite. We’re still setting up shop regarding continuous integration and other DevOps infrastructure, but I plan to keep Hadron and my other personal projects there for the foreseeable future.

Fuzzing

Fuzzing is a vital part of modern testing strategies, and I’m pleased to report we’ve added fuzzing support to this new Rust-based iteration of Hadron. Please read the instructions at docs/fuzzing.md in the Hadron repository, try them out, and file bug reports for any problems you find.

Parser Updates

A JIT compiler like Hadron needs to parse sclang quickly to allow for rapid feedback to users on input code snippets. Modern language frontends see a lot of invalid and incomplete code due to the rise of fast-feedback development tooling like LSP, popularized by VSCode.

So, not only do parsers need to be fast, but they need to give actionable feedback to users while they are typing. We’d like to “raise the bar” for user feedback with Hadron, providing great error messages with lots of context and helpful hints about what might be going wrong.

I’m writing two parsers: one that parses the input source code in detail, and one that only builds an “outline” of the parsed code and is much more robust in the face of errors. If the detailed parser fails, it can hand off to the outline parser for error recovery and for better suggestions about the input code. Furthermore, the outline parser will allow for lazy compilation of the class library.
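To make the hand-off concrete, here is a minimal sketch of a detailed parser falling back to an outline parser. Everything in it is an assumption for illustration, not Hadron's code: the toy grammar (whitespace-separated integers inside `{ }` blocks) and the names `parse_detailed`, `parse_outline`, `Outcome`, and `BlockSpan` are all hypothetical.

```rust
/// Error from the (hypothetical) detailed parser, with a byte offset.
#[derive(Debug, PartialEq)]
pub struct ParseError {
    pub offset: usize,
    pub message: String,
}

/// Byte span of one top-level block; `end` is exclusive.
#[derive(Debug, PartialEq)]
pub struct BlockSpan {
    pub start: usize,
    pub end: usize,
}

#[derive(Debug)]
pub enum Outcome {
    Parsed(Vec<Vec<i64>>),
    Recovered { error: ParseError, outline: Vec<BlockSpan>, failing_block: Option<usize> },
}

/// Split on whitespace, keeping each token's byte offset.
fn tokens(src: &str) -> Vec<(usize, &str)> {
    let mut out = Vec::new();
    let mut start = None;
    for (i, c) in src.char_indices() {
        if c.is_whitespace() {
            if let Some(s) = start.take() {
                out.push((s, &src[s..i]));
            }
        } else if start.is_none() {
            start = Some(i);
        }
    }
    if let Some(s) = start {
        out.push((s, &src[s..]));
    }
    out
}

/// Detailed parser: strict, stops at the first problem it sees.
pub fn parse_detailed(src: &str) -> Result<Vec<Vec<i64>>, ParseError> {
    let mut blocks = Vec::new();
    let mut current: Option<Vec<i64>> = None;
    for (offset, tok) in tokens(src) {
        match tok {
            "{" if current.is_none() => current = Some(Vec::new()),
            "}" if current.is_some() => blocks.push(current.take().unwrap()),
            _ => match current.as_mut() {
                Some(items) => match tok.parse::<i64>() {
                    Ok(n) => items.push(n),
                    Err(_) => {
                        let message = format!("expected integer, found `{}`", tok);
                        return Err(ParseError { offset, message });
                    }
                },
                None => {
                    let message = format!("unexpected `{}` outside a block", tok);
                    return Err(ParseError { offset, message });
                }
            },
        }
    }
    if current.is_some() {
        return Err(ParseError { offset: src.len(), message: "unclosed block".to_string() });
    }
    Ok(blocks)
}

/// Outline parser: only tracks brace depth, so it never fails.
pub fn parse_outline(src: &str) -> Vec<BlockSpan> {
    let mut spans = Vec::new();
    let mut depth = 0usize;
    let mut start = 0;
    for (i, c) in src.char_indices() {
        match c {
            '{' => {
                if depth == 0 {
                    start = i;
                }
                depth += 1;
            }
            '}' if depth > 0 => {
                depth -= 1;
                if depth == 0 {
                    spans.push(BlockSpan { start, end: i + 1 });
                }
            }
            _ => {}
        }
    }
    if depth > 0 {
        // Unclosed block: let it run to the end of the input.
        spans.push(BlockSpan { start, end: src.len() });
    }
    spans
}

/// Try the detailed parser first; on failure, use the outline to locate the error.
pub fn parse(src: &str) -> Outcome {
    match parse_detailed(src) {
        Ok(blocks) => Outcome::Parsed(blocks),
        Err(error) => {
            let outline = parse_outline(src);
            let failing_block = outline
                .iter()
                .position(|b| b.start <= error.offset && error.offset < b.end);
            Outcome::Recovered { error, outline, failing_block }
        }
    }
}
```

Calling `parse("{ 1 2 } { 3 x } { 4 }")` in this sketch yields a `Recovered` outcome whose outline still lists all three blocks, with the error attributed to the second one. A real outline parser would presumably record class and method boundaries rather than bare braces.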

I copied another speed-centric design choice from the Carbon Language Project by representing the parse tree as a linear array that holds the parse nodes in postorder traversal order. This structure keeps the parser inner loops free of memory allocations. It also makes an iterative parser implementation easier, saving the additional computation and memory that a recursive parser requires.
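As a rough illustration of the idea, the sketch below stores a toy arithmetic parse tree in a single `Vec` in postorder. It is not Hadron's or Carbon's actual layout: the `NodeKind`, `Node`, and `Tree` types are hypothetical, and recording a `subtree_size` on each node is just one assumed way to recover the tree structure from the flat array.

```rust
/// Node kinds for a toy arithmetic grammar (hypothetical, not Hadron's).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NodeKind {
    Number,
    Add,
    Mul,
}

/// One slot in the flat tree. `subtree_size` counts this node plus all of its
/// descendants, so the subtree rooted at index `i` occupies the slots
/// `i + 1 - subtree_size ..= i`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Node {
    pub kind: NodeKind,
    pub value: i64, // only meaningful for Number nodes
    pub subtree_size: usize,
}

#[derive(Debug, Default)]
pub struct Tree {
    pub nodes: Vec<Node>,
}

impl Tree {
    /// Reserving up front means pushes within capacity do not reallocate.
    pub fn with_capacity(capacity: usize) -> Self {
        Tree { nodes: Vec::with_capacity(capacity) }
    }

    pub fn push_leaf(&mut self, value: i64) {
        self.nodes.push(Node { kind: NodeKind::Number, value, subtree_size: 1 });
    }

    /// Append a binary node whose children are the two most recently
    /// completed subtrees. Panics if fewer than two subtrees precede it.
    pub fn push_binary(&mut self, kind: NodeKind) {
        let last = self.nodes.len() - 1;
        let right = self.nodes[last].subtree_size;
        let left = self.nodes[last - right].subtree_size;
        self.nodes.push(Node { kind, value: 0, subtree_size: 1 + left + right });
    }

    /// Indices of the direct children of `index`, in source order.
    pub fn children(&self, index: usize) -> Vec<usize> {
        let start = index + 1 - self.nodes[index].subtree_size;
        let mut out = Vec::new();
        let mut cursor = index;
        while cursor > start {
            let child = cursor - 1;
            out.push(child);
            // Skip over the whole child subtree to reach its left sibling.
            cursor -= self.nodes[child].subtree_size;
        }
        out.reverse();
        out
    }
}

/// Postorder means children always precede their parent, so a single
/// left-to-right pass with an explicit stack evaluates the tree without
/// recursion. Returns None if the array is not a single well-formed tree.
pub fn evaluate(tree: &Tree) -> Option<i64> {
    let mut stack: Vec<i64> = Vec::new();
    for node in &tree.nodes {
        match node.kind {
            NodeKind::Number => stack.push(node.value),
            NodeKind::Add | NodeKind::Mul => {
                let rhs = stack.pop()?;
                let lhs = stack.pop()?;
                let result = if node.kind == NodeKind::Add { lhs + rhs } else { lhs * rhs };
                stack.push(result);
            }
        }
    }
    if stack.len() == 1 { stack.pop() } else { None }
}
```

For `1 + 2 * 3`, a parser would call `push_leaf(1)`, `push_leaf(2)`, `push_leaf(3)`, `push_binary(NodeKind::Mul)`, and `push_binary(NodeKind::Add)` in that order, leaving five contiguous nodes with the root in the last slot.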

Help Wanted

  • Rust: #1 I’ve broken the detailed parser work into a bunch of handle_ functions in lang/src/toolchain/parser/tree. Please let me know if you’d like to tackle a particular part!
  • DevOps: #2 Are you interested in learning how to build a continuous integration pipeline on Google Cloud?
  • Docs: #3 The parser has been moving around quite a bit but should now be settled down enough that we could start to pay down some of the RustDoc debt.
  • Rust: To get early feedback on the parser, particularly the error messages, I’ll be spinning up a Rust-based LSP implementation targeting WASM.
  • TypeScript, VSCode: Given a Rust-based WASM LSP crate, add the trimmings for a full-featured sclang VSCode extension.
  • Anyone: Suggestions and feedback. Is there something on your mind about sclang and Hadron? Let me know!

Very nice idea with the update posts, keep them coming! :slight_smile:

I was not aware that you had switched to Rust. I’m still very new to Rust and trying to get my first project up and running, but it seems to hit a sweet spot between maintainability, abstraction and speed. I’ll be skimming through the code occasionally, and I hope that some day I can contribute something.

I also recently looked at implementing an sclang lexer in Rust and came across a nice-looking library called chumsky (zesterer/chumsky on GitHub), which seems like a very nice approach to lexer/AST construction - but it seems you have already got one working, very nice, congratulations! :slight_smile:

Concerning the CI - why do you want to use Google Cloud for this? I have always appreciated GitLab CI (although GitHub Actions is more convenient nowadays), and once you have a runner up and running it is pretty nice - at least better than Jenkins :smiley: If you want to stay close to the GitHub workflow while avoiding GitHub itself, I think Gitea is also worth a look - it has something similar to GitHub Actions (see “Compared to GitHub Actions” in the Gitea documentation) - but most likely you would still need an environment for your runner in that case.

I looked at a number of parsing libraries in Rust, including chumsky, but ultimately decided to hand-code one for a combination of reasons:

a) Most wouldn’t, or wouldn’t easily, support building a linear/postorder parse tree output, relying instead on traditional dynamically-allocated tree data structures, and
b) I didn’t like their APIs for creating detailed error messages, and
c) Many sacrifice speed for usability, whereas I feel that with a handwritten parser you can have both.

I am using GitLab CI. The GitLab instance at solitarybees.us is a VM running on Google Cloud, and the runner is a separate VM instance. I just have some sysadmin work left to do to get everything set up, or a volunteer could step in and save me the trouble.

This is actually my second attempt at leaving GitHub; the first attempt used Gitea. I like GitLab better. But yes, either Gitea or GitLab will require the same amount of work to rebuild my CI system in the new context. I can do it, but in this second attempt at Hadron I’m trying to be more communicative about what I’m doing and more open in asking for help, in the hope that I can build a sustainable community around the project this time.

This is a beautiful project and very necessary. Thank you for pursuing it. I’d love to help out if I can once things settle down around here.

Sam
