While the new opcode abstractions are a step forward for compiler code clarity, they underscore the absence of a shared specification. Without one, the compiler and interpreter operate on implicit agreements, and any drift between the two surfaces as strange, hard-to-trace errors.
It’s almost as if this step got skipped in the process.
A bytecode specification plays a crucial role in any virtual machine-based language. It is the authoritative contract between the compiler, which emits bytecode, and the interpreter, which runs it.
A well-defined, machine-verifiable specification makes this contract explicit rather than implied. It outlines:
- The list of opcodes
- Their binary representation
- Operand types
- Instruction lengths and invariants
This isn’t just documentation. When shared between the compiler and VM, the spec can drive automatic validation and safe boilerplate generation, catching many classes of errors that would otherwise show up only at runtime.
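To make the idea concrete, here is a minimal sketch of what a machine-readable spec could look like. Everything in it is an assumption made for illustration: the opcode names, byte values, operand types, and the `validate_spec` helper are hypothetical, not the project's actual instruction set.

```python
from dataclasses import dataclass
from enum import Enum


class OperandType(Enum):
    """Hypothetical operand types; each value is the operand size in bytes."""
    U8 = 1
    U16 = 2


@dataclass(frozen=True)
class OpcodeSpec:
    name: str
    byte: int              # binary representation of the opcode
    operands: tuple = ()   # tuple of OperandType, in encoding order

    @property
    def length(self) -> int:
        # One byte for the opcode itself plus the size of each operand.
        return 1 + sum(op.value for op in self.operands)


# Illustrative instruction set (assumed, not taken from the real VM).
SPEC = (
    OpcodeSpec("HALT", 0x00),
    OpcodeSpec("PUSH_CONST", 0x01, (OperandType.U16,)),
    OpcodeSpec("LOAD_LOCAL", 0x02, (OperandType.U8,)),
    OpcodeSpec("ADD", 0x03),
)


def validate_spec(spec):
    """Return a list of human-readable invariant violations (empty if none)."""
    errors = []
    seen_bytes = {}
    seen_names = set()
    for op in spec:
        if not 0 <= op.byte <= 0xFF:
            errors.append(f"{op.name}: byte {op.byte:#x} does not fit in one byte")
        if op.byte in seen_bytes:
            errors.append(f"{op.name}: byte {op.byte:#04x} already used by {seen_bytes[op.byte]}")
        else:
            seen_bytes[op.byte] = op.name
        if op.name in seen_names:
            errors.append(f"{op.name}: duplicate opcode name")
        seen_names.add(op.name)
    return errors
```

Both the compiler and the VM could import a table like `SPEC` and run `validate_spec` in their test suites, so a duplicated byte value fails a build instead of misbehaving at runtime.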
The interpreter still consumes raw bytes directly (like before!). If the new compiler-side opcode abstractions diverge from what the interpreter expects, whether in ordering, layout, or semantics, nothing enforces that the two stay in sync.
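A small sketch of how that drift can happen, and how a cross-check could catch it. The `CompilerOp` enum, the `INTERPRETER_OPS` table, and `find_mismatches` are all hypothetical stand-ins; the real compiler and interpreter are assumed to look different.

```python
from enum import IntEnum


# Hypothetical compiler-side abstraction. SUB was inserted in the middle,
# which shifts ADD from byte 2 to byte 3.
class CompilerOp(IntEnum):
    HALT = 0
    PUSH_CONST = 1
    SUB = 2
    ADD = 3


# Hypothetical interpreter-side table, written by hand against the old layout.
INTERPRETER_OPS = {0: "HALT", 1: "PUSH_CONST", 2: "ADD"}


def find_mismatches(compiler_ops, interpreter_ops):
    """List (name, byte, interpreter_name) for every opcode the two sides disagree on."""
    mismatches = []
    for op in compiler_ops:
        handler = interpreter_ops.get(op.value)  # None if the byte is unknown to the VM
        if handler != op.name:
            mismatches.append((op.name, op.value, handler))
    return mismatches


for name, byte, handler in find_mismatches(CompilerOp, INTERPRETER_OPS):
    print(f"compiler emits {name} as {byte:#04x}, interpreter treats it as {handler}")
```

In this sketch the compiler's `SUB` would silently execute as `ADD`, and the compiler's `ADD` would hit a byte the interpreter does not know at all. Without a check like this, neither side reports a problem at build time.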
What Can Go Wrong Without a Spec?
A LOT
These problems are subtle. Code may compile but misbehave in edge cases, especially with less common opcodes or combinations.
The goal: a single definition, verified across both layers. In other words, a formal, machine-verifiable specification, which itself needs to be tested as well. (Yes, it is not rare for specifications themselves to contain mistakes.)
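One way to test the specification itself is a round-trip check: encode every opcode from the single table, decode it again from the same table, and confirm nothing was lost. The sketch below assumes a hypothetical four-opcode table and little-endian operands; neither is taken from the actual VM.

```python
# Hypothetical single definition: name -> (byte value, operand widths in bytes).
SPEC = {
    "HALT": (0x00, ()),
    "PUSH_CONST": (0x01, (2,)),
    "LOAD_LOCAL": (0x02, (1,)),
    "ADD": (0x03, ()),
}

# Reverse lookup derived from the same table, as the interpreter side would use.
BY_BYTE = {byte: (name, widths) for name, (byte, widths) in SPEC.items()}


def encode(name, *operands):
    """Compiler side: turn one instruction into bytes (little-endian operands assumed)."""
    byte, widths = SPEC[name]
    if len(operands) != len(widths):
        raise ValueError(f"{name} expects {len(widths)} operand(s), got {len(operands)}")
    out = bytes([byte])
    for value, width in zip(operands, widths):
        out += value.to_bytes(width, "little")
    return out


def decode(code):
    """Interpreter side: split raw bytes back into (name, *operands) tuples."""
    instructions = []
    pos = 0
    while pos < len(code):
        name, widths = BY_BYTE[code[pos]]
        pos += 1
        operands = []
        for width in widths:
            if pos + width > len(code):
                raise ValueError(f"truncated operand for {name}")
            operands.append(int.from_bytes(code[pos:pos + width], "little"))
            pos += width
        instructions.append((name, *operands))
    return instructions


def check_round_trip():
    """Fail loudly if the table is ambiguous or any opcode does not survive a round trip."""
    assert len(BY_BYTE) == len(SPEC), "two opcodes share a byte value"
    for name, (_, widths) in SPEC.items():
        operands = tuple((1 << (8 * w)) - 1 for w in widths)  # largest value per operand
        assert decode(encode(name, *operands)) == [(name, *operands)]
```

Running `check_round_trip()` in both the compiler's and the interpreter's test suites gives each layer the same verdict on the same definition, which is the point of having only one.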