Why predeclared variables?

Does anyone know why sclang requires you to declare variables at the top of functions instead of inline? I’d be interested to know if there’s something about the interpreter or whatever that means it’s necessary. I’m not an expert on programming languages or interpreters, so I’d have a tough time reading the source code, but I’m just curious about this, as it’s not necessary in C any more.

I think the technical term is “hoisted” - javascript automatically hoists var declarations, meaning it treats them as if they were placed at the beginning of a function. This makes for some weird and awful things - you can “var foo” declare a variable after you do foo = 10, for example :face_vomiting:.
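A minimal sketch of that JS behavior (the function name is made up for the demo):

```javascript
// `var` declarations are hoisted to the top of the enclosing function,
// so assigning before the declaration line still works.
function hoisted() {
  foo = 10;   // assigns the (hoisted) local `foo`, not a global
  var foo;    // treated as if it were declared at the top of hoisted()
  return foo;
}
console.log(hoisted()); // 10
```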

SuperCollider does not do this - I don’t know of a specific reason. I’ve tried to add this a couple times before, but it seems extra tricky due to some particulars of the language.

If anyone wants to try to be a hero and tackle this feature - or just wants to learn a little bit about how sclang is parsed, or how parsers work in general… The sclang parser lives in the lang11d file here. The make_parser.sh script uses the lang11d to generate actual C++ code for the parser, which will turn source code into AST nodes that are then compiled. Adding “hoisted var declarations” support would most likely be a parser change, plus possibly a little C++ to keep track of variable declarations collected during the course of a function. If you change the parser and run make_parser.sh it will generate the corresponding .h and .cpp files, or notify you if for some reason you made a change that isn’t logically consistent (this is usually what happens). Bison parser syntax is pretty well documented, and it’s not so hard to play around with the lang11d to make minor modifications and see what you get.

In a real-time interpreted language, declaring a var requires memory, and memory allocation (setting aside space) is costly. Bunching the declarations together means the interpreter can figure out how much memory a function will take all at once.
Also - this was put in place in the late '90s, early 2000s when this was a pretty common practice. It may be possible to change this now, but it would likely be a lot of work.

Heh, the tantalizing thing is, it’s NOT a lot of work - I would guess it’s on the order of a ~10-20 line change to the parser and some PyrParseNode code. And, it wouldn’t even have a big testing impact, since it only affects one narrow part of the parser and would have no effect on functionality once something is compiled. It’s just that - the change required takes a deep-ish understanding of the parser and the language, which few of us have. It’s one of those “two weeks to research, ten minutes to make the change” sort of things.


Hmm, mighty interesting. As I say, I have only a passing knowledge of how interpreters work (e.g. made a toy lisp once, but not much further) but this is quite an intriguing challenge. Might try to understand it a bit if the urge strikes. Thanks for your post.

Wow, yeah, from a brief read this looks like nothing I’ve seen before. Seems I’d have to learn quite a lot about Bison to understand it. Oh well, could be a fun weekend project to learn about that.

if you’d like to familiarize yourself with this part of sclang but want to try something considerably easier, this issue is perfect:

incidentally it’s the oldest open issue in the SC github repository. this is okay:

Foo { }
+ SinOsc { }

this is invalid syntax:

+ SinOsc { }
Foo { }

and i doubt there’s any good reason for that other than laziness when the parser was being written.

if you or anyone else reading wants to take this on, it would be super appreciated!!


But maybe the worst thing about parsing in SuperCollider is math, no ???

1 - 1/2        // -> 0      (parsed left to right: (1 - 1) / 2) ... hm ?
1/2 - 1/2      // -> -0.25  (parsed as ((1/2) - 1) / 2) ...
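What’s happening is that sclang gives all binary operators equal precedence and evaluates strictly left to right. A sketch of that evaluation order (in JavaScript, not SC; `evalLeftToRight` is a made-up helper):

```javascript
// Evaluate a flat [operand, op, operand, op, ...] list strictly left to
// right, with no operator precedence -- the way sclang treats all
// binary operators.
function evalLeftToRight(tokens) {
  const ops = {
    "+": (a, b) => a + b, "-": (a, b) => a - b,
    "*": (a, b) => a * b, "/": (a, b) => a / b,
  };
  let acc = tokens[0];
  for (let i = 1; i < tokens.length; i += 2) {
    acc = ops[tokens[i]](acc, tokens[i + 1]);
  }
  return acc;
}
console.log(evalLeftToRight([1, "-", 1, "/", 2]));         // 0
console.log(evalLeftToRight([1, "/", 2, "-", 1, "/", 2])); // -0.25
```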

I mean, SuperCollider is essentially about doing this kind of thing. I tend to love SC syntax, and hoisting is far behind the math problem, IMO.

Also, better handling of conditionals would be a nice improvement, in order to write this very noble

if (10 < x < 20) 

without parentheses! And I suspect it would be the same problem as math priorities.
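JavaScript happens to accept that chained syntax, which shows why a naive left-to-right parse gives the wrong meaning:

```javascript
// (10 < x) evaluates to a boolean, which is then compared numerically
// against 20 -- so the chained form parses but rarely means what was
// intended.
const x = 50;
console.log(10 < x < 20);      // true: (10 < 50) -> true -> 1 < 20
console.log(10 < x && x < 20); // false: the intended reading
```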

Would it be very hard to change that ?

yes. it would be very hard to change that.


Given the number of custom operators allowed as methods, a non-objectionable implementation would probably require the ability to declare operator precedence in general in user code. That would be fairly challenging and might require a more “expensive” (e.g. GLR) parser. I think not even OCaml allows this level of flexibility; e.g. they have this rule for precedence:

The precedences and associativities of infix symbols in expressions are determined by their first character(s): symbols beginning with ** have highest precedence (exponentiation), followed by symbols beginning with *, / or % (multiplication), then + and - (addition), then @ and ^ (concatenation), then all others symbols (comparisons).

So something like that might be more practical. But then you’d have quite a few things to consider, e.g. should <>, which is used for composition in SC, have the same precedence (decided by first char) as <=?

I think Haskell is the only language I know of that allows you to define the precedence and associativity of operators; the precedence is on a 0-10 scale (only 0-9 can be assigned to normal operators; 10 is reserved for “space”, which is function application). But Haskell is pretty far from imperative languages in quite a number of ways, including the toolset used for writing the compiler itself. Implementation-wise, GHC seems to have a special “renamer” step that (among other things) rearranges the AST to the declared operator precedence and associativity after an initial default parse.

The uniform precedence and associativity of binary operators that SC uses apparently comes from the Smalltalk tradition. (This research page interestingly argues that a “total order” precedence of operators, like e.g. Haskell has, is still deficient in some use cases, and that even a partial order has problems, so it argues that an intransitive relation “order” is preferable! Also interesting, perhaps: there’s some psychological basis for using an intransitive relation in such preference/precedence contexts.)

Making that behavior more sensible would mean automatically creating new [function-like] scopes, so that

x = 1; 
var x = 2;

gives the proper/decent error, i.e. it’s treated like

x = 1;
{ var x = 2; //....

But I understand that something like that (when written explicitly) disables inlining for the inner function/“block” in the present compiler.

To be able to have the cake and eat it in that sense would require some reaching-definitions analysis in the compiler. And I think the SC compiler does nothing of the sort.

Note that unlike in JS, in “modern” C and (even ancient) C++

Modern C compilers such as gcc and clang support the C99 and C11 standards, which allow you to declare a variable anywhere a statement could go. The variable’s scope starts from the point of the declaration to the end of the block (next closing brace).

That’s the semantics we’d want basically, but getting them (and having inlining still work) is less trivial than the simple JS-style “hoisted” approach.

As much as the edge cases of hoisting are strange, changing scope details part-way through a function is afaict extremely inadvisable in a language with closures like JS and sclang. It can lead to some very ambiguous and confusing behavior - in your case, your first x would potentially capture any outer Frame that also has an x defined, for example.

Beyond usability, I think it’s unlikely that this could practically be implemented in sclang: managing scope frames is probably the most computationally expensive part of the entire language apart from GC. An order-of-magnitude expansion of the complexity of scopes in sclang that was still performant would likely require a massive rewrite - and the only gain would be allowing users to produce new and interesting kinds of code smells related to variable definitions and scope idiosyncrasies :slight_smile:

This doesn’t, of course, preclude providing rules re. “declare before you use”, but those are not, I think, required by the syntax, and would effectively be either a “best practices” error or a way to skip writing some extra-tricky compiler code (potentially).

In order to reduce the number of frames created at runtime (which is one of the benefits of the function inlining done in SC), one would need reaching-definitions analysis. Presently

{ if (true) { var xx; } { } }.def.dumpByteCodes

cannot be inlined, as I understand it, because of the inner var xx, even if there is nothing in the outer scope(s) that is a name conflict for xx. (Lack of reaching-defs analysis prevents the compiler from figuring this out.)

BYTECODES: (9)
  0   6C       PushSpecialValue true
  1   04 00    PushLiteralX instance of FunctionDef - closed
  3   04 01    PushLiteralX instance of FunctionDef - closed
  5   B0       TailCallReturnFromFunction
  6   C3 0B    SendSpecialMsg 'if'
  8   F2       BlockReturn
-> < closed FunctionDef >

vs

{ var xx; if (true) { } { } }.def.dumpByteCodes

which is just

BYTECODES: (4)
  0   6C       PushSpecialValue true
  1   F0       Drop
  2   6E       PushSpecialValue nil
  3   F2       BlockReturn
-> < closed FunctionDef >
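The hoisting condition under discussion could be sketched as a simple name-collision check (a hypothetical helper, nothing like the actual compiler internals):

```javascript
// An inner block's vars can be lifted into the enclosing function scope
// (preserving inline-ability) only if none of them shadow a name that
// is already visible there.
function canHoistInto(outerNames, innerVars) {
  return innerVars.every(name => !outerNames.includes(name));
}
console.log(canHoistInto(["a", "b"], ["xx"])); // true  -> safe to inline
console.log(canHoistInto(["xx"], ["xx"]));     // false -> would shadow
```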

I think this is obvious, but: in SC, vars are only lexically scoped (unlike Environments, which are dynamically scoped), making compile-time analysis of var closure/scoping feasible (unlike for Environment scoping).

x = 2
f = { x = x + 1 }
g = { var x = 22; f.() }
g.() // -> 3
x // -> 3 

vs

~x = 2
f = { ~x = ~x + 1 }
g = { (x: 22).use { f.() } }
g.() // -> 23
~x // -> 2

And yeah, one can “disable” dynamic scoping for Envirs

~x = 2
f = { ~x = ~x + 1 }.inEnvir
g = { (x: 22).use { f.() } }
g.() // -> 3
~x // -> 3

but that really returns a new function with the envir pre-bound lexically via an arg (which is really just a caller-initialized var).

	// attach the function to a specific environment
	inEnvir { |envir|
		envir ?? { envir = currentEnvironment };
		^{ |... args| envir.use({ this.valueArray(args) }) }
	}

But to come back to JavaScript, I see they introduced a new let keyword in ES6 to work around their hoisting issues; reproducing some examples from https://davidwalsh.name/for-and-against-let

function foo() {
    a = 1;                  // careful, `a` has been hoisted!

    if (a) {
        var a;              // hoisted to function scope!
        let b = a + 2;      // `b` block-scoped to `if` block!

        console.log( b );   // 3
    }

    console.log( a );       // 1
    console.log( b );       // ReferenceError: `b` is not defined
}

If you accidentally try to use a block-scoped variable in the block earlier than where its declaration exists, you’ll get an error:

if (a) {
    b = a + 2;      // ReferenceError: `b` is not defined

    // more code

    let b = ..

    // more code
}

And even

for (let i=1; i<=5; i++) {
    setTimeout(function(){
        console.log("i:",i);
    },i*1000);
}

It’ll print out i: 1, i: 2, i: 3, etc. Why?

Because the ES6 specification actually says that let i in a for loop header scopes i not only to the for loop, but to each iteration of the for loop. In other words, it makes it behave like this:

{ let k;
    for (k=1; k<=5; k++) {
        let i = k; // <-- new `i` for each iteration!
        setTimeout(function(){
            console.log("i:",i);
        },i*1000);
    }
}

That’s super cool – it solves a very common problem developers have with closures and loops!

All languages want to become C++, or thereabout (C++ with ocaml keywords :grin:).

And the later issue Scott was talking about, now present in JS with let,

… even has a semi-official (scifi) name, at least on MDN: the “temporal dead zone”.

Rather than examine every possible permutation, it might make more sense to make some decisions about what is feasible to support in future SC and what will not be feasible – narrow the scope to something that is doable.

IMO:

  • OK: Change the parser to hoist variable declarations later in a function to the top of the same function definition.

  • Maybe OK: Reaching-definition analysis for inline-able blocks (i.e., if a var xx is in an if branch that is otherwise inline-able, you might hoist it to the immediately enclosing scope). But be careful about looping structures: you can’t lift var xx = 0 out of the loop body in while { ... } { var xx = 0; ... } unless the assignment xx = 0 still lands inside the loop in the bytecodes.

  • Not OK: Let’s please not allow variables to be used before they are defined. (JavaScript strikes me here as a cautionary tale rather than a model to emulate.)
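On the looping caveat in the second point: hoisting only the declaration is fine, but the per-iteration initialization has to stay inside the loop body. A JS sketch of why:

```javascript
// Keeping the initializer inside the loop re-runs it every iteration.
const perIteration = [];
for (let i = 0; i < 3; i++) {
  let xx = 0;       // re-initialized each time through
  xx += i;
  perIteration.push(xx);
}
console.log(perIteration); // [0, 1, 2]

// Hoisting the initializer out changes the meaning: `xx` accumulates.
const hoistedInit = [];
let xx = 0;         // initialized once, before the loop
for (let i = 0; i < 3; i++) {
  xx += i;
  hoistedInit.push(xx);
}
console.log(hoistedInit); // [0, 1, 3]
```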

hjh


I think (but I’m not entirely sure) that your last point implies either some form of reaching-def analysis too (if we want it done at compile time) or auto-generating (“invisible”) block scopes, the inlining of which would also require reaching-def analysis.

I strongly suspect one of the main reasons why JS lets vars be used before being declared is that it’s an incredibly convenient solution in terms of implementation, in conjunction with hoisting declarations.

Perhaps someone knows how the “temporal dead zone” (TDZ) for let is implemented in ES6-compliant JS back-ends. Your last point is basically asking for that feature. My impression is that they simply throw a runtime error rather than detect that at compile time, like in C99. So let seems to still do hoisting, but initializes the variable with a “bomb” value that can only be defused by the place where the let was placed. That’s probably not complicated to implement, but the error checking for use before the let-declaration is only done at runtime, it seems.
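A quick check is consistent with the runtime-error theory: the whole block parses without complaint, and only executing a read inside the “temporal dead zone” raises the error.

```javascript
// Reading a let-bound variable before its declaration throws at run
// time (ReferenceError), not at parse/compile time.
function tdzDemo() {
  try {
    b;                 // read before the `let` below: the TDZ
    return "no error";
  } catch (e) {
    return e.name;     // "ReferenceError"
  }
  let b = 1;           // hoisted to block scope, uninitialized until here
}
console.log(tdzDemo()); // ReferenceError
```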

No, absolutely not. I had thought I was suggesting to make the “temporal dead zone” simply illegal, i.e.

{
	xyz = 0;
	...
	var xyz = 3;
	...
}

would be a parse error.

I am not opposed to hoisting:

{
	var abc = 2;
	... blah blah...
	var def = abc + 2;
	...
}

to be compiled the same as this, today:

{
	var abc = 2;
	var def;
	... blah blah...
	def = abc + 2;
	...
}

I might even be OK with lifting variables out of inline blocks:

{ |a|
	if(a.odd) {
		var temp = (a - 1) div: 2;
		...
	}
}

–>

{ |a|
	var temp;
	if(a.odd) {
		temp = (a - 1) div: 2;
		...
	}
}

But my last point was intended to say that all local variables must be introduced at first use by var, period, no exceptions (specifically to avoid the kinds of JS problems that you mentioned).

The purpose of SC is not to innovate language design. The purpose of SC is to facilitate working with audio in code. I see no benefit, and some risk, in over-complicating concepts of variable scope.

hjh

This is actually not as simple as it sounds. As @scztt mentioned before, what about

{
	var xyz;
	{
		xyz = 3; // sets outer xyz or error?
		var xyz;
		xyz = 4;
	}
}

That should be a syntax error too.

If a user thinks they need an inner-scope local variable with the same name as an outer-scope local variable, plus access to both of them in the inner scope, then the user has made a conceptual error. The principle: a variable name within a given scope should always refer to the same variable. Similarly, it would be illegal to declare a var i but use the interpreter variable i earlier than that, because the identifier i would then have two meanings in the same closure. And, for hoisting out of inline blocks: if the outer scope has xyz and the if block declares another xyz, then that would defeat inlining, and the if block would have to be a distinct function object.

And it’s still possible to write the algorithm: change the name of the inner-scope variable.

Allowing var later within a function scope creates no obligation to support the case quoted there.

hjh

Off the top of my head, it doesn’t seem hard to implement the semantics you want, if we accept that the compiler would complain upon encountering the 2nd var statement in that example, rather than on the first assignment (because barfing on the assignment would require full-function look-ahead for vars). Basically, the compiler would have to keep track of what names have been accessed (read or written) in every scope, and barf if any of the names it accumulated on that “accessed already” list get var’d within the scope.

Basically, in that example, when reaching the xyz = 3 line it would resolve the name (as a pointer to the outer scope’s stack) and mark it on the list of names accessed in the inner scope. Then, when it gets to the 2nd var xyz, in the inner scope, it would check it against the already accessed names for the inner scope, and error because the var’d name is already accessed in that inner scope.

N.B. deciding whether a function is “open” or “closed” is probably done in a somewhat similar manner, but for that there’s probably no list being kept for names accessed, because encountering any names accessed that don’t match the args or vars of the scope can simply set a boolean flag that the function is open.

(I said “list” generically above, it would have to be something more efficient, a dictionary, a trie etc.)
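A toy version of that bookkeeping for a single scope (hypothetical, nothing like the real sclang compiler internals):

```javascript
// Walk a flat list of statements for one scope, tracking which names
// have been accessed; a `var` for an already-accessed name is an error.
function checkVarPlacement(statements) {
  const accessed = new Set();   // names read or written so far
  for (const s of statements) {
    if (s.kind === "var") {
      if (accessed.has(s.name)) {
        return `error: '${s.name}' accessed before its var declaration`;
      }
    } else {                    // any read or write marks the name
      accessed.add(s.name);
    }
  }
  return "ok";
}
console.log(checkVarPlacement([
  { kind: "assign", name: "xyz" },  // xyz = 3
  { kind: "var", name: "xyz" },     // var xyz -> error
]));
console.log(checkVarPlacement([
  { kind: "var", name: "xyz" },
  { kind: "assign", name: "xyz" },
])); // ok
```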

With this approach, there’s probably the somewhat more difficult issue of having to delay emitting the function preamble opcodes (which probably allocate the stack frame) until the function is fully parsed. If it’s only a stack size being set, it should be relatively easy to do a “fixup” of previously emitted opcodes. But I’m not really familiar with what that preamble emitting consists of presently in SC… nor do I know if the compiler is of the “stream output” variety in which case it might not be able to easily walk back to do such fixups, even if they require no shifting of instructions, but just changing some number in the stack allocation opcode.

The compiler might already have such logic – the following is already a parse error (“variable already declared within the same scope” or similar verbiage, I forget the exact wording):

{
    arg xyz;
    var xyz;
    ...
}

There’s a good chance the same logic could be extended.

hjh