Why predeclared variables?

VIRTUALDOG · June 12, 2020, 1:04am

How about

{
 var foo = 3, bar = 5;
 {
  arg foo = foo;
  var bar = bar;
 }
}

? which is in fact already legal SC syntax, since var foo = foo; and arg bar = bar; are legal by themselves. (:

RFluff · June 12, 2020, 7:30am

Well, it compiles, but you’d be surprised maybe how something like that runs

(
value {
	var bar = 4;
	value {
		var bar = bar + 1;
		bar.postln;
	}
}
)

ERROR: Message '+' not understood. RECEIVER: nil

The initializer expression doesn’t actually access the outer scope once bar has been var’d. This could arguably even be considered a bug in the present compiler implementation.

In theory, there’s no strong reason why the initializer expressions couldn’t be a bit smarter about scope (i.e. exclude the var just being declared from inner-scope lookup), but I guess nobody cared about this corner case. So the simplest implementation is what’s done: the lookup entry for bar is created in the inner scope immediately, before the initializer expression is considered / evaluated.

As far as the proposed new semantics goes, in theory it would be ok with this kind of expression, even if it were to access the outer scope in the initializer (which presently doesn’t happen), because the = bar initializer expression comes after the re-declaration. When the inner var bar is reached, the outer bar was not accessed in the inner scope yet, so the bar name is clear/clean to be var’d in the inner scope. (arg doesn’t seem any different in that respect.)

Only something like

{
    var bar = 5;
    {
        bar = bar + 1; // ok, outer scope access
        var bar = bar; // compile error would be raised here ...
    }
}

… because the bar name is already accessed before this point in the inner scope. But the latter does not parse in SC presently anyway.

Somewhat surprisingly, the following is “borked” too in the present compiler, when it comes to running

(
value {
	var bar = 4;
	value {
		var tmp = bar + 1, bar = tmp;
		tmp.postln;
	}
}
)

It works ok without the bar = tmp part. Basically, in the present compiler the whole list of vars is created “in one pass”, then all the initializers are considered / evaluated. So tacking that bar = tmp screws up the lookup of the bar + 1 initializer, even if the latter appears to “come before” the (re)declaration of bar.

Basically that compiles as if it were written like

(
value {
	var bar = 4;
	value {
		var tmp, bar;
		tmp = bar + 1;
		bar = tmp;
		tmp.postln;
	}
}
)

I.e. there’s “hoisting lite” done on (all) the var declarations themselves before the initializers.

jamshark70 · June 12, 2020, 7:30am

(
var a = 1, b = 2;
f = { arg a = a; var b = b; a + b };
)

f.def.dumpByteCodes;
BYTECODES: (18)
  0   30		 PushTempZeroVar 'a'
  1   8F 1A 00 04 ControlOpcode 4  (8)
  5   30		 PushTempZeroVar 'a'
  6   08 00 00 StoreTempVarX 'a'
  9   F0       Drop
 10   31		 PushTempZeroVar 'b'
 11   80 01    StoreTempVar 'b'
 13   30		 PushTempZeroVar 'a'
 14   31		 PushTempZeroVar 'b'
 15   B0       TailCallReturnFromFunction
 16   E0       SendSpecialBinaryArithMsg '+'
 17   F2       BlockReturn

The principle I had suggested is that a declared variable within a given function scope should have one and only one referent. The byte code listing here shows that SC already, in fact, follows that rule. Inside the function assigned to f, there’s only one a and only one b, and these are not the a and b declared outside that function scope.

Unsurprisingly, then, both are nil:

f.value;

ERROR: Message '+' not understood.
RECEIVER:
   nil
ARGS:
   nil

So there’s no legitimate use case to support here.

hjh

VIRTUALDOG · June 12, 2020, 1:23pm

hahaha, believe it or not i actually did run this myself before i shared! it’s funny that this is allowed by the parser since the result is quite absurd.

in both a C++ lambda capture list and a Python function parameter list, initializing a variable with the same name behaves closer to how i’d expect. the token after the assignment captures the variable in the outer scope, and the token before the assignment is a new variable in the inner scope.

int x = 3;
[x = x]() { return x; }; // inner x is a copy of outer x, ret 3
[x = x + 1]() { return x; }; // 4

x = 1
def f(x=x):
  return x

f() # 1
f(2) # 2

in C (and C++ and Objective-C) you can init a variable with itself when in block scope, not a good idea of course. (https://en.cppreference.com/w/cpp/language/scope) the scope of that variable begins after its declarator and before its initializer. apparently in SC too!

in Python the scope of the new name begins immediately after the assignment statement. (no reference, i tested that in a python 3 interpreter) that’s what i would expect, too. computation of an assignment proceeds from right to left, so intuitively i like that the scoping also follows that ordering.
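A small sketch to go with the Python point (the function names are made up for illustration). At module level, or in a default argument, the right-hand side does see the outer name. Inside a function body it is different: any assignment to a name makes it local for the whole body, so the right-hand side no longer reaches the outer variable.

```python
x = 1

def default_captures_outer(x=x):
    # The default expression is evaluated once, in the enclosing scope,
    # when the def statement runs; the parameter x is a new local name.
    return x

def body_assignment_shadows():
    # An assignment to x anywhere in the body makes x local for the WHOLE
    # body, so the right-hand side below cannot see the module-level x.
    try:
        x = x + 1
    except UnboundLocalError:
        return "UnboundLocalError"
    return x

print(default_captures_outer())    # default captured from the outer x
print(default_captures_outer(2))   # explicit argument wins
print(body_assignment_shadows())   # the local x is read before assignment
```

So Python's function bodies follow something close to the "one referent per scope" rule, while default arguments capture from the outer scope.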

function declarations are a different beast, though. “temporally”, they participate in both the outer flow and their own potential inner flow. so to me, capturing from the outer scope for a default argument seems more natural.

not sure why this is all of a sudden not legitimate, what a scary word.

jamshark70 · June 12, 2020, 3:51pm

Maybe a better way to say it is: We don’t have to support everything.

If you need the functionality implied in the original example (where the inner scope has access to the outer scope’s foo and bar), there’s an easy way to do that: name the inner variables differently. Hence, there is no reason why it should be required to break that principle of one and only one referent.

I find, in some of these threads, a hint of a suggestion that SC as a tool is not useful if it isn’t completely transparent and permissive. In some ways, this is at odds with the history of electronic music, which is always about appropriating devices that are not intended to make music, and then accepting and working through/around the limitations implicit in using inappropriate technology for music making (and also finding happy accidents within the limitations – the classic example being the way that acid house is a result of the TB-303’s quirky filter implementation). Orson Welles: “The absence of limitations is the enemy of art.”

Or from a different perspective: Users have complained about the necessity to declare variables at the start of a block, and about inlining limitations. Users have never complained (literally never, in 18 years I’ve been involved) that they are disallowed from naming variables identically in inner and outer scopes while retaining access to both scopes. So I tend to think, Python notwithstanding, that this is a theoretically interesting but practically unnecessary capability. We don’t need it.

hjh

(PS I replied once, and the forum posted twice…? So I deleted one of them.)

VIRTUALDOG · June 12, 2020, 4:11pm

uhhh i think this is escalating unnecessarily, don’t you? i’m just spitballing some ideas, not suggesting the history of electronic music should be overturned.

jamshark70 · June 12, 2020, 4:31pm

It’s 12:30 am in China, so I won’t attempt a complete reply at this hour. But I do want to suggest that it’s possible to read what I’m saying as an alternate perspective and not as an escalation.

hjh

RFluff · June 12, 2020, 8:41pm

I think that’s what people normally expect when they write expressions like that.

“Score 5, interesting”. I actually didn’t know about the C++ bit though; pasting here the example they gave, since it’s a long page full of other stuff

unsigned char x = 32; // scope of the first 'x' begins
{
    unsigned char x = x; // scope of the second 'x' begins before the initializer (= x)
                         // this does not initialize the second 'x' with the value 32, 
                         // this initializes the second 'x' with its own,
                         // indeterminate, value
}

And sure enough (tested here):

//clang 6.0.0

#include <iostream>

int main()
{
    int x = 42;
    {
        int x = x;
        std::cout << "Hello, world: " << x;
    }
}

outputs something like:

Hello, world: 1151113408

So, the “mini-hoisting” that SC does on initializers is found outside JavaScript! I suspect this is due to implementation convenience rather than semantic desirability.

I guess they had a change of heart in C++ (in that mini-hoisting regard) when they added lambda expressions. (I haven’t actually tested them in this regard.) And the “change of heart” seems to have spread to other new constructs in C++

The point of declaration for the variable or the structured bindings (since C++17) declared in the range_declaration of a range-based for statement is immediately after the range_expression:

std::vector<int> x;
 
for (auto x : x) { // OK: the second x refers to the std::vector<int>
// x refers to the loop variable in the body of the loop
}

Alas in C/C++ as in SC, changing the mini-hoisting behavior in constructs where it happens (vars) to a more semantically intuitive behavior is likely to screw up some old programs… so it’s a non-starter (“won’t fix”) for backwards compatibility reasons.

But in the “interesting” department too, if you try to auto that int x = x, at least you get a compile error:

//clang 6.0.0

#include <iostream>

int main()
{
    int x = 42;
    {
        //int x = x;
        auto x = x;
        std::cout << "Hello, world:" << x;
    }
}

error: variable 'x' declared with deduced type 'auto' cannot appear in its own initializer
        auto x = x;
                 ^
1 error generated.

RFluff · June 12, 2020, 9:10pm

On the other hand, I’ve also tried the equivalent of this SC code

(
value {
	var bar = 42;
	value {
		var tmp = bar + 1, bar = tmp;
		tmp.postln;
	}
}
)

ERROR: Message '+' not understood. RECEIVER: nil

in C++

//clang 6.0.0

#include <iostream>

int main()
{
    int bar = 42;
    {
        int tmp = bar + 1, bar = tmp;
        std::cout << "Hello, world: " << tmp;
    }
    std::cout << "\nHello, universe: " << bar;
}

And that one actually works as expected:

Hello, world: 43
Hello, universe: 42

(And C++ type inference is also happy with that, i.e. the inner declaration can be auto’d instead of int, I’ve tried it.)

While in SC the mini-hoisting applies to all vars as a block, in C++ it’s only per individual declarator, i.e. in the expression tmp = bar + 1, bar is taken from the outer scope in C++ but from the inner scope in SC (even if bar is only declared in the inner scope after that use.)
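To make the difference concrete, here is a toy model in Python (not compiler code; `resolve_initializers` and its arguments are invented for this sketch). It only answers one question: for each name read in an initializer, does it resolve to the inner or the outer scope under each strategy?

```python
def resolve_initializers(decls, hoist_whole_list):
    """decls: ordered list of (name, [names read by its initializer]).

    Returns {declared_name: {read_name: "inner" | "outer"}}.
    hoist_whole_list=True models the SC behavior described above;
    False models the C++ per-declarator point of declaration.
    """
    all_names = {name for name, _ in decls}
    visible = set()
    result = {}
    for name, reads in decls:
        # C++-like rule: a name is in scope from its own declarator onward,
        # i.e. already inside its own initializer.
        visible.add(name)
        in_scope = all_names if hoist_whole_list else visible
        result[name] = {r: ("inner" if r in in_scope else "outer") for r in reads}
    return result

# var tmp = bar + 1, bar = tmp;
decls = [("tmp", ["bar"]), ("bar", ["tmp"])]
sc_like = resolve_initializers(decls, hoist_whole_list=True)
cpp_like = resolve_initializers(decls, hoist_whole_list=False)
print(sc_like["tmp"]["bar"])    # inner
print(cpp_like["tmp"]["bar"])   # outer
```

Both strategies agree on x = x (the self-reference is always the inner one); they only differ for names declared later in the same list.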

So, on that angle, SC is a bit “less legitimate” in this regard, even compared to C++ plain old “vars” because there’s a relatively easy workaround in C++ (shown above) which doesn’t work in SC.

On the other hand…

Yeah, I agree with the sentiment on this, it’s a rather obscure corner case which is a bit tricky to write even in C++ (which also has some, albeit fewer, limitations in this regard than SC). The behavior of such expressions can be a bit more obscure in SC due to the whole-vars-list mini-hoisting (ahead of var-list initializers) that happens already, but (IMHO) it’s not worth putting effort into addressing it in particular, unless it interferes with some other, more worthy goal.

VIRTUALDOG · June 12, 2020, 9:42pm

James,

the way i felt talked down to by your last post leaves me with little interest in trying to understand anything else you had to say. i’m talking specifically about phrases like “We don’t need it”, “in the N years i’ve been involved”, “users have never complained X”, and “this [my post] is at odds with the history of electronic music, which is always about X”. i understand that you have a lot of experience you want to share, but maybe you could be a bit more humble about sharing it? after all, none of us knows exactly what other people do or don’t need out of this tool, or what every user of sclang throughout time has complained of, or the entire history of electronic music and its essence (if such a thing exists).

this kind of phrasing is liable to make people like myself who know a bit less feel like we ought not to participate in discussions unless we are as expert as you.

VIRTUALDOG · June 12, 2020, 9:57pm

my guess is that it was inherited from C (which, up until C99 i think, required variable declarations at the top of a block). i read somewhere once that one of the main goals of C was to make things easy for compiler writers. i’d have to say i agree. the newer standards of C++ have in my experience tried to be a little more careful about making things less surprising and painful for users. these scoping rules are a good example!

Alas in C/C++ as in SC, changing the mini-hoisting behavior in constructs where it happens (vars) to a more semantically intuitive behavior is likely to screw up some old programs… so it’s a non-starter (“won’t fix”) for backwards compatibility reasons.

SC isn’t governed by an international standards body, luckily (: i wouldn’t call it a non-starter, but it would probably have to be thought through carefully.

OTOH, scoping rules are probably something you want to get right the ~~first~~ second time, if you’re going to modify them at all!

RFluff · June 12, 2020, 10:06pm

Yeah “non-starter” was probably a bit too strong in my statement. SC indeed does a lot more “breaking changes” when sufficiently justified compared to standards orgs.

Per my post immediately after that one, the mini-hoisting in SC is actually a bit more confusing than the mere “bad” behavior on var x = x;, because it also affects expressions like var y = x, x;. In summary, that’s because it’s treated like var y, x; y = x;. I don’t know how much practical impact this has on the user base presently though…

But to get back on the main topic here, if SC were to support variables declared anywhere, that mini-hoisting could potentially get substantially more confusing. Right now, you have to have all the var declarations “in one place”, i.e. in a fairly narrow region of code. If you could actually write

var y = x;

// lots more code here

var x;

And the latter var x changes the meaning of x (i.e. the frame it’s looked up in) in the initializer of the first var… I can see the potential for more substantial confusion among users… This is basically / exactly the JavaScript var-hoisting issue, if I’m not mistaken.

Sooo, it turns out (my) proposed semantics that a var would error if its “target” name has been accessed already, which would prohibit the above program (with a compiler error on the 2nd var), would actually break some presently valid SC programs too, namely just

var y = x;
var x;

or even just

var y = x, x;

Therefore, a “breaking change” (making some old programs compile-error) might actually be needed to allow variables declared anywhere (with non-confusing, i.e. non-hoisting semantics). The good news is that such errors would all be at compile time, not at runtime.

VIRTUALDOG · June 12, 2020, 11:31pm

i’m not sure if this is entirely obvious, but the reason the names a, b, x, etc. can be used interactively is because they are instance variables of Interpreter. by the same token (pun intended) you can also reference other instance variables of Interpreter interactively, such as cmdLine, preProcessor, and codeDump. i.e. in an interpreted context you can write

var x = 3;
postln(codeDump);
postln(cmdLine.scramble);

the example above,

var y = x;
var x;

is getting at a more general problem, which is how changes to scoping and variable declaration rules would interact with the scopes of instance and class variables.

RFluff · June 13, 2020, 12:37am

I just used x and y so I didn’t have to write a longer equivalent example that overrides some variables that don’t exist in the Interpreter’s scope, such as

(
value {
	var xx = 42;
	value {
		var yy = xx.postln, xx = 5;
	}
}
)

There’s actually still something I don’t understand with the last one. Why is xx posted as 5 and not nil… It seems some initializers get run before others… and not necessarily in the order they were given! It looks like some constant-initializers get put into the function prologue, because they don’t show up in the byte-code for the inner function:

BYTECODES: (7)
  0   31		 PushTempZeroVar 'xx'
  1   C1 3A    SendSpecialMsg 'postln'
  3   80 00    StoreTempVar 'yy'
  5   6E       PushSpecialValue nil
  6   F2       BlockReturn

In contrast

(
value {
	var xx = 42;
	value {
		var yy = xx.postln, xx = 2 + 3;
	}
}
)

prints nil, because the initializer for xx is not a constant now; the bytecode generated for this 2+3 initializer is visible on a dump, and comes after the postln call.

So, the actual rule/algorithm presently implemented by the SC mini-hoisting seems to be:

1. Pull all vars into a local scope table; this is used to resolve all name accesses, in preference to the outer scopes.
2. Initialize all vars that have constant initializers via the function prologue.
3. Generate byte code for the rest of the initializers in the function body, in the order in which these non-constant-expression initializers appear in the var statements.

(1 & 2 are probably a single step/pass.)

Since name lookups in initializers (of the form x = y) are treated as non-constant initializers, they are done in step 3, so they can access constant-initialized vars from step 2 (seemingly) “out of order”.
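The three steps above can be mimicked with a few lines of Python. This is only a sketch of the observed behavior, not of the actual compiler: `run_var_list`, `lookup` and `post_xx` are invented names, None stands in for nil, and a "constant" is simply anything that is not a callable.

```python
def run_var_list(decls, outer):
    """decls: ordered list of (name, initializer).

    An initializer is either a literal value (constant) or a callable that
    takes a lookup function (non-constant expression).
    """
    # Steps 1 and 2: every declared name gets a slot, and literal
    # initializers are already in place before any code runs.
    frame = {name: (None if callable(init) else init) for name, init in decls}

    def lookup(name):
        # Inner slots always win over the outer scope, even while still None.
        return frame[name] if name in frame else outer[name]

    # Step 3: run the remaining initializers in source order.
    for name, init in decls:
        if callable(init):
            frame[name] = init(lookup)
    return frame

outer = {"xx": 42}
posted = []

def post_xx(lookup):
    posted.append(lookup("xx"))   # stands in for xx.postln
    return posted[-1]

# var yy = xx.postln, xx = 5;
run_var_list([("yy", post_xx), ("xx", 5)], outer)
# var yy = xx.postln, xx = 2 + 3;
run_var_list([("yy", post_xx), ("xx", lambda lookup: 2 + 3)], outer)
print(posted)   # [5, None]
```

The outer 42 is never seen in either case; the only difference is whether the inner slot has already been filled when postln runs.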

And sure enough

value { |yy = xx, xx = 5| yy.postln }

is a parse error, has to be written as

value { |yy = (xx), xx = 5| yy.postln }

jamshark70 · June 13, 2020, 2:11am

I take your point here. That certainly wasn’t my intention. I apologize for leaving that impression and I’ll try to do better about it. (In some sc-dev threads, it was suggested that a Skype call could do more to resolve conflicts than email-style discussions. Is it maybe time for you and me to try that? I have some concerns on my side as well.)

I appreciate your bringing it up. It’s hard in this medium to know how one’s words are being received.

Returning to the topic: Thinking about it further, probably the main reason why I’m skeptical of changing the scoping rules away from the current “one and only one referent” is that I’m often in the position of explaining things to relatively new users. It’s easier to explain clear and simple principles.

f = {
	var xyz;
	var g = {
		var xyz = xyz;
		...
	};
	...
};

To say “in g, xyz refers only and always to the xyz declared within g’s braces” is a clear and simple rule.

To say “in g, var xyz refers to g’s scope, the = xyz initializer refers to f’s scope, and subsequent appearances of xyz refer to g again” is a complex rule that’s likely to lead to confusion.

So then my question becomes – what do we gain from the complexity and potential for confusion? Is there anything that absolutely cannot be written in any other way? I can’t think of anything. var xyz = xyz strikes me as needless obfuscation. var xyz2 = xyz or var gXyz = xyz is immediately unambiguous and easier to understand. When I said “We don’t need it,” what I meant is that it would allow users to write code that is more confusing to read (which is of questionable value), and not allow algorithms to be expressed that can’t already be expressed (which would be a genuine need).

I do understand what Python and C++ are doing in that case, but I don’t see the benefit and I do see developer cost, and new opportunities for user mistakes.

hjh

jamshark70 · June 13, 2020, 2:13am

f = { var yy = xx.postln, xx = 5; xx };

f.def.prototypeFrame;

-> [ nil, 5 ]

(The prototypeFrame is why SynthDef argument defaults must be literals.)

hjh

RFluff · June 13, 2020, 11:33am

The problem I see with “letting this be” and allowing vars to be declared anywhere is that it forces the JavaScript solution: auto-hoisting from arbitrarily far var declarations, as long as they are in the same scope.

Now you can only write

(
value {
	var xx = 42;
	value {
		var yy = xx.postln;
		// right now only more vars allowed here, but with "vars-anywhere",
		// there could be any number of statements here, of any kind
		// including more that use xx;
		// And the next var decl, no matter how far away in lines of code, but
		// still in the same scope would (still) change the scope of the
		// xx in the postln (above), as well as in any other code here using xx
		var xx = 5; 
	}
}
)

Basically, if we allow vars anywhere, one can have an arbitrary number of (non-var) statements between those two, all of which would be affected by the later var statement, i.e. this is hoisting.

I have to say that even the present situation is not entirely without gotchas. I have a SynthDef with 8 var lines… luckily synths don’t use nested scopes. But there’s still the issue of the “two-pass” initializers being a gotcha in itself to some extent, i.e. if 5 is replaced with 2+3 above, the program prints something else.

semiquaver · June 13, 2020, 2:41pm

please - any heroes reading the thread - just the minimal hoisting declarations to the top of the same function definition would be a great friction-reducing change!

jamshark70 · June 13, 2020, 3:07pm

OK, let me try again.

(
value {
	var xx = 42;

At this point, the compiler knows there is a function, and a variable identifier xx within that function.

	value {
		var yy = xx.postln;

And now, the compiler knows there is a second function within the first function’s scope, and this inner function is using xx which was declared in the outer scope.

		... stuff...
		var xx = 5;

Now, the second function declares a second referent for the identifier xx.

What I am proposing is that this should be illegal. At this point, it would be legitimate – I would say even desirable – for the compiler to throw a parse error and refuse to evaluate the code.

If you rephrase “this inner function is using xx” as “xx now exists within the inner function’s scope” (it wasn’t declared in the inner scope, but it exists in this scope by way of being imported from the outer scope), then at var xx, the compiler could invoke the same logic that is already used for the error message “ERROR: Function variable ‘xx’ already declared in Interpreter:functionCompileContext” (maybe tweak the wording to “already exists in…”).

	}
}
)

Basically, if we allow vars anywhere, one can have an arbitrary number of (non-var) statements between those two, all of which would be affected by the later var statement…

Well, technically, yes, the preceding statements would be “affected” by the later declaration – but in my proposal, they would be invalidated and not executed. That is indeed “affected,” but without dangerous side effects or ambiguity.

Well, what if you need a local xx in the inner scope and access to the outer xx? Easy.

(
value {
	var xx = 42;
	value {
		var yy = xx.postln;
		var xxx = 5; 
	}
}
)

And what I am proposing would also not invalidate a common use case:

(
var i = 0;
while { i < 10 } {
	20.do { |i|
		... do stuff with inner 'i'...
	};
	i = i + 1;  // outer 'i'
}
)

… because the immediate declaration of arg i in the do loop establishes i’s referent, uniquely. It blocks access to the outer ‘i’ within the do loop, but as noted in the preceding example, a different identifier would allow access to both. (And, to be clear, f = { arg i = i; ... } should also be illegal.)

I will also express the opinion that var yy = xx; var xx is not an ideal programming practice. It’s deliberately (perversely?) unclear, without any necessity to be unclear (and the un-clarity is easily removed by changing an identifier). That JavaScript allows it doesn’t make it good. So I am still of the opinion that allowing var to follow regular statements doesn’t create any obligation to support this type of code example.

hjh

RFluff · June 13, 2020, 10:35pm

jamshark70:

value {
		var yy = xx.postln;
this inner function is using xx which was declared in the outer scope.
... stuff...
		var xx = 5;
What I am proposing is that this should be illegal.

Yes, I actually agree(d) that that would be a good approach. I was merely saying that it turned out that some programs written like that are presently valid in SC. (They’re presently valid as long as they don’t have non-var “stuff” in between those lines.)

So, making the var-ing of xx illegal (after use in the same scope, in that example in the assignment to yy [and its postln]) would be a “breaking change”, but not a terrible one, as the presently valid programs rendered illegal by this approach would be flagged at compile time (as opposed to e.g. still running, but with some different semantics, which would be a much nastier kind of breaking change.)

Also, another point I was trying to get across, is that some of these programs that would be rendered illegal by this proposal (already) have somewhat dubious runtime semantics, like if you changed 5 to 2+3 in the above, the xx.postln would print nil instead of 5. In other words, the proposed change would render illegal programs that have rather non-intuitive semantics in the present SC implementation.

This cannot be a (Bison) parser error per se, but it can be a compile-time error. (It’s more like how C++ deals with a * b;, which depends on whether a is a previously defined type or not. If it is, that’s a variable declaration; otherwise, if a is bound to a variable, that’s multiplication, and if there’s nothing bound then it’s an error. I.e. the result there depends on what’s bound to the symbol a; there’s actually a name for this–the “lexer hack”.) But for this var-after-use SC issue, one of the alternatives (xx already accessed in the inner scope) leads to a compile-time error, rather than a valid program, and the statement is otherwise unambiguous. The SC compiler would have to track accessed names (in the scope), not merely declared ones though.

Speaking of which… this tracking needs to extend to (arbitrarily nested) inner-inner scopes, e.g.

{
    var xx;
    {
        { xx = 3 }; // or even { { { xx = 3  } } } 
        var xx; // compiler would need to abort with error here
        // even though outer xx only used in some further nested sub-scopes
    }
}
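A rough sketch of what that tracking could look like, written in Python over a made-up mini-AST (tuples of "var", "use" and "block"). None of this is taken from the SC compiler; `check` and `ScopeError` are hypothetical names, and undeclared names are simply ignored to keep it short.

```python
class ScopeError(Exception):
    pass

def check(stmts):
    """Walk one function body in source order.

    Returns the set of names that this body, or any body nested in it,
    reads from an enclosing scope.
    """
    declared = set()
    imported = set()   # names accessed here that resolve to an outer scope
    for kind, payload in stmts:
        if kind == "var":
            if payload in declared:
                raise ScopeError(f"'{payload}' already declared")
            if payload in imported:
                raise ScopeError(f"'{payload}' already accessed from outer scope")
            declared.add(payload)
        elif kind == "use":
            if payload not in declared:
                imported.add(payload)
        elif kind == "block":
            # A nested body that reaches through us for a name counts as an
            # access here too, unless we declared that name ourselves first.
            imported |= check(payload) - declared
    return imported

# { var xx; { { xx = 3 }; var xx; } }
bad = [("var", "xx"), ("block", [("block", [("use", "xx")]), ("var", "xx")])]
# var i; ... 20.do { |i| ...use i... }; i = i + 1
ok = [("var", "i"), ("block", [("var", "i"), ("use", "i")]), ("use", "i")]

try:
    check(bad)
except ScopeError as err:
    print("rejected:", err)
print("accepted:", check(ok) == set())
```

The nested case costs nothing extra in this model, because each body reports upward the names it borrowed from outside.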

It’s actually not strictly necessary to make such a program illegal with the above proposal… and in fact it’s a bit of extra work to add that check because:

The compiler “agreed” that i is a valid new local var name at the point where it parsed just arg i. There was no “use” of i in the scope prior to this point with the present deferred implementation of expression initializers (what I labelled “step 3” in a prior post.)
So, the expression (being) assigned to i would need to be flagged to be checked for non-occurrences of i. There are presently no such forbidden-names-in-expressions checks in the compiler.

So, detecting/forbidding arg i = i (and/or var i = i) is a somewhat orthogonal issue to allowing “vars anywhere”.