Does anyone reading this have the “institutional memory” to say why only some symbols can be written with a leading backslash? E.g. the following are parse errors: \+ or \a+. More obscurely, \1 is valid but \1a is not.
The documentation says on two different pages (“Syntax Shortcuts” and “Symbolic Notations”) that single-quoted and backslash-prefixed symbols are equivalent, without mentioning any caveats.
There have also been some bug reports and even (abandoned) pull requests opened on this, e.g. https://github.com/supercollider/supercollider/pull/2676
I was updating the documentation (Syntax Shortcuts) recently, but it’s not clear what to say on the matter: should it state that the backslash form is intentionally limited, or say nothing because this is a bug that should be fixed?
The relevant bits from the lexer (I think I got all of them in this snippet):
// in the big "if"
if (c == '\\')
    goto symbol1;
else if (c == '\'')
    goto symbol3;

// then
symbol1:
    c = input();
    if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_')
        goto symbol2;
    else if (c >= '0' && c <= '9')
        goto symbol4;
    else {
        unput(c);
        yytext[yylen] = 0;
        r = processsymbol(yytext);
        goto leave;
    }

symbol2:
    c = input();
    if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_' || (c >= '0' && c <= '9'))
        goto symbol2;
    else {
        unput(c);
        yytext[yylen] = 0;
        r = processsymbol(yytext);
        goto leave;
    }

symbol4:
    c = input();
    if (c >= '0' && c <= '9')
        goto symbol4;
    else {
        unput(c);
        yytext[yylen] = 0;
        r = processsymbol(yytext);
        goto leave;
    }

symbol3 : {
    int startline, endchar;
    startline = lineno;
    endchar = '\'';

    /*do {
        c = input();
    } while (c != endchar && c != 0);*/
    for (; yylen < MAXYYLEN;) {
        c = input();
        if (c == '\n' || c == '\r') {
            post("Symbol open at end of line on line %d of %s\n", startline + errLineOffset,
                 printingCurrfilename.c_str());
            yylen = 0;
            r = 0;
            goto leave;
        }
        if (c == '\\') {
            yylen--;
            c = input();
        } else if (c == endchar)
            break;
        if (c == 0)
            break;
    }
    if (c == 0) {
        post("Open ended symbol started on line %d of %s\n", startline + errLineOffset, printingCurrfilename.c_str());
        yylen = 0;
        r = 0;
        goto leave;
    }
    yytext[yylen] = 0;
    yytext[yylen - 1] = 0;
    r = processsymbol(yytext);
    goto leave;
}
So it looks like numbers were intended to be supported (symbol4 branch): if the first char is a digit, only digits are accepted thereafter. In the symbol2 branch (taken on a letter or underscore), only letters, underscores and digits are accepted thereafter. So, fairly “usual” rules for identifiers in many programming languages. But it seems nothing besides numbers and “identifiers” was intended to work as a symbol after a backslash. (The symbol3 branch is for stuff in single quotes.)
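To make the three backslash branches concrete, here is a minimal C++ sketch of the acceptance rule as I read it from the snippet above. The helper names (isLetterOrUnderscore, isDigit, backslashSymbolLength) are made up for illustration and are not in the SuperCollider sources. The function only reports how many characters after the backslash would end up in the symbol.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helpers mirroring the range checks in the lexer snippet (ASCII only).
bool isLetterOrUnderscore(char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_';
}

bool isDigit(char c) {
    return c >= '0' && c <= '9';
}

// Hypothetical function: `rest` is the input right after the backslash.
// Returns the number of characters that would be consumed into the symbol.
std::size_t backslashSymbolLength(const std::string& rest) {
    if (rest.empty())
        return 0;

    std::size_t n = 0;
    if (isLetterOrUnderscore(rest[0])) {
        // symbol2 branch: letters, underscores and digits may follow
        n = 1;
        while (n < rest.size() && (isLetterOrUnderscore(rest[n]) || isDigit(rest[n])))
            ++n;
    } else if (isDigit(rest[0])) {
        // symbol4 branch: only digits may follow
        n = 1;
        while (n < rest.size() && isDigit(rest[n]))
            ++n;
    }
    // anything else: nothing is consumed (the else branch of symbol1)
    return n;
}
```

Under this reading, backslashSymbolLength("1a") and backslashSymbolLength("a+") are both 1, so \1a and \a+ would each lex as a shorter symbol followed by leftover input, which is presumably where the parse errors come from. For "+" the result is 0, i.e. the empty symbol followed by +.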
Sooo, there’s actually a third help page on this, https://doc.sccode.org/Reference/Literals.html, which does have details:
Symbols
A symbol can be written in two ways. One method is to enclose the contents in single quotes. Any printing character may be used within a symbol except for non-space whitespace characters (\f, \n, \r, \t, \v). Any single quotes within the symbol must be escaped (\').
'x'
'aiff'
'BigSwiftyAndAssoc'
'nowhere here'
'somewhere there'
'.+o*o+.'
'\'symbol_within_a_symbol\''
A second way of notating symbols is by prefixing the word with a backslash. This is only legal if the symbol consists of a single word (a sequence of alphanumeric and/or underscore characters).
\x
\aiff
\Big_Swifty_And_Assoc
\not really a symbol // illegal
Thus “a sequence of alphanumeric and/or underscore characters” is basically the help-given specification.
But it does look like someone has put effort into making \1a not work… because otherwise they could have added the digits to the initial chars allowed in the symbol2 branch (as opposed to adding the whole symbol4 branch).
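For comparison, this is roughly what that simpler alternative would look like: a single rule that allows digits in the first position too, which is also what the Literals wording (“a sequence of alphanumeric and/or underscore characters”) suggests when read literally. The names isWordChar and unifiedSymbolLength are hypothetical, and this is a sketch of the road not taken, not of what the lexer does.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: the characters the help page calls
// "alphanumeric and/or underscore" (ASCII only, like the lexer's range checks).
bool isWordChar(char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') || c == '_';
}

// Hypothetical single-branch rule: every word character is allowed in every position.
// `rest` is the input right after the backslash.
std::size_t unifiedSymbolLength(const std::string& rest) {
    std::size_t n = 0;
    while (n < rest.size() && isWordChar(rest[n]))
        ++n;
    return n;
}
```

With this rule unifiedSymbolLength("1a") is 2, so \1a would be one symbol, whereas the symbol4 branch in the actual lexer stops after the 1. Operators would still be excluded: "a+" gives 1 and "+" gives 0, same as before.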