What are unboxed classes (types) in SC?

It turns out the SC compiler does a two-stage compilation (parsing) of the class tree.

In the first stage, which is driven by a C parser (somewhat misleadingly co-located in the lexer file; this stage is entered in parseOneClass), it builds a tree of classes and a linked list of class extensions from the actual sc source files. These are minimally parsed in this stage, only to match closing braces and extract which class “depends on” (i.e. extends) which other. No bytecode is emitted in this stage and the Bison parser (yyparse) is never called, as far as I can tell…

In the second stage of compilation, the tree previously built is fully compiled, first by passing every “virtual compilation unit” (extracted in the first stage; every unit basically consists of an sc file and a range of line numbers where the class is located) to yyparse. After all the classes are compiled, all the class extensions are compiled. The sequence is basically driven by the calls traverseFullDepTree -> compileDepTree -> compileClass and, after that’s done, compileClassExtensions. (Alas, these functions are also located in the “lexer” file.)
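To make the ordering concrete, here is a minimal C++ sketch of the two passes as I understand them. It is not SC's code: ClassUnit, buildDepTree, visitInDependencyOrder and compileOrder are names I made up, and "compiling" here just records a name instead of calling yyparse.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a "virtual compilation unit": a file plus the
// line range where one class definition lives.
struct ClassUnit {
    std::string name;
    std::string superclass; // empty for the root class
    std::string file;
    int firstLine;
    int lastLine;
};

// Pass 1 result: superclass name -> names of its direct subclasses.
using DepTree = std::map<std::string, std::vector<std::string>>;

DepTree buildDepTree(const std::vector<ClassUnit>& units) {
    DepTree tree;
    for (const ClassUnit& u : units) {
        tree[u.superclass].push_back(u.name);
    }
    return tree;
}

// Pass 2: depth-first walk, so every superclass is handled before its
// subclasses. A real compiler would parse the unit here.
void visitInDependencyOrder(const DepTree& tree, const std::string& name,
                            std::vector<std::string>& order) {
    order.push_back(name);
    auto it = tree.find(name);
    if (it == tree.end()) return;
    for (const std::string& sub : it->second) {
        visitInDependencyOrder(tree, sub, order);
    }
}

std::vector<std::string> compileOrder(const std::vector<ClassUnit>& units,
                                      const std::vector<std::string>& extensions) {
    const DepTree tree = buildDepTree(units);
    std::vector<std::string> order;
    auto roots = tree.find(""); // classes with no superclass
    if (roots != tree.end()) {
        for (const std::string& root : roots->second) {
            visitInDependencyOrder(tree, root, order);
        }
    }
    // Class extensions are handled only after all classes are done.
    for (const std::string& ext : extensions) {
        order.push_back("+" + ext);
    }
    return order;
}
```

The point of the sketch is only that the source order of the units does not matter: the first pass collects the "extends" edges, and the second pass walks them top-down before touching any extension.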

So, to actually get to my question… Normally such two-stage compilation is needed/done in a couple of scenarios:

  • when you have unboxed objects (like ints in Java) that can be placed on the stack, like they can be in C++. E.g. in C++ you can put a struct or class on the stack; you can’t do that in Java, by the way, where only the “primitive types” get to go on the stack (and these don’t have methods in Java, only their boxed counterparts do.)

  • when you want to devirtualize method calls (for objects whose types are fully known to the compiler at library compile time). (The so-called “closed-world assumption, in which no dynamic class loading is allowed”, which is used by SC, makes devirtualization somewhat simpler to implement.)
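For the second bullet, this is roughly what I mean by devirtualization under a closed world, as a small C++ sketch. It is a generic class-hierarchy check and not something I found in SC; ClassInfo, lookup, isSubclassOf and canDevirtualize are all hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

struct ClassInfo {
    std::string superclass;        // empty for the root
    std::set<std::string> methods; // selectors defined directly in this class
};

using ClassTable = std::map<std::string, ClassInfo>;

// Walk up from cls and return the class whose method answers selector,
// or an empty string if nobody implements it.
std::string lookup(const ClassTable& table, std::string cls,
                   const std::string& selector) {
    while (!cls.empty()) {
        auto it = table.find(cls);
        if (it == table.end()) break;
        if (it->second.methods.count(selector)) return cls;
        cls = it->second.superclass;
    }
    return "";
}

// True if cls is ancestor itself or inherits from it.
bool isSubclassOf(const ClassTable& table, std::string cls,
                  const std::string& ancestor) {
    while (!cls.empty()) {
        if (cls == ancestor) return true;
        auto it = table.find(cls);
        if (it == table.end()) return false;
        cls = it->second.superclass;
    }
    return false;
}

// With the whole class tree known, a send to a receiver that is statically
// "cls or some subclass" can be bound directly if every such class resolves
// the selector to the same implementation.
bool canDevirtualize(const ClassTable& table, const std::string& cls,
                     const std::string& selector) {
    const std::string target = lookup(table, cls, selector);
    if (target.empty()) return false;
    for (const auto& entry : table) {
        if (isSubclassOf(table, entry.first, cls) &&
            lookup(table, entry.first, selector) != target) {
            return false;
        }
    }
    return true;
}
```

The check is only sound if no class can be added later, which is exactly what the closed-world assumption buys you.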

As far as I can tell, SC doesn’t actually do this latter optimization, i.e. call devirtualization. (Correct me if I’m wrong, maybe SendSpecialBinaryArithMsg below actually does that.)

But surely there are unboxed types in SC’s bytecode instructions, i.e. values that can be pushed on the stack directly:

{2 + 3}.def.dumpByteCodes

BYTECODES: (6)
  0   65       PushSpecialValue 2
  1   2C 03    PushInt 3
  3   B0       TailCallReturnFromFunction
  4   E0       SendSpecialBinaryArithMsg '+'
  5   F2       BlockReturn


{2.0 + 3.0}.def.dumpByteCodes

BYTECODES: (5)
  0   6A       PushSpecialValue 2.0
  1   40       PushLiteral Float 3.000000   00000000 40080000
  2   B0       TailCallReturnFromFunction
  3   E0       SendSpecialBinaryArithMsg '+'
  4   F2       BlockReturn

So, besides ints and floats, what other values can be unboxed in SC, in that sense? (It matters if one wants/hopes to write a minimal “core” classlib with everything else load-able later.)

(Also, why are there different opcodes for PushSpecialValue and PushLiteral?)

There is a list at the end of opcodes.h, which alas is pretty long despite its (mis)leading comment:

/*
    special classes:
    Object, List, Number, Int, Float, Signal, Complex, Point
*/
enum {
    op_class_object,
    op_class_symbol,
    op_class_nil,
    op_class_boolean,
    op_class_true,
    op_class_false,
    op_class_magnitude,
    op_class_char,
    op_class_number,
    op_class_complex,
    op_class_simple_number,
    op_class_int,
    op_class_float,
    op_class_method,
    op_class_fundef,
    op_class_stream,
    op_class_func,
    op_class_frame,
    op_class_process,
    op_class_main,
    op_class_class,
    op_class_string,
    op_class_collection,
    op_class_sequenceable_collection,
    op_class_arrayed_collection,
    op_class_array,
    op_class_int8array,
    op_class_int16array,
    op_class_int32array,
    op_class_floatarray,
    op_class_signal,
    op_class_doublearray,
    op_class_symbolarray,
    op_class_list,
    op_class_linkedlist,
    op_class_bag,
    op_class_set,
    op_class_identityset,
    op_class_dictionary,
    op_class_identitydictionary,
    op_class_sortedlist,
    op_class_synth,
    op_class_ref,
    op_class_environment,
    op_class_event,
    op_class_wavetable,
    op_class_env,

    op_class_routine,
    op_class_color,
    op_class_rect,

    op_NumSpecialClasses
};

But it’s not terribly clear to me which of those “special” classes are actually unboxed…

However, since Synth is on that specials list, for practical purposes it means that you have to (pre)compile most of the library in order for it to “work”, in the sense of not crashing the VM by accessing unloaded classes. I mean, other Smalltalk derivatives manage dynamic class loading (from user classes), but probably have a less extensive list of “specials”. Now even for SC, after the “monster classlib” is loaded, there’s no a priori reason you could not load more classes, as far as those specials are concerned…

As far as devirtualization goes, there are a bunch of “special selectors”, actually put in 3 different tables in the header file (unary, binary, and the rest), but all initialized in one function in PyrParseNode.cpp:

void initSpecialSelectors() {
    PyrSymbol** sel;
    long i;

    sel = gSpecialUnarySelectors;
    sel[opNeg] = getsym("neg");
    sel[opRecip] = getsym("reciprocal");
    sel[opNot] = getsym("not");
    sel[opIsNil] = getsym("isNil");
    sel[opNotNil] = getsym("notNil");
    sel[opBitNot] = getsym("bitNot");
    sel[opAbs] = getsym("abs");
    sel[opAsFloat] = getsym("asFloat");
    sel[opAsInteger] = getsym("asInteger");
    sel[opCeil] = getsym("ceil"); // 5
    sel[opFloor] = getsym("floor");
    sel[opFrac] = getsym("frac");
    sel[opSign] = getsym("sign");
    sel[opSquared] = getsym("squared");
    sel[opCubed] = getsym("cubed"); // 10
    sel[opSqrt] = getsym("sqrt");
    sel[opExp] = getsym("exp");
    sel[opMIDICPS] = getsym("midicps");
    sel[opCPSMIDI] = getsym("cpsmidi");
    sel[opMIDIRatio] = getsym("midiratio");
    sel[opRatioMIDI] = getsym("ratiomidi");
    sel[opAmpDb] = getsym("ampdb"); // 15
    sel[opDbAmp] = getsym("dbamp");
    sel[opOctCPS] = getsym("octcps");
    sel[opCPSOct] = getsym("cpsoct");
    sel[opLog] = getsym("log");
    sel[opLog2] = getsym("log2"); // 20
    sel[opLog10] = getsym("log10");
    sel[opSin] = getsym("sin");
    sel[opCos] = getsym("cos");
    sel[opTan] = getsym("tan");
    sel[opArcSin] = getsym("asin"); // 25
    sel[opArcCos] = getsym("acos");
    sel[opArcTan] = getsym("atan");
    sel[opSinH] = getsym("sinh");
    sel[opCosH] = getsym("cosh");
    sel[opTanH] = getsym("tanh"); // 30
    sel[opRand] = getsym("rand");
    sel[opRand2] = getsym("rand2");
    sel[opLinRand] = getsym("linrand");
    sel[opBiLinRand] = getsym("bilinrand");
    sel[opSum3Rand] = getsym("sum3rand");
    /*
        sel[opExpRand] = getsym("exprand");
        sel[opBiExpRand] = getsym("biexprand");
        sel[opGammaRand] = getsym("gammarand");
        sel[opGaussRand] = getsym("gaussrand");
        sel[opPoiRand] = getsym("poirand");
    */
    sel[opDistort] = getsym("distort");
    sel[opSoftClip] = getsym("softclip");
    sel[opCoin] = getsym("coin");

    sel[opRectWindow] = getsym("rectWindow");
    sel[opHanWindow] = getsym("hanWindow");
    sel[opWelchWindow] = getsym("welWindow");
    sel[opTriWindow] = getsym("triWindow");

    sel[opSCurve] = getsym("scurve");
    sel[opRamp] = getsym("ramp");

    sel[opDigitValue] = getsym("digitValue");
    sel[opSilence] = getsym("silence");
    sel[opThru] = getsym("thru");


    sel = gSpecialBinarySelectors;

    sel[opAdd] = getsym("+");
    sel[opSub] = getsym("-");
    sel[opMul] = getsym("*");

    sel[opFDiv] = getsym("/");
    sel[opIDiv] = getsym("div");
    sel[opMod] = getsym("mod");
    sel[opEQ] = getsym("==");
    sel[opNE] = getsym("!=");
    sel[opLT] = getsym("<");
    sel[opGT] = getsym(">");
    sel[opLE] = getsym("<=");
    sel[opGE] = getsym(">=");
    // sel[opIdentical] = getsym("===");
    // sel[opNotIdentical] = getsym("!==");
    sel[opMin] = getsym("min");
    sel[opMax] = getsym("max");
    sel[opBitAnd] = getsym("bitAnd");
    sel[opBitOr] = getsym("bitOr");
    sel[opBitXor] = getsym("bitXor");
    sel[opLCM] = getsym("lcm");
    sel[opGCD] = getsym("gcd");
    sel[opRound] = getsym("round");
    sel[opRoundUp] = getsym("roundUp");
    sel[opTrunc] = getsym("trunc");
    sel[opAtan2] = getsym("atan2");
    sel[opHypot] = getsym("hypot");
    sel[opHypotx] = getsym("hypotApx");
    sel[opPow] = getsym("pow");
    sel[opShiftLeft] = getsym("leftShift");
    sel[opShiftRight] = getsym("rightShift");
    sel[opUnsignedShift] = getsym("unsignedRightShift");
    sel[opFill] = getsym("fill");
    sel[opRing1] = getsym("ring1"); // a * (b + 1) == a * b + a
    sel[opRing2] = getsym("ring2"); // a * b + a + b
    sel[opRing3] = getsym("ring3"); // a*a*b
    sel[opRing4] = getsym("ring4"); // a*a*b - a*b*b
    sel[opDifSqr] = getsym("difsqr"); // a*a - b*b
    sel[opSumSqr] = getsym("sumsqr"); // a*a + b*b
    sel[opSqrSum] = getsym("sqrsum"); // (a + b)^2
    sel[opSqrDif] = getsym("sqrdif"); // (a - b)^2
    sel[opAbsDif] = getsym("absdif"); //
    sel[opThresh] = getsym("thresh"); //
    sel[opAMClip] = getsym("amclip"); //
    sel[opScaleNeg] = getsym("scaleneg"); //
    sel[opClip2] = getsym("clip2");
    sel[opFold2] = getsym("fold2");
    sel[opWrap2] = getsym("wrap2");
    sel[opExcess] = getsym("excess");
    sel[opFirstArg] = getsym("firstArg");
    sel[opRandRange] = getsym("rrand");
    sel[opExpRandRange] = getsym("exprand");


    sel = gSpecialSelectors;

    sel[opmNew] = getsym("new");
    sel[opmNewClear] = getsym("newClear");
    sel[opmNewCopyArgs] = getsym("newCopyArgs");
    sel[opmInit] = getsym("init");
    sel[opmAt] = getsym("at");
    sel[opmPut] = getsym("put");
    sel[opmNext] = getsym("next");
    sel[opmReset] = getsym("reset");
    sel[opmValue] = getsym("value");
    sel[opmCopyToEnd] = getsym("copyToEnd"); // used by multiple assignment
    // sel[opmIsNil] = getsym("isNil");
    // sel[opmNotNil] = getsym("notNil");
    sel[opmSize] = getsym("size");
    sel[opmClass] = getsym("class");
    sel[opmIf] = getsym("if");
    sel[opmWhile] = getsym("while");
    sel[opmFor] = getsym("for");
    sel[opmAnd] = getsym("and");
    sel[opmOr] = getsym("or");
    sel[opmCase] = getsym("case");
    sel[opmSwitch] = getsym("switch");
    sel[opmIdentical] = getsym("===");
    sel[opmNotIdentical] = getsym("!==");

    sel[opmPrint] = getsym("print");
    sel[opmAdd] = getsym("add");
    sel[opmRemove] = getsym("remove");
    sel[opmIndexOf] = getsym("indexOf");
    sel[opmWrapAt] = getsym("wrapAt");
    sel[opmClipAt] = getsym("clipAt");
    sel[opmFoldAt] = getsym("foldAt");
    sel[opmWrapPut] = getsym("wrapPut");
    sel[opmClipPut] = getsym("clipPut");
    sel[opmFoldPut] = getsym("foldPut");
    sel[opmDo] = getsym("do");
    sel[opmCollect] = getsym("collect");
    sel[opmSelect] = getsym("select");
    sel[opmReject] = getsym("reject");
    sel[opmAny] = getsym("any");
    sel[opmEvery] = getsym("every");
    sel[opmFind] = getsym("find");

    sel[opmChoose] = getsym("choose");

    sel[opmValueList] = getsym("valueList");
    sel[opmAddFirst] = getsym("addFirst");

    sel[opmPrimitiveFailed] = getsym("primitiveFailed");
    sel[opmSubclassResponsibility] = getsym("subclassResponsibility");
    sel[opmShouldNotImplement] = getsym("shouldNotImplement");
    sel[opmDoesNotUnderstand] = getsym("doesNotUnderstand"); // not really needed
    sel[opmNotYetImplemented] = getsym("notYetImplemented");

    sel[opmAtSign] = getsym("@");
    sel[opmWrapAtSign] = getsym("@@");
    sel[opmClipAtSign] = getsym("|@|");
    sel[opmFoldAtSign] = getsym("@|@");

    sel[opmMultiNew] = getsym("multiNew"); // UGens
    sel[opmMultiNewList] = getsym("multiNewList"); // UGens
    sel[opmAR] = getsym("ar"); // UGens
    sel[opmKR] = getsym("kr"); // UGens
    sel[opmIR] = getsym("ir"); // UGens

    sel[opmEnvirGet] = getsym("envirGet");
    sel[opmEnvirPut] = getsym("envirPut");

    sel[opmHalt] = getsym("halt");
    sel[opmForBy] = getsym("forBy");
    sel[opmForSeries] = getsym("forSeries");
    sel[opmReverseDo] = getsym("reverseDo");
    sel[opmLoop] = getsym("loop");
    sel[opmNonBooleanError] = getsym("mustBeBoolean");

    sel[opmCopy] = getsym("copy");
    sel[opmPerformList] = getsym("performList");
    sel[opmIsKindOf] = getsym("isKindOf");
    sel[opmPostln] = getsym("postln");
    sel[opmAsString] = getsym("asString");

    sel[opmPlusPlus] = getsym("++");
    sel[opmLTLT] = getsym("<<");
    sel[opmQuestionMark] = getsym("?");
    sel[opmDoubleQuestionMark] = getsym("??");
    sel[opmExclamationQuestionMark] = getsym("!?");

    sel[opmYield] = getsym("yield");
    sel[opmName] = getsym("name");
    sel[opmMulAdd] = getsym("madd");

    sel[opmSeries] = getsym("series");

    for (i = 0; i < opNumUnarySelectors; ++i) {
        gSpecialUnarySelectors[i]->specialIndex = i;
    }
    for (i = 0; i < opNumBinarySelectors; ++i) {
        gSpecialBinarySelectors[i]->specialIndex = i;
    }
}
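The pattern in that function (intern a symbol, store it in a table slot, then write the slot number back into the symbol) can be boiled down to a few lines. This is a toy C++ model with made-up names (Symbol, SymbolTable, kAdd and friends), not the real getsym or PyrSymbol:

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

struct Symbol {
    std::string name;
    int specialIndex; // -1 means "not a special selector" in this sketch
};

class SymbolTable {
public:
    // Return the unique Symbol for name, creating it on first use.
    // unordered_map keeps element addresses stable, so the pointer stays valid.
    Symbol* intern(const std::string& name) {
        auto it = symbols_.find(name);
        if (it == symbols_.end()) {
            it = symbols_.emplace(name, Symbol{name, -1}).first;
        }
        return &it->second;
    }

private:
    std::unordered_map<std::string, Symbol> symbols_;
};

// A three-entry stand-in for the binary selector enum.
enum { kAdd, kSub, kMul, kNumBinarySelectors };

std::vector<Symbol*> initBinarySelectors(SymbolTable& table) {
    std::vector<Symbol*> sel(kNumBinarySelectors);
    sel[kAdd] = table.intern("+");
    sel[kSub] = table.intern("-");
    sel[kMul] = table.intern("*");
    // Same idea as the loops at the end of initSpecialSelectors: each symbol
    // learns its own table index.
    for (int i = 0; i < kNumBinarySelectors; ++i) {
        sel[i]->specialIndex = i;
    }
    return sel;
}
```

With that in place, anything holding a selector symbol can tell in constant time whether it is in the table, and at which index, by reading specialIndex.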

Now, the fact that a selector is on that list doesn’t necessarily mean it’s devirtualized, but often enough it is. E.g. when (even user-subclassed) Environments are accessed, the calls to envirGet and envirPut (which do appear in that specials list) are devirtualized by default.

On the other hand, there does not seem to be a generalized devirtualization process for stuff in the classlib, i.e. besides what’s baked into the specials’ implementations.

Side note: in GHC (Haskell) you can access the unboxed types with the “magic hash” extension, but even there, you cannot pass unboxed types to polymorphic functions.

the closest analogue is the set of tags that can be used on a PyrSlot besides PyrObject.

those are:

  • integer
  • float
  • char
  • raw pointer (used inside handle classes like File, not typically encountered in everyday SC)
  • true
  • false
  • nil
  • symbol
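as a rough picture of what "a tag plus a payload" means for the list above, here is a toy C++ slot. this is not the real PyrSlot definition or layout; Tag, Slot and the make* helpers are invented for illustration, and symbols are left out of the payload for brevity.

```cpp
#include <cassert>
#include <cstdint>

// One tag per kind of value listed above, plus Object for heap objects.
enum class Tag { Int, Float, Char, RawPtr, True, False, Nil, Symbol, Object };

struct Slot {
    Tag tag;
    union {
        std::int32_t i; // Tag::Int
        double f;       // Tag::Float
        char c;         // Tag::Char
        void* ptr;      // Tag::RawPtr and Tag::Object
    } u;
};

Slot makeInt(std::int32_t v) { Slot s; s.tag = Tag::Int; s.u.i = v; return s; }
Slot makeFloat(double v) { Slot s; s.tag = Tag::Float; s.u.f = v; return s; }
Slot makeChar(char v) { Slot s; s.tag = Tag::Char; s.u.c = v; return s; }
// true, false and nil need no payload at all: the tag is the value.
Slot makeNil() { Slot s; s.tag = Tag::Nil; s.u.ptr = nullptr; return s; }
Slot makeObject(void* heapObject) {
    Slot s;
    s.tag = Tag::Object;
    s.u.ptr = heapObject;
    return s;
}

// Only Object slots refer to something allocated on the heap.
bool needsHeapObject(const Slot& s) { return s.tag == Tag::Object; }
```

in this model an int or float lives entirely inside the slot, which is the sense in which those values are "unboxed".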

there are also several other types (i think mostly subclasses of PyrObject) that can go in a PyrSlot which you can see from the union below. those are sometimes used inside the interpreter when it is already certain what is contained in the slot.

the list of classes you found is the list of “intrinsic classes” in SuperCollider. these are classes which are in some way baked into the C++ implementation; the implementation assumes they must be there.

Some values are just “special” :)

The opcode PushLiteral makes a reference to the list of literal values stored in the nearest activation frame. (The opcode 40 literally means grab the frame’s first literal, 41 grabs the second, and so on. Examine the byte codes of { 3.0 + 4.0 } to see this.) But to save space and time some common values are encoded directly in opcodes. +1, -1, 0, and their floating-point counterparts are some of them.
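A toy decoder may make the difference clearer. This C++ sketch handles only the push opcodes visible in the dumps in the question (65, 6A, 2C, 40); everything else about it is an assumption on my part, including the upper bound 0x4F for PushLiteral, treating the PushInt operand as unsigned, and the function name decodePushes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Describe each instruction in a byte stream. Literals are modelled as a
// plain vector of doubles, standing in for the frame's literal table.
std::vector<std::string> decodePushes(const std::vector<std::uint8_t>& code,
                                      const std::vector<double>& literals) {
    std::vector<std::string> out;
    std::size_t pc = 0;
    while (pc < code.size()) {
        const std::uint8_t op = code[pc++];
        if (op == 0x2C) {
            // PushInt: the value is in the next byte (sign handling omitted).
            assert(pc < code.size());
            out.push_back("int " + std::to_string(static_cast<int>(code[pc++])));
        } else if (op >= 0x40 && op <= 0x4F) {
            // PushLiteral: the low bits of the opcode index the literal table.
            const std::size_t index = op - 0x40;
            assert(index < literals.size());
            out.push_back("float " + std::to_string(literals[index]));
        } else if (op == 0x65) {
            // PushSpecialValue 2: the value is implied by the opcode itself.
            out.push_back("int 2");
        } else if (op == 0x6A) {
            // PushSpecialValue 2.0: likewise, no operand and no table lookup.
            out.push_back("float 2.000000");
        } else {
            // Assumed to be a one-byte opcode, true for B0, E0 and F2 above.
            out.push_back("other");
        }
    }
    return out;
}
```

So the three push forms trade generality for size: a special value costs one byte and no lookup, PushInt costs an operand byte, and PushLiteral costs one byte plus an entry in the literal table.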

I have a long draft markdown file that documents and explains the behavior of each opcode. I hadn’t published it because it is not complete yet. I can share it as a gist if that would help you.

no devirtualization for SendSpecialBinaryArithMsg – the call still goes through a large switch-case on the tag type. i’m trying to think of an opcode that assumes the type of arguments, but coming up blank. there are some primitives that assume the type of stack arguments, but that is more out of optimism than certainty, haha.
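to illustrate the shape of that dispatch (and only the shape), here is a small C++ sketch of a '+' that switches on both operand tags and gives up on anything it does not recognise. Tag, Slot and addSpecial are invented names, the slot is simplified to three tags, and integer overflow is ignored.

```cpp
#include <cassert>
#include <cstdint>

enum class Tag { Int, Float, Object };

struct Slot {
    Tag tag;
    std::int32_t i; // valid when tag == Tag::Int
    double f;       // valid when tag == Tag::Float
};

Slot makeInt(std::int32_t v) { return Slot{Tag::Int, v, 0.0}; }
Slot makeFloat(double v) { return Slot{Tag::Float, 0, v}; }
Slot makeObject() { return Slot{Tag::Object, 0, 0.0}; }

// Returns true if the addition was done inline. Returning false means the
// caller has to fall back to an ordinary message send of '+'.
bool addSpecial(const Slot& a, const Slot& b, Slot& result) {
    switch (a.tag) {
    case Tag::Int:
        switch (b.tag) {
        case Tag::Int:
            result = makeInt(a.i + b.i);
            return true;
        case Tag::Float:
            result = makeFloat(a.i + b.f);
            return true;
        default:
            return false;
        }
    case Tag::Float:
        switch (b.tag) {
        case Tag::Int:
            result = makeFloat(a.f + b.i);
            return true;
        case Tag::Float:
            result = makeFloat(a.f + b.f);
            return true;
        default:
            return false;
        }
    default:
        return false;
    }
}
```

nothing here is decided at compile time: the tags are inspected on every call, which is why this is a fast path rather than devirtualization.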

Yes that would be helpful.

Done, in a separate thread