Are there any loop functions inlined in sclang?

Calls to if are inlined to conditional jumps in the bytecode when both “branch blocks” (functions) have no arguments and no variables of their own.
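A quick way to see that condition in action (my example; x here is assumed to be an interpreter variable):

{ if(x > 0) { 1 } { 0 } }.def.dumpByteCodes   // both branches inline to jumps

{ if(x > 0) { var y = 1; y } { 0 } }.def.dumpByteCodes
// the true branch declares its own variable, so it's compiled as a real
// FunctionDef and sclang warns that it will not be inlined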

Are there any loop constructs that get similarly inlined? For usable loops you can’t expect the argument function to have no arguments at all, so I don’t expect the same check to apply as it does for if.

while (which I had noted in the other thread).

hjh

I don’t know for Sc, but the Ansi Smalltalk draft has a list of “traditionally open-coded messages” (p.17).

It includes to:do:, to:by:do: and timesRepeat:

https://wiki.squeak.org/squeak/uploads/172/standard_v1_9-indexed.pdf

The rationale paragraph (which is nice) notes that:

Other messages such as #whileTrue: have also been traditionally
opencoded but typically their receivers are block constructors. Thus,
it is feasible to only open-code messages with compile-time verifiable
receivers and there is no conflict with the polymorphic use of the
messages.

The BlockClosure>>whileTrue: comment in Squeak says:

“Ordinarily compiled in-line, and therefore not overridable.
This is in case the message is sent to other than a literal block.”

I can see how while was able to leverage the no-args-and-no-vars function-block inlining. Unfortunately, that means you have to write efficient loops in SC in something like C89/C90 style, with the loop variable declared outside the loop body:

i = 0; // the loop var must come from an outer scope (here, interpreter variable i)
while { i < 10 } { i = i + 1 }
i

{ while { i < 10 } { i = i + 1 } }.def.dumpByteCodes
/*
BYTECODES: (15)
  0   1A       PushInstVar 'i'
  1   2C 0A    PushInt 10
  3   E8       SendSpecialBinaryArithMsg '<'
  4   F9 00 07 JumpIfFalsePushNil 7  (14)
  7   1A       PushInstVar 'i'
  8   6B       PushOneAndAdd
  9   07 0A    StoreInstVarX 'i'
 11   FD 00 0C JumpBak 12  (0)
 14   F2       BlockReturn
-> < closed FunctionDef >
*/

It’s the JumpBak opcode that allows this (it isn’t needed for if).
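For contrast (my example), declaring a variable inside the body block defeats the inlining, and you get a real message send instead:

{ while { i < 10 } { var j = 1; i = i + j } }.def.dumpByteCodes
// sclang warns "FunctionDef contains variable declarations and so
// will not be inlined" and emits an actual send of 'while' instead
// of the JumpBak loop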

ArrayedCollection.do says

	do { arg function;
		// special byte codes inserted by compiler for this method
		var i=0;
		while ({ i < this.size }, {
			function.value(this.at(i), i);
			i = i + 1;
		})
	}

I’m not sure whether the comment refers just to the while it uses or to do itself. Based on the discussion here, I guessed it’s just the inner while that’s being inlined. And indeed, it’s only the while that’s inlined:

{ (1..5).do { i = i + 1 } }.def.dumpByteCodes

/*
BYTECODES: (10)
  0   64       PushSpecialValue 1
  1   6E       PushSpecialValue nil
  2   2C 05    PushInt 5
  4   04 00    PushLiteralX instance of FunctionDef - closed
  6   B0       TailCallReturnFromFunction
  7   C4 40    SendSpecialMsg 'forSeries'
  9   F2       BlockReturn
*/

So it does seem to use an actual function for { i = i + 1 }, which itself is not inlined here. Interestingly, though, there’s no message send to any do, so that’s also inlined… a little bit.

That’s almost the same bytecode as if you had written:

{ forSeries (1, 2, 5) { i = i + 1 } }.def.dumpByteCodes

/*
BYTECODES: (10)
  0   64       PushSpecialValue 1
  1   65       PushSpecialValue 2
  2   2C 05    PushInt 5
  4   04 00    PushLiteralX instance of FunctionDef - closed
  6   B0       TailCallReturnFromFunction
  7   C4 40    SendSpecialMsg 'forSeries'
  9   F2       BlockReturn
*/

There’s a special message for for-style series generation, but it still takes an actual function as its last argument; i.e. the function that makes up the “block” is itself not inlined for these constructs.

Not exactly. The optimization in (start .. end).do is not inlining, but rather avoiding physically constructing the array.
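One way to see the difference (my comparison): an array literal really does build the array and send do, while the series form goes through forSeries without any array in sight:

{ [1, 2, 3, 4, 5].do { i = i + 1 } }.def.dumpByteCodes  // pushes the array, sends 'do'

{ (1..5).do { i = i + 1 } }.def.dumpByteCodes           // sends 'forSeries'; no array is built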

I’m curious though, what are your goals in using SC? Are these discussions helping you achieve those goals, or distracting attention away from them?

hjh

I was trying to understand why sclang loops are so slow in relation to https://scsynth.org/t/much-faster-channelpeaks-client-side/

I for one do enjoy these topics and the in-depth analysis and am grateful for your work @Avid_Reader

The deeper answer is that SC isn’t designed for, and never was designed for, mass number crunching. It’s designed for a high degree of flexibility over short bursts of calculation, spread out in time. I think the flexibility is why James McCartney chose Smalltalk as a model rather than, say, Java.

Non-inlined looping is only one reason why channelPeaks is slow. The other (which I think is at least as significant) is that math operators, being methods, dispatch at runtime, not at compile time. This is a big performance drain, but it also allows simple user code to handle math over arrays of any geometry (mixed data types, sub-arrays of different sizes, etc.).
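For instance, the same + handles all of these without any user-side type checking (my example):

[1, 2.5, [3, 4]] + 1     // -> [ 2, 3.5, [ 4, 5 ] ]
[1, 2] + [10, 20, 30]    // -> [ 11, 22, 31 ]  (the shorter array wraps)
2 + 3                    // plain scalar math, same selector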

SC’s primary use case, in the control layer, is for audibly discrete events, which in most cases implies, oh, more than 40-50 ms between events. If an Event is doing, let’s say, 100-200 math operations, and runtime dispatch means that it takes an extra half-ms or ms to complete those, this is not significant compared to the usual inter-onset interval. It is significant if you’re trying to process 100 million audio samples… leading to the conclusion that sclang is a poor choice for that task.
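A rough way to put numbers on this (my sketch; absolute results will vary by machine) is Function:bench:

// a million dispatched additions through the full method-lookup path
bench { var sum = 0; 1000000.do { |i| sum = sum + i } };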

For the specific case of finding the max sample or normalizing, an NRT server is much faster than anything sclang can do, even if loops were somehow inlined. I have some code that does this but I have to run out the door, can share it later.

hjh

This is perhaps a bit off topic, but Smalltalks can be (and many are!) very fast.

For instance, the OpenSmalltalk VM (https://github.com/OpenSmalltalk/opensmalltalk-vm) on an oldish laptop reports:

“2,600,000,000 bytecodes/sec; 230,000,000 sends/sec”

I’m completely inexpert, but I think the key optimisations are “dynamic translation” and “in-line caching”, both from 1984:

L. Peter Deutsch and Alan Schiffman
“Efficient Implementation of the Smalltalk-80 System.”
Proc. Principles of Programming Languages,
Salt Lake City, UT, 1984
http://web.cs.ucla.edu/~palsberg/course/cs232/papers/DeutschSchiffman-popl84.pdf

and “polymorphic inline caches”, from 1991:

Urs Hölzle, Craig Chambers, and David Ungar
“Optimizing Dynamically-Typed Object-Oriented Programming Languages with Polymorphic Inline Caches”
ECOOP ‘91 Conference Proceedings,
Geneva, Switzerland, July, 1991
https://bibliography.selflanguage.org/pics.html

Anyway, late-bound object systems can be very fast!