Discrete logarithm

Yann_Ics · March 20, 2024, 1:20pm

Hi all,
In the frame of the extension/quark cycle, I am working to implement a new cyclic algorithm: the discrete logarithm. Apriori, it works like it should, only it returns weird result when the big numbers are involved. I know this is due to the limit of the SuperCollider implementation, but I would like either add a condition with warning when numbers reach the limit of the SC implementation, or to implement this algorithm in C++, in order to remove any whimsical result.
Here is the draft code:

(
~discreteLog = { |g, p, i=1, res|
	var sm = ~squareAndMultiply.(g,i)%p;
	// or
	// var sm = (g**i).asInteger%p;
    if (res.isNil.not && res.asArray.includes(sm))
	// or
	// if (res.isNil.not && (res.asArray[0] == sm))
	{
		res
	}
	{
		if ((i>1000) || (sm<=0))
		{ "you reached the limits of ...".error; res }
		{ res = res.add(sm); ~discreteLog.(g, p, i+1, res) }
	}
};

~squareAndMultiply = {
	|n, exp|
	var res=1;
	exp.asDigits(2).do{|it| res = res * res; if(it == 1) {res = res * n}};
	res
}
)

~discreteLog.(27, 23);
// the result should be: [ 4, 16, 18, 3, 12, 2, 8, 9, 13, 6, 1 ]

Thanks, Y.

smoge · March 20, 2024, 1:56pm

Have you tried using Float instead of Integer? 27.0 instead of 27 in this case. Then, you will have a 64-bit floating-point number, and the range will be much, much larger.

(Or maybe change the code to convert it to Float. I haven’t checked the quark yet)

Nowadays, there are good algorithms for arbitrary big numbers; it may be possible in the future to implement in sclang.

Yann_Ics · March 20, 2024, 2:01pm

Yes, it seems that it works with float. Let me check more further…

Yann_Ics · March 20, 2024, 2:31pm

I confirm it works with Float.
Only, SuperCollider for this kind of computation in comparison with C++ or Common Lisp is rather limited.
Yes, let see in the future if algorithms for arbitrary big numbers will be relevant within SC.

smoge · March 20, 2024, 2:47pm

Ya, some languages have different types (like Int for fixed-precision integer, and Integer for arbitrary big numbers).

See what happens in the binary side of things with fixed-precision (<< shifts bits left):

(1 << 30).postln.asBinaryString(32).postln
// 1073741824
// 01000000000000000000000000000000

(1 << 31).postln.asBinaryString(32).postln
// -2147483648
// 10000000000000000000000000000000

jordan · March 20, 2024, 4:26pm

…wait, that means Float has a larger integer range than Integer by 2^22 (~4mil).

okay I just checked the source code and I'm so confused...

Here is some code displaying this oddity.

(1 << 62).postln.asBinaryString(64).postln
// 0000000000000000000000000000000000000000000000000000000000000000

(1 << 63).postln.asBinaryString(64).postln
// 0000000000000000000000000000000000000000000000000000000000000000

// so at this point you'd probably expect it just to be zeroing 
// out everything after 32.... nope

(1 << 64).postln.asBinaryString(64).postln
// 0000000000000000000000000000000000000000000000000000000000000001

Int is 64bits.
supercollider/lang/LangSource/PyrSlot64.h at 36eca40ccf50f735f03a3e7dabced6ee718cf0ac · supercollider/supercollider · GitHub

but it only returns to sclang and is set through a 32bit interface

supercollider/lang/LangSource/PyrSlot64.h at 36eca40ccf50f735f03a3e7dabced6ee718cf0ac · supercollider/supercollider · GitHub

I wonder if there is a reason for this, or if it was forgotten?

Also, if there was a compiler flag to set int to 64bits (which would be completely valid in the C++ spec) would all sclang ints suddenly be 64bit? Could you do this with macros?

Make ScLang's integer 64bits by JordanHendersonMusic · Pull Request #6242 · supercollider/supercollider · GitHub

smoge · March 20, 2024, 9:27pm

. It just happens you have more bits (and significand figures) with Float (actually Double / 64bit) right now.

smoge · March 21, 2024, 12:30pm

The thing about Float 32 vs. Doubles is that, as I understand, there is not much point in using 32-bit floating-point.

Floats (probably 32-bits) are almost always a bad idea, unless you Really Know What You Are Doing. Use Doubles. There’s rarely a speed disadvantage—modern machines will use the same floating-point unit for both. With Doubles, you are much less likely to hang yourself with numerical errors. One time when Float might be a good idea is if you have a lot of them, say a giant array of Floats.

I’m not sure about Int32 vs Int64. (is it ?)

jordan · March 21, 2024, 1:27pm

That test is on a 64bit architecture, I imagine the performance on 32bits system (I’m thinking about raspberry pi in particular) might be very different. I haven’t been able to find one though…

smoge · March 22, 2024, 5:52am

A function in the files you edited could be safer even with int32, sc_lcm int function in include/plugin_interface/SC_InlineBinaryOp.h

Instead of this:

       return (a * b) / sc_gcd(a, b);

This?

      return (a / sc_gcd(a, b)) * b;

(Avoid multiplying before dividing, minimizing the risk of overflow. That’s just a tiny detail, of course)

smoge · March 22, 2024, 6:12am

Not exactly it, but

Spacechild1 · March 22, 2024, 7:37am

I think the primary reason why sclang uses 64-bit floating point numbers, but only 32-bit integers, is because the original 32-bit interpreter uses a technique called NaN-tagging (AKA NaN-boxing). The idea is to use some NaN bit patterns in the upper 16 bits of a 64-bit float to designate the object type, so the lower 48 bits can be used to store other types. On a 32-bit systems, pointers are 32 bit wide, so that is no problem, but you are also confined to 32 bit integers *). See lang/LangSource/PyrSlot32.h.

On 64-bit systems, sclang actually uses simple tagged unions, see lang/LangSource/PyrSlot64.h. **) Theoretically, sclang could have “upgraded” to 64-bit integers, but it would have also required updating the whole code base, which probably would have been too much work and error prone, so they just kept the 32-bit integer type.

Here’s a nice article that explains this concept (and alternatives): NaN boxing or how to make the world dynamic - Blog by Piotr Duperas

*) Actually, with NaN-tagging you could use 48-bit integers (= 64-bit integer with the upper 16 bits masked off).

**) Actually, NaN-tagging does work for 64-bit systems, assuming that only the lower 48 bits of pointers are valid and the OS does not use the upper 16 bits (which is typically the case).

smoge · March 22, 2024, 8:12am

Thank you for explaining.

The different subclasses of RawArray do not imply other types in the language, right? One could not use a 32-float, for example, since it would create problems with all algebraic operations without design changes. Maybe the same reason an additional Integer type would be a problem too?

Int8Array - 8 bit integer
Int16Array - 16 bit integer
Int32Array - 32 bit integer
FloatArray - 32 bit floating point
DoubleArray - 64 bit floating point
SymbolArray - symbols

Spacechild1 · March 22, 2024, 9:06am

The different subclasses of RawArray do not imply other types in the language, right?

That’s a good point! They do not necessarily refer to actual types in the language. There is still only a single floating point and integer type (not counting Char).

Note that it is always safe to cast between float and double; in the worst case you just lose some bits of precisions. In the case of FloatArray, the floats are internally kept as an array of (32-bit) float, but whenever you access individual members, they are necessarily converted back to (64-bit) double.

With integers it’s a bit more complicated: signed integer overflow is undefined behavior in C/C++, so you cannot simply save any 32-bit number in, say, an 8-bit signed integer. I just quickly checked and sclang does not do any range checks, so the following code would actually trigger undefined behavior:

a = Int8Array[1000, 2000, 3000];

smoge · March 22, 2024, 10:01am

A good behavior could be to wrap around, stay within the type range, and print a warning.

jordan · March 22, 2024, 10:33am

Ah I saw something odd with the floats, thanks!

What I find confusing is that the slot stores an int64, but all functions use just the normal int (which could be as low as 16bit according to the standard…). Also that the char is big?

Do you think a change to fixed width integers might be worthwhile?

I’ve made a draft pr of this, it’s only a few hundred lines, mostly using Clion’s refactoring of function signatures and took an hour or two, I’ve run the unit tests and haven’t found any bugs (on Linux)… which probably means there aren’t enough tests.

I haven’t got the windows 32bit ci to work yet, but that might be more interesting.

smoge · March 22, 2024, 10:53am

One would need to check each integer assignment throughout the code, maybe hundreds.

Lot of work, but I think it would be worth it.

(and an opportunity to write more tests)

Spacechild1 · March 22, 2024, 4:12pm

Ah, you’re right, I didn’t catch this! But yeah, they didn’t bother to update the whole code base to 64-bit integers – which is understandable in a way because it would be a massive undertaking.

(which could be as low as 16bit according to the standard…)

Only on some microcontrollers

Also that the char is big?

Yes. What is even more confusing is that the setter (SetChar) uses a char parameter, but the getter (slotRawChar) returns an int…

The case of Char is a bit curious in general. Is there any other (popular) scripting language that has a dedicated integer type for characters? I guess the idea was that Char should have its own set of methods and not behave as a number; in particular, it should not participate in arithmetic operations. That’s why it only inherits from Magnitude, but not SimpleNumer. However, it would be very inefficient if it was a regular (heap-allocated) object, so it had to be a built-in type.

Spacechild1 · March 22, 2024, 4:41pm

A good behavior could be to wrap around, stay within the type range, and print a warning.

At least that’s what happens for Char, see the _AsAscii primitive:

SetChar(a, slotRawInt(a) & 255);

.asAscii resp. .asDigit are the only methods for explicitly creating a Char.

However, this is the current implementation of .put for Int8Array and Int16Array, see putIndexedSlot:

case obj_int16:
        if (NotInt(c))
            return errWrongType;
        ((int16*)(obj->slots))[index] = slotRawInt(c);
        break;
    case obj_int8:
        if (NotInt(c))
            return errWrongType;
        ((int8*)(obj->slots))[index] = slotRawInt(c);
        break;

As you can see, there are no range checks. (Not to mention the pointer casts that violate strict aliasing rules.)