Choosing how to choose with self-normalising weights

I think everyone on the thread has been willing, at one point or another, to accept @julian’s suggestion of the mirrored option. Some of us (myself and OP) would prefer fewer methods, some more (@julian), and some are happiest with this option (@mike). Perhaps this is a consensus?

Regarding mixing weights and choice: as Eric pointed out, this is the interface of TWChoose.

I kind of like this option:

choose
wchoose(weights)            // accepts any list (or maybe only positive numbers, to avoid ugly errors)
wchooseN(normalizedWeights) // accepts only already-normalized (normalizeSum-ed) arrays

Because you don’t need wchooseN unless you’re doing something performance-critical and need to preprocess all the arrays beforehand. So one more letter is not a big deal compared to normalizing the arrays ahead of the time-critical moment.

Running .normalizeSum once on a small, human-readable array (fewer than a thousand numbers, say) takes imperceptible time; on a modern machine it would be on the order of microseconds even in the worst case.

One issue here is that it could degrade the performance of existing code (at least a little!). The mirrored solution leaves existing code untouched.

To actually degrade performance, one would need arrays that are not human-writable (not the kind of list you would type on the keyboard) and use them in a loop, in something like a granular-synthesis scheduling situation. Those are not common cases.

That’s the use case numpy targets, for example (huge lists, heavy operations, etc.). That’s why numpy requires a preprocessed, normalized list, while the Python standard library doesn’t: for the kinds of lists you would pass it, normalization just isn’t a significant cost.

About performance, the Walker/Vose “Alias Method” is very nice; see https://en.wikipedia.org/wiki/Alias_method.
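For illustration, here is a minimal, unoptimized sketch of the alias method in sclang (buildAliasTables and aliasChoose are hypothetical helper names, transcribed from the description linked above, not library code): the setup is O(n), and every draw afterwards is O(1).

(
var buildAliasTables = { |weights|
    var n = weights.size;
    var p = weights.normalizeSum * n;    // scale so the average weight is 1
    var prob = Array.newClear(n);
    var alias = Array.newClear(n);
    var small = List.new, large = List.new;
    p.do { |x, i| if (x < 1.0) { small.add(i) } { large.add(i) } };
    while { small.notEmpty and: { large.notEmpty } } {
        var s = small.pop, l = large.pop;
        prob[s] = p[s];
        alias[s] = l;                     // the overflow of column s is redirected to l
        p[l] = p[l] + p[s] - 1.0;
        if (p[l] < 1.0) { small.add(l) } { large.add(l) };
    };
    small.do { |i| prob[i] = 1.0 };       // leftovers keep their own index
    large.do { |i| prob[i] = 1.0 };
    [prob, alias]
};
var aliasChoose = { |items, tables|
    var i = items.size.rand;              // one uniform index...
    if (1.0.rand < tables[0][i]) { items[i] } { items[tables[1][i]] } // ...plus one biased coin flip
};
var tables = buildAliasTables.([1, 3, 5]);
({ aliasChoose.([\a, \b, \c], tables) } ! 16).postln;
)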

About notation, Mathematica uses an association to distinguish the weighted and neutral cases, which is quite nice too, e.g.:

// Weighted random choice
([1, 3, 5] -> [$a, $b, $c]).choose

// Weighted random 16-vector
([1, 3, 5] -> [$a, $b, $c]).choose(16)

// Weighted random 8x8-matrix
([1, 3, 5] -> [$a, $b, $c]).choose([8, 8])

// First 16 items from weighted random stream
([1, 3, 5] -> [$a, $b, $c]).chooseStream.nextN(16)
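
Something close to that notation already works in sclang with a plain Association and a small helper; a minimal sketch (chooseAssoc is a hypothetical name, and here it normalizes the relative weights itself):

(
var chooseAssoc = { |assoc|
    // key holds the relative weights, value holds the items
    assoc.value.at(assoc.key.normalizeSum.windex)
};
chooseAssoc.([1, 3, 5] -> [$a, $b, $c]).postln;             // one weighted choice
({ chooseAssoc.([1, 3, 5] -> [$a, $b, $c]) } ! 16).postln;  // a weighted random 16-vector
)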
3 Likes

Personally, I think this thread is a good example of why some things in sclang have come to a halt. I look at it too often from a frustrated point of view, because I would prefer a more fluid, consistent, and elegant API, so that we could create something interesting and new together as a community. On the other hand, a more conservative approach also has its advantages, such as a stable API. A stable API is actually quite important for a medium like generative music, which is hard to archive compared to, say, recorded music.

And while this approach may actually be boring - in a good way - I think anyone who has worked with (Node)JS professionally understands the problems that instability creates, which a small community like SC could not handle: projects would stop working and people would simply turn away.
The development cycles of sclang are not weeks, as in JS, but years. Code, components, and compositions are often stored for years or decades, which makes adapting to changes harder, since this is not code that is run and developed daily, nor exclusively by professional coders.

There is a very interesting talk about Go: the Go team decided that nothing can ever be removed from the 1.0 implementation, so while they can add new features, they can never take anything away, which I think somehow matches SuperCollider.

Inspired by this talk: maybe we shouldn’t worry so much about the standard library, shouldn’t try to turn sclang into a different language or force things onto it, but should instead focus more on improvements within the language itself, the tools, the bugs, the docs, the tutorials, the community?
There is a lot of work to be done, and there has been some very interesting momentum on various issues recently.
And if one still wants to break stuff, many things can be done via Quarks :slight_smile:

Sorry if this is too meta/OT.

4 Likes

Hi!
In the post you linked about Go, they say the following about breaking changes:

It is intended that programs written to the Go 1 specification will continue to compile and run correctly, unchanged, over the lifetime of that specification. At some indefinite point, a Go 2 specification may arise, but until that time, Go programs that work today should continue to work even as future “point” releases of Go 1 arise (Go 1.1, Go 1.2, etc.).

Which is something that most software I know follows, i.e. only allowing breaking changes in major releases.

About preserving older SC code:

I don’t know whether this is already being done, but a version number could be stored in SuperCollider files’ metadata. Code is also shared by pasting; a recommended good practice could be to put the version in a header comment.
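For example (a minimal sketch; as far as I remember, Main.version reports the running interpreter version as a string):

// something one could paste at the top of shared code:
"// written with SuperCollider %".format(Main.version).postln;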

I do think this is maybe too meta for this thread, since the suggestion wasn’t to remove anything, but to add a nicer API without breaking other code.

I also think that it may be time for voting, as @mike suggested. How is that handled on the SC forums?

Thank you for your meta-comments. I also have things to say about this, but I didn’t want to decenter the thread. Could we keep it about the topic itself for a moment, please?

A general discussion on backward compatibility will most probably be far more complicated than one on choose.

For now, I would say: we surely don’t want to break any code for a little convenience like the one we are discussing here.

1 Like

Out of interest: I was asking the community for advice on a difficult but relatively unimportant matter; I don’t think voting would help. It is about learning together.

It may well be that we change nothing!

1 Like

Ok, thanks!
Could we maybe then center the discussion on the choice between

// 2.
arr.choose
arr.wchoose(normalizedWeights)
arr.wchoosen(relativeWeights)

and

// 6.
arr.choose(weights, normalize=true)
arr.wchoose(weights, normalize=false)

?

Those share many of the pros mentioned in mike’s post.

But I think 2 has more cons, namely:

  • doesn’t unify the interface between wchoose and TWChoose.
  • isn’t as discoverable as 6.
  • the behaviour suggested by default is the unexpected one. In his C++ book, Bjarne Stroustrup makes the point that code should be expressed in a correct, simple, and efficient manner, in descending order of importance. Because of that, I think the most succinct / default option should be the one that produces the expected result, rather than the most efficient one.
  • doesn’t fit the iterative workflow outlined by @semiquaver as well; there are more changes at each step.
  • the n suffix is not that expressive; does it mean that the weights have to be normalized, or that the method will normalize the weights?

The con of 6 would be:

  • Non-sequenceable collections can’t use weights
2 Likes

To clarify, putting quotes on “votes” was just a nudge toward consensus, not a request to actually vote :wink: I think we have plenty of info to go on at this point, but I’m also happy if the discussion continues…

I agree it’s a fun and educational discussion, and in some cases that leads to no change, which is ok. But here I think we have a couple of good options, both well motivated, backwards compatible, and natural extensions of the current behavior. So I would hope we can agree on one, or be ok with someone making the call if there are no strong objections to either option.

It would be a shame if the primary issue—namely, just having a way for weighted selection to use relative rather than pre-normalized weights—wasn’t addressed in the end.

(btw thanks @Asmatzaile for clarifying the correct meaning of the arguments in the options you listed)

2 Likes

Yes, I think so. BTW, that’s the one Knuth analyzed in his book.

This SC/Mathematica hybrid would be even better than either of them.

Not sure if this has come up, but fwiw the documentation for wchoose states explicitly that the weights MUST be normalized, and indeed the output is essentially undocumented nonsense if they are not. So, purely in terms of behavior, we would get:

  1. array.wchoose(weights.normalizeSum); // same behavior
  2. array.wchoose(nonNormalWeights); // old behavior was undefined, new behavior is the intuitively correct result

The only code we would break is code that relies on undocumented and unintuitive implementation details, whereas we would almost certainly fix existing bugs in user code and make new code more straightforward and more efficient.
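
A quick illustration of case 2 (this relies on the current cumulative-sum behaviour of windex, which is exactly the undocumented detail in question): with weights summing to more than 1, the running sum passes the 0..1 random draw at the first element, so the first item comes back every time.

(
({ [\a, \b, \c].wchoose([1, 3, 5]) } ! 8).postln;               // always \a: the weights sum to 9
({ [\a, \b, \c].wchoose([1, 3, 5].normalizeSum) } ! 8).postln;  // the intended 1 : 3 : 5 distribution
)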

1 Like

Also, on the performance tip: it’s worth noting that, when considering old code like array.wchoose(weights.normalizeSum), the most expensive thing BY FAR is that normalizeSum copies the entire array.

(
var a = (0.0..10.0);
var asig = Signal.newFrom(a);
bench { 10000.do { asig.peak } };       // one pass over the data (max absolute value)
bench { 10000.do { a.normalizeSum } };  // sums the array, then divides into a newly allocated copy
bench { 10000.do { |i| a[i] } };        // a single array index, for scale
)

On my machine, the second is 5x as slow as the first. I used peak because it approximates what we would do for wchoose normalization - namely, visit every item in the array and perform one-ish instruction on it. The third is for scale - just indexing into an array is roughly the same time-cost as .peak. So the result would be: old user code gets slower by about the cost of one array index, and new code is 5x as fast.

2 Likes

About the cost of normalizeSum, ordinarily one only normalizes once per distribution, not once per choice?

I.e. array.wchoose(weights.normalizeSum) is precisely the thing to avoid?

The idiom is rather var weights = distribution.normalizeSum; { population.wchoose(weights) } ! shape?

Which is why adding a normalize stage at wchoose is a somewhat eccentric idea?
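
Spelled out with placeholder values, that idiom reads:

(
var population = [\a, \b, \c];
var weights = [1, 3, 5].normalizeSum;            // normalize once per distribution
({ population.wchoose(weights) } ! 16).postln;   // then draw as many times as needed
)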

Ps. The timings below compare the Walker/Vose algorithm (randomChoice) and wchoose.

var k = 1E4;
var w = ({ 0.1.rrand(1.0) } ! k).normalizeSum;
var e = (1 .. k);
{ { e.wchoose(w) } ! k }.bench; // k=1E4=0.186 k=1E5=17.7
{ (w -> e).randomChoice(k) }.bench // k=1E4=0.015 k=1E5=0.197
1 Like

Good point - my example was a bit biased. Almost all of the time that I use wchoose and related functions/classes, the weights are calculated rather than pre-assigned or constant, and they can change from invocation to invocation. This is different if you’re computing a fixed distribution once and re-using it many times. I guess I don’t really know how people use this more generally.

My overall point was mainly: normalizing every time is on the order of the cost of a single array access - given the way that wchoose is used, this is trivial and so (I’d argue) performance isn’t an important factor in determining the API. (Not that there aren’t good arguments for several of the different options, just that performance isn’t really a differentiator).

1 Like

I get different results on my computer (otherwise I would have loved to just make wchoose auto-normalising in the first place!)



// small lists (about 2x / 7x)
(
var a = (0.0..5.0);
var w = a.normalizeSum;
bench { 10000.do { a.wchoose(a.normalizeSum) } };
bench { 10000.do { a.wchoose(w) } };
bench { 10000.do { |i| a[i] } };
)

time to run: 0.0064919769993139 seconds.
time to run: 0.0020883699999104 seconds.
time to run: 0.00056842400044843 seconds.


// large lists (about 6x / 80x)
(
var a = (0.0..300.0);
var w = a.normalizeSum;
bench { 10000.do { a.wchoose(a.normalizeSum) } };
bench { 10000.do { a.wchoose(w) } };
bench { 10000.do { |i| a[i] } };
)

time to run: 0.051243785000224 seconds.
time to run: 0.0088985180000236 seconds.
time to run: 0.00070611899991491 seconds.


// very large lists (about 6x / 650x)
(
var a = (0.0..3000.0);
var w = a.normalizeSum;
bench { 10000.do { a.wchoose(a.normalizeSum) } };
bench { 10000.do { a.wchoose(w) } };
bench { 10000.do { |i| a[i] } };
)

time to run: 0.42306504999942 seconds.
time to run: 0.063606649999201 seconds.
time to run: 0.0006869299995742 seconds.

Interesting - to be even more organized about it:

(
var test = {
    |array, weights|
    var asig = Signal.newFrom(weights);
    
    bench { 10000.do { |i| array.wchoose(weights) } };
    bench { 10000.do { weights.normalizeSum } };
    bench { 10000.do { asig.normalize } };
    bench { 10000.do { |i| array[i] } };
};

"small: ".postln;
a = (0.0..5.0);
test.(a, a.normalizeSum * 0.9); // 0.9 to avoid fallthrough cases where normalization isn't necessary

"middle: ".postln;
a = (0.0..300.0);
test.(a, a.normalizeSum * 0.9);

"large: ".postln;
a = (0.0..3000.0);
test.(a, a.normalizeSum * 0.9);
)

I get:

small: 
time to run: 0.0017068750000817 seconds.
time to run: 0.002644499999974 seconds.
time to run: 0.00071850000006179 seconds.
time to run: 0.0005948329999228 seconds.
middle: 
time to run: 0.0066897919999747 seconds.
time to run: 0.032546832999969 seconds.
time to run: 0.0054264579999881 seconds.
time to run: 0.0005314580000686 seconds.
large: 
time to run: 0.050658249999969 seconds.
time to run: 0.24656954199997 seconds.
time to run: 0.048866041999986 seconds.
time to run: 0.00055279200000768 seconds.

Conclusions?
For small arrays, normalizing every time is trivial (comparable to one array access, a fraction of the wchoose time). Since normalizing is O(n), for large arrays it’s proportionally the same as the cost of the wchoose calculation itself (e.g. 1/6 of the time), but clearly not trivial in the same way.

Thinking about a real-world “bad” case - here’s an example that balances it against the cost of sending an OSC message (which is a small but functionally useful thing you could do, where performance could matter):

(
var test = {
    |count, array, weights|
    var asig = Signal.newFrom(array);
    var addr = NetAddr("127.0.0.1", 12345);
    
    bench { 
        var values;
        count.do { 
            |i| 
            addr.sendMsg('/rand', array.wchoose(weights));
        };
    };
    bench { 
        var values;
        count.do { 
            |i| 
            asig.normalize; 
            addr.sendMsg('/rand', array.wchoose(weights));
        };
    };
};

"real-world: ".postln;
a = (0.0..3000.0);
test.(1000, a, a.normalizeSum * 0.99);
)

// time to run: 0.011487041000237 seconds.
// time to run: 0.016208582999752 seconds.

So, we go from 11ms to 16ms for the “convenient” implementation (e.g. normalize by default) - this starts to feel non-trivial. (This is a pretty extreme example, 3000 osc messages per second pulling from a massive weight table - but I guess it’s an open question how much we design for this kind of use specifically.)

I suppose my intuition here is still: someone doing this is writing advanced code, and it seems reasonable to expect that - if they want to incrementally optimize that code - they can add a false argument to wchoose. An optional-but-not-necessary burden on an advanced and uncommon use-case seems better than a functionally-required burden on the majority of simple use-cases. The mistake / oversight case (e.g. a wrong or missing flag) for the former is slightly slower performance - for the latter, it’s undefined behavior, a far worse outcome.
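
For concreteness, a minimal sketch of what such a flag could do, written as a plain function rather than as a change to the actual method (wchooseRel and its normalize argument are hypothetical names):

(
var wchooseRel = { |array, weights, normalize = true|
    var w = if (normalize) { weights.normalizeSum } { weights };
    array.at(w.windex)
};
wchooseRel.([\a, \b, \c], [1, 3, 5]).postln;                      // default: relative weights, normalized internally
wchooseRel.([\a, \b, \c], [1, 3, 5].normalizeSum, false).postln;  // opt out when the weights are already normalized
)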

1 Like

I’d say: if we had an in-place version of normalizeSum that is much more efficient, then we could just auto-normalize in wchoose and add a method wchooseN that doesn’t, for the more extreme cases.

The disadvantage is that it might make some code less efficient, but if the normalisation is much cheaper than now, it may still be acceptable.
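
For concreteness, a minimal sketch of what an in-place variant could look like (normalizeSumInPlace is a hypothetical helper, not an existing method): it divides each element by the total without allocating a copy.

(
var normalizeSumInPlace = { |array|
    var sum = array.sum;
    array.size.do { |i| array[i] = array[i] / sum };
    array
};
var w = [1.0, 3.0, 5.0];
normalizeSumInPlace.(w);
w.postln;   // the original array now sums to 1; no copy was made
)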

Hm, many pieces are tuned by ear rather than being strictly correct, and I am not in a position to tell a composer what is right. For me this is an argument for not touching wchoose, but for having a wchoosen that does the normalization.

Also, normalising in place may break code that continues to use the weights array for other purposes and assumes it is unchanged.

P.S.

array.normalizeSum is less expensive than array.normalize (which makes sense):

// normalizeSum
(
f = { |n| 
	var w = { 1.0.rand }.dup(n);
	var wn = w.normalizeSum;
	var a = { bench({ 100.do { wn.windex } }, print: false) }.dup(10).mean;
	var b = { bench({ 100.do { w.normalizeSum.windex } }, print: false) }.dup(10).mean;
	b/a
}
)

f.(5)
f.(15)
f.(115)


// normalize
(
f = { |n| 
	var w = { 1.0.rand }.dup(n);
	var wn = w.normalizeSum;
	var a = { bench({ 100.do { wn.windex } }, print: false) }.dup(10).mean;
	var b = { bench({ 100.do { wn.normalize.windex } }, print: false) }.dup(10).mean;
	b/a
}
)

f.(5)
f.(15)
f.(115)