Choosing how to choose with self-normalising weights

julian · May 25, 2024, 8:17pm

Thank you, yes, (low) friction is important.

So you’d transition as follows

// less friction, but opaque `true` in the end result
[100, 200, 300].choose; // sketch
[100, 200, 300].choose([1, 5, 2]); // tweak
[100, 200, 300].choose([0.125, 0.625, 0.25], true); // finish

The alternative would be:

// more friction but clearer
[100, 200, 300].choose; // sketch
[100, 200, 300].choose([1, 5, 2]); // tweak
[100, 200, 300].wchoose([0.125, 0.625, 0.25]); // finish

Or, another one:

// more friction but clearer
[100, 200, 300].choose; // sketch
[100, 200, 300].wchooseN([1, 5, 2]); // tweak
[100, 200, 300].wchoose([0.125, 0.625, 0.25]); // finish

As we have it now:

// more friction but clearer
[100, 200, 300].choose; // sketch
[100, 200, 300].wchoose([1, 5, 2].normalizeSum); // tweak
[100, 200, 300].wchoose([0.125, 0.625, 0.25]); // finish

semiquaver · May 25, 2024, 8:43pm

Yes @Julian I would prefer the first version. For my purposes I would rarely end up doing the tweak as choose operations are usually happening for me at sub-audio rates and usually selecting from a small number of choices. Where I would optimize would be in those cases with large Array sizes or very frequent calls.

For reading I agree the 'true` flag is opaque - but I would still prefer this functionality grouped together in a single method in the help and the choices clearly laid out when the IDE provides the signature.

@mike looking at prArrayNormalizeSum first we have to sum the array - calculate the reciprocal and then multiply each element in the array by that reciprocal. For Array size n that’s n adds n mults and one divide. prArrayWIndex uses a random number between 0.0 and 1 for the index and then sums the Array elements until that index is exceeded - If we implement the equivalent of prArrayWIndex for non-normalized weights we could at least simply sum the array and then scale the index by that sum - the rest of the procedure would be the same - so that’s just n additional adds and one mult. which should be something like 4x faster (I would guess!)

mike · May 25, 2024, 9:53pm

julian:

// less friction, but opaque `true` in the end result
[100, 200, 300].choose; // sketch
[100, 200, 300].choose([1, 5, 2]); // tweak
[100, 200, 300].choose([0.125, 0.625, 0.25], true); // finish

My understanding is that this usage is actually

[100, 200, 300].choose; // sketch
[100, 200, 300].choose([1, 5, 2]); // tweaking
// no need for a finishing step (default behavior is normalize: true)
// optional optimization (special cases):
[100, 200, 300].choose([0.125, 0.625, 0.25], false);

// less friction, but opaque true in the end result

By the same logic any argument to choose is opaque, including the list of weights. (This is the motivation for adding wchooseN instead.) This opaqueness becomes more apparent when you write it in functional notation:

choose([100, 200, 300], [0.125, 0.625, 0.25], false);

In practice, I don’t consider it opaque if you write:

[100, 200, 300].choose([0.125, 0.625, 0.25], normalize: false);

which is accomplished with just one more keystroke with tab-complete.

mike · May 25, 2024, 10:07pm

Nice
I suppose that optimization could be considered a follow-up to the PR at hand, which is primarily concerned with getting the functionality implemented and the interface worked out, as opposed to reworking primitives.

smoge · May 26, 2024, 5:13am

I tried an alternative implementation, complexity increases linearly for this operation, with no climbs or thresholds.

github.com/supercollider/supercollider

Make wchoose and windex use relative weights (normalize by default)

opened 04:09PM - 15 May 24 UTC

Asmatzaile

enhancement

## Motivation `wchoose` and `windex` don't normalize; that makes it easy to g…et unexpected results. ``` ( var list, pick; list= [\red, \green, \blue]; pick = list.wchoose([100,200,300]); ) ``` Looking at the above code, one would expect to get `\blue` thrice as often and `\green` twice as often as `\red`. In reality,`\red` is the only element that is picked. The correct option would have been to use `list.wchoose([100,100,100].normalizeSum)`. In practice, all lists with weights that are not normalized (and I assume that they are the majority) will need to have `.normalizeSum` applied. ## Description of Proposed Feature The proposal is to make those functions normalize by default, as other languages as [python](https://docs.python.org/3/library/random.html#random.choices). If efficiency is a concern, a flag can be added for the arguably rare cases where the sum of the list is already equal to one.

Your test is an O(n+numIter), but I don’t think would be the normal scenario.

Eric_Sluyter · May 26, 2024, 7:37am

Would these ideas also somehow apply to Pwrand and Dwrand?

And is there a good reason to break from the interface that TWindex and TWChoose use?

TWChoose.ar(trig, array, weights, normalize: 0)

When the interfaces are consistent, I am more likely to remember how to use them…

julian · May 26, 2024, 12:11pm

mike:

In practice, I don’t consider it opaque if you write:
[100, 200, 300].choose([0.125, 0.625, 0.25], normalize: false);
which is accomplished with just one more keystroke with tab-complete.

Yes, this is not opaque. What I am a little bit uneasy about is that:

it suggests the inefficient way as the default (this could be alleviated if we have a very efficient normalizeSum implementation)
it is not logical together with wchoose.
the defaults are inverted from TWChoose.

So maybe adding a normalize flag to wchoose, by default off, is the most logical and clear solution?

Or making the situation completely symmetric, but with different defaults?

choose(weights, normalize=true)
wchoose(weights, normalize=false)

julian · May 26, 2024, 12:42pm

One more thing I just noticed: there are many classes which implement choose, but where weights wouldn’t make any sense, e.g. Dictionary and Set. While it we have such variance in arguments in a few places (… plot!), it is at least nicer to have a uniform interface.

smoge · May 26, 2024, 1:26pm

For reasons like those, I have the feeling it would make sense not to change choose. I don’t have a strong opinion on syntax change, but consistency seems important.

Python was mentioned as an example, but numpy offers a preprocessed option (sum should be 1.0), and this is the one to use for performance-critical tasks and large lists.

random.choices()
numpy.random.choice()

semiquaver · May 26, 2024, 3:24pm

This is the case for polymorphism! The IDE will not offer us weights where they are not needed…

As a sidenote I do wish that Prand was called Pchoose and Pwhite was called Prand (though that would be breaking sadly!) In any case I think it might not be a bad idea to do the same thing in these cases as well . There are far too many pattern variants with acronyms that are really impossible to remember.

mike · May 26, 2024, 8:00pm

Nice! I think this is the best solution yet.

Pros:

addresses the primary issue raised by the PR author (a way to normalize weights)
- relatedly: choose, perhaps the more commonly used/remembered, defaults to “expected” behavior—normalizing by default
unifies the interface between choose, wchoose, and TWChoose (and the t/wchoose defaults match)
both methods allow for optimization by the user
backwards compatible - nothing breaks, including old habits (wchoose)
fits the quick iterative workflow outlined by @semiquaver (common use)
leaves room for future optimization without disrupting the interface (as alluded to by @semiquaver and @smoge’s ideating)

Cons

the curious user wonders why the redundant functionality and mismatched defaults (so docs explain the legacy behavior, logic of pre-normalization, and accompanying examples show a simple illustrative benchmark)

Missing anything?

A separate initiative/PR could introduce this new interface where it makes sense. Weighted distributions is very “at home” in collections, so it seems unlikely there would be objections to the new feature introduced here and not elsewhere (yet).

Yes!

mike · May 26, 2024, 8:07pm

Implementation detail:

choose(weights, normalize=true)

Should probably explicitly specify the default weights:

choose(weights = nil, normalize = true)

An explicit default of nil is unconventional, but will signal that weights are not necessary. Not doing this might imply it’s weights are necessary.

julian · May 28, 2024, 10:05am

I see no way to implement a weighted choice on a collection that has no way to access a particular item with at. In Dictionary it would be possible, however, by passing a dictionary of weights.

julian · May 28, 2024, 10:05am

Your pros and cons make me think that even more it would be better to do the following:

wchoose(weights)
wchoosen(weights)

No flag, just add a single letter if you come from wchoose.
If you come from choose, you add one at the beginning and you’ll be prompted two possibilities which you can quickly choose from. Not much to explain, easy to remember.
It smoothly integrates into habits, separates concerns, is idiomatic, and breaks nothing.

mike · May 28, 2024, 10:54am

I agree this solution shares many of the advantages. At this point it seems to come down to a matter of precedent and personal preference. The mirrored-interface option you proposed was appealing for consolidating rather than proliferating methods/functionality. That said, wchoosen has been my “runner up” choice, so I’d personally be ok with it

Are we at a point where it makes sense to take “votes”?

Asmatzaile · May 28, 2024, 1:36pm

The interface for TWChoose is a bit strange though, it takes an integer for normalize where it should really take a boolean value… I also think that the default should be true, for the reason @semiquaver mentioned of rapid iteration. Because of that, I also think that it should be, as @mike wrote:

[100, 200, 300].choose; // sketch
[100, 200, 300].choose([1, 5, 2]); // tweak
[100, 200, 300].choose([0.125, 0.625, 0.25], normalize: false); // finish

A mirrored wchoose like wchoose(weights, normalize=false) would make sense I think… I am indifferent to that.

And with the previous thing in mind, i actually think TChoose could be extended to allow for weights, like this:
TChoose.ar(trig, array, weights=nil, normalize=true).

Why a single method instead of two? For me, it’s about conciseness: a single method like this requires less explanation than two separate methods. I am writing notes while learning SuperCollider and I really value these kinds of things. Also, the n is not that apparent: is that a n for normalize (as in perform the operation) or for normalized (as in the array is already normalized, no need for normalization)?

semiquaver · May 28, 2024, 1:43pm

strongly prefer the mirrored choice over wchoosen!

We need to keep in mind discoverability for new users! Once a user has seen choose they will see an option to add weights and the flag as well - perhaps they will use these in the future. They will only know about the other options wchoose and wchoosen from slogging through the already overlong list of methods for Array and Sequenceable Collection.

Then there is cognitive load - there are so many methods/objects differing by a letter or two - Pbindf Pbindef - I couldn’t tell you now which of these does which - that drive us to consult the docs…

julian · May 28, 2024, 4:45pm

Yes, I think, there is no consensus on this.

smoge · May 28, 2024, 4:49pm

I think when it comes to this kind of situation re: syntax, there will never be consensus.

That’s why the haskell committee, in the early days, appointed one person per year to have the final word on syntax impasses. Otherwise, they would never agree with all the small details .

julian · May 28, 2024, 4:49pm

This is just because you are not yet used to the way things are on the UGen side. There are no boolean values for UGens, which is why audio signals and numbers are interpreted as flags. This is the same everywhere.

As far as I am concerned, I am not so much in favour of mixing unweighted and weighted choice, because they are conceptually different: for choice you don’t have to be able to index the elements.