Choosing how to choose with self-normalising weights

Thank you, yes, (low) friction is important.

So you’d transition as follows

// less friction, but opaque `true` in the end result
[100, 200, 300].choose; // sketch
[100, 200, 300].choose([1, 5, 2]); // tweak
[100, 200, 300].choose([0.125, 0.625, 0.25], true); // finish

The alternative would be:

// more friction but clearer
[100, 200, 300].choose; // sketch
[100, 200, 300].choose([1, 5, 2]); // tweak
[100, 200, 300].wchoose([0.125, 0.625, 0.25]); // finish

Or, another one:

// more friction but clearer
[100, 200, 300].choose; // sketch
[100, 200, 300].wchooseN([1, 5, 2]); // tweak
[100, 200, 300].wchoose([0.125, 0.625, 0.25]); // finish

As we have it now:

// more friction but clearer
[100, 200, 300].choose; // sketch
[100, 200, 300].wchoose([1, 5, 2].normalizeSum); // tweak
[100, 200, 300].wchoose([0.125, 0.625, 0.25]); // finish

Yes @Julian I would prefer the first version. For my purposes I would rarely end up doing the tweak as choose operations are usually happening for me at sub-audio rates and usually selecting from a small number of choices. Where I would optimize would be in those cases with large Array sizes or very frequent calls.

For reading I agree the 'true` flag is opaque - but I would still prefer this functionality grouped together in a single method in the help and the choices clearly laid out when the IDE provides the signature.

@mike looking at prArrayNormalizeSum first we have to sum the array - calculate the reciprocal and then multiply each element in the array by that reciprocal. For Array size n that’s n adds n mults and one divide. prArrayWIndex uses a random number between 0.0 and 1 for the index and then sums the Array elements until that index is exceeded - If we implement the equivalent of prArrayWIndex for non-normalized weights we could at least simply sum the array and then scale the index by that sum - the rest of the procedure would be the same - so that’s just n additional adds and one mult. which should be something like 4x faster (I would guess!)

My understanding is that this usage is actually

[100, 200, 300].choose; // sketch
[100, 200, 300].choose([1, 5, 2]); // tweaking
// no need for a finishing step (default behavior is normalize: true)
// optional optimization (special cases):
[100, 200, 300].choose([0.125, 0.625, 0.25], false);

// less friction, but opaque true in the end result

By the same logic any argument to choose is opaque, including the list of weights. (This is the motivation for adding wchooseN instead.) This opaqueness becomes more apparent when you write it in functional notation:

choose([100, 200, 300], [0.125, 0.625, 0.25], false);

In practice, I don’t consider it opaque if you write:

[100, 200, 300].choose([0.125, 0.625, 0.25], normalize: false);

which is accomplished with just one more keystroke with tab-complete.

1 Like

Nice :slight_smile:
I suppose that optimization could be considered a follow-up to the PR at hand, which is primarily concerned with getting the functionality implemented and the interface worked out, as opposed to reworking primitives.

1 Like

I tried an alternative implementation, complexity increases linearly for this operation, with no climbs or thresholds.

Your test is an O(n+numIter), but I don’t think would be the normal scenario.

Would these ideas also somehow apply to Pwrand and Dwrand?

And is there a good reason to break from the interface that TWindex and TWChoose use?

TWChoose.ar(trig, array, weights, normalize: 0)

When the interfaces are consistent, I am more likely to remember how to use them…

1 Like

Yes, this is not opaque. What I am a little bit uneasy about is that:

  • it suggests the inefficient way as the default (this could be alleviated if we have a very efficient normalizeSum implementation)
  • it is not logical together with wchoose.
  • the defaults are inverted from TWChoose.

So maybe adding a normalize flag to wchoose, by default off, is the most logical and clear solution?

Or making the situation completely symmetric, but with different defaults?

choose(weights, normalize=true)
wchoose(weights, normalize=false)
2 Likes

One more thing I just noticed: there are many classes which implement choose, but where weights wouldn’t make any sense, e.g. Dictionary and Set. While it we have such variance in arguments in a few places (… plot!), it is at least nicer to have a uniform interface.

For reasons like those, I have the feeling it would make sense not to change choose. I don’t have a strong opinion on syntax change, but consistency seems important.

Python was mentioned as an example, but numpy offers a preprocessed option (sum should be 1.0), and this is the one to use for performance-critical tasks and large lists.

random.choices()
numpy.random.choice()

This is the case for polymorphism! The IDE will not offer us weights where they are not needed…

As a sidenote I do wish that Prand was called Pchoose and Pwhite was called Prand (though that would be breaking sadly!) In any case I think it might not be a bad idea to do the same thing in these cases as well . There are far too many pattern variants with acronyms that are really impossible to remember.

2 Likes

Nice! I think this is the best solution yet.

Pros:

  • addresses the primary issue raised by the PR author (a way to normalize weights)
    • relatedly: choose, perhaps the more commonly used/remembered, defaults to “expected” behavior—normalizing by default
  • unifies the interface between choose, wchoose, and TWChoose (and the t/wchoose defaults match)
  • both methods allow for optimization by the user
  • backwards compatible - nothing breaks, including old habits :slight_smile: (wchoose)
  • fits the quick iterative workflow outlined by @semiquaver (common use)
  • leaves room for future optimization without disrupting the interface (as alluded to by @semiquaver and @smoge’s ideating)

Cons

  • the curious user wonders why the redundant functionality and mismatched defaults (so docs explain the legacy behavior, logic of pre-normalization, and accompanying examples show a simple illustrative benchmark)

Missing anything?

A separate initiative/PR could introduce this new interface where it makes sense. Weighted distributions is very “at home” in collections, so it seems unlikely there would be objections to the new feature introduced here and not elsewhere (yet).

Yes!

1 Like

Implementation detail:

choose(weights, normalize=true)

Should probably explicitly specify the default weights:

choose(weights = nil, normalize = true)

An explicit default of nil is unconventional, but will signal that weights are not necessary. Not doing this might imply it’s weights are necessary.

I see no way to implement a weighted choice on a collection that has no way to access a particular item with at. In Dictionary it would be possible, however, by passing a dictionary of weights.

Your pros and cons make me think that even more it would be better to do the following:

wchoose(weights)
wchoosen(weights)

No flag, just add a single letter if you come from wchoose.
If you come from choose, you add one at the beginning and you’ll be prompted two possibilities which you can quickly choose from. Not much to explain, easy to remember.
It smoothly integrates into habits, separates concerns, is idiomatic, and breaks nothing.

I agree this solution shares many of the advantages. At this point it seems to come down to a matter of precedent and personal preference. The mirrored-interface option you proposed was appealing for consolidating rather than proliferating methods/functionality. That said, wchoosen has been my “runner up” choice, so I’d personally be ok with it :slight_smile:

Are we at a point where it makes sense to take “votes”?

The interface for TWChoose is a bit strange though, it takes an integer for normalize where it should really take a boolean value… I also think that the default should be true, for the reason @semiquaver mentioned of rapid iteration. Because of that, I also think that it should be, as @mike wrote:

[100, 200, 300].choose; // sketch
[100, 200, 300].choose([1, 5, 2]); // tweak
[100, 200, 300].choose([0.125, 0.625, 0.25], normalize: false); // finish

A mirrored wchoose like wchoose(weights, normalize=false) would make sense I think… I am indifferent to that.

And with the previous thing in mind, i actually think TChoose could be extended to allow for weights, like this:
TChoose.ar(trig, array, weights=nil, normalize=true).

Why a single method instead of two? For me, it’s about conciseness: a single method like this requires less explanation than two separate methods. I am writing notes while learning SuperCollider and I really value these kinds of things. Also, the n is not that apparent: is that a n for normalize (as in perform the operation) or for normalized (as in the array is already normalized, no need for normalization)?

strongly prefer the mirrored choice over wchoosen!

We need to keep in mind discoverability for new users! Once a user has seen choose they will see an option to add weights and the flag as well - perhaps they will use these in the future. They will only know about the other options wchoose and wchoosen from slogging through the already overlong list of methods for Array and Sequenceable Collection.

Then there is cognitive load - there are so many methods/objects differing by a letter or two - Pbindf Pbindef - I couldn’t tell you now which of these does which - that drive us to consult the docs…

1 Like

Yes, I think, there is no consensus on this.

I think when it comes to this kind of situation re: syntax, there will never be consensus.

That’s why the haskell committee, in the early days, appointed one person per year to have the final word on syntax impasses. Otherwise, they would never agree with all the small details .

This is just because you are not yet used to the way things are on the UGen side. There are no boolean values for UGens, which is why audio signals and numbers are interpreted as flags. This is the same everywhere.

As far as I am concerned, I am not so much in favour of mixing unweighted and weighted choice, because they are conceptually different: for choice you don’t have to be able to index the elements.

2 Likes