Custom Class as a key in an identity dictionary isn't working?

jordan · March 18, 2023, 11:44am

Hello,
I can’t get identity dictionary to work with a custom class as a key. I’ve tried many things, I though I might need some magic method, so I’ve implemented what I think would be correct, however it still does not work.

Does any one have any ideas?

Class:

KeyTest {
	var symbol;
	*new {|sym| ^super.newCopyArgs(sym) }
	asSymbol { ^symbol }

	prBinaryOpOnSymbol {|other, opsym|
		^this.asSymbol.perform(opsym, other.asSymbol)
	}
	== {|other| ^this.prBinaryOpOnSymbol(other, '==')}
	!= {|other| ^this.prBinaryOpOnSymbol(other, '!=')}
	==={|other| ^this.prBinaryOpOnSymbol(other, '===')}
	!=={|other| ^this.prBinaryOpOnSymbol(other, '!==')}
	<  {|other| ^this.prBinaryOpOnSymbol(other, '<')}
	>  {|other| ^this.prBinaryOpOnSymbol(other, '>')}
	<= {|other| ^this.prBinaryOpOnSymbol(other, '<=')}
	>= {|other| ^this.prBinaryOpOnSymbol(other, '>=')}

	hash{ ^symbol.hash }
	basicHash{ ^symbol.basicHash }
	identityHash{ ^symbol.identityHash }
}

Test:

i = ( KeyTest('/test') : 1, KeyTest('/test/2') : 2) 

i //returns :  ( KeyTest.new: 2, KeyTest.new: 1 )
i.at(KeyTest('/test')) // nil  -- should be 2



// these work as expected
KeyTest('/test') == KeyTest('/test')
KeyTest('/test') === KeyTest('/test')
KeyTest('/test') != KeyTest('/test/2')
KeyTest('/test') !== KeyTest('/test/2')

Spacechild1 · March 18, 2023, 12:05pm

IdentityDictionary, as the name implies, compares keys by identity – not by equality! It definitely works with all kinds of objects, but you have to use the same object instance as the key, e.g.:

a = Foo();
e = ();
e[a] = 1;
e[a]; // yields 1

// the following works as well:
e = ( a : 1 );
// but it is different from
e = ( a: 1);
// so better don't use that notation because in the future someone (maybe yourself) might remove the seemingly redundant whitespace before the colon and break the code.

jordan · March 18, 2023, 12:25pm

Is there no way for KeyTest to deffer its identity to the underlying symbol?

I thought that was what overriding identityHash and === would do?

jordan · March 18, 2023, 12:29pm

This works, but seems dumb…

KeyTest {
	classvar keyTestIdentityStore;
	var symbol;
	*initClass { keyTestIdentityStore = () }
	*new {|sym|
		if(keyTestIdentityStore.includesKey(sym).not, {
			keyTestIdentityStore[sym] = super.newCopyArgs(sym);
			keyTestIdentityStore[sym].freeze;
		});
		
		^keyTestIdentityStore[sym]
	}
	asSymbol { ^symbol }
	
	prBinaryOpOnSymbol {|other, opsym|
		^this.asSymbol.perform(opsym, other.asSymbol)
	}
	== {|other| ^this.prBinaryOpOnSymbol(other, '==')}
	!= {|other| ^this.prBinaryOpOnSymbol(other, '!=')}
	==={|other| ^this.prBinaryOpOnSymbol(other, '===')}
	!=={|other| ^this.prBinaryOpOnSymbol(other, '!==')}
	<  {|other| ^this.prBinaryOpOnSymbol(other, '<')}
	>  {|other| ^this.prBinaryOpOnSymbol(other, '>')}
	<= {|other| ^this.prBinaryOpOnSymbol(other, '<=')}
	>= {|other| ^this.prBinaryOpOnSymbol(other, '>=')}
	
	hash{ ^symbol.hash }
	basicHash{ ^symbol.basicHash }
	identityHash{ ^symbol.identityHash }
}

Edit: made the cached keys frozen

scztt · March 18, 2023, 12:34pm

=== not generally used for dictionary storage, because it’s not efficient (you’d need to compare a key to every key in the dictionary, so it’s order(n), meaning bigger dictionaries get slower and slower) - but of course that’s a bit of a deep data structures fact and not at all intuitively obvious if you’re not thinking about it from an efficiency perspective.

identityHash should probably be used by IdentityDictionary - this would make conceptual sense, so you’re not wrong to assume this. But instead, the primitives for IdentityDictionary are calculating the identityHash of the key itself rather than deferring to the method for that type.

It’s actually an exceedingly small change to fix this (e.g. to use the identityHash method), but this is such deep functionality in SuperCollider that… the risk of bugs or performance degradation might be too high for it to be worthwhile?

Spacechild1 · March 18, 2023, 12:40pm

identityHash should probably be used by IdentityDictionary - this would make conceptual sense, so you’re not wrong to assume this.

Hashing alone is not sufficient, you also need to compare the object (to resolve hash collisions), so the IdentityDictionary primitives would also need to consider the === method.

the risk of bugs or performance degradation might be too high for it to be worthwhile?

Very likely

Spacechild1 · March 18, 2023, 12:40pm

@jordan instead of trying to abuse IdentityDictionary for something it hasn’t been designed for, why not use a plain Dictionary instead?

scztt · March 18, 2023, 12:45pm

For anyone that wants to explore SuperCollider internals in a relatively easy way, this would probably “fix” IdentityDictionary to work like @jordan expects. Again, it MIGHT not be advisable to make this change purely from a risk perspective, but it’s a worthy exploration.

Calculate the identityHash on the language side, as part of the three core IdentityDictionary access methods (at, put, and putGet):

put { arg key, value; ^this.prPut(key, key.identityHash, value }

prPut { arg key, identityHash, value;
	_IdentDict_Put
	value ?? { this.removeAt(key); ^this };
}

Refactor arrayAtIdentityHashInPairs to take the passed-in identity hash rather than calculating:

int arrayAtIdentityHashInPairs(PyrObject* array, PyrSlot* key, PyrSlot* identityHash) {
    PyrSlot *slots, *test;
    unsigned int i, start, end, hash, maxHash;

    // hash = calcHash(key); // old implementation....
    hash = slotRawInt(identityHash);
    maxHash = array->size >> 1;

    // ...

… and then pass in your identityHash argument.

Probably this should be done nearly everywhere calcHash is called, since each use of this COULD be implicitly skipping identityHash behavior - I don’t know how easy this change would be, or whether it would be desirable in every case.

There’s another piece of this, which is that — identityHash is not really safe for hashing Dictionary keys in case where you have different types of object as keys in one dictionary. identityHash only really guarantees identity comparisons for the same primitive type (e.g. object <> object or integer <> integer), it’s really easy to have hash collisions across types:

$a.identityHash == 97.identityHash // oops!

jordan · March 18, 2023, 12:47pm

What I am trying to do is validate that all the keys (which are ultimately Symbols) are osc addresses. Since the set OscAddress is a subset of the Symbol set, I would argue this is the intended purpose, and a violation of the Liskov substitution principle. Now, I haven’t been able to inherit from Symbol as the *new method cannot be used. Perhaps altering this could be another option?

jordan · March 18, 2023, 12:52pm

I probably will, thanks! Or since, I’m just after validation, I might just let the *new method return the symbol.
Although, then I’d have to run the validater everywhere.

Spacechild1 · March 18, 2023, 12:52pm

@scztt You would also need to override slotEq for hash collision resolution.

Generally, I don’t think it’s a good idea to let the user redefine the meaning of object identity in a language.

Spacechild1 · March 18, 2023, 12:55pm

It’s also very much possible to have hash collisions with the same type! That’s why hash table operations also need to compare the (raw) object values (see slotEq)

jordan · March 18, 2023, 1:02pm

Thank you @Spacechild1 & @scztt for your help. I’m going to go with the cache method, although ideally, I’d like to be able to inherit from Symbol. Overriding the identityHash was just a hack to get around that.

I’ll leave this without a correct solution marked because I don’t think there is a clear cut one.

scztt · March 18, 2023, 1:03pm

I agree that 95% of the time i would advise someone not to do this, because it’s so easy to get things wrong. But ---- custom hashing is a pretty standard feature with hash map data structures, and is useful for things like the original problem case (e.g. container objects whose identity should just be the identity of their contents). I realize now that I’ve also run into this issue, and I may even still have one or two bugs floating around in one project of mine related to this?

scztt · March 18, 2023, 1:08pm

Eesh, you’re right. That would probably block this feature, since that slotEq should PROBABLY be routed to an object’s === method, which gets is back to the original problem.

This is a meta-problem that pops up in sclang sometimes… namely that the language design implies that it’s OO all the way down, and everything is overridable. This is only mostly true, except when it’s not - which creates very confusing problems like this and a few others.

Spacechild1 · March 18, 2023, 1:09pm

(e.g. container objects whose identity should just be the identity of their contents)

but… this is object equality. We already have a hash map that supports custom hashing and uses equality: Dictionary.

Object identity has a very specific meaning and is deeply tied to the underlying object model.

This is a meta-problem that pops up in sclang sometimes… namely that the language design implies that it’s OO all the way down, and everything is overridable. This is only mostly true, except when it’s not - which creates very confusing problems like this and a few others.

Well said!

Spacechild1 · March 18, 2023, 1:19pm

Generally, IdentityDictionary and Events should be viewed as a (faster) specialization of Dictionary and they are primarily designed to be used with Symbol keys.

Personally I would only use IdentityDictionary if
a) my keys are only symbols and numbers or
b) my objects may have a custom equality operator and I really want object identity.

scztt · March 18, 2023, 1:22pm

Yeah, sub-classing Symbol is definitely a no-go because it’s a primitive type.

If your goal is to place some limits / validation on what can be used as a key, I would maybe suggest instead sub-classing Dictionary or IdentityDictionary, and validating in the put method, to ensure all the keys you use are of the expected type / contain the right kind of thing (OSC address).

Spacechild1 · March 18, 2023, 1:31pm

As another side note: different languages may define object identity slightly differently.

Here’s a fun example:

// sclang
a = 100;
b = 100;
a == b; // equality --> always true
a === b; // identity --> always true

vs

# Python
a = 1000
b = 1000
a == b # equality --> always True
a is b # identity --> False

The reason is that in Python object identity is defined as pointer equality and Integers are heap allocated. In sclang, however, Integers are always value types and therefore always considered identical.

It gets even more confusing:

# Python
a = 5
b = 5
a is 5 # --> True

This is because of small integer caching (python - What's with the integer cache maintained by the interpreter? - Stack Overflow).
This does not apply to Floats, though:

a = 5.0
b = 5.0
a is b # --> False

Summary: be careful with object identity

jordan · March 18, 2023, 3:02pm

Something else that is related and a little annoying is when you try to convert from an Event to a Dictionary, it seems like .asDict would be the correct call, but this returns an IdentityDictionary.
This seems odd, because these are sibling classes, so I can understand an Event being returned (because it all ready was a subtype of Dictionary), or a Dictionary (the parent type), what doesn’t make sense is why IdentityDictionary is returned. If IdentityDictionary should be reserved for symbols only, why does this default to it?