Copying Dictionary seems to "link" contained array

Hi,

I’m instantiating several Dictionaries using one as a blueprint and I noticed that Arrays contained in the Dictionary seem to be somehow linked.

Here’s an example of what I’m observing

(
var p, q;
p = Dictionary.with(*[
	\values -> [],
	\name -> "yo"
]);

q = Dictionary.newFrom(p);

"CHECK VALUES\n---".postln;
p[\values] = p[\values].add([1,2,3]);
q[\values].postln;
p[\values][0] = [4,5,6];
q[\values].postln;

"\nCHECK NAME\n---".postln;
q[\name].postln;
p[\name] = "hello";
q[\name].postln;
)

and this is what I’m getting on the console

CHECK VALUES
---
[ [ 1, 2, 3 ] ]
[ [ 4, 5, 6 ] ]

CHECK NAME
---
yo
yo

So, the first element contained in p[\values] (which is another Array) is exactly what I’m getting when calling q[\values][0].
This doesn’t seem to happen with other Objects (eg Strings).

I’m sure I’m missing something obvious here but, atm nothing springs to mind tbh, so I thought I’d ask here.

I should add that only the first elements of p[\values] and q[\values] are linked

EDIT: this is NOT correct, they all seem to be linked

Hello,

is this what you want ?

(
var p, q;
p = Dictionary.with(*[
	\values -> [],
	\name -> "yo"
]);

q = p.deepCopy;

"CHECK VALUES\n---".postln;
p[\values] = p[\values].add([1,2,3]);
q[\values].postln;
p[\values][0] = [4,5,6];
q[\values].postln;
p[\values].postln;

"\nCHECK NAME\n---".postln;
q[\name].postln;
p[\name] = "hello";
q[\name].postln;
p[\name].postln;
)

from the documentation:
.deepCopy:
Recursively copies the object and all of the objects contained in the instance variables, and so on down the structure. This method works with cyclic graphs.
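A minimal sketch of the difference, assuming that comparing with === (identity) is an acceptable way to check whether two slots hold the same object. The variable names shallow and deep are only for illustration:

```supercollider
(
var p, shallow, deep;
p = Dictionary[\values -> [1, 2, 3], \name -> "yo"];

shallow = Dictionary.newFrom(p);  // new Dictionary, but it holds the same value objects
deep = p.deepCopy;                // new Dictionary with recursively copied values

(shallow === p).postln;                    // false: a distinct Dictionary
(shallow[\values] === p[\values]).postln;  // true: both slots hold the same Array
(deep[\values] === p[\values]).postln;     // false: an independent Array
(deep[\values] == p[\values]).postln;      // true: equal contents
)
```

So newFrom gives a new container around shared contents, while deepCopy also duplicates what is inside.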

The problem here is:

I hope it helps

amazing!!!
I suspected it was passing a reference to the Array but, I wasn’t sure.
I mean, in SC everything’s an Object and so I naively thought “well, if Integer, String, etc. are fine it means it’s not passing things by reference!”.

I’ll check this out and update the thread once it’s done.

Thanks, this does work but, I still think that based on what I described and said in my previous post, this might be seen as a bit of a counter-intuitive behaviour that Dictionary.newFrom exhibits.
Simply because you get that with Array (and probably Collection and other subclasses) but, not with other Objects (eg Integer and String).

A week or two ago, I said in another thread that it’s becoming more common for people to find fault with SC over confusion with its behavior. Here’s a good example ^^.

Confusion is a natural part of learning any language, including programming languages.

Similar case:

a = [1, 2, 3];
b = a;
b.put(1, 10);

a  // what is this answer going to be?

In this case, some percent of people would expect [1, 2, 3] (“b is a different name, so it should be a distinct object”) and some percent would expect [1, 10, 3] (“b just points to the same thing a points to”). (The latter is correct in SC, but IIUC the former would be correct in Haskell – sort of – really correct in Haskell would be that mutating the array creates a new array that is distinct from the old one still referenced under a.)
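A small sketch that makes the two cases visible side by side, using .copy for the "distinct object" case and === to test identity (the extra variable c is added here for illustration):

```supercollider
(
var a, b, c;
a = [1, 2, 3];
b = a;        // a second name for the same Array
c = a.copy;   // a distinct Array with the same elements

b.put(1, 10);

a.postln;          // [ 1, 10, 3 ]
c.postln;          // [ 1, 2, 3 ]
(a === b).postln;  // true
(a === c).postln;  // false
)
```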

The point is, this is just how we learn. We make assumptions, try it, and if the result doesn’t match the assumption, then we have to discard the assumption. That is: learning a programming language will, at some times, make you feel stupid (just like I felt dumb on multiple occasions while learning Pure Data in the last couple years) – and this isn’t necessarily a problem with the language design.

Not true.

// (the string needs to be mutable for this demo)
a = Dictionary["a" -> 1, "b" -> ("a" + "string")];

b = a.copy;

b["b"].put(0, $A);  // mutate the string

a
-> Dictionary[ (a -> 1), (b -> A string) ]  // changed

The distinction is between atomic values (integer, float, char, symbol, Boolean) and slotted objects (collections – Strings are collections of characters!, and every object with instance variables). “Slots” are always by reference. This isn’t going to change in SC3.
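As a rough sketch of why the \name test in the original post looked different: assigning into a Dictionary slot replaces the reference in that one Dictionary, while mutating the object is visible through every reference to it. The string is built at runtime here so that it is mutable:

```supercollider
(
var p, q;
p = Dictionary[\name -> ("y" ++ "o")];  // built at runtime, so the String is mutable
q = p.copy;                             // shallow copy: both hold the same String

(p[\name] === q[\name]).postln;  // true

p[\name].put(0, $Y);   // mutation of the shared String
q[\name].postln;       // Yo

p[\name] = "hello";    // reassignment: only p's slot now points elsewhere
q[\name].postln;       // Yo
p[\name].postln;       // hello
)
```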

hjh

I had precisely the opposite problem to this. I had a dictionary with an array, d=Dict[(a->[ ] )] and somehow x=d["a"] ; x.add('jkjk'); didn’t stick. So I readded a new array. I assumed it was a mistake on my part rather than a supercollider thing. Basically does a realloc mean that the user should put x=x.add("jkjk") just to be safe?

I absolutely agree. Let me be clear, my post wasn’t about a bug in SC or SC needing to change.
As part of a learning curve I just need to understand why things happen, that’s all.
Just checking what works and what does not work wouldn’t cut it. I need to see what happens under the hood if you know what I mean.
Hence why I wrote here, instead of reporting a bug on GitHub :wink:
And the answer I was looking for is this

The distinction is between atomic values (integer, float, char, symbol, Boolean) and slotted objects (collections – Strings are collections of characters!, and every object with instance variables). “Slots” are always by reference

But again, I never expressed the desire to change SC3 :wink: I think this behaviour makes perfect sense, once you know what’s going on.

Fair – glad to help. I was thrown by the word “counterintuitive,” which could perhaps imply that some other behavior would not be “counter” (though it doesn’t necessarily imply that).

To check your reasoning…

// EDIT: Fixed typos that I shouldn't have copied/pasted
// edit edit: revise the example to *actually* reproduce the behavior...
d = Dictionary[("a" -> [\a, \b, \c, \d])];
x = d["a"] ;
x = x.add('jkjk');

Now, what is the final state of this system? What is d, and what is x?

hjh

This may add to any confusion, but Barbara Liskov’s remarks on the CLU evaluation strategy are precise and I think helpful!

We call the argument passing technique call by sharing, because the argument objects are shared between the caller and the called routine. The technique does not correspond to most traditional argument passing techniques (it is similar to argument passing in LISP). In particular it is not call by value because mutations of arguments performed by the called routine will be visible to the caller. And it is not call by reference because access is not given to the variables of the caller, but merely to certain objects. (Liskov, 1979)
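A minimal sketch of what call by sharing looks like in sclang, assuming two hypothetical functions named mutate and reassign: mutating the argument is visible to the caller, but rebinding the parameter name is not.

```supercollider
(
var mutate, reassign, a;
mutate = { |arr| arr.put(0, 99) };          // changes the object the caller also holds
reassign = { |arr| arr = [7, 8, 9]; arr };  // rebinds only the local name arr

a = [1, 2, 3];

mutate.(a);
a.postln;    // [ 99, 2, 3 ]

reassign.(a);
a.postln;    // [ 99, 2, 3 ], the caller's variable is untouched
)
```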

Maybe a preamble, in some languages adding to an array gives a new array, x=x.add('new entry') is a must. The part with the reference passing is a little tricky because you get this:

(
d = Dictionary[(\a -> [\a, \b, \c, \d]),(\b->[])];
x=d[\a];
x.add("wefwef");
"added wefwef to x. x is % \nd is %\n\n".postf(x,d);
d[\k]=[];
"added empty array k to d. d is %\n\n".postf(d);
x = d[\k] ;
x.add('jkjk');
"added jkjk to x. x is % \nd is %\n\n".postf(x,d);
x.add("jojoba");
"added jojoba to x. x is % \nd is %\n\n".postf(x,d);
x=d[\b];
x.add("b addition");
"added b addition to x. x is % \nd is %\n\n".postf(x,d);
x=x.add("another b addition");
"added another b addition to x. x is % \nd is %\n".postf(x,d);
0
)

Which is: getting an empty array out of the dictionary returns a reference to the array, but if that array has content, the result is a copy instead. This can be a little confusing when I skim through code and my first impression is that arrays are passed by reference, not as copied values.

Hm, didn’t quite get the point… the point is that d["a"] = d["a"].add(...) is a must! Not x = .

Presumably you wanted to preserve the new a array in the dictionary. (Otherwise, what’s the point of having the dictionary at all?) But…

(
d = Dictionary[("a" -> [\a, \b, \c, \d])];
x = d["a"];

"x = %, d = %\n".postf(x, d);
// prints: x = [ a, b, c, d ], d = Dictionary[ (a -> [ a, b, c, d ]) ]

x = x.add('jkjk');

"x = %, d = %\n".postf(x, d);
// prints: x = [ a, b, c, d, jkjk ], d = Dictionary[ (a -> [ a, b, c, d ]) ]
)

In the last line, x points to the new array, but d["a"] points to the old array, without 'jkjk'. I’m pretty sure this is not what you wanted.

If you want to update the contents of the dictionary, then you have to assign into the dictionary. It is not sufficient to assign to a different variable which happens to have been assigned to an object that was held in the dictionary. There is no magical link between x and d["a"]. x = d["a"] only means that these two storage locations happen to point to the same, identical object – for now (it is not guaranteed that this will be true later, in case of reassignment). It does not mean that they are now the same storage location.
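A short sketch of assigning back into the dictionary (symbol keys are used here only to keep the example compact):

```supercollider
(
var d = Dictionary[\a -> [\a, \b, \c, \d]];

// store whatever .add returns back into the dictionary slot itself
d[\a] = d[\a].add(\jkjk);

d[\a].postln;   // [ a, b, c, d, jkjk ]
)
```

This works whether or not .add had to return a new array, because the slot always ends up holding the result.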

hjh

yes you are right x=x.add("another b addition"); wouldn’t add into the dictionary. I added that in to cover the case. The part to highlight is this:

d[\k]=[];
"added empty array k to d. d is %\n\n".postf(d);
x = d[\k] ;
x.add('jkjk');
"added jkjk to x. x is % \nd is %\n\n".postf(x,d);
x.add("jojoba");
"added jojoba to x. x is % \nd is %\n\n".postf(x,d);

In this case x is not reassigned to a new array. But exhibits the ( return array if array empty, return copy otherwise ) behaviour. I didn’t read the documents carefully, so it might have been explained, but this nuance made me look through my code for some time.

The correct explanation of the behavior is:

  • If the array includes slots that are unused, but allocated (x.size < x.maxSize), then .add mutates and returns the existing array.
  • If the array is already using all allocated slots (x.size == x.maxSize), then .add creates a new array where newArray.maxSize == x.maxSize.nextPowerOfTwo (I think) and returns this new array.
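One way to observe the two cases above is to compare identity before and after .add. This is only a sketch: report is a hypothetical helper, and the exact maxSize you get for a given array is an implementation detail, so no particular output is claimed here.

```supercollider
(
var report = { |x|
	var hadRoom = x.size < x.maxSize;  // were there unused, allocated slots?
	var y = x.add(\new);               // may or may not be the same object as x
	"had spare capacity: %, same object after add: %\n".postf(hadRoom, y === x);
	y
};

report.(Array.new(8));                // empty array with room to grow
report.(Array.fill(4, { |i| i }));    // array that may already be full
)
```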

Now… if you use List instead of Array, then this problem almost goes away (except that .collect, .select, .reject etc then return an Array, because a List’s .species is Array – I usually don’t disagree with JMc’s decisions, but this decision, to revert Lists to Arrays for no reason that has ever been clearly explained, is baffling).

Much of the problem, then, is that Array really should be considered a low-level, implementation object, and there should be a higher-level interface that hides the implementation details. But unfortunately, since you can’t rely on a List to remain a List, it’s inconsistent. (In many cases in object-oriented programming, it would be natural and expected to get confused when inappropriately using low-level objects for high-level purposes. One problem here is that we habitually do exactly that, instead of fixing the class library to make it feasible to use List more generally.)

hjh

:slight_smile: I see you’ve also had that feeling …

Actually, sclang is the only high-level language I know that does this. (Does anyone have a counter-example?) Personally, I do think that the behavior of Array is counter-intuitive/surprising for anybody coming from other programming languages.

And in case anybody has been wondering about the reason for this behavior:

Unlike List, Dictionary, etc., which own an array of objects/values, Array really is an array, meaning that the object itself and its elements are laid out consecutively in memory. This makes array operations a little bit more efficient because you save a pointer lookup (= possible cache miss). The downside is that reallocation is not really possible since it might move everything to a different memory location, which would in turn change the object’s identity; that’s why any operation that increases an Array’s capacity has to return a new Array. Note that this is not necessary for operations that decrease an Array’s size; in this case only the size member is decremented, but the actual memory is not deallocated.
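For contrast, a small sketch with List, which keeps its identity while it grows because only its internal array gets replaced (the name alias is just for illustration):

```supercollider
(
var l = List.new, alias;
alias = l;                 // second reference to the same List

20.do { |i| l.add(i) };    // no reassignment of l needed

(alias === l).postln;      // true: still the same List object
alias.size.postln;         // 20
)
```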

It’s a mistake in the language design. Even in the case of arrays it could have been easily avoided with foresight. It could also be remedied without much downside. Array is not a low level object, not in design or usage. It’s a confusing mix of fixed size array, fixed size vector, and dynamic size vector. No other language should or would want to emulate that.

WRT arrays, still, though, there are two rules that handle the mess to enough of an extent that you can at least get stuff done: 1/ always reassign the result of .add and 2/ reassign it to the place where you need it.

Also that’s a different issue from the issue of multiple references to the same object, which is where this thread began. (The OP expected multiple references to point to distinct objects, but they don’t.) References in general seem to be confusing to new users and I’m not sure there’s really any way to clear that up: no matter which way it works, someone will imagine it to be the other way.

hjh

The OP expected multiple references to point to distinct objects, but they don’t.

That’s correct!
One thing I’d like to say about expectations though. I’m absolutely familiar with the concepts of passing by value and passing by reference having used other high level programming languages.
What I found unusual at first is that Arrays were getting this treatment while things like Strings (which I assumed were mutable Arrays of chars) didn’t seem to behave the same way (I’ll get back to this at the end of this message :wink: ).
I’m genuinely new to SC, started looking at it 2 months ago, so I’m still making a lot of assumptions and getting mixed up with other languages like Lua and Python. And I was thinking well, everything’s an Object so… Of course I was wrong, and now I know why.
About Strings, you shared an interesting case where they get the same treatment as Arrays but, that seems to (only?) happen when you concatenate them.
I was curious about that, why is that the case?

I was probably thinking of strings in Java, since they’re supposed to be immutable and concatenating them requires a new string (what other languages would call an array of chars). And I guess arrays in Java are fixed-size, so they don’t work like std::vector.