Splitting very large arrays

I’m trying to generate arrays containing all possible permutations of n elements. That part I’ve got; the problem is that when n gets to, say, 10, that’s already above 3.6 million permutations, which means roughly 36 million indices, so my computer (2020 MBP) starts spinning its fans like crazy and that’s about it. n = 11 is nearly 40 million sets, and so on, so it gets much worse 🙂

var a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11];
~b = (a.size.factorial).asInteger.collect{arg i; a.permute(i)};
~b = ~b.flatten.asStream;
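
For scale, those counts can be checked directly (just arithmetic, using the same factorial call as in the snippet above):

(
10.factorial.asInteger.postln;        // 3628800 permutations
(10.factorial * 10).asInteger.postln; // 36288000 values once flattened
11.factorial.asInteger.postln;        // 39916800 permutations for n = 11
)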

Is there a way to calculate/collect that in blocks, making sure each block does not exceed whatever value seems reasonable for array size? that way i could (hopefully) load block-arrays rather than the supermassive one that seems very unmanageable.

Thanks!

start by collecting your results in a single List instead of in Arrays. that’ll give you a massive increase in performance.
a List is what you want for anything that’s growing. see its help file for a bit of explanation and how it compares to Array.

( //~8.5sec
{
var a = [1, 2, 3, 4, 5, 6, 7, 8];
~b = (a.size.factorial).asInteger.collect{arg i; a.permute(i)};
~b = ~b.flatten.asStream;
}.bench;
)

( //~0.09sec
{
var a = [1, 2, 3, 4, 5, 6, 7, 8];
var size = (a.size.factorial).asInteger;
var result = List.new(size);
size.do{arg i; result.addAll(a.permute(i))};
~b = result.asStream;
}.bench;
)

then, if/when you run into new problems, you can split up your code into blocks using a Routine. a .wait now and then will give the interpreter some time to collect garbage and whatnot.


(
Routine.run({
	var a = [1, 2, 3, 4, 5, 6, 7, 8];
	var size = (a.size.factorial).asInteger;
	var result = List.new(size);
	size.do{arg i;
		if(i % 10000 == 0, {
			"breathe! collected % numbers".format(result.size).postln;
			0.1.wait;
		});
		result.addAll(a.permute(i));
	};
	~b = result.asStream;
	"done".postln;
	result.size.postln;
});
)

good luck,
_f

#|
fredrikolofsson.com musicalfieldsforever.com


oops, i now realised that in my example below the initial capacity for the List should’ve been bigger…

var result = List.new(size*a.size);

not a big deal but it’ll reduce the need to allocate more space during the growth.

but i also realised that, as you know the total size in advance, you can just create a single empty Array and .put your numbers in at the correct position.
that then allows you to use a special array class that’s more memory efficient - like Int8Array.


( //~0.08sec
{
var a = [1, 2, 3, 4, 5, 6, 7, 8];
var size = (a.size.factorial).asInteger;
var result = Int8Array.newClear(size*a.size);
size.do{arg i; a.permute(i).do{|b, j| result.put(i*a.size+j, b)}};
~b = result.asStream;
}.bench;
)

so this is probably the better solution if you’re only permuting 8-bit values. it uses a lot less RAM and you won’t need a growing List, because with .newClear we create a fixed-size Int8Array in advance.
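
for a rough sense of the saving (the slot size here is an assumption - a regular Array holds a full tagged slot per element, at least 8 bytes on a 64-bit sclang, while Int8Array holds a single byte):

(
var n = 8;
var elems = n.factorial.asInteger * n;  // 322560 stored values
"Int8Array: about % bytes".format(elems).postln;
"Array: at least % bytes".format(elems * 8).postln;
)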
_f

( //~0.09sec
{
var a = [1, 2, 3, 4, 5, 6, 7, 8];
var size = (a.size.factorial).asInteger;
var result = List.new(size);  //should really be size*a.size
size.do{arg i; result.addAll(a.permute(i))};
~b = result.asStream;
}.bench;
)


Not sure what I’m doing differently (“nothing”, I would say!) but when I first tried your code in the morning it was working like a charm, even with a (1..11) array. I’ve been going at it for hours (again, I haven’t changed your code one bit other than the initial array) and it seems totally stalled. Tried a few times already. Any clues?

a List is what you want for anything that’s growing

There are a couple of concerns here: 1, ease of adding elements (List is the winner here), and 2, performance (List will never beat Array).

A List is just a wrapper for an array. It’s easier to add things because it does a = a.add(thing) internally for you, saving you the trouble of remembering the reassignment yourself. (The overhead of reallocating arrays is still there, just hidden from the visible code.) But add dispatches to Array’s add method – two method calls for each add – so it will always be slower.
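
As a sketch of the idea (paraphrased, not the actual class-library source – the real List has many more methods), the wrapper looks roughly like this:

// hypothetical MyList, illustrating why List:add costs two dispatches
MyList {
	var array;
	*new { ^super.new.init }
	init { array = Array.new }
	add { arg item;
		array = array.add(item);  // the reassignment is done for you
		^this
	}
	size { ^array.size }
}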

Five benchmarks:

  1. Wrong use of Array:add. This is the worst in terms of performance, because every expanded array created by add is discarded, creating a huge garbage-collector load. And the final array is the wrong size.
  2. Correct use of Array:add, without pre-allocating the number of elements. a = a.add(...) is necessary.
  3. Correct use of Array:add, with pre-allocation. This is a bit faster because it doesn’t have to garbage-collect intermediate arrays.
  4. List, without pre-allocating.
  5. List, with pre-allocation. For 4 and 5, I believe the overhead of the double method calls outweighs other considerations – comparison of these two is probably random.
(
var n = 10000000;

bench {
	var a = Array.new;
	n.do { a.add(1.0.rand) };
	a.size.debug("unallocated, no reassignment");
};

bench {
	var a = Array.new;
	n.do { a = a.add(1.0.rand) };
	a.size.debug("unallocated, with reassignment");
};

bench {
	var a = Array.new(n);
	n.do { a.add(1.0.rand) };
	a.size.debug("pre-allocated, no reassignment");
};

bench {
	var l = List.new;
	n.do { l.add(1.0.rand) };
	l.size.debug("List, unallocated");
};

bench {
	var l = List.new(n);
	n.do { l.add(1.0.rand) };
	l.size.debug("List, pre-allocated");
};
)

unallocated, no reassignment: 1    // BZZZT wrong
time to run: 2.589702371 seconds.
unallocated, with reassignment: 10000000
time to run: 1.229955393 seconds.
pre-allocated, no reassignment: 10000000
time to run: 0.95807274900005 seconds.
List, unallocated: 10000000
time to run: 2.022845863 seconds.
List, pre-allocated: 10000000
time to run: 2.051731603 seconds.

In the 8.5 second example, it’s actually flatten that kills it. Merely restructuring the code (as fredrik did) to avoid flatten is enough. The distinction between List and Array isn’t really significant here.

( //~8.5sec
var a = [1, 2, 3, 4, 5, 6, 7, 8];
bench { ~b = (a.size.factorial).asInteger.collect{arg i; a.permute(i)}; };
bench { ~b = ~b.flatten.asStream };
)

time to run: 0.067028030000074 seconds.
time to run: 15.500484482 seconds.

hjh

PS Apparently my laptop is not very good 😆


that’s a fantastic explanation. thank you James.

and just in case - i’d like to mention lazy evaluation here too. as the OP had an .asStream in the example, i wonder if maybe all this collecting into arrays and lists could be avoided in the first place…


(
r = Routine({
	var n = 11;
	n.factorial.asInteger.do{|i| (1..n).permute(i).do{|x| x.yield}};
}).asStream;
)

r.next;
r.next;
//etc

r.nextN(4); //get chunks

_f
