Tracking key-values over time

I’m trying to think of a good enough way to accomplish the following requirements.
Alternative ideas welcome!

  • I have a collection of approximately 4096 key-value pairs. Both keys and values are integers. (In case you wonder: the keys are parameter numbers (MIDI NRPN numbers), and the values are the corresponding parameter values.)
  • Occasionally one or more of the values change. This can happen around 5 times per second during “busy” moments, but also only once every 15 minutes in “idle” moments.
  • How can I best store this data so I can perform lookups like: “Give me all parameter values as they existed 34 seconds ago”? Or, less important for my use case but nice to have: “Give me all parameter values as they existed at 14:23:01.365”?

I’m considering creating some kind of “TemporalDictionary” that doesn’t just store a value for each key, but instead collects a list of [relative timestamp since creation of the collection, list of reversible operations] (think: “[after 23 seconds, [replace value nil for key 34 with value 26, replace value 67 for key 23 with value 87]]”), so that at any point in time I can calculate forward or backward to a desired timestamp and reconstruct the entire parameter space. But perhaps something like that (or something better) already exists?
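
A very rough sketch of the kind of thing I mean (the names ~set and ~stateAt are just for illustration; each history entry keeps the old and the new value so it stays reversible, but for simplicity this reconstructs by replaying forward from an empty initial state):

(
~state = IdentityDictionary();   // current key -> value map
~history = List();               // entries of [relativeTime, key, oldValue, newValue]
~startTime = Main.elapsedTime;

~set = { |key, value|
    var now = Main.elapsedTime - ~startTime;
    ~history.add([now, key, ~state[key], value]);
    ~state[key] = value;
};

// reconstruct the whole parameter space as it existed at a given relative time
~stateAt = { |time|
    var snapshot = IdentityDictionary();
    ~history.do { |entry|
        if(entry[0] <= time) { snapshot[entry[1]] = entry[3] };
    };
    snapshot;
};
)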

Sounds like a fascinating project, I’d love to hear how it turns out. Reversible operations make me think of the command pattern. It’s essentially what you describe, so it’s probably the right solution: you’d have your state and a controller that knows how to mutate/create a new state according to commands and their associated undos. It would then store all of the commands in order with a timestamp and observe a main time variable to know when to do or undo them.

If you’ve ever heard of Redux, a popular state management library for JS: it seems to basically work like this, but without your observable, controlled time variable. I don’t know if there’s a dedicated state management quark, but one could certainly build this sort of pattern. A solution that doesn’t reinvent the wheel might be to use a state management system in another language for all of this data handling and then just communicate the changes in value at any given time via OSC for SC to make sound with.
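
On the SC side that could be as simple as something like this sketch (the '/param' address and the [nrpn, value] message layout are just made-up placeholders):

OSCdef(\paramIn, { |msg|
    var nrpn = msg[1], value = msg[2];   // msg[0] is the OSC path
    // ...update your running synths / current state with nrpn and value here...
}, '/param');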

Please do update on this if you don’t mind, it sounds really cool!

There are tools and theories for state information data structures, but if you want to build something from scratch in SC I think it shouldn’t be too hard with a compound data structure involving PriorityQueue.

  • (very rough): store current changes and the whole state for each time key
  • (still rough): store current changes and changes since start for each time key
  • store current changes for each time key and modify the current state by forwarding and backwarding

There’s no easy solution to this unfortunately. The main issues you have are (1) you don’t have a key value for time (you want the value that matches a particular time range) and (2) not running out of memory. Fortunately you also know in advance what your data type is (a bounded array of 4096 key values) which simplifies things considerably (believe me - you don’t want to recreate the diffing algorithm from React).

The base data type is an array of length 4096. Each element of the array corresponds to an NRPN number and contains a list (it has to be a list, as you need a data structure that will grow over time). Each element in that list is an array with a max size of 2, storing a time and a value. Each time you have a new time/value pair, add a new 2-element array to the end of the corresponding list.
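
A rough sketch of that layout (the names ~store and ~record are just for illustration):

~store = Array.fill(4096, { List() });   // one list of [time, value] entries per NRPN number
~record = { |nrpn, time, value|
    ~store[nrpn].add([time, value]);     // append a 2-element [time, value] array
};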

Now comes the tricky bit. You have to write an algorithm that can search these lists and find the right value for a particular time. You will want to use a binary search that starts in the middle and goes right if the time you’re looking for is newer than the time at the midpoint, and left if it is older. Your termination point in the algorithm is either when the time matches the time you’re looking for exactly, or when the time you’re looking for lies between the current element and its left or right neighbour.
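
A sketch of such a search, assuming each entry is [time, value] and the list is sorted by increasing time; ~valueAt is just an illustrative name:

~valueAt = { |list, time|
    var lo = 0, hi = list.size - 1, mid, result = nil;
    while { lo <= hi } {
        mid = (lo + hi) div: 2;
        if(list[mid][0] <= time) {
            result = list[mid][1];   // candidate: latest entry not after the requested time
            lo = mid + 1;            // keep looking to the right for a later candidate
        } {
            hi = mid - 1;            // this entry is newer than the requested time, look left
        };
    };
    result;                          // nil if the requested time precedes the first entry
};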

Then your main algorithm (used to get a snapshot of the state at a particular time) will have to go serially through the different lists, get the corresponding values for the chosen time value and return a new array to the user.
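
Something like this, reusing the ~valueAt sketch from above (again just an illustration):

~snapshotAt = { |time|
    ~store.collect { |list| ~valueAt.(list, time) };   // one value (or nil) per NRPN number
};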

The main problem you will run into: at some point you will run out of memory if this program is running constantly. If it isn’t running constantly, just make sure this isn’t going to be a problem for you.

Hmm, I’ll try to remain a bit more lightweight for now. I’ve used OSC in the past to communicate between processes, but it may be a bit of overkill for what I’m trying to do here (which is just an experiment anyway).

After looking at the docs for PriorityQueue, I’m not sure it is my best option here. (But very cool that I got to know it exists!) It seems to allow for forward iteration only, and only allows iteration by popping stuff in order. Unless I start to mess with its private methods I’m not sure it will be what I need. I think I’ll stick to an array for now. As events are added chronologically, the array will automatically remain sorted by time, and it should therefore be fairly easy (or at least possible :slight_smile: ) to do a binary search to find an array index corresponding to a time value.

Thanks, this is a nice idea for speedup of what I considered the most problematic part in my proposed approach.

True. Although it’s probably possible to squash the oldest events into some “initial state” to remain below a configurable maximum size if I hit that problem. But anyway, I don’t expect to collect gigabytes of information (certainly not in my initial experiments).
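
A rough sketch of that squashing idea, reusing the ~history list of [time, key, oldValue, newValue] entries from my earlier sketch (~initialState and ~maxEvents are made-up names):

~initialState = IdentityDictionary();
~squashOldest = { |maxEvents = 10000|
    while { ~history.size > maxEvents } {
        var entry = ~history.removeAt(0);
        ~initialState[entry[1]] = entry[3];   // fold the oldest change into the initial state
    };
};

~stateAt would then have to start from a copy of ~initialState instead of an empty dictionary, and looking back before the squashed range would no longer be possible.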

Thanks all for the nice ideas so far!
I’ll update this thread when I make some progress.


PriorityQueue also has a fixed size that you have no control over, so it wouldn’t be appropriate here.

You could also install a time-series database on your desktop, and query it with a REST interface from SC Lang.

Examples

  • Redis
  • Prometheus
  • Elasticsearch
  • RethinkDB

…but there are many others.
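
For example, something like this would query a local Prometheus instance from sclang by shelling out to curl (the URL and metric name are made-up placeholders; parseYAML happens to read the JSON response too):

(
var url = "http://localhost:9090/api/v1/query?query=nrpn_value";
var response = ("curl -s " ++ url.quote).unixCmdGetStdOut;
response.parseYAML.postln;   // nested dictionaries/arrays parsed from the JSON
)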

I think the data structure you want is probably Order, stored in a fixed array of 4096 of them (assuming more than a trivial number of them will be filled and memory usage isn’t a concern).

~paramHistory = 4096.collect { Order() };   // one Order (time -> value) per parameter id

~addValue = { |id, time, value|
    ~paramHistory[id][time] = value;
};

~getValues = { |id, startTime, endTime|
    var result = List();
    ~paramHistory[id].doRange({ |value, time| result.add(time -> value) }, startTime, endTime);
    result;
};

It’s not very clear in the documentation but Order can have floating point indexes, which covers your time use-case. The one caveat is - I believe Order only wants one item per index, so if you have the possibility of more than one parameter value per time per controller ID, you’ll need to store arrays of values rather than single values.
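
For example, ~addValue from the snippet above could then become something like this (a rough variation, untested):

~addValue = { |id, time, value|
    var existing = ~paramHistory[id][time];
    // store an array of values per time index, appending if something is already there
    ~paramHistory[id][time] = if(existing.isNil) { [value] } { existing.add(value) };
};
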
This could probably be optimized some - the implementation details of Order are entirely inside the SparseArray.sc file. If I remember correctly, Order is O(1) for adding things to the end of the list (i.e. when times are always increasing), which is probably the case for you. Getting things back out is much slower :slight_smile:
I believe lookups of items are already doing a binary search via a primitive (though this is worth double-checking - it’s the _ArrayIndexOfGreaterThan primitive), so no need to re-implement that yourself.

If your lists are getting very large, you’ll probably want to swap out Orders every N minutes to make it easier on allocation and the GC. That should be easy - just something like ~paramHistory = [4096.collect { Order() }], where you could e.g. ~paramHistory = ~paramHistory.add(4096.collect { Order() }) every ten minutes or whatever. Back-indexing single items would be easy - ranges that spanned over multiple blocks would take a little doing and be slower - probably it would just take some blackbox testing to find the method that is most performant for your use case.
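
A rough sketch of that swap, assuming everything else always writes into the newest block (~paramHistory.last):

~paramHistory = [ 4096.collect { Order() } ];   // list of blocks; write into ~paramHistory.last
~swapRoutine = Routine {
    loop {
        600.wait;                                                       // every ten minutes...
        ~paramHistory = ~paramHistory.add(4096.collect { Order() });    // ...start a fresh block
    }
}.play;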

PriorityQueue is dynamically resized (like List); it can handle large sizes and you don’t have to care about that.

(
p = PriorityQueue.new;
a = List[];

// fill the queue with ~16.7 million items to show that it grows dynamically
(2 ** 24).do { |i| p.put(i, i) };
)

// drain the queue in priority order
while { p.notEmpty } {
    var next = p.topPriority;
    a.add(next);
    p.pop;
};

a.size;

a.last;

But I think that @scztt is right that Order would be better suited here. Practically a solution with Array / List would certainly also be fine.

Oh that looks very useful. I’ll look into it.

Oh you’re right @dkmayer - I guess I misremembered, or was thinking of the server queue maybe. It’s a great class to use when one can, as it’s super optimized.

I’m not sure that Order is the right approach.

g = Order.new;
g.put(7, 100); // put a value (100) at index 7

g.put(0.0, 45);
g.put(0.4, 56);
g.put(1.1, 48);

g.at(0.5)

This returns: nil
Which is probably not what you want. Order (and SparseArray) are for implementing discrete data structures, whereas what you want here is something that can represent a continuous data structure.

But that’s pretty easy to work around. Write a new class that is derived from Order and just change ‘at’ so that it returns the value at the nearest earlier index, rather than nil.
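
For example, a minimal (untested) sketch of such a subclass, relying on the indices/array instance variables that Order already exposes in SparseArray.sc; the name HoldOrder is made up, and it would have to live in a class file:

HoldOrder : Order {
    // return the value at the nearest index <= the requested one ("hold" behavior)
    at { |index|
        var pos = indices.indexOfGreaterThan(index) ?? { indices.size };
        if(pos == 0) { ^nil };   // requested index lies before the first stored one
        ^array[pos - 1]
    }
}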

For performance/space - honestly, if that became a real problem (and I suspect if you went with the approach I sketched out above, combined with Scott’s idea of periodically swapping out your latest data structure, you’d be fine) - I’d start looking at some kind of tree/heap data structure. Clojure uses an immutable array (which is backed by a tree), and that’s perfect for this kind of problem.

I believe lookups of items are already doing a binary search via a primitive (though this is worth double-checking - it’s the `_ArrayIndexOfGreaterThan` primitive), so no need to re-implement that yourself.

This is correct. Wish I’d known that there was a binary search hidden in SuperCollider before. Because obviously re-implementing my own binary search is my favourite thing to do on a Saturday night.

I guess what I meant is: Order is the correct existing data structure to use, since it allows arbitrary float keys and fast lookup. There’s nothing else that provides this, and any synthetic solution will basically do the same thing (keep a sorted list of indices, plus a map-like list of values for each index).

The additional step you’d need to implement “hold” behavior (e.g. finding the value at the nearest data point before a time) is to use something like:

order.array[order.indices.indexOfGreaterThan(time) - 1]

I checked, and _ArrayIndexOfGreaterThan does NOT look like it’s doing a binary search, since it can’t know whether the list it’s searching is ordered (a prerequisite for binary search). There would be a definite speed improvement from a version of _ArrayIndexOfGreaterThan that assumes ordering and does a binary search (I tested a simple binary search and it IS faster - I can post code if it would help).

I don’t think it’s really the right approach to add another class for this - all you need is an indexOfGreaterThan method (or indexOfLessThan, interpolateAt(time, \hold), something like that) that is optimized for Order. This is a reasonable piece of information to request from an Order as it is. Of course, it’s up to @shiihs what works best for their own private codebase, since they probably have more functionality to add than just grabbing values.

The indices array in Order should probably be a SortedList - inserts to SortedList are I think O(log(n)) whereas it looks like inserts to Order:indices are O(n) now.
SortedList:indexForInserting already provides a binary search for an index, so this could be used as a faster alternative to indexOfGreaterThan.
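
For example, just to illustrate that call (not code from Order itself):

~times = SortedList.new;
[0.0, 0.4, 1.1, 2.5].do { |t| ~times.add(t) };
~times.indexForInserting(0.5);   // -> 2, so the "hold" value would live at index 2 - 1 = 1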

Even with this, the implementation of Order still feels non-optimal… I haven’t fully thought it through, but shouldn’t it be something like this?

indices = SortedList(); // preferably a more optimized SortedFloatList, if it existed
array = IdentityDictionary();

Then you get:

put {
    | index, object |
    indices = indices.add(index);
    array[index] = object;
}

…which is O(log(n)) for the indices add (or could be optimized to O(1) if you’re adding to the end), and O(1) for array add. doRange is O(log(n)) to find the starting point of your iteration, and O(1) to step through it. at is just an IdentityDictionary:at, which is O(1)-ish.

I’ve created a small quark with my attempt here:

It uses my proposed approach of storing a list of reversible commands, and in addition uses a binary search to map time to index. It comes with some unit tests to check the functionality (but limitations and bugs may still surface once I integrate it in a bigger system). Performance-wise there’s probably some room for improvement, but I expect it to be good enough for what I need.

Thanks all for a fruitful discussion.