FluCoMa: how to save and load a fitted KDTree?

muzikman · April 3, 2023, 5:00am

In plotter-5-complete.scd, after I’ve fitted the FluidKDTree, how do I save the fitted KD tree? Are they buffers that can be written out to disk and reloaded? Is there a generalized serialization method in SuperCollider?

muzikman · April 3, 2023, 7:22am

It looks like FluidDataSet can write to and read from a file as JSON.

muzikman · April 3, 2023, 7:26am

But it seems I cannot print the entire dataset to the console; the output is always truncated.

Mike_McCormick · April 3, 2023, 9:04am

Yup, use .write and .read to save and load. If you want to access the contents of a DataSet, KDTree, MLPClassifier, etc., use .dump:

// .dump gets passed a dictionary whose keys are strings, they're slightly different for each Fluid object
~yourKDtreeGoesHere.dump({ |d| d["data"].do(_.postln) });
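For the original question, here is a minimal sketch of the .write/.read round trip. It assumes a booted server s, that ~normed is the FluidDataSet the tree was fitted on and ~tree is the fitted FluidKDTree (as in plotter-5-complete.scd); the helper names ~saveFluid and ~loadFluid and the file names are hypothetical. Saving the dataset next to the tree keeps the features retrievable by identifier after reloading.

```supercollider
(
// Hypothetical helpers for saving and restoring a dataset/tree pair.
// Assumes: booted server s, ~normed = FluidDataSet the tree was fitted on,
// ~tree = the fitted FluidKDTree. File names are placeholders.
~saveFluid = { |dir|
    File.mkdir(dir); // make sure the target folder exists
    ~normed.write(dir +/+ "normed.json", { "dataset saved".postln });
    ~tree.write(dir +/+ "tree.json", { "tree saved".postln });
};

~loadFluid = { |dir, action|
    ~normed = FluidDataSet(s);
    ~tree = FluidKDTree(s);
    // read the dataset first, then the tree, then run action
    ~normed.read(dir +/+ "normed.json", {
        ~tree.read(dir +/+ "tree.json", action);
    });
};
)
```

For example, call ~saveFluid.("~/flucoma-plotter".standardizePath) in one session and ~loadFluid.("~/flucoma-plotter".standardizePath, { "ready".postln }) in the next. Chaining the reads through their actions avoids querying the tree before it has been restored.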

tedmoore · April 3, 2023, 2:11pm

Calling .print is just for eyeballing the data: checking that it has the right number of columns and points, and that it looks generally correct.

muzikman · April 3, 2023, 5:43pm

How do I get an entry of the normed FluidDataSet? I’m trying to verify the two features and the labels “1”, “28”, etc. I assume that the index numbers “241”, “1”, “123”, etc. from ~indices.loadToFloatArray are used to label the points? I want to relate the sample time at which a hit was detected (n), the next hit (n+1), the slice (n, n+1), and the two feature scores in ~normed. I can get the startsamp and stopsamp from the ~play_slice function, but how do I do that with a FluidDataSetQuery of ~indices and ~normed?

muzikman · April 3, 2023, 5:58pm

Say, for instance, that kNearest on the KDTree gives the identifier of the nearest point as A. How do I query ~tree to retrieve the two features of that point?

Mike_McCormick · April 3, 2023, 6:14pm

It’s hard to guess what your ~variables refer to without seeing any code; perhaps you could post a snippet?

I’m guessing you’re working through some examples included with the FluCoMa library? I suggest you break down the code into smaller chunks; liberal use of .postln, .print, and .dump will help you understand what each line of code is doing.

For example (I’m speculating, as I don’t know what your code looks like), you “assume that the index numbers” are used to label the points... why don’t you double-check? Which line of code assigns the labels? What happens if you use Strings (or Symbols? or Arrays?!) instead of integers? These things are really easy to test, and the payoff is twofold: you learn something about your tools, and you get an answer to your question quicker than by posting in the forum.

Speaking of forums: there is a FluCoMa forum that is super enthusiastic and supportive; some of your questions (and future questions you will have) might find a more appropriate audience there.

muzikman · April 4, 2023, 4:52am

I’m running through plotter-5-complete.scd. I did search the FluCoMa Discourse site, but it didn’t yield much. What I want to do is a join, on the identifier (“1”, “28”, etc.), between ~indices, which holds the sample time at which the algorithm detected each hit, and the fitted KDTree ~tree, which holds a 2-tuple of features per point. I also want to query the ~tree data, e.g. “give me points 12 to 45”. The KDTree has a strange tree structure in its JSON dump, which I assume is how a KDTree is represented in a FluidDataSet (~tree.print doesn’t work), and the FluidDataSetQuery methods don’t seem able to handle something of that type.

Mike_McCormick · April 4, 2023, 8:44am

I don’t have that file on my computer, so without a code snippet I’ll have to speculate a bit; take my advice with an appropriate amount of salt!

Presumably you populated a FluidDataSet with .addPoint; the next instance method listed in the FluidDataSet help file is .getPoint. Could this be what you’re looking for?

Again, you don’t have to assume!!! The code I posted earlier will work on a FluidDataSet as well. I’ll post it here again, but with my comment reformatted so it’s easier to read:

// .dump gets passed a dictionary whose keys are strings
// they're slightly different for each Fluid object
~yourFluidObject.dump({ |d| d["data"].do(_.postln) });

So if .getPoint doesn’t do what you want, you could access the data of a dataset with:

~yourDataSet.dump({ |d| d["data"]["indexGoesHere"].postln });
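A sketch of the .getPoint route, which copies one point into a Buffer on the server and then brings it back to the language. It assumes a booted server s and a FluidDataSet such as ~normed; ~showPoint is a hypothetical helper name, and the two-column default matches the 2D data discussed in this thread.

```supercollider
(
// Hypothetical helper: fetch one point's features by identifier and post them.
// Assumes a booted server s and a FluidDataSet with numCols columns.
~showPoint = { |dataSet, id, numCols = 2|
    fork {
        var buf = Buffer.alloc(s, numCols);
        s.sync; // the buffer must exist before FluCoMa writes into it
        // getPoint copies the point into buf on the server, then calls its action
        dataSet.getPoint(id, buf, {
            // bring the buffer contents back to the language
            buf.loadToFloatArray(action: { |vals|
                "% -> %".format(id, vals).postln;
                buf.free;
            });
        });
    };
};
)
```

Usage would be ~showPoint.(~normed, "28"). Note that loadToFloatArray goes through a temporary file, so this assumes a local server.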

muzikman · April 4, 2023, 9:36am

No, the FluidKDTree is not a FluidDataSet. The progression in plotter-5 is: a normed FluidDataSet is generated from the UMAPped dataset with FluidNormalize(s).fitTransform(~umapped, ~normed). This ~normed dataset is then fitted into a KDTree with ~tree = FluidKDTree(s).fit(~normed);

This ~tree is used by FluidPlotter to generate the on-screen 2D positions. Now that I think about it, the ~normed FluidDataSet would probably give the x, y coordinates. I had the impression that the FluidKDTree would be a queryable dataset. I wanted to generate a set of points, get their nearest neighbours from the KDTree and, from the identifiers of the points returned, retrieve the sample starts from ~indices. I could do that programmatically, but there was the FluidDataSetQuery API, and I thought I could use it with the KDTree. I just had to confirm that the ids in FluidKDTree.dump are consistent with the ones generated from FluidBufOnsetSlice (they should be, because they’re used to index into the ~indices buffer).

In the end I did a ~tree.write, used jq to split the ids and data, and combined them back with a Python zip so I could get a format like { id: <id>, startsamp: <x>, stopsamp: <y>, features: [<k>, <j>] }. I probably should have just added them to a FluidDataSet from the beginning.
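The same join can be done in sclang without jq or Python. A language-only sketch, assuming that the identifiers are integer strings that index the slice-point array (as confirmed in this thread), that the slice points come from ~indices.loadToFloatArray, and that the features Dictionary is the d["data"] passed to a ~normed.dump action. ~joinSlices and ~records are hypothetical names, and the numbers in the example call are made up.

```supercollider
(
// Build one Event per identifier: (id:, startsamp:, stopsamp:, features:).
// Assumes slice n runs from sliceFrames[n] to sliceFrames[n + 1].
~joinSlices = { |features, sliceFrames|
    features.keys.asArray
    .sort { |a, b| a.asInteger < b.asInteger }
    .collect { |id|
        var n = id.asInteger;
        (
            id: id,
            startsamp: sliceFrames[n],
            stopsamp: sliceFrames[n + 1], // nil for the last onset
            features: features[id]
        )
    }
};

// Example with made-up values in the dump layout (identifier -> features).
~records = ~joinSlices.(
    Dictionary["0" -> [0.1, 0.9], "1" -> [0.5, 0.4]],
    [0, 4410, 9000]
);
~records.do(_.postln);
)
```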

tedmoore · April 4, 2023, 9:53pm

Hi @muzikman,

It seems you’ve already confirmed for yourself, but yes, they are.

I often do use one data set to hold analyses and another dataset (with corresponding identifiers) to hold the startsamp, stopsamp, etc kind of data.
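That two-dataset layout might look like this: a second FluidDataSet whose identifiers match ~normed and whose two columns hold the start and stop frames of each slice. This sketch assumes FluidDataSet.load accepts the same Dictionary layout that .dump produces ("cols" plus a "data" Dictionary of identifier -> Array); ~slicesDict, ~dict and ~slices are hypothetical names and the frame values are made up. The server step is left as comments.

```supercollider
(
// Build a dump-style Dictionary: identifier -> [startFrame, stopFrame].
// Assumes identifier n covers sliceFrames[n] .. sliceFrames[n + 1].
~slicesDict = { |sliceFrames|
    var data = Dictionary.new;
    (sliceFrames.size - 1).do { |n|
        data[n.asString] = [sliceFrames[n], sliceFrames[n + 1]];
    };
    Dictionary["cols" -> 2, "data" -> data]
};

~dict = ~slicesDict.([0, 4410, 9000]);
~dict["data"]["1"].postln; // [ 4410, 9000 ]
)

// With a booted server and FluCoMa installed:
// ~slices = FluidDataSet(s);
// ~slices.load(~dict, { ~slices.print });
```

A kNearest result from the tree fitted on ~normed can then be looked up in ~slices with the same identifiers.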

Also, the YouTube video you referenced earlier has a lot of this info in it; it might be worth digging into it more deeply!

Best,

t

muzikman · April 5, 2023, 3:50am

Noting this down just in case anyone is trying this:
Write ~normed out to an external file, then run cat normed | jq '.data | to_entries | map(select(.key == "25"))' to select the two features of a detected hit (here, the one with identifier "25").
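The same lookup can also stay inside sclang. A sketch that reads a file written by ~normed.write and pulls out one identifier. It assumes the file has the dataset layout (a top-level "data" object keyed by identifier) and that parseJSON returns Dictionaries with String keys; values are converted with asFloat because, depending on the SuperCollider version, JSON numbers may come back as Strings. ~featuresFor is a hypothetical helper, and the temp file and its contents are made up for the example.

```supercollider
(
// Hypothetical helper: look up one identifier in a dataset JSON file.
~featuresFor = { |path, id|
    var json = File.readAllString(path).parseJSON;
    var row = json["data"][id];
    row !? { row.collect(_.asFloat) } // nil if the identifier is missing
};

// Example: write a tiny dataset-shaped file, then query it.
~path = Platform.defaultTempDir +/+ "normed-example.json";
File.use(~path, "w", { |f|
    f.write("{\"cols\": 2, \"data\": {\"25\": [0.25, 0.75], \"26\": [0.5, 0.1]}}")
});
~featuresFor.(~path, "25").postln; // [ 0.25, 0.75 ]
)
```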

The FluidKDTree doesn’t generate a FluidDataSet, so .getPoint doesn’t work on ~tree (or on its dump). To get any points from it, do a nearest-neighbour lookup. To get the sample start frames, there’s sample code in the FluidKDTree documentation that uses FluidKrToBuf and FluidBufToKr. Alternatively, correlate the identifiers returned with the dataset generated during the detection phase.

tremblap · April 5, 2023, 8:15am

There is a simpler way; actually, two.

If you want to retrieve stuff, just save it. So you would write the FluidDataSet and read it back, as well as the FluidKDTree. They share identifiers, so you can retrieve the data (or any other data, as @tedmoore suggested) using the identifier(s) the tree’s query spits out.

As for the FluidKDTree not containing a FluidDataSet: this is both true and false. I don’t know why we decided long ago to store the data points separately from the ids in there, but there was a good reason. Either way, it is all in there.

So if you don’t want to save two files (the fitted dataset and the tree), you can reconstruct the former from the latter: dump the tree, make a dict by lacing (Array.lace) tree["ids"] and tree["data"], and you have it back without going to the server again.
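The lacing idea, sketched in sclang. It assumes the KDTree dump Dictionary has an "ids" Array and a "data" Array holding one feature Array per identifier in the same order; the sample Dictionary stands in for a real ~tree.dump result and its values are made up. ~fromTreeDump is a hypothetical name; [ids, data].flop would be the sclang equivalent of a Python zip.

```supercollider
(
// Rebuild an identifier -> features Dictionary from a KDTree dump.
~fromTreeDump = { |treeDict|
    var ids = treeDict["ids"], data = treeDict["data"];
    // lace interleaves the two arrays: [id0, row0, id1, row1, ...]
    Dictionary.newFrom([ids, data].lace(ids.size * 2))
};

// Stand-in for what ~tree.dump({ |d| ... }) would pass to its action.
~fakeTreeDump = Dictionary[
    "ids" -> ["1", "28", "123"],
    "data" -> [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
];
~features = ~fromTreeDump.(~fakeTreeDump);
~features["28"].postln; // [ 0.3, 0.4 ]
)
```

With the real tree this would be ~tree.dump({ |d| ~features = ~fromTreeDump.(d) }).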

I hope this helps

tremblap · April 5, 2023, 8:16am

Indeed I hope people here enjoy the cross-pollination. If not, moderators, please get in touch.

muzikman · April 5, 2023, 9:24am

Array.lace! I totally forgot about it; I went the Python zip route instead.
OK, so what I’m trying to do is:

~normed.dump({ |x|
    // collect every point's two features
    x["data"].do({ |y|
        ~points = ~points.add([y[0], y[1]]);
    });
});

~points.do({ |py|
    var point = Buffer.alloc(s, 2);
    point.setn(0, [py[0], py[1]]);
    ~tree.kNearest(point, 4, { |nearest|
        ~results = ~results.add([py[0], py[1], nearest]);
    });
});

Basically it takes every point in the normed space, looks for its 4 nearest neighbours, then stores the query point and the results. But the dump and kNearest calls are asynchronous (their callbacks run later, when the server replies), so building ~results of the form [query point, [4 nearest neighbours]] this way doesn’t seem to work.

I managed to get [wrong query point, [correct 4 nearest neighbours]] by wrapping the kNearest search in a Task, but the query point is wrong. Should I use the ~point Buffer instead? Is this a consequence of the processing being handled on the server? How would I get the result from a Buffer instead? Using Index.kr? I can work around this by ignoring the query point, since the returned nearest neighbours must by definition contain that point, but I’d like to know what went wrong.

tedmoore · April 9, 2023, 10:05pm

Hi @muzikman,

It sounds like you’re dealing with some async headaches. Just eyeballing your code, you might try adding a wait condition using Condition or the newer CondVar. Also, rather than bringing data from the server back to the language via .dump you could just call .getPoint on the dataset to get the point into a buffer.
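A sketch of the CondVar approach: one Routine that waits for each server reply before sending the next request, so every query stays paired with its own result. It assumes SuperCollider 3.12 or later (for CondVar), a booted server, the kNearest(buffer, k, action) form used in the code above, and a two-column dataset; ~queryAll is a hypothetical name.

```supercollider
(
// Query every point of a dataset against a tree, one request at a time.
// done receives an Array of [queryId, nearestIds] pairs.
~queryAll = { |dataSet, tree, k = 4, done|
    fork {
        var cond = CondVar();
        var point = Buffer.alloc(s, 2); // assumes two columns
        var ids, results;
        s.sync; // the buffer must exist before FluCoMa writes into it

        // dump is asynchronous too: wait for it before iterating
        dataSet.dump({ |d| ids = d["data"].keys.asArray.sort; cond.signalOne });
        cond.wait { ids.notNil };

        results = ids.collect { |id|
            var copied = false, nearest;
            // copy the query point into the buffer and wait until that is done
            dataSet.getPoint(id, point, { copied = true; cond.signalOne });
            cond.wait { copied };
            // only now ask the tree, and wait for its answer
            tree.kNearest(point, k, { |n| nearest = n; cond.signalOne });
            cond.wait { nearest.notNil };
            [id, nearest]
        };
        point.free;
        done.value(results);
    };
};
)
```

Usage would be ~queryAll.(~normed, ~tree, 4, { |r| ~results = r; "% queries done".format(r.size).postln }). Reusing a single buffer is safe here only because each getPoint waits for the previous kNearest to answer.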

T

muzikman · April 10, 2023, 1:28am

For getting the ~points, yes, getPoint on ~normed would work. But the second part, putting each point into the buffer and sending it to ~tree, is the real problem. I’ve noted the solution here:

https://scsynth.org/t/how-to-get-the-lang-to-wait-for-a-server-process-to-finish-flucoma-knearest-and-buffers/7455/5

Somehow the buffer doesn’t get written in time relative to ~tree finishing its previous kNearest search, so a search over all points doesn’t give the complete result. What works is a wait that tests whether the buffer has changed, plus a nested callback into the next kNearest search.
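The nested-callback idea can be written as a small recursive function in which each reply triggers the next request, so requests never overlap. To keep the sketch runnable without a server, the actual query is passed in as a function taking (item, action); ~chain and ~fakeLookup are hypothetical, and the stand-in answers immediately, whereas a real kNearest answers later (the chain behaves the same either way).

```supercollider
(
// Run an asynchronous lookup over items strictly one after another.
// lookup: { |item, action| ... action.value(result) ... }
// done:   called with an Array of [item, result] pairs, in order.
~chain = { |items, lookup, done|
    var results = [];
    var step;
    step = { |i|
        if(i < items.size) {
            lookup.value(items[i], { |result|
                results = results.add([items[i], result]);
                step.value(i + 1); // only start the next request now
            });
        } {
            done.value(results);
        };
    };
    step.value(0);
};

// Stand-in lookup; a server query would call action when its reply arrives.
~fakeLookup = { |item, action| action.value(item * 10) };

~chain.([1, 2, 3], ~fakeLookup, { |r| ~chained = r; r.postln });
)
```

In the thread’s setting, lookup could be something like { |id, action| ~normed.getPoint(id, ~queryBuf, { ~tree.kNearest(~queryBuf, 4, action) }) }, where ~queryBuf is a hypothetical two-sample Buffer allocated beforehand. Because the next getPoint is only sent once the previous kNearest has answered, the shared buffer is never overwritten too early.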

muzikman · April 11, 2023, 9:33am

And ~tree cannot be queried with getPoint; yeah, that was the other thing.

tremblap · April 25, 2023, 6:54am

Hello

I hope this post helps with your async issues.