Search for a specific string in a file, print the index

Faster array approach:

(
~makeDict = { |stringPairs|
	var dict = Array.newClear(27);
	stringPairs.pairsDo { |str, hyphenation| ~putOneString.(dict, str, hyphenation) };
	dict
};

~putOneString = { |dict, str, hyphenation|
	var level = dict, sublevel;
	str.do { |char|
		char = char.toLower;
		sublevel = level[char.ascii - 97];
		if(sublevel.isNil) {
			sublevel = Array.newClear(27);
			level.put(char.ascii - 97, sublevel);
		};
		level = sublevel;
	};
	// level[\wholeString] = str;
	level[26] = hyphenation;
};

~matchString = { |dict, str|
	var level = dict;
	var stream = CollStream(str);
	var char;
	while {
		char = stream.next;
		if(char.notNil) {
			level = level[char.ascii - 97];
		};
		char.notNil and: { level.notNil }
	};
	// and here, retrieve hyphenation
	level.tryPerform(\at, 26)
};
)

d = ~makeDict.([
	"hello", "hel-lo",
	"there", "there",
	"thereupon", "there-up-on"
]);

d looks really funny!

> [ nil, nil, nil, nil, nil, nil, nil, [ nil, nil, nil, nil, [ nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, [ nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, [ nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, [ nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, hel-lo ], nil, nil, nil, nil, ... ]

But the ~matchString calls in my earlier post all still work this way.

Edit: This way might be a little trickier with accented characters though – it assumes 26 letters only. Accents would need to map onto indices… “exercise for the reader.”

hjh

Thanks for the hint! I didn’t know about the existence of this. Seems quite appropriate for what I’m doing, but I’m not sure how I would make it communicate real-time with Supercollider, maybe OSC?
The thing of having the hyphenation happen in supercollider is to having a fast lane for writing → decomposing text → recomposing rhythms. TBH I’m not sure where i’m going ahhahahah, I’m checking out PyPhen to figure out how to make them talk together

Assuming you can use unixCmdGetStdOut.

Run
pip install pyphen in the terminal first,
then in supercollider:

~get_sylabs = {|text|
	format(
		"python -c \"import pyphen; out = pyphen.Pyphen(left=1, right=1, lang='it_IT').inserted('%'); print(out)\"",
		text
	).unixCmdGetStdOut.split($-)
}

~get_sylabs.("Esempio, questa Stringa").do(_.postln)

returns:

E
sem
pio, 
que
sta 
Strin
ga

Seems pretty snappy to me.

Edit: added the split($-)

Edit 2: just fixed the fact it split Esem wrong

"Esempio, questa Stringa".reject(_ == $,).split($ ).collect(~get_sylabs)

This looks really elegant but I’m not quite grasping it. Would you mind walk me through your steps a bit?
The way I understand it I would mainly have to create a dictionary of associations string (whole word) with hyphenation (word syllables). Is that what you did with the variable d?
I’ve never used IdentityDictionary before, so perhaps I will need to take a deep dive there to properly get it. Trying it out, it does work in a pretty neat way (except for accented vowels) but I’m not gonna lie, your coding is VERY ESOTERIC for me :joy:

The main problem may be understanding the data structure.

This is basically guaranteed to be inefficient.

Instead what I did was to create a dictionary of first letters: keys are a, b, c … z. The “a” entry points to a dictionary of second letters coming after an initial a; the “b” entry points to a dictionary of second letters coming after an initial b and so on. Then for example the “a” entry underneath “b” points to a dictionary of third letters coming after “ba” and so on.

So the original words exist in the dictionary only as chains of associations.

At the end of each chain, there’s the hyphenated variant.

So instead of searching every entry, I thought to walk through the tree, where each letter narrows down the territory until you find either a matching end of the chain, or you run out of letters in the search string (failure) or you get to the end of the chain without exhausting the search string (also failure). This is exactly the process you use when you open up a paper dictionary.

If you’re looking for an 8 letter word, you’ll either find it, or not, in 8 steps, not potentially thousands of iterations.

hjh

1 Like

Thanks for the tips, am I doing something wrong?

returns [ ]

Did you install pyphen and get their demo working from their website? Also are you on windows?

i installed PyPhen and it works fine using the .runInTerminal method in SC instead of .unixCmdGetStdOut (it opens the terminal and prints the output there). I’m on OSX 10.13.
oh boy, oh my

Weird…
does this return anything?

"ls".unixCmdGetStdOut;

Ok I can follow this. My only question is about the d = ~makeDict. As far as I understand, that’s where you created a tiny dictionary of 3 words with their corresponding hyphenation - is that correct?

If so, I was just wondering if it makes sense to create an array of associations containing whole words interleaved with their hyphenation like you did, something like ~array = ["aba", "a-ba", "abaca", "a-ba-ca"..."zaino", "zai-no"], and then simply throw it into d = ~makeDict.([~array])

I understand that you’re not iterating through the whole thing every time but only chaining letters together, it’s just not clear to me how these associations are stored initially. Hope this question makes sense and as always appreciate a lot your explanations

yes, it returns

-> Applications
Documents
Library
Network
SuperCollider
System
Users
Volumes
WaveShell-AU 9.3.component
WaveShell-VST 9.3.vst
Waveshells
bin
cartella senza titolo
cores
dev
etc
home
ixilang
libtensorflowlite_c.dylib
net
opt
private
sbin
tmp
usr
var

Then I’m stuck because it works here and since, both, pyphen and get from terminal are working it should work for you.

Try running this in the terminal directly? does it return E-sem-pio

python -c "import pyphen; out = pyphen.Pyphen(left=1, right=1, lang='it_IT').inserted('Esempio'); print(out)"

Maybe try copying the function again, I edited it a few times.

~get_sylabs = {|text|
	format(
		"python -c \"import pyphen; out = pyphen.Pyphen(left=1, right=1, lang='it_IT').inserted('%'); print(out)\"",
		text
	).unixCmdGetStdOut.split($-)
};

~get_sylabs.("Esempio, questa Stringa")

actually it does :confused: for some reason it doesn’t work in supercollider. Well, thanks anyway for your help, I’ll try to figure out what’s up with unixCmdGetStdOut

Ok, made a laced array in Python to have it in the form [“word-string”, “word-syllables”, … “wordn-string”, “wordn-syllables”]. I’ve tried the function you wrote and you’re totally right: it’s extremely fast, and accurate. This is the perfect solution for what I was looking for! It takes only a few seconds to initialize the merged array (it contains now way more than a million entries! it was more than 600k words, same amount of hyphenated words) but after that, I can work the way I wanted (typing a word → getting hyphenation). Super cool. Now I’m gonna have fun hooking it up to a bunch of stuff! Hope to post interesting updates soon :smiley:

Yep – alternately, you could read the pairs incrementally and call ~putOneString.(d, word, word_syllables) for each pair (avoiding the need for a very large array of pairs).

Glad to hear that it performs as well as I hoped!

hjh

1 Like

I got nothing from your code in my SC-IDE under MacOS 12.6.5 on the Apple M1 Max chip:

It returns as follows:

-> [  ]

The following works in the terminal:

Last login: Tue May 16 08:18:32 on ttys000
(base) prko@prkoMBP2021 ~ % which python
/Users/prko/opt/anaconda3/bin/python
(base) prko@prkoMBP2021 ~ % which python3
/Users/prko/opt/anaconda3/bin/python3
(base) prko@prkoMBP2021 ~ % which python3

In Supercollider, the following codes return the expected lists:

["ls"].unixCmd
"ls".unixCmd

x = "ls".unixCmdGetStdOut
x

y = ["ls"].unixCmdGetStdOut
y

How could I solve the problem?

Yea i’ve also bailed this for the moment, I’m not sure what’s causing this. There might be a way but I haven’t gotten the facilities to try it lol - it would be trying to force Python to respond to the request. It seems the problem might be that Supercollider is not receiving the print(out) string, but if you have the identical problem as me, if you launch this

~get_sylabs = {|text|
	format(
		"python -c \"import pyphen; out = pyphen.Pyphen(left=1, right=1, lang='it_IT').inserted('%'); print(out)\"",
		text
	).runInTerminal.split($-)
};

and then this

~get_sylabs.("Esempio, questa Stringa")

you should have the output in the terminal, right?
I’m not so skilled in Python either, but I couldn’t find nothing online about this issue. Perhaps avoiding the (print) and just letting python store the content of the string into a variable, it could be possible to use Supercollider to retrieve it?

I solved the problem by finding the path for python. The following works:

(
~get_sylabs = {|text|
	format(
		"/Users/prko/opt/anaconda3/bin/python.app -c \"import pyphen; out = pyphen.Pyphen(left=1, right=1, lang='it_IT').inserted('%'); print(out)\"",
		text
	).unixCmdGetStdOut.split($-)
}
)

x = ~get_sylabs.("Esempio, questa Stringa").do(_.postln)
x

Using ```.runInTerminal`` is ineffective because you cannot assign the result to a variable. Assigning a result to a variable is important for reusing the data.

1 Like

aaaa genius!!! you’re right, correcting the path works. Thanks for figuring this out!

@Robin_Morabito
I have found the reason. Please refer to: