Treat String as File (to parse it as CSV)

Hi there, I’d like to parse a String as though it were a File (primarily so I can perform CSV-type parsing on it). The reason I don’t want to just use String.split is because I want it to handle quoted delimiters properly, as FileReader should.

Is there some way I can make a string appear as an open File – which is a kind of Stream – to achieve this goal? (or is there some other way to do it easily?)


Hmm, maybe it’s not so important… I just tried FileReader (CSVFileReader) on a test file, and it doesn’t actually handle delimiters in quotes anyhow… so I guess I won’t be using it.

Anyone know of a Quark that does handle this properly? e.g. CSV should be able to handle:

5,7,hello,"a comma, in quotes!"

This should return four items:
[ 5, 7, "hello", "a comma, in quotes!" ].size == 4
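
To make the expected behaviour concrete, here is a minimal sketch of a hand-written splitter that treats delimiters inside double quotes as ordinary characters. The name ~splitCSVLine is hypothetical (it is not part of SuperCollider), and the sketch assumes a single line of input with no escaped quotes inside fields.

```supercollider
(
// Hypothetical helper, not part of the class library: split one CSV line,
// ignoring delimiters that appear between double quotes.
~splitCSVLine = { arg line, delim = $,;
	var fields = List.new;
	var current = String.new;
	var inQuotes = false;
	line.do { |ch|
		case
		// a quote toggles the quoted state and is not kept in the field
		{ ch == $" } { inQuotes = inQuotes.not }
		// a delimiter outside quotes ends the current field
		{ (ch == delim) and: { inQuotes.not } } {
			fields.add(current);
			current = String.new;
		}
		// anything else is part of the current field
		{ current = current ++ ch };
	};
	fields.add(current);   // the last field has no trailing delimiter
	fields.asArray
};
)

~splitCSVLine.("5,7,hello,\"a comma, in quotes!\"").size;
// -> 4
```

Every field comes back as a String, so numeric columns would still need a conversion such as asInteger or asFloat.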

How about just writing the string to a file and … opening it? It may seem inelegant, but file operations are dirt cheap, and then you are where you want to be with minimum hassles.
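
A rough sketch of that round trip, assuming the platform's default temp directory is writable. The file name and the sample data are made up for illustration.

```supercollider
(
var str = "5,7,hello\n1,2,world\n";
// Assumed scratch location; any writable path would do.
var path = Platform.defaultTempDir +/+ "csv_from_string.csv";

// Write the string out; File.use closes the file when the function returns.
File.use(path, "w", { |file| file.write(str) });

// Read it back with the stock reader (skipEmptyLines and skipBlanks enabled).
CSVFileReader.read(path, true, true).postcs;

// Remove the scratch file again.
File.delete(path);
)
```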


(Yeah, well, I sent this before your second message, but it wasn’t posted until after, so now I seem like a dork. So it goes.)

Thanks, yeah. In the end, I did something like this that works more or less the way I wanted (space separation, but also supporting quotes with spaces in them):

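f = { arg str;
	// This regex isn't perfect, but works
	// reasonably well for most of my cases...
	var regex = '".+?"|[^"]+?(?= )|(?<= )[^"]+'.asString;
	// findRegexp returns [offset, match] pairs. The body of this block was
	// cut off in the post; keeping the trimmed match text is an assumption.
	var arr = str.asString.findRegexp(regex).collect{ |x|
		x[1].stripWhiteSpace
	}.reject(_.isEmpty);
	if (arr.isEmpty) { arr = [ str ] };
	arr
};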

f.("  \"Hello there,\" she said. ").do(_.postln)
// "Hello there,"
// she
// said.

There is a class that can treat a string like a file: CollStream (sadly undocumented, but it pretty much follows the File streaming interface, so it’s easy to use).
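
Since CollStream has no help file, here is a short sketch of the basic stream calls on a string. The expected values in the comments assume each line is evaluated in order.

```supercollider
c = CollStream("ab,c");
c.next;      // -> $a   (consumes one character)
c.peek;      // -> $b   (looks ahead without consuming)
c.getChar;   // -> $b   (same as next, using the File-style name)
c.pos;       // -> 2    (current read position)
c.reset;     // rewind to the start
c.next;      // -> $a
```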

Unfortunately the FileReader classes can’t use it. There isn’t a good reason for this restriction – it’s only that, when it was decided to allow the user to pass in a File, this was done by pathOrFile.isKindOf(File) which rejects objects that are not directly underneath the File hierarchy (even if they can masquerade as files).
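
The restriction is easy to see from the interpreter. The respondsTo check below is only an illustration of how a looser test could look; it is an assumption, not the check FileReader actually performs.

```supercollider
c = CollStream("5,7,hello");
c.isKindOf(File);        // -> false: CollStream is not a subclass of File
c.respondsTo(\getChar);  // -> true: it still supports reading character by character
```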

IMO this is a bug: when this was done, the possibility of using CollStream was overlooked.

The regexp is a nice solution.


Thanks, yeah, I was pretty sure I remembered there being some way to make a string into a stream, although I didn’t find it while searching (the documentation :wink:, silly me). And I did see that FileReader used isKindOf(File), which wouldn’t work with other types, like the CollStream you mention…

BTW, here’s a slightly improved version (the previous version included the quotes in the returned split strings, and also didn’t handle tab or other whitespace separation between “arguments”):

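~separateBySpaces = { arg str;
	var regex = '"(.+?)"|(?:([^"]+?)(?=\\s))|(?:(?<=\\s)([^"]+))'.asString;
	// Each match yields four strings: the full match plus the three groups.
	// The body of this block was cut off in the post; joining the groups
	// and trimming the result is an assumption.
	var arr = str.asString.findRegexp(regex).flop.last.clump(4).collect{ |x|
		x.drop(1).join.stripWhiteSpace
	}.reject(_.isEmpty);
	if (arr.isEmpty) { arr = [ str ] };
	arr
};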

~separateBySpaces.(" say  \t person  \"hello, there!\"  ").do(_.postln);
// say
// person
// hello, there!

Here is an evil hack:

c = CollStream("5,7,hello,\"a comma, in quotes!\"");
r = CSVFileReader.newCopyArgs(c, false, false, $,);
r.next;   // read one record (this call is missing from the post; assumed)
// -> [ 5, 7, hello, "a comma,  in quotes!" ]

This bypasses the normal initialization logic, so you can fill in “file” with anything. The drawback is that you’re responsible for making sure every instance variable has a reasonable value – without keyword arguments – so any code relying on this hack would be subject to breakage if the reader classes ever changed.

In any case, I think I’ll do a quick pull request for the bug.


Thanks…that trick works great (for the FileReader), but of course the class itself still doesn’t handle delimiters in quotes, which is a separate issue.

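c = CollStream("5,7,hello,\"a comma, in quotes!\"");
r = FileReader.newCopyArgs(c, false, false, $,);
x = r.next;   // read one record (call reconstructed; the post was garbled here)
x.postcs;
x.size.debug("size");
// -> [ "5", "7", "hello", "\"a comma", " in quotes!\"" ]
// -> size: 5     (should be 4)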

I just opened a bug for this issue:
FileReader (CSVFileReader) doesn’t handle quoted delimiters · Issue #5612