Treat String as File (to parse it as CSV)

Hi there, I’d like to parse a String as though it were a File (primarily so I can perform CSV-type parsing on it). I don’t want to just use String.split because I want quoted delimiters to be handled properly, as FileReader should do.

Is there some way I can make a string appear as an open File – which is a kind of Stream – to achieve this goal? (or is there some other way to do it easily?)

Thanks,
Glen.

Hmm, maybe it’s not so important… I just tried FileReader (CSVFileReader) on a test file, and it doesn’t actually handle delimiters in quotes anyhow, so I guess I won’t be using it.

Anyone know of a Quark that does handle this properly? e.g. CSV should be able to handle:

5,7,hello,"a comma, in quotes!"

This should return four items:
[ 5, 7, "hello", "a comma, in quotes!" ].size == 4

How about just writing the string to a file and … opening it? It may seem inelegant, but file operations are dirt cheap, and then you are where you want to be with minimum hassle.

Cheers,
eddi

(Yeah, well, I sent this before your second message, but it wasn’t posted until after, so now I seem like a dork. So it goes.)

Thanks, yeah. In the end, I did something like this that works more or less the way I wanted (space separation, but also supporting quotes with spaces in them):

(
f = { arg str;
	// This regex isn't perfect, but works
	// reasonably well for most of my cases...
	var regex = '".+?"|[^"]+?(?= )|(?<= )[^"]+'.asString;
	var arr = str.asString.findRegexp(regex).collect{ |x|
		x.last.stripWhiteSpace
	}.reject(_.isEmpty);
	if (arr.isEmpty) { arr = [ str ] };
	arr
};
)

f.("  \"Hello there,\" she said. ").do(_.postln)
// "Hello there,"
// she
// said.

There is a class that can treat a string like a file: CollStream (sadly undocumented, but it pretty much follows the File streaming interface, so it’s easy to use).

Unfortunately the FileReader classes can’t use it. There isn’t a good reason for this restriction – it’s only that, when it was decided to allow the user to pass in a File, this was done by pathOrFile.isKindOf(File) which rejects objects that are not directly underneath the File hierarchy (even if they can masquerade as files).

IMO this is a bug: when this was done, the possibility of using CollStream was overlooked.
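
To give an idea of what “treat a string like a file” means here, a minimal sketch (the name ~readAll is just a hypothetical helper for illustration, and it assumes getChar returns nil once the collection is exhausted):

```supercollider
(
// Hypothetical helper: wrap a String in a CollStream and
// read it back one character at a time, as one would from a File.
~readAll = { |str|
	var stream = CollStream(str);
	var char, text = String.new;
	// getChar is assumed to return nil at the end of the collection
	while { (char = stream.getChar).notNil } {
		text = text.add(char);
	};
	text
};
)

~readAll.("5,7,hello").postln;
// 5,7,hello
```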

The regexp is a nice solution.

hjh

Thanks, yeah, I was pretty sure I remembered there being some way to make a string into a stream, although I didn’t find it while searching (the documentation ;), silly me). And I did see that FileReader used isKindOf(File), which wouldn’t work with other types, like the CollStream you mention…

BTW, here’s a slightly improved version (the previous version included the quotes in the returned split strings, and also didn’t handle tab or other whitespace separation between “arguments”):

(
~separateBySpaces = { arg str;
	var regex = '"(.+?)"|(?:([^"]+?)(?=\\s))|(?:(?<=\\s)([^"]+))'.asString;
	var arr = str.asString.findRegexp(regex).flop.last.clump(4).collect{ |x|
		x.drop(1).collect(_.stripWhiteSpace)
	}.flatten.reject(_.isEmpty);
	if (arr.isEmpty) { arr = [ str ] };
	arr
};
)

~separateBySpaces.(" say  \t person  \"hello, there!\"  ").do(_.postln);
// say
// person
// hello, there!

Here is an evil hack:

c = CollStream("5,7,hello,\"a comma, in quotes!\"");
r = CSVFileReader.newCopyArgs(c, false, false, $,);

r.next;
-> [ 5, 7, hello, "a comma,  in quotes!" ]

This bypasses the normal initialization logic, so you can fill in “file” with anything. The drawback is that you’re responsible for making sure every instance variable has a reasonable value – without keyword arguments – so any code relying on this hack would be subject to breakage if the reader classes ever changed.

In any case, I think I’ll do a quick pull request for the bug.

hjh


Thanks…that trick works great (for the FileReader), but of course the class itself still doesn’t handle delimiters in quotes, which is a separate issue.

(
c = CollStream("5,7,hello,\"a comma, in quotes!\"");
r = FileReader.newCopyArgs(c, false, false, $,);
r.next.postcs.size.debug("size");
)
// -> [ "5", "7", "hello", "\"a comma", " in quotes!\"" ]
// -> size: 5     (should be 4)
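
In the meantime, here is a rough sketch of the behaviour I’d expect, done by hand rather than through FileReader. ~splitCSVLine is a hypothetical helper (not part of the class library) that reads the string through a CollStream and only splits on the delimiter outside of double quotes. It deliberately ignores things like doubled quotes ("") used as escapes and multi-line fields, so treat it as an illustration rather than a real CSV parser:

```supercollider
(
// Hypothetical helper, not part of the class library: split one CSV line,
// treating delimiters inside double quotes as ordinary characters.
~splitCSVLine = { |line, delimiter = $,|
	var stream = CollStream(line);
	var fields = List.new;
	var field = String.new;
	var inQuotes = false;
	var char;
	while { (char = stream.getChar).notNil } {
		case
		// a double quote toggles the quoted state and is not kept in the field
		{ char == $" } { inQuotes = inQuotes.not }
		// an unquoted delimiter ends the current field
		{ (char == delimiter) and: { inQuotes.not } } {
			fields.add(field);
			field = String.new;
		}
		// anything else (including a quoted delimiter) is part of the field
		{ field = field.add(char) };
	};
	fields.add(field); // the last field has no trailing delimiter
	fields.asArray
};
)

~splitCSVLine.("5,7,hello,\"a comma, in quotes!\"").do(_.postln);
// 5
// 7
// hello
// a comma, in quotes!
```

Note that every item comes back as a String (as with plain FileReader), so numeric fields would still need converting.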

I just opened a bug for this issue:
FileReader (CSVFileReader) doesn’t handle quoted delimiters · Issue #5612