Hi there, I’d like to parse a String as though it were a File (primarily so I can perform CSV-type parsing on it). I don’t want to just use String.split because I want quoted delimiters handled properly, as FileReader should do.
Is there some way I can make a string appear as an open File (which is a kind of Stream) to achieve this goal? (Or is there some other way to do it easily?)
Thanks,
Glen.
Hmm, maybe it’s not so important… I just tried FileReader (CSVFileReader) on a test file, and it actually doesn’t handle delimiters in quotes anyhow… so I guess I won’t be using it.
Anyone know of a Quark that does handle this properly? e.g. CSV should be able to handle:
5,7,hello,"a comma, in quotes!"
This should return four items:
[ 5, 7, "hello", "a comma, in quotes!" ].size == 4
How about just writing the string to a file and … opening it? It may seem inelegant, but file operations are dirt cheap, and then you are where you want to be with minimum hassle.
Cheers,
eddi
(Yeah, well, I sent this before your second message, but it wasn’t posted until after, so now I seem like a dork. So it goes.)
Thanks, yeah. In the end, I did something like this that works more or less the way I wanted (space separation, but also supporting quotes with spaces in them):
(
f = { arg str;
// This regex isn't perfect, but works
// reasonably well for most of my cases...
var regex = '".+?"|[^"]+?(?= )|(?<= )[^"]+'.asString;
var arr = str.asString.findRegexp(regex).collect{ |x|
x.last.trim
}.reject(_.isEmpty);
if (arr.isEmpty) { arr = [ str ] };
arr
};
)
f.(" \"Hello there,\" she said. ").do(_.postln)
// "Hello there,"
// she
// said.
There is a class that can treat a string like a file: CollStream (sadly undocumented, but it pretty much follows the File streaming interface, so it’s easy to use).
Unfortunately the FileReader classes can’t use it. There isn’t a good reason for this restriction: when it was decided to allow the user to pass in a File, this was done by checking pathOrFile.isKindOf(File), which rejects objects that are not directly underneath the File hierarchy (even if they can masquerade as files).
IMO this is a bug: when this was done, the possibility of using CollStream was overlooked.
The regexp is a nice solution.
hjh
Thanks, yeah, I was pretty sure I remembered there being some way to make a string into a stream, although I didn’t find it while searching (in the documentation, silly me). And I did see that FileReader used isKindOf(File), which wouldn’t work with other types, like the CollStream you mention…
BTW, here’s a slightly improved version (the previous version included the quotes in the returned split strings, and also didn’t handle tab or other whitespace separation between “arguments”):
(
~separateBySpaces = { arg str;
var regex = '"(.+?)"|(?:([^"]+?)(?=\\s))|(?:(?<=\\s)([^"]+))'.asString;
var arr = str.asString.findRegexp(regex).flop.last.clump(4).collect{ |x|
x.drop(1).collect(_.stripWhiteSpace)
}.flatten.reject(_.isEmpty);
if (arr.isEmpty) { arr = [ str ] };
arr
};
)
~separateBySpaces.(" say \t person \"hello, there!\" ").do(_.postln);
// say
// person
// hello, there!
Here is an evil hack:
c = CollStream("5,7,hello,\"a comma, in quotes!\"");
r = CSVFileReader.newCopyArgs(c, false, false, $,);
r.next;
-> [ 5, 7, hello, "a comma, in quotes!" ]
This bypasses the normal initialization logic, so you can fill in “file” with anything. The drawback is that you’re responsible for making sure every instance variable has a reasonable value – without keyword arguments – so any code relying on this hack would be subject to breakage if the reader classes ever changed.
In any case, I think I’ll do a quick pull request for the bug.
hjh
Thanks…that trick works great (for the FileReader), but of course the class itself still doesn’t handle delimiters in quotes, which is a separate issue.
(
c = CollStream("5,7,hello,\"a comma, in quotes!\"");
r = FileReader.newCopyArgs(c, false, false, $,);
r.next.postcs.size.debug("size");
)
// -> [ "5", "7", "hello", "\"a comma", " in quotes!\"" ]
// -> size: 5 (should be 4)
I just opened a bug for this issue:
FileReader (CSVFileReader) doesn’t handle quoted delimiters · Issue #5612
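In the meantime, for anyone who needs the quoted-delimiter behaviour on a single line of text, here is a minimal sketch of a character-by-character splitter. ~splitCSVLine is a hypothetical helper name, not something in the class library, and it assumes simple input: double quotes only toggle "inside quotes" mode and are dropped from the result, so escaped or doubled quotes inside a field are not supported.

```supercollider
(
// Hypothetical helper (not part of the class library):
// split one line on a delimiter, ignoring delimiters inside double quotes.
~splitCSVLine = { arg line, delimiter = $,;
	var fields = List.new;   // completed fields
	var current = "";        // field being accumulated
	var inQuotes = false;    // true while between a pair of double quotes
	line.asString.do { |ch|
		case
		// a double quote toggles quoted mode and is not kept
		{ ch == $" } { inQuotes = inQuotes.not }
		// an unquoted delimiter ends the current field
		{ (ch == delimiter) and: { inQuotes.not } } {
			fields.add(current);
			current = "";
		}
		// anything else (including a quoted delimiter) is part of the field
		{ current = current ++ ch };
	};
	fields.add(current);     // the last field has no trailing delimiter
	fields.asArray
};
)
~splitCSVLine.("5,7,hello,\"a comma, in quotes!\"").size.postln;
// should post 4
```

Passing a different delimiter as the second argument (e.g. $  for a space) should give roughly the same behaviour as the regex version above, though unlike that version it keeps empty fields and does not trim whitespace.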