PathName, String, and File

VIRTUALDOG · January 26, 2021, 4:09pm

Responding to this post by @jamshark70 in "Invalid file name for Pbind recording" (#2):

At some point, I think we should remove PathName.

There is almost no reason to use it, ever.

Just about everything you need to do with paths is already supported in String. Going through PathName just introduces complication and it’s unnecessary in your example.

I’d just take it out.

I think the idea that there is no reason to use or want PathName is a bit extreme, and it also cuts against the grain of what I see happening in other high-level, cross-platform programming languages. I would advocate for something a little different. Right now, the utilities for manipulating paths as filesystem objects are spread across String, PathName, and File. I think it would be a good move to consolidate them all under PathName, much like the filesystem libraries of C++ and Python (std::filesystem, which grew out of boost::filesystem, and pathlib, which covers what used to require os.path), deprecating and removing path-manipulation utilities elsewhere. There are good design reasons for doing this, in my opinion.

For one, many operations on paths make no sense in the context of string manipulation – component-wise path equality, component iteration, concatenation, and the operation of taking two absolute paths and making the first relative to the second. Attaching them to the String interface is confusing and also requires them to return defined results even when the operands are nonsensical as paths.
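
To make the path-only operations above concrete, here is a minimal sketch using Python's pathlib, one of the libraries mentioned earlier. It is only an illustration of the idea, not a proposal for the sclang API; the example paths are made up, and PurePosixPath is used so nothing touches the real filesystem.

```python
from pathlib import PurePosixPath

a = PurePosixPath("/usr//local/./lib")
b = PurePosixPath("/usr/local/lib")

# Component-wise equality: doubled separators and "." segments are
# dropped when the path is parsed, so these compare equal as paths
# even though the underlying strings differ.
same_as_paths = (a == b)
same_as_strings = ("/usr//local/./lib" == "/usr/local/lib")

# Component iteration: a path is a sequence of parts, not of characters.
parts = b.parts

# Making one absolute path relative to another.
rel = PurePosixPath("/usr/local/lib/sc").relative_to("/usr/local")

print(same_as_paths, same_as_strings, parts, rel)
```

None of these has a sensible meaning for an arbitrary string, which is the argument for keeping them off the String interface.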

At the same time, many operations on strings make no sense in the context of path manipulation – split, rotate, scramble, string concatenation, padding, etc. Having them available on paths is an invitation for half-solutions where the correct, platform-agnostic version of path manipulation code is missed because abstracting over directory separators is difficult compared to hard-coding the one locally in use. Another possible source of mistakes is treating ++ as “just as good” as +/+; even if one use of ++ works because of local guarantees, copy-pasting it somewhere else may cause issues.
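
The ++ versus +/+ pitfall has a direct analogue in Python, sketched below with pathlib. The helper names join_naive and join_path are hypothetical and exist only for this illustration; the point is that plain concatenation is correct only when the caller happens to guarantee a trailing separator.

```python
from pathlib import PurePosixPath

def join_naive(base, name):
    # Plain string concatenation (like ++): correct only if `base`
    # already ends with a separator.
    return base + name

def join_path(base, name):
    # Path join (like +/+): inserts the separator when it is missing
    # and does not double it when it is present.
    return str(PurePosixPath(base) / name)

naive_ok = join_naive("/tmp/", "take1.wav")   # works by luck
naive_bad = join_naive("/tmp", "take1.wav")   # silently wrong
joined_a = join_path("/tmp/", "take1.wav")
joined_b = join_path("/tmp", "take1.wav")

print(naive_ok, naive_bad, joined_a, joined_b)
```

Copy-pasting the first form into a context where the base has no trailing slash produces a valid-looking string that names the wrong file.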

Combining the String and Path/PathName interfaces on the same object leads to awkwardness, because we have to then also make a choice about where to put methods that actually manipulate filesystem objects, like exists, copy, rmdir, and chmod. Putting them on String more or less requires adding the word path to the names to disambiguate and clarify their purpose and avoid naming collisions. It starts to feel like this isn’t really String’s job.

On the other hand, keeping them separate on File as they are now is also unsatisfying, because we also have some filesystem-touching functions on both String (pathMatch) and PathName (files). Plus, these methods on File are pretty unwieldy; just compare File.exists(p) to p.exists, for example. After all, if we have a PathName object, it typically represents some possible past, present, or future object on the filesystem that we likely want to interact with.
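
For comparison, Python offers both styles side by side: a free function that takes a string (roughly the File.exists(p) shape) and a method on the path object (the p.exists shape). The sketch below runs inside a temporary directory; the file name is made up.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp) / "notes.txt"

    # The path object can name a file that does not exist yet.
    before = p.exists()

    p.write_text("hello")

    # Method on the path object versus free function on a string.
    after_method = p.exists()
    after_function = os.path.exists(str(p))

    # Directory listing also lives on the path object.
    listing = sorted(child.name for child in Path(tmp).iterdir())

print(before, after_method, after_function, listing)
```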

In my opinion, most of the current awkwardness with PathName comes from the fact that it does not offer a very full interface, and also has bad support throughout the rest of the class library. Neither of these is really a problem with the general concept of a Path type that is separate from a String type, just with the implementation.

In summary, I think moving all FS ops from File and String to PathName would be a better long-term solution than migrating PathName to String, regardless of whether File’s ops are kept in File or also moved to String. That is, if any such migration is being considered at all. This would be a complicated migration regardless of how it’s done. Another option would be to start fresh with a more accurately named Path class; if we were going to migrate lots of FS ops there, it would be worth it to get the name right, I think.

Btw, I only listed C++ and Python above because I’m most familiar with those. I’d be happy to hear if there are path manipulation paradigms in other languages that people think are worthy of praise or imitation.

hemiketal · January 26, 2021, 5:26pm

I have created many bugs in my code working with paths because of how the trailing slash is handled by some methods of PathName:

PathName("/etc/cron.d").isFolder; // -> true
PathName("/etc/cron.d").pathOnly; // -> /etc/
PathName("/etc/cron.d/").pathOnly; // -> /etc/cron.d/

I often end up writing a function to normalize the path and add or remove the trailing slash depending on whether it’s an actual folder. Also, there is no method to normalize a path by removing double slashes, single dots, and redundant double dots. I have tried to write one based on the Python equivalent (os.path.normpath), but it is not trivial.
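
For reference, this is roughly the behaviour being asked for, shown with the Python function named above. The sketch uses posixpath.normpath (the POSIX flavour of os.path.normpath) so the results do not depend on the host platform. Note that it is purely lexical: it never consults the filesystem, so collapsing ".." can change the meaning of a path that goes through a symlink.

```python
import posixpath
from pathlib import PurePosixPath

# Lexical normalization: doubled slashes, "." segments, and
# "name/.." pairs are collapsed; the trailing slash is dropped.
n1 = posixpath.normpath("/etc//cron.d/")
n2 = posixpath.normpath("/etc/./cron.d/../passwd")

# String-based dirname depends on the trailing slash, much like the
# pathOnly results above.
d1 = posixpath.dirname("/etc/cron.d")
d2 = posixpath.dirname("/etc/cron.d/")

# A path type that drops the trailing slash on parsing gives the same
# parent either way.
p1 = PurePosixPath("/etc/cron.d").parent
p2 = PurePosixPath("/etc/cron.d/").parent

print(n1, n2, d1, d2, p1, p2)
```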

jamshark70 · January 26, 2021, 10:30pm

As a quick response to Brian – indeed, my issue is that it’s unclear where to go for path operations and that the operations are inconsistently divided among multiple classes (such as in the cited issue, where PathName doesn’t have standardizePath).

If the statement was extreme, it’s because sometimes we just get used to these interfaces being thrown around not very carefully – rather than giving the advice just to convert to a string before standardizing, I was pointing out that the cause of the user’s confusion was an inconsistent design (and it had the intended effect of drawing attention to the inconsistency).

I’d have no problem with consolidating these under PathName. I suppose we could even maintain backwards compatibility by having current string methods convert to a PathName, keeping the manipulations in PathName, and then automatically string-ifying for OSC messages and (Sound)File operations.
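
Python handles this kind of compatibility with its os.fspath protocol, which may be a useful reference point. The sketch below is only an analogy, not sclang: with_extension and send_to_server are hypothetical names standing in for a legacy string-style helper and for a boundary (OSC, sound file I/O) that needs a plain string.

```python
import os
from pathlib import PurePosixPath

def with_extension(path, ext):
    # Hypothetical legacy-style helper: accepts either a plain string
    # or a path object, converts to a path type, and does the actual
    # manipulation there.
    p = PurePosixPath(os.fspath(path))
    return p.with_suffix(ext)

def send_to_server(path):
    # Hypothetical boundary that needs a plain string: path objects
    # are stringified here, and plain strings pass through unchanged.
    return os.fspath(path)

from_string = with_extension("/tmp/take1.wav", ".aiff")
from_path = with_extension(PurePosixPath("/tmp/take1.wav"), ".aiff")
wire_value = send_to_server(from_path)

print(from_string, from_path, wire_value)
```

Old call sites that pass strings keep working, while the manipulation logic lives in one place.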

IOW no objections here – thanks, Brian, for picking up on it

hjh

catniptwinz · January 27, 2021, 12:19am

FWIW as a user I’m happy to see that this has come up; it’s an unintuitive rough edge of sclang that still regularly sends me back to the documentation even after seven years with SC. I’m also broadly in agreement with Brian’s proposal.

VIRTUALDOG · January 28, 2021, 11:05am

glad others agree this is a good direction ^^

Yes, this is not a trivial thing to do. Fortunately, C++'s std::filesystem has standardized functions for it, like canonical() and lexically_normal(). I don’t know enough about this API to say that the best thing would be to copy it wholesale in SC, but it has gone through many rounds of design as a Boost library, plus ISO standardization after that. It’s at least something to look to for inspiration. The good news is that having it available in C++, the language sclang is implemented in, means implementing a robust and flexible filesystem library in SC is relatively easy.