Wednesday, November 22, 2006

Why Maximum Path Lengths are a Bad Idea

All file systems limit the maximum length of a path - that's not particularly surprising, as computers are quite finite. The problem, however, is that this limit is arbitrary and in no way scales with the computer involved. In Windows' case, it's an especially short one at that: 260 characters. Coding Horror doesn't seem to think that's a problem. He argues that humans can't really cope with a hierarchy that deep anyway, and since you can't remove those human limitations, why bother removing the technical ones? And anyway, it's not such a big deal; you should just choose shorter paths!

That argument just doesn't hold water, unfortunately. People can deal with huge hierarchies without great difficulty, and the technical issues are nastier than they look at first sight.

Why can people deal with huge hierarchies? Because you only ever tend to look at a very small part of one at a time. Explaining the concept of a galaxy will faze people at first, while they try - and fail - to relate that concept to themselves. But eventually you stop doing that, and it's just this big swirling mass of stars. No problem there. Oh yeah, stars: well, they're often the center of a solar system with some planets swirling around them. And planets are just lumps of rock (or other stuff). Some planets have landmasses and water, and on those landmasses you might find mountain ranges, rivers, cities, people - and yourself.

Hierarchies aren't the problem; situations in which you need to deal with the whole thing simultaneously are. Displaying a large part of the hierarchy in the way a file system does is difficult. And strict tree structures are unnecessarily limiting. There's definitely room for improvement - but reducing the depth of the hierarchy in a file system doesn't solve the problem, it makes it worse, because all of a sudden you actually need to be aware of the root of the file system while dealing with a leaf.

When I take a look at somebody's project, I want to be able to do so without needing to think about how long the path might be. A common scenario for me is to dump related things in some sub-directory - some place I won't see them unless I'm actually viewing the project itself. Of course, that sub-directory can contain sub-directories and so on, and that's not a problem, since I never stop to think about it, and hard drives are huge anyway. It's common for backups of the file systems of old computers to end up somewhere in there (think "Documents" > "Work related" > "old job" > "backup of old work PC"). I like to keep these things just in case, as long as it doesn't cost much effort. Backups of PCs, however, can easily exceed maximum path lengths, and that means I need to start bothering again. It's a hassle.

The problem with maximum path lengths doesn't occur when you're able to plan everything out in advance.  It occurs when you combine two existing hierarchies into one.  Maximum path lengths turn the file system into a leaky abstraction.

Basically, any time you move or copy a directory you're potentially in trouble.  And that's simply not acceptable.

The technical nastiness this causes isn't funny either. For example, using only the normal Win32 API it's trivial to make paths that are longer than 260 characters, and probably longer than 32k characters as well. You just need to move a directory containing a long path somewhere deeper in the directory hierarchy. Moving within the same file system is almost instantaneous because you're just updating a few references; you don't need to touch the contents of the moved directory at all. But that means the system has no cheap way to even know that a file path now exceeds the maximum! And so Windows will happily let any user create a directory structure that is impossible to delete - even by an administrator or a virus scanner - without first moving all the directories out of their nested structure.
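To make that concrete, here's a minimal sketch of the check a fast move skips. It's Python, and the helper name `paths_too_long_after_move` is my own invention, not any Windows API. It walks a directory and reports every path that would hit the classic limit once the directory is moved under a new parent. It assumes MAX_PATH is 260 including the terminating NUL, and it uses `len()` only as an approximation of the Win32 character count (Windows counts UTF-16 code units, so characters outside the Basic Multilingual Plane would be undercounted).

```python
import os

# Assumption: the classic Win32 MAX_PATH of 260 includes the terminating NUL.
MAX_PATH = 260


def paths_too_long_after_move(src_dir, dest_parent, limit=MAX_PATH):
    """Hypothetical helper: list the paths under src_dir that would not fit
    in `limit` characters (NUL included) once src_dir is moved into dest_parent."""
    src_dir = os.path.abspath(src_dir)
    # Where the moved directory would end up, e.g. dest_parent/proj.
    new_root = os.path.join(os.path.abspath(dest_parent), os.path.basename(src_dir))
    offenders = []
    # The whole tree has to be visited: this is the work a same-volume move avoids.
    for dirpath, dirnames, filenames in os.walk(src_dir):
        for name in dirnames + filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), src_dir)
            new_path = os.path.join(new_root, rel)
            # len() approximates the Win32 count; +1 accounts for the NUL.
            if len(new_path) + 1 > limit:
                offenders.append(new_path)
    return offenders
```

Note that the check has to visit every entry in the moved tree, which is exactly the work that makes a same-volume move almost instantaneous. That's why you can't get it for free, and why the move just doesn't bother.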

And that's a big security hole.

Windows should evolve a new API without an overt maximum path length, because the current situation causes many nasty corner cases which rarely occur, but currently have no satisfactory solution.

2 Comments:

At 22 November, 2006 19:40, Anonymous hwh said...

I heard this story long ago about the WING system at RuG: no filename length limit existed, or at least very long names were allowed. Somebody implemented a file system on top of this naming scheme to bypass the quota system. So allowing arbitrarily long names, or arbitrarily deep directory nesting, is not always a good idea in general.

At 23 November, 2006 08:10, Blogger Eamon Nerbonne said...

What a great hack :-). There are a number of hacks like that which don't depend on file path lengths, though; and in any case, quotas on Linux now also limit the number of inodes, which I believe should (?) fix this issue. Creating a directory only you can see and use, but then setting the ownership of the files in it to somebody else, used to be a quota workaround, at least.