May 22, 2022

A Paean to the Filesystem

pae·an: a joyous song or hymn of praise, tribute, thanksgiving, or triumph – Merriam-Webster

I returned from the Code4Lib Conference recently chock-full of things I want to investigate. I was also reminded about just how much I love the Unix filesystem. Yes, really.

I’ve long thought that the simplest solution to a problem is often the best. That is, complicated solutions tend to have more things that can go wrong. Plus they can be more difficult to learn, manage, and replace. That is why I’ve developed quite a bit of skepticism towards throwing databases at every problem.

I remember the time I was backing up my server and I neglected to dump the database behind one of my web sites. Yes, I really did. And yes, I had to rebuild it from scratch when I went to restore from the backup and realized my error. Running a tar job on the directory is SO inadequate when you lack the database that is required to make sense of it all. Sure, that is a stupid mistake, I’ll admit, but it would have been so much easier and less complicated had the site all been sitting on the filesystem.

And the thing is, often it can. Many web sites that are supported by a database like MySQL don’t really need a database at all. Mostly all they need is a way to search. And you don’t need a database for that. For that, all you need is an index. There are a lot of options out there for indexing, from the simple (such as my go-to favorite Swish-e) to the more complex (for example, XTF or Solr, which both support some very sophisticated sites).
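To make the "all you need is an index" idea concrete, here is a minimal sketch of a word index built straight from files on disk. It is a toy under stated assumptions, not how Swish-e, XTF, or Solr work internally: the function names `build_index` and `search` are hypothetical, the files are assumed to be plain text, and the whole index is held in memory.

```python
import re
from collections import defaultdict
from pathlib import Path


def build_index(root):
    """Map each lowercase word to the set of files under root that contain it."""
    index = defaultdict(set)
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        # Assumption: files are plain text; undecodable bytes are skipped.
        text = path.read_text(encoding="utf-8", errors="ignore")
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(path)
    return index


def search(index, query):
    """Return the files that contain every word in the query (a simple AND search)."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results
```

Calling `search(build_index("site"), "filesystem backup")` on a hypothetical `site` directory would return the paths of files containing both words. A real indexer adds ranking, stemming, and an on-disk index, but the content itself never has to leave the filesystem.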

Some benefits of relying on the filesystem include:

  • A tried and true technology that is as old as time. Well, maybe not time, but you get the idea. Filesystem technology has been around as long as there have been computers. You can take the word of an old-timer on that.
  • Drop-dead easy backup. Tar up the directory tree, gzip it, and throw the file on something else. Done and done.
  • Complete transparency. If you want to see a file, just look at it. You don’t have to figure out some complicated SQL query to pull something back out. It’s sitting right there where you can see it.
  • Slower obsolescence. Filesystems age at the rate of mountains. Databases age at the rate of flowers. Pick one to rely upon. No, seriously.
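The "tar up the directory tree, gzip it" step from the list above can be sketched in a few lines using Python's standard `tarfile` module. The function name `backup_tree` and the paths in the usage note are hypothetical, and copying the resulting archive to another machine is left out.

```python
import tarfile
from pathlib import Path


def backup_tree(source_dir, archive_path):
    """Write a gzip-compressed tar archive of source_dir to archive_path."""
    source = Path(source_dir)
    with tarfile.open(str(archive_path), "w:gz") as archive:
        # arcname stores paths relative to the directory's own name,
        # so the archive does not embed the full absolute path.
        archive.add(str(source), arcname=source.name)
    return Path(archive_path)
```

For example, `backup_tree("/var/www/site", "/backups/site.tar.gz")` (both paths are placeholders) would produce a single file that contains the entire site, with no separate database dump to remember.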

I understand that a number of open source applications such as Drupal and Omeka have made it relatively easy for people to set up a web site that needs to support a variety of user interactions by using the classic stack of Linux/Apache/MySQL/PHP. That is a good thing, and I support it. All I’m saying is that not everything needs to go into such a stack, and using that method comes with real consequences that should be understood from the beginning.

So what does this all mean? For me, it means that I will think long and hard before I set up another instance of the classic stack. I’m actually totally cool with that stack if you remove MySQL. I really don’t want to be a database administrator. I don’t. Just give me the filesystem and a decent indexer. That’s frequently all I need. And it may be all you need for at least some projects. Because, you know, the filesystem rocks.

About Roy Tennant

Roy Tennant is a Senior Program Officer for OCLC Research. He is the owner of the Web4Lib and XML4Lib electronic discussions, and the creator and editor of Current Cites, a current awareness newsletter published every month since 1990. His books include "Technology in Libraries: Essays in Honor of Anne Grodzins Lipow" (2008), "Managing the Digital Library" (2004), "XML in Libraries" (2002), "Practical HTML: A Self-Paced Tutorial" (1996), and "Crossing the Internet Threshold: An Instructional Handbook" (1993). Roy wrote a monthly column on digital libraries for Library Journal for a decade and has written numerous articles in other professional journals. In 2003, he received the American Library Association's LITA/Library Hi Tech Award for Excellence in Communication for Continuing Education. Follow him on Twitter @rtennant.