Creative Services for
Roughly Drafted
Daniel Eran


The Apple Wishlist: Mac OS X 10.5 Leopard
4.2 New Workgroup Services: The Personal Server File Archive
While Spotlight is great for local desktop searches, it doesn't work for files stored remotely on a file server.
Sure, you can theoretically index anything, but the real magic of Spotlight is that it plugs into the kernel, so every time a file on the system is touched, its records are instantly updated; it doesn't have to drag the entire disk through a long, slow, periodic background re-indexing to stay current.
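
To make the difference concrete, here is a minimal Python sketch of an index that is updated one file at a time as change notifications arrive, rather than by rescanning the whole disk. It is not how Spotlight is actually implemented; the `MetadataIndex` class, its method names, and the file paths are all hypothetical, and the "notification" is simply a direct method call standing in for whatever the kernel would report.

```python
from collections import defaultdict


class MetadataIndex:
    """Toy inverted index: word -> set of file paths containing it.

    Hypothetical illustration only; not Spotlight's real data structure.
    """

    def __init__(self):
        self.words_by_path = {}
        self.paths_by_word = defaultdict(set)

    def on_file_changed(self, path, text):
        """Re-index a single file, as if a write had just been reported."""
        self.on_file_deleted(path)  # drop stale records for this file first
        words = set(text.lower().split())
        self.words_by_path[path] = words
        for word in words:
            self.paths_by_word[word].add(path)

    def on_file_deleted(self, path):
        """Remove every record for a file that no longer exists."""
        for word in self.words_by_path.pop(path, set()):
            self.paths_by_word[word].discard(path)

    def search(self, word):
        """Return the paths currently recorded for a word, sorted."""
        return sorted(self.paths_by_word.get(word.lower(), set()))


index = MetadataIndex()
index.on_file_changed("/Users/me/notes.txt", "Leopard wishlist draft")
index.on_file_changed("/Users/me/todo.txt", "buy disk for Leopard")
index.on_file_changed("/Users/me/notes.txt", "Tiger notes")  # file was edited
print(index.search("leopard"))  # ['/Users/me/todo.txt']
```

Each change costs work proportional to one file, not to the size of the disk, which is why the results can stay current without a periodic full re-index.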

The problem with shared files on a server is that they are touched by lots of people; Spotlight has no way of being notified when someone else makes changes.

If the server itself were handling Spotlight indexing, it would be burdened with sorting out who had permissions to which files, and who should see what search results. The server would not only need to keep separate indexes for each user, but it would have to potentially update every user's index (or decide not to!) on every file transaction. With more than a few users, that starts to become a huge amount of overhead merely to allow for fast searching.
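
A rough way to see that overhead is to count the work in a toy model. The sketch below assumes a hypothetical `PerUserIndexServer` that keeps one index per user and, on every write, checks each user's permissions before deciding whether to update that user's index. The users, paths, and counters are invented for illustration and do not describe any real server product.

```python
class PerUserIndexServer:
    """Toy server that maintains a separate search index for every user."""

    def __init__(self, permissions):
        # permissions: user -> set of paths that user may read (assumed model)
        self.permissions = permissions
        self.indexes = {user: set() for user in permissions}
        self.permission_checks = 0
        self.index_updates = 0

    def record_write(self, path):
        """Handle one file transaction: consult every user, update some."""
        for user, readable in self.permissions.items():
            self.permission_checks += 1
            if path in readable:
                self.indexes[user].add(path)
                self.index_updates += 1

    def search(self, user):
        """Return only the paths this user is allowed to see."""
        return sorted(self.indexes[user])


permissions = {
    "alice": {"/share/budget.xls", "/share/logo.psd"},
    "bob": {"/share/logo.psd"},
    "carol": {"/share/budget.xls", "/share/logo.psd"},
}
server = PerUserIndexServer(permissions)
for path in ["/share/budget.xls", "/share/logo.psd", "/share/logo.psd"]:
    server.record_write(path)

print(server.permission_checks)  # 9: three writes, each checked against three users
print(server.index_updates)      # 8: two for the budget file, three for each logo write
```

In this model the cost of every write grows with the number of users, even for users who never search.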
The solution I envision for Spotlight searching on workgroup file share servers is to replace the file share with a locally indexed working copy of a version control system. The server would host a repository of version controlled files that are indexed for metadata search locally by Spotlight, just like any other local files. Subversion is one example of an open source version control system that already runs on Mac OS X.

Basically, in a version control system, a server defines a repository of files. Individual users check out the files, just like a regular file server share, except that the files are downloaded into the user's local working copy on their hard drive. The user adds, modifies and deletes files, then checks their changes back into the repository. This extra work results in a series of benefits: version control, data archiving, and offline syncing with the file server.

Software on both ends keeps track of file versions and time stamps. That allows a team to work on the same files while preserving their individual changes as different versions of the file. Rather than replacing the old file with a new copy, all old copies are saved. Users can ask to see older versions of a file and revert to them.
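
The check-out, check-in, and revert cycle can be sketched in a few lines of Python. This is a deliberately simplified model, not Subversion's design: the `Repository` class and its methods are hypothetical, files are plain strings held in memory, and deletions and conflicting edits are ignored.

```python
class Repository:
    """Toy server-side store that keeps every committed version of each file."""

    def __init__(self):
        self.history = {}  # path -> list of (author, content), oldest first

    def checkout(self):
        """Return a local working copy holding the latest version of each file."""
        return {path: versions[-1][1] for path, versions in self.history.items()}

    def commit(self, author, working_copy):
        """Append changed files as new versions; nothing is overwritten."""
        for path, content in working_copy.items():
            versions = self.history.setdefault(path, [])
            if not versions or versions[-1][1] != content:
                versions.append((author, content))

    def version(self, path, number):
        """Fetch version `number` (1-based) of a file."""
        return self.history[path][number - 1][1]

    def revert(self, author, path, number):
        """Make an old version current by committing it again as the newest."""
        self.commit(author, {path: self.version(path, number)})


repo = Repository()
repo.commit("alice", {"plan.txt": "draft"})

bob_copy = repo.checkout()               # Bob works on his own local copy
bob_copy["plan.txt"] = "draft, with Bob's edits"
repo.commit("bob", bob_copy)

print(repo.version("plan.txt", 1))       # draft
print(repo.checkout()["plan.txt"])       # draft, with Bob's edits

repo.revert("alice", "plan.txt", 1)      # roll back without losing Bob's version
print(repo.checkout()["plan.txt"])       # draft
print(len(repo.history["plan.txt"]))     # 3
```

Note that reverting adds a third version rather than erasing the second, so Bob's edits remain available in the history.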

This replaces much of the functionality of backup archiving; users can locate archived versions of files without needing IT staff to dig through backup tapes in an attempt to restore their data, assuming proper backups were ever maintained in the first place. Also, since files are stored locally, mobile users can work offline and sync files back to the network later.

A more automated system that saved all changes to a version control archive server in the background, paired with a smart WayBackMachine interface, could even let users easily roll back undesired system changes of any kind.

Quite obviously, such a version control system does not solve the problem of limited local storage. But that's a problem that hardly exists anymore. Sure, in some situations, a large central file server is the best solution. But for many business workgroups, and for an emerging market of home users, the problem is no longer needing lots of room to store things; cheap local disks now routinely provide more than 200 GB.

A new set of problems starts to emerge once you can store everything on your local disk: how to share access to files without worrying that other users might overwrite them; how to manage changing content and still retain the ability to access older versions for reference; how to work on the same files from different workstations, particularly when using a laptop.

All of these problems are solved when your server changes from simply being a big remote disk, to being a smart librarian that categorizes your files; checks changes in and out to multiple users; archives all previous versions; and provides a level of data redundancy impractical for individual workstations.

Here's a visual comparison between managing multiple files and using version control:

And here's how you'd manage versioned files:





Articles Copyright © 2006 Daniel Eran. All rights reserved.
Suggestions and comments welcome. Contact RoughlyDrafted.
