sauber edited this page Jan 21, 2012 · 1 revision

The Storage Problem

Introduction

Sharing files between computers, or having shared access to a central file server, is essential to using computers today. Most available systems fall into either the central file server category, such as Unix NFS or Windows File Sharing, or into peer-to-peer file sharing between end computers. Both types of system have disadvantages when making large volumes of data available to many computers.

Overloaded central servers

Keeping data on a central server ensures data integrity and makes backup easy. But all load goes to a single server, and all disks have to be mounted on that server. Scaling storage and bandwidth is difficult as files and storage clients are added.

Small spare disk space is not utilized

All computers these days have built-in hard disks, but with only limited storage capacity. Sharing files just worsens the problem, as the same files need to be copied to each computer that needs them. In the end, most computers either do not have enough disk space or have spare disk space that is not really used for anything.

File transfer is not secure.

File servers and peer-to-peer sharing are unencrypted. File servers typically have weak authentication and allow entire subnet ranges to mount filesystems. Peer-to-peer systems have no authentication at all, and anybody is allowed to connect to anybody else anonymously. It is easy to tap into both types of system to access files and to watch what individuals are transferring.

Peer to peer filesharing is not a file system.

There is no directory or index to list files available. The file you are looking for may or may not exist, but you have to go through search engines to identify it. Files have to be downloaded before they can be accessed. Client applications need to be started by each user of the system. Normal system calls such as open(), close(), read(), write(), stat() etc. are not supported.
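To make the last point concrete, the following is a minimal sketch of what access through a real filesystem looks like: the application uses the ordinary open(), read(), stat() and close() calls and needs no separate download step or client application. The function name `read_through_filesystem` and the temporary demo file are illustrative assumptions, not part of ABFS.

```python
import os
import tempfile


def read_through_filesystem(path: str, size: int = 4096) -> tuple[bytes, int]:
    """Read up to `size` bytes from `path` using plain filesystem calls.

    Returns the bytes read and the total file size reported by the OS.
    """
    fd = os.open(path, os.O_RDONLY)   # open()
    try:
        data = os.read(fd, size)      # read()
        info = os.fstat(fd)           # stat() on the open descriptor
    finally:
        os.close(fd)                  # close()
    return data, info.st_size


# A temporary file stands in for a file on a mounted shared filesystem.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "example.txt")
    with open(demo_path, "wb") as handle:
        handle.write(b"hello")
    demo_data, demo_size = read_through_filesystem(demo_path)
```

Any program written against these calls works unchanged on every mounted filesystem, which is exactly what peer-to-peer file sharing clients do not offer.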

Development of a filesystem as a kernel module is difficult.

Only a limited number of file systems are available for each operating system. Windows has FAT and NTFS. BSD and SysV systems have UFS. Linux has several choices. But developing filesystems is generally very hard and a task only for experts. Traditional file systems are fixed in size and require access to block devices.

Large file collections are difficult to maintain

As storage space grows and files are added, the burden of cleaning up files increases. Identifying files that can be removed is difficult, and areas where excessive space is used are not easily identified.
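As an illustration of the manual effort involved, here is a small sketch of the kind of script an administrator typically ends up writing to find where space is going. The function names `usage_by_directory` and `largest_directories` are hypothetical helpers for this example only.

```python
import os


def usage_by_directory(root: str) -> dict[str, int]:
    """Map each directory under `root` to the total size in bytes of the
    files stored directly inside it (subdirectories are counted separately)."""
    totals: dict[str, int] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        total = 0
        for name in filenames:
            try:
                # lstat() does not follow symlinks, so linked files are not counted twice.
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                continue  # file vanished or is unreadable; skip it
        totals[dirpath] = total
    return totals


def largest_directories(root: str, count: int = 10) -> list[tuple[str, int]]:
    """Return the `count` directories that hold the most bytes, largest first."""
    totals = usage_by_directory(root)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:count]
```

Calling `largest_directories("/some/path", 5)` lists the five heaviest directories. A script like this only shows where the space is. Deciding which files can safely be removed remains a manual task.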

Losing a disk means losing data

When data is not encrypted on the local disk of a file server or file server client, the data on the disk might fall into the wrong hands. When a disk breaks, the data on it is usually lost unless measures have been taken to duplicate it.

The Storage Solution

The goal of ABFS is to solve the problems mentioned above.

Primary goals

  • Pool together spare storage on several nodes to make very large filesystems. Performance and capacity grow as more nodes are added, rather than decline.
  • File transfer is secure: it cannot be detected which files are transferred, nor which nodes are doing file transfers.
  • File storage is secure: losing a disk does not mean loss of data, and the data available on a single disk is not useful for extracting any files.
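This page does not say how ABFS makes the contents of a single disk useless on its own. One well-known way to get that property is XOR secret splitting, sketched below as an assumption rather than as the ABFS design. The names `split_into_shares` and `combine_shares` are hypothetical. The sketch covers only the confidentiality half of the goal: all shares are needed to rebuild the file, so surviving a lost disk would additionally require replicating each share or using an erasure code.

```python
import secrets


def split_into_shares(data: bytes, share_count: int) -> list[bytes]:
    """Split `data` into `share_count` shares, one per node or disk.

    Every share except the last is pure random bytes; the last is the data
    XORed with all the random shares. Any incomplete set of shares is
    indistinguishable from random noise.
    """
    if share_count < 2:
        raise ValueError("at least two shares are required")
    shares = [secrets.token_bytes(len(data)) for _ in range(share_count - 1)]
    last = data
    for share in shares:
        last = bytes(a ^ b for a, b in zip(last, share))
    shares.append(last)
    return shares


def combine_shares(shares: list[bytes]) -> bytes:
    """XOR all shares together to recover the original data."""
    result = bytes(len(shares[0]))  # all-zero bytes of the right length
    for share in shares:
        result = bytes(a ^ b for a, b in zip(result, share))
    return result
```

For example, `combine_shares(split_into_shares(b"secret file", 3))` returns the original bytes, while any one or two of the three shares reveal nothing about them.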

Secondary goals

  • Autocleaning. Least used files are automatically removed when more capacity is required for new files. It's not a design goal to ensure that all files are retained forever. ABFS should not be used for files of critical importance.
  • Distribution by popularity. The more a file is used, the more it is replicated.
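The autocleaning goal can be sketched as a store that evicts its least used files whenever a new file does not fit. This is a minimal single-node illustration under stated assumptions: "least used" is taken to mean the lowest access count, ties are broken in favour of evicting the oldest file, and the class name `AutoCleaningStore` is hypothetical. It does not model replication by popularity.

```python
class AutoCleaningStore:
    """Track file sizes and use counts within a fixed capacity (in bytes)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.sizes: dict[str, int] = {}
        self.use_counts: dict[str, int] = {}

    def used(self) -> int:
        """Total bytes currently stored."""
        return sum(self.sizes.values())

    def access(self, name: str) -> None:
        """Record one use of a stored file (raises KeyError if unknown)."""
        self.use_counts[name] += 1

    def add(self, name: str, size: int) -> list[str]:
        """Store a new file, evicting least used files until it fits.

        Returns the names of the files that were removed to make room.
        """
        if name in self.sizes:
            raise ValueError("file already stored")
        if size > self.capacity:
            raise ValueError("file is larger than the whole store")
        removed = []
        while self.used() + size > self.capacity:
            # min() returns the first key with the lowest count, i.e. the
            # oldest file among equally unused ones (dicts keep insertion order).
            victim = min(self.use_counts, key=self.use_counts.get)
            del self.sizes[victim]
            del self.use_counts[victim]
            removed.append(victim)
        self.sizes[name] = size
        self.use_counts[name] = 0
        return removed
```

With a capacity of 10, adding files `a` and `b` of size 4 each, accessing `a` once, and then adding `c` of size 4 evicts `b`, the file that was never used. This is also why files of critical importance do not belong in such a store.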

Non-goals

  • Files cannot be forcibly deleted
  • There is no granular read and write access control
  • No anonymous peer to peer
  • Threading
  • Search
  • Direct peer to peer