1. Introduction

A Portable Storage Device (PSD) such as a USB flash drive or Minidisk lets people carry personal data in a compact, lightweight form. We would like to ensure that personal data remains confidential. To this end the data needs to be managed so that sensitive content has limited exposure and all data can be reliably accessed and routinely backed up. Even computing professionals have a difficult time achieving this, and it is easy to lose important data as a result. Ease of use is essential if non-professionals are to manage their data; otherwise they will have no option but to live with insecure and unreliable data management practices.

This paper proposes mechanisms for authenticating access to confidential data stored on a PSD. Two tools for data encryption were implemented to evaluate how the methods work in different environments and to evaluate their ease of use. The paper also describes additional tools for managing and backing up data stored on a PSD.

2. Data Security

While portable storage is convenient to carry, it is also easy to lose and subject to theft. People might therefore choose to safeguard certain data, such as a mailbox in clear text form, that they would normally store unprotected on a trusted system. When stored on a PSD such data can be saved in encrypted form to prevent unwanted access. It should be visible to the file system while the user is working on it, but encrypted while it is being carried around.

Computers are now online much of the time, both at work and at home, so there is a greater need to secure personal data exposed through internet connections. Just using a PSD to store personal data instead of a local file system continuously connected to the internet provides a modest level of security. The PSD only needs to be plugged in while it is in use, which limits the time of exposure. Setting system permissions so that the device can't be accessed over a LAN or WAN limits many forms of malicious access.
This is the default setting for Windows systems; users need to explicitly share removable drives.

People also store some highly sensitive information that would cause serious problems if it fell into the wrong hands. This includes information about financial accounts, computer passwords and confidential personal information. The general advice given to people is to not store this information on a computer at all, which is not a bad idea. However, that is not always convenient and has its own management problems. For instance, people can forget or lose passwords or leave them lying around on post-its.

People can store their most sensitive information on a PSD safely and conveniently if some simple tools and data management practices are employed. A primary copy can be maintained on the PSD that can be accessed only by the user.

2.1 Security Provided By PSD Vendors

When you buy a PSD, vendors usually supply some data access mechanisms. Data access is granted through a password or with a thumbprint scanner. While this form of secure access is sufficient for many people, it has some limitations.

Thumbprint scanners provide a high degree of protection, but are overkill for the level of security required by most people. Thumbprint signatures are hard to forge and can't be guessed like passwords. They may also be easier to use than a password, as there is nothing to forget and they can be scanned faster than a password can be typed. As a security mechanism they are a functional substitute for passwords. Their cost is intrinsically higher, but may be worthwhile to people who want to use vendor-supplied security.

These authentication mechanisms vary between vendors, but generally authentication data is stored on the device itself and access is granted through software provided by the vendor. The authentication data is stored in an area of memory not accessible through the file system. At least some vendors store protected data in encrypted form.
Vendors do not publish their authentication mechanisms, so it is unclear how secure they are. Depending on the implementation, the authentication mechanisms may be subject to attack through reverse engineering.

Credentials management varies between operating systems. Even among different versions of Windows there is no common API for credentials management. Consequently it is difficult for vendors to provide consistent authentication mechanisms even across Windows platforms. Instead vendors develop custom authentication mechanisms so their devices work securely in different environments.

Authentication mechanisms that are built into the device do not recognize that the user has logged into a trusted system. The user needs to enter a password each time they want to access their confidential data. Furthermore, restrictions on the password (e.g. it must be numeric) might mean that a familiar password can't be used, and so it can easily be forgotten.

Vendor-supplied authentication mechanisms have a different user interface for each vendor. When you switch between drives the user interface changes and different features are supported. New device drivers or software might need to be installed in order to use the device. This might not be possible in some environments due to access restrictions or system and software incompatibilities.

As a convenience many PSD vendors partition the drive into protected and unprotected partitions [1]. Most personal data is not confidential, so it can be stored in the clear text region and accessed without going through an authentication process. The size of the encrypted region is predetermined by the owner and may not be adjusted as space needs change. Given that PSDs have limited space, this restriction becomes a problem when using the devices for long-term storage.

Once the drive is unlocked, data on the drive is exposed to the file system. If the device is your primary device, the encrypted data is exposed much of the time.
Encrypting data on the device hides data if the device is lost or stolen, but provides no protection while the device is online. Consequently, it is not recommended that highly sensitive information be stored on encrypted partitions.

2.2 Encrypting Data Using Utility Programs

This section describes utilities written for this project that encrypt files and directories. These utilities operate either in a trusted environment such as a home or office, or in a foreign environment such as a friend's computer or a publicly available computer in a library or internet cafe.

2.2.1 Authenticating Access To Encrypted Data

In a trusted environment the utilities are installed on the local computer, and a password is retained by the operating system and used to access encrypted data. Subsequent executions of the utilities retrieve this password without forcing the user to re-enter it. If the utility fails to automatically unlock a particular file, then the file has been encrypted with a password other than the saved password. Only then is the user prompted for a password specific to the file. This way access can be restricted on a per-item basis; however, it is most convenient to use the same password for all encrypted items.

Foreign environments are distinguished by the lack of installation elements (e.g. Windows registry entries or stored credentials) resident in the environment. The utilities are not installed on foreign systems, but are instead installed on the PSD. This introduces the problem of authenticating permission to run software installed on a PSD (see Section 4.1). Since passwords are not saved in a foreign environment, the password must be re-entered for each encrypted item. Because we want to minimize the use of confidential information in foreign environments, this should not be much of a problem.

The utilities were written using the Windows XP credentials manager for authentication [2, 3].
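The password lookup order described in this section (saved password first, a per-item prompt only when that fails or when nothing is saved) can be sketched as follows. This is a minimal illustration, not the project's implementation: `unlock`, `WrongPassword`, and the `decrypt` and `prompt` callbacks are hypothetical names, and no real credentials API is called.

```python
class WrongPassword(Exception):
    """Raised by the hypothetical decrypt callback when the password does not match."""


def unlock(item, decrypt, prompt, saved_password=None):
    """Return the decrypted item, prompting the user only when necessary.

    decrypt(item, password) and prompt(item) are caller-supplied stand-ins
    for the cipher and the password dialog; both are assumptions.
    """
    if saved_password is not None:
        try:
            # Trusted environment: try the password retained by the system.
            return decrypt(item, saved_password)
        except WrongPassword:
            pass  # The item was encrypted with a different, per-item password.
    # Foreign environment (nothing saved) or per-item password: ask the user.
    return decrypt(item, prompt(item))
```

In a foreign environment `saved_password` is simply left as `None`, so every encrypted item leads to a prompt, which matches the behavior described above.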
The credentials manager provides the familiar Windows password dialog box and securely saves user name and password combinations. If the user saves the password then the utility can retrieve it without prompting the user. Although the utilities need only a password and not a user name, the interface forces users to enter a user name. Future work on authentication for the utilities would be to find a way to avoid this. Additionally, the credentials manager only works with Windows XP Service Pack 2 and Windows 2003. Alternative implementations are required to develop utilities that will persist passwords in other environments [4]. Some operating systems might not provide a secure mechanism for saving passwords, or the mechanism might be easily cracked. On those systems users may be required to re-enter their password for each access. If this happens frequently, these utilities may not be appropriate.

2.2.2 File and Directory Encryption

A utility called Ezip was written for this project to encrypt and decrypt files and directories. It uses the International Data Encryption Algorithm (IDEA) with 128-bit private keys [5]. If the user has saved their password in a trusted environment, the data can be encrypted and decrypted without prompting for a password.

Data stored on a PSD is decrypted in place to avoid retaining clear text on a local file system, where it could be retrieved by an untrusted party. Even if the user intends to delete such clear text, a system failure may prevent them from doing so.

The decryption process consists of several stages. Directories are unrolled using the GNU tar utility, and data compression may also be employed to save storage space, which can be limited on a PSD. To avoid writing intermediate data between stages to any device, data pipes need to be used to connect the stages. This functionality is not yet implemented, but is essential to make the Ezip utility practical.
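One way the stages might be chained without intermediate files is sketched below. It assumes Python's standard `tarfile` streaming mode in place of GNU tar, and a single-byte XOR (`toy_cipher`) as a placeholder for IDEA, which is not in the standard library; the placeholder is not secure. The names `TransformWriter`, `TransformReader`, `pack` and `unpack` are hypothetical and are not part of Ezip.

```python
import io
import tarfile


def toy_cipher(data, key=0x5A):
    # PLACEHOLDER ONLY: single-byte XOR stands in for IDEA and is NOT secure.
    return bytes(b ^ key for b in data)


class TransformWriter:
    """Write-only stage: transforms each chunk, then forwards it to `sink`."""

    def __init__(self, sink, transform):
        self.sink, self.transform = sink, transform

    def write(self, data):
        self.sink.write(self.transform(data))
        return len(data)


class TransformReader:
    """Read-only stage: pulls chunks from `source` and transforms them."""

    def __init__(self, source, transform):
        self.source, self.transform = source, transform

    def read(self, size=-1):
        return self.transform(self.source.read(size))


def pack(files, sink):
    """Stream tar -> gzip -> cipher -> sink; `files` maps names to bytes."""
    stage = TransformWriter(sink, toy_cipher)
    with tarfile.open(fileobj=stage, mode="w|gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))


def unpack(source):
    """Stream source -> cipher -> gunzip -> untar; returns names mapped to bytes."""
    stage = TransformReader(source, toy_cipher)
    files = {}
    with tarfile.open(fileobj=stage, mode="r|gz") as tar:
        for member in tar:
            if member.isfile():
                files[member.name] = tar.extractfile(member).read()
    return files
```

Each stage only sees a stream of chunks, so nothing between the archive, compression and cipher stages is ever written to a drive. In a real tool `sink` and `source` would be files on the PSD, and the cipher stage would be a proper block cipher in a streaming mode.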
Intermediate files take up space on a PSD, but more importantly could be left behind if saved to a local file system.

The suffix ".ez" is appended to files encrypted with the Ezip utility. In a trusted environment the suffix can be associated with the utility and the password saved. In the Windows implementation this lets users simply click on an icon for an encrypted file or directory to decrypt it.

There is the possibility that a decrypted item will be left in clear text form at the end of a session. A tool could be written to track files that are currently exposed and close them at the end of a session. Before ending sessions with a PSD, software must be run to flush data to the drive. This tool could run as part of the software used to close the session. The list of exposed items should be saved on the PSD so that the tool can pick up where it left off if a session was terminated abnormally and the tool was not run.

In some cases the process of encrypting large files and directories may take too long. Also, decryption exposes clear text to the file system (and to untrusted administrators), so this utility is not useful for highly sensitive information that should never be saved to a file system as clear text.

2.2.3 Text Editor with Encryption

Ideally applications can encrypt and decrypt data without storing any clear text on any file system. In this project a plain text editor, Magenta, was retrofitted with the Ezip utility. Highly sensitive data can be stored in encrypted form and accessed without ever being stored as clear text in the file system.

Encrypted files with the ".ez" suffix are automatically decrypted when opened with the editor. Any files that were originally encrypted are re-encrypted when saved by the editor. Users don't need to remember to re-encrypt the clear text as they do with the stand-alone Ezip utility. Users need to explicitly save a previously encrypted file in clear text form if that is what they want.
If the file was opened with a password other than the saved password, the user is prompted for the password again when saving the file. This is required because the password and its hash key were scrubbed when the file was opened. This is also a good idea from the user's perspective: it is clear to the user which password is being used to save the file.

If the source file was also compressed, it needs to be run through a compression filter when it is opened or closed. As with the Ezip utility, the filter should not save intermediate copies of the file on any drive. Again, pipes can be used to avoid saving intermediate files. This capability is not yet implemented.

The editor can decrypt directories and non-text files as well. This way the Magenta editor can be associated with the ".ez" suffix instead of the Ezip utility. The editor will open an encrypted text file in an edit session. If instead the editor is given an encrypted binary file or directory, the item will be saved on the PSD in clear text form.

3. Data Integrity

As personal data migrates, different versions of files are retained on the various computers and PSDs people use, which can cause confusion. Interruptions in service can occur as well. Local disks or PSDs can fail or be accidentally corrupted. A PSD can be forgotten, stolen, or lost. Connections to LAN or WAN storage may be interrupted. These are all problems of data integrity.

3.1 Reliability of the Drive and Interface

USB flash drives are usually preformatted with a DOS FAT16 file system. In an experiment with a Fuji USB 2.0 flash drive under Windows 2000, directories more than seven levels deep, a limitation of the FAT16 file system, caused file system corruption. Concurrent access caused a Windows blue screen crash. To avoid these problems USB flash drives should be reformatted with a file system other than FAT16. The FAT32 file system is recognized by all versions of Windows and also works with Linux.
After formatting a Lexar JumpDrive with FAT32 the directory depth problem went away. Files were still corrupted under Windows 2000 when accessed concurrently, but not under Windows XP. Users need to verify that a reformatted USB drive works properly on any system they intend to use with it.

Not all programs work with USB drives formatted with a FAT32 file system. The Microsoft Partner Pack [6] contains a utility, the "Microsoft USB Flash Drive Manager", which performs common operations needed to manage files on a flash drive, such as data backup. This utility did not recognize the JumpDrive formatted with FAT32, but did recognize it when formatted with NTFS. Users also need to check that USB drives work with the software they intend to use.

Collectively, these problems make USB drives untrustworthy and difficult to use in an arbitrary environment. If their use is limited to tested platforms, people should be able to use a USB drive reliably. In untested environments, avoid concurrent drive access.

The serial ATA (SATA) drive interface is reliable and eventually may supersede the USB interface. The same ATA device drivers used for internal hard drives are employed, which eliminates the concurrent access problems encountered with USB drives. The SATA I standard does not include power for external devices, but the SATA II standard does with the eSATA [7] component. The eSATA standard is currently undergoing specification testing and is expected to be completed in the summer of 2005. The proposed eSATA connector is about 1" by 1/4". The bandwidth of the SATA II interface is 300 MBps, while the peak USB 2.0 speed is 60 MBps. In practice the actual bandwidth can be much lower, depending on the device implementation and the computing platform.

The next generation of non-volatile RAM [8] is expected to have access times comparable to main memory. This should allow the creation of PSDs with speeds exceeding those of hard drives.
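The peak bandwidth figures quoted in this section for SATA II and USB 2.0 can be reproduced from the nominal line rates of the two interfaces. The short calculation below is a sketch under stated assumptions: the line rates (3000 Mbit/s and 480 Mbit/s) and the 8b/10b encoding used by SATA come from the interface specifications rather than from this paper, protocol overhead is ignored, and the function names are illustrative only.

```python
def payload_mb_per_s(line_rate_mbit_per_s, wire_bits_per_byte):
    """Peak payload rate in MB/s for a given line rate and wire encoding."""
    return line_rate_mbit_per_s / wire_bits_per_byte


# SATA II: 3000 Mbit/s line rate with 8b/10b encoding,
# so each data byte occupies 10 bits on the wire.
SATA2_MB_PER_S = payload_mb_per_s(3000, 10)

# USB 2.0 high speed: 480 Mbit/s signalling rate, 8 bits per byte,
# with no protocol overhead counted (an optimistic assumption).
USB2_MB_PER_S = payload_mb_per_s(480, 8)


def transfer_seconds(size_mb, rate_mb_per_s):
    """Best-case time to move size_mb megabytes at the given peak rate."""
    return size_mb / rate_mb_per_s
```

At these peak rates a 1000 MB transfer would take roughly 3.3 seconds over SATA II and roughly 16.7 seconds over USB 2.0, a factor of five. Measured throughput can be much lower for the reasons given above.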
3.2 Web Distributed Storage

A backing storage source can provide access to data in the event that a PSD is unavailable. Several ongoing research efforts use WAN-based storage systems that provide availability at multiple locations. When such systems are coupled with a PSD, people can have consistent and reliable storage available nearly everywhere.

The PersonalRAID system [9] treats a PSD as a cache for multiple local drives. Consistency is maintained using parallel file system index trees called a log-structured file system. Data is replicated over local storage at each location where you work. The file systems are synchronized by replaying updates kept in file logs. Checkpointing is used for disaster recovery. PersonalRAID primarily addresses the problem of slow WAN access by using the PSD as a cache. Data is replicated for increased availability, acting as a reliable backup service. It does not treat the PSD as a principal storage device. The limited capacity of PSDs is not addressed in this work.

As with the PersonalRAID system, the work by Tolia et al. [10] uses a PSD as a cache for a file system distributed over a WAN. They cite the reliability problems and availability limitations of PSDs as their reason for restricting them to caching. Unlike PersonalRAID, which has page-size granularity, this approach uses file granularity. When data is available on the PSD, the whole file is available and not just a fragment. Still, the main intent is to improve the inherently slow access times of distributed file systems. Their solution is to use a combination of local storage caches and the PSD to provide fast access. In this case the PSD provides availability for the most often used files when the distributed file system is unavailable. The user does not control which files reside on the PSD, so some critical files may not be available through the PSD.

The Pastiche system [11] provides a distributed WAN backup system using peer-to-peer network connections.
Data is stored redundantly over distributed servers for high availability and reliability. Pastiche is primarily an outbound backup service. The overhead as seen by the user is low. Inbound access is limited to data recovery and synchronization.

Pastiche stores data in blocks called chunks. This is transparent to users, as the source file system (the PSD) is not affected by the granularity. Hashing is performed over chunks to detect matching or changed copies. Data is replicated on multiple peer-to-peer servers that cooperatively share unused storage. Data is encrypted for privacy in the distributed file system. Users avoid paying for costly remote backup services, but must provide a server and storage space to join the network.

The Pastiche model could be further extended to propagate changes between the WAN and a local file system. Changes made on a PSD (say at home) would propagate through the WAN to local storage at another site (say at work). This would keep the local copies at both ends synchronized. The amount of synchronization required when attaching a PSD would be reduced by the synchronization taking place through the WAN channel.

Multiple versions of files are kept to compensate for user errors, such as unintended deletions. This could be extended for PSDs to resolve collisions when synchronizing file systems at different locations. Collisions occur when the same file is updated separately on a disconnected PSD and on a computer connected to the distributed file system. Such inconsistencies need to be detected so the user can merge the changes.

An ephemeral backup system saves changes to files automatically when they are written. This requires that the operating system provide support to notify the ephemeral backup system whenever a changed file is closed. Secure access to the changed files must also be granted. If the backup system is distributed over a WAN, then a coherent and up-to-date version of files can be accessed from multiple locations.

4.
Authenticating Software Packages

Software installed on a PSD needs to work on multiple computers. This is contrary to the usual distribution model, where software is deployed on specific computers. This complicates the installation process for software vendors. Users might be unable to install specific software packages on a PSD or have a difficult time getting the software to work in multiple environments.

4.1 Independent Software Vendors

When installing software, a product key is often required to authenticate your right to use the software. Given that any competent programmer can modify a program to bypass such checks, this is merely a deterrent. The ultimate protection mechanism is the legal copyright. Software vendors use a variety of obfuscation methods that are difficult to reverse engineer [12]. From a security perspective this is a weak method of authentication.

Often vendors will hide access information in rarely visited portions of the file system or in the Windows registry. This presents a problem when installing software onto a PSD, where there are very few places to hide such information. Some form of standardized hardware and operating system support for authentication may be required to make vendors more comfortable with the idea of deploying to a PSD.

Legacy software installation programs may preclude installing software on a PSD. Furthermore, software vendors may not want their software installed on a PSD. The traditional licensing model binds installed software to computers and not just the disk drive. As PSDs become more commonplace, binding software licenses to people and not computers may make more sense.

Legacy software can be installed in trusted environments, and it can access data on a PSD just like any removable drive. However, users are often prohibited from installing legacy software in foreign environments. Either the user doesn't have appropriate privileges or is not allowed to pollute the environment with outside software.
Even if the software is uninstalled after the session, many software packages have poorly written uninstallers that leave behind remnants.

To operate in a foreign environment, software needs to be installed on the PSD itself. Software for Windows is typically installed on a local file system and settings are made in the registry. Software installed on a PSD needs to work on multiple platforms, each with its own registry. Vendors need to make explicit changes to their installation programs before software can be deployed on PSDs.

4.2 Installing an Operating System on a PSD

Operating systems can be boot loaded from a PSD. When running the OS from a PSD, legacy software can also be installed on the PSD just like any drive. Additionally, the system will be configured with the user's preferences and security credentials wherever it is used.

Difficulties arise when booting an operating system in a foreign environment. Operating systems are installed for a particular hardware configuration, not multiple configurations. It's unlikely that any arbitrary computer will be able to boot from your PSD. With some effort a person should be able to set up an operating system on a PSD that will boot load in both a home and a work environment.

To boot load an operating system [13, 14]:

* The BIOS running on the mainboard in the host system must support booting from the PSD's interface (e.g. USB).
* The PSD must support boot loading.
* The operating system must permit installation on a PSD and be configured for hardware on the target host computers.

5. Conclusions

Using a PSD as a primary storage medium currently requires careful data management and system administration. Hardware and operating system dependencies prohibit or limit usage to a few controlled environments. With effort a computer-savvy person could use a PSD for many primary data storage requirements. Usage must be verified for each environment and data must be carefully secured.
The tools developed for this project can provide data security with convenient authorization in trusted environments. In particular, the text editor provides secure access to highly sensitive data even in untrusted environments.

In the near term, emerging technologies show great promise for making PSDs more reliable and more ubiquitous. Drives using the eSATA interface will not have the data integrity problems of the USB interface. New non-volatile memories will have unlimited rewrites and will be faster and cheaper than current flash memories. Still, it will take several years for these technologies to gain widespread use. In the meantime early adopters will be able to deploy them at home and in the office.

Even when the technology is in place, systems integration issues will continue to be the limiting factor. Legacy applications and operating systems will always be problematic. People can work around this if they have the ability to cleanly boot load an operating system on arbitrary computers. It's unclear that anyone will be integrating secure ephemeral web-based backup with any operating system anytime soon. Operating system and PSD vendors will need to implement standards for installing software on a PSD.

In a lecture, Bill Poduska observed that the computing industry advances in steps. Every decade or so a confluence of three emerging technologies yields a leap in computing capability. This may be happening with the convergence of a new generation of non-volatile memory, miniature hard drives and the external SATA II interface. Low-cost and highly reliable PSDs will have capacities on the order of 100 GB and speeds far exceeding those of current hard drives. As data shifts from local file systems to PSDs, we may say that the pen drive is the computer.