Good stewardship: the preservation of data in the face of adversity

Part of the first imperative of system administration is preserving the important data that you administer.

This means that you must have redundancy in your data (but what about the vast popularity of deduplication?). The easy way to do this is to keep multiple copies of the data that you are responsible for. The prices of non-volatile media are falling (you can currently buy an external 4 terabyte hard drive for around $140), and the density is such that you can comfortably carry a 4 terabyte drive in one hand.

Two key concepts for data recovery:

A second pair of concepts is that of ``archival'' material versus ordinary backups. With archival material, you are creating a historical snapshot. For instance, here in the computer science department, we create archival backups at the end of each semester.

Backup tapes

Traditionally, data redundancy was achieved via tape, and was called a backup tape. There are also newer media, such as RDX, which provide the same sort of physical format, but also have some very desirable additional capabilities.

Traditionally, backup tapes were made about once per day, giving you an RPO (recovery point objective) of sometime in the early morning. Usually the scheme involved making a ``full'' backup only occasionally (perhaps once a week, once a month, or even (page 180 of USAH) once a year!) with one or more levels of ``incremental'' backups, which recorded only changed files and directories. More modern schemes, such as the one rsnapshot uses, make use of hard links instead.
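As a rough illustration of what an incremental backup has to select, the sketch below walks a directory tree and picks out the files modified after the time of a previous backup. It is a simplified stand-in, not the algorithm that dump actually uses; the function name changed_since and the reliance on modification times alone are assumptions made for illustration.

```python
import os


def changed_since(root, last_backup_time):
    """Return paths under root whose mtime is newer than last_backup_time.

    Hypothetical helper: real tools such as dump also account for
    inode changes and deleted files, which this sketch ignores.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # lstat so that a symlink is judged on its own timestamp
            if os.lstat(path).st_mtime > last_backup_time:
                changed.append(path)
    return sorted(changed)
```

A full backup corresponds to passing a last_backup_time of 0, which selects every file.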

These backups were made on a per-machine basis (though today such backups are generally made with a backup server, which uses IP to transfer the data from the client machine to the backup server).

The upsides of such traditional schemes are:

The downsides of such traditional schemes are:

What's wrong with tapes?

In the Unix world, the pair of programs dump and restore have been our mainstays for doing backups. On a machine with a locally attached tape drive, these still make a lot of sense. Another program that has been used is tar (short for tape archive), which is also used for creating general aggregates of files and directories. Another related program is cpio. All of these can also write to an ordinary file instead of a tape. dump and tar also allow you to create compressed images, although with today's tape compression at the hardware level, this is probably not of as much interest as it was in the past.

While all of these allow us to make backups either locally or remotely, using dump and restore allows one to make incremental backups. (I don't recommend using tar's incremental scheme. Also, many default versions of tar have in the past only allowed up to 100 characters in a pathname.)

restore also allows you to do interactive recovery of data from tape or dumpfiles.

USAH recommends using a single machine for tape backups if feasible by using the network to transfer data to the backup server. I have seen that advice combined with the idea of local backups (in order to have more redundancy) where tapes were made both on the local machine and remotely. (In that case, the remote tape backups were also highly available. The backup server had a tape silo, and you could literally pull any recent file for any machine that was backed up from tapes that were in the tape silo.)

Label your tapes

One of the worst stories that I have ever heard is the story of going into a server room and finding on top of each tape drive 4 tapes labelled ``A'', ``B'', ``D'', and ``E'' -- and no key to explain what the contents of each might be (although it would be a very good guess that ``C'' is currently in the drive.)

If you find yourself using a scheme where the tape labels do not fully describe the contents, ask why. This is okay if you are using a tape scheme that creates tape sets whose contents are stored in an effectively random-access fashion. It is probably not okay if you are simply recycling the tapes so frequently that it is inconvenient to relabel them. Tape reuse is the number one cause of tape failure.

Tape management

While with some tape silos it is impossible not to span tapes in what are called ``tape sets'', it is generally not a good idea to span tapes if you are doing individual backups. In addition to the simple hassle of changing tapes when making them, both labelling and recovery from multiple tapes are not much fun.

Recovery planning has to include disaster scenarios. Generally, tapes are a good backup mechanism for such scenarios since they are easy to transport offsite, though that itself engenders risk if the tapes are not encrypted!

Update: Lost backup tapes prompt IT changes at NY bank

Additional measures, including encryption, will protect tapes transported to off-site facilities

By Brian Fonseca

June 2, 2008 (Computerworld) Bank of New York Mellon Corp. late last week said it has launched a new policy to encrypt data held on storage devices and to limit the amount of confidential client data stored on tape drives. The policy was launched after unencrypted backup data tapes were lost twice by third-party couriers this year.

The bank would not disclose its past storage policies.

The company announced the new policy just days after disclosing that one of 10 boxes of storage tapes being delivered to an off-site facility by storage firm Archive America was lost in February and that another courier, which it did not identify, lost a storage tape during transit in April.

Combined, the two data breaches exposed sensitive information of more than 4.5 million people and 747 companies, according to BNY Mellon officials.

In the February incident, the tapes were being transported from the bank's Mellon Shareowner Services facility in Jersey City, N.J. Those missing backup tapes include names, birth dates, Social Security numbers, and other information from customers of BNY Mellon and the People's United Bank in Bridgeport, Conn.

The tape lost in April was in transit from BNY Mellon's Working Capital Solutions operations in Philadelphia to a branch office in Pittsburgh. That backup tape included images of scanned checks and other documents relating to payments made by BNY Mellon clients, said company officials.

The company late last week also announced that it will provide two years of free credit monitoring, credit-freeze benefits and a $25,000 identity theft insurance policy to those affected by the missing tapes.

``We deeply regret that this occurred and sincerely apologize to all of those impacted,'' said Todd Gibbons, chief risk officer at BNY Mellon, in a statement. Gibbons said there is no indication that data on the missing tapes has been misused or inappropriately accessed.

To bolster its security controls, the bank said it will now require that any confidential data written on tapes or CDs for transport must be encrypted or transported with undisclosed additional data protections. Further, when ``technically feasible,'' the bank will demand that encrypted confidential data be delivered to off-site facilities electronically, noted Gibbons.

BNY Mellon has ended its relationship with Archive America and is cooperating with law enforcement agencies and state and federal officials in the investigation of the incident. A spokesman for the financial institution refused to comment on any details surrounding the circumstances of how the tapes disappeared. Archive America officials declined comment.

Connecticut Attorney General Richard Blumenthal said that the missing BNY Mellon computer tapes have put the personal identities of 497,333 state residents at risk. In a statement released late last week, Blumenthal and the state's Department of Consumer Protection listed 25 companies with Connecticut-based customers affected by the breach. The list includes People's United Financial Inc., John Hancock Financial Services Inc., The Walt Disney Co. and TD Bank Financial Group.

Blumenthal said he is still waiting for answers from the bank about how the data breach occurred, who is responsible for the crime and why BNY waited months to notify customers about the incident. ``We haven't completed our investigation and there are still some important questions that have to be answered,'' he remarked.

(This appeared in Computerworld at http://computerworld.com/action/article.do?command=viewArticleBasic&taxonomyName=security&articleId=9092178&taxonomyId=17&intsrc=kc_top on June 2, 2008.)

Iron Mountain Loses More Tapes

Backup tapes from City National Bank were lost in April, but there's no evidence the data has been compromised, the bank says.

By Steven Marlin

Jul 8, 2005 03:00 PM

City National Bank has become the second company in two months to experience a loss of backup tapes in transit by Iron Mountain Inc. The Los Angeles-based bank disclosed Thursday that two tapes containing sensitive data, including Social Security numbers, account numbers, and other customer information, were lost during transport to a secure storage facility.

The bank said the data was formatted to make the tapes difficult to read without highly specialized skills, but declines to say if they were encrypted. It said there's no evidence that data on the tapes has been compromised or misused.


This appeared in Information Week at http://informationweek.com/story/showArticle.jhtml?articleID=165701015 back in July of 2005.

Stolen University of Utah medical files recovered

By PAUL FOY 07.03.08, 10:58 AM ET

Three people involved in the theft of millions of medical records didn't have the ability, knowledge or equipment to decode personal information from the backup tapes, authorities said.

Salt Lake County sheriff's investigators said Wednesday a caller led detectives to the University of Utah billing records that were burglarized last month from a courier's vehicle.


The backup tapes contained various combinations of social security and driver's license numbers, birth dates, doctors' names, insurance providers and medical procedure codes on about 1.5 million patients who visited the hospital over the past 16 years, a hospital executive said. The hospital originally said the records contained confidential information on 2.2 million patients, but later corrected the figure.

Detectives said the three people didn't have the specialized equipment needed to run or decode the encrypted tapes.


The courier faces no charges and was a victim of the crime, he said.

The courier was supposed to drive the tapes to a storage vault burrowed into the granite of Little Cottonwood Canyon. He went home instead.

Detectives believe the car was a random target of burglary and that at first, the thief didn't realize what was inside a metal canister or on the backup tapes.

With the tapes in safe hands at an FBI lab, Entwistle said there was no merit to a proposed class action lawsuit filed Monday against the hospital on behalf of patients whose information may have been at risk.


Many companies use a secure Internet line to electronically back up business records at a database center, and Entwistle said the hospital is looking at ways to eliminate the risk of deploying couriers to move information.

(This appeared on Forbes.com at http://www.forbes.com/feeds/ap/2008/07/03/ap5181656.html on July 3rd, 2008.)

Tape management, continued

While I have seen two sites that had adequate on-site vaulting, that is enormously expensive compared to simply storing the tapes offsite, such as with Iron Mountain.

When you are making backups, it is advisable to try to have quiescent filesystems. Back in the 1980s, I used to take all of one facility's machines down to single-user state to make backups on Friday evening.

Some people use three-way disk mirrors, and then break the second mirror off to ensure that the filesystem is quiescent.

However, it is more common to make backups on relatively live machines, and generally there aren't that many obstacles --- except for databases. If you need to back up a database on a live filesystem, the simplest thing to do is probably to create a snapshot file (such as with mysqldump or pg_dump), and then -- once you are certain that file has been made -- make your backup tape.
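The ``dump first, then back up'' ordering can be enforced in a small wrapper. The following is a minimal sketch, assuming that the dump command writes to standard output (as pg_dump and mysqldump do by default); the function name dump_then_backup and its parameters are hypothetical and chosen only for illustration.

```python
import os
import subprocess


def dump_then_backup(dump_cmd, dump_path, backup_cmd):
    """Run dump_cmd with its output saved to dump_path, then run backup_cmd.

    The backup is started only if the dump command exited successfully
    and produced a non-empty file; otherwise RuntimeError is raised.
    """
    with open(dump_path, "wb") as out:
        result = subprocess.run(dump_cmd, stdout=out)
    if result.returncode != 0 or os.path.getsize(dump_path) == 0:
        raise RuntimeError("database dump failed; not starting the backup")
    # check=True raises CalledProcessError if the backup itself fails
    subprocess.run(backup_cmd, check=True)
```

A call might look like dump_then_backup(["pg_dump", "mydb"], "/var/backups/mydb.sql", ["tar", "-cf", "/dev/st0", "/var/backups"]), where the database name, the paths, and the tape device are placeholders for whatever your site uses.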

Tape management, continued

Checking your tapes: This is so important that in some regulated industries it is mandatory.

It is best practice to check your tapes on a regular basis. Either you or a user should periodically choose a file or directory to attempt to recover and, if at all possible, try to recover it on a different tape drive than the original one. (Some tape units go out of alignment; while they may be able to read their own tapes, other drives may have trouble with those tapes.)
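Once a test file has been recovered, one simple way to confirm that it came back intact is to compare checksums. The sketch below assumes that the original file has not changed since the backup was made; the helper names sha256_of and restored_copy_matches are hypothetical.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at path."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so that large files do not have to fit in memory
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restored_copy_matches(original_path, restored_path):
    """True if the restored file has the same contents as the original."""
    return sha256_of(original_path) == sha256_of(restored_path)
```

If the original is likely to have changed, recording the digest at backup time and comparing against that record is the safer variation.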

Refreshing media: Tape technology has very rapid turnover, and technical obsolescence is a constant problem. If at all possible, you should try to move your old tapes when you introduce a new tape technology. This is expensive (especially if you have a considerable collection of archives), but paying a recovery service is even more expensive if your tapes fall too far behind the technology curve. Just look at pp. 170-175 in USAH and the technology curve is all too apparent. Only DLT is still in common usage, and 12 MB/s is no longer ``blindingly fast''.

Incremental schedules

One possibility is to simply do a full dump each day of every machine. That provides a lot of redundancy, and generally will keep you close to your RPO. This is quite common in smaller establishments, or in ones where data preservation is an especially high priority.

However, some establishments have lesser criteria for data preservation. In those, it is common to use schemes such as that on page 180 of USAH:

(I haven't seen the suggestion on page 180 implemented that the monthlies be actually level 3s, and reserve level 0s for once per year. That's putting a great deal of faith in that level 0 --- if it fails, you could be looking at going back to level 0 from two years prior!)

Recovery from tape

When you are trying to recover a single file or directory, the first order of business is to find the right tape. If a user comes to you about recovering a file that he has not thought about in a while, you will have to try to ascertain which tapes to go to in order to find the file. Once you have found an appropriate tape, if you are using restore, you can run it in interactive mode instead of trying to extract the whole tape to disk.

If instead you want to recover an entire filesystem, you need to make sure that you aren't simply compounding whatever problem you had to begin with. If, for instance, a disk drive or controller is acting flaky, simply recovering back to the same unit isn't going to do you much good.

For instance, as USAH has it on page 183, if you are using the scheme of levels 0, 3, 5, and 9 as mentioned above, you need to first recover from the most recent level 0 tape, and then from the most recent level 3, then the most recent level 5, and finally, the most recent level 9.
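The restore order can be worked out mechanically from the dump history. The sketch below relies on the usual dump convention that a level N dump records everything changed since the most recent dump of a lower level; the function name restore_chain and the (label, level) representation of the history are assumptions made for illustration.

```python
def restore_chain(dumps):
    """Return the tape labels to restore, oldest first.

    dumps is a list of (label, level) pairs in the order the dumps
    were made. Walking backwards from the newest dump, a tape is
    needed only if its level is lower than every tape chosen so far.
    """
    chain = []
    wanted_below = None  # the next tape must have a level below this
    for label, level in reversed(dumps):
        if wanted_below is None or level < wanted_below:
            chain.append(label)
            wanted_below = level
            if level == 0:
                break
    else:
        # The loop ended without ever reaching a level 0 dump
        raise ValueError("no level 0 dump found")
    chain.reverse()
    return chain


history = [("jan-01", 0), ("feb-01", 3), ("feb-08", 5),
           ("feb-09", 9), ("feb-10", 9)]
print(restore_chain(history))
```

With the made-up history above, the result is jan-01, feb-01, feb-08, feb-10: the earlier level 9 tape is skipped because the later level 9 already covers everything changed since the level 5.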

Another reason to make backups, which we haven't touched on, is system upgrades. System upgrades have the distinct possibility of rendering your system unusable. (I was just doing a test upgrade last week and all of the disk information on the boot drive of the test machine rather mysteriously disappeared.) In that case, convenience of recovery is a strong goal, and backing up to disk rather than tape should be a strong consideration.

Tape management -- the mt program (USAH pp. 186-187)

The program mt lets one manipulate tapes that have more than a single ``EOF'' mark written to them.

The main commands for mt are

Using rsnapshot

The paradigm for rsnapshot is a simple and clever one.

It's built around rsync; essentially, it keeps a current copy of some remote directory structure (usually at the filesystem level, though that's not a requirement), and for each new historical snapshot, it creates a new set of hard links and rotates the old snapshots.
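To make the rotate-and-link idea concrete, here is a minimal sketch of that step alone. It is not rsnapshot's own code: the directory naming (snapshot.0 as the newest), the function name rotate_and_link, and the keep parameter are all assumptions chosen for illustration, and the rsync step that would then update snapshot.0 is left out.

```python
import os
import shutil


def rotate_and_link(snapshot_root, keep):
    """Rotate snapshot.0 .. snapshot.<keep-1> and hard-link a new snapshot.0.

    Assumed layout (hypothetical): directories named snapshot.0 (newest)
    through snapshot.<keep-1> (oldest) under snapshot_root.
    Returns the path of the new snapshot.0.
    """
    def snap(n):
        return os.path.join(snapshot_root, "snapshot.%d" % n)

    # Drop the oldest snapshot, then shift the others up by one.
    if os.path.isdir(snap(keep - 1)):
        shutil.rmtree(snap(keep - 1))
    for n in range(keep - 2, -1, -1):
        if os.path.isdir(snap(n)):
            os.rename(snap(n), snap(n + 1))

    # Rebuild snapshot.0 as a tree of hard links to snapshot.1, so that
    # unchanged files take up no additional data blocks.
    if os.path.isdir(snap(1)):
        shutil.copytree(snap(1), snap(0), copy_function=os.link)
    else:
        os.makedirs(snap(0))
    return snap(0)
```

The scheme works because rsync, by default, replaces a changed file with a new one rather than overwriting it in place, so the older snapshots keep their links to the old contents.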

In the configuration file there is one primary decision to make: whether or not to enable ``sync_first''. My experience has been that it is better to separate the sync and snapshot/rotate functions, though the default is to combine those two.