I would like to start with a more general area: data protection as a whole. This article is for service providers (so they know what to ask) and for data owners (so they know what questions to expect). Every project starts with a discussion between the backup provider and the data owner (often just two departments of the same company).

Wrong questions when designing your data protection

The initial phase usually jumps straight to questions like:

  • What is the frequency and retention of backups?
  • Can you run my backups every 15 minutes?
  • I need my weekly backups in an offsite facility.

Everything is wrong. Really, really wrong.

The very first question for the data owner should be:

What is the value of your data?

I know the answer already:

We don’t know.

This is the fundamental mistake in any negotiation about data protection. Take an example. When you own a car, you know exactly what it is worth and how much you are willing to pay to protect it: how expensive a car alarm to buy, which insurance package to choose, whether to pay for a parking lot or park on the street. With data, the owner very rarely knows its value, yet every price for protecting it seems too expensive.

Knowing the value of the data is a mandatory step. As I mentioned earlier, people just don’t know. O.K. Ask the owner to split the data into a few categories by importance and ask some simple questions:

  1. How much data (expressed in days, hours, minutes) can you lose without affecting your business? How much money will you lose if your data from the past 10 minutes, 1 hour or 1 day just disappears?
  2. How long can you continue your business without this kind of data? Many data owners believe their data has to be available all the time. That is not true. Nobody has lost a business because invoices were sent 3 days late. Missing the boring reports for the Monday management meeting really won’t send you into bankruptcy. But the system tracking your goods deliveries is what keeps you alive. A late package hurts your reputation; a lost package will drive your customers away.

Quite often you invest more in the financial reporting system than in the system that actually generates the money.
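The two questions above can be turned into a rough calculation. The sketch below is only a back-of-the-envelope model under stated assumptions: the per-hour figures, the category names and the idea of an expected number of incidents per year are hypothetical inputs the owner would have to supply, not numbers from any real business.

```python
from dataclasses import dataclass


@dataclass
class DataCategory:
    """A category of data with the owner's own (here hypothetical) estimates."""
    name: str
    loss_cost_per_hour: float    # money lost per hour of history that disappears (question 1)
    outage_cost_per_hour: float  # money lost per hour the data is unavailable (question 2)


def incident_cost(cat: DataCategory, lost_hours: float, outage_hours: float) -> float:
    """Cost of one incident: the lost history plus the time spent without the data."""
    return cat.loss_cost_per_hour * lost_hours + cat.outage_cost_per_hour * outage_hours


def protection_pays_off(cat: DataCategory, lost_hours: float, outage_hours: float,
                        incidents_per_year: float, protection_price_per_year: float) -> bool:
    """True if the expected yearly loss is higher than the yearly price of protection."""
    expected_loss = incident_cost(cat, lost_hours, outage_hours) * incidents_per_year
    return expected_loss > protection_price_per_year


# Hypothetical figures, for illustration only.
delivery = DataCategory("goods delivery tracking", 500, 2000)
invoicing = DataCategory("invoicing", 50, 10)
print(incident_cost(delivery, lost_hours=24, outage_hours=8))    # 28000
print(incident_cost(invoicing, lost_hours=24, outage_hours=72))  # 1920
```

Even with invented numbers, the comparison shows the point of the questions: three days without invoicing can cost less than a few hours without delivery tracking.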

The result of the initial discussion should be a categorization like the following:

  1. Most important data - we cannot lose any of it and it has to be available all the time → a high availability solution, not based on backup alone. The price will be really high, but still lower than the possible financial loss.
  2. Less important data - we can survive losing the past 15 minutes of data and a 1-hour outage → a system with asynchronous data replication, snapshots and frequent backups of database transaction logs
  3. Moderately important data - losing data from today or yesterday will really complicate our business
  4. Even less important data - we can recreate the system with a fresh installation, but it would be good to have the configuration at least from the past weekend
  5. Backup would be more expensive than the value of the data itself - we can recreate the data from the production database overnight
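One way to apply this categorization consistently is to map the owner's answers (tolerable data loss and tolerable outage) to a tier. This is a minimal sketch: the function name, the tier texts and especially the numeric thresholds are assumptions loosely derived from the five categories above, not fixed rules.

```python
# Short descriptions of the five categories; thresholds below are illustrative assumptions.
TIERS = {
    1: "high availability, not based on backup alone",
    2: "asynchronous replication, snapshots, frequent transaction log backups",
    3: "losing today's or yesterday's data really complicates the business",
    4: "reinstall, keep configuration at least from the past weekend",
    5: "no backup, recreate from the production database overnight",
}


def protection_tier(max_loss_min: float, max_outage_min: float,
                    recreatable: bool = False) -> int:
    """Map tolerated data loss and outage (in minutes) to one of the five categories."""
    if recreatable:
        return 5  # protecting it would cost more than regenerating it
    if max_loss_min == 0 or max_outage_min == 0:
        return 1  # zero loss or zero downtime cannot be met by backup alone
    if max_loss_min <= 15 and max_outage_min <= 60:
        return 2
    if max_loss_min <= 24 * 60:
        return 3  # up to about a day of history may be lost
    return 4      # anything looser: reinstall plus an older configuration copy


for loss, outage in [(0, 0), (15, 60), (12 * 60, 8 * 60), (7 * 24 * 60, 2 * 24 * 60)]:
    tier = protection_tier(loss, outage)
    print(tier, TIERS[tier])
```

The thresholds will differ for every company; what matters is that they come from the owner's answers, not from the backup schedule.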

Data protection or backup?

Why should we talk about data protection and not just about backup? The distinction matters to data owners as much as to service providers.

The traditional idea of “backup” is to copy some data to tape so it is available if necessary.

Customer: We have a Windows file server with a DDS tape drive. We run a backup every day and put the tape on the shelf. We rotate tapes every 2 weeks. Pretty simple. Why is your backup service so expensive?

The answer is simple. As a provider, you are not selling simple data storage on DDS tapes lying on a shelf. You are selling the ability to put exactly the specified data back within a defined time.

Your product is not “data on tape kept for 3 months”. No. Your product is “a service guaranteeing that any data from the past 3 months will be back within 1 hour and that you never lose more than 24 hours of your history”.
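Written as targets, that product definition can be checked against an actual setup. The sketch below assumes each backup run captures data as of its start time; the class names and the customer's backup duration (2 hours) and measured restore time (3 hours) are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class ServiceLevel:
    """What the customer is actually buying (targets taken from the text)."""
    max_data_loss: timedelta   # "never lose more than 24 hours of history"
    restore_within: timedelta  # "back within 1 hour"
    keep_history: timedelta    # "any data from the past 3 months"


@dataclass
class BackupSetup:
    """How the data is actually protected (figures below are hypothetical)."""
    interval: timedelta          # time between backup starts
    duration: timedelta          # how long one backup run takes
    retention: timedelta         # how long backups are kept
    measured_restore: timedelta  # from a real restore test, tape handling included


def violations(sla: ServiceLevel, setup: BackupSetup) -> list:
    """Return the targets this setup cannot meet; an empty list means it fits."""
    problems = []
    # Worst case (assuming a run captures data as of its start): failure just
    # before the next run completes, so the last usable copy is interval + duration old.
    if setup.interval + setup.duration > sla.max_data_loss:
        problems.append("data loss can exceed the limit")
    if setup.retention < sla.keep_history:
        problems.append("history is not kept long enough")
    if setup.measured_restore > sla.restore_within:
        problems.append("restore is too slow")
    return problems


sla = ServiceLevel(timedelta(hours=24), timedelta(hours=1), timedelta(days=90))
# The customer's file server: daily tape, rotated every 2 weeks.
tape_on_shelf = BackupSetup(timedelta(days=1), timedelta(hours=2),
                            timedelta(weeks=2), timedelta(hours=3))
print(violations(sla, tape_on_shelf))
```

Under these assumptions the daily tape fails all three targets, including the 24-hour limit, because the backup run itself takes time.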

It makes a small difference whether a car rental company offers you

Car rental for $30 per day.

or

Temporary replacement of your car, 24 hours a day. The car will be delivered within 1 hour of your first call, anywhere within 20 miles of your office.

Do you see the difference?

Let’s go back to data. If you are selling “backup”, customers expect cheap storage of data on tapes. Are you able to guarantee that no more than 15 minutes of data history is ever lost? And to put all the data back within 30 minutes of the first call? Yes, it’s possible. But not guaranteed.

Are you able to back up 100 GB in 15 minutes? Probably not (O.K., it is possible in some special cases). Can you restore 100 GB of data from tape within 30 minutes? Do you have a dedicated tape drive available? Even when several restore requests run in parallel? Is the filesystem/database capable of writing more than 100 MB/s?

Really? It seems some snapshot technology can do it.
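The throughput behind these questions is simple arithmetic. The sketch below uses decimal units (1 GB = 1000 MB); the optional overhead parameter, time within the window that moves no data, such as tape mount and positioning or database recovery, is an assumption you would have to measure for your own site.

```python
def required_mb_per_s(data_gb: float, window_min: float, overhead_min: float = 0.0) -> float:
    """Sustained MB/s needed to move data_gb within window_min minutes,
    after subtracting overhead_min minutes that transfer no data."""
    transfer_s = (window_min - overhead_min) * 60
    if transfer_s <= 0:
        raise ValueError("overhead leaves no time to transfer data")
    return data_gb * 1000 / transfer_s


print(round(required_mb_per_s(100, 15), 1))                   # backup in 15 min: 111.1
print(round(required_mb_per_s(100, 30), 1))                   # restore in 30 min: 55.6
print(round(required_mb_per_s(100, 30, overhead_min=10), 1))  # 10 min (hypothetical) tape handling: 83.3
```

Every minute spent finding, mounting and positioning a tape raises the rate the remaining minutes must sustain, which is why snapshots look so attractive for short restore windows.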

What if you have to restore the data at a DR (disaster recovery) location? Oops. It seems some data replication will be required.

Will you still call it “backup”?

Yes, technically it’s still backup. Just a slightly more expensive backup than the “tape on the shelf”.