
ExaGrid's Eye on Deduplication


Is Disk With Deduplication Really Faster than LTO4 Tape for backups?

Posted by Marc Crespi on Mon, May 03, 2010 @ 11:19 AM

The short answer is yes. To understand why, read on. Disk is faster than tape for backups for several reasons, and one of them matters far more than the rest.

Sometimes someone will cite LTO4 speeds with compression over a Fibre Channel network to try to convince me that disk is slower than tape, or at best equal to it. They point out that most disk-based backup appliances connect over Ethernet and use a NAS interface, and conclude that disk has bottlenecks that tape does not.

Let's make this easy. Let's not even debate whether a single backup job to tape is slower than a single backup job to disk. For this moment in this blog post, let's call it a tie. Or better yet, for argument's sake, let's allow just for this moment that tape could be faster (gasp!). Even in the rare case when this could be true, it is not the horse race you care about. What matters in backup is completing all of your backups in the shortest time possible, specifically within your allowed backup window.

Disk gains much of its speed advantage through concurrency: the ability to run many backup jobs in parallel. With tape, the number of backup jobs you can run simultaneously is limited by the number of tape drives in your tape library. With a 4-drive tape library, you can run 4 backup jobs simultaneously. The fifth, sixth, seventh, and eighth jobs patiently wait for a drive to be free. With disk, you do not have this limitation. For example, each ExaGrid appliance can handle between 6 and 20 concurrent backup jobs. A 60 TB ExaGrid System can support 120 concurrent backup jobs or streams, whereas other systems may support as few as 18 for that amount of data.
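To make the waiting-for-a-drive effect concrete, here is a minimal Python sketch. The function name `backup_window` and all job durations are hypothetical, chosen only for illustration; they are not measurements of any tape library or ExaGrid system. It assumes each job occupies one drive or stream for its whole duration and that a waiting job takes the first slot that frees up.

```python
import heapq

def backup_window(job_hours, slots):
    """Hours until every job has finished when at most `slots` jobs run at once.

    Jobs start in list order; a waiting job takes the first slot that frees up.
    """
    if slots < 1:
        raise ValueError("slots must be at least 1")
    free_at = [0.0] * slots  # time at which each drive/stream becomes free
    heapq.heapify(free_at)
    finish = 0.0
    for hours in job_hours:
        start = heapq.heappop(free_at)  # earliest available slot
        end = start + hours
        finish = max(finish, end)
        heapq.heappush(free_at, end)
    return finish

# Hypothetical numbers: 8 jobs that each take 1 hour on tape, 4 drives.
tape = backup_window([1.0] * 8, slots=4)
# Same 8 jobs, assumed 50% slower per stream on disk, but 8 streams at once.
disk = backup_window([1.5] * 8, slots=8)
print(f"tape: {tape} h, disk: {disk} h")
```

Under these assumed numbers the tape jobs run in two waves and finish in 2 hours, while the slower-per-job disk streams all finish in 1.5 hours because nothing waits in line.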

A good comparison for this is draining a swimming pool. Imagine a swimming pool with 100 gallons of water in it. Let's assume I install two drains in the pool, each capable of draining 1 gallon per hour. Obviously, it will take 50 hours to drain the pool (100 gallons divided by 2 gallons/hour = 50 hours).

Now imagine the same pool but instead of two drains, I install 10 drains, each capable of draining ½ gallon per hour. Even though each individual drain works at only half the rate of the drains in the first example, the pool will drain faster. The 10 drains will drain the pool at a combined rate of 5 gallons per hour versus 2 gallons per hour, or 2.5 times as fast. The pool will drain in 20 hours versus 50 hours.

So, before you upgrade your tape library to the blazing fast speeds seen with LTO4, remember that you will always be limited by the number of tape drives.
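The pool arithmetic can be checked in a few lines of Python. This is just a sketch of the analogy; the function name `drain_hours` is made up for illustration, and it assumes every drain runs at its full rate the whole time.

```python
def drain_hours(volume_gallons, drains, gallons_per_hour_each):
    """Time to empty the pool when all drains run in parallel at a fixed rate."""
    total_rate = drains * gallons_per_hour_each  # combined gallons per hour
    return volume_gallons / total_rate

two_fast_drains = drain_hours(100, drains=2, gallons_per_hour_each=1.0)
ten_slow_drains = drain_hours(100, drains=10, gallons_per_hour_each=0.5)
print(two_fast_drains, ten_slow_drains)  # 50.0 20.0
```

The aggregate rate, not the rate of any single drain, sets the finish time, which is the same reason the number of concurrent streams matters more than single-stream speed.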

Folks, it's time for disk-based backup. And with data deduplication in a plug-and-play appliance, it fits within even some of the tightest IT budgets.


COMMENTS

Actually, you can run more than 1 simultaneous job to a tape drive. NetBackup supports multiplexing, and I often run 5 or more simultaneous jobs to a single tape drive. I'm not saying disk is better than tape, and that there isn't a tradeoff being made on restore time, but it is possible.

posted @ Tuesday, May 04, 2010 12:08 AM by Seth Bokelman


Hi Seth,

Thanks for your clarification, and you are absolutely right. Multiplexing does allow you to run more jobs. You are also right that there is a trade-off in restore times, as a single backup job may now span many tapes. The other limit is that even with multiplexing, you are limited by the aggregate throughput of the tape drives. So multiplexing helps when you have clients that cannot go faster. However, disk provides a great deal more bandwidth, and therefore when you run multiple jobs you typically gain in throughput, up to the point where the number of streams causes too much seeking on the disk array.

Thanks for reading!

Marc Crespi

posted @ Thursday, May 06, 2010 9:12 AM by Marc Crespi


Marc,

Very good points here, wouldn't argue with any of them... In my opinion, though, the backup speed of disk vs. tape is really not the sole issue. As Seth points out, there are some tricky things that can be done to help tape performance. Also, with many backup apps, building up concurrency requires some manipulation of the jobs to divide them up enough to generate that concurrency, unless you have hundreds of powerful clients that can fill up a pipe.

For me it really comes down to understanding what the client is capable of. How much data can the server being protected really push down the network? What is the best way to partition that client's data so that maximum output to the backup server or target can be achieved? The good news with a disk solution like ExaGrid's is that it is more forgiving than tape in underperforming client situations, which in my experience dominate the enterprise. Clients that are not capable of maintaining maximum performance are much better served by a random access device (disk) that does not have to reposition itself when the data stream is interrupted.

Backup Optimization is about obtaining the best possible balance between client, network, Backup Server and Target, as we discuss in this article:
http://www.storage-switzerland.com/Articles/Entries/2010/1/27_Backup_Optimization.html

One of the key values of disk-based backup targets is that they just make it easier to optimize.

posted @ Sunday, May 16, 2010 4:01 PM by George Crump


Thanks for the great insights George, we wholeheartedly agree. We appreciate the knowledge you bring to customers as they sort out how to deploy the right technology for their environment. As you mention, disk does indeed open up opportunities to shrink backup windows through concurrency. One thing our support team does with customers at install time is to help them determine if they can make gains as you describe in your comments.

posted @ Tuesday, May 18, 2010 10:01 AM by Marc Crespi


Comments have been closed for this article.
