
ExaGrid's Eye on Deduplication


Questions to Ask About True Scalability in Backup with Data De-duplication

Posted by Bill Andrews on Tue, Oct 28, 2008 @ 10:47 AM

The word "scalability" is often applied broadly. In backup, it simply means that when your data grows, you can add more capacity.

What questions do you need to ask?

  1. Does the additional capacity just plug in and virtualize itself into the existing capacity? Most vendors offer simple plug-and-play addition of capacity.
  2. Can all the capacity be managed from a single console? The answer is typically yes within a single scalable system. However, is the answer still yes if you max out one system and add a second?
  3. Do I need to change anything in my existing backup application, or does it all work together seamlessly? How much work is involved: none, 15 minutes, or hours?
  4. When I add capacity, am I just adding more disk, or am I also adding more performance? This may be the most important question. If your backups grow from 5TB to 10TB and all you add is disk capacity, your backups will slow down and your backup window will balloon. You need to add processor, memory, bandwidth and disk as data grows to ensure that the backup window does not expand.
  5. How big can the current system scale before I need a second system, and how many systems can I have in a group? Knowing this ensures you have plenty of room to grow without having to switch vendors.

If you go to disk with a backup window of 6 hours and your data doubles, you don't want the backup window to grow to 12 hours. By adding all four resources in tandem with data growth, you keep the backup window fixed instead of letting it expand. Make sure the system at 10TB has twice the memory, processor, bandwidth and disk of the system at 5TB. Otherwise, you are putting 10TB through the same resources that were handling 5TB, and as they say in high tech, there is no free lunch.
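The arithmetic behind that warning can be sketched in a few lines of Python. This is only an illustrative model, not a sizing tool: it assumes backup throughput scales linearly with the resources you add, and the function and variable names are made up for this example.

```python
# Illustrative model only: assumes backup throughput scales linearly with the
# processor, memory, bandwidth and disk you add. Real systems will vary.

def backup_window_hours(data_tb: float, throughput_tb_per_hour: float) -> float:
    """Hours needed to back up data_tb at a given sustained throughput."""
    return data_tb / throughput_tb_per_hour

# Figures from the post: 5TB backed up in a 6-hour window.
base_data_tb = 5.0
base_window_hours = 6.0
base_throughput = base_data_tb / base_window_hours  # TB per hour

grown_data_tb = 10.0

# Case 1: add disk only, so throughput stays where it was.
disk_only_window = backup_window_hours(grown_data_tb, base_throughput)

# Case 2: double all four resources, assumed here to double throughput.
scaled_window = backup_window_hours(grown_data_tb, 2 * base_throughput)

print(f"Disk only:       {disk_only_window:.1f} hours")
print(f"All four scaled: {scaled_window:.1f} hours")
```

Under these assumptions the disk-only case lands at roughly twice the original window, while scaling all four resources holds it at about the original 6 hours.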

Bill Andrews is President and CEO of ExaGrid Systems, a company that provides fast, low-cost and scalable disk-based backup with data de-duplication solutions.


Theories and Reality About Offsite Replication in Disk-based Backup with Deduplication

Posted by Bill Andrews on Thu, Oct 16, 2008 @ 10:08 AM

One theory holds that if you use inline de-duplication rather than post process, replication to an offsite location will finish sooner.

Several elements need to be understood to determine whether the theory is in fact correct.

With the inline process, de-duplication occurs on the fly, so the unique blocks can begin replicating to the second site immediately. However, there are two flaws in the theory. First, because inline de-duplication takes longer than post process, its de-duplication and replication are still running long after the post-process backup is complete. Second, if you turn on replication in an inline approach, the processor and memory are now shared between the inline de-duplication and the replication. That further slows de-duplication and further expands the backup window, which extends the time to complete replication.

With post process, the backups run to disk at disk speed and the backup is complete long before it would be with the inline approach. The post process then kicks off de-duplication and replication in parallel. In other words, the first step is serial (complete the backup), and the next steps run in parallel (de-dup the data and replicate it). The post-process de-duplication and replication start while the inline approach is still backing up. The question is: does the post process complete de-duplication and replication before the inline approach completes backup and replication?

Let's try some math...

With inline, a backup window could be 6 hours. If you turn on replication, that backup window could expand to 8 hours. Therefore, the time to back up and replicate to the second site is 8 hours.

With post process, the backup window is shorter, let's say 4 hours. Once the backup is complete, the de-dup and replication work begins and may take another 4 hours. Total time to back up and replicate to the second site is 8 hours.

The time is roughly the same for both approaches, as there are no free lunches in technology.
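The same comparison can be written as a small Python sketch. The hour figures are the illustrative ones used above, not measurements, and the function names are hypothetical. It assumes that inline replication runs alongside the backup, so it finishes when the expanded window closes, while post process runs the backup first and then de-dups and replicates.

```python
def inline_completion_hours(backup_window_with_replication: int) -> int:
    """Inline: replication runs during the backup, so both finish together."""
    return backup_window_with_replication

def post_process_completion_hours(backup_window: int,
                                  dedup_and_replication: int) -> int:
    """Post process: back up first, then de-dup and replicate in parallel."""
    return backup_window + dedup_and_replication

# Example figures from the post (assumed, not measured).
inline_total = inline_completion_hours(8)        # 6-hour window expanded to 8
post_total = post_process_completion_hours(4, 4) # 4-hour backup, then 4 more

print(f"Inline: data offsite after {inline_total} hours")
print(f"Post process: data offsite after {post_total} hours")
```

With these inputs the offsite completion time is identical, but the backup window itself differs: 8 hours for inline with replication on versus 4 hours for post process.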

Bill Andrews is President and CEO of ExaGrid Systems, a company that provides fast, low-cost and scalable disk-based backup with data de-duplication solutions.

