
ExaGrid's Eye on Deduplication


5 Ways Disk Backup Can Help Your Business

Posted by Bill Hobbib on Fri, Jul 23, 2010 @ 07:06 AM

This post is the first in a series to share with you 5 ways disk backup can help your business:

  1. Faster Backup Times
  2. Reduce or Eliminate Failed Backups
  3. Better, Faster Offsite Disaster Recovery
  4. Time and Cost Savings
  5. Improved Recovery Time (RTO/RPO)

In this post, we will discuss the first topic--how disk backup enables IT organizations to dramatically reduce their backup times.

1. Faster Backup Times

While tape backup systems have been in use for many years, the slow speed of tape backup devices is a major reason many organizations have looked for a better, faster way to back up their data.  One of the key reasons many organizations move to a disk backup solution is in fact the inherent advantage of disk for faster backups.  Moving from tape to disk will get you faster backups, but it is also important to choose the correct disk backup approach, and to ensure that you will continue to get faster backup times as your data grows.

Disk vs. Tape

While some might argue that certain tape-based approaches allow a particular backup job to move at a rate on par with disk, the key advantage of disk-based backup over tape is the ability to run a large number of backup jobs in parallel.  The number of backup jobs you can run via tape-based backup is only as high as the number of tape drives in your tape library – at most four simultaneous backup jobs can target a four-drive tape library, for example.  This limit is much higher with disk – a 50TB ExaGrid system can support up to 100 simultaneous backup jobs – and the ability to run this many backup jobs in parallel allows much higher overall backup speeds with disk than with tape.
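The effect of job concurrency on the backup window can be sketched with simple arithmetic. The example below is a rough model, not a benchmark: the job count, job size, and per-job rate are made-up illustrative values, and it assumes equal-sized jobs that each run at the same rate, with no shared network or disk bottleneck. Only the concurrency limits (4 and 100) come from the text above.

```python
import math

def backup_window_hours(num_jobs, job_size_tb, per_job_tb_per_hour, max_parallel):
    """Estimate the backup window when at most `max_parallel` jobs run at once.

    Jobs are assumed identical and are scheduled in waves of `max_parallel`.
    """
    waves = math.ceil(num_jobs / max_parallel)
    hours_per_job = job_size_tb / per_job_tb_per_hour
    return waves * hours_per_job

# Hypothetical workload: 40 jobs of 0.5 TB each at 0.25 TB/hour per job.
tape_window = backup_window_hours(40, 0.5, 0.25, max_parallel=4)    # four-drive tape library
disk_window = backup_window_hours(40, 0.5, 0.25, max_parallel=100)  # 100 simultaneous jobs

print(f"tape: {tape_window} h, disk: {disk_window} h")
```

In this model the per-job speed is identical for tape and disk; the entire difference in the window comes from how many jobs can run at the same time.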

Post-process vs. Inline Deduplication

Once you have decided to move to a disk-based backup system, to achieve fastest backup times, it is important to consider the various deduplication approaches available when deciding which system to choose. This decision typically comes down to a choice between inline deduplication and post-process deduplication.

With inline deduplication, backup data is sent to the disk backup system and is processed and deduplicated as the data comes into the system.  Since all of the data coming into the system is being deduplicated on the fly, this can slow the backup down significantly, potentially creating a bottleneck at the point of entry into the backup system.

Post-process deduplication allows backup data to be written to disk without any processing interfering with the backup flow.  A post-process system compresses and deduplicates the data after the backup job has landed on the disk – not as the data is coming into the system.  This is the approach that the ExaGrid system uses. Since nothing is done to the data as it comes into the ExaGrid system, the data can be written at the highest possible rate, giving you the smallest possible backup window.
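One way to picture the difference is to treat ingest as a pipeline whose speed is set by its slowest stage. The sketch below is a simplified model under that assumption; the function names and all rates are hypothetical and are not measurements of any product.

```python
def ingest_rate(network_tb_h, disk_tb_h, dedupe_tb_h=None):
    """Return the ingest rate, limited by the slowest stage in the data path.

    Pass `dedupe_tb_h` only for inline deduplication, where the deduplication
    stage sits in the path of incoming data. For post-process, leave it out.
    """
    stages = [network_tb_h, disk_tb_h]
    if dedupe_tb_h is not None:
        stages.append(dedupe_tb_h)
    return min(stages)

def window_hours(data_tb, rate_tb_h):
    """Backup window for `data_tb` of data at a steady ingest rate."""
    return data_tb / rate_tb_h

# Hypothetical rates in TB/hour.
inline_rate = ingest_rate(network_tb_h=5.0, disk_tb_h=4.0, dedupe_tb_h=2.0)
post_process_rate = ingest_rate(network_tb_h=5.0, disk_tb_h=4.0)

inline_window = window_hours(10.0, inline_rate)
post_process_window = window_hours(10.0, post_process_rate)
print(inline_window, post_process_window)
```

If the deduplication stage were faster than the network and the disk, the model would show no penalty for inline processing; the bottleneck only appears when deduplication is the slowest stage.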

Planning for Growth

While it is desirable to strive for as short a backup window as possible today, it is also important to keep that backup window short over time, as your data grows.  After all, what good is a short backup window if it’s just going to get longer as your backup jobs get bigger?

With some of the current appliance architectures, keeping the backup window short while allowing for data growth means replacing your backup appliance with bigger and more powerful models of that appliance.  This occurs because many of these disk backup systems use what is called a controller-disk shelf model, where all of the processing power, memory, and bandwidth are on a single controller system, and expansion occurs by simply adding shelves of disk (with no incremental addition of controller components) to the system.  This is not the best solution, as your data needs eventually outstrip the original controller component, and you end up migrating to a system with a more powerful controller and starting the process all over again. The result is greater expense and greater management costs as you migrate your data to ever-larger appliances.

A better alternative is a grid-based system.  With a grid-based system, each appliance in the system brings with it not only additional disk, but also additional memory, bandwidth, and processing power – all the elements needed to maintain high backup performance.  With ExaGrid’s grid-based disk backup system, keeping your backup window short as your data grows is simply a matter of adding additional appliances to the grid.  There is no need to replace less powerful appliances with more powerful ones – you simply add more appliances to the grid as suits your needs. You get the shortest possible backup times, with the ability to easily keep those times short as your data grows, over time.
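The scaling argument can be illustrated with a small model. It is a sketch under idealized assumptions: throughput scales linearly with the number of appliances, and the capacity and throughput figures are invented for illustration.

```python
import math

def controller_shelf_window(data_tb, controller_tb_h):
    """Window when added shelves bring disk only; throughput stays fixed."""
    return data_tb / controller_tb_h

def grid_window(data_tb, appliance_capacity_tb, appliance_tb_h):
    """Window when each added appliance brings both capacity and throughput."""
    appliances = math.ceil(data_tb / appliance_capacity_tb)
    return data_tb / (appliances * appliance_tb_h)

# Hypothetical figures: 2 TB/hour per controller or appliance, 10 TB per appliance.
growth = [10.0, 20.0, 40.0]
shelf_windows = [controller_shelf_window(d, 2.0) for d in growth]
grid_windows = [grid_window(d, 10.0, 2.0) for d in growth]

for d, s, g in zip(growth, shelf_windows, grid_windows):
    print(f"{d:>5} TB  controller-shelf: {s} h  grid: {g} h")
```

Under these assumptions the controller-shelf window grows in step with the data, while the grid window stays flat because throughput is added along with capacity.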

Want to learn more about making the move from tape to disk backup with deduplication? See ExaGrid's '7 Steps for Overcoming Limitations of Tape Backup'.


Who's Afraid of Next Generation Deduplication Architectures?

Posted by Marc Crespi on Tue, Jun 02, 2009 @ 10:37 AM

One of the things that continues to trouble me about how vendors use their blogs is that they often degenerate into nothing more than competitive mud-slinging.  Don't get me wrong, vendor blogs are perfectly good vehicles to put factual information about your product out there and to compare and contrast it to other approaches.  However, it seems that the factual product content continues to decline while the rumor, innuendo, and FUD continue to rise.  I cite Rich Colbert's recent blog entry over at Data Domain's Dedupe Matters as a prime example.  Given ExaGrid's message to the market of having a next generation architecture for disk-based backup with data deduplication, I do not think it is a stretch to assume he was referring to ExaGrid in his tantrum.

While I plan to briefly respond to some of his points, I will quickly get back to information regarding the ExaGrid product as I think that is the source of the clear frustration Rich is feeling about the challenge by a vendor he will not name.

Rich's first premise is that if you were not first to market, you are "late" or "disorganized" and can never hope to rival the first mover's technology or adoption.  Thankfully, there are many examples of why this assertion is inaccurate.  Just ask some of the other late and disorganized companies such as DELL, Microsoft, or even Sun Microsystems, none of whom were the first entrants into the markets they ultimately dominated.  And while not all of their current fortunes are bright, there is no question they entered markets with already crowned leaders, figured out what those leaders missed, and exploited it to great success.  And there have been first entrants that have withered as times changed.  Anyone remember Novell?

On another point, contrary to Rich's implication that only Data Domain is established in this market, ExaGrid now has close to 400 customers in its portfolio with over 2,000 installed systems across a wide variety of verticals. We continue to have record quarter after record quarter, even in this tough climate. 

And, our customers love us.  In fact, more than 110 of our customers demonstrated their satisfaction by having their deduplication success with ExaGrid documented with their names and titles. This is more customer success stories than all other vendors in our space combined, including Data Domain.

While Rich's current employer can cite more customers than ExaGrid due to being the first entrant, there certainly is no question that ExaGrid's technology has been validated by the market and is responsible for our rapid growth and greater than 70% competitive win rate.

But with all of that said, IT buyers need information about products, not random musings by vendors.  We as vendors need to simply put our products forward so that customers can decide which one better meets their requirements.  On the product front, ExaGrid brings the following unique things to this market that were not present in first generation approaches:

  • Scalability - our GRID based architecture maintains a customer's backup window and restore performance as their data grows and avoids forklift upgrades when they reach a system's capacity.
  • Backup/Restore performance - our post-process architecture provides for faster backups (maximum of 5 TB/hour) and optimized restores and tape copies of most recent data by eliminating deduplication overhead.
    • Contrary to assertions by exclusively in-line vendors, it is meaningless to compare restore rates for deduplicated data.  If 95% of the time restores will come from the most recent backup, which is in non-deduplicated form with ExaGrid, then what is the point?
    • Suggesting that it is important to compare restores from deduplicated data is like saying an airline with a 5% on-time arrival record is equal to one with a 95% on-time record because the better airline is also late 5% of the time.
  • Unified management - ExaGrid's management interface places an entire multi-site installation in a single web interface for all configuration and management reducing management time and complexity.
  • Backup job aware reporting - ExaGrid uniquely can provide deduplication ratios and replication status by backup job so that users can really maximize their space savings and understand exactly which backup jobs are ready for restore at a DR location.

I wonder if it is the above differences that made Rich afraid to "help our organic search results" by mentioning us by name?  If a company is as invincible as he made Data Domain sound, why be afraid your prospective customers will find a later to market company making wild claims such as a markedly better approach?


Assessing the Deduplication Tax

Posted by Marc Crespi on Thu, Sep 25, 2008 @ 05:17 PM

As we wind through this intense political season, discussions of taxes are everywhere.   Taxes are a fact of life.  But before this post is misunderstood as a political position, let me explain the type of tax I am discussing today:  the dedupe tax.

The fact is that deduplication is an extremely important storage technology, especially in disk-based backup products.  It is deduplication that allows organizations to store large amounts of data in a small amount of disk and to transfer backups over wide area networks to disaster recovery sites by moving a very small amount of data.

However, as with most compelling technologies, there are trade-offs to be made.  Each disk-based backup vendor is deciding which trade-offs are right for your data center and your backups. Given the importance of your backup window and restore times, the most critical trade-offs to be considered are the performance trade-offs -- or what I call the "dedupe tax".  The dedupe tax is a performance hit that could show up as a longer backup window or a dramatically slowed restore.

The assessment of the tax varies with the deduplication method employed by the vendor:

  • Post-process de-duplication (implemented by ExaGrid Systems) - backups are written directly to disk in their entirety and maintained for rapid restore of your most recent backup. Since 90% or more of all restores are done from your most recent backup, this method avoids the de-dupe tax for 90 to 95% of restores and 100% of backups.
  • In-line deduplication - is performed on backup data on its way to disk. This method charges you the de-duplication tax for every backup into the system and every restore out of the system. The promise is the use of less disk (but not lower cost) and simplicity. With in-line, you are in the 100% tax bracket!

So, what is the cost of the dedupe tax? It can be substantial.  It can slow your backups down by as much as 2x to 5x versus raw disk speeds.  Similarly, it can dramatically slow down restores and force your organization to wait much longer to recover data when you can least afford it - during a critical recovery scenario.
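To put the 2x to 5x figure in concrete terms, the short sketch below applies that slowdown range to a backup window. The 4-hour raw-disk window is a hypothetical starting point, not a figure from any real deployment.

```python
def taxed_window_hours(raw_window_hours, slowdown_factor):
    """Backup window after applying a deduplication slowdown factor."""
    return raw_window_hours * slowdown_factor

raw_window = 4.0  # hypothetical window at raw disk speed, in hours

best_case = taxed_window_hours(raw_window, 2)   # 2x slowdown
worst_case = taxed_window_hours(raw_window, 5)  # 5x slowdown
print(f"{raw_window} h untaxed, {best_case} h to {worst_case} h taxed")
```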

Look for implementations that allow you to determine when and how much dedupe tax should be paid.  With ExaGrid's post-processing GRID architecture, all backups and most restores are tax free.  Only restores from older, deduplicated data incur the de-dupe tax and these are generally smaller restores with much less urgency associated with them.
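The restore side of the tax can be estimated as a weighted average. This is a rough sketch: the 1-hour baseline and the 5x slowdown are illustrative assumptions, and it treats every restore as the same size, which likely overstates the cost for post-process systems given that restores from older data tend to be smaller.

```python
def expected_restore_hours(base_hours, untaxed_fraction, slowdown_factor):
    """Average restore time when only part of the restores pay the slowdown.

    `untaxed_fraction` is the share of restores served from non-deduplicated
    data (0.0 means every restore pays the tax).
    """
    taxed_fraction = 1.0 - untaxed_fraction
    return base_hours * (untaxed_fraction + taxed_fraction * slowdown_factor)

# Hypothetical: 1-hour restore at raw disk speed, 5x slowdown when deduplicated.
post_process = expected_restore_hours(1.0, untaxed_fraction=0.90, slowdown_factor=5)
inline = expected_restore_hours(1.0, untaxed_fraction=0.0, slowdown_factor=5)
print(round(post_process, 2), round(inline, 2))
```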

Taxes must be paid. But, do not pay a gigabyte per second more in dedupe tax than needed!

Marc Crespi is the Vice President of Product Management for ExaGrid Systems, Inc.


Clarity about client side, inline and post-process data deduplication

Posted by Bill Andrews on Tue, Sep 16, 2008 @ 04:15 PM

There is great debate around disk-based backup systems with data de-duplication as to whether client level, inline de-duplication or post process de-duplication is better.

The idea of data de-duplication is to avoid storing redundant data. Some systems store only unique blocks of roughly 8KB, while others store only the bytes that change, at the byte level. Both of these methods deliver similar de-duplication rates. But the question remains... where is the best place to de-duplicate the data?
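The block-level idea can be shown in a few lines. This is a minimal sketch, not any vendor's implementation: it assumes fixed-size 8KB blocks identified by SHA-256 hashes and an in-memory dictionary as the block store, and the function names are invented for illustration. Real products may use variable-size blocks or byte-level differencing.

```python
import hashlib

BLOCK_SIZE = 8 * 1024  # fixed 8KB blocks (an assumption for this sketch)

def dedupe(data, store):
    """Split `data` into blocks and store each unique block once.

    Returns the ordered list of block hashes needed to rebuild `data`.
    """
    recipe = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # keep only the first copy of a block
        recipe.append(digest)
    return recipe

def restore(recipe, store):
    """Rebuild the original data from its list of block hashes."""
    return b"".join(store[digest] for digest in recipe)

store = {}
monday = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE
tuesday = b"A" * BLOCK_SIZE * 3 + b"C" * BLOCK_SIZE  # one block changed

monday_recipe = dedupe(monday, store)
tuesday_recipe = dedupe(tuesday, store)

logical_bytes = len(monday) + len(tuesday)
physical_bytes = sum(len(block) for block in store.values())
print(f"unique blocks: {len(store)}, ratio: {logical_bytes / physical_bytes:.2f}:1")
```

Two backups totaling eight blocks are held as three unique blocks, and either backup can still be rebuilt in full from its list of hashes.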

Client level de-duplication processes the data where the "backup agent" lives, on each application server. The advantage of this approach is that less traffic is sent over the network, and therefore the backup window is the shortest of the three approaches. The disadvantage is that you have to replace your existing backup application with the new client-based de-duplication application.

Inline de-duplication is when the disk-based backup appliance, connected to your existing backup server, de-duplicates the data on the way to the disk. The advantage of this approach is that it uses less disk than post process and theoretically should cost less. The disadvantage is that this approach produces the longest backup windows, because the de-duplication work slows down writes to disk. These systems also require more memory and processing power, so they are not necessarily less costly.

Post process de-duplication is when the disk-based backup appliance, connected to your existing backup server, allows the data to write directly to the disk from the backup server, at disk speed. The de-duplication work begins after the backup is complete. The advantage is that backups occur much faster than the inline approach resulting in a shorter backup window. The disadvantage is that more disk is required to land the backup and then compare. However, the cost of the additional disk is no more than the additional processor and memory required for inline process and therefore post process systems do not cost more than inline. In fact, in most cases they cost less. If you choose a post process system, make sure that the system is sized properly to de-dup all your data well in advance of the next backup coming in.
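The sizing advice at the end of the paragraph above can be turned into a quick check. This is a sketch under simple assumptions: de-duplication starts only after the backup completes, both stages run at steady rates, and backups start every 24 hours. The function name and all rates are hypothetical.

```python
def dedupe_finishes_in_time(backup_tb, ingest_tb_h, dedupe_tb_h,
                            hours_between_backups=24.0):
    """Check whether backup plus post-process de-duplication fits the cycle.

    Returns True if the de-duplication pass completes before the next
    backup is due to start.
    """
    backup_hours = backup_tb / ingest_tb_h
    dedupe_hours = backup_tb / dedupe_tb_h
    return backup_hours + dedupe_hours <= hours_between_backups

# Hypothetical 20 TB nightly backup landing at 4 TB/hour.
well_sized = dedupe_finishes_in_time(20.0, ingest_tb_h=4.0, dedupe_tb_h=2.0)
undersized = dedupe_finishes_in_time(20.0, ingest_tb_h=4.0, dedupe_tb_h=1.0)
print(well_sized, undersized)
```

A real sizing exercise would also leave headroom for data growth and for replication to an offsite system; this check covers only the timing of the two stages.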

There is a further advantage / disadvantage debate as to which approach allows for replication of changed data to be received at the offsite system the fastest. I plan to expand upon this in a separate post.

Bill Andrews is President and CEO of ExaGrid Systems, a company that provides fast, low cost and scalable disk-based backup with data de-duplication solutions.

