Blog


Who's Afraid of Next Generation Deduplication Architectures?

Posted by Marc Crespi on Tue, Jun 02, 2009 @ 10:37 AM

One of the things that continues to trouble me about how vendors use their blogs is that the blogs often degenerate into nothing more than competitive mud-slinging.  Don't get me wrong: vendor blogs are perfectly good vehicles for putting factual information about your product out there and for comparing and contrasting it with other approaches.  However, it seems that the factual product content continues to decline while the rumor, innuendo, and FUD continue to rise.  I cite Rich Colbert's recent blog entry over at Data Domain's Dedupe Matters as a prime example.  Given ExaGrid's message to the market of having a next generation architecture for disk-based backup with deduplication, I do not think it is a stretch to assume he was referring to ExaGrid in his tantrum.

While I plan to briefly respond to some of his points, I will quickly get back to information regarding the ExaGrid product as I think that is the source of the clear frustration Rich is feeling about the challenge by a vendor he will not name.

Rich's first premise is that if you were not first to market, you are "late" or "disorganized" and can never hope to rival the first market mover's technology or adoption.  Thankfully, there are many examples of why this assertion is inaccurate.  Just ask some of the other late and disorganized companies such as Dell, Microsoft, or even Sun Microsystems, none of whom were the first entrants into the markets they ultimately dominated.  And while not all of their current fortunes are bright, there is no question they entered markets with already crowned leaders, figured out what those leaders missed, and exploited it to great success.  And there have been first entrants that have withered as times changed.  Anyone remember Novell?

On another point, contrary to Rich's implication that only Data Domain is established in this market, ExaGrid now has close to 400 customers in its portfolio with over 2,000 installed systems across a wide variety of verticals. We continue to have record quarter after record quarter, even in this tough climate. 

And, our customers love us.  In fact, more than 110 of our customers demonstrated their satisfaction by having their deduplication success with ExaGrid documented with their names and titles. This is more customer success stories than all other vendors in our space combined, including Data Domain.

While Rich's current employer can cite more customers than ExaGrid due to being the first entrant, there certainly is no question that ExaGrid's technology has been validated by the market and is responsible for our rapid growth and greater than 70% competitive win rate.

But with all of that said, IT buyers need information about products, not random musings by vendors.  We as vendors need to simply put our products forward so that customers can decide which one better meets their requirements.  On the product front, ExaGrid brings the following unique things to this market that were not present in first generation approaches:

  • Scalability - our GRID based architecture maintains a customer's backup window and restore performance as their data grows and avoids forklift upgrades when they reach a system's capacity.
  • Backup/Restore performance - our post-process architecture provides for faster backups (maximum of 5 TB/hour) and optimized restores and tape copies of most recent data by eliminating deduplication overhead.
    • Contrary to assertions by exclusively in-line vendors, it is meaningless to compare restore rates for deduplicated data.  If 95% of the time restores will come from the most recent backup, which is in non-deduplicated form with ExaGrid, then what is the point?
    • Suggesting that it is important to compare restores from deduplicated data is like saying an airline with a 5% on-time arrival record is equal to one with a 95% on-time record because the better airline is also late 5% of the time.
  • Unified management - ExaGrid's management interface places an entire multi-site installation in a single web interface for all configuration and management reducing management time and complexity.
  • Backup job aware reporting - ExaGrid uniquely can provide deduplication ratios and replication status by backup job so that users can really maximize their space savings and understand exactly which backup jobs are ready for restore at a DR location.

I wonder if it is the above differences that made Rich afraid to "help our organic search results" by mentioning us by name?  If a company is as invincible as he made Data Domain sound, why be afraid your prospective customers will find a later to market company making wild claims such as a markedly better approach?


Disk Backup with Dedupe Answers Security and Regulatory Threats of Tape Backup

Posted by Marc Crespi on Fri, May 01, 2009 @ 08:50 AM

Securing backup data has become an absolute requirement for organizations of all sizes. Sweeping government regulations such as HIPAA, GLBA, Sarbanes-Oxley and many others have placed more stringent requirements on organizations to back up and secure a wide range of data, from healthcare and credit records for individuals to financial and confidential business information for corporations.

Tape Backup - An Ever-present Security Risk
For decades, companies have been backing up data onto magnetic tapes and then transporting those tapes to an offsite location for long term archiving. However, this process is inherently cumbersome and lacks security.  Backup tapes are often moved via an employee's personal vehicle, or in cases where the information is extremely sensitive, by bonded delivery truck. But no matter what the transport method, tapes are often misplaced, lost or even stolen. Cases of lost corporate or consumer data have become all too familiar in today's headlines.

While organizations can resort to encryption to protect the confidentiality of the data, using encryption products can be expensive, time-consuming and complex, especially for mid-sized and small enterprise companies.

Disk-based Backup Cuts the Risk and Enhances Security
Many companies are moving to disk backup to shorten backup windows and to gain faster and more reliable backups and restores, but they're also gaining many security benefits in the process.  Because disk backup solutions reside in the same data center as application servers and primary data storage, backup data is inherently secured by network and physical data center security, leveraging the same security that is used for the rest of the data and network components.

Additionally, many organizations use disk backup systems at secondary sites to eliminate offsite tape transport and storage, and to avoid costly and complex encryption technology. The second site acts as a live disk-based repository, providing fast recovery in the event of a disaster, while also taking advantage of existing network and data center security. Data deduplication technology reduces the amount of data that needs to be transmitted to the offsite location, enabling high-speed replication and creating a secure offsite repository for the data. All data transfers between the primary site and the offsite disk-based system occur over secure, encrypted network connections.

Disk backup systems with deduplication help IT departments retain vital data for rapid, reliable retrieval in a highly secure way, in both primary and offsite locations, all while reducing or even eliminating tape's many challenges and security risks.

Have you lost data you had entrusted to tape backup?  Share your story by adding a comment.


Achieving Cost Savings by Moving to Disk with Deduplication for Backup

Posted by Marc Crespi on Thu, Feb 12, 2009 @ 07:45 PM

Examining the budget for tape reveals an opportunity to reinvest for a better outcome. A number of operating costs are associated with maintaining tape libraries. Annual costs for each tape library include media replacement, tape storage, retrieval, moving tape to a second location, tape administration, and tape library maintenance. Productivity costs include monitoring tape backups, maintaining equipment (such as cleaning heads), loading and changing tapes, labeling tapes, and physically transporting them offsite for disaster recovery. All of these must be budgeted and accounted for, as they will occur no matter the business environment. In addition, tape libraries age poorly, tape arms will break, and the same issues that haunt tape today, such as poor security, poor performance, and a lack of data integrity, will continue. The question companies are now asking is whether the dollars and time taken up by the existing tape library could be better used by moving to disk backup with deduplication.

Back-up with Deduplication Makes Sense in Tight Budgetary Times
The fact is that for many environments, particularly small to medium-sized enterprises, the total costs of tape backup and disk backup with deduplication are equivalent. The key difference typically lies in the operating costs of tape versus disk backup with deduplication. The following are three examples of ExaGrid customers who have experienced clear savings.

Federal Mediation and Conciliation Service (FMCS):
FMCS is an independent agency whose mission is to preserve labor-management peace and cooperation. FMCS IT is responsible for supporting 500 users in 250 offices throughout the United States. The central backup issues the IT staff was dealing with were the expense, time and data integrity problems associated with backing up to tape. The agency had been spending $800 per month on tapes and $240 per month to mail the tapes to a DR site. Since installing a disk-based backup system, FMCS has been able to reduce its backup costs considerably. "The ExaGrid System was more cost effective than some of the other solutions we looked at. It was less expensive to acquire and we were able to use it along with our existing backup application."
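To put the FMCS figures in annual terms, here is a minimal sketch of the arithmetic. It uses only the two monthly amounts quoted above; the function and variable names are illustrative, and it does not attempt to model the other tape costs (administration, storage, library maintenance) because no figures for those are given.

```python
# Monthly figures quoted in the FMCS example.
TAPE_MEDIA_PER_MONTH = 800     # dollars spent on tapes
TAPE_MAILING_PER_MONTH = 240   # dollars spent mailing tapes to the DR site


def annual_tape_spend(media_per_month, mailing_per_month):
    """Return twelve months of recurring tape media and mailing cost."""
    return 12 * (media_per_month + mailing_per_month)


monthly = TAPE_MEDIA_PER_MONTH + TAPE_MAILING_PER_MONTH
yearly = annual_tape_spend(TAPE_MEDIA_PER_MONTH, TAPE_MAILING_PER_MONTH)
print(f"Monthly: ${monthly:,}  Annual: ${yearly:,}")
```

That works out to $1,040 per month, or $12,480 per year, in media and mailing alone.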

Morningstar:
Morningstar is a leading provider of independent investment research with locations in North America, Europe and Asia Pacific. Morningstar was backing up to tape but with significant data growth its backup windows had increased to a point where they were unmanageable. "Tape was cumbersome to deal with and our backups were just taking too long" said James Richmond, Network Administrator. By moving to an ExaGrid System, Morningstar is saving nearly $3,000 per month in tape costs and has significantly reduced transportation and tape storage fees. The company has also reduced the number of man hours spent on managing and administering tape backups allowing staff to gain the productivity they need to stay ahead of the demands of the business.

Eby-Brown:
Based in Naperville, Illinois, Eby-Brown is the second largest convenience store products distributor in the United States. In evaluating the move to disk backup with deduplication, Eby-Brown's IT Systems Integrator noted the following about ExaGrid: "Before we went with ExaGrid, we performed a cost of ownership analysis that showed installing the ExaGrid systems would cost us less than tape. When you consider the cost of tape, transportation and the amount of time our IT staff had devoted to managing tape and performing restores, purchasing the ExaGrid system is a no-brainer."

All evidence continues to point to accelerated data growth and a greater need for fast, reliable backup systems - despite a downward economic spiral. Though the situation appears complex, the facts reveal a relatively simple but powerful solution is at the ready: dollars and time currently consumed by existing tape libraries can be far better utilized by moving to disk back-up with deduplication. Fast, reliable, highly scalable, and exceptionally cost-effective.

Do you have a story about cost savings and backup?  Share your thoughts with us by commenting on this blog post.


How fast can a disk-backup system go?

Posted by Bill Andrews on Tue, Nov 11, 2008 @ 12:55 PM

How fast can a disk-backup system go?

Surprisingly enough this is the wrong question.

The real question is how fast can your backup environment go?

Well over 90% of the time the backup environment is sending data slower than the disk-based backup product is designed to perform. The bottleneck is more often your environment.

Say you buy a disk-based backup system rated at 100 meg, implement it, and get only 50 meg.

That is almost always because the backup server or environment is the bottleneck.

So how do you tell if your environment is the bottleneck or the disk-based backup product is the bottleneck?

The trick is to take your existing backup environment and write directly to good old-fashioned straight disk. Measure the performance. That performance is the benchmark. If a disk-based backup product is rated faster than that benchmark, you will not see the rated speed when it comes time to implement, because the product can only run at the speed at which your backup environment is feeding it.

What could be slowing the backup environment down?

  • Configuration of agents
  • Configuration of backup servers and media servers
  • Bandwidth
  • NICs (network interface cards)
  • The server that the backup application is running on

All of these slow down the backups.

Run the test to straight disk to know the performance of your environment and then you can see if the disk-based backup system slows that down or not.
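The comparison can be sketched in a few lines. This is only an illustration of the reasoning, not a measurement tool: the function names and the 10% noise allowance are assumptions of mine, and the rates are in whatever unit you measured (the 100 and 50 come from the example earlier in this post).

```python
def achievable_rate(baseline, rated):
    """The product can only run as fast as the environment feeds it."""
    return min(baseline, rated)


def find_bottleneck(baseline, observed, tolerance=0.10):
    """Decide which side is limiting throughput.

    baseline:  rate your backup environment achieved writing to straight disk
    observed:  rate the same environment achieved writing to the backup product
    tolerance: assumed allowance for run-to-run noise (10% here)
    """
    if baseline <= 0 or observed <= 0:
        raise ValueError("rates must be positive")
    if observed >= baseline * (1 - tolerance):
        # The product keeps up with everything the environment sends it.
        return "environment"
    # The product is slower than what the environment is able to deliver.
    return "product"


# Product rated at 100, environment measured at 50 to straight disk,
# and 50 observed on the product.
print(achievable_rate(baseline=50, rated=100))
print(find_bottleneck(baseline=50, observed=50))
```

If the straight-disk test had come in at 100 and the product still delivered 50, the same check would point at the product instead.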

In short...know the performance of your environment before you throw the baby out with the bath water.

Bill Andrews is President and CEO of ExaGrid Systems, a company that provides fast, low cost and scalable disk-based backup with data de-duplication solutions.


Questions to Ask About True Scalability in Backup with Data De-duplication

Posted by Bill Andrews on Tue, Oct 28, 2008 @ 10:47 AM

The word scalability is often applied broadly. Scalability in backup simply means that when your data grows you can add more capacity.

 What are all the questions you need to ask?

  1. Does the additional capacity just plug in and virtualize itself into the existing capacity?  Most vendors have simple plug and play inclusion of additional capacity.
  2. Can all the capacity be managed from a single console? The answer is typically yes within a single scalable system.  However, is the answer still yes if you max out a system and add a second system?
  3. Do I need to change anything in my existing backup application or does it all work together seamlessly? How much work is there: none, 15 minutes or hours?
  4. When I add capacity am I just adding more disk or am I adding more performance?  This may be the most important question because if your backups are 5TB and they grow to 10TB and all you add is disk capacity, then your backups will slow down and explode your backup window. You need to add processor, memory, bandwidth and disk with data growth to ensure that the backup window does not expand.
  5. How big can the current system scale before I need a second system and how many systems can I have in a group? Knowing this ensures you have plenty of room to grow without having to trade out vendors.

 If you go to disk and have a backup window of 6 hours and you double your data, you don't want the backup window to grow to 12 hours. By adding all four resources in tandem with data growth you ensure a fixed backup window versus an expanding backup window. Make sure the system at 10TB has twice the memory, processor, bandwidth and disk of the system at 5TB. Otherwise, you are putting 10TB through the same resources you were putting 5TB through...and as they say in high tech...there is no free lunch.
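The backup window arithmetic above can be sketched as follows. This is a simplified model that assumes throughput scales linearly with the processor, memory, bandwidth and disk you add; real systems will vary, and the function name is illustrative.

```python
def backup_window_hours(data_tb, throughput_tb_per_hour):
    """Backup window = amount of data / rate at which the system ingests it."""
    return data_tb / throughput_tb_per_hour


# Starting point from the example: 5TB backed up in a 6 hour window.
base_throughput = 5 / 6  # TB per hour

# Data doubles to 10TB but only disk capacity is added: same throughput.
disk_only = backup_window_hours(10, base_throughput)

# Data doubles and processor, memory, bandwidth and disk all double
# (assumed here to double throughput as well).
scaled = backup_window_hours(10, 2 * base_throughput)

print(f"Disk only:     {disk_only:.1f} hours")
print(f"All resources: {scaled:.1f} hours")
```

Under that assumption the disk-only upgrade lands at roughly 12 hours, while scaling all four resources holds the window at roughly 6.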

Bill Andrews is President and CEO of ExaGrid Systems, a company that provides fast, low cost and scalable disk-based backup with data de-duplication solutions.


Theories and Reality About Offsite Replication in Disk-based Backup with Deduplication

Posted by Bill Andrews on Thu, Oct 16, 2008 @ 10:08 AM

There is a theory that if you use an inline de-duplication process rather than post process, replication to an offsite location will complete sooner.

There are a number of elements to be understood to determine if the theory is in fact correct.

With the inline process, de-duplication occurs on the fly, and therefore the unique blocks can begin replicating to the second site immediately. However, there are two flaws in the theory. First, because inline de-duplication takes longer than post process, the inline de-duplication and replication are still running long after the post process de-duplication is complete. Second, if you turn on replication in an inline approach, the processor and memory are now shared between inline de-duplication and replication, further slowing down de-duplication and further expanding the backup window, which extends the time to complete replication.

With post process, the backups run to the disk at disk speed and the backup is complete long before the inline approach finishes. The post process system then kicks off de-duplication and replication in parallel. The first step, completing the backup, is serial; the next steps, de-duplicating the data and replicating it, run in parallel. Post process de-duplication and replication start while the inline approach is still backing up. The question is...does the post process approach complete de-duplication and replication before the inline approach completes backup and replication?

Let's try some math...

With inline, a backup window could be 6 hours. If you turn on replication it could expand that backup window to 8 hours. Therefore, the time to backup and replicate to the second site is 8 hours.

With post process, the backup window is shorter, let's say 4 hours. Once the backup is complete the de-dup and replication work begins and may take another 4 hours. Total time to backup and replicate to the second site is 8 hours.

The time is roughly the same with both approaches, as there are no free lunches in technology.
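The math above can be written out as a small sketch. The hour figures are the illustrative ones used in this post ("could be", "let's say"), not measurements, and the class and field names are my own.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    backup_window_hours: int       # time until the backup job itself finishes
    hours_after_backup: int        # further time until the offsite copy is complete

    @property
    def time_to_offsite_hours(self):
        return self.backup_window_hours + self.hours_after_backup


# Inline: a 6 hour window expands to 8 hours with replication turned on,
# and replication finishes along with the backup.
inline = Scenario("inline", backup_window_hours=8, hours_after_backup=0)

# Post process: a 4 hour backup, then de-dup and replication run in
# parallel for another 4 hours.
post_process = Scenario("post process", backup_window_hours=4, hours_after_backup=4)

for s in (inline, post_process):
    print(f"{s.name}: backup window {s.backup_window_hours}h, "
          f"offsite complete after {s.time_to_offsite_hours}h")
```

With these numbers both approaches reach the offsite location after 8 hours; what differs is how long the backup window itself is.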

Bill Andrews is President and CEO of ExaGrid Systems, a company that provides fast, low cost and scalable disk-based backup with data de-duplication solutions.


Assessing the Deduplication Tax

Posted by Marc Crespi on Thu, Sep 25, 2008 @ 05:17 PM

As we wind through this intense political season, discussions of taxes are everywhere.   Taxes are a fact of life.  But before this post is misunderstood as a political position, let me explain the type of tax I am discussing today:  the dedupe tax.

The fact is that deduplication is an extremely important storage technology, especially in disk-based backup products.  It is deduplication that allows organizations to store large amounts of data in a small amount of disk and to transfer backups over wide area networks to disaster recovery sites by moving a very small amount of data.

However, as with most compelling technologies, there are trade-offs to be made.  Each disk-based backup vendor is deciding which trade-offs are right for your data center and your backups. Given the importance of your backup window and restore times, the most critical trade-offs to be considered are the performance trade-offs -- or what I call the "dedupe tax".  The dedupe tax is a performance hit that could show up as a longer backup window or a dramatically slowed restore.

The assessment of the tax varies with the deduplication method employed by the vendor:

  • Post-process de-duplication (implemented by ExaGrid Systems) - backups are written directly to disk in their entirety and maintained for rapid restore of your most recent backup. Since 90% or more of all restores are done from your most recent backup, this method avoids the de-dupe tax for 90 to 95% of restores and 100% of backups.
  • In-line deduplication - is performed on backup data on its way to disk. This method charges you the de-duplication tax for every backup into the system and every restore out of the system. The promise is the use of less disk (but not lower cost) and simplicity. With in-line, you are in the 100% tax bracket!

So, what is the cost of the dedupe tax? It can be substantial.  It can slow your backups down by as much as 2x to 5x versus raw disk speeds.  Similarly, it can dramatically slow down restores and force your organization to wait much longer to recover data when you can least afford it - during a critical recovery scenario.
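To make the tax concrete, here is a rough sketch of the arithmetic. The 2x to 5x slowdown range and the share of restores that hit deduplicated data come from this post; the 4 hour raw-disk backup window is a hypothetical figure chosen only for illustration, and the names are my own.

```python
def taxed_window_hours(raw_disk_hours, slowdown):
    """Backup window once the dedupe tax (a slowdown factor) is applied."""
    return raw_disk_hours * slowdown


RAW_DISK_WINDOW_HOURS = 4  # hypothetical backup window at raw disk speed

# In-line: every backup pays the tax, at somewhere between 2x and 5x.
best_case = taxed_window_hours(RAW_DISK_WINDOW_HOURS, 2)
worst_case = taxed_window_hours(RAW_DISK_WINDOW_HOURS, 5)
print(f"In-line backup window: {best_case} to {worst_case} hours")

# Share of restores that pay the tax, per the descriptions above:
# post-process avoids it for 90 to 95% of restores, in-line for none.
taxed_restore_share = {
    "post-process": (0.05, 0.10),
    "in-line": (1.00, 1.00),
}
for method, (low, high) in taxed_restore_share.items():
    print(f"{method}: {low:.0%} to {high:.0%} of restores taxed")
```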

Look for implementations that allow you to determine when and how much dedupe tax should be paid.  With ExaGrid's post-processing GRID architecture, all backups and most restores are tax free.  Only restores from older, deduplicated data incur the de-dupe tax and these are generally smaller restores with much less urgency associated with them.

Taxes must be paid. But, do not pay a gigabyte per second more in dedupe tax than needed!

Marc Crespi is the Vice President of Product Management for ExaGrid Systems, Inc.


Clarity about client side, inline and post-process data deduplication

Posted by Bill Andrews on Tue, Sep 16, 2008 @ 04:15 PM

There is great debate around disk-based backup systems with data de-duplication as to whether client level, inline de-duplication or post process de-duplication is better.

The idea of data de-duplication is to avoid storing redundant data. Some products store only unique blocks of roughly 8KB, and some store only the bytes that change, working at the byte level. Both of these methods deliver similar de-duplication rates. But the question remains... where is the best place to de-duplicate the data?
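For readers who want to see the block idea in miniature, here is a generic sketch of fixed-size block de-duplication. It is not ExaGrid's or any other vendor's implementation: the fixed 8KB block size, the use of SHA-256 as a block fingerprint, and the in-memory dictionary standing in for the disk are all simplifying assumptions.

```python
import hashlib
from typing import Dict, List

BLOCK_SIZE = 8 * 1024  # fixed 8KB blocks (assumed for simplicity)


def deduplicate(data: bytes, store: Dict[bytes, bytes]) -> List[bytes]:
    """Split data into blocks, keep only unseen blocks, return the recipe."""
    recipe = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        fingerprint = hashlib.sha256(block).digest()
        store.setdefault(fingerprint, block)  # stored once, however often seen
        recipe.append(fingerprint)
    return recipe


def restore(recipe: List[bytes], store: Dict[bytes, bytes]) -> bytes:
    """Rebuild the original data from its list of block fingerprints."""
    return b"".join(store[fingerprint] for fingerprint in recipe)


# Two backups of four blocks each; the second differs in one block only.
blocks = [bytes([i]) * BLOCK_SIZE for i in range(4)]
first_backup = b"".join(blocks)
blocks[2] = bytes([9]) * BLOCK_SIZE
second_backup = b"".join(blocks)

store: Dict[bytes, bytes] = {}
first_recipe = deduplicate(first_backup, store)
second_recipe = deduplicate(second_backup, store)

blocks_written = len(first_recipe) + len(second_recipe)
print(f"Blocks backed up: {blocks_written}, unique blocks stored: {len(store)}")
```

Eight blocks were backed up but only five had to be stored, and both backups can still be rebuilt in full from their recipes.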

Client level de-duplication processes the data where the "backup agent" lives, on each application server. The advantage of this approach is that less traffic is sent over the network, and therefore it delivers the shortest backup window. The disadvantage of this approach is that you have to replace your existing backup application with the new client-based de-duplication application.

Inline de-duplication is when the disk-based backup appliance, connected to your existing backup server, de-duplicates the data on the way to the disk. The advantage of this approach is that it uses less disk than post process and theoretically should cost less. The disadvantage is that this approach produces the longest backup windows, because the de-duplication work slows the backups as they write to disk. These systems also require more memory and processor, so they are not necessarily less costly.

Post process de-duplication is when the disk-based backup appliance, connected to your existing backup server, allows the data to write directly to the disk from the backup server, at disk speed. The de-duplication work begins after the backup is complete. The advantage is that backups occur much faster than the inline approach resulting in a shorter backup window. The disadvantage is that more disk is required to land the backup and then compare. However, the cost of the additional disk is no more than the additional processor and memory required for inline process and therefore post process systems do not cost more than inline. In fact, in most cases they cost less. If you choose a post process system, make sure that the system is sized properly to de-dup all your data well in advance of the next backup coming in.
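The sizing advice for post process systems boils down to a simple check, sketched below. All of the figures are hypothetical, and the function name, the 24 hour backup cycle and the 2 hour safety margin are assumptions for illustration rather than a vendor sizing rule.

```python
def dedup_finishes_in_time(backup_hours, dedup_hours,
                           hours_between_backup_starts=24,
                           safety_margin_hours=2):
    """True if backup plus post process de-dup ends well before the next backup."""
    busy_hours = backup_hours + dedup_hours + safety_margin_hours
    return busy_hours <= hours_between_backup_starts


# A nightly backup that lands in 4 hours and de-dups in another 4.
print(dedup_finishes_in_time(backup_hours=4, dedup_hours=4))

# An undersized system: 10 hours to land the backup, 13 more to de-dup it.
print(dedup_finishes_in_time(backup_hours=10, dedup_hours=13))
```

The first case leaves plenty of headroom; the second would still be de-duplicating when the next backup arrives.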

There is a further advantage / disadvantage debate as to which approach allows for replication of changed data to be received at the offsite system the fastest. I plan to expand upon this in a separate post.

Bill Andrews is President and CEO of ExaGrid Systems, a company that provides fast, low cost and scalable disk-based backup with data de-duplication solutions.


Confusion about VTL and Disk-based Backup

Posted by Bill Andrews on Mon, Sep 08, 2008 @ 10:05 AM

There seems to be a lot of confusion around disk-based backup and VTL. Many use these terms interchangeably, as if to imply that disk-based backup is VTL and VTL is disk-based backup. The truth is that VTL is an interface between the backup server and the disk.

All backup applications can  write to three targets:

  1. Tape library
  2. NAS shares (network attached storage device share)
  3. Disk volume - any disk

If you want to back up to disk you have 3 choices:

  1. If you want to write in tape mode then you need to put VTL between the backup server and the disk. The VTL emulates a tape library on the front end and writes to disk on the back end. Up until a few years ago this was the only way you could write to disk. But then the backup applications all added the ability to write natively to disk, either to a NAS share or to a disk volume. Therefore, VTL has gone away in the mass market as it was a stopgap. However, it still has value in Fibre SAN environments.
  2. You can point backup jobs at NAS shares. Simply plug a NAS server behind your backup server and point your backups at NAS shares.
  3. You can point backup jobs at disk volumes. This is the least common method in the industry...as all the products with data de-duplication use either NAS or VTL.

The industry is dividing into two camps as the disk-based backup systems with data de-duplication in the market offer either NAS or VTL.

  1. If your backup server is on a Fibre SAN and you want the disk-based backup product on the Fibre SAN, VTL can handle SAN block level traffic. This tends to be the case mostly in the large enterprises.
  2. For the mass market of mid-market to small enterprise customers that don't have a Fibre SAN, or that do have one but run their backup application on the Ethernet network rather than on Fibre, the solution of choice is a NAS-based disk-based backup system. The backup server connects to the NAS-based system via Ethernet. This can be over the shared Ethernet network or, to keep the backup traffic off that network, over a private Ethernet connection between the backup server and the disk-based backup system.
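The two camps above reduce to a short decision rule, sketched here. The function and parameter names are hypothetical, and the rule only encodes the rule of thumb from this post; it is not a substitute for looking at your own environment.

```python
def suggest_interface(backup_server_on_fibre_san, want_product_on_san):
    """Suggest VTL or NAS following the rule of thumb described above."""
    if backup_server_on_fibre_san and want_product_on_san:
        # VTL can handle SAN block level traffic.
        return "VTL"
    # No Fibre SAN, or the backup application lives on Ethernet.
    return "NAS"


# Large enterprise with the backup server on the Fibre SAN.
print(suggest_interface(backup_server_on_fibre_san=True, want_product_on_san=True))

# Mid-market shop whose backup application runs on the Ethernet network.
print(suggest_interface(backup_server_on_fibre_san=False, want_product_on_san=False))
```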

Bill Andrews is President and CEO of ExaGrid Systems, a company that provides fast, low cost and scalable disk-based backup with data de-duplication solutions.
