In 1990, the hard disk of a personal computer was 10 megabytes. Today, every 10 minutes humanity creates as much data volume as was created from the dawn of civilization until the year 2000. You must protect and back up all this data. Otherwise your company can lose money, reputation, time — your entire business can even shut down.

However, 75 percent of small- to medium-sized businesses (SMBs) surveyed by Acronis and IDC (International Data Corporation) admit that their data is not fully protected. The "sheer volume of data" was given as one of the primary reasons why.

Backup Storage Challenges

For example, let's look at a company with 400 employees who use desktops and laptops. These PCs contain from 20 to 150 TB (terabytes) of data. With a 2:1 compression ratio, the backup administrator needs to provision between 10 and 75 TB for every full backup, plus have more space for incremental and differential backups. Eventually, this company may need to acquire as much as one petabyte of storage for PC backups alone.

Let's assume this company invests in expensive storage for their PC backups. The next, even bigger challenge is to back up the PCs to this storage. A full backup will take from two to three weeks to transfer 10 to 75 TB of data over a 100 Mbit network.

Yet every desktop has the same Windows operating system, the same applications, and often numerous copies of the same data. Storing and transferring the same data multiple times to the same storage is a waste of time and resources.

What Is Backup Deduplication?

Backup deduplication minimizes storage space by detecting data repetition and storing the identical data only once. If a backup solution transfers and stores only unique data, the company can decrease its storage capacity and network requirements by up to 50 times! With deduplication, your organization can realize these savings:

- Reduce network load, because less data is transferred, leaving more bandwidth for your production tasks
- Eliminate the need to invest in data deduplication-specific hardware

Remember, however, that deduplicated storage may require more computing resources, such as RAM and/or CPU. You should always analyze your needs and infrastructure before implementing deduplication; in some use cases, traditional non-deduplicated storage may be more cost-effective than deduplicated storage.

A storage location where deduplication is enabled is called a deduplicating storage. Deduplication can operate at a file, subfile (pieces of files), or block level, and usually works with all operating systems supported by your backup solution. Deduplication produces maximum results when you create:

- Full backups of similar data from different sources, such as operating systems (OS), virtual machines (VMs), and applications deployed from a standard image
- Full backups of systems that you previously backed up to the same deduplicating storage
- Incremental backups of similar data from different sources (for example, when you deploy OS updates to multiple systems and run an incremental backup)
- Incremental backups where the data does not change but the location of the data does change (for example, when data, such as a file, circulates over the network or within one system and appears in a new place)

Deduplication Software

During deduplication, the backup data is split into blocks. Your backup solution may support blocks of fixed size or variable size. Fixed-size block deduplication has proven to be ineffective: on small block sizes it consumes a lot of RAM and CPU, and on large block sizes it provides a much lower deduplication ratio. Most advanced modern backup solutions therefore provide variable-size block deduplication, adapting the block sizes to maximize the deduplication ratio while reducing RAM and CPU usage.

Deduplication at Source

When performing a backup to a deduplicating storage, the backup solution calculates a fingerprint or a checksum of each data block. This fingerprint or checksum is often called a hash value. Before sending a data block to the storage, the backup solution queries the storage system to determine whether the block's hash value is already stored there. If so, the solution sends only the hash value; otherwise, it sends the block itself.

Some data, such as encrypted files or disk blocks of a non-standard size, cannot be deduplicated. In these cases, the solution always transfers the data to the storage without calculating the hash values.
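The source-side protocol described above (hash each block, query the storage, then send either the hash or the block itself) can be sketched in a few lines of Python. This is a simplified illustration, not any particular product's implementation: it assumes fixed 4 KiB blocks and an in-memory hash index, whereas real solutions typically use variable-size blocks and a persistent deduplication database.

```python
import hashlib

BLOCK_SIZE = 4096  # fixed-size blocks for simplicity; real products adapt block sizes


class DedupStorage:
    """Minimal deduplicating target: stores each unique block once, keyed by hash."""

    def __init__(self):
        self.blocks = {}  # hash value -> data block (the "data store")

    def has_block(self, h):
        """The 'is this hash value already stored?' query from the agent."""
        return h in self.blocks

    def put_block(self, h, data):
        self.blocks[h] = data


def backup(data, storage):
    """Split data into blocks and send a block only if its hash is unknown.

    Returns the ordered list of hash values (the 'backup file') and
    the number of blocks actually transferred.
    """
    hashes, sent = [], 0
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        h = hashlib.sha256(block).hexdigest()  # fingerprint / hash value
        if not storage.has_block(h):           # query the storage first
            storage.put_block(h, block)        # unknown block: send the data
            sent += 1
        hashes.append(h)                       # known block: send only the hash
    return hashes, sent
```

Backing up a second machine deployed from the same image would transfer zero new blocks, since every hash value is already known to the storage.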
Unique blocks are sent to the storage and duplicates are skipped. For example, if 10 virtual machines are backed up to the deduplicated storage and the same block is found in five of them, only one copy of this block is sent and stored. This algorithm of skipping duplicate blocks saves storage space and minimizes network traffic.

Deduplication at Target

Usually this process works as follows: data blocks are moved from the backup file to a special file — the deduplication data store — within the storage, and the references are recorded in the deduplication database. Hash values and links to data blocks are saved to the deduplication database, so the data can be easily reassembled (rehydrated). As a result, the data store contains a number of unique data blocks; duplicate blocks are stored only once.

The figure below illustrates the result of deduplication at target. The diagram shows two backup archives. Each has a separate set of backups. In Archive 1, h1 through h7 — designated by blue blocks — contain hash values stored in the backup files. The green blocks are the data blocks that cannot be deduplicated. Archive 2 contains only data (green) blocks and is encrypted.

Recovery

During recovery, the backup solution agent requests the data from the storage. The storage system reads the backup data and, if a block is referenced in the deduplication data store, reads the data from it. For an agent, the recovery process is transparent and independent of the deduplication.

Compression and Encryption

The backup solution agent would usually compress the backed-up data before sending it to the server. Hash values for each data block are calculated before compression. The storage system can also encrypt the data it stores. In this case, during recovery, the data would be transparently decrypted by the storage system using a storage-specific encryption key. If the storage medium is stolen or accessed by an unauthorized person, the storage cannot be decrypted without access to the storage system.

Deleting Orphan Data Blocks

After one or more backups are deleted from the storage — either manually or through retention rules — the data store may contain blocks which are no longer referenced by any backup. The storage system removes these orphan blocks in two passes. First, it scans through all backups in the storage and marks all referenced blocks as used (the appropriate hash is marked as used in the deduplication database). Second, it deletes all the unused blocks. This process may require additional system resources; that is why this task usually runs only when a sufficient amount of data has accumulated in your storage.

Deduplication is most effective in environments where you need to back up a lot of similar machines, virtual machines, or applications. In addition, deduplication can help in other scenarios, such as when you are trying to optimize your wide area network (WAN). Let us look at some typical use cases.

Use Case 1: Big Environment With Similar Machines

One hundred similar workstations need to be backed up. The workstations were initially deployed using a disk-imaging system deployment solution. Because they were deployed from a single image, the operating system and generic applications that run on all machines are identical. As a result, there are many duplicates. Deduplication is very effective in this scenario because it minimizes storage capacity and saves storage costs, and it is even more effective because there is a large number of workstations.

Here is the formula for the deduplication ratio calculation:

Deduplication ratio = Unique data percentage + (1 – Unique data percentage) / Number of machines

Deduplication has the greatest impact when the deduplication ratio has the lowest value.
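Plugging numbers into the ratio formula makes the effect concrete. In the small Python sketch below, the 5 percent unique-data share and the 200 GB per machine are assumed figures chosen for illustration, not values from the article.

```python
def deduplication_ratio(unique_fraction, num_machines):
    """Deduplication ratio = unique data fraction + (1 - unique data fraction) / machines.

    The lower the value, the greater the impact of deduplication.
    """
    return unique_fraction + (1 - unique_fraction) / num_machines


# 100 similar workstations; assume 5% of each machine's data is unique.
ratio = deduplication_ratio(0.05, 100)  # 0.05 + 0.95 / 100 = 0.0595

# Assuming 200 GB per machine, 100 machines hold 20,000 GB of raw backup data,
# but the deduplicating storage keeps only the deduplicated share of it:
stored_gb = 20_000 * ratio  # 1,190 GB: each machine's unique 5% plus one shared copy
```

Note that if every block is unique (unique fraction of 1.0), the ratio is 1.0 and deduplication saves nothing, which matches the caveat that deduplication only pays off for similar data.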
Author: Carolyn