Right, let’s talk backups. Not the most glamorous topic, I know, but utterly crucial. I recently sat down with Mia, a data protection guru at a local consultancy, to pick her brain about how to keep our company data safe without breaking the bank or clogging up the internet. Our conversation quickly gravitated towards data deduplication and compression, two techniques that, frankly, sound a bit intimidating but are well worth getting to grips with.
“So, Mia,” I began, coffee in hand, “everyone knows you need backups, but the sheer volume of data these days is overwhelming. What’s the deal with deduplication?”
Mia smiled. “Think of it like this. You’ve got a hundred copies of the same presentation floating around the network. Deduplication identifies those duplicates, and instead of backing up each individual file, it just stores one instance and then points all the other copies to that single instance. It’s surprisingly effective because a lot of data, especially in a business environment, is redundant.”
She went on to explain how this works in practice. Let’s say your accounts department emails the same monthly spreadsheet to everyone. Without deduplication, your backup system diligently copies that spreadsheet a hundred times. With deduplication, it recognises the identical data and only stores it once. The subsequent backups just record that “this file is the same as the one already stored”.
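To make that concrete, here’s a minimal sketch of file-level deduplication in Python. It’s a toy, not any vendor’s actual implementation: the `backup_with_dedup` function and the directory layout are invented for illustration. Each file is identified by the SHA-256 hash of its contents, only one copy per unique hash is stored, and a manifest records which original paths point at which stored blob.

```python
import hashlib
import shutil
from pathlib import Path


def backup_with_dedup(source_dir: str, backup_dir: str) -> None:
    """Copy files into backup_dir, storing each unique file's contents once.

    Files are identified by the SHA-256 hash of their contents; duplicates
    are recorded in a manifest instead of being copied again.
    """
    store = Path(backup_dir) / "store"
    store.mkdir(parents=True, exist_ok=True)
    manifest = {}  # maps original path -> content hash

    for path in Path(source_dir).rglob("*"):
        if not path.is_file():
            continue
        # Reads the whole file into memory -- fine for a sketch.
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        blob = store / digest
        if not blob.exists():            # first time we've seen this content
            shutil.copyfile(path, blob)  # store one instance...
        manifest[str(path)] = digest     # ...and point every copy at it

    # The manifest is all you need to reconstruct the original tree.
    (Path(backup_dir) / "manifest.txt").write_text(
        "\n".join(f"{digest}  {path}" for path, digest in manifest.items())
    )
```

The hundred copies of the accounts spreadsheet all hash to the same digest, so the store holds the data once and the manifest simply lists a hundred pointers to it.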
“And compression, I assume, is about squeezing the data down?” I asked.
“Exactly! Compression uses algorithms to reduce the size of files. It’s like zipping a file on your computer, but on a much larger scale and often automatically integrated into the backup process.”
Think of it like packing for a holiday. You could just throw everything in a suitcase, but by rolling your clothes and using packing cubes (compression), you can fit a lot more in. Data compression identifies patterns and eliminates redundancy within individual files, rather than across multiple files as deduplication does. Text-heavy files such as documents, spreadsheets, databases and logs shrink dramatically (already-compressed formats like JPEGs and MP4s gain much less), which means less bandwidth to get your data to the cloud and, ultimately, less money.
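As a rough illustration (using Python’s standard zlib module, not any particular backup product’s pipeline), squeezing a file down before it goes over the wire might look something like this:

```python
import zlib


def compress_for_upload(path: str) -> bytes:
    """Compress a file's contents before sending it over the wire."""
    with open(path, "rb") as f:
        raw = f.read()
    packed = zlib.compress(raw, 9)  # 9 = highest compression level
    ratio = len(packed) / max(len(raw), 1)
    print(f"{path}: {len(raw)} -> {len(packed)} bytes ({ratio:.0%} of original)")
    return packed


# Hypothetical usage -- a text-heavy spreadsheet export compresses well:
# payload = compress_for_upload("accounts/monthly_report.csv")
```

Real backup tools weave this into the transfer automatically, but the principle is exactly the one Mia describes: fewer bytes leave your network in the first place.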
I asked Mia about the practical implications. “Okay, so these technologies sound great, but how do they actually save us money?”
“It’s a triple win,” she explained. “Firstly, you significantly reduce the amount of data you need to store, lowering your cloud storage costs. Secondly, because you’re transferring less data, your bandwidth consumption plummets, potentially saving you on internet bills. And thirdly, backup and restore processes become much faster because there’s less data to move around.”
We then touched on different cloud providers and their approaches to these technologies. Mia emphasised the importance of doing your research. “Not all providers offer the same level of deduplication and compression. Some might offer it as a standard feature, while others charge extra. Some might have more efficient algorithms than others. Read the small print!”
She suggested asking the following questions when evaluating cloud backup providers:
- What type of deduplication is used (block-level or file-level)? Block-level is generally more efficient, since it can spot redundancy within files as well as between them (see the sketch after this list).
- What compression algorithms are employed? Some algorithms are better suited to certain types of data.
- Are deduplication and compression performed client-side (before data leaves your network) or server-side (after it reaches the cloud)? Client-side processing can further reduce bandwidth usage.
- What are the costs associated with deduplication and compression? Are they included in the base price, or are they add-ons?
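On that first question, a small sketch helps show why block-level wins. This toy splits files into fixed-size 4 KB blocks and counts how many unique blocks two nearly identical files actually need; treat it as a simplification, since real products typically use variable-size, content-defined chunking.

```python
import hashlib
import math
import os

CHUNK_SIZE = 4096  # fixed-size blocks; real systems often chunk by content


def unique_blocks(*files: bytes) -> dict:
    """Split each file into blocks, keeping one copy per unique block hash."""
    store = {}
    for data in files:
        for i in range(0, len(data), CHUNK_SIZE):
            block = data[i:i + CHUNK_SIZE]
            store[hashlib.sha256(block).hexdigest()] = block
    return store


# Two ~1 MB files that share everything except their final bytes:
shared = os.urandom(1_000_000)
file_a = shared + b"version one"
file_b = shared + b"version two!"

scanned = sum(math.ceil(len(f) / CHUNK_SIZE) for f in (file_a, file_b))
print(f"{scanned} blocks scanned, {len(unique_blocks(file_a, file_b))} stored")
# File-level dedup sees two different files and stores both in full;
# block-level stores the shared blocks once and only the two tails twice.
```

Because the files differ only in their last few bytes, nearly half the scanned blocks never need storing, whereas file-level deduplication would have treated them as entirely distinct.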
The conversation then shifted to regulatory compliance and insurance. I asked Mia how data deduplication and compression play into these aspects.
“Well, regulators and insurers are increasingly concerned about data security and resilience,” she said. “Demonstrating that you’re actively minimising your data footprint through deduplication and compression can be seen as a positive step, showing that you’re taking data management seriously. It also reduces your exposure in the event of a breach, because there’s simply less data lying around to be compromised, and that can strengthen your case when negotiating insurance premiums.”
Finally, we touched on the importance of testing your backups. “It’s no good having a backup solution if you haven’t tested it,” Mia stated emphatically. “Regularly test your restore processes to ensure that you can recover your data quickly and efficiently in the event of a disaster. Make sure that the restored data is what you expect, too.”
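In that spirit, even a crude automated check beats hoping for the best. Here’s a minimal sketch (the paths and the `verify_restore` helper are hypothetical) that compares a test restore against the originals, hash for hash:

```python
import hashlib
from pathlib import Path


def file_hash(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def verify_restore(original_dir: str, restored_dir: str) -> bool:
    """Check that every original file came back intact from a test restore."""
    ok = True
    for src in Path(original_dir).rglob("*"):
        if not src.is_file():
            continue
        dst = Path(restored_dir) / src.relative_to(original_dir)
        if not dst.is_file():
            print(f"MISSING: {dst}")
            ok = False
        elif file_hash(src) != file_hash(dst):
            print(f"CORRUPT: {dst}")
            ok = False
    return ok


# e.g. after running a test restore into a scratch directory:
# assert verify_restore("/data/accounts", "/tmp/restore-test"), "restore failed"
```

Run something like this on a schedule, not just once, so a quietly broken backup job gets caught before you actually need it.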
I think the key takeaway is that the more you can squeeze your data, the faster your backups run and the cheaper your ongoing storage becomes. You’ll also be better placed to protect that data, and to demonstrate to both your insurers and your regulators that you take backup security seriously.
