Understanding Backup Data Retention
I recently spoke to a business that admitted it had turned ON the Litigation Hold feature in Microsoft 365 to preserve all data forever. The business sees this as an inexpensive way to achieve ‘backups’.
This approach is flawed on many levels, and it made me realize how businesses can expose themselves to risk through an insufficient understanding of data retention. Retaining all data for all time isn’t always the best thing, and it is important to understand why.
The concept of data retention is at least as old as backups themselves, because any time you make a backup and create a secondary copy, you must ask how long that secondary copy needs to be retained.
A data retention policy is an integral part of any business’ data management strategy. It can be defined simply as an organization’s protocol for retaining the information it requires for business needs. Those needs may be operational or dictated by compliance requirements.
Why Is a Data Retention Policy Important?
Why have a data retention policy at all? Because an organization should retain data only for as long as it is needed.
Why is this so?
Firstly, there is the matter of cost. Retaining all data forever means an unbounded storage cost, and with data volumes growing close to exponentially year on year, that isn’t practical for any organization.
Secondly, without an effective way to prune unneeded data, unnecessary data accumulates, making it difficult to find and access the data that is actually required.
Thirdly, regulatory compliance requires organizations to be able to access and find relevant information whenever required, within all data they retain. Retaining more data than necessary, but without proper means to access and find required information is not only unhelpful, but it could also put the organization in the difficult position of being penalized for non-compliance. Businesses therefore choose to retain a limited amount of essential data and ensure that the data they retain can be searched easily for information access.
Lastly, and perhaps most important – retaining more information than necessary also means that it becomes part of any legal discovery process that the business may be subject to. Having only essential data retained reduces the possibility that unrelated data is surfaced during a legal discovery and used against the business.
Retention and Backups
An important consideration for data retention is how to deal with secondary copies – i.e. backups. Policies developed for data retention must cover not only automatic deletion of primary data, but secondary data copies as well.
The good news is that most modern backup software has automated ways to manage retention. Backup retention rules can take many forms:
1. Retain by file version – You can stipulate that only the latest version of any file or the latest 3 versions be retained. Older versions are automatically purged by the backup software.
2. Retain by time – Retain data for 1 year, 3 years, 7 years etc. The timeframes here are not always arbitrary – and could be driven by regulatory compliance requirements.
3. Retain by time and density – Businesses often choose to retain recent data densely (e.g. daily backups), older data sparsely (e.g. only weekly or monthly backups for data older than 6 months), and even older data more sparsely still (e.g. only yearly backups for data older than one year, going back no farther than 7 years).
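The third rule in the list can be sketched in a few lines of Python. This is a minimal illustration, not how any particular backup product implements retention: the function name `select_retained`, its parameters, and the choice to keep the newest backup in each month or year are assumptions made for the example. The thresholds (roughly 6 months, one year, 7 years) follow the figures given in the list.

```python
from datetime import date, timedelta


def select_retained(backup_dates, today, dense_days=180, sparse_days=365,
                    max_days=7 * 365):
    """Return the set of backup dates to keep under a tiered policy.

    Hypothetical policy, following the example figures in the text:
      - age up to dense_days: keep every backup
      - age up to sparse_days: keep the newest backup of each calendar month
      - age up to max_days: keep the newest backup of each calendar year
      - anything older is not retained
    """
    retained = set()
    seen_buckets = set()
    # Walk from newest to oldest so the first backup seen in a bucket
    # is the newest one in that month or year.
    for d in sorted(set(backup_dates), reverse=True):
        age = (today - d).days
        if age > max_days:
            continue  # beyond the maximum retention window
        if age <= dense_days:
            retained.add(d)  # dense tier: keep everything
            continue
        if age <= sparse_days:
            bucket = ("month", d.year, d.month)
        else:
            bucket = ("year", d.year)
        if bucket not in seen_buckets:
            seen_buckets.add(bucket)
            retained.add(d)
    return retained
```

Any backup whose date is not in the returned set would be a candidate for purging. A real product would also need to account for backup chains (for example, incrementals that depend on an older full backup), which this sketch ignores.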
Another complexity layered into retention is where the backups are stored. Backups (especially older backup data) could be stored offsite. In the past this meant physically moving tapes; today it more usually means an offsite copy in an alternate data center or the public cloud.
Suspending Data Retention
It is important to be able to suspend retention rules for periods of time. Temporary suspensions usually result from a “Litigation Hold” request. Also called a Legal Hold, this is triggered when an attorney, a judge, or a legal team demands information.
In such scenarios, the business may be compelled to retain and produce all data pertaining to a certain matter or certain users. For as long as the litigation or legal review is ongoing, the business’ legal team or a judge may ask that retention rules be suspended for specific subjects.
Many modern backup solutions have litigation or legal hold features built in. Once a user is placed in a policy with Litigation Hold turned ON, the software automatically suspends any policy-based data deletion for that user. In other words, retention becomes infinite: backed-up data is retained from that point forward until the ‘hold’ is turned OFF.
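The effect of a hold on policy-based deletion can be illustrated with a short Python sketch. The `Backup` record, the `purge_candidates` function, and the per-user hold set are hypothetical names chosen for this example; real products expose holds through their own policy settings.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Backup:
    user: str
    taken_on: date


def purge_candidates(backups, today, retention_days, users_on_hold):
    """Return backups that policy-based deletion may remove.

    A backup is eligible only if it is older than the retention period
    AND its owner is not under a hold (assumed here to be per user).
    """
    return [
        b for b in backups
        if b.user not in users_on_hold
        and (today - b.taken_on).days > retention_days
    ]
```

With a one-year retention period, an expired backup belonging to a user on hold is skipped for as long as the user remains in `users_on_hold`. Once the hold is lifted, the same backup becomes eligible on the next purge run, which is why a hold applied to everyone indefinitely amounts to retaining everything forever.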
Data retention is a complex and nuanced topic, and data retention policies should be crafted with care. At Parablu, where we design and build data management solutions (including enterprise-class backup), we deal with questions from customers around data retention all the time. Retaining all data forever or turning ON “Litigation Hold” for all your users may not be the best approach. In fact, it could be fraught with peril if the business becomes a target for legal discovery.
At Parablu we build solutions focused on protecting enterprise data. Parablu’s BluVault is designed to enable robust data backup from user endpoints, SaaS workloads (Microsoft 365) and edge servers. Our patented integration with Microsoft 365 and OneDrive for Business also means that you can deploy BluVault without spending a penny for backup storage. Sound interesting? Reach out to us and learn more.