Demystifying Data Backups

Ask a non-tech person what a data backup is – and they’ll probably answer that it is simply a second copy of data you make for safety.  And they would be right!  There isn’t much more to backup than the fact that it is a redundant copy of data – that you can rely on, should you lose your original copy.

But backup methods have gotten so convoluted and complex in today’s world – it would seem you need a post-graduate degree in Computer Science to understand what they do and how they work.  In this blog post, we’ll try to de-mystify backups and restores and lay them bare – so we can all easily understand what happens behind the scenes.

Evolution of backups

Backup techniques have evolved over time and become increasingly sophisticated (and perhaps complex as a result).  Considerations such as backup time, restore time, storage cost, and network bandwidth have all driven innovations designed to make backups better, but those same innovations have also made backups harder to understand.

Let’s take a look at the types of backups that are available and their respective pros and cons.

Full backups

I am quite sure everybody who’s reading this blog has heard of full backups.  They are the simplest form of backup and the easiest to understand.  Full backups essentially make a backup of everything you wish to protect every time.  So, all files, objects, bytes however you wish to measure your data – every one of them is copied over to a secondary storage target each time.  If you perform full backups once a day – then everything is copied over once a day.

Let’s take an example – say you have 4 files A, B, C, and D.  And let’s say each of them is about 1GB and takes 10 mins to back up.

  • On Day 1 – you’ll back up 4GB and it’ll take you 40 mins
  • On Day 2, let’s say File B changes to B1 and a new file called E is added.  A, C & D remain the same.
  • When you run the backup on Day 2, it’ll back up all 5 files and it’ll take you 50 mins
  • On Day 3, let’s say File B changes again and becomes B2.  File C also changes to C1, and File D gets deleted.
  • When you run the backup on Day 3, it’ll back up 4 files again (D is removed – remember?) and it’ll take you 40 mins.

When you restore, you will most likely get data from the latest backup and it’ll take you 40 mins to restore.
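
The arithmetic above can be reproduced with a few lines of Python.  This is only a sketch of the example, not how any real backup product works: the `DAYS` dictionary, the `full_backup` helper, and the flat 10-minutes-per-file cost are assumptions made purely for illustration.

```python
MINUTES_PER_FILE = 10  # from the example: each 1GB file takes 10 mins

# File name -> version present on the source each day (taken from the example).
DAYS = {
    1: {"A": "A", "B": "B", "C": "C", "D": "D"},
    2: {"A": "A", "B": "B1", "C": "C", "D": "D", "E": "E"},
    3: {"A": "A", "B": "B2", "C": "C1", "E": "E"},
}


def full_backup(source):
    """A full backup copies every file that exists on the source right now."""
    return dict(source)


# Backup time per day: every file is copied, changed or not.
full_backup_minutes = {
    day: len(full_backup(source)) * MINUTES_PER_FILE
    for day, source in DAYS.items()
}
print(full_backup_minutes)  # {1: 40, 2: 50, 3: 40}

# Restoring the latest full backup copies the same 4 files back.
restore_minutes = len(full_backup(DAYS[3])) * MINUTES_PER_FILE
print(restore_minutes)  # 40
```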

Simple enough?

Differential backups

Full backups, as you can see, take time: 40 to 50 minutes each day in our example.  The next optimization the industry made was differential backups.  Differential backups make a copy of only the files that have changed since the last full backup.

Let’s take the same example – say you have 4 files A, B, C, and D.  And let’s say each of them is about 1GB and takes 10 mins to back up.

  • On Day 1 – you’ll back up 4GB and it’ll take you 40 mins
  • On Day 2, let’s say File B changes to B1 and a new file called E is added.  Files A, C & D stay the same.
  • When you run the backup on Day 2, it’ll back up just the 2 changed files and the backup will take you 20 mins
  • On Day 3, let’s say File B changes again and becomes B2.  File C also changes to C1 and File D gets deleted.
  • When you run the backup on Day 3, it’ll back up 3 files (B2, C1, and E) and it’ll take you 30 mins.  Why did we back up E again?  Remember, a differential backup picks up everything that changed since the full backup.
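
To see why E gets picked up twice, here is a small Python sketch of the selection rule.  The `differential_backup` helper and the `DAYS` dictionary are illustrative assumptions, and "changed" is modelled simply as "the version label differs from the one in the full backup".

```python
MINUTES_PER_FILE = 10  # from the example: each 1GB file takes 10 mins

# File name -> version present on the source each day (taken from the example).
DAYS = {
    1: {"A": "A", "B": "B", "C": "C", "D": "D"},
    2: {"A": "A", "B": "B1", "C": "C", "D": "D", "E": "E"},
    3: {"A": "A", "B": "B2", "C": "C1", "E": "E"},
}


def differential_backup(full, source):
    """Copy every file that is new or differs from the last FULL backup."""
    return {name: ver for name, ver in source.items() if full.get(name) != ver}


full = DAYS[1]  # the Day 1 full backup is always the comparison point
day2 = differential_backup(full, DAYS[2])
day3 = differential_backup(full, DAYS[3])

print(sorted(day2.values()), len(day2) * MINUTES_PER_FILE)  # ['B1', 'E'] 20
print(sorted(day3.values()), len(day3) * MINUTES_PER_FILE)  # ['B2', 'C1', 'E'] 30
```

Because the comparison point never moves forward, E keeps showing up in every differential until the next full backup is taken.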

Let’s see what happens when you restore.  You’ll need to restore the full backup first – and then layer the most recent differential backup on top of it.  Because each differential already contains every change since the full backup, you never need more than those two.

If you wish to get back the latest copy of data, it’ll take you 70 mins to restore – that’s 40 for the full backup plus 30 for the Day 3 differential.  So, backups got faster, but restores got slower than with full backups alone.

Incremental backups

But if you think about it, differential backups have the potential to keep getting bigger and take longer and longer each day.  After all, they’re backing up all changes since the full backup.  So, there could come a point where a daily differential backup is taking as much time as a full backup (or perhaps more).

Enter the next innovation – incremental backups.  Incremental backups only back up what has changed since the last backup of any kind, whether that was a full or another incremental.  Sounds efficient, right?

Let’s look at this with the same example:

So, you have 4 files A, B, C, and D.  And let’s say each of them is about 1GB and takes 10 mins to back up.

  • On Day 1 – you’ll back up 4GB and it’ll take you 40 mins
  • On Day 2, let’s say File B changes to B1 and a new file called E is added.
  • When you run the backup on Day 2, it’ll back up just the 2 changed files – and it’ll take you 20 mins
  • On Day 3, let’s say File B changes again to B2.  File C also changes to C1 and File D gets deleted.
  • When you run the backup on Day 3, it’ll back up just 2 files again (B2 and C1) (D is removed – remember?) and it’ll take you 20 mins

So, definitely an improvement over differential backups.
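
The only thing that changes compared with a differential is the comparison point.  The sketch below makes that explicit; `incremental_backup` and `DAYS` are hypothetical names used for illustration, and real products typically detect changes by timestamps, checksums, or change journals rather than by comparing version labels.

```python
MINUTES_PER_FILE = 10  # from the example: each 1GB file takes 10 mins

# File name -> version present on the source each day (taken from the example).
DAYS = {
    1: {"A": "A", "B": "B", "C": "C", "D": "D"},
    2: {"A": "A", "B": "B1", "C": "C", "D": "D", "E": "E"},
    3: {"A": "A", "B": "B2", "C": "C1", "E": "E"},
}


def incremental_backup(previous, source):
    """Copy every file that is new or differs from the PREVIOUS backup run."""
    return {name: ver for name, ver in source.items() if previous.get(name) != ver}


day2 = incremental_backup(DAYS[1], DAYS[2])  # compared against Day 1
day3 = incremental_backup(DAYS[2], DAYS[3])  # compared against Day 2, not Day 1

print(sorted(day2.values()), len(day2) * MINUTES_PER_FILE)  # ['B1', 'E'] 20
print(sorted(day3.values()), len(day3) * MINUTES_PER_FILE)  # ['B2', 'C1'] 20
```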

Let’s see what happens when you restore.  When you restore data, just like in the previous case – you’ll need to restore the Full backup first – and then layer in each incremental backup on top of that – in order. 

If you wish to get back the latest copy of data, it’ll take you 80 mins to restore – that’s 40 + 20 + 20.  So, not great from a restore standpoint.
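
Here is a rough Python sketch of that layered restore.  The `restore_by_layering` helper and the `backups` list are assumptions for illustration only.  Notice that in this simplified model D comes back as well, because nothing in the backups themselves records that it was deleted.

```python
MINUTES_PER_FILE = 10  # from the example: each 1GB file takes 10 mins

# What each backup run actually contains, oldest first (from the example).
backups = [
    {"A": "A", "B": "B", "C": "C", "D": "D"},  # Day 1: full
    {"B": "B1", "E": "E"},                     # Day 2: incremental
    {"B": "B2", "C": "C1"},                    # Day 3: incremental
]


def restore_by_layering(backups):
    """Restore the full backup, then overlay each incremental in order."""
    restored = {}
    files_copied = 0
    for backup in backups:
        restored.update(backup)  # newer versions overwrite older ones
        files_copied += len(backup)
    return restored, files_copied * MINUTES_PER_FILE


restored, minutes = restore_by_layering(backups)
print(minutes)                    # 80
print(sorted(restored.values()))  # ['A', 'B2', 'C1', 'D', 'E']
```

B is copied three times (B, then B1, then B2) even though only B2 is wanted, which is where the extra 40 minutes go.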

So, while we have been able to improve backup speeds progressively, we have traded off restore times in each of the above cases.

It is for this reason that traditional backup strategies recommend doing a full backup at frequent points in time – weekly, monthly, quarterly, yearly etc.  The idea is to ensure that you’re able to keep restore times in check.  If you’re able to start the restore from a recent full backup, then the number of subsequent backups to restore and overlay on top of it are limited – thus saving time.
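
As a back-of-the-envelope illustration, the helper below counts how many backups a layered restore has to read.  It assumes one backup per day, a full backup on day 0 and then every `full_every_n_days` days, and incrementals in between; the function name and parameters are made up for this sketch.

```python
def backups_needed_for_restore(day, full_every_n_days):
    """Count the backups a layered restore must read on a given day:
    the most recent full, plus every incremental taken after it."""
    incrementals_since_full = day % full_every_n_days
    return 1 + incrementals_since_full


# Weekly fulls: the chain never grows beyond 7 backups.
print(backups_needed_for_restore(6, 7))    # 7 (one full + 6 incrementals)
print(backups_needed_for_restore(7, 7))    # 1 (a fresh full was just taken)

# Yearly fulls: a restore late in the year reads hundreds of backups.
print(backups_needed_for_restore(364, 365))  # 365
```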

But modern, enterprise class backup technology has progressed further than this – and will allow you to have the best of both worlds.  Fast incremental backups and fast restores.  The secret is something called cataloging. 

The Magic of Cataloging

Cataloging means keeping meta-data – i.e. data about your data – alongside your backups.  In each of the above cases, our backups were self-describing: no additional information was required to restore from any of them.

But meta-data describes data – it can contain useful information about file versions, about where on the backup media each file is kept etc. – which can dramatically improve restore performance.

Let’s try the same example – this time with cataloging.

So, you have 4 files A, B, C, and D.  And let’s say each of them is about 1GB and takes 10 mins to back up.

  • On Day 1 – you’ll back up 4GB and it’ll take you 40 mins
  • On Day 2, let’s say File B changes to B1 and a new file called E is added.
  • When you run the backup on Day 2, it’ll back up just the 2 changed files – and it’ll take you 20 mins
  • On Day 3, let’s say File B changes again to B2.  File C also changes to C1, and File D gets deleted.
  • When you run the backup on Day 3, it’ll back up just 2 files again (B2 and C1) (D is removed – remember?) and it’ll take you 20 mins

Now, when you restore, the catalog will supply you the latest version of each file automatically – So A, B2, C1 and E.   In 40 minutes.  And you’ll not even bother bringing back D because the catalog knows it was deleted.

The secret?  Cataloging.  Modern backup software keeps a meta-data catalog which remembers which version of which file is present in which backup – and allows a smart way of bringing back just the data that you need on a restore.  This way – one doesn’t have to start with a full backup and layer each incremental on top.  You can get the latest versions of all files in a single go.
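
A catalog can be pictured as a simple lookup table.  The sketch below is one possible shape for it, not the format any particular product uses: the `catalog` dictionary, the `record_backup` and `restore_latest` helpers, and the backup labels are all hypothetical.

```python
MINUTES_PER_FILE = 10  # from the example: each 1GB file takes 10 mins

# What each backup run physically contains (from the example).
backups = {
    "day1": {"A": "A", "B": "B", "C": "C", "D": "D"},  # full
    "day2": {"B": "B1", "E": "E"},                     # incremental
    "day3": {"B": "B2", "C": "C1"},                    # incremental
}

# Catalog: file name -> (latest version, backup that holds it).
catalog = {}


def record_backup(backup_id, copied, deleted=()):
    """Update the catalog after a backup run."""
    for name, version in copied.items():
        catalog[name] = (version, backup_id)
    for name in deleted:
        catalog.pop(name, None)  # remember that the file no longer exists


record_backup("day1", backups["day1"])
record_backup("day2", backups["day2"])
record_backup("day3", backups["day3"], deleted=["D"])


def restore_latest():
    """Fetch each file once, straight from the backup holding its latest version."""
    restored = {
        name: backups[backup_id][name]
        for name, (_version, backup_id) in catalog.items()
    }
    return restored, len(restored) * MINUTES_PER_FILE


restored, minutes = restore_latest()
print(sorted(restored.values()), minutes)  # ['A', 'B2', 'C1', 'E'] 40
```

Each file is read exactly once, so the restore costs the same 40 minutes as restoring a full backup, and D never comes back.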

When you have the strength of data cataloging working for you – you don’t need to run full backups over and over again.  You can run incremental backups forever.

Synthesized Full backups

A number of backup solutions now also offer a synthesized full backup.  This is usually meant to satisfy archaic backup policies (that are still extant) which dictate that one should have a full backup available each week/month/year etc. 

Rather than take the hit of running a full backup each week, month, or year – which is technically unnecessary – modern backup software offers to “synthesize a full” backup for you.  It is the equivalent of running a restore of all your latest file versions – but rather than actually restore the data, it re-records the meta-data as if to show that these files got backed up again.  It is a neat trick which doesn’t require any data movement – but simply adds/updates meta-data records.

But what of real changes that may have occurred on the data source since the last incremental backup?  Not a problem – just couple the synthesized full with an incremental backup that runs just before the synthesized full, and you’re good to go.
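
In catalog terms, a synthesized full can be as little as a new record that points at data already sitting in backup storage.  The sketch below illustrates that idea under heavy simplification; the `synthesize_full` helper, the record layout, and the labels are assumptions, not any vendor’s actual format.

```python
# Catalog after the Day 3 incremental: file name -> (version, backup holding it).
catalog = {
    "A": ("A", "day1"),
    "B": ("B2", "day3"),
    "C": ("C1", "day3"),
    "E": ("E", "day2"),
}


def synthesize_full(catalog, label):
    """Record a new 'full' backup made purely of pointers to existing data.
    No file contents are read from the source or copied anywhere."""
    return {"label": label, "type": "synthetic-full", "files": dict(catalog)}


weekly_full = synthesize_full(catalog, "week-1-full")

print(sorted(weekly_full["files"]))  # ['A', 'B', 'C', 'E']
print(weekly_full["files"]["B"])     # ('B2', 'day3')
```

Running an incremental just before calling something like `synthesize_full` ensures the catalog it snapshots already reflects the latest changes on the source.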

Know your backups

Backups are a necessary part of any good IT Administrator’s toolkit.  Having a solid backup strategy makes life a lot simpler and allows Administrators to focus on things that need their attention instead of having to be in “crisis” mode all the time.  Understanding how backups work makes an IT Administrator’s job that much easier.

Thoughts?  Let us know – we always love to hear from readers.

Hopefully, we’ve simplified a few backup and restore concepts as part of this blog.  Please write to us at info@parablu.com with questions or opinions.  We’d love to hear from you.
