Demystifying Data Backups

Types of Backup: Full, Differential, and Incremental Backup

What is Backup?

Ask a non-tech person what a data backup is – and they’ll probably answer that it is simply a second copy of data you make for safety.  And they would be right!  There isn’t much more to backup than the fact that it is a redundant copy of data – that you can rely on, should you lose your original copy.

But different types of backup methods have gotten so convoluted and complex in today’s world – it would seem you need a post-graduate degree in Computer Science to understand what they do and how they work.  In this blog post, we’ll try to de-mystify the types of backup and restore and lay them bare – so we can all easily understand what happens behind the scenes.

Evolution of backup

Backup techniques have evolved over time and become increasingly sophisticated (and perhaps complex as a result).  Considerations such as the time taken for backups, the time taken for restores, storage costs, and network bandwidth savings have all driven innovations designed to make backups better, but those innovations have also added complexity.

How many types of Backup?

There are three main types of backup: full backup, differential backup, and incremental backup. Let’s take a look at each type of backup and its respective pros and cons.

Full backup

I am quite sure everybody who’s reading this blog has heard of full backups.  They are the simplest form of backup and the easiest to understand.  A full backup makes a copy of everything you wish to protect, every time.  So all files, objects, or bytes (however you wish to measure your data) are copied over to a secondary storage target each time.  If you perform a full backup once a day, then everything is copied over once a day.

Let’s take an example.  Say you have 4 files: A, B, C, and D.  Each of them is about 1GB, and each takes 10 mins to back up.

  • On Day 1, you’ll back up 4GB and it’ll take you 40 mins.
  • On Day 2, let’s say File B changes to B1, and a new file called E is added.  Files A, C & D remain the same.
  • When you run the backup on Day 2, it’ll back up all 5 files and it’ll take you 50 mins.
  • On Day 3, let’s say File B changes again and becomes B2.  File C also changes to C1, and File D gets deleted.
  • When you run the backup on Day 3, it’ll back up 4 files again (D is removed, remember?) and it’ll take you 40 mins.
Types of backup- Full backup

When you restore, you will most likely get data from the latest backup and it’ll take you 40 mins to restore.
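
If you’d like to see the arithmetic spelled out, here is a minimal Python sketch of the example above. The names (`DAYS`, `full_backup`, `backup_minutes`) are made up for illustration and the 10-minutes-per-file figure comes straight from the example; this is a toy model, not how any particular backup product works.

```python
MINUTES_PER_FILE = 10  # from the example: each 1GB file takes 10 mins

# State of the source on each day, as file name -> version (from the example).
DAYS = {
    1: {"A": "A", "B": "B", "C": "C", "D": "D"},
    2: {"A": "A", "B": "B1", "C": "C", "D": "D", "E": "E"},
    3: {"A": "A", "B": "B2", "C": "C1", "E": "E"},
}


def full_backup(source):
    """A full backup copies every file that currently exists on the source."""
    return dict(source)


def backup_minutes(backup):
    """Time for a backup (or a restore of it) in this toy model."""
    return len(backup) * MINUTES_PER_FILE


full_times = {day: backup_minutes(full_backup(files)) for day, files in DAYS.items()}
print(full_times)  # {1: 40, 2: 50, 3: 40}
```

Restoring the Day 3 full backup brings back 4 files, which is where the 40 mins comes from.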

Simple enough?

Differential backup

Full backups, as you can see, take time: 40 to 50 minutes each day in our example. The next optimization the industry made was the differential backup. A differential backup makes a copy of only the files that have changed since the last full backup.

Let’s take the same example.  Say you have 4 files: A, B, C, and D.  Each of them is about 1GB, and each takes 10 mins to back up.

  • On Day 1, you’ll back up 4GB and it’ll take you 40 mins.
  • On Day 2, let’s say File B changes to B1, and a new file called E is added.  Files A, C & D stay the same.
  • When you run the backup on Day 2, it’ll back up just the 2 changed files (B1 and E) and the backup will take you 20 mins.
  • On Day 3, let’s say File B changes again and becomes B2.  File C also changes to C1, and File D gets deleted.
  • When you run the backup on Day 3, it’ll back up 3 files (B2, C1, and E) and it’ll take you 30 mins.  Why did we back up E?  Remember, a differential backup picks up everything that changed since the full backup.
Types of backup - Differential backup
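
The selection rule in the list above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: files are compared by a version label, and the helper name `changed_since` is made up for this post.

```python
MINUTES_PER_FILE = 10  # from the example

day1 = {"A": "A", "B": "B", "C": "C", "D": "D"}
day2 = {"A": "A", "B": "B1", "C": "C", "D": "D", "E": "E"}
day3 = {"A": "A", "B": "B2", "C": "C1", "E": "E"}


def changed_since(baseline, current):
    """Files in `current` that are new or differ from their version in `baseline`."""
    return {name: ver for name, ver in current.items() if baseline.get(name) != ver}


full = dict(day1)  # the Day 1 full backup is the fixed baseline

# A differential always compares against the full backup, not the previous day.
diff_day2 = changed_since(full, day2)
diff_day3 = changed_since(full, day3)

print(sorted(diff_day2), len(diff_day2) * MINUTES_PER_FILE)  # ['B', 'E'] 20
print(sorted(diff_day3), len(diff_day3) * MINUTES_PER_FILE)  # ['B', 'C', 'E'] 30
```

Because the baseline never moves, E shows up again on Day 3 even though it did not change that day.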

Let’s see what happens when you restore.  You’ll need to restore the full backup first, and then layer just the most recent differential backup on top of it.  Because each differential contains everything that changed since the full backup, the earlier differentials aren’t needed.

If you wish to get back the latest copy of data, it’ll take you 70 mins to restore: 40 for the Day 1 full backup plus 30 for the Day 3 differential.  So, faster to back up than a full backup, but slower to restore.

Incremental backup

But if you think about it, a differential backup has the potential to keep getting bigger and take longer and longer each day.  After all, it is backing up all changes since the full backup.  So, there could come a point where a daily differential backup takes about as much time as a full backup.

Enter the next innovation: incremental backup.  An incremental backup only backs up what has changed since the last backup, whether that was a full backup or another incremental.  Sounds efficient, right?

Let’s look at this with the same example:

So, you have 4 files: A, B, C, and D.  Each of them is about 1GB, and each takes 10 mins to back up.

  • On Day 1, you’ll back up 4GB and it’ll take you 40 mins.
  • On Day 2, let’s say File B changes to B1, and a new file called E is added.
  • When you run the backup on Day 2, it’ll back up just the 2 changed files (B1 and E) and it’ll take you 20 mins.
  • On Day 3, let’s say File B changes again to B2.  File C also changes to C1, and File D gets deleted.
  • When you run the backup on Day 3, it’ll back up just 2 files again (B2 and C1; D is removed, remember?) and it’ll take you 20 mins.

So, definitely an improvement over differential backup.

Types of Backup - Incremental backup

Let’s see what happens when you restore.  When you restore data, you’ll need to restore the full backup first, and then layer each incremental backup on top of that, in order.

If you wish to get back the latest copy of data, it’ll take you 80 mins to restore: that’s 40 + 20 + 20.  So, not great from a restore standpoint.
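
Here is a small Python sketch of that chain, again as a toy model. The helpers `changed_since` and `restore_chain` are hypothetical names for this post, and the backups here record only new and changed files, nothing else.

```python
MINUTES_PER_FILE = 10  # from the example

day1 = {"A": "A", "B": "B", "C": "C", "D": "D"}
day2 = {"A": "A", "B": "B1", "C": "C", "D": "D", "E": "E"}
day3 = {"A": "A", "B": "B2", "C": "C1", "E": "E"}


def changed_since(previous, current):
    """Files in `current` that are new or differ from their version in `previous`."""
    return {name: ver for name, ver in current.items() if previous.get(name) != ver}


# One full backup, then each incremental compares against the previous day's state.
chain = [
    dict(day1),                 # Day 1 full: A, B, C, D
    changed_since(day1, day2),  # Day 2 incremental: B1, E
    changed_since(day2, day3),  # Day 3 incremental: B2, C1
]


def restore_chain(chain):
    """Replay the full backup, then overlay each incremental in order."""
    restored, minutes = {}, 0
    for backup in chain:
        restored.update(backup)
        minutes += len(backup) * MINUTES_PER_FILE
    return restored, minutes


restored, minutes = restore_chain(chain)
print(minutes)           # 80
print(sorted(restored))  # ['A', 'B', 'C', 'D', 'E']
```

Notice that File B gets restored three times (B, then B1, then B2), which is exactly the wasted effort behind the 80 mins. In this simplified model the backups don’t record deletions either, so D comes back too.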

So, while we have been able to improve backup speeds progressively, we have traded off restore times in each of the above cases.

It is for this reason that traditional backup strategies recommend doing a full backup at regular intervals: weekly, monthly, quarterly, yearly, etc.  The idea is to ensure that you’re able to keep restore times in check.  If you’re able to start the restore from a recent full backup, then the number of subsequent backups to restore and overlay on top of it is limited, thus saving time.

But modern, enterprise-class backup technology has progressed further than this – and will allow you to have the best of both worlds.  Fast incremental backups and fast restores.  The secret is something called cataloging.

The Magic of Cataloging

A catalog is meta-data, i.e. data about your data.  In each of the above cases, our backups were self-describing: no additional information was required to restore from any of those backups.

But meta-data describes data.  It can contain interesting information about file versions, about where on the backup media each file is kept, etc., which can dramatically improve restore performance.

Let’s try the same example – this time with cataloging.

So, you have 4 files: A, B, C, and D.  Each of them is about 1GB, and each takes 10 mins to back up.

  • On Day 1, you’ll back up 4GB and it’ll take you 40 mins.
  • On Day 2, let’s say File B changes to B1, and a new file called E is added.
  • When you run the backup on Day 2, it’ll back up just the 2 changed files (B1 and E) and it’ll take you 20 mins.
  • On Day 3, let’s say File B changes again to B2.  File C also changes to C1, and File D gets deleted.
  • When you run the backup on Day 3, it’ll back up just 2 files again (B2 and C1; D is removed, remember?) and it’ll take you 20 mins.

Now, when you restore, the catalog will supply the latest version of each file automatically: A, B2, C1 and E, in 40 minutes.  And you won’t even bother bringing back D, because the catalog knows it was deleted.

The secret?  Cataloging.  Modern backup software keeps a meta-data catalog which remembers which version of which file is present in which backup – and allows a smart way of bringing back just the data that you need on a restore.  This way – one doesn’t have to start with a full backup and layer each incremental on top.  You can get the latest versions of all files in a single go.
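
To make the idea concrete, here is a rough Python sketch of what such a catalog might look like. Real products keep far richer meta-data; the dictionary layout, the backup IDs like `day1-full`, and the `record_backup` helper are all assumptions made up for this illustration.

```python
MINUTES_PER_FILE = 10  # from the example

# Hypothetical catalog: file name -> (latest version, backup that holds it).
catalog = {}


def record_backup(catalog, backup_id, changed, deleted=()):
    """Update the catalog after a backup run: note new versions, drop deleted files."""
    for name, version in changed.items():
        catalog[name] = (version, backup_id)
    for name in deleted:
        catalog.pop(name, None)


record_backup(catalog, "day1-full", {"A": "A", "B": "B", "C": "C", "D": "D"})
record_backup(catalog, "day2-incr", {"B": "B1", "E": "E"})
record_backup(catalog, "day3-incr", {"B": "B2", "C": "C1"}, deleted=["D"])

# A restore reads each file once, straight from the backup that holds its latest version.
for name, (version, backup_id) in sorted(catalog.items()):
    print(f"restore {version} from {backup_id}")

restore_minutes = len(catalog) * MINUTES_PER_FILE
print(restore_minutes)  # 40
```

Each file is fetched exactly once, from whichever backup holds its newest version, and D is simply absent from the plan.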

When you have the strength of data cataloging working for you, you don’t need to run full backups over and over again.  You can run incremental backups forever.

Types of backup - Cataloging

Synthetic Full backup

A number of backup solutions now also offer a synthesized full backup.  This is usually meant to satisfy archaic backup policies (that are still extant) which dictate that one should have a full backup available each week/month/year etc.

Rather than take the hit of running a full backup each week, month, or year – which is technically unnecessary – modern backup software offers to “synthesize a full” backup for you.  It is the equivalent of running a restore of all your latest file versions – but rather than actually restore the data, it re-records the meta-data as if to show that these files got backed up again.  It is a neat trick that doesn’t require any data movement – but simply adds/updates meta-data records.

Types of backup - Synthetic full backup

But what of real changes that may have occurred on the data source since the last incremental backup?  Not a problem – just couple the synthesized full with an incremental backup that runs just  before the synthesized full, and you’re good to go.
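
As a rough illustration of the meta-data-only trick, the Python sketch below builds a “full backup” record purely from catalog entries. The catalog layout, the `synthesize_full` helper, and the record fields are hypothetical; actual products will structure this differently.

```python
# Hypothetical catalog after Day 3: file -> (version, backup that physically holds the data).
catalog = {
    "A": ("A", "day1-full"),
    "B": ("B2", "day3-incr"),
    "C": ("C1", "day3-incr"),
    "E": ("E", "day2-incr"),
}


def synthesize_full(catalog, backup_id):
    """Build a full-backup record from meta-data only; no file data is read or copied."""
    return {
        "id": backup_id,
        "type": "synthetic-full",
        "files": {
            name: {"version": version, "stored_in": holder}
            for name, (version, holder) in catalog.items()
        },
    }


synthetic = synthesize_full(catalog, "day3-synthetic-full")
print(sorted(synthetic["files"]))            # ['A', 'B', 'C', 'E']
print(synthetic["files"]["B"]["stored_in"])  # day3-incr
```

The new record lists every current file, so it satisfies a policy that asks for a full backup, yet each entry just points at data that earlier backups already hold.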

Know your backups

Backups are a necessary part of any good IT Administrator’s toolkit.  Having a solid backup strategy makes life a lot simpler and allows Administrators to focus on things that need their attention instead of having to be in “crisis” mode all the time.  Understanding how they work makes an IT administrator’s job that much easier.

Thoughts?  Let us know – we always love to hear from readers.

Hopefully, we’ve simplified a few backup and restore concepts as part of this blog.  Please write to us at info@parablu.com with questions or opinions.  We’d love to hear from you.