Whenever source code is involved, backup deserves a mention: it is the last line of defense in any failure event, including outages, human errors, and ransomware. Even if a DevOps team builds and stores its projects on GitLab, a popular git hosting platform, it should be clear about one thing: git itself is not a backup, and it cannot serve as a reliable way to meet shared responsibility and compliance requirements.
Who is responsible for what?
GitLab, like any other cloud hosting provider, operates under the Shared Responsibility Model. What does that mean? GitLab is responsible for keeping the infrastructure its users rely on secure. In other words, GitLab repairs errors, handles server-side software and hardware failures, and recovers its infrastructure after service outages. However, this does not mean that a user’s repositories and metadata will be easily accessible during a disaster, because the security of the data rests on the customer’s shoulders. That is why even the git hosting provider itself advises having a backup in place:
“GitLab doesn’t backup items that aren’t stored on the file system. If you’re using object storage, be sure to enable backups with your object storage provider, if desired.”
GitLab backup as a key to meeting compliance
Every enterprise wants to be sure that its intellectual property is safe and sound. What is the best way to demonstrate that without words? Compliance with global security standards, such as SOC 2 and ISO27K (the ISO/IEC 27000 family). It marks the enterprise as a highly trusted company and enhances its credibility.
To get there, the company has to pass a security audit, which usually requires proof of security, availability, confidentiality, privacy, and processing integrity. Backup is one of the keys that helps the organization provide that proof.
GitLab backup methods to consider
To cover the need to back up its data, every company can take one of two paths: build its own backup or use third-party backup software, like GitProtect. Both paths can lead to easy data recoverability, but each has some points worth mentioning.
Path # 1: Self-managed backup
There are a few ways to perform a self-managed backup of a GitLab environment, though all of them are manual and take time and effort. The company can use the built-in component GitLab provides or write its own script.
GitLab lets its customers use Rake tasks to back up and restore GitLab instances. This approach produces an archive file of the GitLab environment. However, the archive can only be restored to exactly the same GitLab version and type (CE or EE), and only as a whole. The restore may take some time and interrupt development for a while.
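For illustration, the built-in backup could be wrapped in a small script. This is a minimal sketch, assuming a Linux package (Omnibus) installation where the `gitlab-backup create` command wraps the backup Rake task. The helper names `build_backup_command` and `run_backup` are hypothetical, and the `SKIP` value is only an example of excluding components from the archive.

```python
import shlex
import subprocess


def build_backup_command(skip=None, use_sudo=True):
    """Return the argument list that creates a GitLab backup archive.

    Assumption: a Linux package (Omnibus) installation, where
    `gitlab-backup create` is available. `skip` is an optional list of
    components to leave out, passed through the SKIP variable.
    """
    argv = ["sudo"] if use_sudo else []
    argv += ["gitlab-backup", "create"]
    if skip:
        argv.append("SKIP=" + ",".join(skip))
    return argv


def run_backup(skip=None, dry_run=True):
    """Print the command (dry run) or execute it on the GitLab host."""
    argv = build_backup_command(skip)
    if dry_run:
        # Show what would be executed without touching the instance.
        print(shlex.join(argv))
        return 0
    # check=True raises CalledProcessError if the backup command fails.
    return subprocess.run(argv, check=True).returncode
```

Calling `run_backup(skip=["registry"])` only prints the command. Running it for real requires shell access to the GitLab host, so a scheduler such as cron would typically call it there.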
Another option is a self-written script. With this variant, the company is responsible for everything around the backups: the infrastructure, the processes, the backup frequency, and the maintenance. Moreover, the company will need to test those backups, since the main reason to have a backup in place is the ability to recover the data. And here comes the question: can data be recovered fast, without interrupting the company’s business continuity? Yes, if the company’s management foresees such a situation and has its DevOps team prepare a recovery script in advance. Even so, running that script will take time, as all the data will be recovered in bulk.
Both options may seem cost-effective. In the long term, however, they can cost a company a fortune, because a member of its DevOps team will have to write scripts to perform backups or archive the repos on a daily basis, which distracts that employee from their core duties. And what if the company seeks to meet its legal, compliance, or shared responsibility requirements? How many backups will that employee have to perform? The questions remain open…
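A self-written script of this kind might look like the sketch below. It is a minimal example, not a complete backup tool: it assumes `git` is installed and the repository URL is reachable with existing credentials, the function names and the example URL are hypothetical, and it covers only git data, not GitLab metadata such as issues.

```python
import subprocess
from pathlib import Path


def mirror_command(repo_url, backup_root):
    """Return (target_path, argv) to create or refresh a mirror of one repo."""
    # Derive a directory name from the last URL segment, e.g. "project.git".
    name = repo_url.rstrip("/").rsplit("/", 1)[-1]
    if not name.endswith(".git"):
        name += ".git"
    target = Path(backup_root) / name
    if target.exists():
        # An existing mirror only needs its refs refreshed.
        argv = ["git", "-C", str(target), "remote", "update", "--prune"]
    else:
        # --mirror copies all refs (branches, tags, notes) into a bare repo.
        argv = ["git", "clone", "--mirror", repo_url, str(target)]
    return target, argv


def backup_and_verify(repo_url, backup_root):
    """Mirror the repository, then check the integrity of the copy."""
    target, argv = mirror_command(repo_url, backup_root)
    subprocess.run(argv, check=True)
    # A backup that was never checked is only a hope: verify the object store.
    subprocess.run(["git", "-C", str(target), "fsck", "--full"], check=True)
    return target
```

For example, `backup_and_verify("https://gitlab.example.com/group/project.git", "/var/backups/git")` would create or update `/var/backups/git/project.git`. Scheduling, credentials, and the restore script are still left to the team.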
Path # 2: Third-party backup software
This path may seem expensive at first. In the long term, though, the automated backup plans that backup vendors usually provide can seriously reduce the cost and time the DevOps team spends on GitLab backup, letting the team focus on its core duties. Besides automation, such software provides features that increase the resilience of a company’s data to any failure, the so-called backup best practices, which include unlimited retention, encryption, replication, ransomware protection, and other turnkey features.
Another advantage is recovery. Backup vendors, like GitProtect, usually foresee different scenarios and, consequently, guarantee that data recovery is fast and doesn’t interrupt the development process. They achieve this through different recovery methods: granular recovery of only selected data, point-in-time restore, data migration between platforms (e.g. from GitLab to GitHub or Bitbucket), restore to the same or a new GitLab account, restore to a local device, restore between SaaS and self-hosted accounts, and many more. In effect, software like this aims to ensure data recoverability in any potential data loss or downtime scenario.
GitLab backup best practices
The main task of backup is to give the company peace of mind and keep its work running smoothly, so that its team can work continuously and without interruption, whatever the threat. Building a reliable backup strategy is therefore essential. But what should a reliable backup plan include, regardless of whether it is self-managed or provided by a backup vendor?
First, but not least, metadata. A complete GitLab backup should contain not only repositories but also metadata, including wikis, deploy keys, labels, issues, LFS, etc.
Then, there should be advanced protection features that help resist ransomware attacks. These include encryption in flight and at rest, compression, WORM-compliant storage, a password manager, and keeping copies in non-executable form. In that case, even if malicious code ends up in the company’s backed-up data, an attacker won’t be able to execute it and spread it across the storage.
Another valuable feature is long-term retention. In general, git hosting providers do not offer long-term retention; GitLab usually stores the user’s data for up to 90 days. Yet to meet legal and security compliance requirements, some companies need to store their backups for much longer periods, even years, for example to archive old, unused repositories for future reference.
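In a self-managed setup, a retention policy comes down to deciding which copies are old enough to delete. The sketch below shows one possible way to express that rule. The function name, the backup names, and the dates are made up for illustration, and the retention period is a parameter the company would set from its own compliance requirements.

```python
from datetime import datetime, timedelta


def expired_backups(backups, retention_days, now):
    """Return the names of backups older than the retention period.

    `backups` maps a backup name to its creation time. Anything created
    before `now - retention_days` is considered expired.
    """
    cutoff = now - timedelta(days=retention_days)
    return sorted(name for name, created in backups.items() if created < cutoff)


# Hypothetical inventory of backup archives.
inventory = {
    "repo-2020-01-01.tar.gz": datetime(2020, 1, 1),
    "repo-2023-12-25.tar.gz": datetime(2023, 12, 25),
}
today = datetime(2024, 1, 1)

# With a one-year retention only the 2020 archive is expired.
print(expired_backups(inventory, retention_days=365, now=today))
```

Passing `now` explicitly keeps the rule easy to test. A longer `retention_days` value, such as several years, simply moves the cutoff further back.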
Storage is another aspect worth paying attention to. A really secure backup plan is achievable when the enterprise follows the 3-2-1 backup rule: it keeps at least 3 copies in 2 different destinations, and one of them is offsite storage. The ability to assign multiple storage destinations with a backup replication option is therefore a reasonable requirement.
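The 3-2-1 rule is simple enough to check automatically. The sketch below is one possible way to model it, with a hypothetical `BackupCopy` record and made-up destination names. It treats "destination" as a plain label, as this article describes the rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    destination: str  # label of the storage holding this copy (hypothetical)
    offsite: bool     # True if the storage is outside the primary site


def meets_3_2_1(copies):
    """Check the rule: at least 3 copies, 2 destinations, 1 offsite copy."""
    destinations = {copy.destination for copy in copies}
    has_offsite = any(copy.offsite for copy in copies)
    return len(copies) >= 3 and len(destinations) >= 2 and has_offsite


plan = [
    BackupCopy("local-nas", offsite=False),
    BackupCopy("local-nas", offsite=False),
    BackupCopy("cloud-bucket", offsite=True),
]
print(meets_3_2_1(plan))  # three copies, two destinations, one offsite
```

A check like this could run after every backup job, so a missing replica is noticed before a restore is needed.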
Using multiple storage destinations raises the question of storage capacity and the need to reduce the amount of data kept in each one. How can that be achieved? Compression is the answer.
Information rules the world. The source code a company develops is unique information that earns that company money. Protecting it properly and monitoring its security is therefore every company’s responsibility, while the way to do so is entirely the company’s choice. What the enterprise should keep in mind is the ability to recover the data: in the best scenario, when its backup works, it wins; in the worst one, when recovery is delayed or impossible, it can lose time, budget, and reputation.
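As a rough illustration of compression in a self-managed setup, the sketch below packs a directory into a gzip-compressed tar archive using Python's standard `tarfile` module. The function name and paths are hypothetical. Git already compresses its own objects, so the savings on repository data will vary.

```python
import tarfile
from pathlib import Path


def compress_directory(source_dir, archive_path):
    """Pack `source_dir` into a gzip-compressed tar archive.

    Returns the size of the resulting archive in bytes. The archive should
    be written outside `source_dir` so it does not include itself.
    """
    source = Path(source_dir)
    with tarfile.open(str(archive_path), "w:gz") as tar:
        # arcname keeps only the directory name, not the full absolute path.
        tar.add(str(source), arcname=source.name)
    return Path(archive_path).stat().st_size
```

For example, `compress_directory("/var/backups/git/project.git", "/var/backups/archives/project.tar.gz")` would produce a single archive that can then be replicated to the other storage destinations.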