Sunday 20 May 2018

GitLab – Recovering from a corrupted database record

Introduction

One of the most terrifying thing for a Database administrator to come across is a corrupted database.
A database corruption can be defined as a problem associated with the improper storage of the 0’s and 1’s that you need to store on the disk in order to access your data.
Despite the usefulness of the relational databases, they are prone to corruption, which results in the inaccessibility of some or all the data in the database.
More than 95% of corruption happens due to hardware failure. Among the remaining 5% we have,
  • Bugs in software itself
  • Abrupt system shutdown while the database is opened for writing or reading.
  • Changes in SQL account
  • Virus infection
  • Upgrading software also at times, results in the corruption of the database
This blog addresses a similar issue associated with corrupted databases and how we identified and solved the problem.

Problem understanding

Our organization has an on-premise GitLab server. We have configured GitLab to take daily backups, but due to database corruption the backups started failing with the following error.
The plan was to move the GitLab installation to a new server but since we could not take a backup of our existing installation, we were stuck.

Next Steps

In case of database corruption, usually the loss is limited to the last action of one user, i.e. a single change to data. When a user starts to change data and the change is interrupted — for example, because of network service loss or any other reason, then the database file is marked as corrupted. The file can be repaired, but sometimes we might lose data after repair.
We need to be extremely careful with the live instance and the live data. Experimentation should be done on a virtual machine before performing any actions on the live instance.
The following steps were taken by us,
  1. Virtual environment setup
  2. Identify corrupted record(s)
  3. Database backup without corrupted record(s)
  4. Import database on virtual machine
  5. Fix database relationship and constraints
  6. Export database from virtual environment
  7. Import database on live instance
  8. Upgrade GitLab
  9. Finally run tests

Virtual environment setup

Generally, having a virtual environment gives us the freedom to experiment. Its a good practice to run trials on a virtual machine until you are satisfied with the results. This gives us the confidence that things will work as expected on live instance too.
Now lets create a virtual environment of the live instance. Only the applications which are directly or indirectly related to your database should be installed.
  • Install Vagrant
  • Install Debian Stretch
  • Install PostgreSQL
  • Install phpPgAdmin (Optional) – Web-based administration tool for PostgreSQL
GitLab uses PostgreSQL database.

Install Vagrant

Go to Vagrant’s official site and download the deb file for Debian.
You can find the link here.

Install Debian stretch64

Document on how to install Debian Stretch can be found here.

Install PostgreSQL

Install phpPgAdmin

This step is optional, use any of your favorite PostgreSQL GUI tools.

Identify corrupted record(s)

Login into the live instance.
Become a gitlab-psql user:
Connect to the PostgreSQL CLI:
gitlabhq_production  is the default database.
If you already know about the records which are corrupted then you can skip this step. Otherwise write a small script which selects all the records from each table, select query will fail if the table has corrupted data.
Note that we already knew that the corrupted records belong to merge_request_diffs  table by looking at the backup command error message.
For example if the corrupted record is in the merge_request_diffs table, then we can write a small script to identify the row which is corrupted:
The above script basically select all the rows one by one, but will fail to select the row which is corrupted and will print the row index. Assume that selection failed at index 2045, this tells us that the record at index2045 is corrupted.
Repeat this process for all the tables if you don’t know exactly which table has corrupted records.

Database backup without corrupted record(s)

We were fine with losing these corrupted records, so the next step would be to take a backup without those records.
Select the record, and get it’s row / tuple ID.
Note that if all the columns were selected the query was failing, probably because the corrupted data is in one of the columns.
base_commit_sha – 8250ee35a6a8d0cd60b5056c7ac94736a048e7a5
head_commit_sha – b3f08056e93f281c2905f6185395787ae7ffaada
merge_request_id – 1091
id – 1933
Create a copy of merge_request_diffs table without the corrupted record i.e. skip id 1933.
This will create a copy of  merge_request_diffs  table without the corrupted record.
The primary key and the foreign key of  merge_request_diffs are not copied to merge_request_diffs_copy.
Take the backup of the entire database without table merge_request_diffs. Backups will fail if there are corrupted records.
The above command will skip  merge_request_diffs  table while taking backup.

Import database on virtual machine

We will setup the backup we took from the previous step on the Vagrant machine.

Create a test database

Create a new database to fix the issues on our vagrant machine.

GitLab owner

Create a PostgreSQL role as  gitlab  to import the  gitlab  database.
Since the ownership of all the tables is given to  gitlab  role in the backup which was taken earlier.

Install  pg_trgm  extension

GitLab uses pg_trgm  few extensions which are not installed by default.
Run the following command to install  pg_trgm  extension:

Import data to  gitlab_test

Use  psql  command to import from the dump.
Note that  git_lab_db_dump  is the name of the file which contains the database backup from the live instance.

Fix database relationship and constraints

Fixes to database should happen on the vagrant machine.

Get constraints of corrupted table

Go back to the GitLab server and get the table schema of the corrupted table.
Use the below command to get the table schema:
Sample schema is shown below:
Make sure that  merge_request_diffs_copy  table has the same constraints as the originalmerge_request_diffs  table.
Use the above queries to add indexes and foreign keys.
Be very careful with the queries, especially with SEQUENCES,
START WITH 11421  keyword determines what should be the auto increment primary key value for the next record which will be inserted.

Rename table name

Rename the table name from  merge_request_diffs_copy  to merge_request_diffs.
So now, we have the database backup without any corrupted records.

Testing

Install the same version of GitLab on vagrant machine and use  gitlab_test  database.
Trigger the backup script and confirm that the backups are now being taken properly.

Export database from virtual environment

Take the backup from your vagrant instance

Import database on live instance

Drop the database from live server

Create  gitlabhq_production  database again on live server

Install pg_trgm extension

GitLab uses  pg_trgm  extension which will not be installed by default with PostgreSQL.

Import the data to  gitlabhq_production

Use  psql  command to import all the records from the dump into  gitlabhq_production  database.

Upgrade GitLab

You can find document here on how to update GitLab.
Note that the upgrade process will automatically run the migrations over the database unless you decide to do it manually.

Finally run tests

After a successful upgrade of GitLab server we should run few tests on it to verify that everything works fine.
Verify that the server is up and running.
Confirm if the database integration is done properly.
Test and verify if the database backup script is able to take backups properly.

References

References were taken from the following blogs, forum posts and Wiki’s.

No comments:

Post a Comment

Voice of the Customer (VOC) – Dynamics 365 CRM

Tags:   Dynamics 365   Dynamics CRM   feedback   survey   VOC   Voice of customer Introduction To Voice of the Customer For any bus...