Archiving

Using WAL-G and pg_dump for archiving and restoration

WAL-G

WAL-G **** is a simple and efficient archival restoration tool for PostgreSQL that simplifies Continuous Archiving and Point-in-Time Recovery (PITR) and can store backups in S3, Google Cloud Storage, Azure, Swift, remote host (via SSH) or local file system.

pg_basebackup can also be used to take base backups of a running PostgreSQL database cluster, but we are using WAL-G.

In order to set up continuous backups, you need to do the following:

  1. Configure wal-g access to external storage.

  2. Enable archiving of logs in Postgres and write the archive command:

    archive_command = 'wal-g wal-push %p'

  3. Make a base backup

    wal-g backup-push $PGDATA

    You can always restore the database from base backup if your database is corrupted.

  4. Use cron jobs to schedule backup process (daily, weekly, etc.).

  5. To start rotating logs schedule this command using cron job:

    wal-g delete retain FULL 30

    This will delete all but 30 latest backups.

Backups integrity:

WAL-G includes "wal-verify" command that checks backup integrity.

You can use cron, Cronjob to schedule "wal-verify" execution.

Pick the JSON output of the command. The format is explained here.

Aidboxdb recovery:

  1. Configure WAL-G access to external storage

  2. Download backup wal-g backup-pull $PGDATA

  3. Configure the wal-g wal-fetch restore command

  4. Start Postgres

Postgres downloads all the missing logs and reads them on start.

In replica mode Postgres operates in "read-only" mode, continues to receive WAL logs and lags minimum one WAL file behind main instance.

Incremental backups

You can configure incremental backups with env variable WALG_DELTA_MAX_STEPS.

So the backup will be faster, but the recovery process will take longer.

What WAL-G doesn't work for?

  • WAL-G is not a replacement of pg_dump. pg_dump is used to create a logical dump of one or several DBs in cluster.

  • WAL-G doesn't schedule backup automatically.

pg_dump

pg_dump is a utility for backing up a PostgreSQL database.

It makes consistent backups even if the database is being used concurrently. pg_dump only dumps a single database. To back up an entire cluster, or to back up global objects that are common to all databases in a cluster, use pg_dumpall

Dumps can be output in SQL script or archive file formats. Script dumps are plain-text files containing SQL commands required to reconstruct the database to the state it was in at the time pg_dump was called.

To restore a database feed the dump file to psql.

The alternative archive file formats must be used with pg_restore **** to restore the database.

pg_dump works best for small databases (< 5 GB) and assumes downtime.

Last updated

Change request #2416: