Planning Your Data Protection

SHIELD can perform PostgreSQL backups in a few different ways, depending on your network throughput, access control, and personal preferences.

Local Agent

The simplest method is to run the SHIELD agent on the same box as the postmaster process, and use the local PostgreSQL socket to connect to the database server.

If you want to protect more than one database host, you can install the agent on each. They will all then register with the SHIELD instance, with different identities, and you can configure backup jobs for each of them.

Remote Agent

You can also run the SHIELD agent somewhere else, and configure your backup jobs to connect to the PostgreSQL host(s) over TCP, using either their public IPs, or (if you have configured them) internal Linode IPs.

This can be useful if you'd rather not load additional software on your database host(s). You can also reuse the external agent to protect multiple PostgreSQL instances, since your backup jobs will specify the IP address of each.

Setting up a Dedicated PostgreSQL DB User (optional)

While you can get by using either the superuser account (especially over the local UNIX domain socket) or the same account you use from your applications, you may want to consider setting up a specific, less privileged database user for performing backups via SHIELD.

To create such a user, you can use the following SQL commands, from the psql terminal shell:

root@localhost:~# psql
psql (12.2 (Debian 12.2-2.pgdg100+1))
Type "help" for help.

postgres=# CREATE ROLE backups WITH PASSWORD 'its-a-sekrit';
CREATE ROLE
postgres=# GRANT SELECT, INSERT ON ALL TABLES IN SCHEMA public TO backups;
GRANT

The above privileges are representative of normal use cases. Your mileage may vary, and you may need to GRANT additional privileges to suit your particular configuration.

System Prerequisites

SHIELD uses native database tooling to perform backup and restore operations against PostgreSQL servers. For this to work, you will need to install the following tools, matched to the version of PostgreSQL you are running:

  1. pg_dump for dumping structure and data from a single database.
  2. pg_dumpall for enumerating all databases and then dumping their structure and data via pg_dump.
  3. psql for replaying the dumped SQL commands to the server, upon restore.

Configuring Backup Jobs

With PostgreSQL, you can constrain how much or how little of the full set of databases you protect. For most use cases, you'll want to backup all of the databases, but you can limit your export to a single database.

It's entirely up to you.

Authentication

Three parameters govern the authentication to your PostgreSQL server: PostgreSQL Host, PostgreSQL Username, and PostgreSQL Password.

You are encouraged to set up a dedicated user to perform backup / restore tasks as. It helps with access control, compliance with least-privilege policies, and auditing.

Backing up All Databases

The default configuration of the PostgreSQL backup plugin is to protect all databases found, and include all of the tables, their structure, and data, in each snapshot.

You dont' have to do anything — just make sure to leave the Database to Backup parameter blank (its default value)!

Backing up a Single Database

If you only want to back up a single database, specify the name of that database in the Database to Backup parameter when you configure your data system. This will cause the PostgreSQL plugin to only emit CREATE TABLE and INSERT INTO statements for that particular database.

Note: this parameter has no bearing on restores; SHIELD will restore all databases found in the snapshot archive, regardless of what you set for Database to Backup.

Restoring PostgreSQL from a Snapshot

PostgreSQL SHIELD snapshots consist of plain SQL statements for recreating and repopulating databases and tables. These are replayed during restore operations by the psql utility.

It is worth noting that psql will run whatever SQL statements are in the snapshot! If you run a full backup, including all databases, and then reconfigure the data system to specify a single database, the archive created before the reconfiguration will still have all databases — more importantly, it will have all of the accompanying DROP TABLE statements.

Performance Notes

Extracting all of the data out of busy PostgreSQL server can severly impact responsiveness, availability, and fitness of the database host if you are not careful. Most of the point of having a relational database is to provide accurate and fast sub-selection of the whole data set, and PostgreSQL is tuned to handle this workload.

If running backups against your primary database server proves to be too detrimental, you may want to spin up a read replica, and run your backup operations against that. SHIELD has specific support for this approach.

With a read replica, one host acts as the primary database server; it processes writes to the database (INSERT, UPDATE, and DELETE queries). For every change to the underlying data, the primary database writes a record to its binlog, which the read replica reads constantly. When new binlog entries appear, the replica makes the appropriate changes in a copy of the data set.

In a data protection setting, this allows you to point SHIELD at the read replica for backup operations, and the primary database server for restores. If the read replica experiences slowdowns due to the load of performing a snapshot, database clients won't be impacted. Instead, replication lags a little, and when the backup job finishes, the replica will catch back up.

To enable this, you'll need to ensure that the same user credentials work against both your primary database server and your read replica, and then set the PostgreSQL Read Replica parameter to the hostname / IP, and TCP port of the read replica.

Configuration Reference

This section details all available configuration parameters for the PostgreSQL data protection plugin.

PostgreSQL Host
The hostname or IP address of your PostgreSQL server.
PostgreSQL Username
Username to authenticate to PostgreSQL as.
PostgreSQL Password
Password to authenticate to PostgreSQL as.
Database to Backup
Limit scope of the backup to include only this database. By default, all databases will be backed up.
Additional pg_dump options
You can tune pgdump (which performs the backup) by specifying additional options and command-line arguments.
PostgreSQL Read Replica
An optional PostgreSQL replica (possibly readonly) to use for backups, instead of the canonical host. Restore operations will still be conducted against the real database host.