Manage Replication and Failover on a PostgreSQL 16 Cluster Using repmgr in 02 Nodes on Ubuntu 24.04 LTS


inchirags@gmail.com Chirag's PostgreSQL DBA Tutorial https://www.chirags.in
*********************************************************************************************************
Manage Replication and Failover on a PostgreSQL 16 Cluster Using repmgr in 02 Nodes on Ubuntu 24.04 LTS
*********************************************************************************************************
Static IP addresses for all nodes.
+------------+------------------+---------------+
| Role | IP Address | Hostname |
+------------+------------------+---------------+
| Primary DB | 192.168.224.128 | node1 |
+------------+------------------+---------------+
| Replica DB | 192.168.224.129 | node2 |
+------------+------------------+---------------+
What Is repmgr?
Repmgr is an open source suite for managing PostgreSQL HA clusters. It closely integrates with PostgreSQL to configure a primary node, clone replica nodes, monitor the HA cluster, and perform a failover. Repmgr supports a single read-write primary server and one or more read-only standby nodes, also known as replicas. This guide focuses on repmgr because it is an efficient and robust solution with long-standing popularity in the database management area.
The complete list of steps required to provision PostgreSQL and repmgr follows this sequence:
1. Install PostgreSQL on both nodes.
2. Access and secure PostgreSQL on both nodes.
3. Install repmgr on both nodes.
4. Enable SSH connectivity between the nodes.
5. Create a repmgr user on the primary node.
6. Configure the database replication settings in the PostgreSQL configuration file on the primary.
7. Configure the PostgreSQL authentication settings on the primary.
8. Configure the repmgr HA cluster settings on both nodes.
9. Register the primary server.
10. Clone and register the standby server.
Steps:
1. Install PostgreSQL on both nodes.
Ensure the server is up to date. Use the apt update command to install any updates. Reboot the server if necessary.
All Nodes as sudo User:
sudo apt update
sudo apt upgrade -y
Install PostgreSQL:
All Nodes as sudo User:
sudo apt -y install postgresql postgresql-contrib
Ensure the PostgreSQL service starts automatically.
sudo systemctl start postgresql
sudo systemctl enable postgresql
Check the status:
sudo systemctl status postgresql
2. Access and secure PostgreSQL on both nodes.
Change the password for the postgres Linux account:
All Nodes as sudo User:
sudo passwd postgres
admin@123
Choose a unique strong password and store it in a secure location.
Output:
passwd: password updated successfully
Switch to the postgres user to access PostgreSQL Prompt:
All Nodes as sudo User:
sudo -u postgres psql
SELECT version();
Output:
---------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 16.9 (Ubuntu 16.9-0ubuntu0.24.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0, 64-bit
(1 row)
Add a password for the postgres database account using the following command.
ALTER USER postgres WITH PASSWORD 'admin@123';
Output:
ALTER ROLE
Exit:
\q
3. Install repmgr on both nodes.
All Nodes as sudo User:
sudo apt-get install -y postgresql-16-repmgr
All Nodes as sudo User:
sudo systemctl enable repmgrd
sudo systemctl start repmgrd
4. Enable SSH connectivity between the nodes.
On the primary node, switch to the postgres user:
**Primary Node as sudo User:
su - postgres
Generate an SSH key:
Primary Node as postgres User:
ssh-keygen -t rsa -b 4096
Copy the key to each standby node in the HA cluster. In the following example, replace 192.168.224.129 with the IP node of your standby node:
Primary Node as postgres User:
ssh-copy-id postgres@192.168.224.129
SSH can now be used to access a standby node without a password from the primary. The following command, when run from the postgres account on the primary, places the user in the postgres user directory on the standby:
Primary Node as postgres User:
ssh 192.168.224.129
When done, type exit to log out of the secondary server.
exit
Repeat these steps to create a key on each standby node share it with the primary:
Standby Node as sudo User:
su - postgres
Standby Node as postgres User:
ssh-keygen -t rsa -b 4096
Standby Node as postgres User:
ssh-copy-id postgres@192.168.224.128
Output:
Number of key(s) added: 1
Now try logging into the machine
ssh postgres@192.168.224.128
and check to make sure that only the key(s) you wanted were added.
-----------snapshot---------
5. Create a repmgr user on the primary node (192.168.224.128).
To allow repmgr to manage PostgreSQL data replication, create a repmgr user on the primary server. Then create a new database for the repmgr data. The commands in this section must only be executed on the primary server. Do not create any database entries on the standby because this interferes with replication.
While logged in as the postgres account, create the repmgr user:
Primary Node as postgres User:
createuser -s repmgr
Now create the repmgr database, with the repmgr user as the owner:
Primary Node as postgres User:
createdb repmgr -O repmgr
6. Configure the database replication settings in the PostgreSQL configuration file on the primary.
To configure the replication settings in postgresql.conf, follow these steps. This file must only be changed on the primary node. Repmgr copies it to the standby nodes in a later configuration stage.
Edit the /etc/postgresql/16/main/postgresql.conf file as the postgres user:
Primary Node as postgres User:
nano /etc/postgresql/16/main/postgresql.conf
Change the settings in the following file sample to adjust the replication settings. In some cases, these lines only have to be uncommented. These lines are found in non-contiguous locations in the file. Use the text editor search utility (CTRL+W in nano)to find them.
File: /etc/postgresql/16/main/postgresql.conf
listen_addresses = '*'
shared_preload_libraries = 'repmgr'
wal_level = replica
archive_mode = on
archive_command = '/bin/true'
max_wal_senders = 10
max_replication_slots = 10
hot_standby = on
wal_log_hints = on
Note:
It is easiest to set listen_addresses to *. However, in some networks this might pose additional security concerns. For extra security, set this value to a comma-separated list consisting of localhost and the IP addresses of all nodes in the HA cluster.
When done, press CTRL+X, followed by Y then Enter to save the file and exit nano.
7. Configure the PostgreSQL authentication settings on the primary.
Users must also add client authentication capabilities to the pg_hba.conf file. This file tells PostgreSQL what type of connections to trust and how to authenticate them. Add entries to trust repmgr connections from both the primary and standby servers. Edit this file on the primary server only.
Open the /etc/postgresql/16/main/pg_hba.conf file for editing as the postgres user:
Primary Node as postgres User:
nano /etc/postgresql/16/main/pg_hba.conf
Add the below details:
local replication repmgr trust
host replication repmgr 127.0.0.1/32 trust
host replication repmgr 192.168.224.128/32 trust
host replication repmgr 192.168.224.129/32 trust
local repmgr repmgr trust
host repmgr repmgr 127.0.0.1/32 trust
host repmgr repmgr 192.168.224.128/32 trust
host repmgr repmgr 192.168.224.129/32 trust
When done, press CTRL+X, followed by Y then Enter to save the file and exit nano.
Restart the PostgreSQL process on the primary using an account with sudo privileges:
Primary Node as sudo User:
sudo systemctl restart postgresql
Ensure there are no error messages and PostgreSQL is still active:
Primary Node as sudo User:
sudo systemctl status postgresql
Press the "q" key to exit the systemctl status output.
Allow Firewall port 5432 in both the nodes:
As sudo User in both the nodes:
sudo ufw allow 5432/tcp
Ensure the primary database is accessible from the standby nodes.
Standby Node as postgres User:
psql 'host=192.168.224.128 user=repmgr dbname=repmgr connect_timeout=2'
The PostgreSQL prompt should appear, indicating the repmgr database context:
Output:
psql (16.8 (Ubuntu 16.8-0ubuntu0.24.04.1))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
Type "help" for help.
repmgr=#
When done, type quit or exit to leave the PostgreSQL prompt, then type exit again to return to the terminal shell as your standard Linux user with sudo privileges.
8. Configure the repmgr HA cluster settings on both nodes.
Create log folder in both the nodes as sudo User:
sudo mkdir -p /var/log/postgresql/repmgr.log
sudo touch /var/log/postgresql/repmgr.log
sudo chown postgres:postgres /var/log/postgresql/repmgr.log
Primary Node as sudo User:
sudo nano /etc/repmgr.conf
node_id=1
node_name=pg1
conninfo='host=192.168.224.128 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/var/lib/postgresql/16/main'
failover=automatic
promote_command='repmgr -f /etc/repmgr.conf standby promote --log-to-file'
follow_command='repmgr -f /etc/repmgr.conf standby follow --log-to-file'
log_file='/var/log/postgresql/repmgr.log'
log_level=DEBUG
primary_unreachable_timeout=5
repmgrd_standby_timeout=10
repmgrd_failover_delay=5
repmgrd_service_start_command='sudo systemctl start repmgrd'
repmgrd_service_stop_command='sudo systemctl stop repmgrd'
monitoring_history=yes
When done, press CTRL+X, followed by Y then Enter to save the file and exit nano.
Create an etc/repmgr.conf file on each standby node:
Standby Node as sudo User:
sudo nano /etc/repmgr.conf
node_id=2
node_name=pg2
conninfo='host=192.168.224.129 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/var/lib/postgresql/16/main'
failover=automatic
promote_command='repmgr -f /etc/repmgr.conf standby promote --log-to-file'
follow_command='repmgr -f /etc/repmgr.conf standby follow --log-to-file'
log_file='/var/log/postgresql/repmgr.log'
log_level=DEBUG
primary_unreachable_timeout=5
repmgrd_standby_timeout=10
repmgrd_failover_delay=5
repmgrd_service_start_command='sudo systemctl start repmgrd'
repmgrd_service_stop_command='sudo systemctl stop repmgrd'
monitoring_history=yes
When done, press CTRL+X, followed by Y then Enter to save the file and exit nano.
9. Register the primary server.
Primary Node as sudo User:
su - postgres
Primary Node as postgres User:
repmgr -f /etc/repmgr.conf primary register
Output:
INFO: connecting to primary database...
NOTICE: attempting to install extension "repmgr"
NOTICE: "repmgr" extension successfully installed
NOTICE: primary node record (ID: 1) registered
The repmgr utility confirms the primary is registered:
Confirm the primary is running using the cluster show command.
Primary Node as postgres User:
repmgr -f /etc/repmgr.conf cluster show
A node with an ID of 1 has the role of primary and a status of running.
Output:
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------------
1 | pg1 | primary | * running | | default | 100 | 1 | host=192.168.224.128 user=repmgr dbname=repmgr connect_timeout=2
10. Clone and register the standby server.
Standby Node as sudo User:
sudo systemctl stop postgresql
Standby Node as sudo User:
su - postgres
Standby Node as postgres User:
repmgr -h 192.168.224.128 -U repmgr -d repmgr -f /etc/repmgr.conf standby clone --copy-external-config-files --dry-run
Output:
NOTICE: destination directory "/var/lib/postgresql/16/main" provided
INFO: connecting to source node
DETAIL: connection string is: host=192.168.224.128 user=repmgr dbname=repmgr
DETAIL: current installation size is 29 MB
INFO: "repmgr" extension is installed in database "repmgr"
WARNING: target data directory appears to be a PostgreSQL data directory
DETAIL: target data directory is "/var/lib/postgresql/16/main"
HINT: use -F/--force to overwrite the existing data directory
INFO: replication slot usage not requested; no replication slot will be set up for this standby
INFO: parameter "max_wal_senders" set to 10
NOTICE: checking for available walsenders on the source node (2 required)
INFO: sufficient walsenders available on the source node
DETAIL: 2 required, 10 available
NOTICE: checking replication connections can be made to the source server (2 required)
INFO: required number of replication connections could be made to the source server
DETAIL: 2 replication connections required
WARNING: data checksums are not enabled and "wal_log_hints" is "off"
DETAIL: pg_rewind requires "wal_log_hints" to be enabled
NOTICE: external configuration files detected, checking SSH connection to host "192.168.224.128"
INFO: SSH connection to host "192.168.224.128" succeeded
NOTICE: standby will attach to upstream node 1
HINT: consider using the -c/--fast-checkpoint option
INFO: would execute:
pg_basebackup -l "repmgr base backup" -D /var/lib/postgresql/16/main -h 192.168.224.128 -p 5432 -U repmgr -X stream
INFO: all prerequisites for "standby clone" are met
-----------------------
If the dry run is successful, run the command again without the --dry-run option:
Standby Node as postgres User:
repmgr -h 192.168.224.128 -U repmgr -d repmgr -f /etc/repmgr.conf standby clone --copy-external-config-files --force
Output:
NOTICE: destination directory "/var/lib/postgresql/16/main" provided
INFO: connecting to source node
DETAIL: connection string is: host=192.168.224.128 user=repmgr dbname=repmgr
DETAIL: current installation size is 29 MB
INFO: replication slot usage not requested; no replication slot will be set up for this standby
NOTICE: checking for available walsenders on the source node (2 required)
NOTICE: checking replication connections can be made to the source server (2 required)
WARNING: data checksums are not enabled and "wal_log_hints" is "off"
DETAIL: pg_rewind requires "wal_log_hints" to be enabled
NOTICE: copying external configuration files from upstream node "192.168.224.128"
INFO: rsync command line:
rsync --archive --checksum --compress --progress --rsh=ssh --delete --checksum 192.168.224.128:/etc/postgresql/16/main/postgresql.conf /etc/postgresql/16/main/postgresql.conf
receiving incremental file list
postgresql.conf
29,957 100% 28.57MB/s 0:00:00 (xfr#1, to-chk=0/1)
INFO: rsync command line:
rsync --archive --checksum --compress --progress --rsh=ssh --delete --checksum 192.168.224.128:/etc/postgresql/16/main/pg_hba.conf /etc/postgresql/16/main/pg_hba.conf
receiving incremental file list
pg_hba.conf
6,484 100% 6.18MB/s 0:00:00 (xfr#1, to-chk=0/1)
INFO: rsync command line:
rsync --archive --checksum --compress --progress --rsh=ssh --delete --checksum 192.168.224.128:/etc/postgresql/16/main/pg_ident.conf /etc/postgresql/16/main/pg_ident.conf
receiving incremental file list
WARNING: directory "/var/lib/postgresql/16/main" exists but is not empty
NOTICE: -F/--force provided - deleting existing data directory "/var/lib/postgresql/16/main"
NOTICE: starting backup (using pg_basebackup)...
HINT: this may take some time; consider using the -c/--fast-checkpoint option
INFO: executing:
pg_basebackup -l "repmgr base backup" -D /var/lib/postgresql/16/main -h 192.168.224.128 -p 5432 -U repmgr -X stream
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: pg_ctl -D /var/lib/postgresql/16/main start
HINT: after starting the server, you need to register this standby with "repmgr standby register"
--------------
Repeat the steps above to clone any other standby nodes.
On the standby node, make a further adjustment to postgresql.conf:
Standby Node as postgres User:
nano /etc/postgresql/16/main/postgresql.conf
data_directory = '/var/lib/postgresql/16/main'
When done, press CTRL+X, followed by Y then Enter to save the file and exit nano.
Start PostgreSQL on all standby nodes:
Standby Node as sudo User:
sudo systemctl start postgresql
Verify that PostgreSQL has a status of active:
Standby Node as sudo User
sudo systemctl status postgresql
Press "q" to close the output and return to the terminal shell.
Check for replication is active, access the PostgreSQL shell on the primary node as the postgres user:
Primary Node as postgres User:
psql
Primary Node as postgres User in PostgreSQL Shell:
SELECT * FROM pg_stat_replication;
Output:
pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start
| backend_xmin | state | sent_lsn | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state | reply_time
------+----------+---------+------------------+-----------------+-----------------+-------------+-------------------------------+--------------+-----------+-----------+-----------+-----------+------------+-----------+-----------+------------+---------------+------------+-------------------------------
5331 | 16388 | repmgr | pg2 | 192.168.224.129 | | 53056 | 2025-03-14 14:59:55.601531+00 | | streaming | 0/30001F0 | 0/30001F0 | 0/30001F0 | 0/30001F0 | | | | 0 | async | 2025-03-14 15:00:55.710461+00
(1 row)
Each active standby node should have its own entry. Scan for the following details in the output:
application_name should contain the node_name of the standby.
The client_addr should indicate the IP address of the standby node.
The state should be streaming.
The sync_state is async.
Return to the standby server and log in as the postgres account if not already:
Standby Node as sudo User:
su - postgres
Standby Node as postgres User:
repmgr -f /etc/repmgr.conf standby register
Output:
INFO: connecting to local node "pg2" (ID: 2)
INFO: connecting to primary database
WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID: 1)
INFO: standby registration complete
NOTICE: standby node "pg2" (ID: 2) successfully registered
Repeat this operation on each standby node.
Run the cluster show command on the standby to confirm it is registered:
Standby Node as postgres User:
repmgr -f /etc/repmgr.conf cluster show
Output:
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------------
1 | pg1 | primary | * running | | default | 100 | 1 | host=192.168.224.128 user=repmgr dbname=repmgr connect_timeout=2
2 | pg2 | standby | running | pg1 | default | 100 | 1 | host=192.168.224.129 user=repmgr dbname=repmgr connect_timeout=2
Register all nodes, including the primary and all standby nodes, with repmgrd. The daemon monitors the node and quickly responds to any failures.
Testing a Failover Event
To confirm the HA cluster is working, follow these steps.
On the primary node, access the postgres database as the postgres database user:
Primary Node as postgres User:
psql postgres
Create a new customers table inside the postgres database:
Primary Node as postgres User in PostgreSQL Shell:
CREATE TABLE employees (employee_id int, first_name varchar(100), last_name varchar(100));
INSERT INTO employees (employee_id, first_name, last_name) VALUES
(1, 'Chirag', 'Mahto'),
(2, 'Sanju', 'Mehta');
\dt
The new table appears in the output:
List of relations
Schema | Name | Type | Owner
--------+-----------+-------+----------
public | employees | table | postgres
(1 row)
postgres=# SELECT * FROM employees;
employee_id | first_name | last_name
-------------+------------+-----------
1 | Chirag | Mahto
2 | Sanju | Mehta
(2 rows)
Now, Access PostgreSQL on the standby node:
Standby Node as postgres User:
psql postgres
\dt
The same table appears in the output. The update on the primary is replicated to this node.
List of relations
Schema | Name | Type | Owner
--------+-----------+-------+----------
public | employees | table | postgres
SELECT * FROM employees;
Output:
employee_id | first_name | last_name
-------------+------------+-----------
1 | Chirag | Mahto
2 | Sanju | Mehta
(2 rows)
Primary Node as sudo User:
sudo systemctl stop postgresql
Standby Node as postgres User:
repmgr -f /etc/repmgr.conf cluster show
The new table appears in the output:
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------------
1 | pg1 | primary | - failed | ? | default | 100 | | host=192.168.224.128 user=repmgr dbname=repmgr connect_timeout=2
2 | pg2 | primary | * running | | default | 100 | 2 | host=192.168.224.129 user=repmgr dbname=repmgr connect_timeout=2
Ensure the new primary allows write operations. Create a new table in the postgres database to verify this:
Standby Node as sudo User:
sudo -u postgres repmgr standby promote
psql
Standby Node as postgres User in PostgreSQL Shell
CREATE TABLE employees2 (employee_id int, first_name varchar(100), last_name varchar(100));
Standby Node as postgres User in PostgreSQL Shell
\dt
The new customers2 table is listed alongside the previous table:
List of relations
Schema | Name | Type | Owner
--------+------------+-------+----------
public | employees | table | postgres
public | employees2 | table | postgres
(2 rows)
For any doubts and query, please write on YouTube video comments section.
Note : Flow the Process shown in video.
😉Subscribe and like for more videos:
https://www.youtube.com/@chiragstutorial
💛Don't forget to, 💘Follow, 💝Like, 💖Share 💙&, Comment
Thanks & Regards,
Chitt Ranjan Mahto "Chirag"
_________________________________________________________________________________________
Note: All scripts used in this demo will be available in our website.
Link will be available in description.
Subscribe to my newsletter
Read articles from Chitt Ranjan Mahto (Chirag) directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by
