inchirags@gmail.com Chirag's PostgreSQL DBA Tutorial https://www.chirags.in

*********************************************************************************************************

Manage Replication and Failover on a PostgreSQL 16 Cluster Using repmgr in 02 Nodes on Ubuntu 24.04 LTS

*********************************************************************************************************

https://www.youtube.com/watch?v=p_yvt0jLz4Q

Static IP addresses for all nodes.

+------------+------------------+---------------+
| Role       |  IP Address      | Hostname      |
+------------+------------------+---------------+
| Primary DB |  192.168.224.128 | node1         |
+------------+------------------+---------------+
| Replica DB |  192.168.224.129 | node2         |
+------------+------------------+---------------+

What Is repmgr?

Repmgr is an open source suite for managing PostgreSQL HA clusters. It closely integrates with PostgreSQL to configure a primary node, clone replica nodes, monitor the HA cluster, and perform a failover. Repmgr supports a single read-write primary server and one or more read-only standby nodes, also known as replicas. This guide focuses on repmgr because it is an efficient and robust solution with long-standing popularity in the database management area.

The complete list of steps required to provision PostgreSQL and repmgr follows this sequence:

1. Install PostgreSQL on both nodes.

2. Access and secure PostgreSQL on both nodes.

3. Install repmgr on both nodes.

4. Enable SSH connectivity between the nodes.

5. Create a repmgr user on the primary node.

6. Configure the database replication settings in the PostgreSQL configuration file on the primary.

7. Configure the PostgreSQL authentication settings on the primary.

8. Configure the repmgr HA cluster settings on both nodes.

9. Register the primary server.

10. Clone and register the standby server.

Steps:

1. Install PostgreSQL on both nodes.

Ensure the server is up to date. Use the apt update command to install any updates. Reboot the server if necessary.

All Nodes as sudo User:

sudo apt update

sudo apt upgrade -y

Install PostgreSQL:

All Nodes as sudo User:

sudo apt -y install postgresql postgresql-contrib

Ensure the PostgreSQL service starts automatically.

sudo systemctl start postgresql

sudo systemctl enable postgresql

Check the status:

sudo systemctl status postgresql

2. Access and secure PostgreSQL on both nodes.

Change the password for the postgres Linux account:

All Nodes as sudo User:

sudo passwd postgres

admin@123

Choose a unique strong password and store it in a secure location.

Output:

passwd: password updated successfully

Switch to the postgres user to access PostgreSQL Prompt:

All Nodes as sudo User:

sudo -u postgres psql

SELECT version();

Output:

---------------------------------------------------------------------------------------------------------------------------------

PostgreSQL 16.9 (Ubuntu 16.9-0ubuntu0.24.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0, 64-bit

(1 row)

Add a password for the postgres database account using the following command.

ALTER USER postgres WITH PASSWORD 'admin@123';

Output:

ALTER ROLE

Exit:

\q

3. Install repmgr on both nodes.

All Nodes as sudo User:

sudo apt-get install -y postgresql-16-repmgr

All Nodes as sudo User:

sudo systemctl enable repmgrd

sudo systemctl start repmgrd

4. Enable SSH connectivity between the nodes.

On the primary node, switch to the postgres user:

**Primary Node as sudo User:

su - postgres

Generate an SSH key:

Primary Node as postgres User:

ssh-keygen -t rsa -b 4096

Copy the key to each standby node in the HA cluster. In the following example, replace 192.168.224.129 with the IP node of your standby node:

Primary Node as postgres User:

ssh-copy-id postgres@192.168.224.129

SSH can now be used to access a standby node without a password from the primary. The following command, when run from the postgres account on the primary, places the user in the postgres user directory on the standby:

Primary Node as postgres User:

ssh 192.168.224.129

When done, type exit to log out of the secondary server.

exit

Repeat these steps to create a key on each standby node share it with the primary:

Standby Node as sudo User:

su - postgres

Standby Node as postgres User:

ssh-keygen -t rsa -b 4096

Standby Node as postgres User:

ssh-copy-id postgres@192.168.224.128

Output:

Number of key(s) added: 1

Now try logging into the machine

ssh postgres@192.168.224.128

and check to make sure that only the key(s) you wanted were added.

-----------snapshot---------

5. Create a repmgr user on the primary node (192.168.224.128).

To allow repmgr to manage PostgreSQL data replication, create a repmgr user on the primary server. Then create a new database for the repmgr data. The commands in this section must only be executed on the primary server. Do not create any database entries on the standby because this interferes with replication.

While logged in as the postgres account, create the repmgr user:

Primary Node as postgres User:

createuser -s repmgr

Now create the repmgr database, with the repmgr user as the owner:

Primary Node as postgres User:

createdb repmgr -O repmgr

6. Configure the database replication settings in the PostgreSQL configuration file on the primary.

To configure the replication settings in postgresql.conf, follow these steps. This file must only be changed on the primary node. Repmgr copies it to the standby nodes in a later configuration stage.

Edit the /etc/postgresql/16/main/postgresql.conf file as the postgres user:

Primary Node as postgres User:

nano /etc/postgresql/16/main/postgresql.conf

Change the settings in the following file sample to adjust the replication settings. In some cases, these lines only have to be uncommented. These lines are found in non-contiguous locations in the file. Use the text editor search utility (CTRL+W in nano)to find them.

File: /etc/postgresql/16/main/postgresql.conf

        listen_addresses = '*'
        shared_preload_libraries = 'repmgr'
        wal_level = replica
        archive_mode = on
        archive_command = '/bin/true'
        max_wal_senders = 10
        max_replication_slots = 10
        hot_standby = on
    wal_log_hints = on

Note:

It is easiest to set listen_addresses to *. However, in some networks this might pose additional security concerns. For extra security, set this value to a comma-separated list consisting of localhost and the IP addresses of all nodes in the HA cluster.

When done, press CTRL+X, followed by Y then Enter to save the file and exit nano.

7. Configure the PostgreSQL authentication settings on the primary.

Users must also add client authentication capabilities to the pg_hba.conf file. This file tells PostgreSQL what type of connections to trust and how to authenticate them. Add entries to trust repmgr connections from both the primary and standby servers. Edit this file on the primary server only.

Open the /etc/postgresql/16/main/pg_hba.conf file for editing as the postgres user:

Primary Node as postgres User:

nano /etc/postgresql/16/main/pg_hba.conf

Add the below details:

local   replication   repmgr                                    trust
host    replication   repmgr            127.0.0.1/32            trust
host    replication   repmgr            192.168.224.128/32      trust
host    replication   repmgr            192.168.224.129/32      trust
local   repmgr        repmgr                                    trust
host    repmgr        repmgr            127.0.0.1/32            trust
host    repmgr        repmgr            192.168.224.128/32      trust
host    repmgr        repmgr            192.168.224.129/32      trust

When done, press CTRL+X, followed by Y then Enter to save the file and exit nano.

Restart the PostgreSQL process on the primary using an account with sudo privileges:

Primary Node as sudo User:

sudo systemctl restart postgresql

Ensure there are no error messages and PostgreSQL is still active:

Primary Node as sudo User:

sudo systemctl status postgresql

Press the "q" key to exit the systemctl status output.

Allow Firewall port 5432 in both the nodes:

As sudo User in both the nodes:

sudo ufw allow 5432/tcp

Ensure the primary database is accessible from the standby nodes.

Standby Node as postgres User:

psql 'host=192.168.224.128 user=repmgr dbname=repmgr connect_timeout=2'

The PostgreSQL prompt should appear, indicating the repmgr database context:

Output:

psql (16.8 (Ubuntu 16.8-0ubuntu0.24.04.1))

SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)

Type "help" for help.

repmgr=#

When done, type quit or exit to leave the PostgreSQL prompt, then type exit again to return to the terminal shell as your standard Linux user with sudo privileges.

8. Configure the repmgr HA cluster settings on both nodes.

Create log folder in both the nodes as sudo User:

sudo mkdir -p /var/log/postgresql/repmgr.log

sudo touch /var/log/postgresql/repmgr.log

sudo chown postgres:postgres /var/log/postgresql/repmgr.log

Primary Node as sudo User:

sudo nano /etc/repmgr.conf

node_id=1
node_name=pg1
conninfo='host=192.168.224.128 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/var/lib/postgresql/16/main'
failover=automatic
promote_command='repmgr -f /etc/repmgr.conf standby promote --log-to-file'
follow_command='repmgr -f /etc/repmgr.conf standby follow --log-to-file'
log_file='/var/log/postgresql/repmgr.log'
log_level=DEBUG
primary_unreachable_timeout=5
repmgrd_standby_timeout=10
repmgrd_failover_delay=5
repmgrd_service_start_command='sudo systemctl start repmgrd'
repmgrd_service_stop_command='sudo systemctl stop repmgrd'
monitoring_history=yes

When done, press CTRL+X, followed by Y then Enter to save the file and exit nano.

Create an etc/repmgr.conf file on each standby node:

Standby Node as sudo User:

sudo nano /etc/repmgr.conf

node_id=2
node_name=pg2
conninfo='host=192.168.224.129 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/var/lib/postgresql/16/main'
failover=automatic
promote_command='repmgr -f /etc/repmgr.conf standby promote --log-to-file'
follow_command='repmgr -f /etc/repmgr.conf standby follow --log-to-file'
log_file='/var/log/postgresql/repmgr.log'
log_level=DEBUG
primary_unreachable_timeout=5
repmgrd_standby_timeout=10
repmgrd_failover_delay=5
repmgrd_service_start_command='sudo systemctl start repmgrd'
repmgrd_service_stop_command='sudo systemctl stop repmgrd'
monitoring_history=yes

When done, press CTRL+X, followed by Y then Enter to save the file and exit nano.

9. Register the primary server.

Primary Node as sudo User:

su - postgres

Primary Node as postgres User:

repmgr -f /etc/repmgr.conf primary register

Output:

INFO: connecting to primary database...

NOTICE: attempting to install extension "repmgr"

NOTICE: "repmgr" extension successfully installed

NOTICE: primary node record (ID: 1) registered

The repmgr utility confirms the primary is registered:

Confirm the primary is running using the cluster show command.

Primary Node as postgres User:

repmgr -f /etc/repmgr.conf cluster show

A node with an ID of 1 has the role of primary and a status of running.

Output:

----+------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------------

1 | pg1 | primary | * running | | default | 100 | 1 | host=192.168.224.128 user=repmgr dbname=repmgr connect_timeout=2

10. Clone and register the standby server.

Standby Node as sudo User:

sudo systemctl stop postgresql

Standby Node as sudo User:

su - postgres

Standby Node as postgres User:

repmgr -h 192.168.224.128 -U repmgr -d repmgr -f /etc/repmgr.conf standby clone --copy-external-config-files --dry-run

Output:

NOTICE: destination directory "/var/lib/postgresql/16/main" provided

INFO: connecting to source node

DETAIL: connection string is: host=192.168.224.128 user=repmgr dbname=repmgr

DETAIL: current installation size is 29 MB

INFO: "repmgr" extension is installed in database "repmgr"

WARNING: target data directory appears to be a PostgreSQL data directory

DETAIL: target data directory is "/var/lib/postgresql/16/main"

HINT: use -F/--force to overwrite the existing data directory

INFO: replication slot usage not requested; no replication slot will be set up for this standby

INFO: parameter "max_wal_senders" set to 10

NOTICE: checking for available walsenders on the source node (2 required)

INFO: sufficient walsenders available on the source node

DETAIL: 2 required, 10 available

NOTICE: checking replication connections can be made to the source server (2 required)

INFO: required number of replication connections could be made to the source server

DETAIL: 2 replication connections required

WARNING: data checksums are not enabled and "wal_log_hints" is "off"

DETAIL: pg_rewind requires "wal_log_hints" to be enabled

NOTICE: external configuration files detected, checking SSH connection to host "192.168.224.128"

INFO: SSH connection to host "192.168.224.128" succeeded

NOTICE: standby will attach to upstream node 1

HINT: consider using the -c/--fast-checkpoint option

INFO: would execute:

pg_basebackup -l "repmgr base backup" -D /var/lib/postgresql/16/main -h 192.168.224.128 -p 5432 -U repmgr -X stream

INFO: all prerequisites for "standby clone" are met

-----------------------

If the dry run is successful, run the command again without the --dry-run option:

Standby Node as postgres User:

repmgr -h 192.168.224.128 -U repmgr -d repmgr -f /etc/repmgr.conf standby clone --copy-external-config-files --force

Output:

NOTICE: destination directory "/var/lib/postgresql/16/main" provided

INFO: connecting to source node

DETAIL: connection string is: host=192.168.224.128 user=repmgr dbname=repmgr

DETAIL: current installation size is 29 MB

INFO: replication slot usage not requested; no replication slot will be set up for this standby

NOTICE: checking for available walsenders on the source node (2 required)

NOTICE: checking replication connections can be made to the source server (2 required)

WARNING: data checksums are not enabled and "wal_log_hints" is "off"

DETAIL: pg_rewind requires "wal_log_hints" to be enabled

NOTICE: copying external configuration files from upstream node "192.168.224.128"

INFO: rsync command line:

rsync --archive --checksum --compress --progress --rsh=ssh --delete --checksum 192.168.224.128:/etc/postgresql/16/main/postgresql.conf /etc/postgresql/16/main/postgresql.conf

receiving incremental file list

postgresql.conf

29,957 100% 28.57MB/s 0:00:00 (xfr#1, to-chk=0/1)

INFO: rsync command line:

rsync --archive --checksum --compress --progress --rsh=ssh --delete --checksum 192.168.224.128:/etc/postgresql/16/main/pg_hba.conf /etc/postgresql/16/main/pg_hba.conf

receiving incremental file list

pg_hba.conf

6,484 100% 6.18MB/s 0:00:00 (xfr#1, to-chk=0/1)

INFO: rsync command line:

rsync --archive --checksum --compress --progress --rsh=ssh --delete --checksum 192.168.224.128:/etc/postgresql/16/main/pg_ident.conf /etc/postgresql/16/main/pg_ident.conf

receiving incremental file list

WARNING: directory "/var/lib/postgresql/16/main" exists but is not empty

NOTICE: -F/--force provided - deleting existing data directory "/var/lib/postgresql/16/main"

NOTICE: starting backup (using pg_basebackup)...

HINT: this may take some time; consider using the -c/--fast-checkpoint option

INFO: executing:

pg_basebackup -l "repmgr base backup" -D /var/lib/postgresql/16/main -h 192.168.224.128 -p 5432 -U repmgr -X stream

NOTICE: standby clone (using pg_basebackup) complete

NOTICE: you can now start your PostgreSQL server

HINT: for example: pg_ctl -D /var/lib/postgresql/16/main start

HINT: after starting the server, you need to register this standby with "repmgr standby register"

--------------

Repeat the steps above to clone any other standby nodes.

On the standby node, make a further adjustment to postgresql.conf:

Standby Node as postgres User:

nano /etc/postgresql/16/main/postgresql.conf

data_directory = '/var/lib/postgresql/16/main'

When done, press CTRL+X, followed by Y then Enter to save the file and exit nano.

Start PostgreSQL on all standby nodes:

Standby Node as sudo User:

sudo systemctl start postgresql

Verify that PostgreSQL has a status of active:

Standby Node as sudo User

sudo systemctl status postgresql

Press "q" to close the output and return to the terminal shell.

Check for replication is active, access the PostgreSQL shell on the primary node as the postgres user:

Primary Node as postgres User:

psql

Primary Node as postgres User in PostgreSQL Shell:

SELECT * FROM pg_stat_replication;

Output:

------+----------+---------+------------------+-----------------+-----------------+-------------+-------------------------------+--------------+-----------+-----------+-----------+-----------+------------+-----------+-----------+------------+---------------+------------+-------------------------------

5331 | 16388 | repmgr | pg2 | 192.168.224.129 | | 53056 | 2025-03-14 14:59:55.601531+00 | | streaming | 0/30001F0 | 0/30001F0 | 0/30001F0 | 0/30001F0 | | | | 0 | async | 2025-03-14 15:00:55.710461+00

(1 row)

Each active standby node should have its own entry. Scan for the following details in the output:

application_name should contain the node_name of the standby.

The client_addr should indicate the IP address of the standby node.

The state should be streaming.

The sync_state is async.

Return to the standby server and log in as the postgres account if not already:

Standby Node as sudo User:

su - postgres

Standby Node as postgres User:

repmgr -f /etc/repmgr.conf standby register

Output:

INFO: connecting to local node "pg2" (ID: 2)

INFO: connecting to primary database

WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID: 1)

INFO: standby registration complete

NOTICE: standby node "pg2" (ID: 2) successfully registered

Repeat this operation on each standby node.

Run the cluster show command on the standby to confirm it is registered:

Standby Node as postgres User:

repmgr -f /etc/repmgr.conf cluster show

Output:

----+------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------------

1 | pg1 | primary | * running | | default | 100 | 1 | host=192.168.224.128 user=repmgr dbname=repmgr connect_timeout=2

2 | pg2 | standby | running | pg1 | default | 100 | 1 | host=192.168.224.129 user=repmgr dbname=repmgr connect_timeout=2

Register all nodes, including the primary and all standby nodes, with repmgrd. The daemon monitors the node and quickly responds to any failures.

Testing a Failover Event

To confirm the HA cluster is working, follow these steps.

On the primary node, access the postgres database as the postgres database user:

Primary Node as postgres User:

psql postgres

Create a new customers table inside the postgres database:

Primary Node as postgres User in PostgreSQL Shell:

CREATE TABLE employees (employee_id int, first_name varchar(100), last_name varchar(100));
INSERT INTO employees (employee_id, first_name, last_name) VALUES
(1, 'Chirag', 'Mahto'),
(2, 'Sanju', 'Mehta');
\dt

The new table appears in the output:

List of relations

Schema | Name | Type | Owner

--------+-----------+-------+----------

public | employees | table | postgres

(1 row)

postgres=# SELECT * FROM employees;

employee_id | first_name | last_name

-------------+------------+-----------

1 | Chirag | Mahto

2 | Sanju | Mehta

(2 rows)

Now, Access PostgreSQL on the standby node:

Standby Node as postgres User:

psql postgres

\dt

The same table appears in the output. The update on the primary is replicated to this node.

List of relations

Schema | Name | Type | Owner

--------+-----------+-------+----------

public | employees | table | postgres

SELECT * FROM employees;

Output:

employee_id | first_name | last_name

-------------+------------+-----------

1 | Chirag | Mahto

2 | Sanju | Mehta

(2 rows)

Primary Node as sudo User:

sudo systemctl stop postgresql

Standby Node as postgres User:

repmgr -f /etc/repmgr.conf cluster show

The new table appears in the output:

----+------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------------

1 | pg1 | primary | - failed | ? | default | 100 | | host=192.168.224.128 user=repmgr dbname=repmgr connect_timeout=2

2 | pg2 | primary | * running | | default | 100 | 2 | host=192.168.224.129 user=repmgr dbname=repmgr connect_timeout=2

Ensure the new primary allows write operations. Create a new table in the postgres database to verify this:

Standby Node as sudo User:

sudo -u postgres repmgr standby promote

psql

Standby Node as postgres User in PostgreSQL Shell

CREATE TABLE employees2 (employee_id int, first_name varchar(100), last_name varchar(100));

Standby Node as postgres User in PostgreSQL Shell

\dt

The new customers2 table is listed alongside the previous table:

List of relations

Schema | Name | Type | Owner

--------+------------+-------+----------

public | employees | table | postgres

public | employees2 | table | postgres

(2 rows)

For any doubts and query, please write on YouTube video comments section.

Note : Flow the Process shown in video.

😉Subscribe and like for more videos:

https://www.youtube.com/@chiragstutorial

💛Don't forget to, 💘Follow, 💝Like, 💖Share 💙&, Comment

Thanks & Regards,

Chitt Ranjan Mahto "Chirag"

_________________________________________________________________________________________

Note: All scripts used in this demo will be available in our website.

Link will be available in description.

Manage Replication and Failover on a PostgreSQL 16 Cluster Using repmgr in 02 Nodes on Ubuntu 24.04 LTS

Subscribe to my newsletter

Chitt Ranjan Mahto (Chirag)

Chitt Ranjan Mahto (Chirag)