DbaKeeda: Step By Step – How To Setup Replication Manager(Repmgr) in PostgreSQL

Step 1 - Make sure the replication configuration is setup correctly.

Step 2 – Install repmgr on primary and standby servers.

Make sure we install the same version of repmgr which is of our postgreSQL version. Since I am running postgreSQL version 13, I will be installing repmgr-13.

Note - We have to install it as a root user.

On Primary Server -

#yum -y install repmgr13*

On Standby Server -

#yum -y install repmgr13*

Step 3 – Configure postgresql.conf on Master and Standby nodes.

Since we have already configured replication, most of the parameters are already set. But just check on the below parameters and set whichever is missing.

shared_preload_libraries = 'repmgr'

max_wal_senders = 5

max_replication_slots = 10

wal_level = replica

hot_standby = on

archive_mode = on

archive_command = 'cp %p /home/postgres/arch_dir/% - We can have the different archive locations.

Listen_address=’*’

Step 4 – Create a replication manager user as repmgr.

Psql$ create user repmgr;

Psql$ create database repmgr owner repmgr;

Psql$alter user repmgr with superuser;

Step 5 – Create repmgr.conf on Primary node.

Now we have to create repmgr.conf on Primary node. We can create it on any location but I have created under data directory only. My data directory is -

Create repmgr.conf with below parameters -

[postgres@Master master]$ cat repmgr.conf

cluster='failovertesting'

node_id=1

node_name=Master

conninfo='user=repmgr password=welcome host=192.168.204.133 dbname=repmgr port=5448 connect_timeout=2'

data_directory='/home/postgres/master/'

failover=automatic

promote_command='/usr/pgsql-13/bin/repmgr standby promote -f /home/postgres/master/repmgr.conf --log-to-file'

follow_command='/usr/pgsql-13/bin/repmgr standby follow -f /home/postgres/master/repmgr.conf --log-to-file --upstream-node-id=%n'

Parameter understanding –

failover='automatic'

The failover parameter is one of the mandatory parameters for the repmgr daemon. This parameter tells the daemon if it should initiate an automatic failover when a failover situation is detected. It can have either of two values: “manual” or “automatic”. We will set this to automatic in each node.

promote_command

promote_command= '/usr/pgsql-13/bin/repmgr standby promote -f /home/postgres/master/repmgr.conf --log-to-file'

This is another mandatory parameter for the repmgr daemon. This parameter tells the repmgr daemon what command it should run to promote a standby. The value of this parameter will be typically the “repmgr standby promote” command, or the path to a shell script that calls the command. For our use case, we set this to the following in each node:

follow_command

follow_command= '/usr/pgsql-13/bin/repmgr standby follow -f /home/postgres/master/repmgr.conf --log-to-file --upstream-node-id=%n'

This is the third mandatory parameter for the repmgr daemon. This parameter tells a standby node to follow the new primary. The repmgr daemon replaces the %n placeholder with the node ID of the new primary at run time.

priority

The priority parameter adds weight to a node’s eligibility to become a primary. Setting this parameter to a higher value gives a node greater eligibility to become the primary node. Also, setting this value to zero for a node will ensure the node is never promoted as primary.

Suppose we have two standbys: replica1 and replica2 and we want to promote replica2 as the new primary when master node goes offline, and replica2 to follow replica1 as its new primary. We set the parameter to the following values in the two standby nodes:

Node Name Parameter Setting

replica1 priority=60

replica2 priority=40

monitor_interval_secs

This parameter tells the repmgr daemon how often (default is seconds) it should check the availability of the upstream node. In our case, there is only one upstream node i.e the master node. The default value is 2 seconds, but we will explicitly set this anyhow in each node like below

monitor_interval_secs=2

connection_check_type

This parameter dictates the protocol repmgr daemon will use to reach out to the upstream node. This parameter can take three values:

ping: repmgr uses the PQPing() method.

connection: repmgr tries to create a new connection to the upstream node

query: repmgr tries to run a SQL query on the upstream node using the existing connection

Again, we will set this parameter to the default value of ping in each node:

connection_check_type='ping'

reconnect_attempts and reconnect_interval

When the primary becomes unavailable, the repmgr daemon in the standby nodes will try to reconnect to the primary for reconnect_attempts times. The default value for this parameter is 6. Between each reconnect attempt, it will wait for reconnect_interval seconds, which has a default value of 10.

For demonstration purposes, we will use a short interval and fewer reconnect attempts. We set this parameter in every node:

reconnect_attempts=4

reconnect_interval=8

primary_visibility_consensus

When the primary becomes unavailable in a multi-node cluster, the standbys can consult each other to build a quorum about a failover. This is done by asking each standby about the time it last saw the primary. If a node’s last communication was very recent and later than the time the local node saw the primary, the local node assumes the primary is still available, and does not go ahead with a failover decision.

To enable this consensus model, the primary_visibility_consensus parameter needs to be set to “true” in each node – including the witness:

primary_visibility_consensus=true

standby_disconnect_on_failover

When the standby_disconnect_on_failover parameter is set to “true” in a standby node, the repmgr daemon will ensure its WAL receiver is disconnected from the primary and not receiving any WAL segments. It will also wait for the WAL receivers of other standby nodes to stop before making a failover decision. This parameter should be set to the same value in each node. We are setting this to “true”.

standby_disconnect_on_failover=true

Setting this parameter to true means every standby node has stopped receiving data from the primary as the failover happens. The process will have a delay of 5 seconds plus the time it takes the WAL receiver to stop before a failover decision is made. By default, the repmgr daemon will wait for 30 seconds to confirm all sibling nodes have stopped receiving WAL segments before the failover happens.

Step 6 – Register Primary database cluster in the repmgr configuration

Since we have created the repmgr.conf file with the minimum required parameters. Now we have to register our primary database cluster in repmgr configuration.

[postgres@Master master]$ /usr/pgsql-13/bin/repmgr -f /home/postgres/master/repmgr.conf primary register

We can see primary node is registered successfully using below command –

[postgres@Master master]$ /usr/pgsql-13/bin/repmgr -f /home/postgres/master/repmgr.conf cluster show

Step 7 – Create repmgr.conf file on standby node.

We can create it in any location but I have created in the data directory only. My data directory is –

[postgres@Replica1 replica1]$ cat repmgr.conf

cluster='failovertesting'

node_id=2

node_name=Replica1

conninfo='host=192.168.204.134 user=repmgr password=welcome dbname=repmgr port=5448 connect_timeout=2'

data_directory='/home/postgres/replica1/'

failover=automatic

promote_command='/usr/pgsql-13/bin/repmgr standby promote -f /home/postgres/replica1/repmgr.conf --log-to-file'

follow_command='/usr/pgsql-13/bin/repmgr standby follow -f /home/postgres/replica1/repmgr.conf --log-to-file --upstream-node-id=%n'

Step 8 – Register Standby Database Cluster

Now we have to register our standby database cluster in the repmgr configuration.

[postgres@Replica1 replica1]$ /usr/pgsql-13/bin/repmgr -f /home/postgres/replica1/repmgr.conf standby register

Getting below error –

Then I had to force register it like below –

[postgres@Replica1 replica1]$ /usr/pgsql-13/bin/repmgr -f /home/postgres/replica1/repmgr.conf standby register –force

But when I was showing cluster status on primary it was giving below warning –

[postgres@Master ~]$ /usr/pgsql-13/bin/repmgr -f /home/postgres/master/repmgr.conf cluster show

So I had to run below command on standby database.

[postgres@Replica1 ~]$ /usr/pgsql-13/bin/repmgr -f /home/postgres/replica1/repmgr.conf standby follow

After than when I checked the cluster status on primary it is running without any error –

I checked both the nodes and it is running fine –

/usr/pgsql-13/bin/repmgr -f /home/postgres/master/repmgr.conf node check

$/usr/pgsql-13/bin/repmgr –f /home/postgres/master/repmgr.conf cluster crosscheck

Step 9 – Start repmgrd process on master and standby

[postgres@Master ~]$ /usr/pgsql-13/bin/repmgrd -f /home/postgres/master/repmgr.conf

[postgres@Replica1 ~]$ /usr/pgsql-13/bin/repmgrd -f /home/postgres/replica1/repmgr.conf

We can see the series of events using cluster events commands –

When it starts monitoring then we will see the below details –

Repmgr Command Options –

====================The End================

DbaKeeda

Translate into your own language

Wednesday, September 29, 2021

Step By Step – How To Setup Replication Manager(Repmgr) in PostgreSQL

1 comment: