Translate into your own language

Tuesday, December 7, 2021

Step by step - How to rebuild standby database in PostgreSQL using PG_Rewind utility

 ========================PG_Rewind=================

Must set wal_log_hint to on in postgresql.conf. This change requires restart

#wal_log_hints = off

All wal needs to be present in the pg_wal directory. In case it has been deleted or copied to archived location then just

copy those files back to pg_wal directory. Otherwise pg_rewind will not work.

[postgres@master ~]$ pg_rewind --help

pg_rewind resynchronizes a PostgreSQL cluster with another copy of the cluster.

Usage:

  pg_rewind [OPTION]...

Options:

  -c, --restore-target-wal       use restore_command in target configuration to

                                 retrieve WAL files from archives

  -D, --target-pgdata=DIRECTORY  existing data directory to modify

      --source-pgdata=DIRECTORY  source data directory to synchronize with

      --source-server=CONNSTR    source server to synchronize with

  -n, --dry-run                  stop before modifying anything

  -N, --no-sync                  do not wait for changes to be written

                                 safely to disk

  -P, --progress                 write progress messages

  -R, --write-recovery-conf      write configuration for replication

                                 (requires --source-server)

      --debug                    write a lot of debug messages

      --no-ensure-shutdown       do not automatically fix unclean shutdown

  -V, --version                  output version information, then exit

  -?, --help                     show this help, then exit


=========================================================================================================================

pg_rewind --target-pgdata=prod --source-pgdata=rep1 --source-server="port=5448 user=postgres dbname=postgres" -n -R -P

==========================================================================================================================

    --source-server="port=5433 user=postgres dbname=postgres"


pg_rewind -D primary --source-server='cclocalhost port=5531' -P


===============

1. 

Before pg_rewind on master -


[postgres@master ~]$ psql -p 5448

psql (13.4)

Type "help" for help.


postgres=# \dt

        List of relations

 Schema | Name | Type  |  Owner

--------+------+-------+----------

 public | emp  | table | postgres

 public | t1   | table | postgres

 public | t2   | table | postgres

 public | t3   | table | postgres

 public | t4   | table | postgres

(5 rows)


postgres=#


on replica1 --


[postgres@replica1 ~]$ psql -p 5448

psql (13.4)

Type "help" for help.


postgres=# d

postgres-# \dt

        List of relations

 Schema | Name | Type  |  Owner

--------+------+-------+----------

 public | emp  | table | postgres

 public | t1   | table | postgres

 public | t2   | table | postgres

 public | t3   | table | postgres

 public | t4   | table | postgres

(5 rows)


on replica2

[postgres@replica2 ~]$ psql -p 5448

psql (13.4)

Type "help" for help.


postgres=# \dt

        List of relations

 Schema | Name | Type  |  Owner

--------+------+-------+----------

 public | emp  | table | postgres

 public | t1   | table | postgres

 public | t2   | table | postgres

 public | t3   | table | postgres

 public | t4   | table | postgres

(5 rows)


Step 2 ---

Kill the master server

[postgres@master ~]$ kill -9 13013


Setp 3 ---

Promote the replica1

postgres-# \q

[postgres@replica1 ~]$ pg_ctl -D rep1 promote

waiting for server to promote.... done

server promoted


verify it from log -

2021-09-18 15:01:33.382 PDT [11617] LOG:  received promote request

2021-09-18 15:01:33.382 PDT [11617] LOG:  redo done at 0/2F029B20

2021-09-18 15:01:33.382 PDT [11617] LOG:  last completed transaction was at log time 2021-09-18 14:09:00.239418-07

2021-09-18 15:01:33.419 PDT [11617] LOG:  selected new timeline ID: 2

2021-09-18 15:01:33.563 PDT [11617] LOG:  archive recovery complete

2021-09-18 15:01:33.588 PDT [11615] LOG:  database system is ready to accept connections


step 4--

Cretated few tables on replica1

postgres=# \dt

        List of relations

 Schema | Name | Type  |  Owner

--------+------+-------+----------

 public | emp  | table | postgres

 public | t1   | table | postgres

 public | t2   | table | postgres

 public | t3   | table | postgres

 public | t4   | table | postgres

 public | t5   | table | postgres

 public | t6   | table | postgres

 public | t7   | table | postgres

(8 rows)


step 5 --


[postgres@master ~]$ pg_rewind --target-pgdata=prod --source-pgdata=rep1 --source-server="host=192.168.204.134 port=5448 user=postgres dbname=postgres" -n -R -P

pg_rewind: error: only one of --source-pgdata or --source-server can be specified

Try "pg_rewind --help" for more information.

[postgres@master ~]$ pg_rewind --target-pgdata=prod  --source-server="host=192.168.204.134 port=5448 user=postgres dbname=postgres" -n -R -P

pg_rewind: fatal: could not connect to server: Connection refused

        Is the server running locally and accepting

        connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5448"?

[postgres@master ~]$

---Enable port -

firewall-cmd --permanent --zone=public --add-port=5448/tcp

===


[postgres@master ~]$ pg_rewind --target-pgdata=prod  --source-server="host=192.168.204.134 port=5448 user=postgres dbname=postgres" -n -R -P

pg_rewind: connected to server

pg_rewind: executing "/usr/pgsql-13/bin/postgres" for target server to complete crash recovery

pg_rewind: fatal: target server needs to use either data checksums or "wal_log_hints = on"


make wal_log_hints = on

=====


Note - I had to restart the cluster -- and then again shut it down


[postgres@master ~]$ pg_rewind --target-pgdata=prod  --source-server="host=192.168.204.134 port=5448 user=postgres dbname=postgres" -n -R -P

pg_rewind: connected to server

pg_rewind: executing "/usr/pgsql-13/bin/postgres" for target server to complete crash recovery

pg_rewind: fatal: target server must be shut down cleanly


[postgres@master ~]$ pg_rewind --target-pgdata=prod  --source-server="host=192.168.204.134 port=5448 user=postgres dbname=postgres" -n -R -P

pg_rewind: connected to server

pg_rewind: executing "/usr/pgsql-13/bin/postgres" for target server to complete crash recovery

pg_rewind: fatal: target server must be shut down cleanly

[postgres@master ~]$ pg_ctl -D prod stop

waiting for server to shut down.... done

server stopped

[postgres@master ~]$ pg_rewind --target-pgdata=prod  --source-server="host=192.168.204.134 port=5448 user=postgres dbname=postgres" -R -P

pg_rewind: connected to server

pg_rewind: servers diverged at WAL location 0/2F029B58 on timeline 1

pg_rewind: rewinding from last common checkpoint at 0/2F029AA8 on timeline 1

pg_rewind: reading source file list

pg_rewind: reading target file list

pg_rewind: reading WAL in target

pg_rewind: need to copy 87 MB (total source directory size is 142 MB)

89475/89475 kB (100%) copied

pg_rewind: creating backup label and updating control file

pg_rewind: syncing target data directory

pg_rewind: Done!

[postgres@master ~]$

===============

[postgres@master ~]$ pg_ctl -D prod start

waiting for server to start....2021-09-18 15:29:35.326 PDT [14485] LOG:  redirecting log output to logging collector process

2021-09-18 15:29:35.326 PDT [14485] HINT:  Future log output will appear in directory "log".

........................................................... stopped waiting

pg_ctl: server did not start in time

[postgres@master ~]$ cd prod/

[postgres@master prod]$ ps -ef | grep postgres

postgres    1171       1  0 00:54 ?        00:00:00 /usr/pgsql-13/bin/postmaster -D /var/lib/pgsql/13/data/

postgres    1240    1171  0 00:54 ?        00:00:00 postgres: logger

postgres    1306    1171  0 00:54 ?        00:00:00 postgres: checkpointer

postgres    1307    1171  0 00:54 ?        00:00:01 postgres: background writer

postgres    1308    1171  0 00:54 ?        00:00:01 postgres: walwriter

postgres    1309    1171  0 00:54 ?        00:00:00 postgres: autovacuum launcher

postgres    1310    1171  0 00:54 ?        00:00:00 postgres: stats collector

postgres    1311    1171  0 00:54 ?        00:00:00 postgres: logical replication launcher

postgres    3919       1  0 02:12 ?        00:00:00 /usr/lib/systemd/systemd --user

postgres    3924    3919  0 02:12 ?        00:00:00 (sd-pam)

postgres    4009    3919  0 02:12 ?        00:00:00 /usr/bin/dbus-daemon --session --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only

root       14224    1136  0 15:17 ?        00:00:00 sshd: postgres [priv]

postgres   14228   14224  0 15:17 ?        00:00:00 sshd: postgres@pts/2

postgres   14232   14228  0 15:17 pts/2    00:00:00 -bash

postgres   14485       1  0 15:29 ?        00:00:00 /usr/pgsql-13/bin/postgres -D prod

postgres   14486   14485  0 15:29 ?        00:00:00 postgres: logger

postgres   14487   14485  0 15:29 ?        00:00:00 postgres: startup waiting for 00000001000000000000002F

postgres   14532   14232  0 15:30 pts/2    00:00:00 ps -ef

postgres   14533   14232  0 15:30 pts/2    00:00:00 grep --color=auto postgres

[postgres@master prod]$ cd

[postgres@master ~]$ pg_ctl -D prod stop

waiting for server to shut down.... done

server stopped

$pg_rewind --target-pgdata=prod  --source-server="host=192.168.204.134 port=5448 user=postgres dbname=postgres" -R -P -c

After rebuild --

[postgres@master ~]$ psql -p 5448

psql (13.4)

Type "help" for help.


postgres=# \dt

        List of relations

 Schema | Name | Type  |  Owner

--------+------+-------+----------

 public | emp  | table | postgres

 public | t1   | table | postgres

 public | t2   | table | postgres

 public | t3   | table | postgres

 public | t4   | table | postgres

 public | t5   | table | postgres

 public | t6   | table | postgres

 public | t7   | table | postgres

(8 rows)

postgres=#

pg_rewind --target-pgdata=rep1  --source-server="host=192.168.204.133 port=5448 user=postgres dbname=postgres" -R -P -c


No comments:

Post a Comment