PostgreSQL - Replication and High Availability
We were using MySQL as the database for Teckportal, whereas the open-source software we use for Teckportal, loraserver, uses PostgreSQL. Maintaining two different databases and deploying both with replication and high availability is a significant overhead, so we decided to switch Teckportal to PostgreSQL as well.
Since we are using Liquibase, migrating the table definitions was not much trouble; Liquibase took care of it. The actual table data was exported from MySQL as CSV and imported into the corresponding PostgreSQL tables.
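The per-table transfer can be sketched as follows. This is a minimal sketch, assuming a hypothetical table named devices and a database named teckportal on both sides; a real export needs care with NULLs, quoting and embedded commas or tabs.

```shell
# Export one table from MySQL as headerless TSV and convert to CSV
# (naive conversion -- breaks on values containing tabs or commas)
mysql --batch --silent -u root -p teckportal \
  -e 'SELECT * FROM devices' | sed 's/\t/,/g' > devices.csv

# Import into the matching PostgreSQL table; \copy reads the file client-side,
# so it does not need to be accessible from the server process
sudo -u postgres psql -d teckportal \
  -c "\copy devices FROM 'devices.csv' WITH (FORMAT csv)"
```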
Replication
Repmgr is the tool I used for replication. It enhances PostgreSQL's built-in hot-standby capabilities with tools to set up standby servers, monitor replication, and perform administrative tasks such as failover or manual switchover. Repmgr uses PostgreSQL's built-in streaming replication.
Please note that in PostgreSQL replication the primary node accepts both reads and writes, whereas the standby node is read-only.
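One quick way to see which role a node currently has is pg_is_in_recovery(). A sketch, run as the postgres OS user; the table name below is a throwaway example:

```shell
# Returns 'f' on the primary, 't' on a standby
sudo -u postgres psql -c 'SELECT pg_is_in_recovery();'

# A write attempt on the standby fails with an error like:
#   ERROR:  cannot execute CREATE TABLE in a read-only transaction
sudo -u postgres psql -c 'CREATE TABLE readonly_check (id int);'
```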
Setup
As a first step, remove all PostgreSQL leftovers from the system:
sudo apt-get --purge remove postgresql-*
sudo rm -Rf /etc/postgresql /var/lib/postgresql
Note: The server I am using runs Ubuntu 20.04.
Install PostgreSQL 12 from the PGDG apt repository:
sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt/ focal-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
sudo apt-get update
sudo apt-get install postgresql-12
Install Repmgr 5.1:
curl https://dl.2ndquadrant.com/default/release/get/deb | sudo bash
sudo apt-get install postgresql-12-repmgr
Let's assume that we have 2 servers, a primary and a standby. Their IPs are listed below:
- Primary (127.0.0.1)
- Standby (127.0.0.2)
After installation, use the postgres account to create a dedicated user and database for Repmgr.
sudo -i -u postgres
createuser --replication --createdb --createrole --superuser repmgr
psql -c 'ALTER USER repmgr SET search_path TO repmgr, "$user", public;'
createdb repmgr --owner=repmgr
Now we should configure SSH access between the two servers.
(On Primary)
# Generate id_rsa ssh key
ssh-keygen
# Copy ssh keys to other servers
ssh-copy-id root@127.0.0.2
(On Standby)
# Generate id_rsa ssh key
ssh-keygen
# Copy ssh keys to other servers
ssh-copy-id root@127.0.0.1
Then test the SSH connection. You should be able to log in to each server from the other one without a password.
# On Primary
ssh 127.0.0.2
# On Standby
ssh 127.0.0.1
Edit the postgresql.conf file on both primary and standby with the following changes, so that in the future the standby can act as primary with minimal changes.
sudo vi /etc/postgresql/12/main/postgresql.conf
listen_addresses = '*'
shared_preload_libraries = 'repmgr'
wal_level = replica
max_wal_senders = 5
wal_keep_segments = 64
max_replication_slots = 5
hot_standby = on
wal_log_hints = on
Also note that settings like max_wal_senders and wal_keep_segments depend on your server. The web page below suggests optimal values for these properties based on your server configuration:
https://pgtune.leopard.in.ua/#/
Now edit pg_hba.conf on both primary and standby with the following changes:
sudo vi /etc/postgresql/12/main/pg_hba.conf
host repmgr repmgr 127.0.0.1/32 trust
host repmgr repmgr 127.0.0.2/32 trust
host replication repmgr 127.0.0.1/32 trust
host replication repmgr 127.0.0.2/32 trust
Restart PostgreSQL with sudo service postgresql restart and test the connection from each server with the following commands:
# On Primary
sudo su postgres
psql 'host=127.0.0.2 dbname=repmgr user=repmgr'
# On Standby
sudo su postgres
psql 'host=127.0.0.1 dbname=repmgr user=repmgr'
Now the PostgreSQL instances on both servers are ready for clustering.
Cluster creation
Now it's time to create the Repmgr cluster. Do this on both servers.
Use host=127.0.0.1, node_id=1 and node_name = 'node1' on the primary and host=127.0.0.2, node_id=2 and node_name = 'node2' on the standby.
sudo vi /etc/repmgr.conf
node_id = 1
node_name = 'node1'
conninfo = 'host=127.0.0.1 user=repmgr dbname=repmgr'
data_directory = '/var/lib/postgresql/12/main'
use_replication_slots = yes
reconnect_attempts = 5
reconnect_interval = 1
failover = manual
pg_bindir = '/usr/lib/postgresql/12/bin'
promote_command = 'repmgr standby promote -f /etc/repmgr.conf'
follow_command = 'repmgr standby follow -f /etc/repmgr.conf'
log_level = INFO
log_file = '/var/log/postgresql/repmgr.log'
If your firewall/ufw is on, make sure that you allow connections to and from both servers on the PostgreSQL port (5432).
Cluster registration
Now it's time to register the primary node in Repmgr. On the primary, as the postgres user:
repmgr -f /etc/repmgr.conf primary register
Then you can check the new cluster with repmgr cluster show.
Now configure the standby server.
sudo service postgresql stop
sudo su postgres
rm -rf /var/lib/postgresql/12/main
repmgr -h 127.0.0.1 -U repmgr -d repmgr -f /etc/repmgr.conf standby clone
exit
sudo service postgresql start
sudo su postgres
repmgr -f /etc/repmgr.conf standby register
If everything went well, then congratulations, you have your PostgreSQL cluster running!
You can check the cluster status with the following command and you should see your primary node as well as your new standby node.
repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+-----------------------------------------------
1 | node1 | primary | * running | | default | 100 | 1 | host=127.0.0.1 user=repmgr dbname=repmgr
2 | node2 | standby | running | node1 | default | 100 | 1 | host=127.0.0.2 user=repmgr dbname=repmgr
Java connection string
The syntax for the connection URL is:
jdbc:postgresql://host1:port1,host2:port2/database
For more details visit: https://jdbc.postgresql.org/documentation/head/connect.html
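For the two servers above, the URL might look like the following. This is a sketch assuming a database named teckportal (a placeholder) and a recent PgJDBC driver; the targetServerType=primary parameter (targetServerType=master on older driver versions) tells the driver to connect to whichever host is currently writable:

```
jdbc:postgresql://127.0.0.1:5432,127.0.0.2:5432/teckportal?targetServerType=primary
```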
Load Balancing and High Availability
For Java and Spring, the method stated earlier is more than enough. But ChirpStack uses Go and does not support multiple IPs in connection strings. So I decided to use Pgpool2, which provides high availability as well as load balancing.
Setup
Follow the instructions below on both primary and standby servers. Only change wd_hostname, heartbeat_destination0 and other_pgpool_hostname0 on the standby server.
#Install pgpool2
sudo apt-get install pgpool2
#Install pgpool extensions
sudo apt-get install postgresql-12-pgpool2
Now edit the config file for Pgpool2.
sudo vi /etc/pgpool2/pgpool.conf
listen_addresses = '*'
port = 5433
backend_hostname0 = '127.0.0.1'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/var/lib/postgresql/12/main/'
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_application_name0 = 'node1'
backend_hostname1 = '127.0.0.2'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/var/lib/postgresql/12/main/'
backend_flag1 = 'ALLOW_TO_FAILOVER'
backend_application_name1 = 'node2'
connection_cache = on
load_balance_mode = on
master_slave_mode = on
master_slave_sub_mode = 'stream'
sr_check_period = 10
sr_check_user = 'repmgr'
sr_check_database = 'repmgr'
delay_threshold = 10000000
health_check_period = 10
health_check_timeout = 20
health_check_user = 'postgres'
health_check_password = ''
health_check_database = 'postgres'
health_check_max_retries = 3
health_check_retry_delay = 10
connect_timeout = 10000
failover_command = '/opt/pgpool/scripts/failover.sh %d %h %p %D %m %H %M %P %r %R'
fail_over_on_backend_error = off
auto_failback = on
use_watchdog = on
wd_hostname = '127.0.0.1'
wd_port = 9999
wd_heartbeat_port = 9694
heartbeat_destination0 = '127.0.0.2'
heartbeat_destination_port0 = 9694
other_pgpool_hostname0 = '127.0.0.2'
other_pgpool_port0 = 5433
other_wd_port0 = 9999
Please make sure that backend_application_name0 and backend_application_name1 are the same as the node_name values in the repmgr.conf file of each server.
Create Pgpool user and Pgpool database.
sudo su postgres
psql
create user pgpool;
create database pgpool;
GRANT ALL PRIVILEGES ON DATABASE "pgpool" to pgpool;
GRANT pg_monitor TO pgpool;
GRANT pg_monitor TO repmgr;
Edit pg_hba.conf
sudo vi /etc/postgresql/12/main/pg_hba.conf
host pgpool pgpool 127.0.0.1/32 trust
host pgpool pgpool 127.0.0.2/32 trust
Please note that the above Pgpool setup includes the watchdog setup. Watchdog is a sub-process of Pgpool-II that adds high availability for Pgpool itself. Watchdog is used to resolve the single point of failure by coordinating multiple Pgpool-II nodes.
We can set up watchdog with a delegate IP and use that IP to connect to PostgreSQL. But the delegate IP is a virtual IP, and the condition is that all the Pgpool instances are inside the same subnet. This does not work in our case, so I am not setting up watchdog with a delegate IP here. I am just using watchdog to keep both Pgpool instances in sync.
Setup failover script
cd /opt
mkdir pgpool
cd pgpool
mkdir scripts
cd scripts
vi failover.sh
#!/bin/bash
LOGFILE=/var/log/pgpool/failover.log
if [ ! -f $LOGFILE ] ; then
> $LOGFILE
fi
PGVER=${PGVER:-12}
echo $@
# Special values:
# %d = node id
# %h = host name
# %p = port number
# %D = database cluster path
# %m = new master node id
# %H = hostname of the new master node
# %M = old master node id
# %P = old primary node id
# %r = new master port number
# %R = new master database cluster path
FALLING_NODE=$1 # %d
FALLING_HOST=$2 # %h
FALLING_PORT_NUMBER=$3 # %p
FALLING_CLUSTER_PATH=$4 # %D
NEW_MASTER_ID=$5 # %m
NEW_HOST=$6 # %H
OLD_MASTER_ID=$7 # %M
OLD_PRIMARY_ID=$8 # %P
NEW_PORT=$9 # %r
NEW_CLUSTER_PATH=${10} # %R
(
date
echo "FALLING_NODE: $FALLING_NODE"
echo "FALLING_HOST: $FALLING_HOST"
echo "FALLING_PORT_NUMBER: $FALLING_PORT_NUMBER"
echo "FALLING_CLUSTER_PATH: $FALLING_CLUSTER_PATH"
echo "NEW_MASTER_ID: $NEW_MASTER_ID"
echo "NEW_HOST: $NEW_HOST"
echo "OLD_MASTER_ID: $OLD_MASTER_ID"
echo "OLD_PRIMARY_ID: $OLD_PRIMARY_ID"
echo "NEW_PORT: $NEW_PORT"
echo "NEW_CLUSTER_PATH: $NEW_CLUSTER_PATH"
#ssh_options="ssh -p 22 -n -T -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no"
set -x
if [ $FALLING_NODE = $OLD_PRIMARY_ID ] ; then
ssh root@${NEW_HOST} "sudo -u postgres /usr/lib/postgresql/12/bin/repmgr --log-to-file -f /etc/repmgr.conf standby promote -v"
else
echo old primary id is $OLD_PRIMARY_ID and falling node is $FALLING_NODE
fi
exit 0;
) 2>&1 | tee -a ${LOGFILE}
sudo chown -R postgres:postgres /opt/pgpool
Enable passwordless SSH access from the postgres user of each server to the root user of both servers:
Server 1
sudo su postgres
ssh-keygen
ssh-copy-id root@127.0.0.1
ssh-copy-id root@127.0.0.2
Server 2
sudo su postgres
ssh-keygen
ssh-copy-id root@127.0.0.1
ssh-copy-id root@127.0.0.2
Now restart Pgpool on both the servers.
sudo service pgpool2 restart
Now check the Pgpool status using the following commands:
sudo su postgres
psql -h 127.0.0.1 -p 5433 -U pgpool
show pool_nodes;
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+----------------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
0 | 127.0.0.1 | 5432 | up | 0.500000 | primary | 29 | true | 0 | | | 2020-06-05 11:54:40
1 | 127.0.0.2 | 5432 | up | 0.500000 | standby | 52343 | false | 0 | streaming | async | 2020-06-05 11:54:40
In your connection string you just need to give the Pgpool IP and port. When your application connects to Pgpool, it will send the write requests to PostgreSQL primary server. The read requests meanwhile will be load balanced between the primary and standby servers. So you don't need multiple IPs in your connection string.
Failover
- What happens in Pgpool if the primary node fails?
Pgpool will initiate failover. It will execute the script configured in failover_command in the pgpool.conf file, which will promote the old standby to be the new primary.
- What happens if the previous primary node comes back online?
We have to manually attach the old primary as the new standby. Run the following commands on it:
sudo service postgresql stop
sudo su postgres
rm -rf /var/lib/postgresql/12/main
repmgr -h 127.0.0.2 -U repmgr -d repmgr -f /etc/repmgr.conf standby clone
exit
sudo service postgresql start
sudo su postgres
repmgr -f /etc/repmgr.conf standby register --force
If we have set auto_failback to on in pgpool.conf, Pgpool will then update its records to set the standby as up again.
- What happens in Pgpool if the standby node fails?
Pgpool will update its records and set the standby node as down. If the standby node comes back online and auto_failback is on in pgpool.conf, Pgpool will update its records to set the standby as up again.
That's it! If you face any issues while setting up replication for PostgreSQL, please feel free to contact me.