PostgreSQL - Streaming Replication

Install (Ansible)

Parameters to use with the ANXS.postgresql role:

- name: PostgreSQL NODE1
  hosts: node1
  sudo: yes
  roles:
    - ANXS.postgresql
  vars:
    - postgresql_version: 9.4
    - postgresql_encoding: 'UTF-8'
    - postgresql_locale: 'en_US.UTF-8'
    - postgresql_listen_addresses:
        - '*'
    - postgresql_ssl: on
    - postgresql_wal_level: archive     # set 'hot_standby' for read-only access
    - postgresql_max_wal_senders: 1
    - postgresql_max_replication_slots: 1
    - postgresql_hot_standby: off       # set 'on' for read-only access
    - postgresql_users:
        - name: replication
          pass: password
          encrypted: no
    - postgresql_user_privileges:
        - name: replication
          db: "postgres"
          priv: "ALL"
          role_attr_flags: SUPERUSER
    - postgresql_pg_hba_custom:
        - { type: hostssl,  database: replication, user: replication,
            address: 'NODE2_IP/32', method: md5, comment: "Replication"}

Apply the same configuration to node2, changing the counterpart node address accordingly.
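Once the playbook has run on a node, a quick sanity check as the postgres user confirms the settings took effect; with the parameters above, wal_level should report archive:

$ psql
postgres=# SHOW wal_level;
 wal_level
-----------
 archive
(1 row)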

On NODE1 (current master)

As the postgres user, create a replication slot dedicated to node2:

$ psql
postgres=# SELECT * FROM pg_create_physical_replication_slot('node2');
 slot_name | xlog_position
-----------+---------------
 node2     |
(1 row)

postgres=# SELECT * FROM pg_replication_slots;
 slot_name | slot_type | datoid | database | active | xmin | restart_lsn
-----------+-----------+--------+----------+--------+------+-------------
 node2     | physical  |        |          | f      |      |
(1 row)
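The slot shows active = f because nothing is consuming it yet. Once NODE2 connects (after the standby setup below), re-running this query should show active = t, and the master's pg_stat_replication view lists the connected standby:

$ psql
postgres=# SELECT client_addr, state, sync_state FROM pg_stat_replication;

With a healthy link, state reads 'streaming' and sync_state 'async' (the default for this setup).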

On NODE2 (current standby)

Stop PostgreSQL:

# service postgresql stop

As the postgres user, rebuild the cluster from the master:

$ rm -rf ~/9.4/main
$ pg_basebackup -D ~/9.4/main -d "host=NODE1_IP port=5432 user=replication password=password sslmode=require"
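If the base backup is large, pg_basebackup can also report its progress while copying; the optional -P (--progress) flag is available in 9.4:

$ pg_basebackup -P -D ~/9.4/main -d "host=NODE1_IP port=5432 user=replication password=password sslmode=require"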

Write a recovery.conf file:

$ cat > ~/9.4/main/recovery.conf <<EOF
standby_mode = 'on'
primary_conninfo = 'host=NODE1_IP port=5432 user=replication password=password sslmode=require sslcompression=1'
primary_slot_name = 'node2'
trigger_file = '/var/lib/postgresql/9.4/main/failover_trigger'
EOF
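Since the recovery.conf on each node differs only in the peer address and the slot name, a small helper can render either file. This is just a sketch; the write_recovery_conf name and its arguments are illustrative, not part of the procedure:

```shell
# Sketch: render recovery.conf for a given data directory, peer host and
# replication slot name. The helper name and arguments are illustrative.
write_recovery_conf() {
  local datadir="$1" peer_host="$2" slot="$3"
  cat > "$datadir/recovery.conf" <<EOF
standby_mode = 'on'
primary_conninfo = 'host=$peer_host port=5432 user=replication password=password sslmode=require sslcompression=1'
primary_slot_name = '$slot'
trigger_file = '/var/lib/postgresql/9.4/main/failover_trigger'
EOF
}

# Usage on node2, pointing at the master node1:
#   write_recovery_conf ~/9.4/main NODE1_IP node2
```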

Restart the server:

# service postgresql start

NODE2 is now a replica of NODE1.
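You can verify the link from both sides: on NODE1, the node2 slot should now report active = t in pg_replication_slots, and on NODE2, pg_is_in_recovery() should return t:

$ psql
postgres=# SELECT pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 t
(1 row)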

Failover

Log in as postgres on NODE2 and promote it to the new master:

$ pg_ctlcluster 9.4 main promote

Alternatively, trigger the promotion by creating the trigger file:

$ touch /var/lib/postgresql/9.4/main/failover_trigger
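Whichever method you use, PostgreSQL renames recovery.conf to recovery.done once the promotion completes, and the node starts accepting writes; pg_is_in_recovery() now returns f:

$ psql
postgres=# SELECT pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 f
(1 row)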

Then, still on NODE2, create the replication slot that the new standby (node1, the former master) will use:

$ psql
postgres=# SELECT * FROM pg_create_physical_replication_slot('node1');
 slot_name | xlog_position
-----------+---------------
 node1     |
(1 row)

postgres=# SELECT * FROM pg_replication_slots;
 slot_name | slot_type | datoid | database | active | xmin | restart_lsn
-----------+-----------+--------+----------+--------+------+-------------
 node1     | physical  |        |          | f      |      |
(1 row)

Make NODE1 the new standby

Stop PostgreSQL on NODE1:

# service postgresql stop

As the postgres user, rebuild the cluster from the new master (node2):

$ mv ~/9.4/main ~/9.4/main.OLD
$ pg_basebackup -D ~/9.4/main -d "host=NODE2_IP port=5432 user=replication password=password sslmode=require"

Write a recovery.conf file:

$ cat > ~/9.4/main/recovery.conf <<EOF
standby_mode = 'on'
primary_conninfo = 'host=NODE2_IP port=5432 user=replication password=password sslmode=require sslcompression=1'
primary_slot_name = 'node1'
trigger_file = '/var/lib/postgresql/9.4/main/failover_trigger'
EOF

Restart the server:

# service postgresql start

NODE1 is now a replica of NODE2.

By repeating the steps from the Failover section onwards, you can swap the roles of the two PostgreSQL servers back and forth as many times as needed.