High Availability Clustering for PostgreSQL on Linux
SteelEye LifeKeeper from Open Minds provides a complete disaster recovery solution that helps you recover your servers seamlessly and maintain business continuity. Our high availability solutions offer continuous data protection through data replication, monitor all your servers, and let you manage failover through the Java GUI.
LifeKeeper can monitor all resources used by PostgreSQL
(IP address, disk, file system) and can also check that PostgreSQL itself
is running, for example by executing a known SQL query. This
proactive approach allows LifeKeeper to initiate a failover to a different
node in the cluster regardless of whether the fault is software related
(operating system, application, etc.) or hardware related (disk, network, etc.).
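As a rough illustration of this kind of monitoring, the following Python sketch combines resource checks with a probe query. It is not LifeKeeper code: the function names, the `SELECT 1` probe and the `run_query` callable (a stand-in for whatever actually talks to the database) are all assumptions made for the example.

```python
from typing import Callable, Dict


def postgres_responds(run_query: Callable[[str], object],
                      probe_sql: str = "SELECT 1",
                      expected: object = 1) -> bool:
    """Return True if the probe query returns the expected value.

    run_query is a hypothetical callable that executes SQL and returns
    a single value; it stands in for a real database client.
    """
    try:
        return run_query(probe_sql) == expected
    except Exception:
        # Any error (connection refused, timeout, ...) counts as unhealthy.
        return False


def needs_failover(checks: Dict[str, bool]) -> bool:
    """A failover is warranted when any monitored resource check fails."""
    return not all(checks.values())


def fake_query(sql: str) -> int:
    """Stand-in for a real query; always answers 1."""
    return 1


checks = {
    "ip": True,            # assumed results of the other resource checks
    "disk": True,
    "file_system": True,
    "postgresql": postgres_responds(fake_query),
}
print(needs_failover(checks))  # False
```

Passing the query function in as a parameter keeps the sketch independent of any particular database driver.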
Server Organisation
LifeKeeper works through the server virtualisation concept, where all
resources required for a particular application are available on all servers. This
requires that all nodes have access to binary files and application data.
Generally binary files will be installed separately on each server, while application data
will be shared or replicated between servers. The LifeKeeper PostgreSQL
recovery kit also allows the administrator to share or replicate the binary files between servers.
Because the same resources are available to all nodes, data integrity must be
ensured across nodes by allowing only one machine to access the
data at any one time.
PostgreSQL with Data Replication - eliminating the need for shared storage
Shared storage is often expensive and does not allow a cluster to span
a LAN or WAN, so it cannot provide a true disaster recovery solution. In addition, the shared storage itself is a single point of failure.
LifeKeeper's data replication software
can be used to mirror data to a remote
server, and seamlessly handle the switchover in the event of a failure.
When data is written to disk, LifeKeeper data replication will
mirror the data to a remote server. This ensures that the data
is up to date in the event of a failover occurring.
In the event of a failover, LifeKeeper will bring the target of the mirror
into service, and reverse the direction of replication (if possible).
After this, the PostgreSQL server is started.
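The failover sequence described above can be sketched in Python as follows. This is only an illustrative model, not LifeKeeper's implementation: the `Mirror` class, the node names and the assumption that reversal is "possible" when the old source is still reachable are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Mirror:
    """Hypothetical record of a replication pair."""
    source: str  # node currently being written to
    target: str  # node receiving the mirrored data


def fail_over(mirror: Mirror, old_source_reachable: bool) -> List[str]:
    """Return the ordered steps of a failover, updating the mirror.

    Assumption: replication can only be reversed if the old source
    node is still reachable to act as the new target.
    """
    new_active = mirror.target
    steps = [f"bring mirror target {new_active} into service"]
    if old_source_reachable:
        mirror.source, mirror.target = mirror.target, mirror.source
        steps.append(
            f"reverse replication: {mirror.source} -> {mirror.target}")
    else:
        steps.append("replication reversal skipped: old source unreachable")
    steps.append(f"start PostgreSQL on {new_active}")
    return steps


mirror = Mirror(source="node-a", target="node-b")
for step in fail_over(mirror, old_source_reachable=True):
    print(step)
```

The point of the ordering is that PostgreSQL is started only after the mirrored data has been brought into service on the new node.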
The data replication can occur on either a LAN or a WAN, and either
synchronous or asynchronous approaches are available.
Because the data replication software only transmits changes in
data, and reads at the raw block level, it is efficient, versatile,
relatively fast and independent of the file system used by the operating system.
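To illustrate why transmitting only changed blocks is efficient, here is a minimal Python sketch. It is a toy model under stated assumptions: the 4-byte block size, the function names and the approach of comparing two device images are chosen for illustration only. Real block-level replication would normally track writes as they happen rather than compare whole images.

```python
from typing import List, Tuple

BLOCK_SIZE = 4  # toy value; real block sizes are far larger


def changed_blocks(old: bytes, new: bytes,
                   block_size: int = BLOCK_SIZE) -> List[Tuple[int, bytes]]:
    """Return (block index, new data) for every block that differs.

    Assumes both images represent the same fixed-size device.
    """
    if len(old) != len(new):
        raise ValueError("device images must be the same size")
    changes = []
    for start in range(0, len(new), block_size):
        block = new[start:start + block_size]
        if old[start:start + block_size] != block:
            changes.append((start // block_size, block))
    return changes


def apply_blocks(mirror: bytearray, changes: List[Tuple[int, bytes]],
                 block_size: int = BLOCK_SIZE) -> None:
    """Write the received blocks into the mirror copy in place."""
    for index, data in changes:
        start = index * block_size
        mirror[start:start + len(data)] = data


old = bytes(16)                 # 16-byte "device", all zeros
new = bytearray(old)
new[5] = 1                      # a single byte changes, in block 1
new = bytes(new)

changes = changed_blocks(old, new)
mirror = bytearray(old)
apply_blocks(mirror, changes)
print(len(changes), bytes(mirror) == new)  # 1 True
```

Because the comparison works on raw bytes, nothing in the sketch depends on what file system, if any, is stored on the device.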
The image below illustrates the replication of data from one active
node, to another backup node using the replication software.
Because the backup server will have an up-to-date copy of
the data held on the active server, failover can occur at any time.
PostgreSQL recovery over a Wide Area Network
The image below illustrates the data replication taking place over
a Wide Area Network. As mentioned above, due to the efficiency of
the data replication software, it is feasible for replication to
take place to a remote site, allowing for optimal disaster recovery
and data protection.
Active/Active
The diagrams so far have all shown PostgreSQL running with LifeKeeper
protection, in an Active/Backup configuration - i.e. one machine
is always idle in case of emergency. In some environments this redundancy
of hardware may not be desirable, and LifeKeeper can instead run in an Active/Active
configuration, where on failure of one node, the other node takes
over and runs both services locally. An Active/Active configuration
may use either shared storage or the data replication discussed
above. The Active/Active configuration allows for optimal
resource usage combined with high availability.
LifeKeeper in an N+1 Configuration
Alternatively, if many servers are to be used, it is feasible to
have an N+1 configuration where one machine acts as a backup to
numerous active servers. If more than one failover
occurs, the applications can fail over to any of the other active machines,
in an order determined by the administrator during installation
or initial configuration.
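The idea of an administrator-defined failover order can be sketched as a simple priority list. The Python below is an assumption-laden illustration, not LifeKeeper's configuration format: the node names and the `pick_target` function are hypothetical.

```python
from typing import List, Optional, Set


def pick_target(order: List[str], healthy: Set[str]) -> Optional[str]:
    """Return the first node in the configured order that is healthy.

    Returns None when no candidate node is available.
    """
    for node in order:
        if node in healthy:
            return node
    return None


# Hypothetical order for an application that normally runs on node-1:
# try the dedicated backup first, then the other active machines.
order = ["backup", "node-2", "node-3"]

print(pick_target(order, {"backup", "node-2", "node-3"}))  # backup
print(pick_target(order, {"node-2", "node-3"}))            # node-2
```

The second call models a second failure while the backup machine is already unavailable: the application moves to the next machine in the configured order.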
PostgreSQL in a shared storage environment
LifeKeeper can manage shared storage, and determine when to fail over
in the event of a failure on the active machine. The image
below illustrates LifeKeeper in a shared storage configuration.
LifeKeeper ensures data integrity by locking access to a SCSI resource at
the LUN level on shared storage. This prevents the data
corruption that could otherwise occur in a split-brain scenario, and removes the requirement
to purchase STONITH devices, which add an extra layer of complexity to a
solution.
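The effect of an exclusive lock at the LUN level can be pictured with a toy model. The Python below does not issue real SCSI commands and is not how LifeKeeper is implemented. The `Lun` class and its methods are hypothetical, and releasing or taking over a reservation is deliberately left out.

```python
from typing import Dict, Optional


class Lun:
    """Toy model of a shared LUN with one exclusive reservation."""

    def __init__(self) -> None:
        self.holder: Optional[str] = None
        self.blocks: Dict[int, bytes] = {}

    def reserve(self, node: str) -> bool:
        """Grant the reservation if it is free or already held by node."""
        if self.holder in (None, node):
            self.holder = node
            return True
        return False

    def write(self, node: str, block: int, data: bytes) -> bool:
        """Accept a write only from the current reservation holder."""
        if self.holder != node:
            return False
        self.blocks[block] = data
        return True


lun = Lun()
print(lun.reserve("node-a"))          # True
print(lun.reserve("node-b"))          # False
# Split brain: node-b wrongly believes it is active and tries to write.
print(lun.write("node-b", 0, b"x"))   # False
print(lun.write("node-a", 0, b"ok"))  # True
```

Even if both nodes believe they are active, only the reservation holder's writes reach the storage, which is why the data stays consistent in this model.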
A shared storage configuration requires some form of external SCSI, NAS or
RAID array that is connected to all servers in the cluster. The
storage unit itself should be configured to provide redundancy,
giving fault tolerance at the data layer.