PostgreSQL Disaster Recovery and High Availability
High Availability Clustering for PostgreSQL on Linux
LifeKeeper monitors all resources used by PostgreSQL (IP address, disk, file system) and also checks that PostgreSQL itself is running (for example, it could execute a known SQL query). This proactive approach allows LifeKeeper to initiate a failover to a different node in the cluster regardless of whether the fault is software related (operating system, application, etc.) or hardware related (disk, network, etc.).
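As a rough illustration of this kind of monitoring, the following Python sketch polls a set of per-resource health probes and reports which ones have failed. It is not LifeKeeper's implementation: the function name `failed_resources`, the retry count and the probe names are assumptions made for the example, and the database probe is only a stand-in for running a known SQL query through a real client.

```python
from typing import Callable, Dict, List

# A probe returns True when its resource is healthy (hypothetical type alias).
Probe = Callable[[], bool]


def failed_resources(probes: Dict[str, Probe], retries: int = 3) -> List[str]:
    """Return the names of resources whose probe failed on every attempt."""
    failed = []
    for name, probe in probes.items():
        healthy = False
        for _ in range(retries):
            try:
                if probe():
                    healthy = True
                    break
            except Exception:
                # A probe that raises is treated like one that returns False.
                continue
        if not healthy:
            failed.append(name)
    return failed


def broken_database_probe() -> bool:
    # Stand-in for a failed "known SQL query"; a real probe would use a
    # database client library instead of raising unconditionally.
    raise ConnectionError("database did not answer")


if __name__ == "__main__":
    probes = {
        "ip": lambda: True,
        "filesystem": lambda: True,
        "postgresql": broken_database_probe,
    }
    # A non-empty result is what would trigger a failover decision.
    print(failed_resources(probes))
```

Retrying each probe a few times before declaring a failure is a design choice in this sketch, intended to avoid failing over on a single transient error.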
Server Organisation
LifeKeeper works through the server virtualisation concept, where all resources required for a particular application are available on all servers. This requires that all nodes have access to both the binary files and the application data. Generally, binary files are installed separately on each server, while application data is shared or replicated between servers. The LifeKeeper PostgreSQL recovery kit also allows the administrator to share or replicate the binary files between servers. Because the same resources are available to all nodes, data integrity must be ensured across nodes by allowing only one machine access to the data at any one time.

PostgreSQL with Data Replication - eliminating the need for shared storage
Shared storage is often expensive, and does not allow a cluster to span a LAN or WAN, so it cannot provide a true disaster recovery solution. In addition, the shared storage itself is a single point of failure.
LifeKeeper's data replication software can be used to mirror data to a remote server, and seamlessly handle the switchover in the event of a failure.
When data is written to disk, LifeKeeper data replication will mirror the data to a remote server. This ensures that the data is up to date in the event of a failover occurring.
In the event of a failover, LifeKeeper will bring the target of the mirror into service, and reverse the direction of replication (if possible). After this, the PostgreSQL server is started.
The data replication can occur on either a LAN or a WAN, and either synchronous or asynchronous approaches are available. Because the data replication software only transmits changes in data, and reads at the raw block level, it is efficient, versatile, relatively fast and independent of the file system used by the operating system.
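The idea of transmitting only changed blocks, independently of the file system, can be sketched as follows. This is a toy in-memory model and not the actual replication software: the class name `MirroredVolume`, the 4096-byte block size and the explicit `sync` call are all assumptions made for illustration. Calling `sync` after every write loosely corresponds to a synchronous mode, while calling it later corresponds to an asynchronous one.

```python
class MirroredVolume:
    """Toy block device that records dirty blocks and ships only those."""

    def __init__(self, size: int, block_size: int = 4096) -> None:
        self.block_size = block_size
        self.primary = bytearray(size)  # stands in for the active node's disk
        self.mirror = bytearray(size)   # stands in for the remote copy
        self.dirty = set()              # indexes of blocks changed since last sync

    def write(self, offset: int, data: bytes) -> None:
        if offset < 0 or offset + len(data) > len(self.primary):
            raise ValueError("write outside of volume")
        if not data:
            return
        self.primary[offset:offset + len(data)] = data
        first = offset // self.block_size
        last = (offset + len(data) - 1) // self.block_size
        self.dirty.update(range(first, last + 1))

    def sync(self) -> int:
        """Copy dirty blocks to the mirror; return how many blocks were sent."""
        sent = 0
        for block in sorted(self.dirty):
            start = block * self.block_size
            end = start + self.block_size
            self.mirror[start:end] = self.primary[start:end]
            sent += 1
        self.dirty.clear()
        return sent


if __name__ == "__main__":
    volume = MirroredVolume(size=16384)
    volume.write(4090, b"x" * 10)  # this write straddles blocks 0 and 1
    print(volume.sync())           # number of blocks transferred
    print(volume.mirror == volume.primary)
```

Because the sketch works on raw byte offsets, it never needs to know which file system, if any, sits on top of the volume.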
The image below illustrates the replication of data from one active node, to another backup node using the replication software. Because the backup server will have an up to date copy of the data held on the active server, failover can occur at any time.

PostgreSQL recovery over a Wide Area Network
The image below illustrates data replication taking place over a Wide Area Network. As mentioned above, due to the efficiency of the data replication software, it is feasible for replication to take place to a remote site, allowing for optimal disaster recovery and data protection.
Active/Active
The diagrams so far have all shown PostgreSQL running with LifeKeeper protection in an Active/Backup configuration, i.e. one machine is always idle in case of emergency. In some environments this hardware redundancy may not be desirable, and LifeKeeper can instead run in an Active/Active configuration where, on failure of one node, the other node takes over and runs both services locally. An Active/Active configuration may use either shared storage or the data replication discussed above. The Active/Active configuration allows for optimal resource usage combined with high availability.
LifeKeeper in an N+1 Configuration
Alternatively, if many servers are to be used, it is feasible to have an N+1 configuration where one machine acts as a backup to numerous active servers. In the event of more than one failover occurring, the applications can fail over to any other active machines, in an order determined by the administrator during installation / initial configuration.
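The administrator-defined failover order can be pictured as a simple priority list per protected server. The sketch below is illustrative only: the function `pick_failover_target` and the node names `db1`, `db2` and `spare` are hypothetical and do not reflect how LifeKeeper stores or evaluates its configuration.

```python
from typing import List, Optional, Set


def pick_failover_target(priority: List[str], healthy: Set[str]) -> Optional[str]:
    """Return the first healthy node in the administrator's priority order."""
    for node in priority:
        if node in healthy:
            return node
    return None  # no node is available to take over


if __name__ == "__main__":
    # Hypothetical order for the service on db1: prefer the spare, then db2.
    order_for_db1 = ["spare", "db2"]

    # First failure: the dedicated backup machine is available.
    print(pick_failover_target(order_for_db1, {"spare", "db2"}))

    # Later failure while the spare is unavailable: fall back to an active node.
    print(pick_failover_target(order_for_db1, {"db2"}))
```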

PostgreSQL in a shared storage environment
LifeKeeper can manage shared storage, and determine when to failover in the event of a failure on the active machine. The image below illustrates LifeKeeper in a shared storage configuration.
LifeKeeper ensures data integrity by locking access to a SCSI resource at the LUN level on shared storage. This eliminates any chance of data corruption occurring in a split brain scenario, and removes the requirement to purchase STONITH devices which add an extra layer of complexity to a solution.
A shared storage configuration requires some form of external SCSI / NAS / RAID array, which is connected to all servers in the cluster. The storage unit itself should be configured to provide redundancy, giving fault tolerance at the data layer.
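The exclusive access to a LUN described above can be modelled as a reservation that only one node may hold at a time. The following toy model is an assumption-laden sketch: the class `SharedLun` and its methods are hypothetical, and it does not issue real SCSI commands or cover how a reservation is taken over from a failed node.

```python
from typing import Optional


class SharedLun:
    """Toy model of a LUN that accepts writes only from the reserving node."""

    def __init__(self) -> None:
        self.holder: Optional[str] = None
        self.data = b""

    def reserve(self, node: str) -> bool:
        # The reservation succeeds only if the LUN is free or already ours.
        if self.holder is None or self.holder == node:
            self.holder = node
            return True
        return False

    def release(self, node: str) -> None:
        # Only the current holder can release the reservation.
        if self.holder == node:
            self.holder = None

    def write(self, node: str, data: bytes) -> bool:
        # Writes from any node other than the holder are rejected.
        if self.holder != node:
            return False
        self.data = data
        return True


if __name__ == "__main__":
    lun = SharedLun()
    print(lun.reserve("node_a"))         # node_a takes the reservation
    print(lun.write("node_b", b"oops"))  # node_b is refused, even if it believes it is active
```

Even if both nodes believe they are active, as in a split brain scenario, only the holder of the reservation can change the data in this model.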
For a seamless recovery of your servers, SteelEye LifeKeeper from Open Minds guarantees to provide you with a complete disaster recovery solution, ensuring that you maintain business continuity. Our high availability solutions will grant continuous data protection, providing data replication as well as the monitoring of all your servers to ensure failover through the Java GUI.