Transparent Application Failover

Transparent Application Failover (TAF)

 

A runtime failover mechanism for high-availability environments, such as Oracle9i Real Application Clusters and Oracle Fail Safe, that fails over and re-establishes application-to-service connections. It enables client applications to reconnect to the database automatically if the connection fails and, optionally, to resume a SELECT statement that was in progress. Note: when using OCI, this reconnect happens automatically.

Transparent Application Failover (TAF) is a client-side feature that allows clients to reconnect to a surviving database instance when an instance fails. The server uses notifications to trigger TAF callbacks on the client side.

TAF is configured using either a client-side TNS connect string or server-side service attributes. If both methods are used, the server-side service attributes supersede the client-side settings. Server-side service attributes are the preferred way to set up TAF.
TAF can operate in one of two modes: Session Failover and Select Failover. Session Failover recreates lost connections and sessions. Select Failover replays queries that were in progress.

TAF automatically re-establishes the connection using the same connect string or an alternate connect string that you specify when configuring failover, and it logs the user in with the same user ID as before the failure. If multiple users were using the connection, TAF logs each of them in as they attempt to process database commands. TAF cannot automatically restore other session properties; these can, however, be restored by a failover callback function.

If a command had completely executed when the connection failed and it changed the state of the database, TAF does not resend the command. If TAF reconnects in response to a command that may have changed the database, it returns an error to the application. Any active transaction is rolled back at the time of failure, because TAF cannot preserve active transactions across a failover; the application receives an error until it issues a ROLLBACK.
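Because TAF does not restore session properties itself, that job falls to a failover callback. The sketch below simulates such a callback in Python. It is loosely modeled on the OCI callback, which is registered on the server handle through the OCI_ATTR_FOCBK attribute, receives events such as OCI_FO_BEGIN, OCI_FO_END, OCI_FO_ABORT, OCI_FO_REAUTH and OCI_FO_ERROR, and may return OCI_FO_RETRY. The `FoEvent`, `FailoverCallback`, `OK` and `RETRY` names and the `execute` parameter are hypothetical stand-ins, not a real driver API.

```python
from enum import Enum, auto


# Hypothetical Python stand-ins for the OCI failover events
# (OCI_FO_BEGIN, OCI_FO_END, OCI_FO_ABORT, OCI_FO_REAUTH, OCI_FO_ERROR).
class FoEvent(Enum):
    BEGIN = auto()   # failover detected; TAF is about to reconnect
    END = auto()     # failover completed successfully
    ABORT = auto()   # failover failed and will not be retried
    REAUTH = auto()  # a user session was re-authenticated
    ERROR = auto()   # a reconnect attempt failed; the callback may ask for a retry


OK = 0
RETRY = "RETRY"  # hypothetical stand-in for OCI_FO_RETRY


class FailoverCallback:
    """Restores session state that TAF does not restore on its own."""

    def __init__(self, session_setup, max_retries=3):
        self.session_setup = list(session_setup)  # e.g. ALTER SESSION statements
        self.max_retries = max_retries
        self.attempts = 0
        self.events = []  # history of events seen, for logging

    def __call__(self, event, execute):
        """Handle one failover event; `execute` runs SQL on the new session."""
        self.events.append(event)
        if event is FoEvent.BEGIN:
            self.attempts = 0
        elif event is FoEvent.ERROR:
            self.attempts += 1
            if self.attempts <= self.max_retries:
                return RETRY  # ask TAF to try reconnecting again
        elif event is FoEvent.END:
            # Only session properties can be replayed here; any transaction
            # active at failure time has already been rolled back.
            for stmt in self.session_setup:
                execute(stmt)
        return OK


if __name__ == "__main__":
    executed = []
    cb = FailoverCallback(["ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MM-DD'"], max_retries=2)
    for ev in (FoEvent.BEGIN, FoEvent.ERROR, FoEvent.ERROR, FoEvent.END):
        print(ev.name, cb(ev, executed.append))
    print(executed)
```

The design keeps the callback stateless with respect to the database: it only counts retries and replays a fixed list of session-setup statements once the new session is in place.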

tnsnames.ora
A sample failover configuration looks like the following:

xxxx =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (LOAD_BALANCE = ON)
      (FAILOVER = ON)
      (ADDRESS = (PROTOCOL = TCP)(HOST = xxxvip)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = xxxvip)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = xxxvip)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = xxx)
      (FAILOVER_MODE =
        (TYPE = SELECT)
        (METHOD = PRECONNECT)
      )
    )
  )
  • As shown here, you can pre-establish a connection to reduce failover time by specifying METHOD=PRECONNECT instead of METHOD=BASIC. Remember that each pre-established connection consumes memory on the database host node, so size the system to cope.

TYPE:
TAF supports three failover types:

1. SESSION failover: If a user’s connection is lost, SESSION failover automatically creates a new session for the user on the backup node. This type of failover does not attempt to recover SELECT statements. It is ideal for OLTP (online transaction processing) systems, where transactions are small.

2. SELECT failover: If the connection is lost, Oracle Net establishes a connection to another node and re-executes the in-progress SELECT statements, positioning each cursor on the row it was on before the failover. This mode adds overhead on the client side, because Oracle Net keeps track of the SELECT statements. It is best suited to data warehouse systems, where transactions are big and complex.

3. NONE: This is the default setting; no failover functionality is provided. Use this setting to prevent failover.
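When FAILOVER_MODE clauses are generated for many tnsnames.ora entries, a small helper can reject misspelled TYPE or METHOD values before they reach a client. The following is a minimal sketch; `failover_mode`, `VALID_TYPES` and `VALID_METHODS` are hypothetical names, and the allowed values are only the ones described in this article.

```python
# Values described above for the FAILOVER_MODE TYPE and METHOD parameters.
VALID_TYPES = ("SESSION", "SELECT", "NONE")
VALID_METHODS = ("BASIC", "PRECONNECT")


def failover_mode(failover_type="SELECT", method="BASIC"):
    """Build the FAILOVER_MODE clause placed inside CONNECT_DATA (hypothetical helper)."""
    t = failover_type.upper()
    m = method.upper()
    if t not in VALID_TYPES:
        raise ValueError(f"TYPE must be one of {VALID_TYPES}, got {failover_type!r}")
    if m not in VALID_METHODS:
        raise ValueError(f"METHOD must be one of {VALID_METHODS}, got {method!r}")
    return f"(FAILOVER_MODE = (TYPE = {t})(METHOD = {m}))"


if __name__ == "__main__":
    # Produces the same TYPE/METHOD pair as the sample entry earlier in this article.
    print(failover_mode("select", "preconnect"))
```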

Failover is easy to simulate: open a number of connections from your application, kill the SMON process at the OS level (which terminates that instance), and then run the following SQL to see whether sessions have failed over:

SELECT username, sid, serial#, inst_id, failover_type, failover_method, failed_over
FROM gv$session
WHERE username NOT IN ('SYS', 'SYSTEM') AND failed_over = 'YES';

 

Prestige

Prestige supports TAF through the PrManager registry parameter FailoverSupported.

FailoverSupported (REG_DWORD)
This setting is for Oracle-based systems only and enables Transparent Application Failover support. TAF applies when PrManager sessions connect to Oracle RAC.

Note

TAF will not always successfully restart a query in progress. It reopens the cursors and attempts to discard the rows already returned.
To do that, it computes a checksum of the rows to be discarded and compares it against a checksum of the rows already returned.
If the checksums differ, TAF knows the discarded rows are not the same as the rows already returned.
In that case, it does not resume returning rows and returns an error instead.

  • Checksum discrepancies are more likely with replica databases that are not block-for-block identical.
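The checksum check described in this note can be illustrated with a small simulation. This is a sketch under stated assumptions: rows are plain Python tuples, `resume_query`, `rows_checksum` and `ReplayMismatch` are hypothetical names, and SHA-256 over each row's `repr` stands in for whatever checksum TAF actually uses internally.

```python
import hashlib
from itertools import islice


class ReplayMismatch(Exception):
    """Raised when replayed rows differ from those the client already received."""


def rows_checksum(rows):
    """Order-sensitive checksum over a sequence of rows (stand-in for TAF's own)."""
    digest = hashlib.sha256()
    for row in rows:
        digest.update(repr(row).encode("utf-8"))
        digest.update(b"\x00")  # separator so row boundaries affect the hash
    return digest.hexdigest()


def resume_query(replayed_rows, already_returned):
    """Re-execute a query after failover and resume where the client left off.

    The first len(already_returned) replayed rows are discarded, but only if
    their checksum matches the rows returned before the failure.
    """
    replayed = iter(replayed_rows)
    discarded = list(islice(replayed, len(already_returned)))
    same_count = len(discarded) == len(already_returned)
    if not same_count or rows_checksum(discarded) != rows_checksum(already_returned):
        raise ReplayMismatch("replayed rows differ from rows already returned")
    return list(replayed)  # rows the client has not seen yet


if __name__ == "__main__":
    before = [(1, "a"), (2, "b")]
    print(resume_query([(1, "a"), (2, "b"), (3, "c")], before))
    try:
        # Same rows, different order, as a non-identical replica might return them.
        resume_query([(2, "b"), (1, "a"), (3, "c")], before)
    except ReplayMismatch as exc:
        print("error:", exc)
```

Because the checksum in this sketch is order-sensitive, a replica that returns the same rows in a different order fails the check, which mirrors why replicas that are not block-for-block identical are more prone to this error.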
