Sunday, January 14, 2007

Logless transactions

A few months ago, in my blog entry Transactions, disks, and performance, I went into the importance of minimizing the number of disk writes. Transaction logging is one of those cases where minimizing writes greatly enhances performance. In this entry, I'll describe a way to avoid transaction logging altogether.

What is transaction logging? Transaction logging refers to persisting the state of a two-phase transaction so that in the event of a crash, the transaction can either be committed or rolled back (recovered). I won't go into the details of what XA is; more information about XA transactions can be found elsewhere, e.g. in Mike Spille's XA Exposed.

Let me illustrate what recovery is using a "diagram". Consider an XA two phase transaction with three Resource Managers (RMa, RMb, and RMc). To indicate what happens at what time, I'll put all actions in a table; each row corresponds to a different time.

time | RMa                     | RMb                     | RMc                     | Coordinator
-----+-------------------------+-------------------------+-------------------------+----------------
t1   | start(xid1a, TMNOFLAGS) |                         |                         |
t2   |                         | start(xid1b, TMNOFLAGS) |                         |
t3   |                         |                         | start(xid1c, TMNOFLAGS) |
t4   | end(xid1a, TMSUCCESS)   |                         |                         |
t5   |                         | end(xid1b, TMSUCCESS)   |                         |
t6   |                         |                         | end(xid1c, TMSUCCESS)   |
t7   | prepare(xid1a)          |                         |                         |
t8   |                         | prepare(xid1b)          |                         |
t9   |                         |                         | prepare(xid1c)          |
t10  |                         |                         |                         | log
t11  | commit(xid1a, false)    |                         |                         |
t12  |                         | commit(xid1b, false)    |                         |
t13  |                         |                         | commit(xid1c, false)    |
t14  |                         |                         |                         | delete from log

At t10 the transaction manager records the decision to commit in the log. Let's say that the system crashes after t10, for example between t11 and t12. When the system restarts, it will call recover() on all known Resource Managers and it will read the transaction log. In the transaction log it will find that xid1 was marked for commit. Through recover() it will find that xid1b and xid1c are in doubt. It knows that these two need to be committed because of the commit decision in the log.
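This recovery pass can be sketched in a few lines of Java. This is a minimal illustration, not a real transaction manager: it assumes the log has already been read into a set of committed global transaction IDs, and the class and method names are my own.

```java
import java.util.Arrays;
import java.util.Set;
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

public class RecoveryPass {
    /** Wrapper so byte[] global transaction IDs compare by content, not identity. */
    static final class GlobalTxId {
        private final byte[] gtrid;
        GlobalTxId(byte[] gtrid) { this.gtrid = gtrid.clone(); }
        @Override public boolean equals(Object o) {
            return o instanceof GlobalTxId && Arrays.equals(gtrid, ((GlobalTxId) o).gtrid);
        }
        @Override public int hashCode() { return Arrays.hashCode(gtrid); }
    }

    public void recover(XAResource[] resourceManagers,
                        Set<GlobalTxId> committedInLog) throws XAException {
        for (XAResource rm : resourceManagers) {
            // TMSTARTRSCAN | TMENDRSCAN asks for all in-doubt Xids in one call
            Xid[] inDoubt = rm.recover(XAResource.TMSTARTRSCAN | XAResource.TMENDRSCAN);
            for (Xid xid : inDoubt) {
                if (committedInLog.contains(new GlobalTxId(xid.getGlobalTransactionId()))) {
                    rm.commit(xid, false);  // decision was logged: roll forward
                } else {
                    rm.rollback(xid);       // no decision found: roll back
                }
            }
        }
    }
}
```

In the crash-between-t11-and-t12 scenario above, xid1b and xid1c come back from recover(), their global transaction ID is found in the log, and both branches are committed.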

What happens if the system crashes before the commit decision is written to the log, for example between t8 and t9? Upon recovery, the recover() methods of RMa and RMb return xid1a and xid1b respectively (but not xid1c, because prepare() was not yet called on RMc). The transaction manager will roll back RMa and RMb because no commit decision was found in the log.

SeeBeyond's Logless XA Transactions

Let's take a look at the recover() method on the XAResource. This method returns an array of Xid objects. Each Xid object holds two byte[] arrays. These two arrays represent the global transaction ID and the branch qualifier. They are typically random numbers picked by the transaction manager. The Resource Managers that receive these Xids should use these objects as identifiers and return them in the recover() method unmodified.
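An Xid is a small, purely structural object. Here's a minimal implementation sketch (the class name and field layout are my own illustration; per the Xid contract, both arrays may be up to 64 bytes):

```java
import java.util.Arrays;
import javax.transaction.xa.Xid;

// Minimal Xid: just a format ID plus the two byte[] arrays the
// transaction manager picks. Resource Managers must hand these back
// unmodified from recover().
public final class SimpleXid implements Xid {
    private final int formatId;
    private final byte[] globalTransactionId; // identifies the global transaction
    private final byte[] branchQualifier;     // identifies this branch of it

    public SimpleXid(int formatId, byte[] gtrid, byte[] bqual) {
        this.formatId = formatId;
        this.globalTransactionId = gtrid.clone();
        this.branchQualifier = bqual.clone();
    }

    @Override public int getFormatId() { return formatId; }
    @Override public byte[] getGlobalTransactionId() { return globalTransactionId.clone(); }
    @Override public byte[] getBranchQualifier() { return branchQualifier.clone(); }

    @Override public boolean equals(Object o) {
        if (!(o instanceof Xid)) return false;
        Xid other = (Xid) o;
        return formatId == other.getFormatId()
            && Arrays.equals(globalTransactionId, other.getGlobalTransactionId())
            && Arrays.equals(branchQualifier, other.getBranchQualifier());
    }
    @Override public int hashCode() { return Arrays.hashCode(globalTransactionId); }
}
```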

At SeeBeyond, Jerry Waldorf and Venugopalan Venkataraman came up with an idea to use the storage space in the byte[] arrays of the Xid as a way to persist the transaction state. Here's how it works. Let's modify the above example by removing transaction logging:

time | RMa                     | RMb                     | RMc                     | Coordinator
-----+-------------------------+-------------------------+-------------------------+------------
t1   | start(xid1a, TMNOFLAGS) |                         |                         |
t2   |                         | start(xid1b, TMNOFLAGS) |                         |
t3   |                         |                         | start(xid1c, TMNOFLAGS) |
t4   | end(xid1a, TMSUCCESS)   |                         |                         |
t5   |                         | end(xid1b, TMSUCCESS)   |                         |
t6   |                         |                         | end(xid1c, TMSUCCESS)   |
t7   |                         |                         | prepare(xid1c)          |
t8   |                         | prepare(xid1b)          |                         |
t9   | prepare(xid1a)          |                         |                         |
t10  |                         |                         | commit(xid1c, false)    |
t11  |                         | commit(xid1b, false)    |                         |
t12  | commit(xid1a, false)    |                         |                         |

A commit decision is still being made, but this decision is no longer persisted in a separate transaction log. Instead, it is persisted in xid1a. If the system finds xid1a upon recovery, it knows that a commit decision was made. If it doesn't find xid1a, it knows that a commit decision was not made. Note that the order in which both prepare and commit are called on the three Resource Managers is very important.

As in the first example, if the system crashes before a commit decision has been made, it will roll back any prepared branches upon recovery. E.g. if the system crashes between t8 and t9, it will encounter xid1c and xid1b and will call rollback() on them, because it cannot find a record of a commit decision for xid1, i.e. it cannot find xid1a. Hence, xid1b and xid1c need to be rolled back.

If the system crashes after a commit decision has been made, for example between t10 and t11, it will find xid1b and xid1a. Since xid1a signifies a commit decision, both xid1b and xid1a should be committed.

So far so good. But how does the transaction manager know that if it encounters xid1b it should look for xid1a to figure out if a commit decision was made? This is where the transaction manager uses the byte[] arrays of the Xid: it stores this information in one of them.
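One conceivable encoding is a small marker inside the branch qualifier. This is purely my own illustration of the idea, not SeeBeyond's actual wire format: one byte says whether this branch is the "decision branch" (the one whose survival past prepare() records the commit decision), and since every branch shares the same global transaction ID, recovery can group in-doubt Xids by transaction and check whether the decision branch is among them.

```java
// Hypothetical branch-qualifier layout: [marker byte][branch number].
public final class BranchQualifierCodec {
    private static final byte DECISION_BRANCH = 1; // e.g. xid1a in the example
    private static final byte ORDINARY_BRANCH = 0; // e.g. xid1b, xid1c

    public static byte[] encode(boolean isDecisionBranch, int branchNumber) {
        return new byte[] {
            isDecisionBranch ? DECISION_BRANCH : ORDINARY_BRANCH,
            (byte) branchNumber
        };
    }

    // Upon recovery: a transaction commits iff one of its in-doubt
    // branches carries the decision marker.
    public static boolean isDecisionBranch(byte[] branchQualifier) {
        return branchQualifier.length > 0 && branchQualifier[0] == DECISION_BRANCH;
    }
}
```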

Complicating factors

A problem in this scheme occurs when the prepare(xid1a) method returns XA_RDONLY. If that happens, commit(xid1a, false) cannot be called, and RMa will not return xid1a upon calling recover(). Recall that xid1a had special significance! Hence it is important to order the Resource Managers such that the first one on which prepare() is called is both reliable and will not return XA_RDONLY. However, in normal Java EE applications, the application determines the order in which resources are enlisted in a transaction. Hence, to use this logless transaction scheme, the application server either needs a way to specify the resource ordering a priori, or it needs a learning capability so that it knows which resources are enlisted in a particular operation and can pick the right resource manager to carry the commit decision.
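The "a priori" variant could be as simple as a configured preference list. A sketch, with made-up names, assuming an administrator supplies the set of resources known never to return XA_RDONLY:

```java
import java.util.List;

// Pick the resource that will carry the commit decision. The candidate
// must be enlisted in this transaction AND configured as reliable /
// never-read-only; returns null if no safe choice exists, in which case
// the transaction manager would fall back to a conventional log.
public final class DecisionResourcePicker {
    public static <T> T pick(List<T> enlisted, List<T> neverReadOnly) {
        for (T candidate : neverReadOnly) {
            if (enlisted.contains(candidate)) {
                return candidate;
            }
        }
        return null;
    }
}
```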

The SeeBeyond logless transaction approach is one of the ways that transaction logging can be made less expensive. In a future blog entry, I'll cover additional ones.

1 comment:

Frank Kieviet said...

Hi Ludovic,


To your first question, how recover() knows about all resource managers: the transaction manager is called by the application server and asked to recover; the application server passes a list of XAResources to the transaction manager. It obtains this list of XAResources by going over all resource adapters that are deployed in the server, i.e. all global resource adapters and JDBC connection pools, as well as all resource adapters embedded in EAR files.


How can you prevent that prepare() returns XA_RDONLY? You would have to know about the resource managers. For instance, if you know the internal architecture of a particular JMS server, you can say with certainty that it won't ever return XA_RDONLY. On the other hand, Oracle is not a good candidate because it may return XA_RDONLY. Yes, having to know the internal details of the participating resource managers is a major drawback of this approach.


To your third question, about performance: we haven't measured concurrent processing versus a serial approach, so I can't say much about that. By the way, a concurrent approach has another interesting effect: it decreases the time between prepare() and commit() rather than the throughput. Having to serialize those calls is another potential drawback of this approach.


Frank