Software Transactional Memory

Background

Software Transactional Memory (STM) was first proposed by Nir Shavit and Dan Touitou in 1995. Following this, academic interest in transactional memory increased substantially, and a large number of papers followed, covering different approaches, languages, and implementation choices. Interest peaked in the early 2000s, and Microsoft worked hard on adding STM support throughout the .NET framework, only to later abandon it. There are currently actively used and developed implementations of STM for the C++ Boost library and for Haskell amongst others.

STM is primarily aimed at trying to ease the difficulties of writing large-scale concurrent programs. In concurrent programs, where concurrent access is required to the same data-structures, locking is typically used to ensure multiple threads can't proceed through critical regions concurrently. If multiple threads were to progress through critical regions at the same time, they may be able to see the partial progress and intermediate states caused by other threads. Thus the locks are used to ensure the atomic, consistent and isolated properties.

Manual locks are notorious for being difficult to get right. If you use many locks (fine-grained locking) where each lock is used to protect a small data-structure then obtaining all the locks in such a way as to avoid potentially causing a deadlock can be tricky. If you have few locks (coarse-grained locking) then whilst it's much easier to avoid deadlock, you run the risk of losing available parallelism because each lock protects too large a data-structure.

STM avoids the programmer having to take any locks explicitly at all. Instead, in a transaction, the STM engine is either directly told which data-structures are being accessed (Boost.STM does this) or is able to infer it (either by the type-system, as with the Haskell implementation; or at run-time by proxies and other techniques, as with AtomizeJS). Dereferencing counts as a read of an object (so a.x is a read of object a); assignment counts as a write (so a.x = y is a write to object a). For example:

if (root.calendar.dayOfMonth === 1 && root.calendar.month === "January") {
    root.global.login.motd = "Happy New Year!";
}

Here, we have a read set of at least root and root.calendar. If it did happen to be the 1st of January, then we would further add to the read set root.global, and we would have a write set of root.global.login. Note that we do not have root.calendar.dayOfMonth in the read set: we did not try and inspect any children of dayOfMonth. (Equally, if you assign to root.calendar.dayOfMonth then the write set would include root.calendar, not root.calendar.dayOfMonth, and so the reads and writes match.)

AtomizeJS

There are then several ways to implement STM. What follows is an outline of how AtomizeJS works. This is not dissimilar to other implementations.

AtomizeJS keeps a transaction log. This contains all the reads and writes performed in the current transaction. Writes in the current transaction are not performed against the real object, but merely cached in the transaction log, and then reads must first check the transaction log in order to ensure the correct value is returned. This approach means that if the transaction needs to be abandoned, the transaction log is just thrown away and nothing else needs doing.

When the transaction function completes, the transaction log contains the net effect of the transaction function. This log needs to be verified and committed. Every object is versioned, and the log captures the version of every object that has been read or written to. This information is then passed to the server which tests to see whether the server-copy of those objects is the same version as the client used, or whether some other client has modified any of the objects in the meantime, and thus changed the version numbers. If there are any discrepancies between the version numbers in the transaction log and on the server then the client is sent updates for the objects for which it holds out-of-date copies, and is then told to restart the transaction. Eventually, the transaction will be performed against the current values of all relevant objects, the server will verify this and will then apply the writes from the transaction log to its copies of those objects, and bump their version numbers. It will then confirm the commit back to the client which will similarly apply the writes to the real objects and bump those version numbers. Finally, the client will invoke the transaction function continuation.

Currently, because the server is implemented simply in NodeJS, there is no concurrency on the server. As such, no locks need to be taken. However, with a more sophisticated concurrent implementation, during the commit on the server, locks on every object read or written to would need to be taken.

Limitations

The key attraction of STM is that you don't need to think about which locks to take: you can just write a function, pass it to atomically and expect it to be run with all the correct properties. Provided the function only accesses and manipulates data-structures which are controlled by the STM engine, this is indeed true: the transaction log will capture all reads and writes as required, and the transaction will be managed correctly.

However, if the transaction function manipulates objects outside of STM engine, or performs other operations which have side effects (such as as disk access, or network access, or launching the missiles) then there's really no way for the transaction log to capture and simulate the effect of this. For example, consider a function which tries to send a network packet to a remote host, and receive a response from that remote host. Doing this in a transaction is nonsensical: if the transaction throws an exception and has to abort, how do you undo the send, or the receive?

Thus STM does not make the impossible possible: without making the entire universe transactional, it would not be possible to achieve such things. It would seem that STM was billed to eventually be able to solve such problems, but inevitably was unable to. This is one of the reasons, amongst others, that Microsoft abandoned its STM in .NET.

However, when these limitations are understood (and indeed, if you think about what you would expect to be possible if you were manually using locks then it's all rather obvious), STM is a very useful and valuable technology that does make writing concurrent programs easier: the STM engine is able to detect collisions at the finest-grained level and, experiments have shown that with concurrent implementations of the engine, is able to perform extremely well. In the case of AtomizeJS, whilst the current server is not multi-threaded, clients still perform the transaction functions in parallel before sending the transaction log to the server. Thus it's still possible to see how parallel execution can occur.

We believe the AtomizeJS STM engine, in combination with the distribution of objects to clients in a safe way (i.e. respecting the atomic, consistent and isolated properties) creates a powerful mechanism for writing sophisticated applications in JavaScript.