Getting Lazy

Tuesday 27 March 2012

Up until today, when a client connects, that client has built into it a definition of the root object at version 1, which is a plain empty object, {}. When the client performs some transaction that reads or modifies the root object, if the server's version of the root object is different, then everything that is reachable from the root object is sent down to the client. This was true in general: when the server has to send down an updated version of an object, it traverses that object for fields which point to other objects that the client doesn't know about, and sends those too. In fact, it sends the transitive closure.

The reason for this is pretty simple: until now, I've not had a nice way of dealing with dangling pointers. Thus if client A does:

atomize.atomically(function () {
    var a = {}, b = {}, c = {}, d = {};
    atomize.root.a = atomize.lift(a);
    atomize.root.a.b = atomize.lift(b);
    atomize.root.a.b.c = atomize.lift(c);
    atomize.root.a.b.c.d = atomize.lift(d);
});

and then client B does a transaction which reads or modifies an older version of the root object then there was no choice but to send down all 4 new objects so that the object graph could be fully populated.

This can obviously be quite wasteful: there is the possibility that client B really doesn't care about those 4 new objects: sure, it needs the most up to date version of the root object, but it was happily working away under atomize.root.differentObject which (obviously) has nothing in common with those objects now reachable from atomize.root.a.

The solution I've come up with is for the server to send down, in certain circumstances, version 0 objects. These are always plain, empty objects. Whenever you try to read from them, the client notices you're trying to read from a version 0 object, and interrupts the current transaction. It then transparently sends a retry up to the server where the transaction log says "I just read version 0 of this object". Immediately, the server notices that there is a newer version of that object, sends down the new version, and the client then restarts the transaction. Thus there's been no protocol change, and this modification is implemented entirely in terms of existing STM primitives. But there is no change to the way you write code at all.

So, in the above example, after client A has performed its transaction, client B tries some transaction which modifies the root object. This transaction fails because it was against an older version of the root object, but now, the server only sends down a version 0 object which is directly reachable from atomize.root.a, and that object is empty: none of the b, c or d objects are sent down to client B. Now, should client B now attempt a transaction which reads from this a object, for example:

atomize.atomically(function () {
    return Object.keys(atomize.root.a);
}, console.log.bind(console));

the client will spot it read from a version 0 object (a), and transparently issue a transaction up to the server which will merely cause the server to send down the full a object. The updated a object (now at version 1 or greater) will have a b field which itself will point to a version 0 object: again, we've not sent down the transitive closure of everything reachable from the full a object, merely everything directly reachable from a in one hop.

In this case, yes, it results in more round trips. But in many cases, it results in substantially less communication overhead: the test suite has more than doubled in speed as a result of this change.

The important thing to note is that there is no change to the code you write. It's simply now the case that there may be more retry operations going on under the bonnet than are indicated by your code.

Given the last blog posts, you might well be wondering how this optimisation interacts with that bug, and its fix. Well, that bug was all about a client having an older version of an object, and through a series of events having a transaction restart that was able to observe both that older object at the same time as some updated objects which together could be used to observe a violation of isolation. The key thing though is that the client already had to have the older object.

This optimisation doesn't impact the fix developed for that bug: if the client already has an object then it will be updated according to the dependency chain traversal as described, thus isolation is still enforced. What this optimisation achieves is that it causes objects managed by AtomizeJS to be brought down to the client on demand. When they are brought down, because the implementation just uses the existing retry functionality, the updates that are sent down are calculated using exactly the same mechanism as normal, thus again, the algorithm used to ensure isolation is respected is invoked.

Thus, if going from version 0 of an object to the current version, say version 3 requires that some other objects the client already knows about be updated to current versions, then that is still achieved.