Progress, Presentation Video, and Slides

Wednesday 06 June 2012

A couple of weeks ago I gave a talk at the London Node User Group. I thought it went pretty well, and the reaction both in person and on Twitter afterwards seemed very positive.

The video and slides are now available online.

LNUG May 2012 - Matthew Sackman from Forward Technology on Vimeo.

Recently I've mostly been making general improvements, tidying up the code and refactoring. There are possibly no exciting new features, but the basic code is in much better shape and is much, much closer to properly enforcing the properties of JavaScript than before (e.g. only being able to delete properties that are marked configurable). Also, a fairly major bug in the way retry was implemented has been found and fixed, and I've done a fair amount of testing with the translation tool, generally trying to ensure that the claims I make about compatibility really are borne out in practice. Of course, IE is the most challenging target, but it is on the whole manageable (at least, up-to-date versions of IE are).

All of which means that I've bumped the versions of both the client and server to 0.4.9. Hopefully after some more testing and minor bug fixes I should be able to either move to a 0.5.0 or maybe even go to 0.9.0 to target a possible 1.0.0 release. Thus any feedback, comments, questions or bug reports are very eagerly requested!

Talk at London Node User Group

Tuesday 22 May 2012

Tomorrow (23rd May 2012), I'm speaking about AtomizeJS at the London Node User Group. This will be a talk covering what AtomizeJS is, what problems it solves, why you should use it, and what you can use it for. It would be great to have a good audience, so if you're at all curious about AtomizeJS (or a seasoned user of it!), please come along. Apparently there'll be beers and pizza from 6:30pm!

Over the last few days I've been writing various demos using AtomizeJS for this talk, which has been great as it's exposed lots of bugs (which I've fixed), and again just reinforced that sometimes, hard problems are just plain hard to solve!

Development has been a little slower over the last month as I've been involved in various other projects. I spent an awful lot of time hacking a security layer into AtomizeJS. All the hooks are now there, so you should be able to implement whatever security policies you want, but it's actually not clear to me how you would want to express such policies. Or rather, whilst some ideas are fairly attractive to me, achieving them in JavaScript is rather more painful than it ought to be. Having read around the subject, it's clear that almost no one else tries to solve this problem, which suggests it's considered a hard one. So I've left the hooks in but haven't yet tried to make a big deal out of it.

The client now has some reconnection logic, so if it loses its connection to the server it will attempt to reconnect. Currently that's a little buggy because SockJS doesn't expose a disconnect due to packet drops - there's a bug filed, and hopefully it should Just Work when the next version of SockJS comes out. Lots and lots of other little bugs have been fixed: it turned out that much of AtomizeJS was broken when writing NodeJS-side code as a client of AtomizeJS, but thankfully developing the demos exposed that, and it's now fixed.

Getting Lazy

Tuesday 27 March 2012

Up until today, when a client connects, that client has built into it a definition of the root object at version 1, which is a plain empty object, {}. When the client performs some transaction that reads or modifies the root object, if the server's version of the root object is different, then everything that is reachable from the root object is sent down to the client. This was true in general: when the server has to send down an updated version of an object, it traverses that object for fields which point to other objects that the client doesn't know about, and sends those too. In fact, it sends the transitive closure.

The reason for this is pretty simple: until now, I've not had a nice way of dealing with dangling pointers. Thus if client A does:

atomize.atomically(function () {
    var a = {}, b = {}, c = {}, d = {};
    atomize.root.a = atomize.lift(a);
    atomize.root.a.b = atomize.lift(b);
    atomize.root.a.b.c = atomize.lift(c);
    atomize.root.a.b.c.d = atomize.lift(d);
});

and client B then performs a transaction which reads or modifies an older version of the root object, then previously there was no choice but to send down all four new objects so that the object graph could be fully populated.

This can obviously be quite wasteful: client B may not care about those four new objects at all. Sure, it needs the most up-to-date version of the root object, but it may have been happily working away under atomize.root.differentObject, which (obviously) has nothing in common with the objects now reachable from atomize.root.a.

The solution I've come up with is for the server to send down, in certain circumstances, version 0 objects. These are always plain, empty objects. Whenever you try to read from them, the client notices you're trying to read from a version 0 object, and interrupts the current transaction. It then transparently sends a retry up to the server where the transaction log says "I just read version 0 of this object". Immediately, the server notices that there is a newer version of that object, sends down the new version, and the client then restarts the transaction. Thus there's been no protocol change, and this modification is implemented entirely in terms of existing STM primitives. But there is no change to the way you write code at all.
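
To make the mechanism a little more concrete, here's a sketch of what the read barrier might look like (the names here - readField, recordRead, RetryException - are purely illustrative, not the actual AtomizeJS client code):

function RetryException() {}

function readField(txnLog, obj, fieldName) {
    if (obj.version === 0) {
        // We only hold an empty placeholder for this object: log a read of
        // version 0 and abandon this attempt. The server sees a read of an
        // out-of-date version, sends down the current object, and the client
        // then restarts the transaction transparently.
        txnLog.recordRead(obj.id, 0);
        throw new RetryException();
    }
    txnLog.recordRead(obj.id, obj.version);
    return obj.fields[fieldName];
}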

So, going back to the client A / client B example above: after client A has performed its transaction, client B tries some transaction which modifies the root object. This transaction fails because it ran against an older version of the root object, but now the server only sends down the updated root object together with a version 0 object for atomize.root.a, and that object is empty: none of the b, c or d objects are sent down to client B. Should client B then attempt a transaction which reads from this a object, for example:

atomize.atomically(function () {
    return Object.keys(atomize.root.a);
}, console.log.bind(console));

the client will spot that it read from a version 0 object (a) and transparently issue a transaction to the server, which will merely cause the server to send down the full a object. The updated a object (now at version 1 or greater) will have a b field which itself points to a version 0 object: again, we've not sent down the transitive closure of everything reachable from the full a object, merely everything directly reachable from a in one hop.

In this case, yes, it results in more round trips. But in many cases, it results in substantially less communication overhead: the test suite has more than doubled in speed as a result of this change.

The important thing to note is that there is no change to the code you write. It's simply now the case that there may be more retry operations going on under the bonnet than are indicated by your code.

Given the last two blog posts, you might well be wondering how this optimisation interacts with that bug and its fix. That bug was all about a client having an older version of an object, and, through a series of events, having a transaction restart that could observe that older object at the same time as some updated objects, which together could be used to observe a violation of isolation. The key thing though is that the client already had to have the older object.

This optimisation doesn't impact the fix developed for that bug: if the client already has an object then it will be updated according to the dependency chain traversal as described, so isolation is still enforced. What this optimisation achieves is that objects managed by AtomizeJS are brought down to the client on demand. When they are brought down, because the implementation just uses the existing retry functionality, the updates sent down are calculated using exactly the same mechanism as normal, so again the algorithm used to ensure isolation is respected is invoked.

Thus, if going from version 0 of an object to the current version (say version 3) requires that some other objects the client already knows about be updated to their current versions, then that is still achieved.

Interesting Bug Fixed

Tuesday 20 March 2012

In my last post, I documented the discovery of what I thought was a subtle bug. After some thought over the weekend, I eventually decided that it was actually rather unsurprising: indeed the surprise was that it had taken so long to come to light.

The bug boils down to the following:

  • When you commit a transaction, the transaction log is verified. The verification ensures that all the objects read from and written to were accessed at their latest, most up-to-date versions. Object versions are only advanced when a transaction successfully commits, and given that the server is single-threaded, it's easy to see that this leads to consistent modifications of the object state on the server, where consistent means respecting the atomic and isolated properties. (A minimal sketch of this check appears after this list.)

  • Where it goes wrong though is when that transaction can't be committed because someone else has, in the meantime, modified some of the same objects that the current transaction has modified. I.e. the server detects that the transaction log documents modifications of old versions of objects. At this point, the transaction is rejected, and a set of updates is sent to the client. These updates are there to allow the client to update its own copies of these objects so that it can then restart the transaction and have it run against the most up-to-date versions of these objects, eventually sending back to the server a new transaction log, which hopefully will then commit.

  • The bug was that the updates sent down to the client contained only the objects that had both been changed and were logged in the transaction log. At first glance, that might seem sound, but of course, when the transaction is restarted, it might choose to read and write different objects: you just have no idea. So the first time around, the transaction may modify objects a and b (and incidentally, the client already has an old copy of c, which isn't touched by this transaction). If someone else changes a in the meantime, the transaction will fail, and the new version of a will be sent down to the client. This time, the transaction, on seeing the new version of a, instead modifies objects a and c. But if that middle transaction, the one that only modified a, had instead modified both a and c, then our final transaction would see the new value of a but the old value of c: the rejected transaction modified only a and b, so the server didn't think to send the client the new version of c. That means that within the transaction you can observe a violation of isolation: you see some objects at versions from after a transaction, whilst seeing other objects at versions from before that same transaction.
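
To make the first point above concrete, here's a minimal sketch of the commit-time check (illustrative names only; the real server code is rather more involved):

function canCommit(txnLog, serverObjects) {
    // Every object the transaction read or wrote must still be at the
    // version recorded in the log, otherwise the transaction is rejected.
    var check = function (versions) {
        return Object.keys(versions).every(function (id) {
            return serverObjects[id].version === versions[id];
        });
    };
    return check(txnLog.reads) && check(txnLog.writes);
}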

I believe I have now fixed this bug. Each client has a representation on the server, and that representation tracks which objects, and at what version numbers, have been sent to the client. When a transaction commits, we now build an object that maps every modified object to its new version number, and every modified object maintains a linked list of these. These are the dependencies: they say that "at the point at which a was modified to version 3, c was also modified to version 7". The linked list then means that if a client representation knows it previously sent version 4 of c to the client and now wants to send version 7, it must walk the linked list from its current location (corresponding to version 4) all the way up to version 7. This allows it to discover that, in the course of forming versions 5, 6 and 7, several other objects were modified, and so updates for those objects must also be sent down to the client. The transitive closure of this operation must be found.
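
As a rough sketch of that bookkeeping (invented names such as depsHead and depsTail, not the real server code):

function recordCommit(modifiedObjects) {
    // One shared map per committed transaction: object ID -> new version.
    var versions = {};
    modifiedObjects.forEach(function (obj) {
        versions[obj.id] = obj.version;
    });
    // Append a node holding that map to each modified object's own list.
    modifiedObjects.forEach(function (obj) {
        var node = {versions: versions, next: null};
        if (obj.depsTail) {
            obj.depsTail.next = node;
        } else {
            obj.depsHead = node;
        }
        obj.depsTail = node;
    });
}

// A client representation that stopped at some node in an object's list can
// walk forward from there, collecting every (object, version) pair it passes:
// those are the extra updates that must accompany the object being sent.
function updatesSince(node) {
    var needed = {};
    while (node) {
        Object.keys(node.versions).forEach(function (id) {
            needed[id] = node.versions[id];
        });
        node = node.next;
    }
    return needed; // the same walk is then repeated transitively for each id
}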

The final trick is to ensure that these linked lists are bounded in length. Even though it only needs to be a singly-linked list, sadly, we have to keep hold of the oldest end of it too (if we didn't have to keep track of the oldest end, we could just let the list grow and grow and allow GC to tidy it up as necessary). This is because we may have to send the object to a client that has never previously seen this object at all. We're going to send the latest version of the object (indeed, we never keep track of anything other than the latest version), but for the same reasons as above, that latest version may very well only make sense in the context of updates to other objects, or even sending down other objects the client has never seen before. Thus we have to be able to find out every object that has ever been modified at the same time as our object-to-send, and make sure the client has up-to-date versions of all of those too (again, form the transitive closure). Clearly, over time, these linked lists could become very long indeed.

However, it's possible periodically to roll an object's linked list up: to amalgamate all the entries into one, and then shrink the list down to a single entry and start appending from there again. The intuition is that if one transaction pushed a to version 3 and c to version 7, and the next transaction pushed a to version 4 and b to version 6, then for a (and a alone), this can be combined to a single entry pushing a to version 4, b to 6, and c to 7. After all, we will never want to send version 3 of a to a client - only the most recent version ever gets sent, which at this point would be version 4.
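
Continuing the sketch above (again, illustrative rather than the real code): rolling up amalgamates every entry in an object's list into a single node holding, for each object mentioned, the highest version seen, and the list then restarts from that node. Using the example in the previous paragraph, the two entries collapse to a single one mapping a to 4, b to 6 and c to 7.

function rollUp(obj) {
    var merged = {}, node = obj.depsHead;
    while (node) {
        Object.keys(node.versions).forEach(function (id) {
            var v = node.versions[id];
            if (merged[id] === undefined || merged[id] < v) {
                merged[id] = v;
            }
        });
        node = node.next;
    }
    obj.depsHead = {versions: merged, next: null};
    obj.depsTail = obj.depsHead;
}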

The next question would then be: Why bother with the list at all - why not just keep an amalgamated set of dependencies for every object? The answer is that if that set is very large and most changes to it are for single elements, then client representations performing this update algorithm will have to iterate through every entry, only eventually to find a single relevant change to send down. By keeping the linked list, the client representation instead records its location in the list, and changes to the amalgamated set correspond to new list entries of exactly the change alone. Essentially the list stores diffs, and thus avoids client representations having to recalculate diffs on every update: they either use the diff directly, or have to calculate it only infrequently after a roll-up has occurred.

This fix appears in version 0.0.8 of the AtomizeJS node server.

Interesting Bug

Friday 16 March 2012

I've spent the whole of today chasing down a bug. I've finally found what's causing it, yet currently have no idea how to solve it. I think it's a rather amazing bug which shows some very interesting behaviour.

Over the last couple of days, I've been building out a test suite so that as I add features, I can have a degree of confidence that I've not obviously broken things. This morning I wrote a test which deliberately has a large number of transactions that collide with each other: indeed, overall progress is slow because of the huge contention created. The basic idea is that we start with a global a object with the following structure (this is all a bit simplified, but not by too much):

a = {0: {num: 1000},
     1: {num: 1000},
     2: {num: 1000}}

Then, every transaction decrements the num field in every object it finds within a. Just to shake things up a bit more, each transaction can also, at random, replace one of the inner objects; the replacement will also contain a num field with the correct value. The test stops when all the fields reach 0.

I set up several clients connected to the same AtomizeJS server, and each client ran several transactions that looked like:

var fun;
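// Note: 'a' is assumed to be a shared object managed by AtomizeJS and known
// to all the clients (it's set up separately, as described below).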
fun = function (c) {
    c.atomically(function () {
        if (undefined === a) {
            c.retry();
        }
        var keys = Object.keys(a),
            x, field, n, obj;
        for (x = 0; x < keys.length; x += 1) {
            field = keys[x];
            if (undefined === n) {
                n = a[field].num;
                if (0 === n) {
                    return n;
                }
            } else if (n !== a[field].num) {
                throw ("All fields should have the same number: " +
                       n + " vs " + a[field].num);
            }
            if (0.5 < Math.random()) {
                obj = c.lift({});
                obj.num = n;
                a[field] = obj;
            }
            a[field].num -= 1;
        }
        return n;
    }, function (n) {
        if (n > 0) {
            fun(c); // recurse
        } else {
            // Test done!
        }
    });
}

And then, having set up a suitable a object managed by AtomizeJS and known to all the clients, we invoke fun in each of the various AtomizeJS clients, passing in that client.

It turns out we hit the exception within the transaction. Yup, within the transaction, we can violate the isolation and atomic properties. Even more interesting were the minimum requirements for provoking the bug: you need two clients (i.e. two instances of Atomize) and three transactions in flight at the same time (i.e. one client must run multiple copies of the transaction at the same time - or at least as close to that as you can get in JavaScript: when one transaction commits and goes to the network to send its transaction log to the server, the client, whilst waiting for the response, goes and starts the other transaction). Running all three transactions in the same client can't provoke it, nor can running one transaction each in three different clients. If you rewrite the test so that it does the throw based on a test in the continuation (i.e. after each transaction has committed), then it never goes wrong, which means that the violation is eliminated when the transaction commits. But even so, within a transaction, you should not be able to see the partial effects of other transactions. The random replacement of objects within a is crucial: if you don't replace the objects, the bug doesn't appear.

So what on earth is going on?

Two clients: c1 and c2. Three transactions: t1, t2 and t3. For simplicity, I'm going to define these transactions precisely as:

var t1, t2, t3;
t1 = function () {
    a[0].num -= 1;
    a[1] = atomize.lift({num: a[0].num});
    a[2].num -= 1;
};
t2 = function () {
    a[0].num -= 1;
    a[1].num -= 1;
    a[2].num -= 1;
};
t3 = t2;

Initially, both clients are aware of a as shown at the top of this post. The a, 0, 1 and 2 objects are all at version 1, and this is known to both clients.

First, c1 runs t1 followed immediately by t2: i.e. whilst the transaction log of t1 is in flight to the server, c1 starts running t2. Thus both t1 and t2 are first run against the original objects, and so both will try to change the values of 1000 to 999.

The transaction t1 will have a read set of a, 0 and 2, and a write set of a, 0 and 2. This transaction goes to the server, commits successfully and comes back to c1 which updates its own copies of the objects. The object at a.1 has changed: the new object is at version 1 (along with the previous old object that used to be reachable from a.1), whilst all the other objects (a, 0 and 2) are now at version 2. The num fields have values of 999 now, though the original object that was at a.1 has a num value of 1000 still. Only the server and c1 know all this.

Whilst that was going on, c2 runs t3. This transaction is initially run against the original objects (version 1 of everything, with num fields at 1000). The transaction log arrives at the server after t1, and gets rejected because t1 committed successfully, and changed the versions. The server sends back to c2 version 2 of a, 0 and 2, along with version 1 of the new object reachable from a.1 (but all with num fields of 999). The client, c2 now restarts t3. This time t3 has a transaction log with a read set of a, 0, and 2 at version 2, plus the new 1 at version 1, and a write set of 0 (version 2), 1 (version 1 - remember: the new replacement object), and 2 (version 2). This goes to the server and commits correctly. The version numbers are bumped accordingly: both c2 and the server agree that a, 0 and 2 are now at version 3, the old original 1 (which is no longer reachable) is at version 1, and the new 1 is at version 2. The num fields are all now at 998, except for the old a.1 object that's still at 1000, but it's unreachable, so it doesn't matter.

Now, the transaction log from t2 arrives at the server. It was run by c1 a while ago, indeed against the original objects (version 1 of everything - num fields at 1000), but the server's been kept busy and is only now getting around to dealing with it. Blame the network. The transaction log of t2 contains reads of version 1 of a, 0, 1 (the original 1) and 2. The server notices that these are old versions and rejects the transaction. But here comes the problem: the server sends down the current versions of a, 0 and 2 (all version 3 - num fields at 998). It does not send down the current version of a.1 because the transaction log from t2 had nothing to do with the current object at a.1: it only mentions the old original object that was at a.1, and no one's modified that object: it still has a num field at 1000.

So now the client c1 applies those updates and restarts t2. Now t2 sees that a.0 and a.2 have num fields at 998, but a.1.num is actually at 999, because t2 (and indeed c1 as a whole) has not seen the effect of t3 (run by c2) on a.1: it's only seen the effect of t1. Thus isolation is broken: t2 sees parts of the world from before t3 committed and other parts from afterwards.

When t2 now commits again, the server will again reject it because this time, t2's transaction log will contain a read and a write of the new object at a.1, but at version 1, not version 2. So the server will reject it, send the update to the new a.1 which the client c1 will apply to get its a.1 up to version 2, and finally t2 will be restarted and this time will commit successfully.

Race conditions like these are always fun to unravel. Even more fun is that I currently have no idea how to solve this: it's almost like we need some sort of dependency chain to say that "if the server is going to send down version X of object J, then it must also send down version Y of object K". In many ways, this seems to be rather like a cache invalidation problem. I wonder how other STM systems solve this, whether they don't, or whether the problem really only appears due to the distributed nature of AtomizeJS. I think it might be the latter.

Translation

Monday 20 February 2012

Broad browser compatibility is here: the current versions of all the major browsers now work with AtomizeJS. The cost, though, is the translation tool.

In the end there was no choice: it had to be either a server-side and/or static translation of the JavaScript, or just writing to an extended API and paying the cost up front. I had wondered about doing a dynamic, browser-side, on-demand translation: after all, you should be able to get the source code of any given function with fun.toString(). The problem though is that after you've done the translation, you have to eval() it back to a function, but now you're in a different environment. So if you previously had:

var a = 5;
function myFun () {
    return a;
}

then yes, you can get the source of myFun and you can transform it as necessary. But you can't then re-eval() it back to a function and have it capture the same value of a as before: there's no way to extract the bound variables the closure captured the first time in order to re-present them to the eval().
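
To make that concrete, here's a small sketch of the problem (wrapping the example in a function so that a really is a captured local rather than a global):

function makeFun() {
    var a = 5;
    return function myFun() {
        return a;
    };
}

function rebuild(f) {
    // There is no 'a' in scope here, so the re-created function has lost the
    // value the original closure captured.
    return eval("(" + f.toString() + ")");
}

var original = makeFun();
var copy = rebuild(original);
original(); // 5
copy();     // ReferenceError: a is not defined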

So instead we have the translation tool. Yes, it's not ideal, and you may have to run external libraries through it too (though actually I believe there won't be too many cases where that's necessary). The code you get back is readable and nicely formatted, and makes the transformations applied quite obvious. Most importantly, it's enough to show that this will work with older browsers and that AtomizeJS itself is a viable technology for writing applications today.

Or at least that's what I think! As ever, feedback is very welcome.

Dropping the dependencies

Thursday 08 December 2011

One of my highest priorities right now is to increase browser compatibility. Supporting IE6 probably isn't going to happen, and even IE7 is unlikely. IE8 would certainly be nice to have, given that, as of July 2011, about 60% of IE users are using IE8, though that will change. I'd really love to avoid having to write different versions of JavaScript for different browsers, but we shall see...

First up is getting rid of the dependency on WeakMaps or Maps in general. Simple Maps and Sets are due to be in the next version of JavaScript. Without them, you have problems telling the difference between certain types of key, because when you do a plain obj[key] = val, the field name created is just the string representation of key. Thus 1 and "1" are the same thing, and every object is [object Object] - hardly very useful. I need objects as keys.
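
For example, in plain JavaScript (nothing AtomizeJS-specific here):

var m = {};
var k1 = {name: "one"};
var k2 = {name: "two"};

m[k1] = "first";
m[k2] = "second";            // silently overwrites the previous entry
console.log(Object.keys(m)); // ["[object Object]"] - both keys collided
console.log(m[k1]);          // "second"

m[1] = "number";
console.log(m["1"]);         // "number" - 1 and "1" are the same key too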

This one I've managed to work around: I've built an implementation of a Map. Inevitably, it's a compromise, and it ends up storing a unique ID in every object it touches. Currently, it does that via defineProperty (in order to make the ID non-deletable, non-writable and non-enumerable), which is broken in IE8, but I hope to be able to work around that.
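
As a rough illustration of the approach (a deliberately simplified sketch, not the actual AtomizeJS Map, and handling object keys only):

var SketchMap = (function () {
    var nextId = 0, idField = "__sketchMapId";

    // Stamp a hidden, immutable, unique ID onto any object used as a key.
    function idOf(key) {
        if (!Object.prototype.hasOwnProperty.call(key, idField)) {
            Object.defineProperty(key, idField, {
                value: "obj-" + (nextId += 1),
                writable: false,
                enumerable: false,
                configurable: false
            });
        }
        return key[idField];
    }

    function SketchMap() {
        this.store = {};
    }
    SketchMap.prototype.set = function (key, value) {
        this.store[idOf(key)] = value;
    };
    SketchMap.prototype.get = function (key) {
        return this.store[idOf(key)];
    };
    SketchMap.prototype.has = function (key) {
        return Object.prototype.hasOwnProperty.call(this.store, idOf(key));
    };
    return SketchMap;
}());

var map = new SketchMap(), k = {};
map.set(k, "hello");
console.log(map.get(k)); // "hello"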

The much bigger problem is working around the lack of Proxies in older browsers. Initially, I'll have to build the API out so that you can drive the proxy manually. This will mean that instead of writing code like a.b.c = x.y you'll have to write code like a.get('b').set('c', x.get('y')). Yup, it's pretty grim, and I doubt that'll be the worst of it. I'm hoping to have some sort of mechanised translation, but seeing as you can write

function MyFun (f) {
    atomize.atomically(f);
}

you're either going to have to do dynamic translation of any f that arrives there (which is OK to a point - f.toString() should give you the source code of f, which I could then parse into an AST, analyse and rewrite, unless it's a browser built-in), or you're going to have to do whole-program analysis in advance and prepare two different versions of every function, then select at run time which version to run based on whether or not you're inside a transaction. I've not made up my mind which one I prefer - comments welcome - but the first step will be to build out the proxy API so that these things can be driven manually, even if the syntax is pretty ugly.
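
To illustrate the second option, here's a hand-written sketch of the shape such output might take (atomize.inTransaction is an invented flag used purely for illustration, not a real AtomizeJS API, and the manual get/set calls follow the extended API described above):

// Original function:
function transfer(from, to, amount) {
    from.balance -= amount;
    to.balance += amount;
}

// Rewritten to drive the proxy API manually:
function transferTxn(from, to, amount) {
    from.set('balance', from.get('balance') - amount);
    to.set('balance', to.get('balance') + amount);
}

// The version actually called is selected at run time:
function transferDispatch(from, to, amount) {
    if (atomize.inTransaction) { // hypothetical flag, for illustration only
        return transferTxn(from, to, amount);
    }
    return transfer(from, to, amount);
}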

Starting to decloak

Monday 05 December 2011

With under one man-month of development done, AtomizeJS is now public, but there is much to do. The biggest limitation is browser support: currently only Firefox 8 and Chrome 17 are supported. This is because they support experimental features of the next version of JavaScript which AtomizeJS currently depends upon. I have some ideas about how to support other browsers, and doing so is my first priority, but it's not the only thing on my to-do list.

There is also much work to be done on:

  • Exception handling: particularly transactions that throw exceptions after they've been restarted. Having a good story on dealing with failure scenarios is very important.

  • Garbage Collection: currently, any object that has been lifted into AtomizeJS will exist on the server forever. Distributed GC is, on the whole, quite a challenge. I have some ideas about how to solve this, but it's not a straightforward problem.

  • Multi-node server support: it's pretty simple to imagine having multiple NodeJS servers all attached to the same AtomizeJS instance. This should be fairly simple to achieve: the lack of a SockJS client for NodeJS is the reason it doesn't work yet, but I could do a plain socket implementation.

  • Presence: it's fairly simple to build a system where a new client makes its presence known to the other clients, though there are some open questions about naming clients. It's much harder currently for other clients to know about the loss of a client. Indeed, quite what "loss" means may vary from application to application. Having some mechanism for indicating which clients currently exist, for varying degrees of "exist", would be useful for a large number of applications.

  • Security and partitioning: the focus of AtomizeJS is to make it easier to move more and more application logic to the browser side. However, there are always going to be applications which need to have some server-side component. One of the likely important areas here is to provide a means whereby the server can control which clients can read and write to which variables. Currently there is no security at all: any client can write and read to and from any object managed by AtomizeJS (provided they can get hold of it in the first place - objects do not have to be reachable via root). It's easy to imagine needing private objects amongst different clients and the server.

  • Libraries: AtomizeJS and STM in general provide some neat primitives. These are fairly low-level, but can be usefully combined to create more powerful patterns, for example the broadcast queue in the getting started guide. I plan to create a set of libraries which capture many of these higher-level patterns.

  • Optimisations: There are many optimisations that could be done both client and server side.

  • Alternative servers: There's no reason why the server should just be implemented in NodeJS. For performance reasons, it might be a good idea to have other implementations in other languages.

At this stage, any and all feedback is very welcome. I realise that right now, without broader browser support, few of you are going to start building your next world-changing application on top of AtomizeJS. However, for the early-adopters out there and everyone who's keen to have a play around and a quick read, I'd love to know what you think of the project as a whole, how easy you find it to write applications on AtomizeJS, whether you think the APIs make sense and so forth. Please get in touch with any thoughts you have.