Progress, Presentation Video, and Slides
Wednesday 06 June 2012
A couple of weeks ago I gave a talk at the London Node User Group. I thought it went pretty well and the reaction both in person and on twitter afterwards seemed pretty positive.
The video and slides are now available online.
LNUG May 2012 - Matthew Sackman from Forward Technology on Vimeo.
Recently I've mainly been doing general improvements, tidying up the code and refactoring. There aren't really any exciting new features, but the basic code is in much better shape and is much, much closer to properly enforcing the properties of JavaScript than before (e.g. things like only being able to delete properties that are marked configurable, etc.). Also, a fairly major bug in the way retry was implemented has been found and fixed, and I've done a fair amount of testing with the translation tool, generally trying to ensure that the claims I make about compatibility really are borne out in practice. Of course, IE is the most challenging target, but it is on the whole manageable (at least, up-to-date versions of IE are).
All of which means that I've bumped the versions of both the client and server to 0.4.9. Hopefully after some more testing and minor bug fixes I should be able to either move to a 0.5.0 or maybe even go to 0.9.0 to target a possible 1.0.0 release. Thus any feedback, comments, questions or bug reports are very eagerly requested!
Talk at London Node User Group
Tuesday 22 May 2012
Tomorrow (23rd May 2012), I'm speaking about AtomizeJS at the London Node User Group. This will be a talk covering what AtomizeJS is, what problems it solves, why you should use it, and what you can use it for. It would be great to have a good audience, so if you're at all curious about AtomizeJS (or a seasoned user of it!), please come along. Apparently there'll be beers and pizza from 6:30pm!
Over the last few days I've been writing various demos using AtomizeJS for this talk, which has been great as it's exposed lots of bugs (which I've fixed), and again just reinforced that sometimes, hard problems are just plain hard to solve!
Development has been a little slower over the last month as I've been involved in various other projects. I spent an awful lot of time hacking in a security layer for AtomizeJS. All the hooks are now there, so you should be able to implement whatever security policies you want, but it's actually not clear to me how you would want to express such security policies. Or rather, whilst some ideas are fairly attractive to me, managing to achieve them in JavaScript is rather more painful than it ought to be. Having read around the subject, it's clear that since almost no one else tries to solve this problem, it's considered a hard problem. So I've left the hooks in but am yet to try to make a big deal out of it.
The client now has some reconnection logic in it, so if it does lose connection to the server, it should attempt to reconnect. However, currently that's a little buggy because SockJS doesn't expose a disconnect due to packet drops - there's a bug filed and hopefully it should Just Work when the next version of SockJS comes out. Lots and lots of other little bugs have been fixed: it turned out that much of AtomizeJS was broken when writing nodejs-side code as a client of AtomizeJS, but thankfully in developing some demos, that's been exposed, and fixed.
Getting Lazy
Tuesday 27 March 2012
Up until today, when a client connects, that client has built into it a definition of the root object at version 1, which is a plain empty object, {}. When the client performs some transaction that reads or modifies the root object, if the server's version of the root object is different, then everything that is reachable from the root object is sent down to the client. This was true in general: when the server has to send down an updated version of an object, it traverses that object for fields which point to other objects that the client doesn't know about, and sends those too. In fact, it sends the transitive closure.
The reason for this is pretty simple: until now, I've not had a nice way of dealing with dangling pointers. Thus if client A does:
atomize.atomically(function () {
    var a = {}, b = {}, c = {}, d = {};
    atomize.root.a = atomize.lift(a);
    atomize.root.a.b = atomize.lift(b);
    atomize.root.a.b.c = atomize.lift(c);
    atomize.root.a.b.c.d = atomize.lift(d);
});
and then client B does a transaction which reads or modifies an older version of the root object, then there was no choice but to send down all 4 new objects so that the object graph could be fully populated.
This can obviously be quite wasteful: there is the possibility that client B really doesn't care about those 4 new objects: sure, it needs the most up to date version of the root object, but it was happily working away under atomize.root.differentObject which (obviously) has nothing in common with those objects now reachable from atomize.root.a.
The solution I've come up with is for the server to send down, in certain circumstances, version 0 objects. These are always plain, empty objects. Whenever you try to read from them, the client notices you're trying to read from a version 0 object, and interrupts the current transaction. It then transparently sends a retry up to the server where the transaction log says "I just read version 0 of this object". Immediately, the server notices that there is a newer version of that object, sends down the new version, and the client then restarts the transaction. Thus there's been no protocol change, and this modification is implemented entirely in terms of existing STM primitives. But there is no change to the way you write code at all.
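As a rough sketch of that read-trap idea (the names readField, RetryNeeded, txn and handle are invented for illustration here, and are not the actual AtomizeJS client internals), the client-side logic amounts to:

// Illustrative sketch only: reading a field of a version 0 placeholder
// records the read and aborts so the transaction is retried; the server,
// seeing a newer version exists, immediately sends the real object down.
function RetryNeeded() {}

function readField(txn, handle, field) {
    if (handle.version === 0) {
        txn.recordRead(handle.id, 0);
        throw new RetryNeeded();
    }
    txn.recordRead(handle.id, handle.version);
    return handle.raw[field];
}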
So, in the above example, after client A has performed its transaction, client B tries some transaction which modifies the root object. This transaction fails because it was against an older version of the root object, but now the server only sends down a version 0 object at atomize.root.a, and that object is empty: none of the b, c or d objects are sent down to client B. Should client B now attempt a transaction which reads from this a object, for example:
atomize.atomically(function () {
    return Object.keys(atomize.root.a);
}, console.log.bind(console));
the client will spot it read from a version 0 object (a), and transparently issue a transaction up to the server which will merely cause the server to send down the full a object. The updated a object (now at version 1 or greater) will have a b field which itself will point to a version 0 object: again, we've not sent down the transitive closure of everything reachable from the full a object, merely everything directly reachable from a in one hop.
In this case, yes, it results in more round trips. But in many cases, it results in substantially less communication overhead: the test suite has more than doubled in speed as a result of this change.
The important thing to note is that there is no change to the code you write. It's simply now the case that there may be more retry operations going on under the bonnet than are indicated by your code.
Given the last blog posts, you might well be wondering how this optimisation interacts with that bug, and its fix. Well, that bug was all about a client having an older version of an object, and through a series of events having a transaction restart that was able to observe both that older object at the same time as some updated objects which together could be used to observe a violation of isolation. The key thing though is that the client already had to have the older object.
This optimisation doesn't impact the fix developed for that bug: if the client already has an object then it will be updated according to the dependency chain traversal as described, thus isolation is still enforced. What this optimisation achieves is that it causes objects managed by AtomizeJS to be brought down to the client on demand. When they are brought down, because the implementation just uses the existing retry functionality, the updates that are sent down are calculated using exactly the same mechanism as normal, thus, again, the algorithm used to ensure isolation is respected is invoked.
Thus, if going from version 0 of an object to the current version, say version 3, requires that some other objects the client already knows about be updated to current versions, then that is still achieved.
Interesting Bug Fixed
Tuesday 20 March 2012
In my last post, I documented the discovery of what I thought was a subtle bug. After some thought over the weekend, I eventually decided that it was actually rather unsurprising: indeed the surprise was that it had taken so long to come to light.
The bug boils down to the following:
When you commit a transaction, the transaction log is verified. The verification ensures that all the objects read from and written to were done so at their latest, most up to date version. Object versions are only advanced when a transaction successfully commits, and given the server is single threaded, it's thus easy to see that this would lead to consistent modifications to the object state on the server, where consistent is defined as respecting the atomic and isolated properties.
Where it goes wrong though is when that transaction can't be committed because someone else has, in the meantime, modified some of the same objects that the current transaction has modified. I.e. the server detects that the transaction log documents modifications of old versions of objects. At this point, the transaction is rejected, and a set of updates is sent to the client. These updates are there to allow the client to update its own copies of these objects so that it can then restart the transaction and have it run against the most up-to-date versions of these objects, eventually sending back to the server a new transaction log, which hopefully will then commit.
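As a rough illustration of that verification step (the names verify, log and store here are invented, not the actual AtomizeJS server API), the check amounts to comparing the version of each object mentioned in the transaction log against the server's current version:

// Illustrative sketch only: a transaction log is acceptable if every object
// it read or wrote was, at the time, at the server's current version.
function verify(log, store) {
    // log.versions maps object id -> version the transaction read/wrote
    return Object.keys(log.versions).every(function (id) {
        return store[id].version === log.versions[id];
    });
}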
The bug was that the updates that were sent down to the client contained only the objects that had both been changed and were logged in the transaction log. At first glance, that might seem sound, but of course, when the transaction is restarted, it might choose to read and write different objects: you just have no idea. So the first time around, the transaction may modify objects a and b (and incidentally, the client already has an old copy of c which isn't touched by this transaction). If someone else changes a in the meantime, the transaction will fail, and the new version of a will be sent down to the client. This time, the transaction, on seeing the new version of a, instead modifies objects a and c. But if that middle transaction, the one that only modified a, instead modifies both a and c, then our final transaction will see the new value of a but the old value of c: the rejected transaction modified only a and b, so the server didn't think to send the client the new version of c. But that means that within the transaction, you can observe a violation of isolation: you see some objects at versions after a transaction, whilst other objects at versions before the same transaction.
I believe I have now fixed this bug. Each client has a representation on the server, and that representation tracks which objects and at what version number have been sent to the client. When a transaction commits, we now build an object that maps every object modified to its new version number, and every object modified manages a linked list of these. These are the dependencies: they say that "at the point at which a was modified to version 3, c was also modified to version 7". The linked list then means that if a client representation knew that it previously sent version 4 of c to the client and it now wants to send version 7, it must walk the linked list from its current location (corresponding to version 4) all the way up to version 7. This will allow it to discover that in the course of forming versions 5, 6 and 7, several other objects were modified, and so updates for these objects must also be sent down to the client. The transitive closure of this operation must be found.
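Here is a sketch of that bookkeeping (the names recordCommit, depsTail and collectUpdates are invented for illustration, not the real server code): each commit appends one shared dependency entry to the list of every object it modified, and a client representation that is behind walks forward from the entry it last saw, accumulating everything else it must also send.

// Illustrative sketch only.
function recordCommit(modified) { // modified: [{obj: ..., newVersion: n}]
    var deps = {}, entry;
    modified.forEach(function (m) { deps[m.obj.id] = m.newVersion; });
    entry = { deps: deps, next: null };
    modified.forEach(function (m) {
        m.obj.depsTail.next = entry; // append to this object's list
        m.obj.depsTail = entry;
    });
}

// Walk forward from the last entry this client representation processed,
// collecting every object (and version) that must also be sent down. The
// result may itself name objects the client lacks, hence the need for the
// transitive closure.
function collectUpdates(lastSeenEntry) {
    var needed = {}, entry = lastSeenEntry.next;
    while (entry !== null) {
        Object.keys(entry.deps).forEach(function (id) {
            needed[id] = Math.max(needed[id] || 0, entry.deps[id]);
        });
        entry = entry.next;
    }
    return needed; // object id -> version required at the client
}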
The final trick is to ensure that these linked lists are bounded in length. Even though it only needs to be a singly-linked list, sadly, we have to keep hold of the oldest end of it too (if we didn't have to keep track of the oldest end, we could just let the list grow and grow and allow GC to tidy it up as necessary). This is because we may have to send the object to a client that has never previously seen this object at all. We're going to send the latest version of the object (indeed, we never keep track of anything other than the latest version), but for the same reasons as above, that latest version may very well only make sense in the context of updates to other objects, or even sending down other objects the client has never seen before. Thus we have to be able to find out every object that has ever been modified at the same time as our object-to-send, and make sure the client has up-to-date versions of all of those too (again, form the transitive closure). Clearly, over time, these linked lists could become very long indeed.
However, it's possible periodically to roll an object's linked list up: to amalgamate all the entries into one, and then shrink the list down to a single entry and start appending from there again. The intuition is that if one transaction pushed a to version 3 and c to version 7, and the next transaction pushed a to version 4 and b to version 6, then for a (and a alone), this can be combined to a single entry pushing a to version 4, b to 6, and c to 7. After all, we will never want to send version 3 of a to a client - only the most recent version ever gets sent, which at this point would be version 4.
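Spelling that worked example out (the entry shape here is just illustrative):

// Two consecutive entries in a's dependency list...
var entry1 = { deps: { a: 3, c: 7 }, next: null },
    entry2 = { deps: { a: 4, b: 6 }, next: null };
entry1.next = entry2;

// ...roll up, for a alone, into one equivalent entry that keeps the highest
// version recorded for each object:
var rolled = { deps: { a: 4, b: 6, c: 7 }, next: null };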
The next question would then be: Why bother with the list at all - why not just keep an amalgamated set of dependencies for every object? The answer is that if that set is very large and most changes to it are for single elements, then client representations performing this update algorithm will have to iterate through every entry, only eventually to find a single relevant change to send down. By keeping the linked list, the client representation instead records its location in the list, and changes to the amalgamated set correspond to new list entries of exactly the change alone. Essentially the list stores diffs, and thus avoids client representations having to recalculate diffs on every update: they either use the diff directly, or have to calculate it only infrequently after a roll-up has occurred.
This fix appears in version 0.0.8 of the AtomizeJS node server.
Interesting Bug
Friday 16 March 2012
I've spent the whole of today chasing down a bug. I've finally found what's causing it, yet currently have no idea how to solve it. I think it's a rather amazing bug which shows some very interesting behaviour.
Over the last couple of days, I've been building out a test suite so that as I add additional features, I can have a degree of confidence I've not obviously broken things. This morning I wrote a test which deliberately has a large number of transactions that collide with each other: indeed, overall progress is slow because of the huge contention created. The basic idea is that we start with a global a object with the following structure (this is all a bit simplified, but not by too much):
a = {0: {num: 1000},
     1: {num: 1000},
     2: {num: 1000}}
Then, every transaction decrements the num field in every object it finds within a. Just to shake things up a bit more, each transaction can randomly replace one of the inner objects. The replacement will also contain a num field with the correct value. The test stops when all the fields reach 0.
I set up several clients connected to the same AtomizeJS server, and each client ran several transactions that looked like:
var fun;
fun = function (c) {
    c.atomically(function () {
        if (undefined === a) {
            c.retry();
        }
        var keys = Object.keys(a),
            x, field, n, obj;
        for (x = 0; x < keys.length; x += 1) {
            field = keys[x];
            if (undefined === n) {
                n = a[field].num;
                if (0 === n) {
                    return n;
                }
            } else if (n !== a[field].num) {
                throw ("All fields should have the same number: " +
                       n + " vs " + a[field].num);
            }
            if (0.5 < Math.random()) {
                obj = c.lift({});
                obj.num = n;
                a[field] = obj;
            }
            a[field].num -= 1;
        }
        return n;
    }, function (n) {
        if (n > 0) {
            fun(c); // recurse
        } else {
            // Test done!
        }
    });
};
And then, with various different AtomizeJS clients, invoke fun with the client, and set up a suitable a object managed by AtomizeJS and known to all the clients.
Turns out, we hit the exception within the transaction. Yup, within the transaction, we can violate the isolation and atomic properties. Even more interesting were the minimum requirements for provoking the bug: you need two clients (i.e. two instances of Atomize) and three transactions in flight at the same time (i.e. one client must run multiple copies of the transaction at the same time - or at least as close as you can get in JavaScript: when one transaction commits and goes to the network to send the transaction log to the server, whilst waiting for the response, it then goes and starts the other transaction). Running all three transactions in the same client can't provoke it, nor can running one transaction each in three different clients. If you rewrite the test so that it does the throw based on a test in the continuation (i.e. after each transaction has committed) then it never goes wrong, which means that the violation is eliminated when the transaction commits. But even so, within a transaction, you should not be able to see the partial effects of other transactions. The random changing of objects within a is crucial: if you don't replace the objects, the bug doesn't appear.
So what on earth is going on?
Two clients: c1 and c2. Three transactions: t1, t2 and t3. For simplicity, I'm going to define these transactions precisely as:
t1 = function () {
    a[0].num -= 1;
    a[1] = atomize.lift({num: a[0].num});
    a[2].num -= 1;
};

t2 = function () {
    a[0].num -= 1;
    a[1].num -= 1;
    a[2].num -= 1;
};

t3 = t2;
Initially, both clients are aware of a as shown at the top of this post. The a, 0, 1 and 2 objects are all at version 1 of themselves, and this is known to both clients.
First, c1 runs t1 followed immediately by t2: i.e. whilst the transaction log of t1 is in flight to the server, c1 starts running t2. Thus both t1 and t2 get first run on the original objects, and so both will try to change the values of 1000 to 999.
The transaction t1 will have a read set of a, 0 and 2, and a write set of a, 0 and 2. This transaction goes to the server, commits successfully and comes back to c1, which updates its own copies of the objects. The object at a.1 has changed: the new object is at version 1 (along with the previous old object that used to be reachable from a.1), whilst all the other objects (a, 0 and 2) are now at version 2. The num fields have values of 999 now, though the original object that was at a.1 has a num value of 1000 still. Only the server and c1 know all this.
Whilst that was going on, c2 runs t3. This transaction is initially run against the original objects (version 1 of everything, with num fields at 1000). The transaction log arrives at the server after t1, and gets rejected because t1 committed successfully and changed the versions. The server sends back to c2 version 2 of a, 0 and 2, along with version 1 of the new object reachable from a.1 (but all with num fields of 999). The client c2 now restarts t3. This time t3 has a transaction log with a read set of a, 0, and 2 at version 2, plus the new 1 at version 1, and a write set of 0 (version 2), 1 (version 1 - remember: the new replacement object), and 2 (version 2). This goes to the server and commits correctly. The version numbers are bumped accordingly: both c2 and the server agree that a, 0 and 2 are now at version 3, the old original 1 (which is no longer reachable) is at version 1, and the new 1 is at version 2. The num fields are all now at 998, except for the old a.1 object that's still at 1000, but it's unreachable, so it doesn't matter.
Now, the transaction log from t2 arrives at the server. It was run by c1 a while ago, indeed against the original objects (version 1 of everything - num fields at 1000), but the server's been kept busy and is only now getting around to dealing with it. Blame the network. The transaction log of t2 contains reads of version 1 of a, 0, 1 (the original 1) and 2. The server notices that these are old versions and rejects the transaction. But here comes the problem: the server sends down the current versions of a, 0 and 2 (all version 3 - num fields at 998). It does not send down the current version of a.1 because the transaction log from t2 had nothing to do with the current object at a.1: it only mentions the old original object that was at a.1, and no one's modified that object: it still has a num field at 1000.
So now the client c1 applies those updates, and restarts t2. Now t2 sees that a.0 and a.2 have num fields at 998, but a.1.num is actually at 999, because t2 (and indeed c1 as a whole) has not seen the effect of t3 (run by c2) on a.1: it's only seen the effect of t1. Thus isolation is broken: t2 is seeing parts of the world from before t3 committed, and other parts from afterwards.
When t2 now commits again, the server will again reject it because this time, t2's transaction log will contain a read and a write of the new object at a.1, but at version 1, not version 2. So the server will reject it, send the update to the new a.1, which the client c1 will apply to get its a.1 up to version 2, and finally t2 will be restarted and this time will commit successfully.
Race conditions like these are always fun to unravel. Even more fun is that I currently have no idea how to solve this: it's almost like we need some sort of dependency chain to say that "if the server is going to send down version X of object J, then it must also send down version Y of object K". In many ways, this seems to be rather like a cache invalidation problem. I wonder how other STM systems solve this, whether they don't, or whether the problem really only appears due to the distributed nature of AtomizeJS. I think it might be the latter.
Translation
Monday 20 February 2012
Broad browser compatibility is here now: the current versions of all the major browsers now work with AtomizeJS. The cost though is the translation tool.
In the end there was no choice: it has to be a server-side and/or static translation of JavaScript, or just write to an extended API and pay the cost up front. I had wondered about doing a dynamic browser-side on-demand translation: after all, you should get the source code of any given function with fun.toString(). The problem though is that after you've done the translation, you have to eval() it back to a function, but now you're in a different environment. So if you previously had:
var a = 5;
function myFun () {
    return a;
}
then yes, you can get the source of myFun and you can transform it as necessary. But you can't then re-eval() it back to a function and have it capture the same value of a as before: there's no way to extract the bound variables the closure captured the first time in order to re-present them to the eval().
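A small example of the problem (the makeIt wrapper is just for illustration): once the source of the closure has been re-eval()ed in a different environment, the binding of a that it originally captured is gone.

// Illustrative only: why toString()/eval() loses captured state.
function makeIt() {
    var a = 5;
    return function myFun() {
        return a;
    };
}

var fun = makeIt();
fun();                                          // 5: the closure captured a
var rebuilt = eval("(" + fun.toString() + ")");
rebuilt();                                      // ReferenceError: a is not defined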
So instead we have the translation tool, and yes, it's not ideal, and you may have to translate external libraries too (though actually I believe there probably won't be too many cases where that's necessary). The code you get back is readable and nicely formatted, and makes the transformations applied quite obvious. Most importantly, it's enough to show that this will work with older browsers and that AtomizeJS itself is a viable technology for writing applications today.
Or at least that's what I think! As ever, feedback is very welcome.
Dropping the dependencies
Thursday 08 December 2011
One of my highest priorities right now is to increase browser compatibility. Supporting IE6 probably isn't going to happen, and even IE7 is unlikely. IE8 would certainly be a nice-to-have, given that as of July 2011 about 60% of IE users are on IE8, though that will change. I'd really love to avoid having to write different versions of JavaScript for different browsers, but we shall see...
First up is getting rid of the dependency on WeakMaps or Maps in general. Simple Maps and Sets are due to be in the next version of JavaScript. Without them, you have problems telling the difference between certain types of key, because when you do a plain obj[key] = val, the field name created is just the string representation of key. Thus 1 and "1" are the same thing, and every object is [object Object] - hardly very useful. I need objects as keys.
This one I've managed to work around by building an implementation of a Map. Inevitably, it's a compromise, and it ends up storing a unique ID in every object it touches. Currently, it does that via defineProperty (in order to make the ID non-deletable, non-rewritable and non-enumerable), which is broken in IE8, but I hope to be able to work around that.
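For flavour, here is a much-simplified sketch of that kind of work-around (the names ObjMap and __objMapId are invented, and the real implementation has rather more to it): a hidden, unique ID is stamped onto every object used as a key, and that ID then becomes the field name in a plain backing object.

// Illustrative sketch only, not the actual AtomizeJS Map implementation.
// Note that keys must be objects: defineProperty won't work on primitives.
var ObjMap = (function () {
    var nextId = 0;

    function idOf(obj) {
        if (!Object.prototype.hasOwnProperty.call(obj, '__objMapId')) {
            Object.defineProperty(obj, '__objMapId', {
                value: 'o' + (nextId += 1),
                writable: false,       // non-rewritable
                configurable: false,   // non-deletable
                enumerable: false      // non-enumerable
            });
        }
        return obj.__objMapId;
    }

    function ObjMap() { this.store = {}; }
    ObjMap.prototype.set = function (key, val) { this.store[idOf(key)] = val; };
    ObjMap.prototype.get = function (key) { return this.store[idOf(key)]; };
    ObjMap.prototype.has = function (key) {
        return Object.prototype.hasOwnProperty.call(this.store, idOf(key));
    };
    return ObjMap;
}());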
The much bigger problem is working around the lack of Proxies in older browsers. Initially, I'll have to build the API out so that you can drive the proxy manually. This will mean that instead of writing code like a.b.c = x.y you'll have to write code like a.get('b').set('c', x.get('y')). Yup, it's pretty grim, and I doubt that'll be the worst of it. I'm hoping to have some sort of mechanised translation, but seeing as you can write
function MyFun (f) {
    atomize.atomically(f);
}
you're either going to have to do dynamic translation of any f that arrives there (which is OK to a point - f.toString() should give you the source code of f (which I could then parse, build an AST, analyse and rewrite), unless it's a browser built-in), or you're going to have to do whole program analysis in advance and really prepare two different versions of every function and then select at run time which version to run based on whether or not you're inside a transaction. I've not made up my mind which one I prefer - comments welcome - but the first step will be to build out the proxy API so that these things can be driven manually, even if the syntax is pretty ugly.
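To make the shape of that manual API concrete, a transaction that would today read a.b.c = x.y might end up looking something like this (the method names and the wrapping are hypothetical, not a settled API):

// Hedged sketch: assumes root and every managed object are wrapped in
// handles exposing get/set, as in the a.get('b').set('c', x.get('y')) example.
atomize.atomically(function () {
    var a = atomize.root.get('a'),
        x = atomize.root.get('x');
    // Equivalent, with Proxy support, to: a.b.c = x.y;
    a.get('b').set('c', x.get('y'));
});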
Starting to decloak
Monday 05 December 2011
Under 1 man-month of development done; AtomizeJS is public, but there is much to do. The most major limitation is that of browser support: currently only Firefox 8 and Chrome 17 are supported. This is because they support experimental features of the next version of JavaScript which AtomizeJS currently depends upon. I have some ideas about how to support other browsers and doing this is my first priority, but it's not the only thing on my to-do list.
There is also much work to be done on:
Exception handling: particularly transactions that throw exceptions after they've been restarted. Having a good story on dealing with failure scenarios is very important.
Garbage Collection: currently, any object that has been lifted into AtomizeJS will exist on the server forever. Distributed GC is on the whole quite a challenge. I have some ideas how to solve this, but it's not a straight-forward problem.
Multi-node server support: it's pretty simple to imagine having multiple NodeJS servers all attached to the same AtomizeJS instance. This should be fairly simple to achieve: the lack of a SockJS client for NodeJS is the reason it doesn't work yet, but I could do a plain socket implementation.
Presence: it's fairly simple to build a system where a new client makes its presence known to the other clients, though there are some open questions about naming clients. It's much harder currently for other clients to know about the loss of a client. Indeed, quite what loss means may vary from application to application. Having some mechanism for being able to indicate which clients currently exist, for varying degrees of exist, would be useful for a large number of applications.
Security and partitioning: the focus of AtomizeJS is to make it easier to move more and more application logic to the browser side. However, there are always going to be applications which need to have some server-side component. One of the likely important areas here is to provide a means whereby the server can control which clients can read and write to which variables. Currently there is no security at all: any client can write and read to and from any object managed by AtomizeJS (provided they can get hold of it in the first place - objects do not have to be reachable via root). It's easy to imagine needing private objects amongst different clients and the server.
Libraries: AtomizeJS and STM in general provide some neat primitives. These are fairly low-level, but can be usefully combined to create more powerful patterns, for example the broadcast queue in the getting started guide. I plan to create a set of libraries which capture many of these higher-level patterns.
Optimisations: There are many optimisations that could be done both client and server side.
Alternative servers: There's no reason why the server should just be implemented in NodeJS. For performance reasons, it might be a good idea to have other implementations in other languages.
At this stage, any and all feedback is very welcome. I realise that right now, without broader browser support, few of you are going to start building your next world-changing application on top of AtomizeJS. However, for the early-adopters out there and everyone who's keen to have a play around and a quick read, I'd love to know what you think of the project as a whole, how easy you find it to write applications on AtomizeJS, whether you think the APIs make sense and so forth. Please get in touch with any thoughts you have.