Timeouts and write fences

Avital Oliver

unread,

Sep 24, 2012, 2:11:25 PM9/24/12

to meteo...@googlegroups.com

This started out as an internal conversation about a recent change to the write fence in server methods (the mechanism used to notify the client that all data modified by a method has been sent down). If anyone wants to join in or add relevant feedback that'd be great.

---------- Forwarded message ----------
From: David Greenspan <da...@meteor.com>
Subject: Re: Timeouts and write fences

For the record, my original reasoning for why this isn't a problem was:

- Writes are fast. They take up to 50 ms now (waiting for Mongo poll) but will take less in the future.

- It's not directly observable whether a write from a setTimeout from a method "hits the fence." The ordering of the data and the method satisfaction was nondeterministic before (in case A) and is nondeterministic now; the only difference is sometimes method satisfaction waits for the next poll. To measure this effect you'd need some kind of quantum slit experiment where you fire lots of method calls and observe a gap in the probability distribution.

However, I admit that in a world with many types of back ends, some generalized writes will be slow. Also, whether a write hits the fence may be directly observable if the method would otherwise be very fast.

I think the right solution should have the property that you don't do anything different in case A, and you do something simple different in case B.

Here's a proposal:

(3)

By default, timers don't preserve the write fence. However, there is some flag Tinytest sets or some call it makes so that timers do. For example, there's an environment variable that's a "sticky bit" for the write fence and causes it to be preserved even when it otherwise wouldn't. I'm not sure where this environment variable goes.

Or:

(4)

setTimeout only clears the write fence if there is a current method invocation (i.e. it is called from a method), otherwise it leaves the environment alone, assuming something Meteor-internal or more advanced is going on. So Tinytest just has to clear the current method invocation without clearing the write fence before firing off the tests. Then the write fence will survive. This sounds a little indirect but would be easy to implement and leave the default behavior as it was before.

-- David

David Glasser <gla...@meteor.com> wrote:

This is my current thoughts on the issue David started to fix in
https://github.com/meteor/meteor/commit/dcd26415fe3b526d72137dfc67091ac5ea4108dc

Here's the issue: You're in a method invocation, and you want to do a
write asynchronously (using Meteor.setTimeout and friends).

If you're thinking carefully about it (which many users may not be)
there are two cases you might want:

(A) This is really a "write later" situation, and you don't want to
block the method's "all committed" message on the write. (Of course,
you might run into the issue that the write never happens because the
process crashes / is shut down by the runner / etc, but hey, welcome
to software engineering.)

(B) The write is asynchronous for some good reason, but you don't
want the method to send the "all committed" message until the write in
the method is done. This is something Tinytest does: there are
asynchronous calls all over the place, and it doesn't want
tinytest/run to quiesce until all test results are committed.

Of course, there's also:

(C) You aren't thinking particularly carefully about the fact that
the write is happening asynchronously.

Now, in the current Meteor implementation, the trigger to send "all
committed" is that the method body returns and all writes created at
that point finish. So if you want to implement (B), you need to use
Futures manually to make sure you don't return from the method until
after all writes have been *started*, at the very least.

Before David's commit, Meteor.setTimeout (and similar functions)
explicitly removed the write fence from timeout callbacks. So (A)
situations "work" by making all timer callbacks never block the
method. (B) situations required avoiding Meteor.setTimeout; there was
a hacky Tinytest.setTimeout that explicitly preserved the
CurrentWriteFence.

David's commit effectively (though not technically) changes the
semantics such that timer callbacks preserve the CurrentWriteFence,
until the point where the fence fires, at which point they are mutated
to have no write fence. (The actual implementation involves setting an
onAllCommitted timeout on Meteor's one _CurrentWriteFence
instantiation which "retires" the fence, which means essentially that
all calls on it becomes no-ops.) This means that (A) cases continue
to "just work", and (B) cases work (like in Tinytest) as long as you
are doing the "wait on the callbacks using Futures" thing correctly.

However, as several people noted, it means that if you set shortish
timeouts and *don't* wait with Futures, then whether or not the write
is included in the write fence is nondeterministic. Additionally, it's
*silently* nondeterministic, in that nobody is going to notice that
sometimes their writes end up in the fence and sometimes they don't.

I do think that the current code is an improvement on last week's
code, and that it's fine to release 0.4.1 with this change... just
that we may want to make further changes in the future.

I suggest two possible ways we can go from here, together or separately:

(1)
Keep the new implementation of setTimeout which preserves the write
fence, but get rid of the concept of retiring write fences.
Add a new function called (something better than)
Meteor.dontWaitForWrites, with an implementation like
Meteor.dontWaitForWrites = function (f) {
return function () { Meteor._CurrentWriteFence.withValue(null, f); };
}

This way, users who do (A) will receive an error about trying to add a
write after the fence fired (the error message should be improved
too), but the answer if they are really trying to do this is to wrap
their callback in dontWaitForWrites:
Meteor.setTimeout(Meteor.dontWaitForWrites(function (f) {
// do some write
}, 60*1000);
(NOTE: This implementation DOES NOT WORK, because the wrapping of
dontWaitForWrites and the bindEnvironment inside setTimeout are in the
wrong order! But we could make something work.)

Users who do (B) will have things Just Work, as long as they are
waiting properly (otherwise they'll get the exception too).

Users who do (C) will be forced to think a little about what they are
really trying to do, and either put in some waits or inject
dontWaitForWrites.

(2)
Another approach is to allow methods to override the meaning of when
methods end. Inside a method, you can run something like
this.methodMayDoMoreWrites(); // with a better name
This prevents the fence from being armed as soon as the method body
returns. (The result value is still returned immediately, though.)
Then, later, from inside the callback (which should be run inside the
write fence!) you call
this.methodDoneWriting();
to arm the fence.

I *think* you still want to have dontWaitForWrites to handle the (A) case.
But this means that the (B) case now, instead of futzing around with
Futures, can just call methodMayDoMoreWrites, return, and let the
callbacks later call methodDoneWriting.
(ie, this doesn't affect the unpredictability of the current code; it
just makes writing (B) easier.)

Thoughts?

Paul Liu

unread,

Sep 28, 2012, 11:04:50 PM9/28/12

to meteo...@googlegroups.com

When I was testing my program, I encountered the following which might be related to this topic.

I wrote a test function that do the following:

Setup:

1 PC(Ubuntu) running Meteor server

1 PC(Windows 8) running 3 clients: 1 Chrome, 1 Firefox, 1 IE (All latest version)

1 PC(Ubuntu) run 2 clients: 1 Chrome, 1 Firefox (All latest version)

1 PC(Mac OS 10.8.2) running 3 clients: 1 Chrome, 1 Firefox, 1 Safari (All latest version)

1 new iPAD(IOS 6): Chrome.

Test procedure:

1. Copy a set of record(around 50 to 100 records, each record has 7 items) from a Collection and erase them from the Collection.

-Each client has its own set of record in the Collection

2. Use Meteor.setTimeout to insert the record one by one back to the Collection at 100ms interval (recursive loop). This is to simulate user input.

3. Repeat this process for 5 times.

There is no problem when I run 1 to 3 clients. If I increase the clients to 7 or 8 and run the test simultaneously, I will get error message regarding inserting duplicate key from Meteor server after a while. When this happen, I will see on PC system monitor that Node process went into an endless cycle of 100% CPU down to 10% CPU.

Best Regards

Paul

Avital Oliver於 2012年9月25日星期二UTC+8上午2時11分46秒寫道：

David Glasser

unread,

Oct 1, 2012, 5:41:25 PM10/1/12

to meteo...@googlegroups.com

Can you provide the actual test program (eg as a cloneable git repo)?
That will make it easier for us to reproduce.

> --
> You received this message because you are subscribed to the Google Groups
> "meteor-core" group.
> To view this discussion on the web visit
> https://groups.google.com/d/msg/meteor-core/-/QRTD_9g27fYJ.
>
> To post to this group, send email to meteo...@googlegroups.com.
> To unsubscribe from this group, send email to
> meteor-core...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/meteor-core?hl=en.

Paul Liu

unread,

Oct 2, 2012, 2:24:56 PM10/2/12

to meteo...@googlegroups.com

Hi David,

After reading this article:

http://j-query.blogspot.com.es/2011/11/notes-on-failed-launch-with-nodejs-and.html

I am rethinking about my testing method. I plan to redo the test with PhantomJS this week. I will send you the code after the test.

Paul

David Glasser於 2012年10月2日星期二UTC+8上午5時41分46秒寫道：

Reply all

Reply to author

Forward