Request / Thread local storage

Request / Thread local storage Gaurav Vaish 7/1/12 11:45 PM
Hi,

What is the best way to implement a thread-local / request-local
storage?

Basically, I am looking at a way to implement per-request-singleton
object definition.

(ThreadLocal: http://docs.oracle.com/javase/6/docs/api/java/lang/ThreadLocal.html)



--
Happy Hacking,
Gaurav Vaish
www.m10v.com
Re: [node-dev] Request / Thread local storage Ben Noordhuis 7/2/12 7:25 AM
On Mon, Jul 2, 2012 at 8:45 AM, Gaurav Vaish <gaurav...@gmail.com> wrote:
> Hi,
>
> What is the best way to implement a thread-local / request-local
> storage?
>
> Basically, I am looking at a way to implement per-request-singleton
> object definition.
>
> (ThreadLocal: http://docs.oracle.com/javase/6/docs/api/java/lang/ThreadLocal.html)

I'm not sure what you mean. Are you talking about TLS in C/C++ or
something in JS?
Re: [node-dev] Request / Thread local storage Elad Ben-Israel 7/2/12 1:12 PM
As far as I know node does not support something like that. One idea that came to mind is to add support for associating arbitrary context to a domain and extract it along the way.

I was thinking of trying to hack something like that. Any thoughts?

Elad.
--
Elad.
Re: Request / Thread local storage hasanyasin 7/3/12 1:21 PM
In JavaScript, everything is much simpler. In Node, there are no threads, no such problems.

From your question, especially "thread-local / request-local", I think you are talking about a server, maybe an HTTP server, maybe another protocol. Either way, what you want is very simple in Node. For HTTP, for example, with the standard http module, every request is processed by a callback that is given two objects: request and response. Whatever you attach to these as properties or methods is private to that request and not shared with other requests. You can also declare variables in your callback function and pass them as parameters to the other functions you call while processing the request.

In summary, only the variables that are declared in global scope (outside of body of any function) or others that are attached to those global-scope objects are shared between request callbacks. Other than these, everything you will declare inside your functions will be accessible only from inside that call.
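
A minimal sketch of the above, assuming a plain `http` handler. The names `handleRequest`, `describeRequest` and the `context` property are made up for illustration and are not part of Node's API:

```javascript
var http = require('http');

var requestCounter = 0; // module scope: shared by every request in this process

// Hypothetical helper: receives the request explicitly and reads its private state.
function describeRequest(req) {
  return 'request #' + req.context.id + ' for ' + req.url;
}

// Request handler: anything attached to `req` here belongs to this request only.
function handleRequest(req, res) {
  req.context = { id: ++requestCounter, startedAt: Date.now() };
  var summary = describeRequest(req);
  res.end(summary);
  return summary;
}

// Not started here; call server.listen(port) to serve real traffic.
var server = http.createServer(handleRequest);
```

The counter lives in module scope and is shared, while `req.context` is created fresh for each call and is only reachable by code that is handed that `req`.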

You can always make local variables accessible globally by providing paths to them:

var http = require('http');
var requests = [];

http.createServer(function(req, res) {
    // req and res are what you want, already provided. They are request-local.
    requests.push(req); // here we provide a way for others to access this private object from outside.
});

JavaScript is a simple language. Its objects are flexible, powerful; and also very simple. Java programming is more about bureaucracy (things enforced by design) while JavaScript is all about socially acceptable behavior (things recommended by convention).
Re: Request / Thread local storage Gaurav Vaish 7/3/12 9:02 PM
>
> > (ThreadLocal:http://docs.oracle.com/javase/6/docs/api/java/lang/ThreadLocal.html)
>
> I'm not sure what you mean. Are you talking about TLS in C/C++ or
> something in JS?

Well, yes!
Re: Request / Thread local storage Gaurav Vaish 7/3/12 9:03 PM
Hi Elad,

> As far as I know node does not support something like that. One idea that
> came to mind is to add support for associating arbitrary context to a
> domain<http://nodejs.org/docs/latest/api/all.html#all_domain> and
> extract it along the way.
>
> I was thinking on trying to hack something like that. Any thoughts?

Do you already have some thoughts on the same?
Re: Request / Thread local storage Gaurav Vaish 7/3/12 9:04 PM
> var requests = [];
>
> http.Server(function(req, res) {
>     // req and res are what you want, already provided. They are
> request-local.
>     requests.push(req); // here we provide a way for others to access this
> private from outside.
>
> });

Well, there's a catch => How does the "outside" know which index in
the array we are looking at?
Re: Request / Thread local storage hasanyasin 7/3/12 9:05 PM
Very sorry for my extreme naivety...
Re: Request / Thread local storage hasanyasin 7/3/12 9:23 PM
Oh, that I can tell :p

I gave this just as an example. It is not very useful to make requests accessible like this unless you want to iterate over every request object for some reason. It will also prevent the request objects (and anything they reference) from being garbage collected, so RAM usage will grow rapidly on a real system. It was just a silly example to show how the opposite of what you want could be done.

If your question was about JavaScript (I still do not understand which one you meant by answering "Well, yes!" to a question with two alternatives), whatever you declare locally is private to that call, and so to that request. On the JavaScript side of Node, there are no threads. For requests, it works exactly as I described in my previous answer.

As for its internal workings, each process has a few threads for inter-process communication and I/O, to my humble knowledge.

Again, if you are developing a library for Node in C++, you still should not use threads other than the way Node uses them. I really did not understand your question in this case, which is why I thought I had totally misunderstood and apologized for being naive.

Also, if it is about JavaScript, this is the wrong group. :D The nodejs group is where it should go.

Pehh, you put me in big dilemma sir. What were you asking, really?
Re: [node-dev] Re: Request / Thread local storage Ben Noordhuis 7/4/12 3:19 AM
On Wed, Jul 4, 2012 at 6:02 AM, Gaurav Vaish <gaurav...@gmail.com> wrote:
>>
>> > (ThreadLocal:http://docs.oracle.com/javase/6/docs/api/java/lang/ThreadLocal.html)
>>
>> I'm not sure what you mean. Are you talking about TLS in C/C++ or
>> something in JS?
>
> Well, yes!

Was that a boolean logic joke?
Re: [node-dev] Re: Request / Thread local storage hasanyasin 7/4/12 6:55 AM
@Ben: Awesome!

@Gaurav: If this was really a joke, I am really naive! :p
Re: [node-dev] Re: Request / Thread local storage Elad Ben-Israel 7/10/12 11:53 AM
Did a little bit of digging and found the undocumented `process.domain`, which actually allows one to access the current domain from anywhere. This is very similar to "thread local storage", but it's way awesomer because it automatically traverses async calls.

For example (node 0.8.2):

```
var domain = require('domain');
var http = require('http');

function do_some_async_stuff(callback) {
  setTimeout(function() {
    console.log('url of current request is', process.domain && process.domain.url);
    return callback();
  }, 500);
}

var server = http.createServer(function(req, res) {
  var d = domain.create();

  // attach `req.url` to the domain as arbitrary context
  d.url = req.url;

  d.run(function() {
    do_some_async_stuff(function(err) {
      res.end('done');
    });
  });
});

server.listen(5000);
```

This is an undocumented feature so I would advise not to use it before someone from node core gives their blessing.

Any thoughts from the core maintainers on this? Why did you guys choose not to document it? Any caveats that people should be aware of?

Cheers,
Elad.
--
Elad.
Re: [node-dev] Re: Request / Thread local storage Isaac Schlueter 7/10/12 2:51 PM
Consider this the opposite of a blessing.

That is not only an undocumented feature, but an undocumented
implementation detail of an experimental feature.  You can pretty well
be assured it will change in a future release.  (Though, not in
v0.8.x, which is API and ABI frozen.)  The reason that it's not
documented is because it's sort of a kludgey way to keep track of the
current domain, and we want to leave the option open for more elegant
or performant approaches in the future.

If you want per-request storage, what's wrong with putting stuff on
the request object like everyone else does?

Sorry if I'm missing the point of your question here.  "Thread-local"
only really makes sense in servers that spawn a new thread (or
thread-like thing) for each request.  Node doesn't do that.  It's
relevant in those environments because threads can preempt one
another, so you have to be careful to use thread-safe data structures
or else you'll have problems.  Node's JavaScript is single-threaded,
and cannot be preempted, so you never have that issue.  And child
processes don't share memory (mostly) so you don't have corruption
issues.

So, (a) the thing you're talking about doesn't really exist, because
(b) you probably don't actually need it.  You've got global (which is
per-process), module-local (the var's you define in your module),
exports (specific to a module, but visible from the outside), and
several objects relating to different conceptual constructs (like the
req, the req.socket, the response, child processes, etc.)  Your
program isn't ever preempted, so "thread local" is not relevant here.
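
To illustrate the "cannot be preempted" point, here is a small sketch (the `events` array and `slowIncrement` are names invented for this illustration). A timer callback that is already due still cannot run until the current synchronous code has finished, so a read-modify-write on shared state needs no lock:

```javascript
var events = [];
var counter = 0; // shared, module-local state

// Schedule a callback that is due almost immediately.
setTimeout(function () {
  events.push('timer');
}, 0);

// Read-modify-write with a deliberately slow step in the middle.
function slowIncrement() {
  var seen = counter;
  events.push('read');
  var until = Date.now() + 20;
  while (Date.now() < until) { /* busy-wait: the timer is due, but cannot fire */ }
  counter = seen + 1;
  events.push('write');
}

slowIncrement();
console.log(events.join(',')); // prints "read,write"; the timer only runs on a later tick
```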
Re: [node-dev] Re: Request / Thread local storage Elad Ben-Israel 7/10/12 3:17 PM
Not sure exactly what use case Gaurav is interested in, but I was interested in this thread because I think there is an unmet need around logging which domain-attached context could help solve (even a single pointer to a global instance). Logs are emitted everywhere across the application (and library) stack, and in node mostly via `console.xxx`. One useful feature would be to allow, for example, correlating all console logs emitted during the processing of an incoming request. A few logging libraries do allow pushing context, but they all require passing some state along through the async hoops, which means modifying the way logging is done throughout the entire stack.

I found the ability to push implicit context to logs pretty useful in debugging large and chatty systems, and it is used in many environments. In yakky synchronous multi threaded environments, it's easy to implement nowadays using thread local storage. In the async environment of node I am wondering if it requires some additional support.

I think proper support from node to this need would be valuable. Domains seemed like a natural approach to me, given their basic attribute is passing along implicit context across asynchronous call chains, and I ran into `process.domain` following this thread of thought.
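
A rough sketch of what I have in mind, assuming the undocumented `process.domain` discussed above keeps behaving the way it does today. `formatLog`, `withRequestContext`, `libraryCall` and the `requestId` property are hypothetical names, not an existing API:

```javascript
var domain = require('domain');

// Hypothetical logger: prefixes each line with the request id attached to the
// currently active domain (read through the undocumented `process.domain`).
function formatLog(message) {
  var d = process.domain;
  var id = d && d.requestId ? d.requestId : '-';
  return '[' + id + '] ' + message;
}

// Run `work` inside a fresh domain that carries the request id as context.
function withRequestContext(requestId, work) {
  var d = domain.create();
  d.requestId = requestId; // arbitrary context; `requestId` is our own property name
  var result;
  d.run(function () { result = work(); });
  return result;
}

// A "library" function that knows nothing about requests or domains.
function libraryCall() {
  return formatLog('library did something');
}

var line = withRequestContext('req-42', libraryCall);
console.log(line);                             // prints "[req-42] library did something"
console.log(formatLog('outside any request')); // prints "[-] outside any request"
```

The point is that `libraryCall` never receives a request or logger object, yet its log line can still be correlated with the request that triggered it.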

Cheers,
Elad.

--
Elad.
Re: [node-dev] Re: Request / Thread local storage James Howe 7/11/12 2:01 AM
My first thought on reading about Domains was also "yay, I can access request-specific state (for logging, bail-out on abort, etc.) without having to pass request objects through every single function in the codebase (including libraries unrelated to http servers)".

There's definitely a desire to be able to do that, and if there's a feature that lets you do it, people are going to start using it.

James
Re: [node-dev] Re: Request / Thread local storage Matt Sergeant 7/11/12 7:25 AM
On Tue, Jul 10, 2012 at 6:17 PM, Elad Ben-Israel <elad.be...@gmail.com> wrote:
Not sure exactly what use case Gaurav is interested in, but I was interested in this thread because I think there is an unmet need around logging which domain-attached context could help solve (even a single pointer to a global instance). Logs are emitted everywhere across application (and library) stack and in node, mostly via `console.xxx`. One useful feature would be to allow, for example, correlating all console logs emitted during the processing of an incoming request. A few logging libraries do allow pushing context but they all require passing along some state throughout the async hoops, which means modifying the way logging is done throughout the entire stack.

Well yes, you can't use console.log(), but "Don't Do That" anyway, it's bad practice. Look at how Haraka does logging - every log line gets a UUID associated with the current connection/transaction. It has made tracing issues with mail at Craigslist an absolute dream.
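
The general idea, as a sketch only: this is not Haraka's actual logging API, and `createConnectionLogger` and its `sink` parameter are names invented for the illustration. Each connection gets its own logger, and every line that logger emits carries the connection's id:

```javascript
var crypto = require('crypto');

// Create a logger bound to one connection; every line carries that connection's id.
// `sink` receives the finished lines (defaults to console.log) so it can be swapped out.
function createConnectionLogger(sink) {
  var id = crypto.randomBytes(16).toString('hex'); // random id standing in for a UUID
  var write = sink || console.log;
  function emit(level, message) {
    write('[' + level + '] [' + id + '] ' + message);
  }
  return {
    id: id,
    info: function (message) { emit('INFO', message); },
    error: function (message) { emit('ERROR', message); }
  };
}

// Usage: one logger per connection, passed explicitly to whatever handles it.
var lines = [];
var log = createConnectionLogger(function (line) { lines.push(line); });
log.info('connection opened');
log.error('delivery failed');
```

Grepping the log output for one id then gives the full trace of that connection.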

Matt.
Re: [node-dev] Re: Request / Thread local storage Elad Ben-Israel 7/11/12 8:15 AM


Haraka is awesome but it has the privilege of being a 'mini ecosystem' where plugins must adhere to the Haraka environment (in which case, use the Haraka `this.logxxx()` functions which emit the request UUID by extracting them from the current context in `this`). 

Now say you wanted to create a Haraka plugin that used a library like socket.io (just an example), which emits its own logs. Now what do you do? You would need to pass the Haraka `this` pointer all the way to every log call site in the socket.io codebase. Not very practical.

 



--
Elad.
Re: [node-dev] Request / Thread local storage Guillermo Rauch 7/11/12 8:34 AM
Socket.io's logger is pluggable



--
Guillermo Rauch
LearnBoost CTO
http://devthought.com

Re: [node-dev] Request / Thread local storage Elad Ben-Israel 7/11/12 8:51 AM
On Wed, Jul 11, 2012 at 6:34 PM, Guillermo Rauch <rau...@gmail.com> wrote:
Socket.io's logger is pluggable

Unfortunately not all libraries employ pluggable logging :-)



--
Elad.
Re: [node-dev] Request / Thread local storage Isaac Schlueter 7/11/12 10:18 AM
They could.  Bunyan is pretty nice and has this capability as well.

On Wed, Jul 11, 2012 at 8:51 AM, Elad Ben-Israel
<elad.be...@gmail.com> wrote:
> On Wed, Jul 11, 2012 at 6:34 PM, Guillermo Rauch <rau...@gmail.com> wrote:
>>
>> Socket.io's logger is pluggable
>
>
> Unfortunately not all libraries employ pluggable logging :-)
>
Re: [node-dev] Request / Thread local storage Elad Ben-Israel 7/12/12 7:21 AM
I'm getting a vibe that no one thinks this is actually a need and wondering where my perception is skewed. In systems I worked on, we could not control all the modules we were using and make them support pluggable logging. I'm also wondering if this makes sense as the best practice since I wouldn't really want to care about this all the time and pass some logging object to every async call I'm making.

--
Elad.
Re: [node-dev] Request / Thread local storage Matt Sergeant 7/12/12 9:21 AM
In my experience most libraries don't even bother logging anything, and so all logging ends up being custom, giving you full control over it. What are you using that logs that you can't customise?
Re: [node-dev] Request / Thread local storage Elad Ben-Israel 7/18/12 9:36 AM
Re: [node-dev] Re: Request / Thread local storage MikeS 7/31/12 5:18 PM
Something I've found useful, just in the error handling, umm, domain, when processing a set of work that will result in asynchronous callbacks (e.g. web service calls) is to

  1. Save off the current domain, which is the one associated with some HTTP request.
  2. Create a new domain
  3. Do all the work under the control of the domain created in 2, which will process any errors
  4. When all the work is done, process the resulting data (including any errors) under the control of the domain saved in step 1. 
Being able to use process.domain in step 1 is crucial.
Re: [node-dev] Re: Request / Thread local storage Isaac Schlueter 8/1/12 12:10 AM
MikeS,

Why can't you just create the new domain, enter it, do the stuff, then
exit it, and do the rest?

When you enter/exit domains, it's a stack.  So, when you exit one
domain, you're back in the original one.
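
A small sketch of the stack behavior for the synchronous case. `outer`, `inner`, `seen` and `current` are just names for this example, and `process.domain` is read here only to show which domain is active:

```javascript
var domain = require('domain');

var outer = domain.create();
var inner = domain.create();
var seen = [];

// Small helper to label whichever domain is currently active.
function current() {
  if (process.domain === outer) return 'outer';
  if (process.domain === inner) return 'inner';
  return 'none';
}

outer.run(function () {
  seen.push(current());     // inside outer
  inner.run(function () {
    seen.push(current());   // inner is pushed on top of outer
  });
  seen.push(current());     // inner exited, back in outer
});
seen.push(current());       // everything exited

console.log(seen.join(' -> ')); // prints "outer -> inner -> outer -> none"
```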
Re: [node-dev] Re: Request / Thread local storage MikeS 8/1/12 1:54 PM
The work done in the new domain involves async callbacks (sometimes multiple levels of them.)  They preserve the current domain (via the eventing system), but not the entire stack.

On Wednesday, August 1, 2012 12:10:53 AM UTC-7, Isaac Schlueter wrote:
MikeS,

Why can't you just create the new domain, enter it, do the stuff, then
exit it, and do the rest?

When you enter/exit domains, it's a stack.  So, when you exit one
domain, you're back in the original one.

On Tue, Jul 31, 2012 at 5:18 PM, MikeS  wrote:
>> Something I've found useful, just in the error handling, umm, domain, when
>> processing a set of work that will result in asynchronous callbacks (e.g.
>> web service calls) is to
>
>
> Save off the current domain, which is the one associated with some HTTP
> request.
> Create a new domain
> Do all the work under the control of the domain created in 2, which will
> process any errors
> When all the work is done, process the resulting data (including any errors)
> under the control of the domain saved in step 1.
>
> Being able to use process.domain in step 1 is crucial.
Re: [node-dev] Re: Request / Thread local storage Isaac Schlueter 8/2/12 3:05 PM
Mike,

Can you share some actual code for a program that requires access to
process.domain to work properly?  I'm not getting it from the English
description.
Re: [node-dev] Re: Request / Thread local storage MikeS 8/2/12 5:36 PM
Here's some working code:
 
"use strict";
var domain = require('domain');
var assert = require('assert');
// Implement a session class that keeps track of async operations
// It uses a domain to handle errors in these operations
function Session()  {
    this.domain = domain.createDomain();
    this.started = 0;
    this.completed = 0;
    this.isClosed = false;
    this.parentDomain = process.domain;
    this.errors = [];
    this.results = [];
    var theSession = this;
    this.domain.on("error", function(err) {
        theSession.errors.push(err);
        operationComplete(theSession);
    });
}
// Add a new (potentially async) operation to the session
function addOperation(operation) {
    this.started++;
    var session = this;
    this.domain.run(function() {
        operation(function(result){
            operationComplete(session, result);
        });
    });
}
Session.prototype.addOperation = addOperation;
// close a session, that is, no longer allow new operation to be added
function closeSession(cb) {
    this.isClosed = true;
    this.callback = cb;
    checkSessionComplete(this);
}
Session.prototype.closeSession = closeSession;
// Mark that an operation has completed
function operationComplete(session, result) {
    if (result) {
        session.results.push(result);
    }
    session.completed++;
    checkSessionComplete(session);
}
// check whether all operations have completed
function checkSessionComplete(session) {
    if (session.isClosed && session.completed == session.started) {
        session.parentDomain.run(function() {
            session.callback(session);
        });
    }
}
// Create and exercise a session
var mainDomain = domain.createDomain();
mainDomain.run(function() {
    var session = new Session();
    for (var i = 0; i < 10; i++) {
        session.addOperation(function(cb) {
            var shouldThrow = i % 2 == 1;
            process.nextTick(function() {
                if (shouldThrow) {
                    throw new Error();
                }
                else {
                    cb("done");
                }
            })
        });
    }
    session.closeSession(allWorkDone)
});
function allWorkDone(session) {
    assert.ok(session.isClosed);
    assert.equal(session.errors.length, 5);
    assert.equal(session.results.length, 5);
    assert.equal(process.domain, mainDomain);
    console.log("Done.");
}

This is simplified to save space, so, yes, I know it lacks error-checking and handling of corner cases and that what it does is trivial :-)

The idea is that the Session manages a group of async operations.  It runs them all under a domain specific to the Session, and any errors caught are simply stored in the Session.  When all the operations are complete, the Session calls back into the main logic, which should run under a different domain.  As you can see, the way it does this is to save the domain that was active at the time the Session was created and use it for the call back into the main logic (in checkSessionComplete()).

In real life, this is used to orchestrate web service calls.  node.js creates an HTTP server. Requests it receives are analyzed and a set of web service calls are made under the control of a Session (which allows them to be run in parallel instead of in series).  When all the web service calls are complete, their responses are merged into the response to node's caller.  All of the work to process a node request is done under a request-specific domain (to allow any unhandled errors to be formatted into an HTTP response). The final callback from the Session must be done under that same domain, which, since much async processing has occurred, is no longer in the stack.