require calls fs.statSync every time ??????

Showing 1-32 of 32 messages
require calls fs.statSync every time ?????? Oleg Slobodskoi 1/14/11 4:49 PM
I am not sure if it is a bug or a feature.

I wanted to mock a module using require.cache before it will be
required and wondered why it says "Error: Cannot find module".

Then I looked into the code of node.js and landet in findModulePath
method.

This method makes always fs.statSync without to consider if the module
is already cached.


That means __EVERY__ require call involves statSync call.


Is it not really bad and hits disk every time?


If all require calls in a program are done inside of module wrapper
function, then this is more or less not an issue, its just a bit
slower startup.

But if any require call are done inside of any function which is
called by every request - then this should be __dramatically___ bad.

I think there are a lot of people doing that, because they think the
module is cached and require call just returnes cached exports object.


Oleg
Re: require calls fs.statSync every time ?????? Isaac Z. Schlueter 1/14/11 4:53 PM
On Fri, Jan 14, 2011 at 16:49, Oleg Slobodskoi <ole...@googlemail.com> wrote:
> But if any require call are done inside of any function which is
> called by every request - then this should be __dramatically___ bad.

Yes.  Don't do that.  require() should be done at startup, or if done
lazily, should be cached to a reference so as to not be repeated too
often.

> I think there are a lot of people doing that, because they think the
> module is cached and require call just returnes cached exports object.

This is not as much of an issue if you use relative requires.

--i

Re: require calls fs.statSync every time ?????? Oleg Slobodskoi 1/14/11 4:58 PM
what do you mean by "relative requires",  relative path passed to require method?

Why can't we do this more robust and if the module is cached just return it without to check via statSync?


2011/1/15 Isaac Schlueter <i...@izs.me>



--
regards,
Oleg @oleg008

Re: require calls fs.statSync every time ?????? Tom Robinson 1/14/11 5:04 PM
Why aren't they cached after the first call? Every other implementation of CommonJS modules I've seen / written does.
Re: require calls fs.statSync every time ?????? Wade Simmons 1/14/11 5:04 PM
On Fri, Jan 14, 2011 at 5:53 PM, Isaac Schlueter <i...@izs.me> wrote:
> On Fri, Jan 14, 2011 at 16:49, Oleg Slobodskoi <ole...@googlemail.com> wrote:
>> But if any require call are done inside of any function which is
>> called by every request - then this should be __dramatically___ bad.
>
> Yes.  Don't do that.  require() should be done at startup, or if done
> lazily, should be cached to a reference so as to not be repeated too
> often.

The node built-in libraries do these lazy requires all over the place.
Even if those require's are optimized for built-ins, it shows that
they are very useful. Quick example:

    https://github.com/ry/node/blob/496be457/lib/net.js#L496

Re: require calls fs.statSync every time ?????? Oleg Slobodskoi 1/14/11 5:07 PM
it is cached, but it will be checked despite of cache. This is because I ask why should we check it if the module is in cache.

2011/1/15 Tom Robinson <tlrob...@gmail.com>


Re: require calls fs.statSync every time ?????? Mikeal Rogers 1/14/11 5:14 PM
this is all happening incredibly fast, i'm pretty sure the kernel caches this anyway so all we're really arguing about is the syscall overhead.
Re: require calls fs.statSync every time ?????? Oleg Slobodskoi 1/14/11 5:18 PM
1. I don't know how much overhead is that ... but why should this be done at all, if there is no reason better not to do that
2. I have a situation where I have to mock a module before it will be required, because this file does not exists and this issue is the only thingy which prevents me to do it :)

2011/1/15 Mikeal Rogers <mikeal...@gmail.com>

this is all happening incredibly fast, i'm pretty sure the kernel caches this anyway so all we're really arguing about is the syscall overhead.



--
regards,
Oleg @oleg008

Re: require calls fs.statSync every time ?????? Isaac Z. Schlueter 1/14/11 5:53 PM
Modules are cached based on the resolved filename, not the argument
passed to require().  Since require.paths is mutable, this is
necessary.  require("foo") might mean different things at different
times.

With system modules, this is never an issue.  The require() function
*first* checks process.binding("natives"), so require("path") cannot
be overridden, ever, and is always IO-free.

See test/simple/test-module-loading.js for many edge cases.

When I say "relative requires", yes, I mean requiring something that
starts with either "." or "/".  require("./foo") only has to do a
maximum of 5 stats.  require("./foo.js") will probably only have to do
one.  On the other hand, require("foo") has to potentially do
5*(require.paths.length) stat calls.

I'd like to get away from the expense of our current system, without
losing the flexibility and robustness it affords.  It's not simple to
change, and really, as long as you're not doing require() in a highly
speed-sensitive area, it's almost never an issue.

--i

Re: require calls fs.statSync every time ?????? Felix Geisendörfer 1/15/11 9:43 AM
I'm working on refactoring the module system right now, so I think I can comment on this.

There are two problems that came up in this discussion:
  1. require will perform I/O in situations where most users would expect a fully cached result
  2. it is impossible to simulate the presence of a non-existing module for testing purposes
Problem #1 is hard. We would have to cache resolved module paths with respect to the current require.paths as well the modules location (if it is a relative request). It can be done, but the overhead and complexity is probably not justifiable [1].

Problem #2 has multiple solutions and I'm not sure yet which one is best:
  • There could be a way to replace require() with your own function, which would allow you to return mock objects. My fear is that such a hook would give people all kinds of crazy ideas and we might wish to have never added it.
  • We could expose some of path resolving mechanisms of the module system. This would help finding a require.cache path suitable for placing a fake mock module to be loaded by the module under test.
  • There could be explicit support for this, for example: require.andInject('./my_module', {'sys': {}, './my_other_module': {})
I'm really not sure what the best solution is yet, but the last two options should both be suitable.

--fg

[1] Lazy loading can be done entirely without this, by simply caching the reference of the required module yourself instead of relying on require() to do the caching. If that's too much trouble for you, ask yourself if you really need lazy-loading to begin with.
Re: require calls fs.statSync every time ?????? Evan Larkin 1/15/11 10:49 AM
I'm a bit confused here. Is this only a problem when using the syntax require("foo")? Or is it also the case when you specify the extension (require("foo.js"))?

In the former case, I don't believe it would be possible to perform correctly without the stats; in the latter case it's possible(albeit difficult) to refactor them out. Is that basically what you're saying Felix?

-Evan Larkin
-mobile
Re: require calls fs.statSync every time ?????? Tom Robinson 1/15/11 11:34 AM
I'm not sure the stats are even required after the first call in any case. Do you really need to support the case when paths changes after a module in both the old and new paths has been required?

AFAIK its not defined in CommonJS. I've never supported this in loaders I've worked on. It seems like an uncommon edge case, since usually you modify paths immediately before requiring any modules.

If I'm wrong please enlighten me.

-tom
Re: require calls fs.statSync every time ?????? Felix Geisendörfer 1/15/11 3:37 PM
On Saturday, January 15, 2011 7:49:48 PM UTC+1, Evan Larkin wrote:
I'm a bit confused here. Is this only a problem when using the syntax require("foo")? Or is it also the case when you specify the extension (require("foo.js"))? 
 
In the former case, I don't believe it would be possible to perform correctly without the stats; in the latter case it's possible(albeit difficult) to refactor them out. Is that basically what you're saying Felix?

Yes, as long as require.paths stays the same, it would be possible to handle both cases without stat calls [1].

On Saturday, January 15, 2011 8:34:46 PM UTC+1, Tom Robinson wrote:
I'm not sure the stats are even required after the first call in any case.

Theoretically, they are not required.
 
Do you really need to support the case when paths changes after a module in both the old and new paths has been required?

Every time require.paths is changed, all non-relative/absolute require calls need be re-evaluated. Not supporting this would lead to very surprising behavior.

AFAIK its not defined in CommonJS. I've never supported this in loaders I've worked on. It seems like an uncommon edge case, since usually you modify paths immediately before requiring any modules.

It's an uncommon case, yes. But if people do lazy loading, any of the lazy-loaded modules could modify the require.paths again.

--fg

[1] Even if require.paths is updated, stat calls may still be avoided if the new path is added to the end. But that only complicates things further.
Re: require calls fs.statSync every time ?????? Tim Caswell 1/15/11 4:49 PM
How about simply caching the result and skipping all the stat calls.
And any time the require paths is modified clear all the stat caches.
(It's ok to still cache the actual file contents and parsing if the
fully resolved path didn't change after search path modification).

As far as why do we need the feature to return different modules in
the same run time, it's a feature of npm so that different modules can
require different versions of their dependencies in the same process.

Re: require calls fs.statSync every time ?????? Mikeal Rogers 1/15/11 5:57 PM
this all sounds like a lot of work that makes the modules implementation more complicated at a time when we're still considering deep API changes.

what we have works and is reliable even if a little slow (which i actually don't buy), I don't think we should take *any* big implementation changes when we think the API might change.
Re: require calls fs.statSync every time ?????? Ryan Dahl 1/16/11 12:31 AM

Bug. Thanks for the notification.

Re: require calls fs.statSync every time ?????? Ryan Dahl 1/16/11 1:24 AM
> Bug. Thanks for the notification.

I committed a test
https://github.com/ry/node/commit/f35773ad0767232aafaa9dfe9f788ca55b0a6138
but I haven't fixed it.

Re: require calls fs.statSync every time ?????? Felix Geisendörfer 1/16/11 3:44 AM
Ok, I had a good idea how to fix this:


This patch is really simple, and it also takes care of the case where require.paths is modified.

--fg

PS: I put this branch on top of my cleanup2 branch, maybe you want to merge the entire thing while you're at it:

Re: require calls fs.statSync every time ?????? Felix Geisendörfer 1/16/11 3:56 AM
NVM, we need to create a cache key that includes the request and paths:


(ry, you will need the first two commits from here: https://github.com/felixge/node/commits/cleanup2)

--fg
Re: require calls fs.statSync every time ?????? Ryan Dahl 1/16/11 3:01 PM

Thanks. Landed in f39fdf2 and 0263f01.

Re: require calls fs.statSync every time ?????? Isaac Z. Schlueter 1/17/11 7:12 PM
So, just to be clear, the new logic is this?

From /foo/bar/baz.js, call require("../boo"), require.paths = "/asdf"

1. inspect cache for [[/asdf],/foo/boo].  If found, return that.
2. Search in this order for a non-directory file:
/foo/boo
/foo/boo.js
/foo/boo.node
/foo/boo/index.js
/foo/boo/index.node
3. If not found, throw
4. If found, read, parse, etc., cache to [[/asdf],/foo/boo]

.: Any change in require.paths invalidates the cache (but changing it
back re-validates those entries)
.: Writing /foo/boo.js will not change anything if /foo/boo.node has
been loaded.
.: Fewer stat calls.

If that's how it works, for relative paths lookups, +1, LGTM, etc.


How does this affect system module-path lookups?  If I do
require("foo"), then what's the cache key that gets used?  The
filename that ends up being found, or the require.paths and "foo"?

--i

Re: require calls fs.statSync every time ?????? Felix Geisendörfer 1/18/11 12:54 AM
Isaac: The cache key is this [1]:

var cacheKey = JSON.stringify({request: request, paths: paths});

So as you can see, it is a mix of the request (what is passed to require), and the current paths. Note that for relative module inclusion, paths is just the single parent directory of the module doing the inclusion.

Let me know if this makes sense.

--fg



Re: require calls fs.statSync every time ?????? aheckmann 1/18/11 6:36 AM
I just ran a simple non-scientific benchmark to see the gains by requiring 1000 files:

v0.3.5: ~320ms
v0.3.4: ~500ms
v0.2.6: ~270ms

Why is 0.2.6 faster?

Re: require calls fs.statSync every time ?????? Felix Geisendörfer 1/18/11 11:30 AM
On Tuesday, January 18, 2011 3:36:37 PM UTC+1, aheckmann wrote:
Why is 0.2.6 faster?

v0.2.6: require.paths.length === 1
v0.3.5: require.paths.length === 3

So v0.3.6 needs more stat calls.

--fg
Re: require calls fs.statSync every time ?????? aheckmann 1/18/11 11:59 AM
I see. Thanks Felix--
Aaron


Re: require calls fs.statSync every time ?????? Oleg Slobodskoi 1/19/11 10:16 AM
thanks Felix,

I now have even a usecase, where require is actually, as I think, the right method to use and it is called by every request.

I build an mvc structure and there are controllers which handle request, so I map the url to the right controller and have to require it to call a proper action, by every request.

I am not sure if it is a good Idea to use require here even if it doesn't doing statSync anymore, because require still does a lot of stuff and even not designed to be used in high performance places.


Oleg



2011/1/16 Felix Geisend?rfer <fe...@debuggable.com>
Ok, I had a good idea how to fix this:


This patch is really simple, and it also takes care of the case where require.paths is modified.

--fg

PS: I put this branch on top of my cleanup2 branch, maybe you want to merge the entire thing while you're at it:




--
regards,
Oleg @oleg008

Re: require calls fs.statSync every time ?????? Isaac Z. Schlueter 1/19/11 10:26 AM
Why not just have your controllers export a function, and cache that
function, and call it over and over again?

There is always another way. :D

--i

Re: require calls fs.statSync every time ?????? Oleg Slobodskoi 1/19/11 10:44 AM
my controllers export action handlers, ofcaurse I can cache them manually, but to start doing this I have to know all this stuff about require. Never thought that even to get cached module require have to to do a lot of other function calls to resolve the id etc.

And why for hell are debug() calls there? 

I mean if you need to do it, then  do it only in debug mode:

f.e.:

var debug = true;

function require() {
//do some stuff

if (debug) {
    var _require = require;
    require = function() {
       // debug whatever you want
       _require.apply(this, arguments);
    }
}

Also there are JSON.stringify calls in some functions, which are also called in by every require call.
This is absolutely unneeded overhead and can be done in debug mode only.

Oleg
 


2011/1/19 Isaac Schlueter <i...@izs.me>
Re: require calls fs.statSync every time ?????? Isaac Z. Schlueter 1/19/11 11:01 AM
The calls to debug() actually get stripped out if you're not running
in debug mode, if I remember correctly.

--i

Re: require calls fs.statSync every time ?????? Oleg Slobodskoi 1/19/11 11:05 AM
Where and how that?

I supposed that this is the only performance optimization for debug lol:

    if (debugLevel & 1) {

      debug = function(x) { console.error(x); };

    } else {

      debug = function() { };

    }


2011/1/19 Isaac Schlueter <i...@izs.me>
Re: require calls fs.statSync every time ?????? aheckmann 1/19/11 12:58 PM
Not sure how debug is stripped out but you can also use 

require.registerExtension('.js', function(contents){
  // parse contents here
  return cleanedUpContents;
})

which lets you preprocess your modules before they're run. I couldn't find it in the docs tho.--
Aaron


Re: require calls fs.statSync every time ?????? Felix Geisendörfer 1/20/11 12:57 AM
On Wednesday, January 19, 2011 8:05:14 PM UTC+1, Oleg Slobodskoi wrote:
Where and how that?

It's done via tools/js2c.py, and more specifically:


but to start doing this I have to know all this stuff about require.

Don't call require() during the runtime of your program unless you know what you're doing. And if you know what you're doing, why would you want to call require during runtime?

That being said, when the cache is hot, you can expect require() to be very fast now.

--fg
More topics »