Recommendation on handling npm module dependencies


realguess

Dec 17, 2011, 7:51:50 PM
to nodejs
For example, in a node_modules/:

├─┬ js...@0.2.10
│ ├── conte...@0.0.7
│ ├── cs...@0.2.1
│ ├── htmlp...@1.7.3
│ └── req...@2.2.9
└── req...@2.2.9

Should I delete the `request` package inside the `jsdom`
node_modules/, so there is only one copy of the module, or is it better
to just leave it as it is?

Mark Hahn

Dec 19, 2011, 2:20:53 PM
to nod...@googlegroups.com
Since there is no harm, I would leave it alone. I wouldn't want to take over responsibility for package management when npm already does a good job.



fent

Dec 19, 2011, 4:39:44 PM
to nodejs
Leave it alone. Let npm handle all of that for you. It might be a
different version than the one in the parent directories.

I'm not sure, but I think npm checks whether the same dependency with
the same version already exists in a parent directory, and if so it
won't install it again.


Isaac Schlueter

Dec 21, 2011, 6:44:33 PM
to nod...@googlegroups.com
I'm guessing that you installed request *after* installing jsdom (or
are using an outdated copy of npm). If you reinstall jsdom, you
shouldn't have two copies of request (unless jsdom bundles its own,
which would probably be silly.)

Isaac Schlueter

Dec 21, 2011, 6:44:53 PM
to nod...@googlegroups.com
In other words, yes, you may remove it. I think npm would agree with me ;)

Jeff Barczewski

Dec 22, 2011, 1:50:47 PM
to nod...@googlegroups.com
Are there any tools or secret npm commands that would prune these duplicates away? 

I was hoping that npm prune would do something like this: if you ran it from the top level, it could identify duplicates and remove them. But it only removes extraneous packages.

Maybe a new --option for prune?

Or is there a better way?

Thanks in advance.

Jeff

Martin Cooper

Dec 22, 2011, 8:02:26 PM
to nod...@googlegroups.com
On Thu, Dec 22, 2011 at 10:50 AM, Jeff Barczewski
<jeff.ba...@gmail.com> wrote:
> Are there any tools or secret npm commands that would prune these duplicates
> away?

They may look like duplicates, but it's not clear you want to prune
them. In the tree in the original post, there are two copies of
'request', but they're there for a reason. One is there because jsdom
has a declared dependency on request. The other is there presumably
because the parent package declared its own dependency on request.
What's there now is two copies of the same version, but there's
nothing to say that that would always be the case; it depends on how
the version constraints are specified. It's entirely possible, for
example, for jsdom to require a different version from the top level
package. The tree structure within node_modules is specifically
designed to allow for that.
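
To make the lookup concrete, here is a rough sketch of the order in which Node checks node_modules directories for a require() issued from a given directory. It follows the documented lookup order but is not Node's actual implementation; `lookupPaths` and the `/app` path are made up for illustration.

```javascript
const path = require('path');

// Candidate node_modules directories for a require() issued from `fromDir`,
// nearest first. Hypothetical helper that mirrors the documented order.
function lookupPaths(fromDir) {
  const paths = [];
  let dir = path.posix.resolve('/', fromDir);
  while (true) {
    // Node never looks in node_modules/node_modules.
    if (path.posix.basename(dir) !== 'node_modules') {
      paths.push(path.posix.join(dir, 'node_modules'));
    }
    const parent = path.posix.dirname(dir);
    if (parent === dir) break; // reached the filesystem root
    dir = parent;
  }
  return paths;
}

// A file inside jsdom looks in jsdom's own node_modules before the top-level one.
console.log(lookupPaths('/app/node_modules/jsdom/lib'));
```

Because the nearest directory wins, jsdom's nested copy of request shadows the top-level one, which is what lets the two copies differ in version.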

--
Martin Cooper



Scott González

Dec 22, 2011, 8:35:37 PM
to nod...@googlegroups.com
The fact that they can differ doesn't mean you couldn't write a tool or add an option to npm to remove duplicates when they don't differ.

Martin Cooper

Dec 22, 2011, 9:48:07 PM
to nod...@googlegroups.com
2011/12/22 Scott González <scott.g...@gmail.com>:

> The fact that they can differ doesn't mean you couldn't write a tool or add
> an option to npm to remove duplicates when they don't differ.

You may also need to modify the way Node finds a module's dependencies
to load. Or perhaps only remove duplicates when it's clear that the
loader will find another copy in the module load path? You'd also need
to modify npm to check all of the dependencies in an application's
tree when one module is updated, to ensure that prior removals would
still be valid with respect to version constraints, and potentially
then reinstate them if they're not. That sounds a bit messy to me.
Perhaps not impossible, but I'm not sure I see the advantage to
pruning duplicates. Most Node packages are very small, so there's no
real cost to having each module own its own dependencies, and it does
simplify things.
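
As a sketch of the "only remove duplicates when the loader will find another copy" check, the snippet below works on a hypothetical in-memory picture of the tree from the original post (jsdom 0.2.10 and request 2.2.9). The node shape (`version`, `deps`) and the `findRedundant` name are assumptions, and matching on exact version is a simplification; a real tool would have to honor each package's version constraints.

```javascript
// Hypothetical in-memory install tree: `deps` stands in for a package's
// node_modules directory.
const tree = {
  deps: {
    jsdom: { version: '0.2.10', deps: { request: { version: '2.2.9', deps: {} } } },
    request: { version: '2.2.9', deps: {} },
  },
};

// Returns the nested copies that could be deleted because the loader would
// fall through to an identical version higher up the tree.
function findRedundant(node, ancestors = [], trail = [], out = []) {
  for (const [name, child] of Object.entries(node.deps)) {
    // Nearest ancestor whose node_modules also holds `name`.
    const holder = [...ancestors].reverse().find((a) => name in a.deps);
    const here = [...trail, name];
    if (holder && holder.deps[name].version === child.version) {
      out.push(here.join('/node_modules/'));
    } else {
      // Kept copies may themselves contain redundant dependencies.
      findRedundant(child, [...ancestors, node], here, out);
    }
  }
  return out;
}

console.log(findRedundant(tree)); // [ 'jsdom/node_modules/request' ]
```

Note that the result is only valid for the tree as it stands; after any update the check would have to be rerun, which is the bookkeeping cost described above.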

To each his own, though. I'll be interested in looking at such a tool,
should you choose to build one.

--
Martin Cooper

Mark Hahn

Dec 23, 2011, 1:29:28 AM
to nod...@googlegroups.com
You are solving a problem that doesn't exist. 

Jeff Barczewski

Dec 23, 2011, 1:48:50 PM
to nod...@googlegroups.com
Below is a simple example using the latest copy of npm, which shows the duplication problem we are describing.

If I create a package.json for my project that depends on other modules and run npm install, then whenever those packages share dependencies that are not declared in the root package, I end up with duplicate copies of those modules.

Yes, one can ignore them and everything should run fine (but that doesn't make them go away). They are taking up disk space and potentially rebuild time, if they use compiled code. 

So some potential solutions for solving the issue are:

1. Have npm or another tool put all the duplicate dependencies in the root and remove them from the children
2. (for platforms that support symbolic links) npm could instead install all dependencies at root level and give the children symbolic links to the right version
3. Create a tool which can walk the npm dependencies and add any duplicated ones to package.json (except that now it isn't clear what is really a dependency of the root package and what is there just to remove duplicates)

I think option 1 is probably the most straightforward way to go: creating a tool which can remove this duplication, or an option for npm which can do it.
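
A first step for any such tool is finding the duplicates. Below is a minimal sketch, not an existing npm feature: it walks nested node_modules directories and groups packages by name@version. The helper names (`collect`, `duplicates`, `addPackage`) are made up, the fixture is built in a temp directory, and the 0.0.0 versions for event-stream and forever are placeholders. Scoped packages are not handled.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Recursively record every package under dir/node_modules, keyed by "name@version".
function collect(dir, found = new Map()) {
  const nm = path.join(dir, 'node_modules');
  if (!fs.existsSync(nm)) return found;
  for (const entry of fs.readdirSync(nm)) {
    const pkgDir = path.join(nm, entry);
    const manifest = path.join(pkgDir, 'package.json');
    if (!fs.existsSync(manifest)) continue; // e.g. .bin
    const { name, version } = JSON.parse(fs.readFileSync(manifest, 'utf8'));
    const key = `${name}@${version}`;
    found.set(key, [...(found.get(key) || []), pkgDir]);
    collect(pkgDir, found); // descend into this package's own dependencies
  }
  return found;
}

// Keep only the name@version keys that are installed more than once.
function duplicates(root) {
  return [...collect(root)].filter(([, dirs]) => dirs.length > 1);
}

// Fixture helper: write a bare package.json under parent/node_modules/name.
function addPackage(parent, name, version) {
  const dir = path.join(parent, 'node_modules', name);
  fs.mkdirSync(dir, { recursive: true });
  fs.writeFileSync(path.join(dir, 'package.json'), JSON.stringify({ name, version }));
  return dir;
}

const root = fs.mkdtempSync(path.join(os.tmpdir(), 'dupes-'));
addPackage(addPackage(root, 'event-stream', '0.0.0'), 'optimist', '0.2.8');
addPackage(addPackage(root, 'forever', '0.0.0'), 'optimist', '0.2.8');

const dupes = duplicates(root);
for (const [key, dirs] of dupes) {
  console.log(key, dirs.map((d) => path.relative(root, d)));
}
```

This only reports; deciding which copy to keep and whether removal is safe is a separate step.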


Simple example of the issue
-------------------

$ npm --version
1.0.106

# create package.json
{
  "name": "foo",
  "version": "0.0.1",
  "dependencies": {
    "event-stream": "*",
    "forever": "*"
  }
}

$ npm install

$ find . -name optimist -type d -exec ls -id {} +
5676431 ./node_modules/event-stream/node_modules/optimist
5677319 ./node_modules/forever/node_modules/optimist

$ npm ls | grep optimist
│ └─┬ optimist 0.2.8
  ├─┬ optimist 0.2.8


Now one can go into your root package.json and add all of the duplicated dependencies from every package below, then clean and rebuild to get rid of the duplicates, but finding the duplicates and adding them to the root package.json could be quite a bit of manual work. It also makes the root package.json misleading: I have specified dependencies that only the children need, so it is no longer clear which are the root's real dependencies and which are there just to get around the child dependency duplication.


$ rm -rf node_modules

# edit package.json - adding optimist at root level
{
  "name": "foo",
  "version": "0.0.1",
  "dependencies": {
    "event-stream": "*",
    "forever": "*",
    "optimist": "0.2.8"
  }
}

$ npm install

$ find . -name optimist -type d -exec ls -id {} +
5687310 ./node_modules/optimist

Marco Rogers

Dec 24, 2011, 11:33:48 AM
to nod...@googlegroups.com
Can you help me understand why this is really a concern? request is less than 200k even with ancillary files like tests and README. Most native modules I've seen build in only a few seconds. Even libxmljs and things like mysql and postgres driver modules.

The nice thing you get here is transparent updates. Let's say optimist comes out with a new version and you want to update the top level package to use the kickass new features. You can do that without potentially breaking the forever module. It has its own copy of optimist that doesn't change. In this instance you still end up with 2 installs of the same name. The size of the code is still pretty similar. If it's native, you have to build both, and it's important because they're now different. But it's also good, because you can use these 2 different versions in the right spot transparently. A lot of thought went into how node_modules works now. Why change it other than a vague sense that it sets off your hacker OCD (yes I had this too, but I got over it)?

:Marco

Jeff Barczewski

Dec 24, 2011, 12:25:34 PM
to nod...@googlegroups.com
For small modules, especially those that are pure javascript, it probably isn't worth talking about. 

My concern is that as we build larger and larger systems, especially ones using compiled modules, it changes things. For example, if you build a system that needs to deploy to a large number of machines in the cloud, each and every one needs to build redundant modules and requires redundant space; it all adds to the time to build and deploy.

 - Mongo driver is 2.2M compiled
 - Hook.io 13.4M compiled

If these are required in many modules and they are not pulled in at the top then you can start to see how it could become more significant.

Unless I am missing something, each of the solutions I was suggesting would still allow modules to use specific versions, only duplicate identical versions would be combined where possible.

I raised the question because on my team we are already starting to see lots of module duplication and we're manually starting to put things in at the root and/or not declare some dependencies in lower modules to prevent this, both of which don't feel right to me.

So I just wondered if anyone had any utilities or way to deal with this.

If it isn't bothering anyone else, then maybe I am the only one.


I consider the Node community to be among the brightest groups on the planet. I am continually impressed by the Node.js community's elegant solutions to difficult problems (which are ugly, complex, or unsolved in other developer communities).

I want to do my part to understand best practices, promote them, and contribute back by improving the great tools to make them even better. I want to make our toolset the best in the industry. When new adopters come to Node.js, I would prefer to have a solution to the question "hey, how come I have duplicate modules all over the place and how do I fix it?" I wouldn't want stupid issues like this to be a reason that companies might choose not to adopt this awesome platform.

So I guess in closing, it sounds like there aren't any existing solutions to the duplication and it isn't a problem for most people. 

When I get a chance, I'll prototype something up and see if I can get something that works how I would like, then it will be available for the community to use if they so choose.

All the best,

Jeff

Glenn Block

Dec 24, 2011, 1:48:26 PM
to Marco Rogers, nod...@googlegroups.com
+1 on this. To me this is one of the really strong aspects of node. It allows me to have independently versioned dependencies, so my system doesn't just break due to incompatibilities. Over time this is a big benefit.

The size is minimal compared to the gains. This benefit is something only node's module system has AFAIK.


Glenn Block

Dec 24, 2011, 1:49:17 PM
to Jeff Barczewski, nod...@googlegroups.com
Space is so cheap today, is it really something to be concerned with?



Mark Hahn

Dec 24, 2011, 2:36:52 PM
to nod...@googlegroups.com, Jeff Barczewski
> If these are required in many modules and they are not pulled in at the top then you can start to see how it could become more significant.

No, I don't see that. I wouldn't mind if modules were 100M. You can fit 10,000 such modules in a terabyte.


Thomas Blobaum

Dec 24, 2011, 3:40:19 PM
to nod...@googlegroups.com, Jeff Barczewski
This is a feature, not an issue with npm. Multiple levels of
dependencies allow module x to use the latest version of module z
while module y uses an older version of module z in the same
application. This adds stability.

> Now one can go into your root package.json and add all of the duplicated dependencies from every package below

... don't do this.



Jeff Barczewski

Dec 25, 2011, 10:57:09 AM
to nod...@googlegroups.com, Jeff Barczewski
I agree with not adding the dependencies to the root package, that is why I suggested a couple other alternatives which would work and would allow one to keep independent versions for modules.

1. Have an npm command or another tool which moves all the duplicate dependency modules to the root and removes them from the children (not changing the package.json, just moving the modules up. Only one version could be moved up; if multiple versions are duplicated, it would probably just take the latest and leave the others as is.)

OR

2. (for platforms that support symbolic links) An npm command or another tool could instead move duplicate dependencies to the root level, or point to a cache, and replace the children's copies with symbolic links to the right version. Again, this would not affect the package.json.


What I envision would NOT remove the ability for any module to use any version, but if they use the same version of a module there would be one copy. All package.json's would still just indicate what they need as normal.
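
Here is a small sketch of what both options would look like to the loader, under the assumption that Node runs with its default symlink handling (no --preserve-symlinks). The package names `shared-lib`, `pkg-a` and `pkg-b` are hypothetical and the layout is built in a temp directory. pkg-a has no nested copy (option 1), pkg-b has a symlink to the root copy (option 2); creating the symlink may need extra privileges on Windows.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { createRequire } = require('module');

const root = fs.mkdtempSync(path.join(os.tmpdir(), 'dedupe-'));

// The single real copy lives at the root level.
const shared = path.join(root, 'node_modules', 'shared-lib');
fs.mkdirSync(shared, { recursive: true });
fs.writeFileSync(path.join(shared, 'index.js'), 'module.exports = { cache: {} };\n');

// Option 1: pkg-a has no nested copy, so lookup falls through to the root.
const pkgA = path.join(root, 'node_modules', 'pkg-a');
fs.mkdirSync(pkgA, { recursive: true });

// Option 2: pkg-b's nested copy is a symlink to the root copy.
const pkgB = path.join(root, 'node_modules', 'pkg-b');
fs.mkdirSync(path.join(pkgB, 'node_modules'), { recursive: true });
fs.symlinkSync(shared, path.join(pkgB, 'node_modules', 'shared-lib'), 'dir');

// createRequire builds a require() that resolves as if called from that file.
const fromRoot = createRequire(path.join(root, 'index.js'))('shared-lib');
const fromA = createRequire(path.join(pkgA, 'index.js'))('shared-lib');
const fromB = createRequire(path.join(pkgB, 'index.js'))('shared-lib');

console.log(fromRoot === fromA); // true
console.log(fromRoot === fromB); // true
```

In both layouts every requirer ends up with the same module instance, and neither touches any package.json.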


I'm not sure why so many are arguing against this. I'm not asking anyone to divert their time to work on it; I can build it when I have time. I would make this an optional step, like prune, which you could run if you choose. I don't believe this would remove or break any functionality; it would simply save disk space and build/rebuild time. If you want to keep the duplication, you can. If you would like to consolidate, then you can do that too. You would still have the ability to have different versions of modules. I agree that it should not affect the package.json; it would do its work on the file system.

Now I agree that it isn't a high priority item to work on for now. It is more like adding some final polish to our toolset.

But if this was available as a utility in the future, and it didn't break compatibility or remove functionality with node/npm, wouldn't anyone find it useful to be able to consolidate duplicates and save disk space? Maybe I am not doing a good job communicating and should just build the prototype so people can see how it would work.

Anyway thanks for the discussion. 

Wishing you all a Merry Christmas!

Jeff


Jann Horn

Jan 5, 2012, 6:38:58 AM
to nod...@googlegroups.com

How about leaving deduplication to a filesystem or so?

Jeff Barczewski

Jan 5, 2012, 8:13:33 AM
to nod...@googlegroups.com
Jann,

From tests I have run, it appears to me that if both copies exist (and are not simply symbolic links to the same directory), node will load multiple copies of these modules even if they are the same version of the module.

So not only do you have duplicate space on the file system and possibly duplicate build time (for compiled modules), you also have separate copies of modules loaded into memory (even if they are the same version), along with any data they cache.
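
A quick way to reproduce this, as a rough sketch: the script below writes two byte-identical copies of a hypothetical module `shared-lib` under two hypothetical packages in a temp directory, then requires it from each. Node caches modules by resolved file path, so the two copies come back as separate objects with separate state.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { createRequire } = require('module');

const root = fs.mkdtempSync(path.join(os.tmpdir(), 'twice-'));

// Write an identical copy of shared-lib under the given package and return
// a require() that resolves as if called from inside that package.
function installCopy(pkgName) {
  const pkgDir = path.join(root, 'node_modules', pkgName);
  const libDir = path.join(pkgDir, 'node_modules', 'shared-lib');
  fs.mkdirSync(libDir, { recursive: true });
  fs.writeFileSync(path.join(libDir, 'index.js'), 'module.exports = { cache: {} };\n');
  return createRequire(path.join(pkgDir, 'index.js'));
}

const libA = installCopy('pkg-a')('shared-lib');
const libB = installCopy('pkg-b')('shared-lib');

libA.cache.seen = true;

console.log(libA === libB);   // false: two instances in memory
console.log(libB.cache.seen); // undefined: cached data is not shared
```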

That's another reason I was suggesting it would be nice to have a tool which can do just that (to clean/move duplicates to top or to move and create symbolic links). I'm still working through some of the thought process as to which approach might work better. 

IMO, if you care about any of the above, you may want to remove the duplicate directory from the lower paths.

Jeff

Jann Horn

Jan 5, 2012, 11:58:56 AM
to nod...@googlegroups.com
2012/1/5 Jeff Barczewski <jeff.ba...@gmail.com>:

> From tests I have run, it appears to me that if both copies exist (and are
> not simply symbolic links to the same directory), node will load multiple
> copies of these modules even if they are the same version of the module.

Right.


> So not only do you have duplicate space on the file system and possibly
> duplicate build time (for compiled modules), you also have different copies
> of modules loaded into memory (even if they are the same version) (and any
> data they cache).

People might be using that to be able to monkeypatch their own copy of a module.

Tony Tam

Mar 10, 2014, 12:37:30 PM
to nod...@googlegroups.com
Jeff,

My company is running into the size issues you have cited. I know this is an old thread, but I am wondering if you ever resolved this with something like npm dedupe?

We currently have a 300 MB package in our company with many (> 20) duplicates of various modules.

I am curious whether the use of consistent 'tags' would lead the community to solve this by encouraging modules to upgrade to 'tested', 'production ready' tags in the same time frame.

Tony

Goff

Apr 2, 2014, 1:06:49 PM
to nod...@googlegroups.com
I concur. Sure, the libraries are small, but the module system is not used very effectively when the same lib is loaded over and over from different paths. For example, almost all dependencies will share Underscore.js.