Note: We already told people on Hacker News and elsewhere that SPDY was available for testing in Nightly, but it got backed out because of [1].
It looks like releng and the build people are working on resolving [1] and/or other bugs so that mozilla-central can be reopened. I proposed on #developers that, if that doesn't work or takes too long, we should split out parts of Necko/PSM into a shared library separate from libxul so that we can continue landing changes to Necko and PSM as we have been planning.
Previous work [2] has already identified some code that looks reasonable to split out, if we need to split out code. IMO, it would be worth looking into splitting out uconv first. But the advantage of splitting out (part of) Necko and/or PSM is that the Necko/PSM teams could manage that independently of other work, because it is a large part of the codebase that one team understands. And we are highly motivated to find some solution so that we can get SPDY back in the tree and keep landing the other changes we want to land for FF11.
jduell and I agreed to re-assess the situation in 24 hours, hoping that the tree opens or that we have a good ETA for the tree opening without needing to do such surgery.
I have appended the chat log from #developers where this was discussed. Note that not everyone is in favor of doing this work yet (e.g. see roc's comments).
Cheers,
Brian
[1] https://bugzilla.mozilla.org/show_bug.cgi?id=709480
[2] https://bugzilla.mozilla.org/show_bug.cgi?id=674579#c2
<bsmith> jduell: By the way, do you have any idea about what to do about the fact that mozilla-central is basically closed indefinitely?
<jduell> bsmith: actually I hadn't heard. What's the issue, and define "indefinitely"
<bsmith> maybe it would be realistic to split off (part of) PSM and/or Necko into a separate shared library
<bsmith> jduell: https://bugzilla.mozilla.org/show_bug.cgi?id=709193
<bsmith> jduell: https://bugzilla.mozilla.org/show_bug.cgi?id=709193#c38
* mconley has quit (Input/output error)
* mike5w3c has quit (Client exited)
<bsmith> indefinitely means potentially for multiple days, IIRC.
<jduell> bsmith: yikes. Well, necko was its own shared lib within 2 years ago,
<jduell> so it probably wouldn't be too hard
* mike5w3c (Mi...@moz-9C3A69E9.pool.e-mobile.ne.jp) has joined #developers
<bsmith> jduell: I am just thinking that if that drags on, we're going to be kind of screwed
<bsmith> because we have a lot of checkins to make, and they're all in libxul
<Waldo> except for js currently -_-
<bsmith> Waldo: I mean "we" as in necko team
<Waldo> :-)
<Waldo> sure
* Waldo tends to agree he has no idea what the path forward is here
<bsmith> I think many teams will be screwed
<bsmith> merge date is 9 days away
<bsmith> end of wharter is 20 days away
<bsmith> quarte3r
<bsmith> lots of code is going to (want to) land now
<bsmith> we are not going to have a net reduction in code, for sure
* Waldo isn't so much worried about "now" as when this will actually change somehow
* florian has quit (Quit: Instantbird 1.2a1pre)
<Waldo> rapid release === mega-chillax about specifics of landings
<Waldo> but still
<Waldo> we need to be able to land stuff
<bsmith> Waldo: well, our team defined some goals as "get XXX into mozilla-central by the end of the quarter"
<bsmith> e.g. SPDYT
<bsmith> SPDY
<bsmith> In retrospect, that wasn't a great way to define the goal, but...
<Waldo> I think JS uses goals less than other teams may
* Waldo is pretty sure the tree will be opened again within a week
<Waldo> one way
<Waldo> or another
<Waldo> but how we get there, no idea at all
<bsmith> Yeah, but some features can't or shouldn't land unless they have *some* bake time on Nightly
<Waldo> I agree
* Notify: benadida is offline (irc.mozilla.org).
* ewong|away is now known as ewong
<bsmith> jduell: I am thinking we could pull out at least SPDY, non-SSL parts of PSM, FTP, and other things that don't affect startup, into a new shared lib
<bsmith> with relatively little effort
<jduell> bsmith: any reason why we wouldn't just yank all of necko into its own shared lib?
<bsmith> and less chance of startup perf regressions.
<jduell> ah, startup
<bsmith> jduell: startup perf regressions
<jduell> Right. We might run into symbol sharing issues
<bsmith> jduell: and, extra disk seek to load the extra library
<bsmith> Oh.
<bsmith> Yeah
<jduell> Necko is written to assume that symbols are within the same shared lib. Not sure how much it would hit us in practice
<bsmith> gSocketThread
<bsmith> for example
<jduell> right,
* romeo has quit (Quit: Leaving)
* IRCMonkey26866 (To...@16BAC97A.933EA279.AC7F8427.IP) has joined #developers
<bsmith> The problem is the linker running out of memory
<bsmith> maybe it would help if we made more things static, that should/can be static
<bsmith> so the linker doesn't see them
<bsmith> Do we have some kind of code analysis tool that could help automate that?
<jduell> bsmith: maybe. Sounds like it wouldn't buy much time, given that the RAM need grows exponentially
<bsmith> And/or, eliminate unnecessary #includes
<jduell> Taras would be the one to ask about the analysis tools.
<jduell> Not sure if elim #includes would help. I assume by PGO time we're only dealing with actual symbols referred-to?
<jduell> bsmith: But splitting all of necko out temporarily might not be the worst solution to this, short-term.
<bsmith> jduell: I think with PGO, the compiler runs in the linker, effectively
<bsmith> so, less source code to compile -> less memory
<bsmith> basically, the compiler just outputs an AST
<bsmith> and the linker does the compiling
<jduell> bsmith: ah, possibly. Depends on where in the compilation process we're running out of memory
* Hughman (Hug...@moz-1727A300.static.tpgi.com.au) has joined #developers
<bsmith> If we did anything, how would we verify that it is "enough"
<bsmith> we might delay things by just a few days
<bsmith> or we could fix the problem basically permanently. how would we tell the difference
<bsmith> ?
<jduell> bsmith: I guess just watch how much RAM PGO takes up after we land fixed.
<jduell> fixes
<bsmith> Making more things static might not help for the PGO case, because all the code is going to be in memory
<bsmith> regardless of static vs extern
<bsmith> jduell: is there something bigger than necko, that we could pull out, that wouldn't likely affect startup?
<jduell> bsmith: I'm not the one to ask--bsmedberg, bz might know.
<bsmith> jduell: I think that necko is likely in the startup path, because of nsIURI et al., at least
<jduell> bsmith: how much do we expect startup to suffer from loading 2 DLLs instead of one?
<bsmith> same with PSM. That is why we have bugs on file to merge all NSS DLLs into libxul
* heycam is now known as heycam|away
<bsmith> I think taras told me 50ms-100ms per DLL, potentially
<bsmith> obviously, it depends on the system a lot
<bsmith> if you assume one seek per DLL
<jduell> bsmith: sounds like a reasonable hit to suffer for a week or two until we get a better fix.
<bsmith> then it is seek time * number of DLLs, at least
* jduell tries to find bug that changed necko to link into libxul
* jgilbert (jgil...@moz-2B3CF81C.hsd1.ca.comcast.net) has joined #developers
* Notify: mcmanus is online (irc.mozilla.org).
* mcmanus (mcm...@moz-FE9B5BFD.twcny.res.rr.com) has joined #developers
<nthomas> fwiw, it's not a RAM issue (these machines have 8G), it's an address space issue
<nthomas> hmm, or 4G some of them, but still it's the 3G virtual address space available to the linker
<jduell> bsmith: oh, right, I think the change was actually that we changed libnecko to be part of the DOM lib. So orthogonal to libxul or not. Though we could probably still link it separately.
<jduell> But https://bugzilla.mozilla.org/show_bug.cgi?id=674579#c2 makes it sound like we're small potatoes
<bsmith> nthomas: do we know if it is running out of memory in the compilation phase or the linking phase?
<jduell> bsmith: though not as small as the other things listed in that comment
<bsmith> nthomas: or are the two phases of the LTO completely intertwined?
<nthomas> I trust the people who have been looking at inbound and say it's the linker
<khuey> njn: why can't you use .get?
<bsmith> jduell: I think the problem is that every module needs to get Init()d during startup
<bsmith> e.g. because it must be inited on the main thread, and that is the only place we can somehow guarantee that
* Waldo wonders why the default fedora 15 mirrors seem to be so dog-slow right now
<khuey> bsmith: in PGO the compilation phase and the linking phase are basically the same thing
<khuey> code generation is deferred until linking
<jduell> bsmith: I'm not an XPCOM whiz, but what's the problem with Init being called for each module.
<jduell> ?
<khuey> also, Necko is totally not on my list of things to split out
<khuey> we should start with video codecs
<khuey> snappy
<khuey> other non-XPCOM stuff that isn't important
* cjones has quit (Quit: Leaving)
<bsmith> jduell: whatever modules you split out, you'd have to load at startup anyway, and then pay that seek cost
* tH has quit (Quit: ChatZilla 0.9.87-rdmsoft [XULRunner 1.9.0.1/2008072406])
<jduell> khuey: agree with you on long run, but for short term fix to get tree open?
<Unfocused> could shove them all in omni.ja, with a custom dlopen like android has
<khuey> the problem is not the runtime linker ...
<bsmith> khuey: The reason we are discussing Necko is because that is what jduell and I could most reasonably control
<khuey> sure
<jduell> And we know it was split out into its own lib not long ago, and likely to not have external link issues. But that may be true of codecs, etc, too
<bsmith> The thing is, SPDY wasn't a *huge* patch. A couple thousand new lines
<khuey> I wouldn't assume that it doesn't have external linking issues
<khuey> we've done a fair amount of deCOM since we killed libxul
<edmorley> i've just built seamonkey (after fixing it post libreg bustage); will things go horribly wrong if I use the same objdir to build thunderbird (to hopefully save build time)?
<khuey> yeah, we're standing right on the edge of the cliff now :-/
<bsmith> what kind of linker problems should we expect, if we split out (part of) Necko into its own library?
<jduell> khuey: yeah, but the necko APIs really don't refer to things higher up in the food chain. Until a few years ago we were committed to being able to ship it as a separate product
<bsmith> I could see references to global variables being a problem
<bsmith> but, i think most of our global variable usage is DEBUG-only
<khuey> jduell: but other things might be relying on necko symbols now
<khuey> bsmith: any non-virtual call into Necko will fail
<bsmith> khuey: but, won't we know what breaks at build time?
<khuey> right
<khuey> you will
<bsmith> why would any non-virtual call fail?
<bsmith> we would have to export all the symbols being relied on from the new library, of course
<jduell> khuey: it's possible, but I'd still be mildly surprised.
<khuey> we really don't want to export those symbols though
<bsmith> jduell: I suggest that we have this discussion this time tomorrow.
<bsmith> Maybe the problem will be solved by then
<bsmith> and if not, then our team should find some way to resolve the issue locally
<jduell> bsmith: yeah, fair enough. I've got to run anyway.
* khuey should write that email that he's been thinking about
* glob|away is now known as glob
<bsmith> Is there a place where the log of #developers is kept, that I can link to?
* cjones (cjo...@B07356C0.695C1090.1F72B910.IP) has joined #developers
<roc> I don't think we should be trying to break things out of libxul
<roc> that way lies madness
* gwagner (ide...@moz-E5023712.dynamic.hinet.net) has joined #developers
<mbrubeck> bsmith: The only public IRC logs I know of for irc.mozilla.org are at http://irclog.gr/ and they don't seem to include #developers
<bsmith> mbrubeck: thanks
<bsmith> roc: I agree.
<glob> mbrubeck, do we want a logging bot in here?
<bsmith> But, depending on net reduction of code size using dead code removal seems unrealistic too
<philor> madness? this is mozilla!
<mbrubeck> I don't know if there's a reason there hasn't been a logging bot here. It seems like a good idea to have one, to me.
<bsmith> roc: so, it seems like making a major change to the build system or build machines is required
<bsmith> or pulling things out
* logbot (log...@moz-785868D2.glob.com.au) has joined #developers
* heycam|away is now known as heycam
<roc> just fix bug 709480. If someone can get Win64-built 32-bit builds coming out of tryserver, as Nick started doing there, we'll be a long way towards solving this
<khuey> can we do that without changing the compiler version?
<khuey> I thought the win64 machines didn't have MSVC 2005
* dbaron (dba...@moz-27F87443.dsl.dynamic.sonic.net) has joined #developers
* ChanServ gives channel operator status to dbaron
* cjones has quit (Ping timeout)
<mfinkle> I have some Java and JS patches to land for Mobile
<mfinkle> who is handing out approvals?
<khuey> a=me
<mfinkle> thank you
<nigelb> as long as it's not C++, it's fine? :)
<khuey> pretty mcuh
<khuey> *much
<Unfocused> and i have a small js patch
* gwagner has quit (Ping timeout)
* hub (h...@83874EA1.EB7C1AF9.6F478678.IP) has joined #developers
* gal (g...@A2413EEC.695C1090.1F72B910.IP) has joined #developers
<gal> nthomas: ping
<firebot> Check-in: http://hg.mozilla.org/mozilla-central/rev/71dfb2adaf0f - Chris Peterson - Bug 706984 - Check whether profile directory exists to avoid NullPointerException. r=dougt a=khuey
<firebot> http://hg.mozilla.org/mozilla-central/rev/1ce022be38d4 - Mark Finkle - Bug 709048 - Over usage of haptic buzz [r=mbrubeck a=khuey]
* tonymec (ton...@48AA3C8A.C18A3479.277517C1.IP) has joined #developers