devtools re-fetching problem

Tom Tromey

Dec 18, 2015, 4:56:12 PM
to dev-tech...@lists.mozilla.org
I mentioned this on #necko the other day.

We've had a "re-fetching" problem in devtools for a while now.

The basic issue is that the platform drops the original text of some
things, like style sheets. (Instead just the parsed form is preserved.)

So, in order to make some tools work with the source text, devtools
re-fetches the sources. However, this is sub-optimal -- maybe the
sources have changed on the server, leading to confusing results.

We were wondering if there was some way we could solve this problem.

One idea we had is to always cache these things, in a way that would let
the devtools retrieve them. Is this possible?

Or ... some other idea here.


On irc jduell mentioned:

<jduell> tromey: I think you want to set the LOAD_ONLY_FROM_CACHE loadflag on
your network request

... which isn't really what we want, I think (but which in context made
perfect sense) -- but maybe a Plan B would be to use that to at least
inform users: "hey, devtools had to re-fetch this, be warned".

Tom

Valentin Gosu

Dec 18, 2015, 5:54:54 PM
to Tom Tromey, dev-tech-network
On 18 December 2015 at 22:56, Tom Tromey <ttr...@mozilla.com> wrote:

> I mentioned this on #necko the other day.
>
> We've had a "re-fetching" problem in devtools for a while now.
>
> The basic issue is that the platform drops the original text of some
> things, like style sheets. (Instead just the parsed form is preserved.)
>
> So, in order to make some tools work with the source text, devtools
> re-fetches the sources. However, this is sub-optimal -- maybe the
> sources have changed on the server, leading to confusing results.
>
> We were wondering if there was some way we could solve this problem.
>
> One idea we had is to always cache these things, in a way that would let
> the devtools retrieve them. Is this possible?
>
> Or ... some other idea here.
>
>
Is the problem that we're not caching these resources? Maybe because of the
response headers?
Or is there another reason that we can't use the previously downloaded
files?


>
> On irc jduell mentioned:
>
> <jduell> tromey: I think you want to set the LOAD_ONLY_FROM_CACHE loadflag
> on
> your network request
>
> ... which isn't really want we want, I think (but which in context made
> perfect sense) --


nsIRequest::LOAD_FROM_CACHE should disable revalidation, and give you what
is in the cache. It falls back on the network if the resource is missing,
but if you combine it with nsICachingChannel::LOAD_ONLY_FROM_CACHE, it
would fail if the resource wasn't cached.

Of course, if the file is not cached at all, this wouldn't work, and a
re-fetch would be necessary :)
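
The flag behavior described above can be sketched as a toy model. This is not Necko code: the `LoadFlags` enum and its values, the `load` function, the dict-backed cache and the `CacheMiss` exception are all hypothetical stand-ins, and only the combinations mentioned in this message are modeled.

```python
from enum import Flag


class LoadFlags(Flag):
    # Names follow the flags discussed in the thread; the values are arbitrary.
    LOAD_FROM_CACHE = 1
    LOAD_ONLY_FROM_CACHE = 2


class CacheMiss(Exception):
    """Raised when a cache-only load finds no entry (hypothetical)."""


def load(url, flags, cache, fetch):
    """Return (body, source), where source is 'cache' or 'network'."""
    if LoadFlags.LOAD_FROM_CACHE in flags and url in cache:
        # Use the stored copy as-is, with no revalidation.
        return cache[url], "cache"
    if LoadFlags.LOAD_ONLY_FROM_CACHE in flags:
        # Never touch the network; let the caller decide what to do next.
        raise CacheMiss(url)
    # Fall back to the network when the entry is missing.
    return fetch(url), "network"
```

With both flags set, a miss surfaces as an error instead of a silent re-fetch, which is the hook a tool would need in order to warn the user.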

Ehsan Akhgari

Dec 18, 2015, 6:15:59 PM
to Tom Tromey, dev-tech...@lists.mozilla.org
On 2015-12-18 4:56 PM, Tom Tromey wrote:
> I mentioned this on #necko the other day.
>
> We've had a "re-fetching" problem in devtools for a while now.
>
> The basic issue is that the platform drops the original text of some
> things, like style sheets. (Instead just the parsed form is preserved.)
>
> So, in order to make some tools work with the source text, devtools
> re-fetches the sources. However, this is sub-optimal -- maybe the
> sources have changed on the server, leading to confusing results.
>
> We were wondering if there was some way we could solve this problem.
>
> One idea we had is to always cache these things, in a way that would let
> the devtools retrieve them. Is this possible?

We can't cache the resources if the web server tells us "don't cache
this", so even if we use things such as LOAD_ONLY_FROM_CACHE, we still
need a solution for that case.

> Or ... some other idea here.

How about forcing Gecko to retain the original source text when devtools
are being used?

Jason Duell

Dec 19, 2015, 6:37:33 PM
to Ehsan Akhgari, dev-tech-network, Bambas, Honza, Tom Tromey, Michal Novotny
On Fri, Dec 18, 2015 at 6:15 PM, Ehsan Akhgari <ehsan....@gmail.com>
wrote:
Sounds like we need something like a CACHE_GECKO_COPY flag that keeps a
"secret" cache only for internal gecko use?

1) if the resource would normally be cached, does nothing (the resource
gets cached normally)
2) If the resource wouldn't be cached, cache it (perhaps only in RAM? Or
maybe we'd need that only if INHIBIT_PERSISTENT_CACHING is set), with some
sort of flag that indicates it should be invisible to cache reads unless
CACHE_HIDDEN_COPY is again present.

Jason

P.S. Honza/Michal: right now nsIRequest.idl says that "For HTTPS,
[INHIBIT_PERSISTENT_CACHING] is set automatically." That's out of date now
IIRC--we should change the comment.

Honza Bambas

Jan 6, 2016, 8:52:32 AM
to dev-tech...@lists.mozilla.org
On 12/20/2015 0:36, Jason Duell wrote:
> On Fri, Dec 18, 2015 at 6:15 PM, Ehsan Akhgari <ehsan....@gmail.com>
> wrote:
>
>> On 2015-12-18 4:56 PM, Tom Tromey wrote:
>>
>>> I mentioned this on #necko the other day.
>>>
>>> We've had a "re-fetching" problem in devtools for a while now.
>>>
>>> The basic issue is that the platform drops the original text of some
>>> things, like style sheets. (Instead just the parsed form is preserved.)
>>>
>>> So, in order to make some tools work with the source text, devtools
>>> re-fetches the sources. However, this is sub-optimal -- maybe the
>>> sources have changed on the server, leading to confusing results.
>>>
>>> We were wondering if there was some way we could solve this problem.
>>>
>>> One idea we had is to always cache these things, in a way that would let
>>> the devtools retrieve them. Is this possible?
>>>
>> We can't cache the resources if the web server tells us "don't cache
>> this", so even if we use things such as LOAD_ONLY_FROM_CACHE, we still need
>> a solution for that case.
>>
>> Or ... some other idea here.
>> How about forcing Gecko retain the original source text when devtools are
>> being used?
>
>
> Sounds like we need something like a CACHE_GECKO_COPY flag that keeps a
> "secret" cache only for internal gecko use?

If that would be enforced only for active devtools, we could go with that.

>
> 1) if the resource would normally be cached, does nothing (the resource
> gets cached normally)
> 2) If the resource wouldn't be cached, cache it (perhaps only in RAM?

Easy to evict on devtools close but easy to waste memory...

> Or
> maybe we'd need that only if INHIBIT_PERSISTENT_CACHING is set), with some
> sort of flag that indicates it should be invisible to cache reads unless
> CACHE_HIDDEN_COPY is again present.

Such a devtools cache is very hard to build at the Necko level. We need
new flags or some way to expose the info on the loadgroup. If we use HTTP
force caching, we need to ensure it goes away when the devtools window
closes. Being in-memory only might be the way here. Using the
defer-caching capability from bug 1203113 could be another way to achieve
it (we are safe with eviction on devtools close and also don't waste
memory!)

Lot of work. Don't we have more important stuff to do? ;)

Also, enforcing this caching at the HTTP level would actually break the
web - the behavior would simply be different. I think devtools should do
this kind of caching, not necko.


-hb-

>
> Jason
>
> P.S. Honza/Michal: right now nsIRequest.idl says that "For HTTPS,
> [INHIBIT_PERSISTENT_CACHING] is set automatically." That's out of date now
> IIRC--we should change the comment.

Jason Duell

Jan 6, 2016, 8:09:19 PM
to Honza Bambas, Tom Tromey, dev-tech-network
On Wed, Jan 6, 2016 at 8:52 AM, Honza Bambas <hba...@mozilla.com> wrote:

> I think this kind of caching should do devtools, not necko.

If I read you correctly, Honza, you're saying devtools should cache the
original page content, not necko. But I'm not sure that this is
possible--I assume they don't have access to the original data, just the
post-parsed DOM and whatnot. (Is that true?)

>> some sort of flag that indicates it should be invisible to cache reads
>> unless CACHE_HIDDEN_COPY is again present.
>
> Such a devtools cache is very hard to build on the Necko level. We need
> new flags or somehow expose the info on the loadgroup.

So I understand that we'd need a new flag. I don't understand why that
would be very hard (but I don't know the cache code). Couldn't we just
have docshell or uriloader, etc, set the HIDDEN_COPY flag if devtools is
active?

> If we use HTTP force caching, we need to ensure it goes away with closing
> the devtools window. Being in-memory only might be the way here.

I know in the old cache we had namespaces (and with B2G we have storage per
appid). Could we just use some magic appID for items that we force-cache
for devtools, and then blow it away either at shutdown or when devtools are
closed?

> this caching enforce on the HTTP level would actually break the web - the
> behavior would simply be different.

Well, right--we'd only want to read hidden entries if it's the devtools
code that's asking for them. But that should be a matter of having a
loadFlag.

> nsIRequest::LOAD_FROM_CACHE should disable revalidation, and give you what
> is in the cache. It falls back on the network if the resource is missing,
> but if you combine it with nsICachingChannel::LOAD_ONLY_FROM_CACHE, it
> would fail it the resource wasn't cached.

Tom: could you do a quick experiment to see what % of resources you wind up
getting correctly (i.e. from the cache, without revalidation or hitting the
network) if you use LOAD_FROM_CACHE? I'm wondering if using that (plus
maybe LOAD_ONLY_FROM_CACHE, as described above, so we could at least warn
developers when we did need to re-fetch the source), would be good enough
for now.
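
A minimal sketch of how such an experiment could be tallied, assuming a cache-only loader that raises on a miss. The `cache_hit_rate` function, the `CacheMiss` exception and the `load_from_cache_only` callback are hypothetical names for illustration, not devtools or Necko APIs.

```python
class CacheMiss(Exception):
    """Raised by the (hypothetical) cache-only loader when no entry exists."""


def cache_hit_rate(urls, load_from_cache_only):
    """Return (percentage of cache hits, list of URLs that missed)."""
    hits = 0
    misses = []
    for url in urls:
        try:
            load_from_cache_only(url)  # must not revalidate or hit the network
            hits += 1
        except CacheMiss:
            misses.append(url)         # these would need a re-fetch and a warning
    rate = 100.0 * hits / len(urls) if urls else 0.0
    return rate, misses
```

The list of misses is returned alongside the percentage so that the resources needing a re-fetch can be inspected, for example to see which response headers kept them out of the cache.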

> Lot of work. Don't we have more important stuff to do?

Good question. I don't know the answer yet. I think the results of the
experiment I suggest would be useful--if we get a high enough hit rate in
practice, then maybe we don't need to do the HIDDEN_COPY stuff for now.

Jason

Michal Novotny

Jan 7, 2016, 10:03:15 AM
to dev-tech...@lists.mozilla.org
On 12/20/2015 12:36 AM, Jason Duell wrote:
>> We can't cache the resources if the web server tells us "don't cache
>> this", so even if we use things such as LOAD_ONLY_FROM_CACHE, we still need
>> a solution for that case.
>>>

I might be wrong, but I think we cache resources whenever possible.
no-store entries are kept in memory and no-cache entries are never
reused, but we have them for viewsource etc...


>> How about forcing Gecko retain the original source text when devtools are
>> being used?
>
> Sounds like we need something like a CACHE_GECKO_COPY flag that keeps a
> "secret" cache only for internal gecko use?
>
> 1) if the resource would normally be cached, does nothing (the resource
> gets cached normally)
> 2) If the resource wouldn't be cached, cache it (perhaps only in RAM? Or
> maybe we'd need that only if INHIBIT_PERSISTENT_CACHING is set), with some
> sort of flag that indicates it should be invisible to cache reads unless
> CACHE_HIDDEN_COPY is again present.

How does this differ from what we already do as described above?

Btw, I think it's really hard to solve this on the cache level. Let's
say there is some no-cache, no-store resource which is different on
every load. We load it in 2 tabs so every tab has a different version.
We would need to keep multiple different copies that we received from
the server. And how would devtools identify which one it wants?

Michal

Honza Bambas

Jan 7, 2016, 11:41:59 AM
to Jason Duell, Tom Tromey, dev-tech-network
I'll first try to sum up what we actually want here:
- devtools want access to the raw content of network requests already made,
without re-requesting it from the network, just by doing the request again
(using the channel to obtain it)
- hence, when devtools are open, stuff that would not be cached (based
on load flags or response headers) we want to temporarily cache anyway
(best in some temp cache, ideally deferred)
- when devtools want the content again, use LOAD_FROM_CACHE |
LOAD_ONLY_FROM_CACHE flags (have to verify) or some devtools-specific
thing set on the channel to force use of the cached content
- when these flags are specified and a cache entry is not found in a
usual place, look into the temporary deferred cache we create just for
devtools
- if it's not found in any cache (usual nor devtools specific) -> fail
the load, so that devtools can decide to go to the network and
potentially prompt the user before that (or show something in the UI)
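
The lookup order in the list above can be sketched as a toy model. Everything here is a hypothetical illustration under simplifying assumptions (dicts as storage, a boolean for "devtools open"); `DevtoolsAwareCache` and its methods do not correspond to existing Necko interfaces.

```python
class CacheMiss(Exception):
    """Raised when neither the regular nor the devtools cache has the entry."""


class DevtoolsAwareCache:
    def __init__(self):
        self.regular = {}         # entries that would be cached anyway
        self.devtools_temp = {}   # temporary copies kept only for devtools
        self.devtools_open = False

    def store(self, url, body, cacheable):
        if cacheable:
            self.regular[url] = body
        elif self.devtools_open:
            # Would not normally be cached; keep a temporary copy anyway.
            self.devtools_temp[url] = body

    def load_for_page(self, url):
        # Normal loads never see the temporary devtools copies.
        return self.regular.get(url)

    def load_for_devtools(self, url):
        # Usual place first, then the devtools-only storage, else fail.
        for storage in (self.regular, self.devtools_temp):
            if url in storage:
                return storage[url]
        raise CacheMiss(url)

    def close_devtools(self):
        # Temporary copies go away when devtools close.
        self.devtools_open = False
        self.devtools_temp.clear()
```

On `CacheMiss`, the caller (devtools) can decide whether to go to the network and whether to tell the user first.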

Makes sense?

Note that I would really like to avoid hacking this with appid or so.
We already have patches for deferred caching, relatively simple, just
waiting for review from Michal. He is not reviewing them because for the
original purpose (NSec) they may change even more. But personally, if
deferred commit finds another use, we can review and land it now;
updates for NSec can be done incrementally.

-hb-


Honza Bambas

Jan 7, 2016, 11:54:03 AM
to Michal Novotny, dev-tech...@lists.mozilla.org
On 1/7/2016 15:50, Michal Novotny wrote:
> On 12/20/2015 12:36 AM, Jason Duell wrote:
>>> We can't cache the resources if the web server tells us "don't cache
>>> this", so even if we use things such as LOAD_ONLY_FROM_CACHE, we
>>> still need
>>> a solution for that case.
>>>>
>
> I might be wrong, but I think we cache resources whenever possible.
> no-store entries are kept in memory and no-cache entries are never
> reused,

no-cache entries are reused, and very often. no-cache means to revalidate
(do a conditional request) when possible (i.e. when we have Last-Modified
or an ETag).
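
For illustration, revalidation with a conditional request looks roughly like this in plain HTTP terms. The helper names are hypothetical and the header dict is assumed to use canonical header names; this is a sketch of the standard mechanism, not of the Necko implementation.

```python
def conditional_headers(cached_response_headers):
    """Build validator headers for revalidating a cached entry.

    Returns None when the cached response carries no validator, in which
    case only a full re-fetch is possible.
    """
    headers = {}
    etag = cached_response_headers.get("ETag")
    last_modified = cached_response_headers.get("Last-Modified")
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers or None


def body_after_revalidation(status, cached_body, response_body):
    """A 304 Not Modified answer means the cached body is still good."""
    return cached_body if status == 304 else response_body
```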

> but we have them for viewsource etc...
>
>
>>> How about forcing Gecko retain the original source text when
>>> devtools are
>>> being used?
>>
>> Sounds like we need something like a CACHE_GECKO_COPY flag that keeps a
>> "secret" cache only for internal gecko use?
>>
>> 1) if the resource would normally be cached, does nothing (the resource
>> gets cached normally)
>> 2) If the resource wouldn't be cached, cache it (perhaps only in
>> RAM? Or
>> maybe we'd need that only if INHIBIT_PERSISTENT_CACHING is set), with
>> some
>> sort of flag that indicates it should be invisible to cache reads unless
>> CACHE_HIDDEN_COPY is again present.
>
> How does this differ from what we already do as described above?
>
> Btw, I think it's really hard to solve this on the cache level. Let's
> say there is some no-cache, no-store resource which is different on
> every load. We load it in 2 tabs so every tab has a different version.

Deferred cache storage would (ideally) be kept per tab. So, no, the
copies would be kept separate between tabs.

> We would need to keep multiple different copies that we received from
> the server. And how would devtools identify which one it wants?

As said above, the docshell or chrome window would be the identifier
(the holder of the deferred cache storage)
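
A minimal sketch of the per-tab idea, assuming an opaque tab identifier stands in for the docshell or chrome window. `PerTabStorage` and its methods are hypothetical and only illustrate why two tabs can each keep their own copy of the same URL.

```python
from collections import defaultdict


class PerTabStorage:
    """Toy per-tab storage keyed by (tab_id, url)."""

    def __init__(self):
        self._by_tab = defaultdict(dict)

    def store(self, tab_id, url, body):
        self._by_tab[tab_id][url] = body

    def lookup(self, tab_id, url):
        # Returns None when this tab never received the resource.
        return self._by_tab.get(tab_id, {}).get(url)

    def drop_tab(self, tab_id):
        # Called when the tab (or its devtools) goes away.
        self._by_tab.pop(tab_id, None)
```

Devtools attached to a tab would pass that tab's identifier, so each tab gets back exactly the version it was served.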

-hb-

>
> Michal

Jason Duell

Jan 7, 2016, 1:23:45 PM
to Honza Bambas, Tom Tromey, dev-tech-network, Michal Novotny
Tom:

After talking to Honza and Michal some more, and given this:

> I think we cache resources whenever possible. no-store entries are kept
> in memory and no-cache entries are never reused, but we have them for
> viewsource etc.

I think our next step is for you guys to try using LOAD_FROM_CACHE |
LOAD_ONLY_FROM_CACHE and see what % of your resources are/aren't in the
cache. That will give us some baseline for understanding how big of an
issue we have here.

If you're around next Thursday at around 9:30 AM PST, you could also join
our necko conference call and we could talk synchronously, which might be
useful.

Jason