Adding checksums to source maps

Andy Sterland

Jun 9, 2014, 6:43:41 PM
to dev-js-s...@lists.mozilla.org, Ron Buckton
One of the features we're adding to the IE F12 developer tools is the ability for developers to load a source map for any file in the debugger without the need for a comment in the file or an HTTP header. This would enable scenarios where developers have shipped generated JavaScript files to a server and either strip out comments or simply don't want to publish the source map file and/or sources. Basically a mechanism to load symbols :).

This feature would decouple the generated file from the source map, which makes it easy to load a source map that is unrelated to either the source or generated files. In cases where the developer has loaded the wrong source map we'd like to warn them in the debugger that the sources don't match, so they can load another source map or carry on regardless. To do this we need to know that the files don't all line up, which can be worked out by embedding a checksum for both the generated and source files in the source map, which the consumer (F12) can then compare. Of course it would be up to consumers and producers to do what they felt best, and these properties would be optional. Right now we're looking at adding them to the TypeScript compiler and F12.

Basically the proposal would add two new properties that would contain the checksums. The proposal below has them as Murmur 2 hashes, but this is a key point to get feedback on, especially whether it should be something like SHA1, SHA2, MD5, CRC32 etc. Overall we chose Murmur v2 because it's fast, widely available, and already used in the TypeScript compiler.

x_ms_fileMurmur2Hash - This property of type number would be the Murmur2 generated hash for the file. This hash should match the entire contents of the file as received by the debugger. Consumers may take liberties with this to account for other build steps for example the hash could be of the entire file excluding any comments at the end of the file*.

x_ms_sourcesMurmur2Hash - This property is an array of numbers, each of which is a Murmur2 generated hash for a source file, correlated by index with the entries in sources.

Example:
{
"version": 3,
"file": "combined.js",
"sources": ["a.ts", "b.ts"],
"names": ["method"],
"mappings": "AAAA,...",
"x_ms_fileMurmur2Checksum": 398506827,
"x_ms_sourcesMurmur2Checksum": [714841382, 473447668 ]
}
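
A minimal sketch of how a consumer might run the comparison described above. It uses the property names from the example, and it assumes a seed of 0 and UTF-8 encoded input, neither of which the proposal specifies. find_mismatches is a hypothetical helper, not part of F12 or the TypeScript compiler.

```python
def murmur2(data: bytes, seed: int = 0) -> int:
    """32-bit MurmurHash2 of data, returned as an unsigned integer."""
    m = 0x5BD1E995
    mask = 0xFFFFFFFF
    length = len(data)
    h = (seed ^ length) & mask
    # Mix the input four bytes at a time, read as little-endian words.
    full = length - (length % 4)
    for i in range(0, full, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * m) & mask
        k ^= k >> 24
        k = (k * m) & mask
        h = ((h * m) & mask) ^ k
    # Fold in the remaining 0 to 3 bytes.
    tail = data[full:]
    if len(tail) == 3:
        h ^= tail[2] << 16
    if len(tail) >= 2:
        h ^= tail[1] << 8
    if len(tail) >= 1:
        h ^= tail[0]
        h = (h * m) & mask
    # Final avalanche.
    h ^= h >> 13
    h = (h * m) & mask
    h ^= h >> 15
    return h


def find_mismatches(source_map: dict, generated: str, sources: dict) -> list:
    """Return warnings for every file whose checksum differs from the map.

    Assumption: text is hashed as UTF-8 bytes with seed 0. Both checksum
    properties are optional, so absent values are simply skipped.
    """
    warnings = []
    expected_file = source_map.get("x_ms_fileMurmur2Checksum")
    if expected_file is not None:
        if murmur2(generated.encode("utf-8")) != expected_file:
            warnings.append("generated file does not match the source map")
    expected_sources = source_map.get("x_ms_sourcesMurmur2Checksum", [])
    for name, expected in zip(source_map.get("sources", []), expected_sources):
        text = sources.get(name)
        if text is not None and murmur2(text.encode("utf-8")) != expected:
            warnings.append(f"source {name} does not match the source map")
    return warnings


# Usage: build a map for two tiny files, then check it against them.
generated_js = "var a = 1;\n"
originals = {"a.ts": "let a = 1;\n", "b.ts": "let b = 2;\n"}
source_map = {
    "version": 3,
    "file": "combined.js",
    "sources": ["a.ts", "b.ts"],
    "x_ms_fileMurmur2Checksum": murmur2(generated_js.encode("utf-8")),
    "x_ms_sourcesMurmur2Checksum": [
        murmur2(originals[name].encode("utf-8")) for name in ["a.ts", "b.ts"]
    ],
}
print(find_mismatches(source_map, generated_js, originals))
```

Returning a list of warnings rather than raising matches the intent that the developer can "carry on regardless".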

Thoughts?

-Andy

* It's a bit of a complication, but our implementation in F12/VS needs to take a checksum of the generated file that excludes trailing comments at the end of the file. We need to do this mainly because some build systems add a signature at the end of JavaScript files as part of a signing process that happens after compilation. Thus a debugger that consumes a source map and uses this feature will need to be tolerant and compare a checksum that excludes any trailing comments in the file (inc. the sourceMappingURL comment).
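
A rough sketch of the tolerant comparison in the footnote, under simplifying assumptions: only comments that occupy whole lines at the end of the file are removed (a real implementation would need a JavaScript tokenizer), and SHA-1 from the standard library stands in for Murmur2 to keep the example short. Both function names are hypothetical.

```python
import hashlib


def strip_trailing_comments(js: str) -> str:
    """Drop blank lines and whole-line comments from the end of js.

    Simplification: handles '// ...' lines and single-line '/* ... */'
    comments only. Multi-line block comments are not recognised.
    """
    lines = js.splitlines(keepends=True)
    while lines:
        last = lines[-1].strip()
        is_line_comment = last.startswith("//")
        is_block_comment = (
            len(last) >= 4 and last.startswith("/*") and last.endswith("*/")
        )
        if last == "" or is_line_comment or is_block_comment:
            lines.pop()
        else:
            break
    # Trailing whitespace is removed too, so a stripped newline cannot
    # change the result.
    return "".join(lines).rstrip()


def tolerant_checksum(js: str) -> str:
    """Checksum of the file with trailing comments excluded (SHA-1 stand-in)."""
    return hashlib.sha1(strip_trailing_comments(js).encode("utf-8")).hexdigest()


# Usage: a signature appended after compilation does not change the checksum.
compiled = "var a = 1;\nfoo();\n"
signed = compiled + "//# sourceMappingURL=combined.js.map\n/* SIG: abc123 */\n"
print(tolerant_checksum(compiled) == tolerant_checksum(signed))
```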


John Lenz

Jun 9, 2014, 10:22:39 PM
to Andy Sterland, dev-js-s...@lists.mozilla.org, Ron Buckton
One concern I raised when this was discussed previously was whether or not
to ignore or normalize line endings when calculating the hash. Currently
source maps are line ending agnostic. And it would be good if it worked
for inlined scripts.

Ignoring ending comments is an interesting complication. I'm not sure what
I think about it.
> _______________________________________________
> dev-js-sourcemap mailing list
> dev-js-s...@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-js-sourcemap
>

Brian Slesinsky

Jun 9, 2014, 11:44:02 PM
to John Lenz, Andy Sterland, dev-js-s...@lists.mozilla.org, Ron Buckton
I wonder if it would make as much sense for the comment at the end of the
JavaScript file to have the checksum of its sourcemap? The debugger would
then use this as a key to look up the proper sourcemap in a
content-addressable database somehow.

This would get us away from using URL's which is convenient in some ways;
the sourcemap and source files can be stored in a private or public
repository and the debugger configured to look up sourcemaps there.

But I'm not sure which use cases you're thinking about. What does
"unrelated" mean when you say "a source map unrelated to either the source
or generated files"? Surely there's some relationship or it wouldn't work.

- Brian
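
A small sketch of the content-addressable lookup idea, with several assumptions: the store is an in-memory dictionary, the key is the SHA-1 of the source map bytes, and the comment name "sourceMapChecksum" is made up for illustration (no such comment has been proposed in this thread).

```python
import hashlib

# Hypothetical comment prefix, used only for this illustration.
PREFIX = "//# sourceMapChecksum="


class SourceMapStore:
    """In-memory content-addressable store keyed by SHA-1 of the map bytes."""

    def __init__(self):
        self._blobs = {}

    def put(self, source_map_bytes: bytes) -> str:
        key = hashlib.sha1(source_map_bytes).hexdigest()
        self._blobs[key] = source_map_bytes
        return key

    def get(self, key: str):
        return self._blobs.get(key)


def find_key(js: str):
    """Return the checksum from the last matching comment line, or None."""
    for line in reversed(js.splitlines()):
        if line.startswith(PREFIX):
            return line[len(PREFIX):].strip()
    return None


# Usage: the build stores the map and stamps its key into the generated file.
store = SourceMapStore()
key = store.put(b'{"version":3,"mappings":"AAAA"}')
generated = "a();\n" + PREFIX + key + "\n"
print(store.get(find_key(generated)) is not None)
```

Because the debugger only hands back the key it found, the store could just as well be a private repository or a file share; no URL is involved.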

Andy Sterland

Jun 10, 2014, 1:23:16 AM
to Brian Slesinsky, John Lenz, dev-js-s...@lists.mozilla.org, Ron Buckton
(Not sure if convention on the group is for inline commenting or top posting...)

Re: Dealing with normalized line endings
I’m not sure what to do in this case. Naively I was imagining that the checksum would be calculated without any extra transformation on the input to take into account line ending normalization. The checksum would take the input as a JavaScript string. In general I think the source map would either have to be explicit that a transformation happened or, by default, assume that none happened.

Ultimately a consumer of the checksum could compute it with both normalized and non-normalized line endings and, if either matches, run with it. That kind of defeats the purpose of specifying it, though more permissive clients would still get the same end result, as the chance of hash collisions would be so low. But perf would be a pain, as the combinations of all the ‘permissive’ states could quickly get out of hand ☺. So probably not a great idea.
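
To make the permissive approach concrete, here is a sketch that tries the text as received plus LF-only and CRLF-only variants. SHA-1 stands in for whichever hash is chosen, and the function names are hypothetical.

```python
import hashlib


def checksum(text: str) -> str:
    """Stand-in checksum (SHA-1 of the UTF-8 bytes)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def line_ending_variants(text: str) -> list:
    """The text as received, then with LF-only and CRLF-only line endings."""
    lf = text.replace("\r\n", "\n").replace("\r", "\n")
    crlf = lf.replace("\n", "\r\n")
    return [text, lf, crlf]


def matches_permissively(text: str, expected: str) -> bool:
    """True if any line-ending variant of text hashes to expected."""
    return any(checksum(v) == expected for v in line_ending_variants(text))


# Usage: the producer hashed LF endings, the server delivered CRLF.
expected = checksum("a();\nb();\n")
print(matches_permissively("a();\r\nb();\r\n", expected))
```

Each extra tolerance (trailing comments, trailing whitespace, and so on) multiplies the number of variants to hash, which is the perf concern raised above.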

Re: Checksum in comment
I think that ignoring end comments is something that F12 would do simply to be more permissive rather than something that the spec would even need to cover.

Re: Checksum in file
Having the checksum in the URL would help validate that a source map is for that file but I think we’d still need something to verify that the sources are for the source map. Though for our scenarios I think we still need to have something that is encapsulated in the source map. A solution that requires the developer to ship the comment at the end of the file would lock out developers who need to debug in cases where they can’t have comments in the file (say on a production server with comments stripped).

Re: Unrelated files
By unrelated I meant that a sourcemap chosen by a developer from the debugger may not be for the generated file or the source file. In the case of the generated file, an example of where it would be broken is if the developer picked the sourcemap for a different version of the generated file. In the case of source files, the most common broken, unrelated, use case is also likely to be mismatched versions. For example, if the developer updated the generated file on the server but not the source files, the relative paths might all resolve and files would be fetched, but the fetched source files wouldn’t match the sources that were actually compiled. In both cases, if the files don’t match it can be a very subtle issue for a developer to spot, thus the need to programmatically verify the symbols are for the right files.

Brian Slesinsky

Jun 10, 2014, 2:34:01 AM
to Andy Sterland, dev-js-s...@lists.mozilla.org, John Lenz, Ron Buckton
On Mon, Jun 9, 2014 at 10:23 PM, Andy Sterland <Andy.S...@microsoft.com>
wrote:

> (Not sure if convention on the group is for inline commenting or top
> posting...)
>

I don't think we have any particular convention yet? I'm not consistent
about it.

> Re: Checksum in file
>
> Having the checksum in the URL would help validate that a source map is
> for that file but I think we’d still need something to verify that the
> sources are for the source map.
>

To clarify, I was thinking of a checksum as an alternative to a URL, rather
than part of it. The debugger could ignore a given URL and generate its own
(perhaps to a private server) that includes the checksum, or perhaps the
debugger wouldn't even use HTTP(S) to fetch the sourcemap. But if the
debugger does use the given URL, it would probably make sense to pass the
checksum as well, perhaps as a query parameter or HTTP header. (Since the
debugger doesn't actually need to do any checksum calculation but just
hands back what it was given, it's actually more of an opaque token in this
scenario.)
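
As an illustration of passing the checksum along with the given URL, the sketch below appends it as a query parameter. The parameter name "checksum" and the example.com URL are placeholders; nothing in this thread fixes either.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_checksum(url: str, token: str, param: str = "checksum") -> str:
    """Return url with the opaque token appended as a query parameter."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((param, token))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Usage: the debugger hands back the token it found, without recomputing it.
print(with_checksum("https://example.com/maps/app.js.map?v=2", "3da5"))
```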

On the other hand, it would also make sense to use the checksum for its
original purpose (detecting corrupted data). I think we might want to
checksum just the JavaScript lines that are actually included in the
"mappings" field, not including trailing blanks or line endings, and
perhaps use a different line separator like a pipe when computing the
checksum. Having the actual checksum as well at the end of the file might
be useful since the debugger could detect file corruption without even
doing a sourcemap lookup. It also makes it easier to detect
interoperability bugs where the compiler and debugger disagree about how to
calculate a checksum.
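
One possible reading of this suggestion, sketched under assumptions: the lines "included in the mappings field" are taken to be the first N generated lines, where N is the number of ';'-separated groups in "mappings"; trailing whitespace is trimmed per line; and SHA-1 is used as the hash. The function names are hypothetical.

```python
import hashlib
import re

# JavaScript line terminators: CRLF, LF, CR, U+2028, U+2029.
_JS_LINE_BREAK = re.compile(r"\r\n|[\n\r\u2028\u2029]")


def mapped_line_count(mappings: str) -> int:
    """Each ';' in "mappings" ends one generated line."""
    return mappings.count(";") + 1 if mappings else 0


def mapped_lines_checksum(js: str, mappings: str) -> str:
    """Hash only the mapped lines, trimmed and joined with a pipe."""
    lines = _JS_LINE_BREAK.split(js)[:mapped_line_count(mappings)]
    canonical = "|".join(line.rstrip() for line in lines)
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


# Usage: different line endings and different footers, same checksum.
mappings = "AAAA;AACA"
windows = "a();  \r\nb();\r\n//# sourceMappingURL=x.map\r\n"
unix = "a();\nb();\n/* signature */\n"
print(mapped_lines_checksum(windows, mappings) == mapped_lines_checksum(unix, mappings))
```

One caveat with a pipe separator is that '|' can also appear inside a JavaScript line, so two different line splits can produce the same canonical text.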

I didn't comment on the checksums for the source files because I'm okay
with that part. (The checksum points in the same direction as the URL, so
it would already serve as an alternative lookup mechanism.)

> Though for our scenarios I think we still need to have something that is
> encapsulated in the source map. A solution that requires the developer to
> ship the comment at the end of the file would lock out developers who need
> to debug in cases where they can’t have comments in the file (say on a
> production server with comments stripped).
>

I don't understand this use case yet. If they strip comments, that changes
both the sourcemap and the checksum, right? It seems like a JavaScript
minimizer has to generate a new sourcemap anyway, in which case it could
also calculate a new checksum and add a new comment.

Also, putting in a checksum doesn't reveal any information that you don't
already have (unless there is file corruption), so it seems safe to leave
it in?


>
> Re: Unrelated files
>
> By unrelated I meant that an sourcemap chosen by a developer from the
> debugger may not be for the generated file or the source file. In the case
> of the generated file an example of where it would be broken is if the
> developer picked the sourcemap for a different version of the generated
> file. In the case of source files the most common broken, unrelated, use
> case is also likely to be mismatched versions. For example if the developer
> updated the generated file on the server but not the source files the
> relative paths might all resolve and files would be fetched but the source
> files wouldn’t match the actual source files. In both cases if the files
> don’t match it can be a very subtle issue for a developer to spot thus the
> need to programmatically verify the symbols are for the right files.
>

Okay, I misunderstood that part. I was thinking more about automatically
finding the right file rather than asking the developer to do it and
detecting mismatches.

- Brian

Conrad Irwin

Jun 10, 2014, 3:21:57 AM
to Brian Slesinsky, John Lenz, Andy Sterland, dev-js-s...@lists.mozilla.org, Ron Buckton
I like this idea in general, but am ambivalent about the proposal in
its current form. At Bugsnag we also want a mechanism for identifying
which source-maps go with which minified code, without the developer
making the source-map public.

The problem is equivalent to the problem that iPhone developers have
for debug symbols in iPhone apps. The app (equivalent to the minified
javascript) has a dSYM (equivalent to the sourcemap) with all the
debug information. To tell whether they are related, they both contain
a UUID. If the UUIDs match, you know you are looking at the right
thing. The nice thing about Apple's approach is that I don't have to
know how they implement the UUID (it turns out it's an MD5 of the text
section of the binary, so I could calculate it if I wanted), all I
have to do is compare two strings.

Is there anywhere in the minified javascript you could put the hash/uuid?

Ideally a comment like // @sourceMappingUUID= that post-processing
steps could be instructed to preserve. (There's no downside to
publishing a hash/uuid of the code). This has the doubly-nice property
that a developer can use the search feature of their laptop's
hard-drive to find the source-map given the dSYM.

I fail to see the need for also hashing the sources; I guess
that's a separate problem where you want to be able to recover from
the build tools inserting a bogus sourceRoot?

Conrad
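
A sketch of the "compare two strings" check. The comment form is the one suggested above; the source map field name "x_uuid" is an assumption made for the example, since no field was proposed.

```python
import re

# Matches the suggested comment form: // @sourceMappingUUID=<value>
_UUID_COMMENT = re.compile(r"^//\s*@sourceMappingUUID=(\S+)\s*$", re.M)


def uuid_from_js(js: str):
    """Return the identifier from the last matching comment, or None."""
    matches = _UUID_COMMENT.findall(js)
    return matches[-1] if matches else None


def belongs_together(js: str, source_map: dict, field: str = "x_uuid") -> bool:
    """True if the file and the map carry the same identifier.

    The identifier is treated as opaque: the consumer never needs to know
    how it was generated. "x_uuid" is a hypothetical field name.
    """
    found = uuid_from_js(js)
    return found is not None and found == source_map.get(field)


# Usage
minified = "a();\n// @sourceMappingUUID=db99a770-f281-11e3-ac10-0800200c9a66\n"
source_map = {"version": 3, "x_uuid": "db99a770-f281-11e3-ac10-0800200c9a66"}
print(belongs_together(minified, source_map))
```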

Fitzgerald, Nick

Jun 10, 2014, 12:12:10 PM
to dev-js-s...@lists.mozilla.org
On 6/9/14, 11:34 PM, Brian Slesinsky wrote:
> Re: Checksum in file
>> Having the checksum in the URL would help validate that a source map is
>> for that file but I think we’d still need something to verify that the
>> sources are for the source map.
>>
> To clarify, I was thinking of a checksum as an alternative to a URL, rather
> than part of it. The debugger could ignore a given URL and generate its own
> (perhaps to a private server) that includes the checksum, or perhaps the
> debugger wouldn't even use HTTP(S) to fetch the sourcemap. But if the
> debugger does use the given URL, it would probably make sense to pass the
> checksum as well, perhaps as a query parameter or HTTP header. (Since the
> debugger doesn't actually need to do any checksum calculation but just
> hands back what it was given, it's actually more of an opaque token in this
> scenario.)

This use case seems mostly beneficial to internal tools (such as
deobfuscating client-side error stacks on the server), and I don't feel
that it really needs to be mentioned and formalized in the source map
spec the way the `//# sourceMappingURL` comment is.

For the use case where you want to give the user an option to supply
their own source map but want to warn them if it's the wrong one, having
the hashes in the source map itself is enough.

Brian Slesinsky

Jun 10, 2014, 12:32:30 PM
to fit...@mozilla.com, dev-js-s...@lists.mozilla.org
It seems like we might have consensus that there should be a standard way
of computing a checksum on a JavaScript file.

We should probably have a standard "linecount" field (similar to
x_google_linecount) that should be used to decide which JavaScript lines to
include in the checksum. Then a tool can append arbitrary content to the
end of the JavaScript file without changing the sourcemap or the checksum.

This is off-topic, but it would be nice to give tools the chance to prepend
a JavaScript header as well. Perhaps have a "// begin sourcemap URL"
comment in the JavaScript that says where the mapped content begins. But
there's no backwards-compatible way to do this like there is with the
footer.



John Lenz

Jun 11, 2014, 2:14:08 PM
to Brian Slesinsky, dev-js-s...@lists.mozilla.org, fit...@mozilla.com
Let's start a separate thread for prepending. The index map helps here.

Andy Sterland

Jun 12, 2014, 6:09:44 PM
to Conrad Irwin, Brian Slesinsky, John Lenz, dev-js-s...@lists.mozilla.org, Ron Buckton
I think the main reason I was leaning away from the embedded identifier approach is that one of the scenarios where the feature needs to work is when the generated JS file has no source map comments. That enables developers debugging production sites that strip comments to use source maps still. Take for example jQuery which now removes the sourceMappingURL comment from the minified jQuery file.

Perhaps the best way forward would be to support a UUID embedded in the generated file for the common case of having source mapping comments in the generated file, and still keep the optional checksums for source files, as I don't think modifying sources to embed a UUID is going to be feasible. On the plus side a UUID takes the work off of the compiler, which is great as the perf impact of the proposal on compilers is large (if they want to support it).

Thoughts?


Fwiw Microsoft's PDB symbol format makes use of IDs* to match a PDB with the native image (DLL/EXE/etc.) and uses a checksum in the PDB to match source files. If those IDs don't match then tools like VS refuse to load the symbols and tell the developer to go find a new PDB. If the checksums for the sources don't match, tools like VS warn the user of the mismatch but let the developer carry on with the mismatched sources.

* The ID is a combination of the PDB name, a GUID and a version number.
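
If a source map consumer wanted to mirror the PDB behaviour described above, the policy could be sketched as follows. This is only an analogy: the names are hypothetical, and the original proposal suggests warning rather than refusing when the generated file does not match.

```python
from enum import Enum


class Decision(Enum):
    LOAD = "load"
    WARN = "warn"
    REFUSE = "refuse"


def decide(file_id_matches: bool, source_checksums_match: bool) -> Decision:
    """PDB-style policy: an ID mismatch is fatal, a source mismatch is a warning."""
    if not file_id_matches:
        return Decision.REFUSE
    if not source_checksums_match:
        return Decision.WARN
    return Decision.LOAD


# Usage: the map matches the generated file, but one source has changed.
print(decide(file_id_matches=True, source_checksums_match=False))
```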

Brian Slesinsky

Jun 12, 2014, 6:46:56 PM
to Andy Sterland, Conrad Irwin, John Lenz, dev-js-s...@lists.mozilla.org, Ron Buckton
I think that sounds pretty reasonable. Agreed that we're not going to put
UUID's in source files, so a checksum makes more sense.

For matching the JavaScript to the sourcemap, I don't think we'd want to
require an official UUID, but rather any unique identifier generated in a
way that mismatches are unlikely. (For a build system that requires its
output to be a deterministic function of its input, we would want to use a
checksum of some sort.)

Perhaps put the scheme in front:

//# sourceMapId=sha1:3da541559918a808c2402bba5012f6c60b27661c

//# sourceMapId=uuid:db99a770-f281-11e3-ac10-0800200c9a66

This looks suspiciously like a URN so maybe we could just use that:
http://en.wikipedia.org/wiki/Uniform_resource_name
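
A sketch of how a consumer might read the scheme-prefixed comment shown above. It accepts only the two schemes from the examples and validates each value by shape; the function name and the decision to reject unknown schemes are assumptions.

```python
import re

PREFIX = "//# sourceMapId="

# Expected value shape per scheme, limited to the two examples above.
_VALUE_PATTERNS = {
    "sha1": re.compile(r"[0-9a-f]{40}"),
    "uuid": re.compile(
        r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
    ),
}


def parse_source_map_id(js: str):
    """Return (scheme, value) from the last sourceMapId comment, or None."""
    for line in reversed(js.splitlines()):
        line = line.strip()
        if not line.startswith(PREFIX):
            continue
        scheme, sep, value = line[len(PREFIX):].partition(":")
        value = value.lower()
        pattern = _VALUE_PATTERNS.get(scheme)
        if sep and pattern is not None and pattern.fullmatch(value):
            return scheme, value
        return None  # malformed value or unknown scheme
    return None


# Usage
js = "a();\n//# sourceMapId=sha1:3da541559918a808c2402bba5012f6c60b27661c\n"
print(parse_source_map_id(js))
```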

On Thu, Jun 12, 2014 at 3:09 PM, Andy Sterland <Andy.S...@microsoft.com>
wrote:

> I think the main reason I was leaning away from the embedded identifier
> approach is that one of the scenarios where the feature needs to work is
> when the generated JS file has no source map comments. That enables
> developers debugging production sites that strip comments to use source
> maps still. Take for example jQuery which now removes the sourceMappingURL
> comment from the minified jQuery file.
>

The jQuery file is here:
http://code.jquery.com/jquery-1.11.1.min.js

They have a comment at the top, so it seems they're not opposed to all
comments?

In the release notes [1], they say: "If you want to use the map file for
debugging the minified code, copy the minified file and add a //#
sourceMappingURL comment to the end of the file."

So they're aware of the issue and they could have kept the sourceMappingURL
comment if they wanted to, but decided it was better to make the developer
do it.

We would have to ask them why they took it out, but I would guess that one
issue is that it's unclear where the URL should point to avoid a broken
link. (Absolute versus relative URL, and so on.) This wouldn't be an issue
with a URN, so maybe they'd be fine with leaving it in?

- Brian

[1] http://blog.jquery.com/2014/05/01/jquery-1-11-1-and-2-1-1-released/

Ron Buckton

Jun 12, 2014, 6:53:51 PM
to Brian Slesinsky, Andy Sterland, Conrad Irwin, John Lenz, dev-js-s...@lists.mozilla.org
Although, if this were to be a URN it would be: urn:uuid:db99a770-f281-11e3-ac10-0800200c9a66 as per RFC 4122.

RFC 4122 also specifies algorithms for generating a UUID from a string using either SHA1 or MD5, which means it could fill the same purpose as your first example below.

Ron
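
For illustration, Python's standard uuid module implements the RFC 4122 name-based algorithms (uuid3 uses MD5, uuid5 uses SHA-1) and can print the URN form. The namespace below is an arbitrary placeholder; a real scheme would have to agree on one, and content_urn is a hypothetical helper.

```python
import uuid

# Placeholder namespace, derived from a made-up URL for this example only.
SOURCE_MAP_NAMESPACE = uuid.uuid5(
    uuid.NAMESPACE_URL, "https://example.com/source-map-id"
)


def content_urn(generated_js: str) -> str:
    """Deterministic version 5 (SHA-1) UUID for the file contents, as a URN."""
    return uuid.uuid5(SOURCE_MAP_NAMESPACE, generated_js).urn


# Usage: the same content always yields the same identifier.
print(content_urn("a();\n"))
print(uuid.UUID("db99a770-f281-11e3-ac10-0800200c9a66").urn)
```

A deterministic identifier like this would also suit a build system that requires its output to be a pure function of its input.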

John Lenz

Jun 13, 2014, 11:05:47 AM
to Brian Slesinsky, Conrad Irwin, Andy Sterland, dev-js-s...@lists.mozilla.org, Ron Buckton
URN for this sounds pretty good to me, but if we wanted a "normalized line
ending md5", how do we define it?

