Symbolic resolution for locals using source maps

Andy Sterland

Sep 20, 2014, 1:16:17 PM
to dev-js-s...@lists.mozilla.org, Ron Buckton
Hi all,

One of the things that we (well, mostly Ron) have been working on is an extension to the source map spec to enable debuggers to resolve identifiers that have been renamed during compilation. In TypeScript the classic example is the this pointer, which in TS is different from the JS this pointer. Of course, renaming identifiers is also one of the primary functions of a minifier.

I've uploaded a gist (https://gist.github.com/asterland/edf028ed7947c8c258d1) with details of the extension we've implemented, and it would be great to hear what folk think, both of the specific extension and of the general idea.

I'm happy to share the modified TypeScript compiler and private F12 bits if you're curious about trying out the proposal. Just email me.

-Andy

Brian Slesinsky

Sep 21, 2014, 2:00:21 AM
to Andy Sterland, dev-js-s...@lists.mozilla.org, Ron Buckton
Interesting. It seems like we could save space for a commonly-used global
symbol by specifying its mapping once and saying it applies over the entire
file, rather than doing it explicitly for each usage of the identifier as
the spec supports now.

But what about fields? For example, suppose you have two unrelated pair
types:

class Pair1 {
    int x;
    int y;
}

class Pair2 {
    int x;
    int y;
}

Suppose that for Pair1, the minifier chooses x=a, y=b and for Pair2, it
chooses x=b, y=a.

The minified code could look like

p1.a = p2.b;
p1.b = p2.a;

So I don't think scopes work for this and we would still have to explicitly
map each occurrence of a field.
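
A minimal sketch of the ambiguity, assuming a rename table keyed only by the minified name. The table shapes (`pair1Fields`, `pair2Fields`, `scopeWide`) are made up for illustration and are not part of the proposal.

```javascript
// Hypothetical per-type rename tables for the example above (minified -> source).
const pair1Fields = { a: "x", b: "y" };
const pair2Fields = { a: "y", b: "x" };

// A table that applies to a whole text range is keyed only by the minified
// name, so merging the two type tables silently drops one of the conflicting entries.
const scopeWide = { ...pair1Fields, ...pair2Fields };

// Resolving `p1.a` needs to know that p1 is a Pair1; the name "a" alone is ambiguous.
function resolveField(minifiedName, fieldsOfType) {
  return fieldsOfType[minifiedName];
}

console.log(resolveField("a", pair1Fields)); // x
console.log(resolveField("a", pair2Fields)); // y
console.log(scopeWide.a);                    // y (the Pair1 entry was overwritten)
```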

- Brian

Andy Sterland

Sep 22, 2014, 12:22:42 PM
to Brian Slesinsky, dev-js-s...@lists.mozilla.org, Ron Buckton
Good catch. I forgot to mention the limitations of the design. Sorry!

Just pondering your suggestion and want to make sure I’ve understood it ☺. I think the idea is to have another entry containing a list of all global names. Then, when resolving a symbol, the debugger would first check the scopes list, since a global could be overridden in a local scope (eugh!), and then check the global list?

As you noted, the big limitation is that fields can't be resolved. We had originally pushed that out to simplify the problem and tackle a subset that catches a lot of source-mapped scenarios. In the long run it would be great to have a format that covers fields. Any thoughts on how best to embed type information or similar to enable resolving fields?

The other big open limitation is the lack of expression evaluation. Expression evaluation would enable a developer to enter an expression in the source language and have it execute and inspect correctly. We had done some thinking here based on how Visual Studio works, but it's probably a whole other conversation. On the plus side, it would build on symbol resolution. It's another problem that would be great to solve, but likely after the field issue.



Brian Slesinsky

Sep 22, 2014, 3:06:37 PM
to Andy Sterland, dev-js-s...@lists.mozilla.org, Ron Buckton
On Mon, Sep 22, 2014 at 9:22 AM, Andy Sterland <Andy.S...@microsoft.com>
wrote:

> Good catch. I forgot to mention the limitations of the design. Sorry!
>
> Just pondering about your suggestion and want to make sure I’ve understood
> it ☺. I think the idea is to have another entry with a list of all global
> names in. Then when it comes to resolve symbols first check the scopes
> list, as a global could be overridden in a local scope (eugh!), and then to
> check the global list?
>

Yes, but I was thinking that this is implicit in your design. That is, a
scope is just a range in the JavaScript file for which a JavaScript->source
language symbol mapping applies, so a global scope is just a special case
of that, where the range is the entire file. Or do you expect there to be
more semantics around sourcemap scopes that requires them to actually
correspond to language-level scopes? If so, what can a debugger infer from
them?

> As you notified the big limitation is that fields can't be resolved. We had
> originally pushed that out to simplify the problem and tackle a subset that
> catches a lot of source mapped scenarios. In the long run it would be great
> to have a format that covers fields. Any thoughts on how best to embed type
> information or such to enable resolving of fields?
>

Well I think the spec already covers that, if you're willing to pay the
overhead. The idea is that you split sourcemap ranges at the beginning and
end of each JavaScript identifier, and then set the name for that range.
This is fully general since it can be done individually for each usage of a
JavaScript identifier. Of course this doesn't give you any semantics; it's
purely about translating symbol names. It's also somewhat expensive with
regards to sourcemap size, since otherwise you can usually get away with
one sourcemap range per source line. (I reduced the size of GWT sourcemaps
by 70% by not doing any symbol mapping at all and consolidating sourcemap
ranges, and it would be nice if we didn't have to undo this to get symbol
mappings back again. That's why I'm interested in more efficient ways to
map globals and some locals.)
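
A rough sketch of the per-identifier approach, assuming the mappings for one generated line have already been decoded into segments. The `segments` shape and `originalNameAt` are hypothetical, and the line is the minified `p1.a = p2.b;` from earlier in the thread (Pair1 uses x=a, Pair2 uses x=b).

```javascript
// One generated line, with a segment starting at the beginning and end of each
// renamed identifier. Only the segments covering an identifier carry a name.
const line = "p1.a = p2.b;";
const segments = [
  { col: 0 },
  { col: 3, name: "x" },  // the `a` in p1.a
  { col: 4 },
  { col: 10, name: "x" }, // the `b` in p2.b
  { col: 11 },
];

// Find the segment that covers a generated column and return its source name, if any.
function originalNameAt(segments, col) {
  let found;
  for (const seg of segments) {
    if (seg.col > col) break;
    found = seg;
  }
  return found ? found.name : undefined;
}

console.log(originalNameAt(segments, 3));  // x
console.log(originalNameAt(segments, 10)); // x
console.log(originalNameAt(segments, 5));  // undefined
```

Five segments for one short line is the overhead being described: without symbol mapping, a single segment per source line would usually do.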

Incidentally, I've been experimenting with an alternate way of doing symbol
mapping for JavaScript objects. The idea is that a JavaScript object can
have a special getter (called "$java" for GWT, but perhaps it could be
something generic like "$debug") that returns another JavaScript object
with an alternate view of the object that's more user-friendly. The
debugger then takes this object and displays it in the variable inspector
in the usual way, perhaps with a toggle to switch between JavaScript and
"pretty" variable views. This would allow the compiler to generate code
that does whatever translation makes sense in the source language; for
example we can just add the special getter to the java.lang.Object
prototype and all GWT objects pick it up.
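
A minimal sketch of what such a getter could look like in the generated JavaScript. The `$fields` table and the `Pair1` stand-in class are assumptions made for illustration; "$debug" is only the generic name floated above, and no debugger is known to look for it.

```javascript
// Stand-in for a compiled class whose field names were minified.
function Pair1(x, y) {
  this.a = x;
  this.b = y;
}

// Hypothetical debug-mode table emitted by the compiler: minified -> source name.
Pair1.prototype.$fields = { a: "x", b: "y" };

// The special getter, installed once on a shared prototype so every instance picks it up.
Object.defineProperty(Pair1.prototype, "$debug", {
  get() {
    const view = {};
    for (const [minified, source] of Object.entries(this.$fields)) {
      view[source] = this[minified];
    }
    return view;
  },
});

const p = new Pair1(1, 2);
console.log(p.$debug); // { x: 1, y: 2 }
```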

This isn't formally related to the sourcemap spec at all; it would be a
convention in the generated JavaScript when compiled in debug mode. Chrome
lets you evaluate JavaScript getters in the variable inspector, so it sorta
works without debugger changes, though the UI is confusing.

- Brian

Fitzgerald, Nick

Sep 22, 2014, 4:22:34 PM
to dev-js-s...@lists.mozilla.org
Awesome!

This is exactly the type of problem I was talking about solving in
http://fitzgeraldnick.com/weblog/55/

From what I've seen, the JS code generated by the TypeScript compiler
is pretty similar to the original TypeScript source. Because of that, it
may be tempting to take shortcuts that gloss over some details that
other compilers doing more extreme transformations (such as emscripten)
will have to deal with, but TypeScript doesn't. At first glance, I don't
see anything jumping out at me other than the object fields that Brian
brought up, but this is something we need to be vigilant about.

Let's definitely do this, but let's make sure we do this Right(tm) :)

Still going over this, but I have some questions:

* Why repeat starting line/column in the scope when that data is almost
assuredly already encoded in an existing mapping? Instead, can this
reference a start mapping (via an index into the mappings) on which the
scope begins?

* This isn't spelled out explicitly, but it seems that the locals data
is parallel to the scopes data, right? As in, the first scope's locals
are in the first segment in x_ms_locals, and the second scope's locals
are in the second segment of x_ms_locals, etc... One thing we didn't
originally do, but would be incredibly valuable moving forward would be
to specify algorithms for using the data, rather than just describing
the format of the encoded data. Would love to see the algorithm for this
here.

* In x_ms_locals, the second field is "a base 64 VLQ relative to the
first field in this segment." Why relative to the first field? Why not
relative to the last value? It seems like the latter would save more
space, but maybe I'm wrong?

* In x_ms_locals, the second field "is an identifier or expression for
the scope". Do you mean it is an identifier or expression to evaluate to
get the value of the given local binding? The wording would lead me to
believe that it describes the name of the scope, but this seems like the
wrong place for that. Which leads me to

* Is there a way to name a scope? Give it the
function/method/object/module/whatever name? This seems like something
that should be supported, if not now then in the future, which leads me to

* How do you imagine future extensions interacting with this? What if we
want to extend the locals data with a way to specify how the debugger
should display the given value (for example, the value is a mori[0]
immutable map, but the debugger should display its key-value pairs
directly instead of its internal representation)? It seems to me that
object field renaming is a simple version of this problem. It's ok not
to solve every problem now, but the format must be future extensible to
accommodate when we start solving those other problems.

[0] https://swannodette.github.io/mori/

General thoughts:

* I like how you encode the scopes as an implicit tree, the same way
DWARF does with its DIEs[1]. I think this is definitely the way to go.

[1]
http://eli.thegreenplace.net/2011/09/29/an-interesting-tree-serialization-algorithm-from-dwarf

* I like using arbitrary JS expressions to get the value bound to a
local. A compiler is free to do whatever transformations and
optimizations it wants as long as the semantics remain the same; this
includes debugging-hostile things like exploding a struct's fields into
individual registers so there is no single place where the struct
exists. To deal with this kind of thing, DWARF uses its own location
operations and language that is Turing complete (and I hear from ex-GDB
maintainers that a fuzzer would likely knock down GDB's implementation
in a few minutes, but I digress). LDB[2] uses PostScript generated by
compilers to deal with this, re-target-ability, and multi-language
support. It's a hairy problem that pretty much requires evaluating
arbitrary code from the compiler. We already have a battle-hardened,
thoroughly-fuzzed, and sandboxed language that is implemented by all
browser vendors: JS. Big plus one for providing JS expressions to
evaluate and get a binding's value.

* Given that last point, I think it would make sense to also have a JS
prologue that would give compilers a chance to define common functions
that are re-used in the individual expressions used to get a binding's
value. This is going to cut down on space a bunch.
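
A sketch of how a prologue could be shared across binding expressions. Everything here is hypothetical: the `prologue` string, the `makeView` helper, and `evaluateBinding`, which stands in for the debugger's evaluate-in-frame by passing frame locals as function parameters.

```javascript
// Hypothetical prologue a compiler might ship once per source map.
const prologue = `
  function makeView(names, values) {
    var view = {};
    for (var i = 0; i < names.length; i++) view[names[i]] = values[i];
    return view;
  }
`;

// Hypothetical binding expressions; each reuses the helper instead of repeating the loop.
const bindingExpressions = {
  point: 'makeView(["x", "y"], [a, b])',
  size: 'makeView(["width", "height"], [c, d])',
};

// Stand-in for evaluation in the paused frame: locals become parameters.
function evaluateBinding(expression, frameLocals) {
  const localNames = Object.keys(frameLocals);
  const run = new Function(...localNames, `${prologue}\nreturn (${expression});`);
  return run(...localNames.map((name) => frameLocals[name]));
}

const frame = { a: 1, b: 2, c: 30, d: 40 };
console.log(evaluateBinding(bindingExpressions.point, frame)); // { x: 1, y: 2 }
console.log(evaluateBinding(bindingExpressions.size, frame));  // { width: 30, height: 40 }
```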

Looking forward to your thoughts,

Nick

On 9/20/14, 10:16 AM, Andy Sterland wrote:
> Hi all,
>
> One of the things that we (well mostly Ron) have been working on is an extension to the source map spec to enable debuggers to resolve identifiers that have been renamed during compilation. In TypeScript the classic of example is the this pointer which in TS is different from the JS this pointer. Of course renaming identifiers is also one of the primary functions of a minifier.
>
> I've uploaded a gist (https://gist.github.com/asterland/edf028ed7947c8c258d1) with details of the extension we've implemented and it would be great to hear what folk think. Both of the specific extension and the general idea.
>
> I'm happy to share the modified TypeScript compiler and private F12 bits if you're curious about trying out the proposal. Just email me.
>
> -Andy
> _______________________________________________
> dev-js-sourcemap mailing list
> dev-js-s...@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-js-sourcemap

Fitzgerald, Nick

Sep 22, 2014, 4:50:26 PM
to dev-js-s...@lists.mozilla.org
On 9/22/14, 9:22 AM, Andy Sterland wrote:
> As you notified the big limitation is that fields can't be resolved.

What use case do we need field resolution for?

It seems to me that the expression that gets the binding's value could
just do the translation itself and return a "display" value rather than
the "internal" value. A struct might not even have a single "actual"
value in its compiled form and instead have its fields in local values
on the stack[*] or in a typed array for performance. The JS expression
that gave you the value of the struct's binding would just create a JS
object with the original fields whose values would come from those
variables.

Take this example code:

struct Foo {
    int bar;
    int baz;
}

int addFoo() {
    Foo f = { 1, 2 };
    return f.bar + f.baz;
}

It would be completely valid for that example to be compiled to this:

function addFoo() {
    var r, z;
    r = 1;
    z = 2;
    return r + z;
}

And the JS expression in the source map could be something equivalent to:

(function () {
    return { bar: r, baz: z };
}());

This approach could easily handle property name mangling on objects, and
also returning a plain old JS object instead of a mori Map so you see
the key-value pairs rather than compiler representation, as well.
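
A small runnable sketch of the idea, using a direct `eval` inside the function as a stand-in for a debugger evaluating the binding expression in the paused frame. The `onPause` callback and `bindingForF` are illustrative names, not part of any proposal.

```javascript
// The binding expression a source map might carry for the source-level local `f`.
const bindingForF = "({ bar: r, baz: z })";

function addFoo(onPause) {
  var r, z;
  r = 1;
  z = 2;
  // Stand-in for a breakpoint: a direct eval here can see the frame's locals,
  // which is roughly what a debugger's evaluate-in-frame provides.
  onPause(eval(bindingForF));
  return r + z;
}

let seen;
const result = addFoo((f) => { seen = f; });
console.log(result, seen); // 3 { bar: 1, baz: 2 }
```

The struct never exists as a single value in the compiled code; the display value is built on demand from `r` and `z`.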

The only exception that I can think of where you would *really* need
property resolution is a break-on-modification/watchpoint style feature.
Definitely nice to have in the future, but as others have said, maybe
something that we can wait on?

[*] Of course JS-the-language doesn't really have stack variables, but
the JS compilers will optimize them into stack and register variables
when possible.

Ron Buckton

Sep 22, 2014, 7:31:12 PM
to Brian Slesinsky, Andy Sterland, dev-js-s...@lists.mozilla.org


> -----Original Message-----
> From: Brian Slesinsky [mailto:skyb...@google.com]
> Sent: Monday, September 22, 2014 12:07 PM
> To: Andy Sterland
> Cc: dev-js-s...@lists.mozilla.org; Ron Buckton
> Subject: Re: Symbolic resolution for locals using source maps
>
>
> > On Mon, Sep 22, 2014 at 9:22 AM, Andy Sterland
> > <Andy.S...@microsoft.com <mailto:Andy.S...@microsoft.com> >
> > wrote:
> >
> >
> > Good catch. I forgot to mention the limitations of the design. Sorry!
> >
> >
> >
> > Just pondering about your suggestion and want to make sure I’ve
> > understood it :). I think the idea is to have another entry with a list of all
> > global names in. Then when it comes to resolve symbols first check the
> > scopes list, as a global could be overridden in a local scope (eugh!), and then
> > to check the global list?
> >
>
> Yes, but I was thinking that this is implicit in your design. That is, a scope is
> just a range in the JavaScript file for which a JavaScript->source language
> symbol mapping applies, so a global scope is just a special case of that, where
> the range is the entire file. Or do you expect there to be more semantics
> around sourcemap scopes that requires them to actually correspond to
> language-level scopes? If so, what can a debugger infer from them?

As currently designed, the scopes in the extension specify the range for the language-level scope in the generated output file. If you wanted to resolve the underlying value for a symbol in the source file in a data-tip on mouse over, you would:

1. Determine the text span for the identifier in the source
2. Map the text span to the correct location in the generated output using the "mappings" field
3. Use the mapped range to find the enclosing scope in the generated output.
4. That in turn would let you find any possible symbol mappings in that scope and its ancestors to determine if the symbol under the cursor has been renamed.

Also, given enough context from a debugger, the same information could be used to map locals in the current scope, as well as to hide generated locals that only exist in the output. If the text range of the scope doesn't accurately match up with what the underlying debugger might conclude is the text range of the current scope, it would be difficult to properly resolve renamed symbols for locals.
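
Steps 3 and 4 above can be sketched as follows, assuming the "scopes" and "locals" data have already been decoded into a flat list. The decoded shape (`start`, `end`, `parent`, `locals`) is hypothetical and is not the encoding from the gist; steps 1 and 2 are taken as already done, so `pos` is a position in the generated file.

```javascript
// Hypothetical decoded scopes. `parent` is an index into this array, and
// `locals` maps a source name to its generated name or expression.
const scopes = [
  { parent: null, start: { line: 0, column: 0 }, end: { line: 20, column: 0 }, locals: {} },
  { parent: 0, start: { line: 2, column: 4 }, end: { line: 10, column: 5 }, locals: { "this": "_this" } },
  { parent: 1, start: { line: 4, column: 8 }, end: { line: 8, column: 9 }, locals: {} },
];

function comparePos(a, b) {
  return a.line - b.line || a.column - b.column;
}

function contains(scope, pos) {
  return comparePos(scope.start, pos) <= 0 && comparePos(pos, scope.end) < 0;
}

// Step 3: the innermost scope whose generated range contains the position.
function enclosingScope(scopes, pos) {
  let best = null;
  for (const s of scopes) {
    if (contains(s, pos) && (best === null || comparePos(best.start, s.start) <= 0)) best = s;
  }
  return best;
}

// Step 4: walk that scope and its ancestors looking for a rename.
function resolveSymbol(scopes, pos, sourceName) {
  let s = enclosingScope(scopes, pos);
  while (s) {
    if (Object.prototype.hasOwnProperty.call(s.locals, sourceName)) return s.locals[sourceName];
    s = s.parent === null ? null : scopes[s.parent];
  }
  return sourceName; // not renamed in any enclosing scope
}

console.log(resolveSymbol(scopes, { line: 5, column: 10 }, "this")); // _this
console.log(resolveSymbol(scopes, { line: 15, column: 0 }, "this")); // this
```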

> > As you notified the big limitation is that fields can't be resolved. We
> > had originally pushed that out to simplify the problem and tackle a subset
> > that catches a lot of source mapped scenarios. In the long run it would be
> > great to have a format that covers fields. Any thoughts on how best to
> > embed type information or such to enable resolving of fields?
> >
>
> Well I think the spec already covers that, if you're willing to pay the
> overhead. The idea is that you split sourcemap ranges at the beginning and
> end of each JavaScript identifier, and then set the name for that range. This is
> fully general since it can be done individually for each usage of JavaScript
> identifier. Of course this doesn't give you any semantics; it's purely about
> translating symbol names. It's also somewhat expensive with regards to
> sourcemap size, since otherwise you can usually get away with one
> sourcemap range per source line. (I reduced the size of GWT sourcemaps by
> 70% by not doing any symbol mapping at all and consolidating sourcemap
> ranges, and it would be nice if we didn't have to undo this to get symbol
> mappings back again. That's why I'm interested in more efficient ways to map
> globals and some locals.)

One option might be to include information about tokens that indicate qualified identifiers or property access, and add additional information about renamed symbols, e.g.:

```
X.longSymbolName -> X.a
```

Becomes something like (conceptually):

```
MemberAccessToken: "."
...
MemberAccess(X, longSymbolName) -> MemberAccess(X, a)
```

Alternatively, you could embed that information in the local mapping itself:

```
"names": [..., "X.longSymbolName", "X.a"],
...
"locals": "AC"
```

Here you would rely on the debugger/data tip parser having enough knowledge of the source language to accurately determine the extent/operands of the member bind operator under the cursor to be able to look up a possible match. This has the downside of further increasing the size of the source map for any renamed fields that are constantly remapped. This could be mitigated by storing symbol mapping information for the *shape* of the object, e.g.:

```
// ...inefficient representation...
"scopes": "...",
"shapes": [{ "longSymbolName": "a" }],
"localShapes": [{ "scope": 0, "identifier": "X", "shape": 0 }]

// ...efficient representation?...
"names": ["X", "longSymbolName", "a"],
...
"scopes": "...",
"shapes": "CC", // [0] delta offset into "names" for source, [1] delta offset into "names" for generated, [2?] possible delta offset for a nested shape?
"localShapes": "AAA" // [0] delta offset into "scopes", [1] delta offset into "names" for source identifier (possibly also remapped using "locals"), [2] delta offset into "shapes"
```
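
A sketch of how a debugger might consume the "inefficient representation" above. The field names come from that example; `resolveMemberAccess` and the way the scope index is obtained are assumptions for illustration.

```javascript
// The "inefficient representation", written out as plain data.
const sourceMapExtension = {
  shapes: [{ longSymbolName: "a" }],
  localShapes: [{ scope: 0, identifier: "X", shape: 0 }],
};

// Translate a source-level member access such as X.longSymbolName into the
// generated expression, given the index of the scope the debugger is paused in.
function resolveMemberAccess(ext, scopeIndex, objectName, fieldName) {
  const entry = ext.localShapes.find(
    (ls) => ls.scope === scopeIndex && ls.identifier === objectName
  );
  if (!entry) return `${objectName}.${fieldName}`; // no shape known for this local
  const shape = ext.shapes[entry.shape];
  const generated = Object.prototype.hasOwnProperty.call(shape, fieldName)
    ? shape[fieldName]
    : fieldName;
  return `${objectName}.${generated}`;
}

console.log(resolveMemberAccess(sourceMapExtension, 0, "X", "longSymbolName")); // X.a
console.log(resolveMemberAccess(sourceMapExtension, 0, "Y", "longSymbolName")); // Y.longSymbolName
```

Because the rename lives on the shape rather than on each usage, a second local with the same shape would only cost one more "localShapes" entry.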

Ron

Ron Buckton

Sep 22, 2014, 8:04:29 PM
to fit...@mozilla.com, dev-js-s...@lists.mozilla.org
> -----Original Message-----
> From: dev-js-sourcemap [mailto:dev-js-sourcemap-
> bounces+ron.buckton=micros...@lists.mozilla.org] On Behalf Of
> Fitzgerald, Nick
> Sent: Monday, September 22, 2014 1:23 PM
>
> Awesome!
>
> This is exactly the type of problem I was talking about solving in
> http://fitzgeraldnick.com/weblog/55/
>
> From what I've seen, the JS code generated by the TypeScript compiler is
> pretty similar to the original TypeScript source. Because of that, it may be
> tempting to take shortcuts that gloss over some details that other compilers
> doing more extreme transformations (such as emscripten) will have to deal
> with, but TypeScript doesn't. At first glance, I don't see anything jumping out
> at me other than the object fields that Brian brought up, but this is
> something we need to be vigilant about.
>
> Let's definitely do this, but let's make sure we do this Right(tm) :)
>
> Still going over this, but I have some questions:
>
> * Why repeat starting line/column in the scope when that data is almost
> assuredly already encoded in an existing mapping? Instead, can this
> reference a start mapping (via an index into the mappings) on which the
> scope begins?

I found that, in TypeScript at least, the offsets of the mappings currently emitted didn't align well with where an actual lexical scope starts and ends in the generated output. We could emit a few extra mapping records, but we would still need to indicate when to push/pop a scope and index into those offsets. Since you don't need to emit scope ranges for scopes that don't explicitly contain a renamed symbol, you can generally save space by just embedding the line/column offsets directly in the "scopes" property. I'd have to play around with a number of different scenarios to really see if there's a benefit to switching.

> * This isn't spelled out explicitly, but it seems that the locals data is parallel to
> the scopes data, right? As in, the first scope's locals are in the first segment in
> x_ms_locals, and the second scope's locals are in the second segment of
> x_ms_locals, etc... One thing we didn't originally do, but would be incredibly
> valuable moving forward would be to specify algorithms for using the data,
> rather than just describing the format of the encoded data. Would love to
> see the algorithm for this here.

That is correct, the "scopes" and "locals" fields are parallel. Andy and I can look into modifying the gist to include any necessary algorithms.

> * In x_ms_locals, the second field is "a base 64 VLQ relative to the first field in
> this segment." Why relative to the first field? Why not relative to the last
> value? It seems like the latter would save more space, but maybe I'm wrong?

Generally, when a local is renamed, its renamed value tends to immediately follow it in the "names" array (e.g. [..., "this", "_this"]), so it's more efficient to encode the increment here. Imagine several locals renamed in the same scope:

```
"longId1" -> "a"
"longId2" -> "b"
"longId3" -> "c"
```

They would (excepting cases where the name was already encoded), be encoded as:

```
"names": ["longId1", "a", "longId2", "b", "longId3", "c"],
"locals": "AC,CC,CC"
```

Whereas, if they increment independently you might end up with:

```
"locals": "AC,EE,EE"
```

You might be able to be more efficient by reordering the names though:

```
"names": ["longId1", "longId2", "longId3", "a", "b", "c"],
"locals": "AG,CC,CC"
```

But you end up with the same amount of variance in the output as without the reordering. Using the first approach you could theoretically end up with a long line of "C,CC,CC,C..." that could be more efficiently compressed.

The reality though is that many names may be reused and the offsets will likely jump all over the "names" array, so it may not really matter either way.
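
The strings above can be reproduced with a small encoder. `encodeVlq` is the standard source map Base64 VLQ. The two segment encoders are one reading of this message rather than the gist's definition: `encodeChained` assumes the first field is a delta from the last index written and the second field is relative to the first, which is what yields "AC,CC,CC"; `encodeIndependent` takes each field as a delta from the same field in the previous segment.

```javascript
const BASE64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Base64 VLQ: sign in the low bit, 5 data bits per digit, 6th bit marks a continuation.
function encodeVlq(value) {
  let vlq = value < 0 ? (-value << 1) | 1 : value << 1;
  let out = "";
  do {
    let digit = vlq & 31;
    vlq >>>= 5;
    if (vlq > 0) digit |= 32;
    out += BASE64[digit];
  } while (vlq > 0);
  return out;
}

// pairs: [sourceNameIndex, generatedNameIndex], both indexes into "names".

// Assumed reading of the first approach (see the lead-in).
function encodeChained(pairs) {
  let last = 0;
  return pairs.map(([src, gen]) => {
    const segment = encodeVlq(src - last) + encodeVlq(gen - src);
    last = gen;
    return segment;
  }).join(",");
}

// "Increment independently": each field tracks its own previous value.
function encodeIndependent(pairs) {
  let lastSrc = 0;
  let lastGen = 0;
  return pairs.map(([src, gen]) => {
    const segment = encodeVlq(src - lastSrc) + encodeVlq(gen - lastGen);
    lastSrc = src;
    lastGen = gen;
    return segment;
  }).join(",");
}

const interleaved = [[0, 1], [2, 3], [4, 5]]; // ["longId1", "a", "longId2", "b", ...]
const grouped = [[0, 3], [1, 4], [2, 5]];     // ["longId1", "longId2", "longId3", "a", ...]

console.log(encodeChained(interleaved));     // AC,CC,CC
console.log(encodeIndependent(interleaved)); // AC,EE,EE
console.log(encodeIndependent(grouped));     // AG,CC,CC
```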

> * In x_ms_locals, the second field "is an identifier or expression for the
> scope". Do you mean it is an identifier or expression to evaluate to get the
> value of the given local binding? The wording would lead to me believe that it
> describes the name of the scope, but this seems like the wrong place for
> that. Which leads me to

You are correct, it is effectively the expression to execute to get the value of the local symbol.

> * Is there a way to name a scope? Give it the
> function/method/object/module/whatever name? This seems like
> something that should be supported, if not now then in the future, which
> leads me to

Generally, the name of a scope can be inferred from the name offset of a nearby "mapping" record (though this may need to be more concretely defined).

> * How do you imagine future extensions interacting with this? What if we
> want to extend the locals data with a way to specify how the debugger
> should display the given value (for example the value is a mori[0] immutable
> map, but the debugger should display it's key-value pairs directly instead of
> its internal representation. It seems to me that object field renaming is a
> simple version of this problem. It's ok not to solve every problem now, but
> the format must be future extensible to accommodate when we start solving
> those other problems.
>
> [0] https://swannodette.github.io/mori/

I have a few thoughts on handling renamed fields that I expressed in a separate reply to Brian's mail on this thread. I haven't given much thought yet on how to provide a "debugger view" over an object, though Brian also mentioned this. I could imagine storing some information about the "shape" of an object in a separate set of fields in the source map, which could include such things as how to create a "debugger view" over a value tagged with that shape. I relate this to something like the DebuggerDisplayAttribute and DebuggerTypeProxyAttribute classes in C#.

Ron

Ron Buckton

Sep 22, 2014, 8:11:26 PM
to fit...@mozilla.com, dev-js-s...@lists.mozilla.org


> -----Original Message-----
> From: dev-js-sourcemap [mailto:dev-js-sourcemap-
> bounces+rbuckton=micros...@lists.mozilla.org] On Behalf Of Fitzgerald,
> Nick
> Sent: Monday, September 22, 2014 1:50 PM


> On 9/22/14, 9:22 AM, Andy Sterland wrote:
> > As you notified the big limitation is that fields can't be resolved.
>
> What use case do we need field resolution for?

This is one of the reasons we haven't yet included fields in this extension. The core cases for TypeScript so far have been more about locals than fields, though we have been thinking about fields for other scenarios like minifiers.

>
> It seems to me that the expression that gets the binding's value could just do
> the translation itself and return a "display" value rather than the "internal"
> value. A struct might not even have a single "actual"
> value in its compiled form and instead have its fields in local values on the
> stack[*] or in a typed array for performance. The JS expression that gave you
> the value of the struct's binding would just create a JS object with the original
> fields whose values would come from those variables.
>
> Take this example code:
>
> struct Foo {
> int bar;
> int baz;
> }
>
> int addFoo() {
> Foo f = { 1, 2 };
> return f.bar + f.baz;
> }
>
> It would be completely valid for that example to be compiled to this:
>
> function addFoo() {
> var r, z;
> r = 1;
> z = 2;
> return r + z;
> }
>
> And the JS expression in the source map could be something equivalent to:
>
> (function () {
> return { bar: r, baz: z };
> }());

Or just `({bar:r,baz:z})`, though with either case you'd have to accurately capture the locals.

>
> This approach could easily handle property name mangling on objects, and
> also returning a plain old JS object instead of a mori Map so you see the key-
> value pairs rather than compiler representation, as well.
>
> The only exception that I can think of where you would *really* need
> property resolution is a break-on-modification/watchpoint style feature.
> Definitely nice to have in the future, but as others have said, maybe
> something that we can wait on?
>
> [*] Of course JS-the-language doesn't really have stack variables, but the JS
> compilers will optimize them into stack and register variables when possible.

Ron

Brian Slesinsky

Sep 22, 2014, 9:11:31 PM
to Ron Buckton, Andy Sterland, dev-js-s...@lists.mozilla.org
"Scope" is an overloaded term. To make things language-independent, instead
of talking about scopes, maybe we should just talk about text ranges?
Here's a stab at it:

1. A source-level definition ("source definition") is represented as a text
range in a source file where the definition occurs. (The debugger doesn't
know how to interpret it, just where it begins and ends.) A source
definition might point to another source definition that it overrides.

2. A source usage range is represented by a source-level identifier (a
string), together with a text range where it appears. To find the usages,
we search the text range for occurrences of that string. Since we don't
know the language, this is a naive text search that at most knows not to
match on part of a word. If there are false positives (such as in comments
and string literals) then the compiler has to mask them out by splitting
the range into multiple ranges. A source usage range points to its
source-level definition.

3. A JavaScript usage range is represented by a qualified JavaScript
identifier and a text range within a JavaScript file where it applies.
Since the debugger knows that the language is JavaScript, it can be smart
about just looking for actual identifiers and avoiding false positives
(comments and string literals), but the compiler still should mask out
inner scopes by splitting ranges. A JavaScript usage range points to its
corresponding source usage range, which in turn points to the source-level
definition. When a JavaScript debugger stops at a breakpoint that's within
a JavaScript usage range, evaluating the JavaScript identifier should
return the JavaScript value corresponding to the source-level definition.

So, to map between source-level definitions and source-level usages, we
wouldn't need to look at the JavaScript at all. The mapping from a
JavaScript usage range to its source usage range is used to find out which
source definitions are active at a breakpoint and to find the JavaScript
value of each definition.
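
A sketch of the naive search described in item 2, assuming a usage range is an identifier plus `[start, end)` character offsets into the text. `findUsages` and the range shape are made up for illustration.

```javascript
// Naive whole-word search: no language knowledge, only "not part of a longer word".
function findUsages(text, identifier, range) {
  const escaped = identifier.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp(`(?<![\\w$])${escaped}(?![\\w$])`, "g");
  const offsets = [];
  for (const match of text.matchAll(pattern)) {
    const end = match.index + identifier.length;
    if (match.index >= range.start && end <= range.end) offsets.push(match.index);
  }
  return offsets;
}

const source = "var count = 1; // count items\nrecount(count);";

// One range over the whole text picks up the false positive in the comment.
console.log(findUsages(source, "count", { start: 0, end: source.length })); // [ 4, 18, 38 ]

// The compiler masks the comment out by splitting the range in two.
const split = [{ start: 0, end: 14 }, { start: 30, end: source.length }];
console.log(split.flatMap((range) => findUsages(source, "count", range))); // [ 4, 38 ]
```

The `recount` call is skipped by the whole-word rule alone; only the comment needs a split.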

To be clear, I'm not working on this and haven't tested it at all so I
don't know whether range splitting or some kind of nested ranges would be
the best way to mask out false positives. It seems like they're logically
equivalent, so this should be chosen based on which one results in smaller
file sizes and makes it reasonably quick for the debugger to load the
symbol map and build its indexes.

One interesting side-effect is that reusing symbols in inner scopes would
result in more range splitting (to mask out false positives) while
uniquely-chosen symbols can be represented as a single large range without
splitting. So, the way the minifier chooses symbols could make the source
map bigger or smaller.

- Brian

Fitzgerald, Nick

Sep 23, 2014, 12:26:32 PM
to Ron Buckton, dev-js-s...@lists.mozilla.org, fit...@mozilla.com
On 9/22/14, 5:11 PM, Ron Buckton wrote:
> Or just `({bar:r,baz:z})`, though with either case you'd have to accurately capture the locals.

Ha, of course :)

The evaluation of the expression could certainly be expanded upon, for
example by describing the security model and how it would fit in with the
Same Origin Policy.

It seems to me that the expression should be evaluated in the same frame.