Wrong SourceCodeInfo Location for options?

Clément Jean

May 22, 2023, 8:59:21 PM
to Protocol Buffers
I'm working on a parser for Protobuf and have recently been working with SourceCodeInfo.
I've noticed something weird when using options. However, I'm not sure whether my understanding is incomplete or whether it's a bug.

I have the following enum:

enum Test {
option deprecated = true;

TEST_UNSPECIFIED = 0;
}

And I get the following FileDescriptorSet (simplified):

file {
  ...
  source_code_info {
    location {
      span: 0
      span: 0
      span: 4
      span: 1
    }
    location {
      path: 5
      path: 0
      span: 0
      span: 0
      span: 4
      span: 1
    }
    location {
      path: 5
      path: 0
      path: 1
      span: 0
      span: 5
      span: 9
    }
    location {
      path: 5
      path: 0
      path: 3
      span: 1
      span: 8
      span: 33

    }
    location {
      path: 5
      path: 0
      path: 3
      path: 3
      span: 1
      span: 8
      span: 33

    }
    location {
      path: 5
      path: 0
      path: 2
      path: 0
      span: 3
      span: 8
      span: 29

    }
    location {
      path: 5
      path: 0
      path: 2
      path: 0
      path: 1
      span: 3
      span: 8
      span: 24

    }
    location {
      path: 5
      path: 0
      path: 2
      path: 0
      path: 2
      span: 3
      span: 27
      span: 28

    }
  }
}
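
For readers following the dump, here is a minimal sketch that turns each path into a readable field chain. The `FIELDS` table and the `describe_path` helper are hypothetical names introduced for illustration; the field numbers were transcribed by hand from descriptor.proto and cover only the fields that appear in this dump.

```python
# Hypothetical lookup table: message name -> {field number: (field name,
# message type of the field or None for scalars, is_repeated)}.
# Field numbers transcribed by hand from descriptor.proto; only the fields
# that appear in the dump above are listed.
FIELDS = {
    "FileDescriptorProto": {
        5: ("enum_type", "EnumDescriptorProto", True),
    },
    "EnumDescriptorProto": {
        1: ("name", None, False),
        2: ("value", "EnumValueDescriptorProto", True),
        3: ("options", "EnumOptions", False),
    },
    "EnumValueDescriptorProto": {
        1: ("name", None, False),
        2: ("number", None, False),
    },
    "EnumOptions": {
        3: ("deprecated", None, False),
    },
}


def describe_path(path):
    """Translate a SourceCodeInfo path into a dotted field chain.

    A path alternates between field numbers and, for repeated fields,
    the index of the element. Unknown field numbers raise KeyError.
    """
    message = "FileDescriptorProto"
    parts = []
    i = 0
    while i < len(path):
        name, next_message, repeated = FIELDS[message][path[i]]
        i += 1
        if repeated and i < len(path):
            # The next path element is the index into the repeated field.
            parts.append(f"{name}[{path[i]}]")
            i += 1
        else:
            parts.append(name)
        message = next_message
    return ".".join(parts) if parts else "<file>"


if __name__ == "__main__":
    for p in ([], [5, 0], [5, 0, 1], [5, 0, 3], [5, 0, 3, 3],
              [5, 0, 2, 0], [5, 0, 2, 0, 1], [5, 0, 2, 0, 2]):
        print(p, "->", describe_path(p))
```

Under this reading, the two locations with span 1, 8, 33 are the enum's options and the `deprecated` option itself, and the last three locations are the first enum value, its name, and its number.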


I'm confused by the ones I wrote in red. These don't seem to be correct.
An example is span: 1 span: 8 span: 33. To the best of my knowledge, this means
that we have an element starting at line 1 (2 in an IDE), column 8 (9 in an IDE), that ends on the same line (a three-element span omits the end line) at column 33. However, the option line is only 27 characters long...
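
The reading of spans described above can be sketched as follows. `normalize_span` is a hypothetical helper name; the three-or-four-element, zero-based layout is the one documented in descriptor.proto.

```python
def normalize_span(span):
    """Expand a SourceCodeInfo span to (start_line, start_col, end_line, end_col).

    A span has three elements when the element starts and ends on the same
    line (the end line is omitted), and four otherwise. All values are
    zero-based, so an editor shows them as value + 1.
    """
    if len(span) == 3:
        start_line, start_col, end_col = span
        end_line = start_line
    elif len(span) == 4:
        start_line, start_col, end_line, end_col = span
    else:
        raise ValueError(f"span must have 3 or 4 elements, got {len(span)}")
    return start_line, start_col, end_line, end_col


if __name__ == "__main__":
    print(normalize_span([1, 8, 33]))    # the option location
    print(normalize_span([0, 0, 4, 1]))  # the whole enum
```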

Is my mental model about SourceCodeInfo wrong? Or is it a bug?

Clément Jean

May 22, 2023, 10:54:36 PM
to Protocol Buffers
Also, for completeness, I use the following command to turn the proto file into a descriptor file:

protoc --include_source_info --descriptor_set_out=test.desc enum.proto

Then, to get a human-readable format, I run:

cat test.desc | protoc --decode=google.protobuf.FileDescriptorSet -I/usr/local/include/google/protobuf /usr/local/include/google/protobuf/descriptor.proto

Finally, protoc --version returns:

libprotoc 23.1

Adam Cozzette

May 25, 2023, 2:07:17 PM
to Clément Jean, Protocol Buffers
I'm not sure what's going wrong, but I agree with you that those spans in red don't look right. This might be a bug in the code that generates the source code info.

Clément Jean

May 25, 2023, 8:51:03 PM
to Protocol Buffers
Should I create an issue on GitHub for this? Is there any way I can contribute to speed things up?

Clément Jean

May 25, 2023, 9:47:49 PM
to Protocol Buffers
It seems that the SourceInfoTest, ScopedOptions test in google/protobuf/compiler/parser_unittest.cc covers the case of options.

I also checked the Spans there and they seem to be correct. For:

message Foo {
  $a$option mopt = 1;$b$
}
enum Bar {
  $c$option eopt = 1;$d$
}
service Baz {
  $e$option sopt = 1;$f$
  rpc M(X) returns(Y) {
    $g$option mopt = 1;$h$
  }
  rpc MS4($1$stream$2$ X) returns($3$stream$4$ Y) {
    $k$option mopt = 1;$l$
  }
}

We get:

SPAN: a:b
-> 1
-> 2
-> 18
SPAN: c:d
-> 4
-> 2
-> 18
SPAN: e:f
-> 7
-> 2
-> 18
SPAN: g:h
-> 9
-> 4
-> 20
SPAN: k:l
-> 12
-> 4
-> 20
SPAN: 1:2
-> 11
-> 10
-> 16
SPAN: 3:4
-> 11
-> 28
-> 34

Clément Jean

May 26, 2023, 12:58:10 AM
to Protocol Buffers
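Turns out this is not a bug! The Tokenizer advances the column on a tab as follows:

else if (current_char_ == '\t') {
  column_ += kTabWidth - column_ % kTabWidth;
}

where kTabWidth is 8. So a tab at the start of a line moves the column from 0 to 8, which is why the option span starts at column 8 and ends at column 33. I was aware that the Protobuf documentation recommends indenting by 2 spaces, but I didn't know why. Now I know.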
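
To check the numbers, here is a minimal sketch that applies the same tab rule in Python. It assumes the original enum.proto was indented with a single tab per line (the indentation is not visible in the post above), and `end_column` and `token_span` are hypothetical helper names, not part of protobuf.

```python
K_TAB_WIDTH = 8  # mirrors kTabWidth from the Tokenizer snippet above


def end_column(text, column=0):
    """Return the column reached after consuming text, starting at column.

    A tab jumps to the next multiple of K_TAB_WIDTH; every other character
    advances the column by one (a simplification of the real tokenizer).
    """
    for ch in text:
        if ch == "\t":
            column += K_TAB_WIDTH - column % K_TAB_WIDTH
        else:
            column += 1
    return column


def token_span(line, token):
    """Return (start_col, end_col) of the first occurrence of token in line."""
    start = line.index(token)
    start_col = end_column(line[:start])
    return start_col, end_column(token, start_col)


if __name__ == "__main__":
    # Assumed tab-indented lines from enum.proto.
    print(token_span("\toption deprecated = true;", "option deprecated = true;"))
    print(token_span("\tTEST_UNSPECIFIED = 0;", "TEST_UNSPECIFIED = 0;"))
    # Two-space indentation, as in the parser unit test.
    print(token_span("  option mopt = 1;", "option mopt = 1;"))
```

Under the tab assumption this reproduces the spans 8 to 33 and 8 to 29 from the dump, and the two-space line reproduces 2 to 18 from the unit test.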

Thank you anyway.