Hello,
This bug is more like a follow up of the message #61 [1] in bug #533916. I have
finally finished implementing symbol patterns to full extent, i.e. the code has
undergone clean up and it is as nice as I can write it. What is more, manual
page update and a complete test suite are included.
To make reviewing a lot easier and faster for you, all my work is split into 14
mostly self-contained commits (the last being copyright update). Please note
that they are not my typical work tree. On the contrary, they are much more
polished and each of them is supposedly bug-free. However, from now on, I'll
fix bugs if any are found on top of those initial 14 patches so don't be
surprised if you see more commits in the remote branch.
All my work is in 'symbol-patterns' branch. Remote repository is located at:
git://git.debian.org/users/modax/dpkg.git
http://git.debian.org/?p=users/modax/dpkg.git;a=shortlog;h=refs/heads/symbol-patterns
Oneline log:
$ git log --oneline master..symbol-patterns
2761109 Bump copyright years and use @debian.org address.
b55f6b3 Document patterns in the manual page.
090873a Add testsuite for symbol patterns functionality.
12ea63a Collection of tweaks to the existing Dpkg_Shlibs tests and codebase.
93168fc Improve dpkg-gensymbols output, add "verbose output" option.
9676657 Improve SymbolFile::dump() output.
5f58c24 Fix indentation style in SymbolFile::dump().
40ca127 Implement pattern matching.
0b7ef8e Simplify and shorten get_new_symbols().
77a9b89 Split SymbolFile::merge_symbols().
c268d0d Initialize patterns.
c206e18 Add pattern related subs to Symbol interface.
76b077b Replace Symbol::clone() with dclone() and sclone().
a7533ad Add Cppfilt module - interface to the c++filt utility.
In order to know more what this feature is about, I recommend to read manual
page update first:
$ git show b55f6b3
I recommend the following debian/changelog entries when closing this bug (where
#nnnnnn is a number of this bug):
[ Modestas Vainius ]
* Implement symbol patterns (Closes: #nnnnnn). From now on, it is possible to
match multiple symbols with a single entry in the symbol file template.
While the concept is not new (wildcards also match multiple symbols),
patterns cover much more ground and are a lot more flexible. Together with
the framework, 3 atomic pattern types are supported:
- c++ - matching C++ symbols by their demangled name (as emitted by
c++filt);
- wildcard - matching by symbol version. Wildcard pattern is not new, but
it has been reimplemented on top of the patterns framework;
- regexp - matching perl regular expression against raw symbol names.
What is more, atomic patterns may be combined where it makes sence.
* As a positive side effect of the new symbol patterns implementation,
patterns are now treated like normal symbols whenever possible, e.g. a
pattern is MISSING if it does not match anything. As a result,
dpkg-gensymbols is now able to detect NEW/MISSING symbols when patterns are
present in the symbol file (Closes: #541464). Please note, however, that
there is no way to detect symbol changes in the pattern match sets.
Consider #533916 blocked by this bug. I still think substs (except vt of
course) are useful even with symbol pattern support. I'll will probably
re-implement them once/if symbol patterns get accepted.
It would be great to see this feature in squeeze...
1. http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=533916#61
-- System Information:
Debian Release: squeeze/sid
APT prefers unstable
APT policy: (500, 'unstable'), (500, 'testing'), (101, 'experimental')
Architecture: amd64 (x86_64)
Kernel: Linux 2.6.31-1-amd64 (SMP w/1 CPU core)
Locale: LANG=lt_LT.UTF-8, LC_CTYPE=lt_LT.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Versions of packages dpkg-dev depends on:
ii base-files 5.0.0 Debian base system miscellaneous f
ii binutils 2.20-4 The GNU assembler, linker and bina
ii bzip2 1.0.5-3 high-quality block-sorting file co
ii dpkg 1.15.5.5 Debian package management system
ii libtimedate-perl 1.1900-1 Time and date functions for Perl
ii lzma 4.43-14 Compression method of 7z format in
ii make 3.81-7 An utility for Directing compilati
ii patch 2.6-2 Apply a diff file to an original
ii perl [perl5] 5.10.1-8 Larry Wall's Practical Extraction
ii perl-modules 5.10.1-8 Core Perl modules
ii xz-utils 4.999.9beta+20091116-1 XZ-format compression utilities
Versions of packages dpkg-dev recommends:
ii build-essential 11.4 Informational list of build-essent
ii fakeroot 1.14.4-1 Gives a fake root environment
ii gcc [c-compiler] 4:4.4.2-2 The GNU C compiler
ii gcc-4.4 [c-compiler] 4.4.2-8 The GNU C compiler
ii gnupg 1.4.10-2 GNU privacy guard - a free PGP rep
ii gpgv 1.4.10-2 GNU privacy guard - signature veri
Versions of packages dpkg-dev suggests:
pn debian-keyring <none> (no description available)
pn debian-maintainers <none> (no description available)
-- no debconf information
--
To UNSUBSCRIBE, email to debian-bugs-...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listm...@lists.debian.org
thanks for your work! Here are the results of my review.
On Tue, 05 Jan 2010, Modestas Vainius wrote:
> a7533ad Add Cppfilt module - interface to the c++filt utility.
What happens when the toolchain starts using a (supposedly new) gnu-v4
method of mangling? How will we deal with that?
Also what about support of other OS? How can we plug it in that code when
necessary? (trying to answer myself) I guess we can use the Config module and
select the right type automatically.
> c206e18 Add pattern related subs to Symbol interface.
Small error spotted:
--- a/scripts/Dpkg/Shlibs/Symbol.pm
+++ b/scripts/Dpkg/Shlibs/Symbol.pm
@@ -241,7 +241,7 @@ sub equals {
# Compare names and tag sets
return 0 if $self->{symbol} ne $other->{symbol};
- return 0 if scalar(@{$self->{tagorder}}) != scalar(@{$self->{tagorder}});
+ return 0 if scalar(@{$self->{tagorder}}) != scalar(@{$other->{tagorder}});
for (my $i = 0; $i < scalar(@{$self->{tagorder}}); $i++) {
my $tag = $self->{tagorder}->[$i];
> 76b077b Replace Symbol::clone() with dclone() and sclone().
Why did you decide to have the two variants? Is it only a matter of
saving memory in some cases when the deep clone is not required?
Does it make a significant difference?
> c268d0d Initialize patterns.
+ # Wildcard is an alias based pattern. It gets recognized here even if it is
+ # not specially tagged.
+ if (my $ver = $self->get_wildcard_version()) {
+ error(_g("you can't use wildcards on unversioned symbols: %s"), $_) if $ver eq "Base
+ $self->init_pattern(($self->is_pattern()) ? 'generic' : 'alias-wildcard');
+ $self->{pattern}{wildcard} = 1;
+ }
Why the test on is_pattern() to initialize it either as generic or alias-wildcard?
I imagined it could be in case you combine wildcard with regexp pattern but I
fail to see how it could work in practice, the initialization as regex pattern
comes right after that block...
> 9676657 Improve SymbolFile::dump() output.
To me it seems more logical to show pattern matches after the pattern symbol,
no? Is there a reason why you put them first?
> b55f6b3 Document patterns in the manual page.
+counterpart defined in the symbol file. Whenever the first matching pattern is
+found, all its tags and properties will be used as a basis specification of the
+symbol. If neither pattern matches, the symbol will be considered as MISSING.
MISSING -> new. Or did I miss something?
+Please also note that patterns are in the same way affected by \fIoptional\fR,
+\fIarch\fR and other standard tags whenever possible.
How "optional" affects patterns ought to be clarified (see below discussion
about how it should maybe apply by default to wildcards).
+At the moment, \fBdpkg\-gensymbols\fR supports three atomic pattern types:
Why "atomic"?
+for them). However, as these collisions happen on the ABI level, they should
+not degrade quality of the symbol file.
Can you explain more precisely why? (not necessarily in the manual page, but at
least for my own culture)
---
About wildcard symbols, I'm not sure it makes sense to keep the current
syntax only. We should introduce a new syntax:
(match-version)<version>
The old syntax *@<version> would auto-translate to:
(match-version|optional)<version>
That way we keep backwards compatibility with the current behaviour where
an unused wildcard is not a problem. (Not sure optional can be used for this
purpose... otherwise we have to add "optional-pattern" and fix
$sym->mark_not_found_in_library accordingly).
I expect the glibc to have some generice wildcard entries used for all arches
but only some of them are used for a given arch (because support for the arch
has been added at a different point in time) and currently dpkg-gensymbols
would fail complaining that a pattern match got removed/is unused.
If you can think of a better name than "match-version", please suggest
it... but right now I can't.
Otherwise it looks mostly fine. Some stylish comments though:
Please avoid using
my $a = shift;
my $b = shift;
my $c = shift;
Prefer:
my ($a, $b, $c) = @_;
readline($fh) should also be replaced by the more common <$fh>
> It would be great to see this feature in squeeze...
I think it should be ok for squeeze. :)
Cheers,
--
Raphaël Hertzog
On šeštadienis 09 Sausis 2010 20:14:39 Raphael Hertzog wrote:
> On Tue, 05 Jan 2010, Modestas Vainius wrote:
> > a7533ad Add Cppfilt module - interface to the c++filt utility.
>
> What happens when the toolchain starts using a (supposedly new) gnu-v4
> method of mangling? How will we deal with that?
When this happens, we will need the code to somehow detect current C++ ABI
being used (what comes to mind is maybe checking which soname of libstdc++ the
lib is linked against). Or alternative is to use c++filt 'auto' format
instead. That one demangles C++ symbols fine at the moment and is likely to
work with newer ABI in the future.
> Also what about support of other OS? How can we plug it in that code when
> necessary? (trying to answer myself) I guess we can use the Config module
> and select the right type automatically.
Hmm, other OSes? I'm not sure what you mean. If that other OS does not use GNU
toolchain, many things will need porting including this one (but is that an
issue at the moment?). What is more, c++ pattern is literally dependent on the
GNU c++filt(1) output.
> > c206e18 Add pattern related subs to Symbol interface.
>
> Small error spotted:
>
> --- a/scripts/Dpkg/Shlibs/Symbol.pm
> +++ b/scripts/Dpkg/Shlibs/Symbol.pm
> @@ -241,7 +241,7 @@ sub equals {
>
> # Compare names and tag sets
> return 0 if $self->{symbol} ne $other->{symbol};
> - return 0 if scalar(@{$self->{tagorder}}) !=
> scalar(@{$self->{tagorder}}); + return 0 if
> scalar(@{$self->{tagorder}}) != scalar(@{$other->{tagorder}});
>
> for (my $i = 0; $i < scalar(@{$self->{tagorder}}); $i++) {
> my $tag = $self->{tagorder}->[$i];
I will fix this.
> > 76b077b Replace Symbol::clone() with dclone() and sclone().
>
> Why did you decide to have the two variants? Is it only a matter of
> saving memory in some cases when the deep clone is not required?
>
> Does it make a significant difference?
When cloning a symbol which matches a pattern (i.e. it has $symbol-
>{matching_pattern}), deep cloning would clone the matching pattern itself and
all its matches ($symbol->{matching_pattern}{pattern}{matches}) and probably
clone the same pattern again for each match etc etc. That's potentially an
infinite loop and huge waste of memory (but I have not tested). So I chose a
more straightforward approach (sclone) rather than deleting specific $symbol
fields before cloning and adding them back after cloning (done that and didn't
like it).
> > c268d0d Initialize patterns.
>
> + # Wildcard is an alias based pattern. It gets recognized here even if
> it is + # not specially tagged.
> + if (my $ver = $self->get_wildcard_version()) {
> + error(_g("you can't use wildcards on unversioned symbols: %s"), $_)
> if $ver eq "Base + $self->init_pattern(($self->is_pattern()) ?
> 'generic' : 'alias-wildcard'); + $self->{pattern}{wildcard} = 1;
> + }
>
> Why the test on is_pattern() to initialize it either as generic or
> alias-wildcard?
>
> I imagined it could be in case you combine wildcard with regexp pattern but
> I fail to see how it could work in practice, the initialization as regex
> pattern comes right after that block...
It's true that wildcard (as it is now) + regexp will never work (* at the
beginning would never match anything). However, it is possible to combine c++
and wildcard (in theory, see testsuite). 'generic' pattern is not just regexp.
It is also any combination of patterns.
Aliases are there as optimization (i.e. hash is used for patterns where it can
be used). I'm pretty sure some people will have (even if it is not
recommended) all symbols demangled in their symbol files... On the other hand,
aliases complicate implementation a bit, but not that much imho.
> > 9676657 Improve SymbolFile::dump() output.
>
> To me it seems more logical to show pattern matches after the pattern
> symbol, no? Is there a reason why you put them first?
That way looked better to me 5 months ago :) Now probably I agree with you. I
will fix this.
> > b55f6b3 Document patterns in the manual page.
>
> +counterpart defined in the symbol file. Whenever the first matching
> pattern is +found, all its tags and properties will be used as a basis
> specification of the +symbol. If neither pattern matches, the symbol will
> be considered as MISSING.
>
> MISSING -> new. Or did I miss something?
E.g. we have a symbol from objdump (aka real symbol in the manpage). First
check if any non-pattern (specific symbol in the manpage) matches, then if any
defined pattern matches. So if neither matches, the objdump symbol is MISSING,
isn't it? That paragraph probably does not have a clear distinction between
objdump symbol and symbol in the symbol file. Feel free to fix this wording
up. I kinda ran into a corner with terminology.
> +Please also note that patterns are in the same way affected by
> \fIoptional\fR, +\fIarch\fR and other standard tags whenever possible.
>
> How "optional" affects patterns ought to be clarified (see below discussion
> about how it should maybe apply by default to wildcards).
Definition of optional is that dpkg-gensymbols won't fail if the symbol is
MISSING (it will show up in diff though). Pattern is considered MISSING if
does not match anything. You mean this info should be added to the manpage?
> +At the moment, \fBdpkg\-gensymbols\fR supports three atomic pattern types:
>
> Why "atomic"?
Because combined patterns are also patterns. So atomic for c++, wildcard,
regexp is the best I came up with. What term do you suggest?
> +for them). However, as these collisions happen on the ABI level, they
> should +not degrade quality of the symbol file.
>
> Can you explain more precisely why? (not necessarily in the manual page,
> but at least for my own culture)
This is partially based on my question [1] and the following answer [2] (e.g.
in case of doubled destructors on the ABI level). With regard to non-virtual
trunks with different offsets, once a class has been created, its virtual
table or class hierarchy cannot be modified or that will instantly break ABI
[3]. As a result, a set of non-virtual thunks will have to remain constant
since class introduction due to:
1) neither base classes nor the class itself being allowed to add new / remove
existing base classes;
2) base classes not being able to define new virtual methods or remove
existing ones.
Only either of the two above could trigger a new non-virtual thunk in the
derivative classes. Sure, usage of the pattern limits detection of such a ABI
breakage, but it is enough for symbol versioning.
> About wildcard symbols, I'm not sure it makes sense to keep the current
> syntax only. We should introduce a new syntax:
> (match-version)<version>
>
> The old syntax *@<version> would auto-translate to:
> (match-version|optional)<version>
>
> That way we keep backwards compatibility with the current behaviour where
> an unused wildcard is not a problem. (Not sure optional can be used for
> this purpose... otherwise we have to add "optional-pattern" and fix
> $sym->mark_not_found_in_library accordingly).
Ok. I see. However I would rather special case *@<version> pattern than
introduce optional-pattern tag. Explanation below.
> I expect the glibc to have some generice wildcard entries used for all
> arches but only some of them are used for a given arch (because support
> for the arch has been added at a different point in time) and currently
> dpkg-gensymbols would fail complaining that a pattern match got removed/is
> unused.
So now there is an arch tag for this purpose. Therefore, I don't think
optional-pattern should exist except maybe for backwards compatibility
purposes with old behaviour.
Btw, given there is a better solution (arch tag on the wildcard), do you think
we still need match-version? Why not just force people upgrade to new (and
arguably better) ways?
> If you can think of a better name than "match-version", please suggest
> it... but right now I can't.
match-version name looks ok to me. However, I'm not sure how symbol name part
should look of such a pattern. Something like
(match-version=GLIBC_2.0)* 2.0
?
>
>
> Otherwise it looks mostly fine. Some stylish comments though:
>
> Please avoid using
> my $a = shift;
> my $b = shift;
> my $c = shift;
> Prefer:
> my ($a, $b, $c) = @_;
>
> readline($fh) should also be replaced by the more common <$fh>
>
> > It would be great to see this feature in squeeze...
>
> I think it should be ok for squeeze. :)
I will this fix those.
Fixes will be committed to symbol-patters branch in a couple of hours.
1. http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=533916#40
2. http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=533916#45
3.
http://techbase.kde.org/Policies/Binary_Compatibility_Issues_With_C++#The_Do.27s_and_Don.27ts
--
Modestas Vainius <mode...@vainius.eu>
Why not using "auto" right now then? It means "c++" would work for java as
well which is weird but we can rename "c++" in "demangled". It also means
we would auto-support the other mangling method used by other systems
automatically.
Is there a big risk of mixup of types if we use "auto"? Can we still use
c++filt as a daemon in that case or will we have troubles if we process
several types of symbols in the same run?
> Hmm, other OSes? I'm not sure what you mean. If that other OS does not use GNU
> toolchain, many things will need porting including this one (but is that an
> issue at the moment?). What is more, c++ pattern is literally dependent on the
> GNU c++filt(1) output.
Well, I remember someone who tried to port this code on some other OS
(can't remember which one right now). If we can make it easier for them,
we should do it, so it's best if we try to give it a thought now.
(But I will not block on this)
>
> I will fix this.
>
> > > 76b077b Replace Symbol::clone() with dclone() and sclone().
> >
> > Why did you decide to have the two variants? Is it only a matter of
> > saving memory in some cases when the deep clone is not required?
> >
> > Does it make a significant difference?
>
> When cloning a symbol which matches a pattern (i.e. it has $symbol-
> >{matching_pattern}), deep cloning would clone the matching pattern itself and
> all its matches ($symbol->{matching_pattern}{pattern}{matches}) and probably
> clone the same pattern again for each match etc etc. That's potentially an
> infinite loop and huge waste of memory (but I have not tested). So I chose a
Right, makes sense. But maybe we should rather have a single clone() that
is smart enough to not clone what is known to be references to other
objects (as opposed to simple internal hashes). Because it's difficult to
know when we should sclone() rather that dclone().
> more straightforward approach (sclone) rather than deleting specific $symbol
> fields before cloning and adding them back after cloning (done that and didn't
> like it).
You did that at one point in the code, though.
> > > b55f6b3 Document patterns in the manual page.
> >
> > +counterpart defined in the symbol file. Whenever the first matching
> > pattern is +found, all its tags and properties will be used as a basis
> > specification of the +symbol. If neither pattern matches, the symbol will
> > be considered as MISSING.
> >
> > MISSING -> new. Or did I miss something?
>
> E.g. we have a symbol from objdump (aka real symbol in the manpage). First
> check if any non-pattern (specific symbol in the manpage) matches, then if any
> defined pattern matches. So if neither matches, the objdump symbol is MISSING,
> isn't it? That paragraph probably does not have a clear distinction between
> objdump symbol and symbol in the symbol file. Feel free to fix this wording
> up. I kinda ran into a corner with terminology.
If you have a symbol in objdump output, you have a real existing symbol,
but if you can't find it in the template symbols file neither as real nor
through a patter, it's a new symbol, right? It will be added in the
resulting symbols file with the current version associated...
>
> > +Please also note that patterns are in the same way affected by
> > \fIoptional\fR, +\fIarch\fR and other standard tags whenever possible.
> >
> > How "optional" affects patterns ought to be clarified (see below discussion
> > about how it should maybe apply by default to wildcards).
>
> Definition of optional is that dpkg-gensymbols won't fail if the symbol is
> MISSING (it will show up in diff though). Pattern is considered MISSING if
> does not match anything. You mean this info should be added to the manpage?
Yes...
> > +At the moment, \fBdpkg\-gensymbols\fR supports three atomic pattern types:
> >
> > Why "atomic"?
>
> Because combined patterns are also patterns. So atomic for c++, wildcard,
> regexp is the best I came up with. What term do you suggest?
I would suggest "basic" vs "combined".
> > About wildcard symbols, I'm not sure it makes sense to keep the current
> > syntax only. We should introduce a new syntax:
> > (match-version)<version>
> >
> > The old syntax *@<version> would auto-translate to:
> > (match-version|optional)<version>
> >
> > That way we keep backwards compatibility with the current behaviour where
> > an unused wildcard is not a problem. (Not sure optional can be used for
> > this purpose... otherwise we have to add "optional-pattern" and fix
> > $sym->mark_not_found_in_library accordingly).
>
> Ok. I see. However I would rather special case *@<version> pattern than
> introduce optional-pattern tag. Explanation below.
Why do we need optional-pattern? Wouldn't optional work as expected?
> > I expect the glibc to have some generice wildcard entries used for all
> > arches but only some of them are used for a given arch (because support
> > for the arch has been added at a different point in time) and currently
> > dpkg-gensymbols would fail complaining that a pattern match got removed/is
> > unused.
>
[...]
> Btw, given there is a better solution (arch tag on the wildcard), do you think
> we still need match-version? Why not just force people upgrade to new (and
> arguably better) ways?
I want match-version primarily for consistency in the long term. It
doesn't make sense to have an exception that does not use a tag based
approach for wildcard. We should continue to support the old syntax for
backwards compatibility but we should document the new syntax and
deprecate the old syntax.
> > If you can think of a better name than "match-version", please suggest
> > it... but right now I can't.
>
> match-version name looks ok to me. However, I'm not sure how symbol name part
> should look of such a pattern. Something like
>
> (match-version=GLIBC_2.0)* 2.0
My suggestion was:
(match-version)GLIBC_2.0 2.0
Another alternative is to say that regex is enough for this but given that
you don't like regex for performance reasons I suppose you prefer to
implement match-version...
(regex)^.*@GLIBC_2.0$ 2.0
On šeštadienis 09 Sausis 2010 23:14:50 Raphael Hertzog wrote:
> On Sat, 09 Jan 2010, Modestas Vainius wrote:
> > Hello,
> >
> > On šeštadienis 09 Sausis 2010 20:14:39 Raphael Hertzog wrote:
> > > On Tue, 05 Jan 2010, Modestas Vainius wrote:
> > > > a7533ad Add Cppfilt module - interface to the c++filt utility.
> > >
> > > What happens when the toolchain starts using a (supposedly new) gnu-v4
> > > method of mangling? How will we deal with that?
> >
> > When this happens, we will need the code to somehow detect current C++
> > ABI being used (what comes to mind is maybe checking which soname of
> > libstdc++ the lib is linked against). Or alternative is to use c++filt
> > 'auto' format instead. That one demangles C++ symbols fine at the moment
> > and is likely to work with newer ABI in the future.
>
> Why not using "auto" right now then? It means "c++" would work for java as
> well which is weird but we can rename "c++" in "demangled". It also means
> we would auto-support the other mangling method used by other systems
> automatically.
I appears at the moment 'auto' is:
1) try gnu-v3. if success, done, otherwise
2) c++filt (libiberty) goes to great lengths to autoselect between those
special c++ compilers (hp, arm, old gnu) etc.
So apparently 'auto' does not cover java. It is limited to c++ compilers.
Therefore, I will change 'gnu-v3' to 'auto'. Java will need to be implemented
separately (Cppfilt module supports any number of daemons). I wrote pattern
framework with 'java' in mind hence adding it will probably be only a couple
lines of code but I won't do it now.
dclone() when creating a completely new entity (load(),
create_pattern_match()) from the existing Symbol object and sclone() elsewhere
(lookup_* etc.).
> > more straightforward approach (sclone) rather than deleting specific
> > $symbol fields before cloning and adding them back after cloning (done
> > that and didn't like it).
>
> You did that at one point in the code, though.
Yeah, but I would need to do something very similar either way because
create_pattern_match() actually "forks" non-pattern from pattern.
> > > > b55f6b3 Document patterns in the manual page.
> > >
> > > +counterpart defined in the symbol file. Whenever the first matching
> > > pattern is +found, all its tags and properties will be used as a basis
> > > specification of the +symbol. If neither pattern matches, the symbol
> > > will be considered as MISSING.
> > >
> > > MISSING -> new. Or did I miss something?
> >
> > E.g. we have a symbol from objdump (aka real symbol in the manpage).
> > First check if any non-pattern (specific symbol in the manpage) matches,
> > then if any defined pattern matches. So if neither matches, the objdump
> > symbol is MISSING, isn't it? That paragraph probably does not have a
> > clear distinction between objdump symbol and symbol in the symbol file.
> > Feel free to fix this wording up. I kinda ran into a corner with
> > terminology.
>
> If you have a symbol in objdump output, you have a real existing symbol,
> but if you can't find it in the template symbols file neither as real nor
> through a patter, it's a new symbol, right? It will be added in the
> resulting symbols file with the current version associated...
Uh, indeed, you are right. Apparently, I was thinking backwards when I wrote
this :P
> > > +Please also note that patterns are in the same way affected by
> > > \fIoptional\fR, +\fIarch\fR and other standard tags whenever possible.
> > >
> > > How "optional" affects patterns ought to be clarified (see below
> > > discussion about how it should maybe apply by default to wildcards).
> >
> > Definition of optional is that dpkg-gensymbols won't fail if the symbol
> > is MISSING (it will show up in diff though). Pattern is considered
> > MISSING if does not match anything. You mean this info should be added to
> > the manpage?
>
> Yes...
Ok.
>
> > > +At the moment, \fBdpkg\-gensymbols\fR supports three atomic pattern
> > > types:
> > >
> > > Why "atomic"?
> >
> > Because combined patterns are also patterns. So atomic for c++, wildcard,
> > regexp is the best I came up with. What term do you suggest?
>
> I would suggest "basic" vs "combined".
Ok.
>
> > > About wildcard symbols, I'm not sure it makes sense to keep the current
> > > syntax only. We should introduce a new syntax:
> > > (match-version)<version>
> > >
> > > The old syntax *@<version> would auto-translate to:
> > > (match-version|optional)<version>
> > >
> > > That way we keep backwards compatibility with the current behaviour
> > > where an unused wildcard is not a problem. (Not sure optional can be
> > > used for this purpose... otherwise we have to add "optional-pattern"
> > > and fix $sym->mark_not_found_in_library accordingly).
> >
> > Ok. I see. However I would rather special case *@<version> pattern than
> > introduce optional-pattern tag. Explanation below.
>
> Why do we need optional-pattern? Wouldn't optional work as expected?
So we agree that it is not needed.
So *@<version> is 'match-version' with implicit optional tag (we can't add the
tag automatically in order to avoid it appearing in the diff).
> > > I expect the glibc to have some generice wildcard entries used for all
> > > arches but only some of them are used for a given arch (because
> > > support for the arch has been added at a different point in time) and
> > > currently dpkg-gensymbols would fail complaining that a pattern match
> > > got removed/is unused.
>
> [...]
>
> > Btw, given there is a better solution (arch tag on the wildcard), do you
> > think we still need match-version? Why not just force people upgrade to
> > new (and arguably better) ways?
>
> I want match-version primarily for consistency in the long term. It
> doesn't make sense to have an exception that does not use a tag based
> approach for wildcard. We should continue to support the old syntax for
> backwards compatibility but we should document the new syntax and
> deprecate the old syntax.
Ok.
>
> > > If you can think of a better name than "match-version", please suggest
> > > it... but right now I can't.
> >
> > match-version name looks ok to me. However, I'm not sure how symbol name
> > part should look of such a pattern. Something like
> >
> > (match-version=GLIBC_2.0)* 2.0
>
> My suggestion was:
> (match-version)GLIBC_2.0 2.0
Will be done. However, in case of such syntax, 'match-' looks kinda redundant.
What about:
(version)GLIBC_2.0 2.0
or
(symbol-version)GLIBC_2.0 2.0
or
(symver)GLIBC_2.0 2.0
(like http://www.redhat.com/docs/manuals/enterprise/RHEL-3-Manual/gnu-
assembler/symver.html )
?
> Another alternative is to say that regex is enough for this but given that
> you don't like regex for performance reasons I suppose you prefer to
> implement match-version...
>
> (regex)^.*@GLIBC_2.0$ 2.0
Yeah. By the way, do you prefer 'regexp' (as it is now) or 'regex'?
--
Modestas Vainius <mode...@vainius.eu>
Ok.
> > Why do we need optional-pattern? Wouldn't optional work as expected?
>
> So we agree that it is not needed.
>
> So *@<version> is 'match-version' with implicit optional tag (we can't add the
> tag automatically in order to avoid it appearing in the diff).
Yes.
> > My suggestion was:
> > (match-version)GLIBC_2.0 2.0
>
> Will be done. However, in case of such syntax, 'match-' looks kinda redundant.
> What about:
>
> (version)GLIBC_2.0 2.0
> or
> (symbol-version)GLIBC_2.0 2.0
> or
> (symver)GLIBC_2.0 2.0
version is too generic... other possibilities: versionned-symbols,
symbol-with-version
Pick the one you prefer. symver at least seem refers to jargon and might
be less confusing (and people who don't know will have to look up the
doc).
> Yeah. By the way, do you prefer 'regexp' (as it is now) or 'regex'?
regex seems to be more popular and that's the name of the manual page
as well, so maybe rather "regex".
Remote branch [1] has been updated. Notes:
* readline() cannot be replaced with <> easily due to trickiness in perl
syntax. Perl does not like anything but <$simplevar> and we need <$hash-
>{key}> here.
* I kept the name 'symver' purely due to its length. You suggested good names
but their length is kind of distracting for me. However, instructions how to
safely change the name is in the log of the respective commit.
* Double "exists" in is_optional() is needed to avoid creation of redundant
{pattern} hash. Otherwise, "save -> load" tests would fail. {old_wildcard} is
under {pattern} due to how create_pattern_match() eliminates pattern
properties.
* dclone()/sclone() remain.
1. http://git.debian.org/?p=users/modax/dpkg.git;a=summary
--
Modestas Vainius <mode...@vainius.eu>
On Sun, 10 Jan 2010, Modestas Vainius wrote:
> Remote branch [1] has been updated. Notes:
I cleaned up your branch and added some changes of mine, it's here:
http://git.debian.org/?p=users/hertzog/dpkg.git;a=shortlog;h=refs/heads/pu/symbol-patterns
I squashed some small fixes in some of your commits (small typos in the
man page, and a bug in save_load_test()), I hope you don't mind.
Feel free to review. I'm going to merge this soon if you don't see any
problem.
> * readline() cannot be replaced with <> easily due to trickiness in perl
> syntax. Perl does not like anything but <$simplevar> and we need <$hash-
> >{key}> here.
$hash->{key}->getline() would still be nicer IMO but not a big deal, I see
we use readline() at 2 other places in the code...
> * Double "exists" in is_optional() is needed to avoid creation of redundant
> {pattern} hash. Otherwise, "save -> load" tests would fail. {old_wildcard} is
> under {pattern} due to how create_pattern_match() eliminates pattern
> properties.
I dropped old_wildcard and the changes in is_optional() in the last
commit.
On pirmadienis 11 Sausis 2010 20:33:39 Raphael Hertzog wrote:
> Hi,
>
> On Sun, 10 Jan 2010, Modestas Vainius wrote:
> > Remote branch [1] has been updated. Notes:
>
> I cleaned up your branch and added some changes of mine, it's here:
> http://git.debian.org/?p=users/hertzog/dpkg.git;a=shortlog;h=refs/heads/pu/
> symbol-patterns
>
> I squashed some small fixes in some of your commits (small typos in the
> man page, and a bug in save_load_test()), I hope you don't mind.
Sure.
> Feel free to review. I'm going to merge this soon if you don't see any
> problem.
You will need testsuite-fix.diff patch to make the code pass the testsuite. I
suggest --amend it to your old-style wildcards commit because the later
changed cosmetic behaviour a bit (save_load_test() fails because diff is no
longer the same due to wildcards etc.)
In addition (save_load_test_shifts.diff), double "shift"s are back in
save_load_test() because @_ is later passed to save() call. I don't know how
you prefer to handle this type of cases in your style though.
Otherwise, full ACK from me.
--
Modestas Vainius <mode...@vainius.eu>
Oops... applied but with another fix, it contains 106 tests not 105.
> In addition (save_load_test_shifts.diff), double "shift"s are back in
> save_load_test() because @_ is later passed to save() call. I don't know how
> you prefer to handle this type of cases in your style though.
my($a, $b, %opts) = @_;
and then pass along %opts where wanted (or @opts if it's not hash-like
options).
> Otherwise, full ACK from me.
Ok, merged.