parsing XML with awk

Ed Morton

Jul 11, 2016, 5:20:22 AM

I'd love to be able to parse XML with awk rather than having to learn xmlstarlet
or similar.

I THINK from reading the documentation
(https://www.gnu.org/software/gawk/manual/gawk.html#gawkextlib) that to do so
I'd have to download gawkextlib code, download the Expat XML parser library,
download GNU Autotools, download the XML parser extension, and then build the
gawkextlib library (and then use it as gawk -i gawkextlib ...?).

No matter how useful this functionality is, I'm just never going to do all of
that - am I right in thinking I can't just do:

gawk -i something ...

to use the gawk XML parsing extension, like I can do

gawk -i inplace ...

to use the gawk inplace editing extension? If so, why?

Ed.

Manuel Collado

Jul 11, 2016, 11:00:25 AM

Well, to be able to issue a command like "gawk -i inplace ...", you have
to install gawk first:

- Download and unpack gawk-xxxx.tar.gz
- ./configure, make, make check, make install
(may also require some optional libraries, like MPFR)

Once you have gawk properly installed, the XML extension from gawkextlib
can be installed by a very similar procedure, according to the given
instructions:

1- Install the gawkextlib common base (just once, for all the additional
extensions):
Download and unpack the tarball, and
./configure, make, make check, make install

2- Install the XML extension:
Download and unpack the tarball, and
./configure, make, make check, make install
(this step requires the expat library)

The idea is to bundle the main gawk distribution with only a small set
of general purpose extensions, and install additional specialized
extensions individually.
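
The two-step procedure above can be sketched as a small shell helper. This is only a sketch under stated assumptions: the tarball names `gawkextlib-x.y.z.tar.gz` and `gawk-xml-x.y.z.tar.gz` are placeholders rather than actual release names, the helper `build_tarball` is hypothetical, and each tarball is assumed to unpack into a directory named after itself. With `DRY_RUN=1` (the default here) it only prints what it would run.

```shell
# Sketch only: build and install one autotools-style tarball.
# DRY_RUN=1 (the default) prints the commands instead of running them.
DRY_RUN=${DRY_RUN:-1}

build_tarball() {
    tarball=$1
    dir=${tarball%.tar.gz}   # assumption: the tarball unpacks into a same-named directory
    if [ "$DRY_RUN" = 1 ]; then
        echo "tar -xzf $tarball"
        echo "(cd $dir && ./configure && make && make check && make install)"
        return 0
    fi
    tar -xzf "$tarball" &&
    (cd "$dir" && ./configure && make && make check && make install)
}

# Step 1: the gawkextlib common base (once, for all extensions).
# Step 2: the XML extension (this step needs the expat library).
# Placeholder file names; substitute the versions actually downloaded.
build_tarball gawkextlib-x.y.z.tar.gz &&
build_tarball gawk-xml-x.y.z.tar.gz
```

Running with `DRY_RUN=0` would perform the real build; `make install` may need write access to the install prefix.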

Ed Morton

Jul 11, 2016, 11:43:19 AM

On 7/11/2016 10:00 AM, Manuel Collado wrote:
> El 11/07/2016 a las 11:20, Ed Morton escribió:
>> I'd love to be able to parse XML with awk rather than having to learn
>> xmlstarlet or similar.
>>
>> I THINK from reading the documentation
>> (https://www.gnu.org/software/gawk/manual/gawk.html#gawkextlib) that to
>> do so I'd have to download gawkextlib code, download the Expat XML
>> parser library, download GNU Autotools, download the XML parser
>> extension, and then build the gawkextlib library (and then use it as
>> gawk -i gawkextlib ...?).
>>
>> No matter how useful this functionality is, I'm just never going to do
>> all of that - am I right in thinking I can't just do:
>>
>> gawk -i something ...
>>
>> to use the gawk XML parsing extension, like I can do
>>
>> gawk -i inplace ...
>>
>> to use the gawk inplace editing extension? If so, why?
>
> Well, to be able to issue a command like "gawk -i inplace ...", you have to
> install gawk first:
>
> - Download and unpack gawk-xxxx.tar.gz
> - ./configure, make, make check, make install
> (may also require some optional libraries, like MPFR)

Every UNIX system I use already has gawk installed so, thankfully, I've never
had to do any of that and if I ever did have to do it then I'd like to think I'd
do it but in reality I'd probably just use whatever other awk was already
present instead unless something came up that I had a real NEED to use gawk for.

> Once you have gawk properly installed, the XML extension from gawkextlib can be
> installed by a very similar procedure, according to the given instructions:
>
> 1- Install the gawkextlib common base (just once, for all the additional
> extensions):
> Download and unpack the tarball, and
> ./configure, make, make check, make install
>
> 2- Install the XML extension:
> Download and unpack the tarball, and
> ./configure, make, make check, make install
> (this step requires the expat library)
>
> The idea is to bundle the main gawk distribution with only a small set of
> general purpose extensions, and install additional specialized extensions
> individually.
>

OK but why? My laptop, for example, has plenty of memory and I have high speed
internet access so when I install gawk with cygwin why does it benefit me to not
have all of the gawk extensions in the default bundle so I can just use them out
the gate?

Thanks for the quick response,

Ed.

Janis Papanagnou

Jul 11, 2016, 2:24:36 PM

On 11.07.2016 17:43, Ed Morton wrote:
> On 7/11/2016 10:00 AM, Manuel Collado wrote:

>> [...]

>
> OK but why? My laptop, for example, has plenty of memory and I have high speed
> internet access so when I install gawk with cygwin why does it benefit me to
> not have all of the gawk extensions in the default bundle so I can just use
> them out the gate?

Well, the gawk software architects and maintainers will probably be able to
explain. One reason *might* be that you want to be able to generate a small
gawk version for space-restricted environments. I would think, though, that
this could be best controlled by a compile time option (a special makefile
target maybe) so that you don't need multiple isolated build steps that are
still depending on each other (which are possible and not uncommon sources
of build incompatibility errors). I think I have suggested before that
at least the "gawkextlib common base" could (or should) be part of the core
gawk; this would eliminate one separate compile cycle and result in one less
dependency. WRT the specific modules I'm not that sure; someone has to pack
functions into modules, and compile modules for inclusion. Since modules
will probably come from various sources you can't just put all together in
one big library to simply choose from (externally accessible with a -i gawk
option, internally mapped to a dlopen() function call, or so). I feel a bit
uneasy to suggest [here in c.l.a] to look into the design how (e.g.) perl
solved that issue[*]; it at least looks much more organized and simpler to
use. The gawk method feels more like "here is a knife and a log to carve a
staff, and an anvil and some metal to forge a shovel blade, then put the
two pieces together (pray they fit) to obtain the shovel to do your work".

Janis

[*] http://www.cpan.org/modules/INSTALL.html

Andrew Schorr

Jul 11, 2016, 4:08:09 PM

Hi,

I think Perl's approach to distributing modules is certainly worth looking at. As with gawk, there are multiple steps. First, you install perl. Then, you install additional modules that you may want to use. If you use a Linux distribution like Fedora, then some of the perl modules are packaged as rpms. If you are lucky, you can say "dnf install perl-module-name", and it will install the requested module and any dependencies. If you are unlucky enough that the specific perl module you want hasn't been packaged for your Linux (or Cygwin) distribution, then you will have to install the module using the CPAN mechanisms. This can get pretty messy from a system administration perspective.

I hope that some day the gawkextlib modules may be packaged as part of distributions. It would be really nice to say "dnf install gawk-xml". I'm not sure how to make that happen. If enough people start using this stuff and request it, then maybe Cygwin and Fedora and Ubuntu, etc. will start packaging the gawkextlib modules. Until then, it's simply a matter of following Manuel's instructions, and really the standard instructions for any open-source software: download the tarball, then run ./configure && make && make check && make install. This doesn't seem so onerous to me, but I have been working with open-source software for 30 years now, so I guess I'm used to it.

The current gawkextlib modules actually include rpm spec files already, so it should be very easy to get them included in any rpm-based distro. Maybe it's just a question of asking the Fedora folks to do it. Perhaps we need volunteers as Fedora package maintainers. Also, if anybody would like to help out by contributing packaging info for other distributions, we would welcome that. I don't know what's involved with getting packages added to Cygwin.

Beyond that, there's a question of whether gawkextlib should have its own packaging mechanism. I'm not sure about that. When it comes to Perl modules, it seems like a waste of effort that there's both the cpan packaging mechanism and the rpm packages created on top of that. But if anybody would like to volunteer to develop a package installation mechanism for gawkextlib, I'd be happy to discuss it. At the end of the day, it's just a question of declaring the package dependencies (which is already done inside the rpm spec files included in the tarballs), and writing a script to download all the needed tarballs, unpack them, and then run configure and make and make install. But this is really what the distribution package managers are supposed to do for us, so it seems like reinventing the wheel. My gut is that the better investment is to figure out how to get this stuff included in the various distributions. Any volunteers to help with that?
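
The installer script described above (declare the dependencies, then fetch and build each package in order) might look roughly like the following. It is a minimal sketch, not an existing gawkextlib tool: the function names `deps_of` and `install_pkg` are hypothetical, the dependency table is hard-coded for illustration instead of being read from the rpm spec files, and the build step is only echoed. System libraries such as expat are not handled.

```shell
# Sketch only: visit packages in dependency order.
# The dependency table below is an illustrative assumption.

# Print the direct dependencies of a package, space separated.
deps_of() {
    case $1 in
        gawk-xml) echo "gawkextlib" ;;   # the XML extension needs the common base
        *)        echo "" ;;
    esac
}

installed=" "   # space-delimited list of packages already handled

install_pkg() {
    case $installed in
        *" $1 "*) return 0 ;;            # already done, nothing to do
    esac
    for dep in $(deps_of "$1"); do
        install_pkg "$dep" || return 1   # dependencies first
    done
    # A real script would download and unpack the tarball here, then run
    # ./configure && make && make install in the unpacked directory.
    echo "build and install: $1"
    installed="$installed$1 "
}

install_pkg gawk-xml
```

The function uses `$1` rather than a named variable because plain `sh` variables are global and would be overwritten by the recursive call.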

Regards,
Andy

Kenny McCormack

Jul 11, 2016, 10:07:36 PM

In article <9b01fe89-5a57-4f62...@googlegroups.com>,

Andrew Schorr <asc...@telemetry-investments.com> wrote:
>Hi,
>
>I think Perl's approach to distributing modules is certainly worth looking at. As
>with gawk, there are multiple steps. First, you install perl. Then, you install
>additional modules that you may want to use. If you use a Linux distribution like

I have two comments on this thread:

1) The same ground was covered in the recent thread started by me about
the "Lightning" extension. For whatever reason(s), the need for
performing the extra step (compiling gawkextlib) is just too much
bother for many/most users (including, the rather unlikely combo of
myself and Ed Morton). You and others can say all you want that it
shouldn't be that onerous, but facts are facts, and the fact is
that it is.

2) The obvious solution is to make gawkextlib part of the mainline gawk
distribution - and, of course, making building same part of the
basic, normal "configure/make/make install" routine. If you did
this, then it would be there - it would be there pretty much
whenever gawk is there (unless the installer went out of his way to
not do things the "normal" way). Worrying about getting
distributions (e.g., Redhat, Debian, etc) to include it is looking
at it from the wrong perspective (as well as, obviously, being
Linux-centric).

--
"Every time Mitt opens his mouth, a swing state gets its wings."

(Should be on a bumper sticker)

Andrew Schorr

Jul 11, 2016, 11:23:04 PM

On Monday, July 11, 2016 at 10:07:36 PM UTC-4, Kenny McCormack wrote:
> I have two comments on this thread:
>
> 1) The same ground was covered in the recent thread started by me about
> the "Lightning" extension. For whatever reason(s), the need for
> performing the extra step (compiling gawkextlib) is just too much
> bother for many/most users (including, the rather unlikely combo of
> myself and Ed Morton). You and others can say all you want that it
> shouldn't be that onerous, but facts are facts, and the fact is
> that it is.

Agreed. This ground has been covered, although I will never be able to understand why it's so hard for you to install this stuff.

> 2) The obvious solution is to make gawkextlib part of the mainline gawk
> distribution - and, of course, making building same part of the
> basic, normal "configure/make/make install" routine. If you did
> this, then it would be there - it would be there pretty much
> whenever gawk is there (unless the installer went out of his way to
> not do things the "normal" way). Worrying about getting
> distributions (e.g., Redhat, Debian, etc) to include it is looking
> at it from the wrong perspective (as well as, obviously, being
> Linux-centric).

Here's another fact -- the gawkextlib library is not going to become part of the core gawk distribution, for reasons that have already been stated in previous discussions. I'm open to any other constructive suggestions and/or contributions for how to improve things.

Regards,
Andy

P.S. I don't see how adding the gawkextlib packages to the Cygwin distribution would be Linux-centric. I presume there must also be some sort of add-on package system for MacOS that we could tackle. I know it's hard, but that's the reality -- every O/S has its own package distribution system. I don't think there are any shortcuts.

Ed Morton

Jul 12, 2016, 3:53:28 AM

On 7/11/2016 10:23 PM, Andrew Schorr wrote:
> On Monday, July 11, 2016 at 10:07:36 PM UTC-4, Kenny McCormack wrote:
>> I have two comments on this thread:
>>
>> 1) The same ground was covered in the recent thread started by me about
>> the "Lightning" extension. For whatever reason(s), the need for
>> performing the extra step (compiling gawkextlib) is just too much
>> bother for many/most users (including, the rather unlikely combo of
>> myself and Ed Morton). You and others can say all you want that it
>> shouldn't be that onerous, but facts are facts, and the fact is
>> that it is.
>
> Agreed. This ground has been covered, although I will never be able to understand why it's so hard for you to install this stuff.

Microwaves are extremely useful. If to have one I had to buy separate parts from
multiple vendors and assemble it all myself I wouldn't have one and I'm sure my
local microwave repairman would be bewildered at that decision since the steps
are documented and he has no problem following them.

The steps required to build the library necessary to parse XML with awk are, I'm
sure, clear and trivial for someone who regularly has to follow those or similar
steps but for the vast majority of us who never have to do anything like that
and just want a tool we can call to parse XML, you may as well be showing us
lists of hardware components and addresses we can order them from and telling us
to go build our microwave. On your side of the fence I'm sure that seems
perfectly reasonable but on our side it falls right into the "you must be
joking" bin.

>> 2) The obvious solution is to make gawkextlib part of the mainline gawk
>> distribution - and, of course, making building same part of the
>> basic, normal "configure/make/make install" routine. If you did
>> this, then it would be there - it would be there pretty much
>> whenever gawk is there (unless the installer went out of his way to
>> not do things the "normal" way). Worrying about getting
>> distributions (e.g., Redhat, Debian, etc) to include it is looking
>> at it from the wrong perspective (as well as, obviously, being
>> Linux-centric).
>
> Here's another fact -- the gawkextlib library is not going to become part of the core gawk distribution, for reasons that have already been stated in previous discussions. I'm open to any other constructive suggestions and/or contributions for how to improve things.

Sorry, I don't know enough about the issues to offer a suggestion. I'm not
trying to be facetious but - did anyone state yet why the functionality to parse
XML isn't just available like `-i inplace` is? I don't mean "because you have to
compile blah blah blah" I mean what's the rationale for it not just being
present? Is it that it'd take too much memory or take too long to download or
something else? If someone already provided that info, sorry if I missed it.

I think at one point I saw an "if more people used it..." argument raise its
head but of course if it was easier to use then far more people would use it so
that's a catch-22.

Ed.

Janis Papanagnou

Jul 12, 2016, 4:30:04 AM

On 12.07.2016 09:53, Ed Morton wrote:
>
> Sorry, I don't know enough about the issues to offer a suggestion. I'm not
> trying to be facetious but - did anyone state yet why the functionality to

> parse XML isn't just available like `-i inplace` is? [...]

I upthread suggested the possibility that for restricted devices there may
be a requirement to keep the executable components small. (Like using a
small shell restricted to POSIX features alone instead of a ksh or bash in
environments like BusyBox.)

Another possibility is an issue about not violating licenses, or at least
being independent of third party licenses, if only to not break systems in
future by such a dependency. (I'm not sure, for example, how compatible the
"MIT license" of the Expat XML parser is with gawk's GPL license.)

Janis

Kenny McCormack

Jul 12, 2016, 8:02:02 AM

In article <nm27lm$3qm$1...@dont-email.me>,

Ed Morton <morto...@gmail.com> wrote:
>On 7/11/2016 10:23 PM, Andrew Schorr wrote:
>> On Monday, July 11, 2016 at 10:07:36 PM UTC-4, Kenny McCormack wrote:
>>> I have two comments on this thread:
>>>
>>> 1) The same ground was covered in the recent thread started by me about
>>> the "Lightning" extension. For whatever reason(s), the need for
>>> performing the extra step (compiling gawkextlib) is just too much
>>> bother for many/most users (including, the rather unlikely combo of
>>> myself and Ed Morton). You and others can say all you want that it
>>> shouldn't be that onerous, but facts are facts, and the fact is
>>> that it is.
>>
>> Agreed. This ground has been covered, although I will never be able to
>> understand why it's so hard for you to install this stuff.
>
>Microwaves are extremely useful. If to have one I had to buy separate
>parts from multiple vendors and assemble it all myself I wouldn't have one
>and I'm sure my local microwave repairman would be bewildered at that
>decision since the steps are documented and he has no problem following
>them.

There's a lot I could say at this point, but I don't really care to
re-engage, as I think this was all covered in the Lightning thread.

I'll just say one thing, though, and that is that these days (in the Linux
world), most code that is running on systems did not get there via the
traditional "configure/make/make-install" route. It got there via (something
like) "apt-get install". The mentality nowadays is that the dirty work of
"configure/make/make-install" is for "other people".

And that's the point. If gawkextlib were part of the core gawk build, then
people who do "apt-get install" would just have it. It would be there.
And their downstream customers (e.g., Mr. Morton) would be happy campers
indeed.

P.S. An even more basic analogy (more basic than your microwaves analogy)
would be cars. You can (still to this day) buy kits for cars, and you
build it yourself. For extremely high-end cars, this is still the norm.
But I think most people still prefer to buy their cars "turnkey" (This is,
in fact, the origin of the word "turnkey" - a word which is now widely used
in the context of computers and software).

P.P.S. The above is also true for hobbyist type airplanes. You can buy
them in kit form and you build them yourself.

--
Donald Drumpf claims to be "the least racist person you'll ever meet".

This would be true if the only other person you've ever met was David Duke.

Marc de Bourget

Jul 12, 2016, 9:04:10 AM

Le mardi 12 juillet 2016 14:02:02 UTC+2, Kenny McCormack a écrit :
> In article,
> Ed Morton:

> >On 7/11/2016 10:23 PM, Andrew Schorr wrote:
> >> On Monday, July 11, 2016 at 10:07:36 PM UTC-4, Kenny McCormack wrote:

...

> And that's the point. If gawkextlib were part of the core gawk build, then
> people who do "apt-get install" would just have it. It would be there.
> And their downstream customers (e.g., Mr. Morton) would be happy campers
> indeed.
>

Yes, this could be like CPAN for Perl or WHEELS for Python or GEMS for Ruby:
Python:> pip install some-package.whl
Ruby:> gem install some-package.gem

Andrew Schorr

Jul 12, 2016, 1:06:49 PM

On Tuesday, July 12, 2016 at 8:02:02 AM UTC-4, Kenny McCormack wrote:
> I'll just say one thing, though, and that is that these days (in the Linux
> world), most code that is running on systems did not get there via the
> traditional "configure/make/make-install" route. It got there via (something
> like) "apt-get install". The mentality nowadays is that the dirty work of
> "configure/make/make-install" is for "other people".

So we agree -- the best solution is to get the gawkextlib packages added to the standard O/S package distribution mechanisms so you can use commands such as apt-get or dnf.

> And that's the point. If gawkextlib were part of the core gawk build, then
> people who do "apt-get install" would just have it. It would be there.
> And their downstream customers (e.g., Mr. Morton) would be happy campers
> indeed.

Perhaps there's some confusion about terms here. There is a specific gawkextlib shared library that is used by various gawk extension libraries that are published in the gawkextlib project. Bundling the gawkextlib support library with core gawk would accomplish nothing, since you will still have to install an individual gawkextlib module in order to do anything.

This is moot anyway -- there is not a snowball's chance in hell that any of this stuff will ever become part of core gawk. It simply makes no sense. Does perl include each and every perl module in the base perl distribution? It's inconceivable that this would ever happen. And the gawkextlib support library will also not be bundled. It's simply not part of core gawk.

> P.S. An even more basic analogy (more basic than your microwaves analogy)
> would be cars. You can (still to this day) buy kits for cars, and you
> build it yourself. For extremely high-end cars, this is still the norm.
> But I think most people still prefer to buy their cars "turnkey" (This is,
> in fact, the origin of the word "turnkey" - a word which is now widely used
> in the context of computers and software).

These analogies to ovens and cars make no sense to me. We're talking about open-source software here. Please let's confine the discussion to software distribution mechanisms. But if you really insist on silly analogies, a more relevant one is whether a car comes bundled with the bicycles or surfboards that you might strap to the vehicle when you go on vacation.

Realistically speaking, there are 2 options: 1. find a way to get gawkextlib packages added to various O/S distributions' standard packaging platforms; 2. develop a gawk-specific module extension system like CPAN, etc. I think #2 is the wrong approach, since #1 is really much better and easier to use. But I'm open to ideas, suggestions, and contributions.

Regards,
Andy

Aharon Robbins

Jul 12, 2016, 1:12:27 PM

In article <62a5eb16-58b4-49e0...@googlegroups.com>,

Marc de Bourget <marcde...@gmail.com> wrote:
>Yes, this could be like CPAN for Perl or WHEELS for Python or GEMS for Ruby:
>Python:> pip install some-package.whl
>Ruby:> gem install some-package.gem

Eric Raymond in "The Cathedral and the Bazaar" (http://www.catb.org/esr/writings/cathedral-bazaar/cathedral-bazaar/ar01s02.html)
says

1. Every good work of software starts by scratching a
developer's personal itch.

There is clearly an itch here waiting to be scratched. Neither I nor
Andrew are going to tackle this particular itch; we have our hands full
with our own, other, personal itches.

So, it's clearly time for "the community" to step up to the plate and
put something like this together. As Andrew mentioned, a volunteer to
work on getting gawkextlib added to the various Linux distributions
would also be welcome.

If no one is willing to step forward, then clearly, it's not important
enough to be worth doing.
--
Aharon (Arnold) Robbins arnold AT skeeve DOT com

Aharon Robbins

Jul 12, 2016, 1:26:46 PM

In article <nm38dq$nbl$1...@dont-email.me>,

I'll even help any potential volunteers out with two starting points:

1. Fedora: https://fedoraproject.org/wiki/Join_the_package_collection_maintainers
2. Debian: https://www.debian.org/doc/manuals/distribute-deb/distribute-deb.html

Go for it!

Ed Morton

Jul 13, 2016, 10:09:38 PM

Thanks all for the responses. It sounds like it's probably time to start
learning xmlstarlet :-( ...

Ed.

Andrew Schorr

Jul 13, 2016, 10:50:38 PM

On Wednesday, July 13, 2016 at 10:09:38 PM UTC-4, Ed Morton wrote:
> Thanks all for the responses. It sounds like it's probably time to start
> learning xmlstarlet :-( ...

I am gobsmacked that you think this is easier than downloading and installing 2 tarballs and running "gawk -l xml". You could easily have installed the gawk-xml extension in less time than you have wasted on this discussion.

Regards,
Andy

Ed Morton

Jul 13, 2016, 11:13:34 PM

Andy - I either have to learn how gawk's XML parsing works or how xmlstarlet's XML
parsing works. That is probably about a wash.

On top of that, though, if I choose to learn how to do it in gawk then I also
have to learn how to download and install gawk's XML library (and remember - I
have never had to do that for any tool so it'd be a whole new learning
experience) on each machine I use (e.g. cygwin on my laptop at home, my macbook
at work, etc.) and any time I switch machines (e.g. if I have to borrow a
colleague's laptop) I have to re-learn how to do it again while if I choose to
learn how to parse XML with xmlstarlet then I just call the tool as it's already
present everywhere I want to do this.

If I need support for parsing XML with gawk, since I'm not sure if many people
are using it, I suspect it'd be hit or miss (and probably OT) asking questions
in this NG while if I use xmlstarlet there is a forum for asking questions
(http://stackoverflow.com/questions/tagged/xmlstarlet) and a large community of
users. When I looked for gawk+xml on stackoverflow there was 1 question posted
(http://stackoverflow.com/questions/tagged/gawk+xml) and a non-gawk answer accepted.

So, while I would love to learn how to do this in gawk, the effort involved in
building/installing the library is just the final nail in the coffin of my
enthusiasm.

My time was not wasted in this discussion, I now know that to use gawk XML
parsing would require the effort I thought it did and that doesn't seem like it's
going to change any time soon so I can give up on that and start learning
xmlstarlet or similar. I apologize if this discussion wasted your time.

I appreciate everything the gawk guys have provided and I understand they
already spend a ton of time on gawk and can't do everything.

Hopefully that explanation unsmacks your gob :-).

Ed.

Andrew Schorr

Jul 14, 2016, 10:10:27 AM

On Wednesday, July 13, 2016 at 11:13:34 PM UTC-4, Ed Morton wrote:
> On top of that, though, if I choose to learn how to do it in gawk then I also
> have to learn how to download and install gawk's XML library (and remember - I
> have never had to do that for any tool so it'd be a whole new learning
> experience) on each machine I use (e.g. cygwin on my laptop at home, my macbook
> at work, etc.) and any time I switch machines (e.g. if I have to borrow a
> colleague's laptop) I have to re-learn how to do it again while if I choose to
> learn how to parse XML with xmlstarlet then I just call the tool as it's already
> present everywhere I want to do this.

That's a fair point. I didn't realize you were planning to use multiple computers with multiple operating system variants.

> If I need support for parsing XML with gawk, since I'm not sure if many people
> are using it, I suspect it'd be hit or miss (and probably OT) asking questions
> in this NG while if I use xmlstarlet there is a forum for asking questions
> (http://stackoverflow.com/questions/tagged/xmlstarlet) and a large community of
> users. When I looked for gawk+xml on stackoverflow there was 1 question posted
> (http://stackoverflow.com/questions/tagged/gawk+xml) and a non-gawk answer accepted.

I think you could find support for gawk-xml in comp.lang.awk. You could also try the gawkextlib-users mailing list. The gawk-xml package comes with extensive documentation and examples.

> My time was not wasted in this discussion, I now know that to use gawk XML
> parsing would require the effort I thought it did and that doesn't seem like it's
> going to change any time soon so I can give up on that and start learning
> xmlstarlet or similar. I apologize if this discussion wasted your time.

Actually, this message was quite constructive, unlike some of the previous ones. I now have a much better understanding of your situation. In the long run, I hope we can get gawk-xml and other gawk extensions into the standard O/S distributions so that it will be as easily available as xmlstarlet.

> I appreciate everything the gawk guys have provided and I understand they
> already spend a ton of time on gawk and can't do everything.
>
> Hopefully that explanation unsmacks your gob :-).

It did. Thank you.

Regards,
Andy

MadSharker

Jul 17, 2016, 4:26:01 AM

Hi all,
I'd like to join this thread with just a simple question.
Since I like awk so much and I need to parse XML files, until now I have used
inconvenient multiple-flag tricks to do the job.
So, why not put XML parsing functions in the basic gawk code, like the very
useful (but too limited, for me) networking functions are?

Joe User

Jul 17, 2016, 3:50:53 PM

I have written a lot of gawk code to parse GED (genealogy) files. Why
doesn't gawk support GED file parsing natively?

It's kind of the same thing. There's got to be a line drawn. XML
processing is usually outside of the language definition.

Ed Morton

Jul 17, 2016, 4:50:34 PM

I understand that point of view but I participate in a few forums that field awk
questions and when there is a particular file format to be parsed, the
overwhelming majority are one of these 3:

CSV
XML
JSON

After those 3, the numbers dwindle towards zero.

I don't think CSV is a surprise and we already have a decent way to handle that
with FPAT (though something a bit more robust would be nice).
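
For readers who have not used it, the FPAT approach mentioned above can be sketched like this. The pattern is the simple one along the lines of the gawk manual's CSV example; the wrapper function `csv_fields` is a hypothetical name used only here. As a sketch it shares the limits alluded to: it does not handle empty fields, quotes embedded in a field, or fields spanning lines. It requires GNU awk.

```shell
# Sketch only: split CSV lines into fields with gawk's FPAT.
# FPAT describes what a field looks like, instead of what separates fields.
csv_fields() {
    gawk '
    BEGIN { FPAT = "([^,]+)|(\"[^\"]+\")" }   # a run of non-commas, or a double-quoted string
    { for (i = 1; i <= NF; i++) printf "%d: %s\n", i, $i }
    '
}

# One line with a quoted field that contains a comma.
printf '%s\n' 'a,"b, c",d' | csv_fields
```

The quoted field comes through as a single field with its surrounding quotes still attached; stripping them is left to the caller.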

Beyond just my observations I think the supporting evidence that there's a high
demand for XML and JSON parsers is that people have provided external tools to
do so, e.g. xmlstarlet and jq, and they have active user communities (e.g.
http://stackoverflow.com/questions/tagged/xmlstarlet and
http://stackoverflow.com/questions/tagged/jq).

Given that, I think XML and JSON parsers as part of gawk would be very useful to
a large community and it's after those that the line should be drawn but it's
just an opinion and I don't expect anything to come of it.

Ed.

Andrew Schorr

Jul 17, 2016, 7:11:57 PM

On Sunday, July 17, 2016 at 4:50:34 PM UTC-4, Ed Morton wrote:
> I understand that point of view but I participate in a few forums that field awk
> questions and when there is a particular file format to be parsed, the
> overwhelming majority are one of these 3:
>
> CSV
> XML
> JSON

YAML is another one that pops up.

> Given that, I think XML and JSON parsers as part of gawk would be very useful to
> a large community and it's after those that the line should be drawn but it's
> just an opinion and I don't expect anything to come of it.

Yeah, sorry, I really don't believe these extra parsers will ever become part of the core gawk distribution. We built the gawk extension library API with this problem in mind. We have an XML extension, and there has been some work on a CSV parser. JSON and YAML would also be good candidates for external parsers.

The right solution is to get these popular libraries into the standard O/S distribution mechanisms so that it can be as easy to install them as it is to install gawk. But we can't just throw the kitchen sink into the core gawk distribution -- that would create a maintenance nightmare, as well as a lot of inappropriate dependencies for what's supposed to be a basic O/S tool that should be part of every distribution, even minimal ones. It's the same reason that Perl doesn't include XML and JSON modules in the core perl distribution. These types of added functionality belong in add-on modules.

Regards,
Andy

Kenny McCormack

Jul 18, 2016, 9:18:54 AM

In article <b246e$578be1ce$adf2c12d$21...@API-DIGITAL.COM>,

Joe User <ax...@yahoo.com> wrote:
>MadSharker wrote:
>
>> Hi all,
>> I attach to this subject with just a simple question.
>> Since I like awk so much and I need to parse XML files, until now I
>> have used inconvenient multiple-flag tricks to do the job.
>> So, why not put XML parsing functions in the basic gawk code, as the very
>> useful (but too limited, for me) networking functions are?

Andy has explained the "why" of this from a political point of view.
Making it part of the core just isn't in the world view of the current GAWK
developers/maintainers.

A couple of other comments:
1) In a way, the networking functionality *is* an aberration. In the
long view, it probably should have been implemented as an extension
library rather than as native functionality. But it was
implemented before extensions (in their current form) existed.
Note: I am not sure of the exact time-line between the networking
and the implementation of the first version of extension libs. But
given that that first version was "pre-alpha", it probably doesn't
matter much.
2) The primary reason why GAWK XML parsing should not be in the core is
because it is (AFAIK) pretty much just a thin wrapper around
something called "xpat". What this means is that the core would
also have to include xpat itself (one way or the other) - and,
clearly, down that road lies chaos.

>I have written a lot of gawk code to parse GED (genealogy) files. Why
>doesn't gawk support GED file parsing natively?
>
>It's kind of the same thing. There's got to be a line drawn. XML
>processing is usually outside of the language definition.

The essence of these threads is that there is a conceptual difference
between sharing GAWK code (which is what your GED stuff is) and sharing
binaries (which is what the XML [and other extension libs] are). We have
reasonably good methods in place for the former; we don't have them for the
latter.

--
"I think I understand delicate, but why do I have to wash my hands, and
be standing in cold water when doing it?"

Kaz Kylheku <k...@kylheku.com> in comp.lang.c

Andrew Schorr

Jul 18, 2016, 9:57:52 PM

On Monday, July 18, 2016 at 9:18:54 AM UTC-4, Kenny McCormack wrote:
> A couple of other comments:
> 1) In a way, the networking functionality *is* an aberration. In the
> long view, it probably should have been implemented as an extension
> library rather than as native functionality. But it was
> implemented before extensions (in their current form) existed.
> Note: I am not sure of the exact time-line between the networking
> and the implementation of the first version of extension libs. But
> given that that first version was "pre-alpha", it probably doesn't
> matter much.

I agree completely. With the latest extension API version (not yet released), we have the ability to implement networking in a much better way, including full multiplexing. I think networking would not be inside core gawk had the extension API existed at the time the work was done. We just need somebody now to take the time to program up a good networking library for gawk.

> 2) The primary reason why GAWK XML parsing should not be in the core is
> because it is (AFAIK) pretty much just a thin wrapper around
> something called "xpat". What this means is that the core would
> also have to include xpat itself (one way or the other) - and,
> clearly, down that road lies chaos.

The library name is expat. If XML were inside gawk, then installing gawk would require that the expat library also be installed. And suppose the new version of the expat library had an incompatible API? Then we would have to patch core gawk in sync with it. That approach is not scalable; that way madness lies.

Regards,
Andy

Kenny McCormack

Jul 19, 2016, 12:12:32 AM

In article <42cf330f-a92e-4ae7...@googlegroups.com>,
Andrew Schorr <asc...@telemetry-investments.com> wrote:
...

>The library name is expat. If XML were inside gawk, then installing gawk
>would require that the expat library also be installed. And suppose the
>new version of the expat library had an incompatible API? Then we would
>have to patch core gawk in sync with it. That approach is not scalable;
>that way madness lies.

Glad to see that we agree on both these points.

One question, though: Is it chaos or madness that lies down that road?
Or both?

--
The randomly chosen signature file that would have appeared here is more than 4
lines long. As such, it violates one or more Usenet RFCs. In order to remain in
compliance with said RFCs, the actual sig can be found at the following web address:
http://www.xmission.com/~gazelle/Sigs/Reaganomics

Aharon Robbins

Jul 19, 2016, 1:03:42 PM

In article <nmk9bf$8a7$1...@news.xmission.com>,

Kenny McCormack <gaz...@shell.xmission.com> wrote:
>One question, though: Is it chaos or madness that lies down that road?
>Or both?

Both. Most definitely both.

MadSharker

Jul 19, 2016, 1:39:23 PM

"Andrew Schorr" <asc...@telemetry-investments.com> ha scritto nel messaggio
news:42cf330f-a92e-4ae7...@googlegroups.com...

> I agree completely. With the latest extension API version (not yet
> released), we have the ability to implement networking in a
> much better way, including full multiplexing. I think networking would not
> be inside core gawk had the extension API existed
> at the time the work was done. We just need somebody now to take the time
> to program up a good networking library for gawk.

Great news! So, can I expect gawk will be able to manage https without being
forced to call external programs like wget?

Joe User

Jul 19, 2016, 3:37:52 PM

MadSharker wrote:

> Great news! So, can I expect gawk will be able to manage https without
> being forced to call external programs like wget?
>

And right after that, gawk has to be able to evaluate JavaScript, to handle
those pesky JS-heavy webpages.

pjfarley3

Jul 19, 2016, 8:11:34 PM

On Monday, July 18, 2016 at 9:57:52 PM UTC-4, Andrew Schorr wrote:
> On Monday, July 18, 2016 at 9:18:54 AM UTC-4, Kenny McCormack wrote:
> > A couple of other comments:

> > 2) The primary reason why GAWK XML parsing should not be in the core is
> > because it is (AFAIK) pretty much just a thin wrapper around
> > something called "xpat". What this means is that the core would
> > also have to include xpat itself (one way or the other) - and,
> > clearly, down that road lies chaos.
>
> The library name is expat. If XML were inside gawk, then installing gawk would require that the expat library also be installed. And suppose the new version of the expat library had an incompatible API? Then we would have to patch core gawk in sync with it. That approach is not scalable; that way madness lies.

Just chiming in here for closer integration of the gawkextapi and core gawk. I have no problem if the XML library and other gawkextlib-dependent extensions are separate from core gawk, but IMHO gawkextapi itself should be bundled with core gawk and build with it if the user chooses to build their own copy.

For non-*ix systems one also ought to be able to download a binary+doc+sample zip package (or gzip, or 7z or whatever compressor you prefer) that includes core gawk and the gawkextapi library and then download separate binary+doc+sample zip packages implementing some extension logic (like XML), so that unzipping both core and extension zipped packages to the same install directory enables gawk to be executed on that non-*ix system with the extensions available. Eli does yeoman's work using Mingw(32) to generate and distribute gawk and many other useful utilities and libraries for the Mingw environment, but he certainly shouldn't be the only one doing it.

I do understand the need for volunteers to package things up for the various *ix distribution archives (rpm or deb or apt or whatever), and I would hope a volunteer may step up to do the same for the non-*ix platforms. With, as I understand it, at least one of the Mingw environments available for cross-platform building on Linux systems, I would hope that makefile targets could be set up to cross-build all the needed non-*ix platform executables and libraries under Linux, but maybe I am dreaming.

For that matter, shouldn't rpm or deb or apt or whatever targets also be able to be set up in the makefiles to automatically create them for *ix distribution archives? I would think that capability has already been invented, hasn't it? Or is it so unique to each distribution that you have to create it using that distribution?

Peter

Andrew Schorr

Jul 20, 2016, 10:28:04 AM

On Tuesday, July 19, 2016 at 1:39:23 PM UTC-4, MadSharker wrote:
> Great news! So, can I expect gawk will be able to manage https without being
> forced to call external programs like wget?

That could be possible if somebody writes a suitable extension library. As far as I'm concerned, wget works perfectly well, so I'm not motivated to implement this.

Regards,
Andy

Andrew Schorr

Jul 20, 2016, 10:35:51 AM

On Tuesday, July 19, 2016 at 8:11:34 PM UTC-4, pjfarley3 wrote:
> Just chiming in here for closer integration of the gawkextapi and core gawk. I have no problem if the XML library and other gawkextlib-dependent extensions are separate from core gawk, but IMHO gawkextapi itself should be bundled with core gawk and build with it if the user chooses to build their own copy.

Sheesh. I really don't understand why this is such a big issue. It takes literally less than 3 seconds to build and install libgawkextlib. Why does it matter whether it is bundled with core gawk? As far as I'm concerned, it's totally fine that libgawkextlib uses the same installation mechanisms as the actual extension libraries. If you can figure out how to install an extension library, then you will also know how to install libgawkextlib. The bigger issue is how to make it easier to install extension libraries. If we solve that, then libgawkextlib comes along for the ride.

> For non-*ix systems one also ought to be able to download a binary+doc+sample zip package (or gzip, or 7z or whatever compressor you prefer) that includes core gawk and the gawkextapi library and then download separate binary+doc+sample zip packages implementing some extension logic (like XML), so that unzipping both core and extension zipped packages to the same install directory enables gawk to be executed on that non-*ix system with the extensions available. Eli does yeoman's work using Mingw(32) to generate and distribute gawk and many other useful utilities and libraries for the Mingw environment, but he certainly shouldn't be the only one doing it.

Should this zip file also include the expat library? How about the lmdb library? Who will maintain this? What happens when a new version of expat is released?

> I do understand the need for volunteers to package things up for the various *ix distribution archives (rpm or deb or apt or whatever), and I would hope a volunteer may step up to do the same for the non-*ix platforms. With, as I understand it, at least one of the Mingw environments available for cross-platform building on Linux systems, I would hope that makefile targets could be set up to cross-build all the needed non-*ix platform executables and libraries under Linux, but maybe I am dreaming.

I don't know whether this is possible. Cross-compilation seems like a dangerous way to go.

> For that matter, shouldn't rpm or deb or apt or whatever targets also be able to be set up in the makefiles to automatically create them for *ix distribution archives? I would think that capability has already been invented, hasn't it? Or is it so unique to each distribution that you have to create it using that distribution?

The gawkextlib tarballs include RPM spec files already. If you have an RPM-based system like Fedora or RHEL or CentOS, you can simply say:

rpmbuild -tb <blah>.tar.gz

That creates a binary RPM that you can then install using the rpm command. We would be happy to accept contributions to make it easier to install on other distributions.

That being said, I just noticed that the spec file for gawkextlib has an error on newer versions of Fedora that have added some checks for installed but unpackaged files, so that needs a little work.

Regards,
Andy

MadSharker

Jul 20, 2016, 12:52:05 PM

"Andrew Schorr" <asc...@telemetry-investments.com> ha scritto nel messaggio

news:e32b8c11-0b31-4074...@googlegroups.com...

It does work well, no complaints at all. But it has to be called as an
external co-processing program. It would be far better if both http and
https were managed in the same way, ideally as part of gawk (extension
or not).
I realise it is hard work - I am just a fanatical gawk user, so be
patient with me :-) - but I wonder whether it would be just a matter of
adapting some piece of open source GNU code that someone else has already
developed for managing secure connections over http.

Janis Papanagnou

Jul 20, 2016, 1:09:36 PM

Given that wget or curl support tons of options I wonder whether you want
to see some very small subset (which one?) or a complete reimplementation
of all those programs' features. The latter most surely won't happen, and
for good reasons, IMO.

FWIW; awk's "curl ..." | getline var serves me well. What requirements
do you have to want curl/wget directly/natively supported in awk?

Janis
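For readers unfamiliar with the idiom mentioned above, here is a minimal sketch of reading a command's output from inside awk with `cmd | getline var`. The function name `read_cmd_output` is made up for illustration, and a harmless `echo` command stands in for curl so the sketch runs without network access. With curl you would pass something like `curl -s <url>` as the command string instead; that substitution is an assumption, not something tested here.

```shell
#!/bin/sh
# Hypothetical helper: run a command from inside awk and number its output lines.
read_cmd_output() {
    awk -v cmd="$1" 'BEGIN {
        n = 0
        # (cmd | getline line) returns 1 per line read, 0 at EOF, -1 on error
        while ((cmd | getline line) > 0) {
            n++
            print n ": " line
        }
        close(cmd)   # close the pipe so the same command could be run again
    }'
}

# Stand-in for a curl invocation
read_cmd_output "echo alpha; echo beta"
```

The parentheses around `cmd | getline line` avoid the precedence ambiguities that unparenthesized getline expressions are known for.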

Kaz Kylheku

Jul 20, 2016, 1:36:06 PM

I looked at what it takes to do a decent integration of libcurl into a
language and came to the conclusion that the equivalent of a popen
on a curl command line not only achieves the same functionality,
but is far less risky compared to all the surrounding C coding.

Even if you do everything correctly, the piped solution provides better
isolation. The piped curl is in its own process that can be killed off
if it hangs, with no consequences to the parent.

Integrating curl into the same image is only worth it if you're
implementing something sophisticated. Like, say you want to be able to
provide progress callbacks to the application (so it can render some
progress report on a UI). Or you want less copying and buffering for
efficiency.
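The isolation point above can be sketched roughly as follows. This assumes the `timeout` utility from GNU coreutils is installed (it is not part of awk and may be missing on some systems), and the name `run_with_deadline` is invented for the example. A `sleep` stands in for a hung curl. The child's exit status is echoed into the pipe so the awk side does not depend on what `close()` returns, which differs between awk implementations.

```shell
#!/bin/sh
# Hypothetical helper: run a command under a deadline from inside awk.
# $1 = deadline in seconds, $2 = command string
run_with_deadline() {
    awk -v secs="$1" -v cmd="$2" 'BEGIN {
        # Assumes the coreutils "timeout" utility; it kills the child at the deadline.
        wrapped = "timeout " secs " " cmd "; echo \"status=$?\""
        while ((wrapped | getline line) > 0)
            print line
        close(wrapped)
        # The awk process itself is unaffected by the killed child.
        print "parent still running"
    }'
}

# "sleep 5" plays the role of a hung download; GNU timeout reports status 124 here.
run_with_deadline 1 "sleep 5"
```

The hung child is killed after one second and awk simply carries on, which is the property being described: the failure stays inside the child process.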

MadSharker

Jul 21, 2016, 3:44:27 PM

"Janis Papanagnou" <janis_pa...@hotmail.com> ha scritto nel messaggio
news:nmob8e$74d$1...@news-1.m-online.net...

>
> FWIW; awk's "curl ..." | getline var serves me well. What requirements
> do you have to want curl/wget directly/natively supported in awk?

It's just a matter of portability. Sometimes I develop scripts for people
who are not aware of what they are using. They just want to make a few
selections and get their results back. In such cases, asking them to check for
the presence of curl, wget or something else, or even to install new software
on their system, is just a bloodbath. Mac users in particular.

Janis Papanagnou

Jul 21, 2016, 3:50:45 PM

Your users don't necessarily need to know that they are using curl if you
use it inside your awk program as shown above. Just checked a macbook and
curl is available in the system's default configuration.

Janis

Kaz Kylheku

Jul 21, 2016, 5:47:00 PM

Also, suppose you deploy a long-running application which spawns curl
jobs. You can upgrade curl (e.g. to fix a security problem or whatever)
without stopping that application. The app will just begin spawning
the new executable of curl when it replaces the old.

If the app is linking to the curl library, that library has to be wrapped
in a plugin container that cleanly supports unloading and reloading.
That requires taking care of details like getting all the threads to
leave the library before unloading it.

The command line API is more robust against changes. Command line
interfaces require careful versioning, just like C function calls, but
in practice that is easier and more resilient. Versioning C APIs is a very
fragile business.

pjfarley3

Jul 22, 2016, 12:03:14 AM

On Wednesday, July 20, 2016 at 10:35:51 AM UTC-4, Andrew Schorr wrote:
> Sheesh. I really don't understand why this is such a big issue.

Look at it from the point of view of the Windows user of gawk. There is no pre-existing build environment available. If you need to build a tool or an API or a library, first you have to install a whole build environment, Cygwin or mingw or mingw64 to speak of those I know. None of them easy to install and set up, though I must admit that Cygwin can be made almost easy enough (see the babun project, and the new Cygnal project has distinct possibilities when added to that).

But for someone who just wants the text manipulation tool one gets with gawk to perform real-world work, one does NOT want, nor should one need, to build it oneself. Just bundle everything together in a binary+doc+sample archive and let me install that where I can execute it.

As I said previously, I bless Eli Z. for doing that for core gawk via his ezwinports site, but why isn't the process of building that Windows archive part of every release of gawk by the maintainers that is automatically available from github when a release becomes stable?

> Should this zip file also include the expat library? How about the lmdb library? Who will maintain this? What happens when a new version of expat is released?

In a word, yes. As a maintainer, all you have to do is pick the library version that lets the Windows executable pass the "make check" process and bundle that version of the library with the archive that uses that library. The Windows user does not care if the upstream library changes, they are not getting those upstream updates unless they go looking for them, so there is no library change to worry about. When the next gawk or gawkextlib or XML extension is released, it gets bundled with the library version that works for that release.

> > With, as I understand it, at least one of the Mingw environments available for cross-platform building on Linux systems, I would hope that makefile targets could be set up to cross-build all the needed non-*ix platform executables and libraries under Linux, but maybe I am dreaming.
>
> I don't know whether this is possible. Cross-compilation seems like a dangerous way to go.

I don't understand why you think that cross-compilation is any more dangerous than self-hosted compilation. If a self-hosted compilation process causes an issue, you go to the compiler maintainers to find a solution. If there isn't one you stay on the prior working release until it is fixed. The same is true for the cross-compilation process. Where is the danger?

> The gawkextlib tarballs include RPM spec files already. If you have an RPM-based system like Fedora or RHEL or CentOS, you can simply say:
>
> rpmbuild -tb <blah>.tar.gz
>
> That creates a binary RPM that you can then install using the rpm command. We would be happy to accept contributions to make it easier to install on other distributions.

My point was that the build processes for any widely used tool ought by now to include all the archive-building processes needed for any available distribution. It's already been done or there wouldn't be distributions. Just copy in the appropriate code/specs/etc. and build it as part of each stable release. I have to believe it has been automated to a large extent for any "normal" tool like gawk.

Don't get me wrong here - I am *not* faulting the hard-working and selfless volunteer maintainers who keep gawk alive and well for not working hard enough. Did I have the time myself I would be one of them, helping to do the things I have spoken about. I just am asking why these things have not been made priorities yet.

Like I think Ed M. has been trying to say, building tools is not what we are paid to do, using them is. I guess that makes us vampires of your work, and I am sorry for that. I would change it if I could.

Regards,

Peter

Aharon Robbins

Jul 22, 2016, 3:51:13 AM

In article <23fc9574-b9ea-41be...@googlegroups.com>,

pjfarley3 <pjfa...@yahoo.com> wrote:
>As I said previously, I bless Eli Z. for doing that for core gawk via
>his ezwinports site, but why isn't the process of building that Windows
>archive part of every release of gawk by the maintainers that is
>automatically available from github when a release becomes stable?

Because we don't use Github at all.

Because the GNU project, as a matter of policy, is concerned only with
maintaining and publishing source code.

Because there isn't enough volunteer time available to do these things.

>Don't get me wrong here - I am *not* faulting the hard-working and
>selfless volunteer maintainers who keep gawk alive and well for not
>working hard enough. Did I have the time myself I would be one of them,
>helping to do the things I have spoken about. I just am asking why
>these things have not been made priorities yet.

Because the volunteers currently active are concerned with fixing
bugs and improving the code. Mucking about with binary releases moves
neither of those objectives forward.

Andrew Schorr

Jul 22, 2016, 9:34:35 AM

On Friday, July 22, 2016 at 12:03:14 AM UTC-4, pjfarley3 wrote:
> On Wednesday, July 20, 2016 at 10:35:51 AM UTC-4, Andrew Schorr wrote:
> > Sheesh. I really don't understand why this is such a big issue.
>
> Look at it from the point of view of the Windows user of gawk. There is no pre-existing build environment available. If you need to build a tool or an API or a library, first you have to install a whole build environment, Cygwin or mingw or mingw64 to speak of those I know. None of them easy to install and set up, though I must admit that Cygwin can be made almost easy enough (see the babun project, and the new Cygnal project has distinct possibilities when added to that).

I understand the desire for binaries, but in the absence of binaries, I do not understand the insistence on not having to build libgawkextlib in addition to the desired gawk extension library modules. The gawkextlib maintainers went to a great deal of effort to ensure that the libraries build on unix variants, Cygwin, and on MacOS. Supporting MinGW or providing binary packages would require additional volunteers to step up to work on these issues.

> > Should this zip file also include the expat library? How about the lmdb library? Who will maintain this? What happens when a new version of expat is released?
>
> In a word, yes. As a maintainer, all you have to do is pick the library version that lets the Windows executable pass the "make check" process and bundle that version of the library with the archive that uses that library. The Windows user does not care if the upstream library changes, they are not getting those upstream updates unless they go looking for them, so there is no library change to worry about. When the next gawk or gawkextlib or XML extension is released, it gets bundled with the library version that works for that release.

It's fine if a volunteer wants to step forward to do this, but the current developers simply don't have the time or desire to tackle this type of project.

> I don't understand why you think that cross-compilation is any more dangerous than self-hosted compilation. If a self-hosted compilation process causes an issue, you go to the compiler maintainers to find a solution. If there isn't one you stay on the prior working release until it is fixed. The same is true for the cross-compilation process. Where is the danger?

Cross-compilation is nice in theory, but I wouldn't really trust it to work on the target platform without testing it there. If I'm going to test it on the target, why not simply build it on the target?

> My point was that the build processes for any widely used tool ought by now to include all the archive-building processes needed for any available distribution. It's already been done or there wouldn't be distributions. Just copy in the appropriate code/specs/etc. and build it as part of each stable release. I have to believe it has been automated to a large extent for any "normal" tool like gawk.

I wish it were as easy as you say it should be. I don't believe that it is the case. I would love to be enlightened about how to do this better. Each gawkextlib tarball contains a packaging subdirectory where we would be happy to include support to get this done. Please help us to improve it by providing patches or at least specific instructions on what to do.

> Like I think Ed M. has been trying to say, building tools is not what we are paid to do, using them is. I guess that makes us vampires of your work, and I am sorry for that. I would change it if I could.

I'm not paid to work on gawk or gawkextlib either. I do it because I use these tools and want to see them improve so I can become more productive in the work for which I do get paid. If I were paid to do this stuff, I would happily spend more time on it. The community relies upon volunteers to help improve these open-source projects. There has been lots of discussion in this forum about how the installation of gawkextlib libraries should be improved. That has consumed a lot of time that could be better spent working on actually improving the situation. If people really care so much about improving installation, then why not spend the time working on actually improving it? What we need are concrete patches and/or specific suggestions on what steps to take, and these must be consistent with the design philosophy and spirit of the project.

Regards,
Andy

Janis Papanagnou

Jul 22, 2016, 10:27:15 AM

On 22.07.2016 15:34, Andrew Schorr wrote:
>
> Cross-compilation is nice in theory, but I wouldn't really trust it to work
> on the target platform without testing it there. If I'm going to test it on
> the target, why not simply build it on the target?

I think this one is easily answered (and has probably been mentioned before):
because the target might not have a development environment (compilers,
linkers, etc.) installed.

I agree that testing on every target system would always be preferable. But
testing is a process independent from building the tool.

Janis

Andrew Schorr

Jul 22, 2016, 3:47:31 PM

On Wednesday, July 20, 2016 at 10:35:51 AM UTC-4, Andrew Schorr wrote:

> That being said, I just noticed that the spec file for gawkextlib has an error on newer versions of Fedora that have added some checks for installed but unpackaged files, so that needs a little work.

FYI, I just uploaded new versions of gawkextlib and gawk-xml that fix some minor RPM spec file issues.

Regards,
Andy

Andrew Schorr

Jul 23, 2016, 5:34:29 PM

On Friday, July 22, 2016 at 3:47:31 PM UTC-4, Andrew Schorr wrote:
> FYI, I just uploaded new versions of gawkextlib and gawk-xml that fix some minor RPM spec file issues.

And I have submitted Fedora package requests to get these added to the Fedora distribution:

https://bugzilla.redhat.com/show_bug.cgi?id=1359412
https://bugzilla.redhat.com/show_bug.cgi?id=1359416

If that happens, I hope it may pave the way for getting these added to other distributions. But first, I need a Fedora sponsor to move these forward. I have no idea how long this process may take.

Regards,
Andy

jh

Aug 12, 2016, 5:17:10 PM

On Monday, July 11, 2016 at 5:20:22 AM UTC-4, Ed Morton wrote:
> I'd love to be able to parse XML with awk rather than having to learn xmlstarlet
> or similar.
>
> I THINK from reading the documentation
> (https://www.gnu.org/software/gawk/manual/gawk.html#gawkextlib) that to do so
> I'd have to download gawkextlib code, download the Expat XML parser library,
> download GNU Autotools, download the XML parser extension, and then build the
> gawkextlib library (and then use it as gawk -i gawkextlib ...?).
>
> No matter how useful this functionality is, I'm just never going to do all of
> that - am I right in thinking I can't just do:
>
> gawk -i something ...
>
> to use the gawk XML parsing extension, like I can do
>
> gawk -i inplace ...
>
> to use the gawk inplace editing extension? If so, why?
>
> Ed.

Hi Ed,
Depending on the complexity of what you need to do in XML, there are pure AWK solutions still around from the days before the xml extension. They're discussed in chapter 2 of http://gawkextlib.sourceforge.net/xmlgawk.html. HTH

Jim Hart
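To give a flavour of what a "pure AWK" approach can look like, here is a minimal sketch. It is not necessarily the technique described in the chapter referenced above, and the function name `xml_text_pairs` is made up. It splits records on `<` and fields on `>`, so each record looks like `tag>text`. It is not an XML parser: it ignores comments, CDATA, entities and nesting, treats attributes as part of the tag name, and is only reasonable for simple, regular input.

```shell
#!/bin/sh
# Hypothetical helper: print "tag=text" for every opening tag directly followed by text.
xml_text_pairs() {
    awk 'BEGIN { RS = "<"; FS = ">" }
         # Skip closing tags (leading "/") and records with no non-blank text after ">"
         $1 !~ /^\// && $2 ~ /[^ \t\n]/ { print $1 "=" $2 }'
}

printf '<a><b>hello</b><c>world</c></a>\n' | xml_text_pairs
```

For the sample input this prints one line each for the `b` and `c` elements. Anything less regular than that is where a real parser such as the gawk XML extension or xmlstarlet becomes the safer choice.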

Ed Morton

Aug 12, 2016, 11:02:41 PM

Thanks Jim but I don't want to start learning something that's no longer well
used/supported.

Regards,

Ed.

sean

Oct 8, 2016, 2:29:14 AM

On 07/11/2016 02:20 AM, Ed Morton wrote:
> I'd love to be able to parse XML with awk rather than having to learn
> xmlstarlet or similar.
>

Now that we know why certain things are not included with gawk, have
you been able to solve your XML parsing issue?

>
> Ed.

Kenny McCormack

Oct 8, 2016, 2:44:49 AM

In article <nta3na$plq$1...@sean.eternal-september.org>,

What does this mean?

--
"You can safely assume that you have created God in your own image when
it turns out that God hates all the same people you do." -- Anne Lamott

Ed Morton

Oct 8, 2016, 10:00:12 AM

No, I'm still hacking away at XML with awk as I can never find the time to learn
xmlstarlet and any one file is always easy enough to **seem to** get into a
normalized enough format for me to use awk on it to get what I want out of it.
I'm sure it'll bite me some day... :-( .

Ed.