
Simulate Delphi's FieldByName function


Marc de Bourget

Jun 1, 2016, 3:16:35 PM

With the Delphi programming language (Object Pascal),
you can use a convenient function named "FieldByName"
to find a field based on its name and use it, e.g.
if (Table1.FieldByName('artist').AsString = 'Madonna') then ...

In AWK, the counterpart of a database table is a file with a header line
and a field separator such as a tab (or "|" in the following example).
So it would be convenient to use the header field names instead of
$1, $2, $3 ... Another advantage is that the code keeps working when
field positions change, as long as the field names stay the same.

Example file with header line (first line):
title|artist|released
American Life|Madonna|1993
Bad|Michael Jackson|1987

AWK Code to grep for "Madonna" by using the field name "artist":
BEGIN {
    FS = OFS = "|"

    # Analyze and skip first line of ARGV[1]:
    getline line
    n = split(line, FIELDS, FS)
    for (i=1; i<=n; i++) {
        # Store field numbers (Simulate Delphi's FIELDBYNAME function):
        FIELDBYNAME[FIELDS[i]] = i
        # e.g. FIELDBYNAME["artist"] = "2"
    }
    delete FIELDS
}

{
    if ($(FIELDBYNAME["artist"]) == "Madonna")
        print $0
}

It's simple code but I find it very useful and use it daily.
I would be happy if someone else finds this useful, too.
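
A minimal sketch of the "field positions may change" claim, assuming a POSIX shell and any POSIX awk. The two sample files and the helper name find_madonna are made up for illustration. For brevity the header is mapped in an FNR == 1 rule rather than with getline in BEGIN, and the missing-column guard is an addition that is not part of the post above.

```shell
# Hypothetical sample files: same data, different column order.
dir=$(mktemp -d)
printf '%s\n' 'title|artist|released' 'American Life|Madonna|1993' \
    'Bad|Michael Jackson|1987' > "$dir/albums.txt"
printf '%s\n' 'released|artist|title' '1993|Madonna|American Life' \
    '1987|Michael Jackson|Bad' > "$dir/reordered.txt"

# find_madonna FILE: print the title of every row whose artist is Madonna.
find_madonna() {
    awk '
    BEGIN { FS = OFS = "|" }
    FNR == 1 {
        # Map header names to field numbers.
        for (i = 1; i <= NF; i++) FIELDBYNAME[$i] = i
        # Guard (an addition): stop if a required column is missing.
        if (!("artist" in FIELDBYNAME) || !("title" in FIELDBYNAME)) {
            print "missing column in " FILENAME | "cat 1>&2"
            exit 1
        }
        next
    }
    $(FIELDBYNAME["artist"]) == "Madonna" { print $(FIELDBYNAME["title"]) }
    ' "$1"
}

out1=$(find_madonna "$dir/albums.txt")
out2=$(find_madonna "$dir/reordered.txt")
printf '%s\n%s\n' "$out1" "$out2"
rm -rf "$dir"
```

Both calls should print the same title even though the column order differs between the two files.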

Manuel Collado

Jun 1, 2016, 4:00:01 PM

On 01/06/2016 21:16, Marc de Bourget wrote:
>
> Example file with header line (first line):
> title|artist|released
> American Life|Madonna|1993
> Bad|Michael Jackson|1987
>
> AWK Code to grep for "Madonna" by using the field name "artist":
> BEGIN {
> FS = OFS = "|"
>
> # Analyze and skip first line of ARGV[1]:
> getline line
> n = split(line, FIELDS, FS)
> for (i=1; i<=n; i++) {
> # Store field numbers (Simulate Delphi's FIELDBYNAME function):
> FIELDBYNAME[FIELDS[i]] = i
> # e.g. FIELDBYNAME["artist"] = "2"
> }
> delete FIELDS
> }

Your code is unnecessarily complicated:

BEGIN {
    FS = OFS = "|"

    # Analyze and skip first line of ARGV[1]:
    getline
    for (i=1; i<=NF; i++) {
        # Store field numbers (Simulate Delphi's FIELDBYNAME function):
        FIELDBYNAME[$i] = i
        # e.g. FIELDBYNAME["artist"] = "2"
    }
}

>
> {
> if ($(FIELDBYNAME["artist"]) == "Madonna")
> print $0
> }

--
Manuel Collado - http://lml.ls.fi.upm.es/~mcollado

Manuel Collado

Jun 1, 2016, 4:25:23 PM

On 01/06/2016 21:55, Manuel Collado wrote:
> On 01/06/2016 21:16, Marc de Bourget wrote:
>>
>> Example file with header line (first line):
>> title|artist|released
>> American Life|Madonna|1993
>> Bad|Michael Jackson|1987
>>
>> AWK Code to grep for "Madonna" by using the field name "artist":
>
> BEGIN {
> FS = OFS = "|"
>
> # Analyze and skip first line of ARGV[1]:
> getline
> for (i=1; i<=NF; i++) {
> # Store field numbers (Simulate Delphi's FIELDBYNAME function):
> FIELDBYNAME[$i] = i
> # e.g. FIELDBYNAME["artist"] = "2"
> }
> }
>>
>> {
>> if ($(FIELDBYNAME["artist"]) == "Madonna")
>> print $0
>> }

If you use the GNU implementation of AWK (gawk) the code can be further
simplified by taking advantage of the SYMTAB feature, which allows indirect
access to variables by name:

BEGIN {
    FS = OFS = "|"

    # Analyze and skip first line of ARGV[1]:
    getline
    for (i=1; i<=NF; i++) {
        # Store field numbers in named variables:
        SYMTAB[$i] = i
    }
}

{
    if ($artist == "Madonna") # <--- !!!
        print $0
}

Please note the automatic creation of the "artist" variable.

Marc de Bourget

Jun 1, 2016, 4:29:14 PM

Yes Manuel, in this case you are right but now I remember
why I like to use "getline line" with a bit more complicated code:

Most of the time the header line should be printed as well:
# Analyze, skip and print first line of ARGV[1]:
getline line
print line

So, I have completed the code with this line:
AWK Code to grep for "Madonna" by using the field name "artist":
BEGIN {
    FS = OFS = "|"

    # Analyze, skip and print first line of ARGV[1]:
    getline line
    print line
    n = split(line, FIELDS, FS)
    for (i=1; i<=n; i++) {
        # Store field numbers (Simulate Delphi's FIELDBYNAME function):
        FIELDBYNAME[FIELDS[i]] = i
        # e.g. FIELDBYNAME["artist"] = "2"
    }
    delete FIELDS
}

Manuel Collado

Jun 1, 2016, 4:46:44 PM

On 01/06/2016 22:29, Marc de Bourget wrote:
> Yes Manuel, in this case you are right but now I remember
> why I like to use "getline line" with a bit more complicated code:
>
> Most times it is wished that also the header line is printed:
> # Analyze, skip and print first line of ARGV[1]:
> getline line
> print line

getline is not necessary at all for this purpose:

FNR==1 { # Analyze, skip and print first line of ARGV[1]:
    ... (gawk specific code from the previous post) ...
    print
    next # <--- skip further processing of the first line
}

$artist=="Madonna" # <--- implicit print $0

Marc de Bourget

Jun 1, 2016, 4:52:48 PM

As so often, it seems there are different solutions to one task.
Personally I prefer a solution which works with GAWK, TAWK, MAWK.

Manuel Collado

Jun 1, 2016, 5:15:51 PM

On 01/06/2016 22:52, Marc de Bourget wrote:
>
> As very often, it seems there are different solutions for a task.
> Personally I prefer a solution which works with GAWK, TAWK, MAWK.

Maybe this?

BEGIN {
    FS = OFS = "|"
}

FNR==1 { # Analyze and skip first line of ARGV[1]:
    for (i=1; i<=NF; i++) {
        # Store field numbers (Simulate Delphi's FIELDBYNAME function):
        FIELDBYNAME[$i] = i
        # e.g. FIELDBYNAME["artist"] = "2"
    }
    print
    next
}

$(FIELDBYNAME["artist"]) == "Madonna"

Marc de Bourget

Jun 2, 2016, 3:32:07 AM

There aren't any speed issues with my code, so it doesn't matter if my code is one or two lines longer than other solutions.
So, I prefer my solution as the most general one. It's nicer to do the analysis in the BEGIN section than using the first input line of the main section with "next". Using "next" should be avoided whenever possible. Further, storing the header line and then using it whenever needed is best. The header line may be needed later in the code several times for further queries.
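
For comparison, a hedged sketch showing that the header line can also be stored and reused from an FNR == 1 rule. It assumes a POSIX shell and awk; the helper name print_matches, the variable names want, header and printed, and the sample file are made up for illustration. The header is kept in a variable and printed only once, and only if at least one row matches.

```shell
dir=$(mktemp -d)
printf '%s\n' 'title|artist|released' 'American Life|Madonna|1993' \
    'Bad|Michael Jackson|1987' > "$dir/albums.txt"

# print_matches ARTIST FILE: print the header plus all rows for ARTIST.
print_matches() {
    awk -v want="$1" '
    BEGIN { FS = OFS = "|" }
    FNR == 1 {
        header = $0                        # keep the header for later use
        for (i = 1; i <= NF; i++) FIELDBYNAME[$i] = i
        next
    }
    $(FIELDBYNAME["artist"]) == want {
        if (!printed++) print header       # header once, and only on a match
        print
    }
    ' "$2"
}

hit=$(print_matches "Madonna" "$dir/albums.txt")
miss=$(print_matches "Prince" "$dir/albums.txt")
printf '%s\n' "$hit"
rm -rf "$dir"
```

Whether the mapping is built in BEGIN or in an FNR == 1 rule, the stored header stays available to every later rule.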

Aharon Robbins

Jun 2, 2016, 3:46:04 AM

In article <ningbf$pio$1...@gioia.aioe.org>,
Manuel Collado <m.co...@domain.invalid> wrote:
>If you use the GNU implementation of AWK (gawk) the code can be further
>simplified, by taking profit of the SYMTAB feature that allows indirect
>access to variables by name:
>
>BEGIN {
> FS = OFS = "|"
>
> # Analyze and skip first line of ARGV[1]:
> getline
> for (i=1; i<=NF; i++) {
> # Store field numbers in named variables:
> SYMTAB[$i] = i
> }
>}
>
>{
> if ($artist == "Madonna") # <--- !!!
> print $0
>}
>
>Please note the automatic creation of the "artist" variable.

It doesn't work that way. The artist variable is created and placed
into the symbol table when the program is parsed, before it ever
starts to run.
--
Aharon (Arnold) Robbins arnold AT skeeve DOT com

Marc de Bourget

Jun 2, 2016, 4:13:24 AM

On Thursday, June 2, 2016 at 09:46:04 UTC+2, Aharon Robbins wrote:
> Manuel Collado wrote:
> >If you use the GNU implementation of AWK (gawk) the code can be further
> >simplified, by taking profit of the SYMTAB feature that allows indirect
> >access to variables by name:
> >
> >BEGIN {
> > FS = OFS = "|"
> >
> > # Analyze and skip first line of ARGV[1]:
> > getline
> > for (i=1; i<=NF; i++) {
> > # Store field numbers in named variables:
> > SYMTAB[$i] = i
> > }
> >}
> >
> >{
> > if ($artist == "Madonna") # <--- !!!
> > print $0
> >}
> >
> >Please note the automatic creation of the "artist" variable.
>
> It doesn't work that way. The artist variable is created and placed
> into the symbol table when the program is parsed, before it ever
> starts to run.
> --
> Aharon (Arnold) Robbins arnold AT skeeve DOT com

Just a hint: With the test file the result of Manuel's code is correct:

Manuel Collado

Jun 2, 2016, 5:57:22 AM

Ooooops! You are right. I realized I was wrong after posting.

But in this example SYMTAB is also used for setting variables not used
in the code. I.e., the names of the headers of the other columns. Example:

SYMTAB["title"] = something

Does it create a 'title' variable if 'title' is not named at all in the
code?

Thanks.

Ed Morton

Jun 2, 2016, 8:48:55 AM

You should use a "FNR==1{...}" solution over a "BEGIN{...}" solution because the
latter can't be trivially extended to multiple files that each have their own
headers. See for example http://stackoverflow.com/q/37529232/1745001 where the 2
files have some common header fields but in different orders. Also, by not
testing its result you are using getline incorrectly in your script, see
http://awk.freeshell.org/AllAboutGetline, and so the code actually needed to do
the job robustly with "BEGIN{...}" becomes even lengthier and more complicated,
and it's all just completely unnecessary. In general, we're talking about having
to code something like this:

BEGIN {
    for (i=1; i<ARGC; i++) {
        fileOrArg = ARGV[i]
        # Skip var=value arguments unless they look like a path:
        if ( (fileOrArg !~ /=/) || (fileOrArg ~ /\.\//) ) {
            if ( (getline line < fileOrArg) > 0 ) {
                ++numFiles    # one counter step per file, not per field
                split(line,fields)
                for (j in fields) {
                    f[numFiles,fields[j]] = j
                }
            }
            close(fileOrArg)
        }
    }
}
FNR==1 { fileNr++ }
{ print $(f[fileNr,"name"]) }

vs only this:

FNR==1 { for (i=1;i<=NF;i++) f[$i]=i }
{ print $(f["name"]) }

"next" is a crucial/integral part of awk's idiomatic "<condition1> { <action1>;
next } <condition2> { <action2> }" syntax and so it should be embraced, not
avoided.
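
A minimal sketch of the multi-file case, assuming a POSIX shell and awk. The file names and contents are made up; the two headers share a "name" column in different positions. Compared with the two-line version above, the sketch also empties the map for each new file and uses "next" to skip the header row, both of which are additions.

```shell
dir=$(mktemp -d)
printf '%s\n' 'id|name' '1|alice' > "$dir/a.txt"
printf '%s\n' 'name|id' 'bob|2'   > "$dir/b.txt"

names=$(awk -F'|' '
FNR == 1 {
    split("", f)                  # portable way to empty the map per file
    for (i = 1; i <= NF; i++) f[$i] = i
    next                          # do not treat the header as data
}
{ print $(f["name"]) }
' "$dir/a.txt" "$dir/b.txt")

printf '%s\n' "$names"
rm -rf "$dir"
```

Because the mapping is rebuilt whenever FNR is 1, each file is read with its own column layout and no ARGV handling is needed.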

Ed.

Aharon Robbins

Jun 2, 2016, 1:40:54 PM

In article <57500208...@domain.invalid>,
Manuel Collado <m.co...@domain.invalid> wrote:
>>> Please note the automatic creation of the "artist" variable.
>>
>> It doesn't work that way. The artist variable is created and placed
>> into the symbol table when the program is parsed, before it ever
>> starts to run.
>
>Ooooops! You are right. I realized I was wrong after posting.
>
>But in this example SYMTAB is also used for setting variables not used
>in the code. I.e., the names of the headers of the other columns. Example:
>
> SYMTAB["title"] = something
>
>Does it creates a 'title' variable if 'title' is not named at all in the
>code?

No. SYMTAB is also a regular associative array, so it just indexes it.

This is documented in the manual.

Thanks,

Arnold

Marc de Bourget

Jun 2, 2016, 4:12:16 PM

OK. Thank you all Manuel, Ed and Arnold.
My code has worked for me without any issues for many years
but of course your hints are very good and I appreciate it!

I have read somewhere that using "next" and "continue"
is not good programming style (some people even say that
using "break" isn't good programming style).
However, if used correctly I'm sure it isn't a problem.
I have needed "next" in only very few scripts.

BTW, I remember some more hints about "getline" on
http://awk.info/, but this site doesn't work at the moment.

Marc de Bourget

Jun 2, 2016, 4:43:52 PM

On Thursday, June 2, 2016 at 19:40:54 UTC+2, Aharon Robbins wrote:
I'm still wondering a bit why Manuel's code works correctly for me:
There must be something special with SYMTAB here.
Maybe this is due to the fact that it is evaluated in the BEGIN section?

BEGIN {
    FS = OFS = "|"

    # Analyze, skip and print first line of ARGV[1]:
    if (getline > 0) {
        line = $0
        print line
        for (i=1; i<=NF; i++) {
            # Store field numbers in named variables:
            SYMTAB[$i] = i
        }
    }
}

{
    if ($title == "Bad") # <--- !!!
        print $0
}

Input file:
title|artist|released
American Life|Madonna|1993
Bad|Michael Jackson|1987

Correct Result output:
title|artist|released
Bad|Michael Jackson|1987

Hermann Peifer

Jun 4, 2016, 4:11:21 AM

The code does indeed work correctly. There is however no automatic
creation of any variable in the BEGIN rule, via: SYMTAB[$i] = i.
Variable names like "artist" or "title" are already in the symbol table,
before program execution starts. Only their values are modified in your
BEGIN rule.

Quotes from the GAWK manual:
SYMTAB (is) an array whose indices are the names of all defined global
variables and arrays in the program. It is built as gawk parses the
program and is complete before the program starts to run. The array may
be used for indirect access to read or write the value of a variable.

http://www.gnu.org/software/gawk/manual/html_node/Auto_002dset.html
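
A small sketch of the behaviour described in the quote, assuming gawk 4.1 or later is installed as "gawk" (the block skips itself otherwise). The shell variable name symtab_out is made up. Here "artist" is named as a variable in the program, so writing SYMTAB["artist"] sets it; "title" is never named as a variable, so SYMTAB["title"] is only read back as an ordinary array element.

```shell
if command -v gawk >/dev/null 2>&1; then
    symtab_out=$(gawk 'BEGIN {
        SYMTAB["artist"] = 2      # artist is a variable of this program
        SYMTAB["title"]  = 1      # title is not named as a variable anywhere
        print artist              # value written through SYMTAB
        print SYMTAB["title"]     # plain array element lookup
    }')
    printf '%s\n' "$symtab_out"
else
    echo "gawk not installed; skipping"
fi
```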

Hermann


Marc de Bourget

Jun 4, 2016, 5:08:36 AM

Thank you very much, Hermann, for your good explanation. I have understood now.
With the new example, the title variable is added to the SYMTAB array as index.
if ($title == "Bad") => "title" is added to SYMTAB, not "artist" or "released".
Thank you again. It is clear for me now. I have added some SYMTAB debug output:

BEGIN {
FS = OFS = "|"

print "Begin SYMTAB indices:"
for (i in SYMTAB) print i
print "End SYMTAB indices.\n"

Kenny McCormack

Jun 4, 2016, 5:38:13 AM

In article <niu2f7$ghp$1...@news.albasani.net>,
Hermann Peifer <pei...@gmx.eu> wrote:
...
>The code does indeed work correctly. There is however no automatic
>creation of any variable in the BEGIN rule, via: SYMTAB[$i] = i.
>Variable names like "artist" or "title" are already in the symbol table,
>before program execution starts. Only their values are modified in your
>BEGIN rule.

I think the key thing to understand is that it's not possible to prove
(i.e., demonstrate) that adding an entry to SYMTAB doesn't create new
variables, because the only way to test it would be to code an examination
of that variable and doing that would invalidate what you're trying to prove.

This is not to say that it (by "it", I mean the assertion that adding
entries to SYMTAB doesn't create variables) isn't true. Given that the
documentation says it is true and, moreover, Arnold Robbins says it is
true, it is almost certainly true. I'm also sure that inspection of the
GAWK "C" source code would bring this out.

But the point is that it isn't possible to show, one way or the other, the
truth of this using only the GAWK programming language.

--
The randomly generated signature file that would have appeared here is more than 4
lines in length. As such, it violates one or more Usenet RFPs. In order to remain in
compliance with said RFPs, the actual sig can be found at the following web address:
http://www.xmission.com/~gazelle/Sigs/EternalFlame

Hermann Peifer

Jun 4, 2016, 6:40:31 AM

On 2016-06-04 11:38, Kenny McCormack wrote:
>
> I think the key thing to understand is that it's not possible to prove
> (i.e., demonstrate) that adding an entry to SYMTAB doesn't create new
> variables, because the only way to test it would be to code an examination
> of that variable and doing that would invalidate what you're trying to prove.
>

Usually, one can prove the inexistence of a symbol by using the gawk
debugger. Below, I tried to watch "x", before and after adding it to
SYMTAB. I expected to get this message in both cases: "no symbol `x' in
current context". However, I ended up with a fatal error in the 2nd case
(already reported to the bug-gawk mailing list).

Hermann


$ cat test.awk
BEGIN { SYMTAB["x"] ; y=1 ; y++ }
$
$ awk -D -f test.awk
gawk> watch x # try to watch an inexistent variable
no symbol `x' in current context # as expected
gawk>
gawk> watch y
Watchpoint 1: y
gawk>
gawk> run
Starting program:
Stopping in BEGIN ...
Watchpoint 1: y
Old value: untyped variable
New value: 1
main() at `test.awk':1
1 BEGIN { SYMTAB["x"] ; y=1 ; y++ }
gawk>
gawk> watch x # try again to watch x, after adding it to SYMTAB
Watchpoint 2: (null) # this looks strange
gawk>
gawk> continue
awk: test.awk:1: fatal: internal error line 1711, file: debug.c
Program exited abnormally with exit

Kenny McCormack

Jun 4, 2016, 6:24:20 PM

In article <niub6t$1t9$1...@news.albasani.net>,
Hermann Peifer <pei...@gmx.eu> wrote:
>On 2016-06-04 11:38, Kenny McCormack wrote:
>>
>> I think the key thing to understand is that it's not possible to prove
>> (i.e., demonstrate) that adding an entry to SYMTAB doesn't create new
>> variables, because the only way to test it would be to code an examination
>> of that variable and doing that would invalidate what you're trying to
>> prove.

>Usually, one can prove the inexistence of a symbol by using the gawk
>debugger. Below, I tried to watch "x", before and after adding it to
>SYMTAB. I expected to get this message in both cases: "no symbol `x' in
>current context". However, I ended up with a fatal error in the 2nd case
>(already reported to the bug-gawk mailing list).

I would argue that using the debugger is beyond the scope of "using only
the GAWK programming language". It's on the order of peeking at the "C"
source code.

That said, this looks more like a bug (or bugs) in the "debugger" rather
than proof about the existence or non-existence of the variable. One could
well argue that the variable *does* exist if it has a SYMTAB entry. In
that vein, then it would be a bug in the debugger that it cannot locate
said variable...

--
The scent of awk programmers is a lot more attractive to women than
the scent of perl programmers.

(Mike Brennan, quoted in the "GAWK" manual)

Andrew Schorr

Jun 6, 2016, 8:10:15 PM

On Wednesday, June 1, 2016 at 5:15:51 PM UTC-4, Manuel Collado wrote:
> $(FIELDBYNAME["artist"]) == "Madonna"

One minor note: the parentheses around "FIELDBYNAME" are not necessary.
I usually name my array "m" (as in mapping), and so you can simply
say $m["artist"] which is a bit easier to type and read. But yes, it would be faster to use the SYMTAB hack to avoid the array indirection.
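
A quick way to check the point about the parentheses, assuming a POSIX shell and awk; the one-line input and the shell variable name same are made up for illustration. Both spellings are printed side by side.

```shell
same=$(printf '%s\n' 'American Life|Madonna|1993' |
    awk -F'|' 'BEGIN { m["artist"] = 2 }
               { print $m["artist"], $(m["artist"]) }')
printf '%s\n' "$same"
```

Both expressions select the second field, so the line shows the same value twice.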

Regards,
Andy

Kaz Kylheku

Jun 6, 2016, 9:38:46 PM

# forest.awk

BEGIN {
SYMTAB["tree_status"] = "fallen"
}

But, alas, nobody was there to hear it, because a variable called
tree_status is not referenced in the code.

At first glance, it seems not wrong to do the assignment *in case* such
a variable use appears. Today, there might be no test for the field
$madonna, but it could be added tomorrow, and so the line SYMTAB[$i]=i
will update the madonna variable with the field number.

The one thing that is naive about this SYMTAB approach is that
when you do

SYMTAB[$i] = i

in a loop over the fields of the first record, you're allowing whoever
controls the contents of that record to trash arbitrary variables
in your code. (Luckily, just with an integer
value, and not a datum of the attacker's choice).

It really looks like a bad idea to pull strings from the input data,
look for variables which match those strings, and clobber those
variables.

If you do this, it's probably a good idea to namespace the
variables with a prefix. A short one might do, like f_ for
field:

# catenation of "f_" "foo", indexes into SYMTAB[]:
#
$ ./gawk 'BEGIN { SYMTAB["f_" "foo"] = 3; print f_foo; }'
3

So this would be used like

SYMTAB["f_" $i] = i;

# ...

if ($f_year > 2007) ... # not $year!
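
Put together, the prefixed variant might look like the sketch below, assuming gawk 4.1 or later is installed as "gawk" (the block skips itself otherwise). The sample file is made up and deliberately contains a header column literally named NR, which without the prefix would aim the assignment at awk's built-in NR. The shell variable name recent is also made up.

```shell
if command -v gawk >/dev/null 2>&1; then
    dir=$(mktemp -d)
    printf '%s\n' 'title|artist|released|NR' \
        'American Life|Madonna|1993|x' \
        'Bad|Michael Jackson|1987|y' > "$dir/albums.txt"

    recent=$(gawk -F'|' '
    FNR == 1 {
        # Only names starting with f_ can be reached from the header.
        for (i = 1; i <= NF; i++) SYMTAB["f_" $i] = i
        next
    }
    $f_released > 1990 { print $f_title }
    ' "$dir/albums.txt")

    printf '%s\n' "$recent"
    rm -rf "$dir"
else
    echo "gawk not installed; skipping"
fi
```

Only f_released and f_title are named in the program, so only those two become field-number variables; the other header names end up as ordinary SYMTAB elements.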
