Parsing through all our emails, we seem to be discussing the following
options. In what follows, I am for the things marked (*) and Tim is for
the (+) items. We'll defend those positions, if necessary, in a week or two.
But, before debating the options, we'd like to know what all the options
are. So we post this list, not to argue any particular point, but to
canvass options from the community.
For now, what we'd like back is not arguments for options but the
options themselves.
the plan is this:
- post to this newsgroup
- watch it to see what comments come back
- post a revised version in a week
- start the real debate then
over to you all!
Jim & Tim
--------
0) Repository name?
0a+) planetawk
0b) libawk
0c) address of the repository must be in the domain name that
corresponds to the chosen name
0d) address of the repository may be <repository name>.<name of hosting
service> or <hosting service full domain name>/<repository name>
1) goal
1a*+) an open source project with methods for promoting stable AWK code
to some special place,
1b*) and providing the kind of search and install features found in CPAN
(for Perl) and PEAR (for PHP).
--------
2) type of extensions
2a+*) scripting based, using the current version of gawk
2b*) scripting based, using any version of AWK
2c*) compiler-based requiring a new gawk executable (e.g. xgawk)
2d*) based on other languages (e.g. runawk is a C program)
--------
3) formatting standards
3a+) one-liners, if short, e.g.
function some() { return 1/(rand() * Inf) }
--------
4) dependency mechanism
4a) m4 and needz, e.g.
http://code.google.com/p/crusty/source/browse/trunk/needz-apps, which
shows Tim's last attempt (2 years ago) to define a library system in gawk.
- there is a directory called needz-app/fred with files
- there are files
needz-app/fred/about.awk # command line args and help text
needz-app/fred/fred # main file. compilation starts here
needz-app/fred/XXX # support file
needz-app/fred/YYY # support file
- there is a directory called needz-app/fred/eg for unit tests
needz-app/fred/eg/1 # unit test 1
needz-app/fred/eg/1.want # expected output for unit test 1
4b) Debian package system
4c) CPAN or PEAR software, if it's available
--------
5) hosting service
5a+) code.google.com
5b) sites.google
5c) github
5d) launchpad
5e) sourceforge
5f) savannah
5g) custom
--------
6) repository
6a+) subversion
6b) git
6c) bazaar
6d) cvs
6e) (s)ftp
6f) Web pages with upload
6g*) CPAN, PEAR or similar
--------
7) cross-indexing method to describe library (i.e. a way to describe
functions and collections of functions)
7a+) use scripts to auto-generate wiki pages in code.google.com
7b) documentation that's required for acceptance into the repository
7c) structured comments in the code a la JavaDoc
-------
8) coding standards
8a*) functions don't change globals directly
8b+) reduce use of /pattern/ {action} in favor of while(getline) loops
inside functions
8d+) use a[0] to store size of array
8e+) require all "local" (see 8g) variable names be lower case (use '_'
to separate words?)
8f*) allow naming variables with some version of Hungarian notation
8g*) require all function variables to be "localized" by including them
in the function parameter list
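The list doesn't spell out 8b, so here is one possible reading, as a sketch only: rather than a top-level rule such as /^#/ { Comments++ }, the work lives in a function that loops over a named file with getline. The names count_comments and count_comments_demo and the temp file are made up for illustration.

```shell
count_comments_demo() {
  tmp="${TMPDIR:-/tmp}/count_comments.$$"
  printf '# one\ncode\n# two\n' > "$tmp"
  awk '
  # hypothetical function: count lines starting with "#" in a named file
  function count_comments(file,    line, n) {
    n = 0
    while ((getline line < file) > 0) {
      if (line ~ /^#/) {
        n++
      }
    }
    close(file)
    return n
  }
  # BEGIN-only program, so awk never reads ARGV[1] as main input
  BEGIN { print count_comments(ARGV[1]) }' "$tmp"
  rm -f "$tmp"
}
count_comments_demo   # prints 2
```

The function can then be called from BEGIN, from a rule, or from another library function without the caller having to arrange its own pattern-action rules.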
-------
9) define some standard macros (using m4)
9a+) Tim's iteration trick
function fred(a,thing,   _2) {
  foreach2(thing,a) {
    # do something with thing
  }
}
expands to
function fred(a,thing,   max1,i,max2,j) {
  max1=a[0]; for(i=1;i<=max1;i++) { max2=a[i,0]; for(j=1;j<=max2;j++) {
    thing=a[i,j]
    # do something with thing
  } }
}
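For anyone who wants to try the expanded loop without m4, here is a minimal runnable sketch. It assumes the a[0] / a[i,0] size convention from 8d; the function join2, the array grid and the wrapper foreach2_demo are hypothetical names, not part of Tim's macros.

```shell
foreach2_demo() {
  awk '
  # hypothetical helper: concatenate every element of a ragged 2-D array,
  # row by row, using the expanded foreach2 loop shape
  function join2(a,    thing, out, max1, i, max2, j) {
    out = ""
    max1 = a[0]                     # number of rows
    for (i = 1; i <= max1; i++) {
      max2 = a[i,0]                 # number of columns in row i
      for (j = 1; j <= max2; j++) {
        thing = a[i,j]
        out = out thing
      }
    }
    return out
  }
  BEGIN {
    grid[0] = 2
    grid[1,0] = 3; grid[1,1] = "a"; grid[1,2] = "b"; grid[1,3] = "c"
    grid[2,0] = 1; grid[2,1] = "d"
    print join2(grid)
  }'
}
foreach2_demo   # prints abcd
```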
10) Label for a collection of functions
10a*) package
10b*) module
10c+) gem
11) Extension and preprocessor standardization
11a) Standardize the syntax of all extensions that change or add to the
(G)AWK standards, and make compliant changes to all extension
programs/projects that provide the same feature(s)
11b) All preprocessor programs provide an option to turn them into
stdin-->stdout filters
11c*) With 11b, overlapping features in extension programs/projects get
"refactored" into separate programs or projects.
For example, only igawk would pull in external code files; its
preprocessed code could then, optionally, be written to stdout for further
processing by other programs.
12) Function standardization
12a) Multiple functions that do the same thing get merged through some
agreed-upon process.
Another possibility, especially if the library is not too large, is
just to create a batteries-included distro that bundles them all
into the executable.
Really cool! I have a few things that I've made available which I
would be happy to forward on for inclusion, as well as about 80
gazillion things just sitting in my inbox.
>Parsing through all our emails, we seem to be discussing the following
>options. In what follows, I am for the things marked (*) and Tim is for
>the (+) items. We'll defend those positions, if necessary, in a week or two.
>
>But, before debating the options, we'd like to know what all the options
>are. So we post this list, not to argue any particular point, but to
>canvas options from the community.
>
>We'd like back, at least for now, not arguments for options, but the
>options themselves.
>
>the plan is this:
>
>- post this newsgroup
>- watch it to see what comments come back
>- post a revised version in a week
>- start the real debate then
>
>over to you all!
>
>Jim & Tim
Very nice. Some comments below.
>0) Repository name?
>
>0a+) planetawk
>0b) libawk
>0c) address of the repository must be in the domain name that
>corresponds to the chosen name
>0d) address of the repository may be <repository name>.<name of hosting
>service> or <hosting service full domain name>/<repository name>
You may even want subdomains, such as
csv-parsing.repo-name.whatever.top-level
xml-parsing.repo-name.whatever.top-level
...
>1) goal
>
>1a*+) an open source project with methods for promoting stable AWK code
>to some special place,
>
>1b*) and providing the kind of search and install features found in CPAN
>(for Perl) and PEAR (for PHP).
Both are laudable and orthogonal.
>--------
>2) type of extensions
>
>2a+*) scripting based, using the current version of gawk
>
>2b*) scripting based, using any version of AWK
Both are fine, just keep them in separate areas, or clearly mark
each item as to whether it is POSIX compliant or requires gawk
or another version.
>3) formatting standards
I'm not sure you should mandate formatting standards; people have
their own styles. Or you can settle on the pretty-printed style
of gawk --profile, except that that loses comments and merges
BEGIN / END blocks.
>5) hosting service
>
>5b+) code.google.com <http://code.google.com>
>5b) sites.google
>5c) github
>5d) launchpad
>5e) sourceforge
>5e) savannah
>5f) custom
awk.info...
>6) repository
>
>6a+) subversion
>6b) git
>6c) bazaar
>6d) cvs
>6e) (s)ftp
>6f) Web pages with upload
>6g*) CPAN, PEAR or similar
This needs to be very mainstream; particularly bear in mind that Windows
users need a way to get to things. A browsable web repository is the
most generic and easy to use. Internally, you can use whatever you
want and export it to the web repository.
>8) coding standards
>
>8a*) functions don't change globals directly
>
>8b+) reduce use of /pattern/ {action} in favor of while(getline) loops
>inside functions
>
>8d+) use a[0] to store size of array
Gawk & Bell Labs awk support length(array), FWIW.
>8e+) require all "local" (see 8g) variable names be lower case (use '_'
>to separate words?)
>
>8f*) allow naming variables with some version of Hungarian notation
>
>8g*) require all function variables to be "localized" by including them
>in the function parameter list
You should have a separate section for uploads that don't follow your
coding standards, in order to be as inclusive as possible.
>9) define some standard macros (using m4)
I will reserve comments on this until the discussion period.
>10) Label for a collection of functions
>
>10a*) package
>10b*) module
>10c+) gem
"Collection". :-)
This is a great initiative, I really hope it bears fruit!
Thanks,
Arnold
--
Aharon (Arnold) Robbins arnold AT skeeve DOT com
CAWKAN, by analogy with CPAN (Perl) and CTAN (TeX).
--
Manuel Collado - http://lml.ls.fi.upm.es/~mcollado
It's sometimes useful/necessary for functions to change globals
directly. A better coding standard to adopt would be one that
identifies globals so when we see a variable being modified in a
function and that variable isn't included in the function argument
list, we can tell if it's deliberate or a bug. For example:
function foo(   i) {
for (i=1;i<=10;i++) {
Things[i] = "x"
stuff[i] = "y"
}
}
If we adopt the convention that all globals start with an upper case
letter (there may be better alternatives), we can see immediately that
"Things" is global so its use above is OK but "stuff" was intended to
be local so it should've been listed as a function argument.
> 8b+) reduce use of /pattern/ {action} in favor of while(getline) loops
> inside functions
What does that mean?
> 8d+) use a[0] to store size of array
I've never needed that. I suppose it's a reasonable idea if required
(e.g. in an awk where length(array) doesn't return the size of the
array), but there could be other things you want to store in array[0]
too, e.g. an original string that you split() into array[] (analogous
to $0 in the "array" of fields, "$"). I wouldn't make this part of
coding standards.
> 8e+) require all "local" (see 8g) variable names be lower case (use '_'
> to separate words?)
I'd recommend you just have them start with lower case. I much prefer
"namesArray" over "names_array" and I won't be changing unless there's
a very good reason.
So, I'm recommending that globals start with a capital letter, locals
with lower case.
> 8f*) allow naming variables with some version of Hungarian notation
Fine, just don't require it. I wouldn't even mention that in coding
standards.
> 8g*) require all function variables to be "localized" by including them
> in the function parameter list
Right and, as typically done, put some white space before them to
separate them from the real function arguments.
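Putting these naming suggestions together, a minimal sketch (fillItems, Items, Count and naming_demo are made-up names): capitalized globals, locals that start with lower case, and locals declared as extra parameters set off by whitespace. The last field checks that the local idx did not leak into the global namespace.

```shell
naming_demo() {
  awk '
  # count is a real argument; idx and itemName are locals, set off by spaces
  function fillItems(count,    idx, itemName) {
    for (idx = 1; idx <= count; idx++) {
      itemName = "item" idx
      Items[idx] = itemName         # Items: deliberate global, capitalized
    }
    return count
  }
  BEGIN {
    Count = fillItems(3)
    print Count, Items[3], (idx == "" ? "idx stayed local" : "idx leaked")
  }'
}
naming_demo   # prints 3 item3 idx stayed local
```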
You might want to add some things like:
9) Do not use all upper case for user-defined variable names as that's
reserved for builtin variables.
10) Always use braces, even when they're optional (e.g. single-statement
loop bodies).
11) Do not use one-character variable names as they're hard to find
when searching the code later (e.g. use "idx" instead of "i" for a
loop index).
12) Always give printf at least 2 arguments, the format plus at least
1 data item (e.g. use printf "%s",$1 instead of printf $1, and use
printf "%s","foo" instead of printf "foo".)
13) If using getline, always use one of these forms (see http://tinyurl.com/yn9ka9):
if/while ( (getline var < file) > 0)
if/while ( (command | getline var) > 0)
if/while ( (command |& getline var) > 0)
Regards,
Ed.
agreed - it should not be part of a standard.
but, fyi, the fix proposed by arnold won't work. he suggested using
a[length(a)+1]=x for a "push" operation, but length(a) assumes a is a
string for uninitialized "a". i tried the obvious fix but it did not
work. in the following code push1 crashes for uninitialized "a" but
push2 does not. so i would still argue for using a[0] to store the length:
function empty(a, i) { for(i in a) return 0; return 1}
function push1(a,x) {
if (empty(a)) {print 1; split("",a,"")};
a[length(a)+1]=x
}
function push2(a,x) { a[++a[0]]=x }
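For comparison, a minimal runnable sketch of the a[0] convention using push2 as written above. The names stack and push_demo are illustrative, and split("", stack) is only there to start from a known-empty array in any POSIX awk; the loop reads the size back from stack[0], so it never needs length() on an array.

```shell
push_demo() {
  awk '
  function push2(a, x) { a[++a[0]] = x }
  BEGIN {
    split("", stack)                # portable way to get an empty array
    push2(stack, "x"); push2(stack, "y"); push2(stack, "z")
    out = ""
    for (idx = 1; idx <= stack[0]; idx++) {
      out = out stack[idx]
    }
    print stack[0], out
  }'
}
push_demo   # prints 3 xyz
```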
>
>> 8e+) require all "local" (see 8g) variable names be lower case (use '_'
>> to separate words?)
>
> I'd recommend you just have them start with lower case. I much prefer
> "namesArray" over "names_array" and I won't be changing unless there's
> a very good reason.
>
> So, I'm recommending that globals start with a capital letter, locals
> with lower case.
i concur
> 12) Always give printf at least 2 arguments, the format plus at least
> 1 data item (e.g. use printf "%s",$1 instead of printf $1, and use
> printf "%s","foo" instead of printf "foo".)
what is the reason for this?
> 13) If using getline, always use one of these forms (see http://tinyurl.com/yn9ka9):
> if/while ( (getline var < file) > 0)
> if/while ( (command | getline var) > 0)
> if/while ( (command |& getline var) > 0)
er... so you are saying always check for errors?
> Regards,
>
> Ed.
great comments. thanks a lot
timm
We frequently see people use
printf $1
when they want to print the first field with no trailing newline, then
they get surprised by the result when their input data contains a
character that has meaning in a printf format string:
$ echo "abc" | awk '{printf $1}'
abc$
$ echo "ab%c" | awk '{printf $1}'
awk: (FILENAME=- FNR=1) fatal: not enough arguments to satisfy format string
`ab%c'
^ ran out for this one
$
So you must always give printf at least 2 arguments when you want to
print input data, and it's a good habit to get into even when you're
printing the values of variables or literal strings, which can still
bite you unexpectedly after a future modification.
As with most coding standards, the above doesn't matter much for a one-
line throw-away awk script, but when you're storing it in a
repository....
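The safe form, as a minimal sketch (printf_demo is just a wrapper name): the data goes in as the argument to "%s", so its % is printed literally instead of being read as a conversion specification.

```shell
printf_demo() {
  echo "ab%c" | awk '{ printf "%s", $1 }'
  echo                                # finish the line for display
}
printf_demo   # prints ab%c
```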
>
> > 13) If using getline, always use one of these forms (see http://tinyurl.com/yn9ka9):
> > if/while ( (getline var < file) > 0)
> > if/while ( (command | getline var) > 0)
> > if/while ( (command |& getline var) > 0)
>
> er... so you are saying always check for errors?
I'm saying check for errors using the right, unambiguous syntax, and
use the form of getline that doesn't change any builtin variables.
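A small sketch of both points, with a made-up temp file and wrapper name: getline var < file sets only var, so $0 and NR from the main input survive, and the explicit "> 0" stops the loop on end of file (0) as well as on error (-1).

```shell
getline_demo() {
  tmp="${TMPDIR:-/tmp}/getline_demo.$$"
  printf 'one\ntwo\n' > "$tmp"
  echo "main record" | awk -v file="$tmp" '
  {
    lines = 0
    while ((getline line < file) > 0) {
      lines++
    }
    close(file)
    print lines, NR, $0             # NR and $0 still describe "main record"
  }'
  rm -f "$tmp"
}
getline_demo   # prints 2 1 main record
```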
Ed.
Me, or someone else? I don't remember this.
>i tried the obvious fix but it did not
>work. in the following code push1 crashes for uninitialized "a" but
>not push2. so i would still argue for using a[0] to store the length
>
>function empty(a, i) { for(i in a) return 0; return 1}
function empty(a, i) { return (i in a) }
>function push1(a,x) {
> if (empty(a)) {print 1; split("",a,"")};
> a[length(a)+1]=x
>}
length(a) where a is an array is not portable.
>> So, I'm recommending that globals start with a capital letter, locals
>> with lower case.
>
>i concur
This is a very good convention.
>> 13) If using getline, always use one of these forms (see http://tinyurl.com/yn9ka9):
>> if/while ( (getline var < file) > 0)
>> if/while ( (command | getline var) > 0)
>> if/while ( (command |& getline var) > 0)
>
>er... so you are saying always check for errors?
More than that:
if (getline var)
treats a -1 return (an error) as "true". Not what
you typically want. :-)
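A minimal sketch of the -1 trap. It assumes /nonexistent/getline-demo does not exist on your system (the path is made up for the demo): the bare test takes the "true" branch on the failed open, while the "> 0" form does not.

```shell
truthiness_demo() {
  awk 'BEGIN {
    file = "/nonexistent/getline-demo"      # assumed not to exist
    if (getline line < file) { bare = "true" } else { bare = "false" }
    if ((getline line < file) > 0) { checked = "true" } else { checked = "false" }
    print bare, checked
  }'
}
truthiness_demo   # prints true false
```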
Grr. It's too late at night:
function empty(a, i) { return !(i in a) }
Hmmm. Now that I think about it, the original is
probably best. Never mind.
beg pardon? your blessing is on...
function empty(a, i) { for(i in a) return 0; return 1}
why? what is wrong with
function empty(a, i) { return !(i in a) }
timm
Test results:
$ gawk 'BEGIN {print (i in a)}'
0
$ gawk 'BEGIN {a[1] = "";print (i in a)}'
0
$ gawk 'BEGIN {a[1] = 0;print (i in a)}'
0
$ gawk 'BEGIN {a[1] = "1";print (i in a)}'
0
$ gawk 'BEGIN {for(i in a) print "not empty"; print "done"}'
done
$ gawk 'BEGIN {a[1] = "";for(i in a) print "not empty"; print "done"}'
not empty
done
$ gawk 'BEGIN {a[1] = 0;for(i in a) print "not empty"; print "done"}'
not empty
done
I love the idea in general, but I have one question.
AWK modules are not isolated: one module depends on another, etc.
RUNAWK uses a #use directive for this, which works recursively, and
I think it is better than @include and some others for a number of
reasons.
Until the #use/@include incompatibility is resolved, an AWK module library
project is not possible. I think this is question #1.
--
Best regards, Aleksey Cheusov.
This is the only way to create robust software :-)
See runawk's xgetline.awk and other x<function>.awk modules.
A simple example is below; the checks are made automatically.
This is a good default for most small scripts.
0 ~>cat ~/tmp/2.awk
#!/usr/bin/env runawk
#use "xgetline.awk"
BEGIN {
while (xgetline0(ARGV [1])){
print $0
}
}
0 ~>yes | head -5 | ~/tmp/2.awk -
y
y
y
y
y
141 0 0 ~>~/tmp/2.awk /do/not/exis
error: assertion failed: getline < /do/not/exis failed
ARGV[0]=2.awk
$0=``
NF=0
FNR=0
FILENAME=
1 ~>
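Without runawk, a similar check can be sketched in plain awk. This is not runawk's xgetline0, only a stand-in under stated assumptions: the names checked_getline and checked_demo are hypothetical, the function reads into $0 like getline < file, returns 1 for a line and 0 at end of file, and exits with status 2 if the read fails.

```shell
checked_demo() {
  awk '
  # hypothetical helper: getline < file, but abort on error instead of
  # letting -1 look like "true"
  function checked_getline(file,    status) {
    status = (getline < file)
    if (status < 0) {
      printf "error: getline < %s failed\n", file > "/dev/stderr"
      exit 2
    }
    return status
  }
  BEGIN {
    while (checked_getline(ARGV[1])) {
      print $0
    }
  }' "$1"
}
demo_file="${TMPDIR:-/tmp}/checked_demo.$$"
printf 'y\ny\ny\n' > "$demo_file"
checked_demo "$demo_file"                  # prints y three times
checked_demo /do/not/exist || echo "exit status $?"
rm -f "$demo_file"
```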
In article <slrngmdc1f...@vermouth.dreamhost.com>,
Tim Menzies <t...@menzies.us> wrote:
>On 2009-01-08, Aharon Robbins <arn...@skeeve.com> wrote:
>> In article <gk5khl$hu4$1...@localhost.localdomain>,
>> Aharon Robbins <arn...@skeeve.com> wrote:
>>>
>>>function empty(a, i) { return (i in a) }
>>
>> Grr. It's too late at night:
>>
>> Hmmm. Now that I think about it, the original is
>> probably best. Never mind.
>
>beg pardon? your blessing is on...
>
> function empty(a, i) { for(i in a) return 0; return 1}
Yes.
>why? what is wrong with
>
> function empty(a, i) { return !(i in a) }
This can be incorrect if someone did
a[""] = 1 # or any value
The null string is a valid (albeit unusual) index, and there's no
guarantee that some fool won't call
x = empty(a, "foo")
just to be ornery. However,
function empty(a, i) { for(i in a) return 0; return 1}
will work, since if there's nothing at all in a, it'll return 1,
which is what's desired. It's an unusual use of the for, but
I rather like it overall. :-)
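The difference is easy to see on an array whose only element is a[1], the case Ed's test results exercise. The names empty_loop, empty_in and empty_demo are hypothetical labels for the two versions in this thread. In empty_in the unset local i is the null string, so (i in a) asks whether a[""] exists, and a non-empty array gets reported as empty.

```shell
empty_demo() {
  awk '
  function empty_loop(a,    i) { for (i in a) return 0; return 1 }
  function empty_in(a,    i) { return !(i in a) }
  BEGIN {
    split("", none)                 # truly empty array
    some[1] = "x"                   # one element, index 1
    print empty_loop(none), empty_loop(some), empty_in(some)
  }'
}
empty_demo   # prints 1 0 1  (the last 1 is the wrong answer)
```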
Arnold