
gawk LMDB (Lightning Memory-mapped Database) extension library


Andrew Schorr

May 12, 2016, 2:07:58 PM
I needed to solve a problem that involved creating a huge associative array in gawk with over 100 million entries. While the actual data was around 7 GB, the memory overhead of gawk's array implementation turns out to be quite large, I think on the order of 250 bytes per record. That was too much to fit into memory, so I needed an extension library that implements a key-value store. I tested a few and found LMDB to be really fast. You can learn more about LMDB here:
https://symas.com/products/lightning-memory-mapped-database/
I have implemented an extension library that very closely mirrors the C API
documented on the website here:
http://lmdb.tech/doc/
You can grab a copy of the extension from the gawkextlib project on sourceforge:
https://sourceforge.net/projects/gawkextlib/files/
I have not had a chance to test each and every function, so please let me know if you find any problems. Comments, suggestions, and criticism are welcome.
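
As a rough back-of-the-envelope check of the numbers above, here is a small Python sketch. It assumes that the roughly 250 bytes per record is overhead on top of the 7 GB of data, and that "100 million" and "7 GB" are exact decimal figures; both are simplifications of the estimates given in this post, not measurements.

```python
# Figures taken from the post; treated here as exact for illustration only.
ENTRIES = 100_000_000          # "over 100 million entries"
OVERHEAD_PER_RECORD = 250      # "on the order of 250 bytes per record"
DATA_BYTES = 7 * 10**9         # "around 7 GB" of actual data

# Assumption: the overhead comes in addition to the data itself.
overhead_total = ENTRIES * OVERHEAD_PER_RECORD
total = overhead_total + DATA_BYTES

print(f"array overhead: {overhead_total / 1e9:.0f} GB")
print(f"overhead + data: {total / 1e9:.0f} GB")
```

Under these assumptions the array bookkeeping alone is several times larger than the data, which is why an on-disk key-value store is attractive here.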

Regards,
Andy

Marc de Bourget

May 12, 2016, 5:47:26 PM
Hi Andy, I'm very interested because I have to deal with several million records. It sounds really great, but it is very hard for me to understand how to build an MS Windows binary version with my Bloodshed Orwell Dev-C++ compiler. I really do have to learn C more intensively to understand all these topics better, although learning C is not always fun.



Andrew Schorr

May 12, 2016, 11:43:23 PM
Hi Marc,

> Hi Andy, I'm very interested because I have to deal with several million records. It sounds really great, but it is very hard for me to understand how to build an MS Windows binary version with my Bloodshed Orwell Dev-C++ compiler. I really do have to learn C more intensively to understand all these topics better, although learning C is not always fun.

It should not be necessary to learn C to use this. Do you use Cygwin? The gawkextlib libraries usually install easily under Cygwin, and I guess that the LMDB library may also. Actually, I think if you install the openldap Cygwin packages, that may include the lmdb library.

If you don't use Cygwin, then it is more challenging. I'm not a Windows expert, so it would be tough for me to help you with this.

Regards,
Andy

Kenny McCormack

May 13, 2016, 10:15:51 AM
In article <c587577e-6f8f-4d87...@googlegroups.com>,
--- Cut Here ---
checking for gawkextlib.h... no
configure: error: Cannot find gawkextlib.h. Please use --with-gawkextlib
to supply a location for your gawkextlib build.
--- Cut Here ---

I don't approve of the way everything (that you are working on) seems to be
hung onto the main "gawkextlib" project. I.e., you can't just try out the
lmdb extension (which I am interested in) w/o buying into the whole
"gawkextlib" project (which I am not interested in).

I made a good faith effort to get this (and by "this", I mean the lmdb
extension) working, and failed. By my usual rules of life, I assume that
if I failed, most people will not even come anywhere near that close.

Seriously, I'd like to know if there is any path to getting this to work (that
doesn't involve what I have alluded to above as being a non-starter for me).

--
The problem in US politics today is that it is no longer a Right/Left
thing, or a Conservative/Liberal thing, or even a Republican/Democrat
thing, but rather an Insane/not-Insane thing.

(And no, there's no way you can spin this into any confusion about
who's who...)

Manuel Collado

May 13, 2016, 4:34:45 PM
El 13/05/2016 16:15, Kenny McCormack escribió:
>
> --- Cut Here ---
> checking for gawkextlib.h... no
> configure: error: Cannot find gawkextlib.h. Please use --with-gawkextlib
> to supply a location for your gawkextlib build.
> --- Cut Here ---

Please follow the instructions in the README file:

"Please download gawkextlib plus one or more individual extensions.
You should build and install gawkextlib first. After you untar each
package, please cd into its directory and build as follows:

./configure && make && make check && make install && echo Success."

>
> I don't approve of the way everything (that you are working on) seems to be
> hung onto the main "gawkextlib" project. I.e., you can't just try out the
> lmdb extension (which I am interested in) w/o buying into the whole
> "gawkextlib" project (which I am not interested in).

There are some pieces of code that must appear in almost every
extension. So these parts are put just once in the "lib" directory, and
named "gawkextlib" in the documentation.

>
> I made a good faith effort to get this (and by "this", I mean the lmdb
> extension) working, and failed. By my usual rules of life, I assume that
> if I failed, most people will not even come anywhere near that close.

This is a reasonable objection. Probably the best solution would be to
distribute binaries. But this requires a considerable amount of work.

Maybe we could develop an installer script that just needs to be executed
to have the desired extension in place, ready to be used.

>
> Seriously, I'd like to know if there is any path to getting this to work (that
> doesn't involve what I have alluded to above as being a non-starter for me).
>

Just read and follow the given instructions?

Regards.
--
Manuel Collado - http://lml.ls.fi.upm.es/~mcollado

Andrew Schorr

May 13, 2016, 8:02:48 PM
Hi,

On Friday, May 13, 2016 at 10:15:51 AM UTC-4, Kenny McCormack wrote:
> I don't approve of the way everything (that you are working on) seems to be
> hung onto the main "gawkextlib" project. I.e., you can't just try out the
> lmdb extension (which I am interested in) w/o buying into the whole
> "gawkextlib" project (which I am not interested in).

Actually, "gawkextlib" is not really a project; it's just a library that supplies support routines that are commonly needed by other extension libraries. You don't have to buy in. You just need to install gawkextlib and then the extension that you want. Manuel's response is right on target. The project README should explain exactly what to do. If the README is not clear, I'd appreciate feedback on how to improve it.

The only alternative is to include the library code in each and every extension. That doesn't feel like the right solution to me, but maybe I'm deluded.

I should note that these extensions also won't work if you don't install gawk first. Is it too much to ask to install gawk plus gawkextlib plus the extensions that you want? Once you install gawk and gawkextlib, you never have to think about it again.

Regards,
Andy

Marc de Bourget

May 14, 2016, 6:29:41 AM
Hi Manuel and Andy,

yes, best would be to create ready to use Windows and Linux binaries.
I'm not even able to compile GAWK for Windows with my MinGW compiler.

Marc de Bourget

Andrew Schorr

May 14, 2016, 12:16:15 PM
Hi Marc,

On Saturday, May 14, 2016 at 6:29:41 AM UTC-4, Marc de Bourget wrote:
> yes, best would be to create ready to use Windows and Linux binaries.
> I'm not even able to compile GAWK for Windows with my MinGW compiler.

Is there a reason that you cannot use Cygwin? With cygwin, it should be very easy to build everything. The instructions in the README should work.

If Cygwin is not an option, the gawk documentation explains how to build gawk under MinGW here:

https://www.gnu.org/software/gawk/manual/html_node/PC-Compiling.html

I have never tried this myself, but I know that PC compilation is actively maintained. If this does not work, please submit a bug report to the bug-gawk mailing list.

I believe that it could be possible to build gawkextlib using MinGW, but I have no experience with this. Cygwin is the easy way.

Regards,
Andy

Andrew Schorr

May 14, 2016, 12:19:18 PM
On Saturday, May 14, 2016 at 12:16:15 PM UTC-4, Andrew Schorr wrote:
> I believe that it could be possible to build gawkextlib using MinGW, but I have no experience with this. Cygwin is the easy way.

Please look here for some guidance about how to build under MinGW:
http://wim-blit.nl/?s=gawkextlib
Please report back if you are able to get it to work.

Good luck,
Andy

Kenny McCormack

May 14, 2016, 2:40:11 PM
In article <b5042741-0c3a-4626...@googlegroups.com>,
Just to be clear, I'm not asking for the devs/maintainers to
produce/maintain/provide binaries. I know all too well that down that road
lies chaos and despair. Among other things, it is just not the open source
way.

In this thread, I'm not really asking for anything - in the way of "recipes
for success". I'm asking for things to be other than they are - and am
well aware that that is always a quixotic quest. My reasons for not liking
the "gawkextlib" philosophy and approach have all been stated in the past
(good Googlers will be able to find that old set of threads from this
newsgroup), and are well beyond and outside the scope of this instant
thread.

It just seems like there ought to be a recipe for testing this lmdb
extender that just involves:

1) Download and build lmdb (which I have done).

2) Download and build lmdb extender (which I have tried, but failed to do).

BTW, I did note that the lmdb extender "configure" script wanted gawkapi.h,
which I was able to supply it with. Would it be as simple as extracting
gawkextlib.h from the gawkextlib tarball and using that? Is that all I
need to do?

--
The last time a Republican cared about you, you were a fetus.

Andrew Schorr

May 14, 2016, 11:48:39 PM
On Saturday, May 14, 2016 at 2:40:11 PM UTC-4, Kenny McCormack wrote:
> It just seems like there ought to be a recipe for testing this lmdb
> extender that just involves:

There is a recipe, and it is clearly stated in the README information.

>
> 1) Download and build lmdb (which I have done).
>
> 2) Download and build lmdb extender (which I have tried, but failed to do).

1. Download and build gawk.
2. Download and build gawkextlib.
3. Download and build lmdb.
4. Download and build gawk-lmdb.
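
For anyone who wants to script this, here is a minimal Python sketch of steps 2 and 4, using the command sequence quoted from the README earlier in the thread. It assumes gawk and lmdb (steps 1 and 3) are already installed, and the directory names `gawkextlib` and `gawk-lmdb` under a common source root are hypothetical; adjust them to whatever the tarballs actually unpack to. By default it only prints the plan and runs nothing.

```python
import subprocess
from pathlib import Path

# Build order matters: gawkextlib must be installed before gawk-lmdb.
# Directory names are hypothetical placeholders for the unpacked tarballs.
PACKAGES = ["gawkextlib", "gawk-lmdb"]

# The README's sequence: ./configure && make && make check && make install
STEPS = [["./configure"], ["make"], ["make", "check"], ["make", "install"]]


def build_plan(src_root):
    """Return (directory, command) pairs in the order they should run."""
    root = Path(src_root)
    return [(root / pkg, step) for pkg in PACKAGES for step in STEPS]


def run_plan(plan, dry_run=True):
    """Print each step; execute it only when dry_run is False."""
    for directory, command in plan:
        print(f"[{directory}] {' '.join(command)}")
        if not dry_run:
            # check=True stops at the first failing step, like '&&' in the shell.
            subprocess.run(command, cwd=directory, check=True)


plan = build_plan("src")
run_plan(plan)  # dry run: prints eight steps and executes nothing
```

Calling `run_plan(plan, dry_run=False)` would actually run the builds; `make install` may need appropriate permissions for the install prefix.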

I'm sorry that the world doesn't always work the way that you want it to.

> BTW, I did note that the lmdb extender "configure" script wanted gawkapi.h,
> which I was able to supply it with. Would it be as simple as extracting
> gawkextlib.h from the gawkextlib tarball and using that? Is that all I
> need to do?

No. The gawkextlib library does not exist merely to torture you. There is library code in there that is needed by the extension. I really cannot imagine why it is so difficult for you to download gawkextlib and run configure and make. How are you able to do this for other tarballs, but not for the gawkextlib tarball? If you find it to be a better use of your time, you could probably extract the header file and C files from gawkextlib and hack the gawk-lmdb Makefile to include them in the gawk-lmdb library.

Good luck.

Regards,
Andy

Kaz Kylheku

May 15, 2016, 1:09:33 AM
On 2016-05-14, Kenny McCormack <gaz...@shell.xmission.com> wrote:
> It just seems like there ought to be a recipe for testing this lmdb
> extender that just involves:
>
> 1) Download and build lmdb (which I have done).
>
> 2) Download and build lmdb extender (which I have tried, but failed to do).
>
> BTW, I did note that the lmdb extender "configure" script wanted gawkapi.h,
> which I was able to supply it with. Would it be as simple as extracting
> gawkextlib.h from the gawkextlib tarball and using that? Is that all I
> need to do?

The obvious fix is to merge this gawkextlib into gawk, so that if you
have gawkapi.h, you have gawkextlib.h. Just like if you have stdio.h,
you have stdlib.h.

Kenny McCormack

May 15, 2016, 1:46:44 AM
In article <20160514...@kylheku.com>,
Yes. Give that man a cigar!

That *is* what I had been arguing in that previous thread to which I
referred - that gawkextlib is essentially part of gawk now, and should be
treated as such. Merging the two into a single entity would make things
easier for everyone (*).

(*) And, yes, I mean that in a larger sense - not just in the
personal/private sense.

--
"I heard somebody say, 'Where's Nelson Mandela?' Well,
Mandela's dead. Because Saddam killed all the Mandelas."

George W. Bush, on the former South African president who
is still very much alive, Sept. 20, 2007

Andrew Schorr

May 15, 2016, 4:02:16 PM
Hi,

On Sunday, May 15, 2016 at 1:09:33 AM UTC-4, Kaz Kylheku wrote:
> The obvious fix is to merge this gawkextlib into gawk, so that if you
> have gawkapi.h, you have gawkextlib.h. Just like if you have stdio.h,
> you have stdlib.h.

The gawkextlib library includes both some C functions and a header file. It is not included in gawk because it is not required to use gawkextlib to implement extension libraries. To consider some obvious examples, the extensions bundled with gawk do not use gawkextlib. Furthermore, some of the simpler extensions in the gawkextlib project do not even use the gawkextlib library, i.e. the errno extension and the nl_langinfo extensions. In addition, if I ever find time to finish the join extension, I think it will not use gawkextlib.

Since gawkextlib is not required for implementing extensions, it is extremely unlikely that it will ever be included inside the core gawk distribution. We would like to keep the core gawk code as simple and streamlined as possible, in order to make it easier to maintain. It does not make sense to include this optional library inside of gawk.

I cannot fathom why it is so insanely difficult to download a tarball and enter "./configure && make && make check && make install". I just tried it, and it took approximately 2 seconds. What's the big deal here?

Regards,
Andy


Marc de Bourget

May 15, 2016, 4:34:47 PM
Hi Andy,

thank you very much for the hints.
First of all, I need to compile GAWK with my MinGW Bloodshed Dev-C++ Compiler:
https://sourceforge.net/projects/orwelldevcpp/

I've taken these sources:
https://sourceforge.net/projects/ezwinports/files/gawk-4.1.3-w32-src.zip/download

I'll tell you what I have done so far:
1. I've unzipped the source to the folder c:\gawk\gawk-4.1.3\

2. Then, I followed the instructions in c:\gawk\gawk-4.1.3\README_d\README.pc:
"Copy the files in the `pc' directory (EXCEPT for `ChangeLog') to the directory with the rest of the gawk sources. (The subdirectories of `pc' need not be copied.)"

3. Then, I adjusted in c:\gawk\gawk-4.1.3\Makefile:
prefix = c:/gawk/gawk-4.1.3

4. After that I opened a DOS box in c:\gawk\gawk-4.1.3\ and typed:
path="c:\program files\Dev-Cpp\MinGW64\bin";%path%
mingw32-make.exe mingw32

=> Errors:
c:\gawk\gawk-4.1.3>mingw32-make.exe mingw32
mingw32-make.exe all \
CC=gcc O=.o CF="-D__USE_MINGW_ANSI_STDIO -O2 -gdwarf-2 -g3" \
OBJ=popen.o LNK=LMINGW32 LF="-gdwarf-2 -g3" \
LF2="-lws2_32 -lmsvcp60" RSP=
mingw32-make.exe[1]: Entering directory 'c:/gawk/gawk-4.1.3'
gcc -c -D__USE_MINGW_ANSI_STDIO -O2 -gdwarf-2 -g3 -DGAWK -I. -DHAVE_CONFIG_H -DDEFLIBPATH="\"c:/gawk/gawk-4.1.3/lib/gawk\"" -DSHLIBEXT="\"dll\"" array.c
gcc: error: CreateProcess: No such file or directory
Makefile:223: recipe for target 'array.o' failed
mingw32-make.exe[1]: *** [array.o] Error 1
mingw32-make.exe[1]: Leaving directory 'c:/gawk/gawk-4.1.3'
Makefile:170: recipe for target 'mingw32' failed
mingw32-make.exe: *** [mingw32] Error 2

Of course there is no c:/gawk/gawk-4.1.3/lib/gawk\ directory; I assumed it would be created? I am using things like "makefile" and "make" for the first time in my life, so I'd be very thankful for a hint.

Andrew Schorr

May 15, 2016, 6:51:24 PM
Hi Marc,

On Sunday, May 15, 2016 at 4:34:47 PM UTC-4, Marc de Bourget wrote:
> Of course there is no c:/gawk/gawk-4.1.3/lib/gawk\ directory; I assumed it would be created? I am using things like "makefile" and "make" for the first time in my life, so I'd be very thankful for a hint.

I wish I could help, but I have no experience with MinGW. Perhaps somebody else can offer some advice.

I strongly urge you to use Cygwin. Is there a reason you can't use Cygwin? It's easy to install and solves these problems and a lot more.

Regards,
Andy

Marc de Bourget

May 16, 2016, 9:03:55 AM
Thank you a lot Andy.

It must be possible to build GAWK with MinGW. Eli Zaretskii did so successfully.
I have made some progress by using the original MinGW binaries instead of those provided by C compiler companies. MinGW installation was cumbersome but not as tedious as Cygwin installations, which crashed my PC a long time ago, so I don't want to install it again. On top of that, I prefer native Windows binaries.

So, I used this "build.bat" in the c:\gawk\gawk-4.1.3\ directory this time:
path="c:\MinGW\bin\mingw32-make.exe";%path%
mingw32-make.exe mingw32

There are fewer error messages now:
In file included from gawkmisc.c:36:0:
pc/gawkmisc.pc:588:1: error: conflicting types for 'usleep'
usleep(unsigned int usec)
^
In file included from awk.h:169:0,
from gawkmisc.c:27:
c:\mingw\include\unistd.h:130:5: note: previous definition of 'usleep' was here
int usleep( useconds_t period ){ return __mingw_sleep( 0, 1000 * period ); }
^
Makefile:223: recipe for target 'gawkmisc.o' failed
mingw32-make.exe[1]: *** [gawkmisc.o] Error 1
mingw32-make.exe[1]: Leaving directory 'c:/gawk/gawk-4.1.3'
Makefile:170: recipe for target 'mingw32' failed
mingw32-make.exe: *** [mingw32] Error 2

Can someone advise, please?

Andrew Schorr

May 16, 2016, 9:45:23 AM
On Monday, May 16, 2016 at 9:03:55 AM UTC-4, Marc de Bourget wrote:
> It must be possible to build GAWK with MinGW. Eli Zaretskii did so successfully.

I agree. It is definitely possible. I have never tried.

> I have made some progress by using the original MinGW binaries instead of those provided by C compiler companies. MinGW installation was cumbersome but not as tedious as Cygwin installations, which crashed my PC a long time ago, so I don't want to install it again.

I have installed Cygwin on many PCs, and it never caused any problems. Maybe you should try it again. It is easy to install and works very well.

> On top of that, I prefer native Windows binaries.

Why? In what way is a native Windows binary better than a Cygwin program? I'm not suggesting that it isn't, but I'd like to be educated. Is it a performance issue?

Regards,
Andy

Kenny McCormack

May 16, 2016, 9:53:54 AM
In article <26fd2cb1-553b-432e...@googlegroups.com>,
Marc de Bourget <marcde...@gmail.com> wrote:
>Thank you a lot Andy.
>
>It must be possible to build GAWK with MinGW. Eli Zaretskii did so successfully.
>I have made some progress by using the original MinGW binaries instead of those
>provided by C Compiler companies. MinGW installation was cumbersome but not as
>tedious as Cygwin installations which crashed my PC a long time ago so I don't want
>to install it again. On top of that, I prefer native Windows binaries.

As you see, I've diverted this thread, because your discussion has nothing
to do with the lmdb extension (except as noted below).

Short summary: Cygwin ain't so bad. I've installed it many times and it
just works. Now, you may have bad memories from it long ago (like 15 years
or so), but it is much better now (and the last time I messed with it was
probably a few years ago, so it is not like it is just now becoming
usable). Secondly, you don't need the full Cygwin install on target
machines - just on your dev machine. This is a common misconception that I
see frequently on Usenet threads - people for some reason get apoplectic
about installing Cygwin on their target/client machines, but they don't
realize that you don't have to do that (and I rarely do).

Next, I don't understand why you prefer "native Windows binaries". Is
there some tangible reason for this or is it basically dogmatic/religious?
I've always found that Cygwin-compiled binaries work just fine, and for
Unix-centric things like GAWK, generally much better than the "native"
compiled versions.

Finally, I think what Andy is saying is that although you may eventually
struggle your way into getting GAWK itself compiled using some "native"
compiler, you're going to have an ongoing (and ultimately fruitless)
struggle getting gawkextlib and/or any gawkextlib-dependent things (such as
the original topic of this thread) compiled that way. Just look at how
much struggle I've had getting it (the titular topic of this thread)
compiled under Linux - so it is going to be a whole lot harder under
something other than Linux and/or something other than Cygwin.

--
The scent of awk programmers is a lot more attractive to women than
the scent of perl programmers.

(Mike Brennan, quoted in the "GAWK" manual)

Marc de Bourget

May 16, 2016, 9:55:10 AM
Thank you Andy.

I can't say which one is better, because the Cygwin installation crashed the last time I installed it and I won't install it again unless I can't get MinGW working. Maybe Eli or someone else can have a look at the MinGW error messages with "conflicting types for 'usleep'". If I can't get it to work I'll write a gawk bug report (although I don't think this is a gawk bug).

Thank you again. I appreciate it.

Kenny McCormack

May 16, 2016, 9:57:40 AM
In article <424db756-8838-49bd...@googlegroups.com>,
Andrew Schorr <asc...@telemetry-investments.com> wrote:
...
>> On top of that, I prefer native Windows binaries.
>
>Why? In what way is a native Windows binary better than a Cygwin program?
>I'm not suggesting that it isn't, but I'd like to be educated. Is it a
>performance issue?

I think it is a basic "fear of the other" - that seems to be basic to the
Windows mindset. That is not intended as flame or insult. It is simple
fact and goes back to the entire nature and reason for Microsoft's
existence.

The point is that they don't like to have this extra, POSIX-thingie, layer
running. The Windows API (like 640K) should be good enough for anybody.

--
Given Bush and his insanely expensive wars (*), that we will be paying for
for generations to come, the only possible response a sensible person need
ever give, when a GOPer/TeaBagger says anything about "deficits", is a
polite snicker.

(*) Obvious money transfers between the taxpayers and Bush's moneyed
interests. Someday, we'll actually figure out a way to have a war where the
money just gets moved around and nobody (on either side) gets injured or
killed. That will be an accomplishment of which we will be justly proud.

Kaz Kylheku

May 16, 2016, 11:15:46 AM
Neither MinGW nor Cygwin are "native". Or else they are both "native".
It's just a non-technical viewpoint.

Both Cygwin and MinGW programs can call Win32 API functions.

MinGW programs are linked to a crappy, outdated library from Microsoft
which provides shabby versions of a tiny subset of some POSIX functions.
(MinGW removes the gratuitous underscores from their names).
Microsoft's Unix-like functions in MSVCRT.DLL are not "native" in any
sense. They are not part of the Windows API.

Cygwin programs are linked to a much better, more robust C library
which has better ties to the underlying Windows.

For instance, here is a big difference, the "stat" function
in Cygwin retrieves richer attributes, including meaningful ownership
and permission. The "stat" function in MSVCRT.DLL is just a hack
for quick and dirty porting of Unix utilities.

Neither function is a "native" Windows way for getting file attributes,
but Cygwin's is much better.

Both Cygwin and MinGW let you call the Win32 API to get Win32 file
attributes, if you want---but that's not the point of using these
porting environments.

Kaz Kylheku

May 16, 2016, 11:23:29 AM
On 2016-05-16, Kenny McCormack <gaz...@shell.xmission.com> wrote:
> In article <424db756-8838-49bd...@googlegroups.com>,
> Andrew Schorr <asc...@telemetry-investments.com> wrote:
> ...
>>> On top of that, I prefer native Windows binaries.
>>
>>Why? In what way is a native Windows binary better than a Cygwin program?
>>I'm not suggesting that it isn't, but I'd like to be educated. Is it a
>>performance issue?
>
> I think it is a basic "fear of the other" - that seems to be basic to the
> Windows mindset. That is not intended as flame or insult. It is simple
> fact and goes back to the entire nature and reason for Microsoft's
> existence.
>
> The point is that they don't like to have this extra, POSIX-thingie, layer
> running.

Unless its name is MSVCRT.DLL! With its POSIX thingies like _stat,
_fileno, _dup, _open, _fcntl, ... which lose their underscore under
MinGW, but are otherwise the same.

> The Windows API (like 640K) should be good enough for anybody.

But it isn't enough, which is why there are things like MSVCRT.DLL
(used by MinGW programs) and CYGWIN.DLL (used by Cygwin programs).

CYGWIN.DLL is what MSVCRT.DLL hoped it would grow up to be before
its development was abandoned by Microsoft.

Every development tool in Windows has a run-time layer. If you program
in Pascal, your compiler will give you a run-time to support the Pascal
code.

Marc de Bourget

May 16, 2016, 3:41:58 PM
Hi Andy,
Yes, you are exactly right with every sentence you have written :-)
Maybe it's kind of dogmatic/religious that I'm extremely satisfied with Microsoft Windows and don't want to use Linux or Linux-like layers :-) - but aren't Linux people also a bit dogmatic/religious :-) ?

Hi Kaz,
What I mean by 'native' is that no installation of additional software (also no Runtime library) is needed to execute Windows exe files in a CMD box (command line) or with Windows Explorer (GUI program). BTW, that's why I like Thompson AWK and Delphi (Object Pascal) for these abilities.

"MinGW programs are linked to a crappy, outdated library from Microsoft which provides shabby versions of a tiny subset of some POSIX functions."

I'm sure you are right but it seems this is sufficient for AWK :-)
Eli's binary MinGW gawk version works great and without any issues.

Marc de Bourget

May 16, 2016, 3:45:28 PM
> Hi Andy,
> Yes, you are exactly right with every sentence you have written :-)
... Sorry, I confused the names. This time, I meant "Hi Kenny" :-) ...
Marc de Bourget

Manuel Collado

May 17, 2016, 6:41:24 AM
Cygwin allows me to profit from almost all the comprehensive GNU
software. But there are some issues when trying to combine Cygwin
programs with "native" Windows programs in the same toolchain.

There are some subtle issues when calling a Cygwin program from a
Windows CMD.EXE shell, or when calling native Windows programs from a
Cygwin bash shell. I have learned to solve these issues, but we should
agree that it is a real nuisance for mostly-Windows users.

Marc de Bourget

May 17, 2016, 6:54:00 AM
> There are some subtle issues when calling a Cygwin program from a
> Windows CMD.EXE shell, or when calling native Windows programs from a
> Cygwin bash shell. I have learned to solve these issues, but we should
> agree that it is a real nuisance for mostly-Windows users.
>
> Regards.
> --
> Manuel Collado - http://lml.ls.fi.upm.es/~mcollado

Thank you Manuel. This is exactly my concern.

Kenny McCormack

May 17, 2016, 9:33:54 AM
In article <nhesgi$1crm$2...@gioia.aioe.org>,
Manuel Collado <m.co...@domain.invalid> wrote:
...
>There are some subtle issues when calling a Cygwin program from a
>Windows CMD.EXE shell, or when calling native Windows programs from a
>Cygwin bash shell. I have learned to solve these issues, but we should
>agree that it is a real nuisance for mostly-Windows users.

The problem with this attitude is that, by intent and design, most Windows
users don't even know that the Command Prompt exists. They're not supposed
to. So, when you've got a user (such as myself and other posters on this
thread) who uses Windows but also uses the Command Prompt, you've then
already got an atypical Windows user. Given all that, it is then not
unreasonable to expect said atypical user to be able to figure out whatever
inconsistencies he may encounter. And note further that this goes double
if said atypical Windows user is also using Unix/Linux/GNU software under
Windows.

Also, you mentioned running "native" Windows binaries from the Cygwin
bash shell. I object to this characterization on a couple of somewhat
pedantic grounds:
1) As Kaz makes clear in his posts, the word "native" has become
loaded and should probably be avoided. I think what you really
mean here is "GUI" (e.g., notepad).
2) As I've indicated, there's no reason to install the full-blown
Cygwin on client/target machines. Thus, there's not really an
issue here - there's no reason why you'd *want* to run Windows GUI
programs from the bash shell. In fact, about the only things you'd
want to run from the Cygwin bash shell are "./configure" and "make".

--
The randomly generated signature file that would have appeared here is more than 4
lines in length. As such, it violates one or more Usenet RFPs. In order to remain in
compliance with said RFPs, the actual sig can be found at the following web address:
http://www.xmission.com/~gazelle/Sigs/LindaSmith

Kaz Kylheku

May 17, 2016, 11:23:58 AM
On 2016-05-17, Manuel Collado <m.co...@domain.invalid> wrote:
> El 16/05/2016 15:45, Andrew Schorr escribió:
>> On Monday, May 16, 2016 at 9:03:55 AM UTC-4, Marc de Bourget wrote:
>> I have installed Cygwin on many PCs, and it never caused any problems.
> > Maybe you should try it again. It is easy to install and works very well.
>>
>>> On top of that, I prefer native Windows binaries.
>>
>> Why? In what way is a native Windows binary better than a Cygwin program?
> > I'm not suggesting that it isn't, but I'd like to be educated. Is it
> a performance issue?
>
> Cygwin allows me to profit from almost all the comprehensive GNU
> software. But there are some issues when trying to combine Cygwin
> programs with "native" Windows programs in the same toolchain.
>
> There are some subtle issues when calling a Cygwin program from a
> Windows CMD.EXE shell,

Few native Windows programs are used this way. Nothing that is targeted
at everyday use by end users.

There are issues when calling *any* command line program from a shell on
Windows. The main one is that the command line is a single character
string, whose parsing is entirely up to the process which receives it.
The application decides how the string is delimited into arguments and
all the quoting rules.

Scripting with any programs whatsoever out of CMD.EXE is rife with
issues.

Even if you have a so-called "native" Awk compiled with MinGW or
whatever, you cannot use it with all but the most trivial "Unixy"
one-liners out of CMD.EXE.
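
To illustrate the point about argument parsing, here is a small Python sketch. `shlex.split` applies POSIX shell quoting rules; `split_double_quotes_only` is a hypothetical, deliberately simplified stand-in for a parser that recognizes only double quotes and whitespace (it ignores backslash escaping entirely), which is roughly what a program sees when its runtime does not treat single quotes as special. It is not the actual parsing code of any particular runtime.

```python
import shlex


def split_double_quotes_only(cmdline):
    """Split on spaces/tabs; only double quotes group words (no escapes)."""
    args, current = [], []
    in_quotes = False   # currently inside a "..." region
    started = False     # an argument is in progress (may be empty: "")
    for ch in cmdline:
        if ch == '"':
            in_quotes = not in_quotes
            started = True
        elif ch in " \t" and not in_quotes:
            if started:
                args.append("".join(current))
                current, started = [], False
        else:
            current.append(ch)
            started = True
    if started:
        args.append("".join(current))
    return args


unix_style = "awk '{print $1}' file.txt"
windows_style = 'awk "{print $1}" file.txt'

posix_args = shlex.split(unix_style)
broken_args = split_double_quotes_only(unix_style)
ok_args = split_double_quotes_only(windows_style)

print(posix_args)
print(broken_args)
print(ok_args)
```

The single-quoted one-liner that works under a POSIX shell is split into four pieces by the double-quote-only parser, so the awk program text arrives broken in two; rewriting it with double quotes avoids that for this simple case.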

The special Cygwin issue used to be (IIRC) that Cygwin programs (the standard ones)
didn't understand a path like "C:\users\Bob". This had to be
"/cygdrive/c/users/Bob" under Cygwin. That isn't true any more.

I just tried

ls c:\\users\\kaz
ls c:/users/kaz
ls /cygdrive/c/users/kaz

at a Cygwin bash prompt: all three work. Even if they didn't,
you could still write a Cygwin program (an executable linked to
cygwin.dll) such that it understands a Windows path.

> or when calling native Windows programs from a
> Cygwin bash shell.

The use of the bash shell has no bearing on whether a program linked
with cygwin.dll is "native" or not; such a thing can be deployed without
including Bash.

On the other hand, if you actually want to use utilities and languages
from the Unix environment in all the ways that they are meant to be
used, including convenient one-liners from the shell prompt with all the
quoting and escaping rules, you are much better off with Cygwin: using
the Cygwin shell to run Cygwin programs.

I use a Cygwin bash shell script as a Windows service, which I can start
and stop with the Services MSI control panel in Windows. The script runs
in a loop and maintains a tunnel (via Cygwin's build of ssh).

If you reboot the machine, it nicely starts up, even with nobody logged
in.

That's "native" enough for me, thank you very much.

Kenny McCormack

May 17, 2016, 12:33:01 PM
In article <201605170...@kylheku.com>,
Kaz Kylheku <545-06...@kylheku.com> wrote:
...
>Few native Windows programs are used this way. Nothing that is targeted
>at everyday use by end users.

Correct. This whole discussion is outside the realm of the day-to-day
Windows user.

>There are issues when calling *any* command line program from a shell on
>Windows. The main one is that the command line is a single character
>string, whose parsing is entirely up to the process which receives it.
>The application decides how the string is delimited into arguments and
>all the quoting rules.

Not entirely true - and not really relevant to the current discussion.

>Scripting with any programs whatsoever out of CMD.EXE is rife with
>issues.

Yes. True.

>Even if you have a so-called "native" Awk compiled with MinGW or
>whatever, you cannot use it with all but the most trivial "Unixy"
>one-liners out of CMD.EXE.

Yes. True.

>The special Cygwin issue used to be (IIRC) that Cygwin programs (the standard ones)
>didn't understand a path like "C:\users\Bob". This had to be
>"/cygdrive/c/users/Bob" under Cygwin. That isn't true any more.

For me, the killer was the fact that it didn't recognize the current path
on any drive other than the currently logged drive.

I.e., the following doesn't work as expected:

C> cd D:\foo\bar
C> echo this > D:moon
C> someCygwinProgram D:moon

The Cygwin program looks for D:\moon, and not for D:\foo\bar\moon.
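
The distinction at issue can be sketched with Python's ntpath module, which applies Windows path rules on any platform. The classify function and its labels are made up for this illustration.

```python
import ntpath

def classify(path: str) -> str:
    """Label a Windows path by how much of it depends on process state."""
    drive, rest = ntpath.splitdrive(path)
    rooted = rest.startswith(("\\", "/"))
    if drive and rooted:
        return "absolute"
    if drive:
        return "drive-relative"
    if rooted:
        return "rooted on the current drive"
    return "relative"

for example in (r"D:\moon", "D:moon", r"\moon", "moon"):
    print(f"{example!r:12} {classify(example)}")
```

D:moon is the drive-relative case: it names a file relative to whatever directory is current on drive D:, which is why treating it as D:\moon gives the wrong file.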

>> or when calling native Windows programs from a
>> Cygwin bash shell.
>
>The use of the bash shell has no bearing on whether a program linked
>with cygwin.dll is "native" or not; such a thing can be deployed without
>including Bash.

Yes. True.

>On the other hand, if you actually want to use utilities and languages
>from the Unix environment in all the ways that they are meant to be
>used, including convenient one-liners from the shell prompt with all the
>quoting and escaping rules, you are much better off with Cygwin: using
>the Cygwin shell to run Cygwin programs.

TBH, I never really saw the point of trying. As I said in an earlier post,
about all I've ever used the Cygwin bash shell for is ./configure && make.

If I want Linux, I know where to find it. I don't really need to run
a Linux-y shell on my Windows boxes.

--
The randomly generated signature file that would have appeared here is more than 4
lines in length. As such, it violates one or more Usenet RFPs. In order to remain in
compliance with said RFPs, the actual sig can be found at the following web address:
http://www.xmission.com/~gazelle/Sigs/LadyChatterley

Marc de Bourget

May 17, 2016, 12:35:14 PM
> That's "native" enough for me, thank you very much.

I've given my definition of "native" above, but probably this is only my "own" definition and may be misunderstood because I'm not a "native" English speaker :-) :-). But now enough about "native".

One question please although I don't intend to install Cygwin yet: Does the target computer require a Cygwin installation or is copying DLL files enough?

And back to MinGW: Can someone reproduce the "conflicting types for 'usleep'" issue, please? If not, I'll write a GAWK MinGW installation bug report (although, as I said, I doubt it is a GAWK bug).

Janis Papanagnou

May 17, 2016, 12:45:08 PM
On 17.05.2016 18:35, Marc de Bourget wrote:
>
> One question please although I don't intend to install Cygwin yet: Does the
> target computer require a Cygwin installation or is copying DLL files
> enough?

When I've done that in the past I copied just one (or two?) Cygwin DLLs, that
was enough.

Janis

Kenny McCormack

May 17, 2016, 1:00:14 PM
In article <nhfhqj$u5n$1...@news.m-online.net>,
Janis Papanagnou <janis_pa...@hotmail.com> wrote:
>On 17.05.2016 18:35, Marc de Bourget wrote:
>>
>> One question please although I don't intend to install Cygwin yet: Does the
>> target computer require a Cygwin installation

No. As I've stated many times in this thread, it does not.
Thinking that it does is a common misconception in the "anti-Cygwin" crowd.

>>or is copying DLL files enough?

>When I've done that in the past I copied just one (or two?) Cygwin DLLs, that
>was enough.

Last I checked, running the Cygwin version of GAWK requires 2 DLLs - the
main Cygwin DLL (cygwin1.dll, OSLT) and the i18n one (cygintl.dll, OSLT).

(OSLT = "Or something like that")

--
People who say they'll vote for someone else because Obama couldn't solve
all of Bush's messes are like people complaining that he couldn't cure cancer,
so they'll go and vote for cancer.

Janis Papanagnou

May 17, 2016, 1:42:05 PM
On 17.05.2016 19:00, Kenny McCormack wrote:
> In article <nhfhqj$u5n$1...@news.m-online.net>,
> Janis Papanagnou <janis_pa...@hotmail.com> wrote:
>
>> When I've done that in the past I copied just one (or two?) Cygwin DLLs, that
>> was enough.
>
> Last I checked, running the Cygwin version of GAWK requires 2 DLLs - the
> main Cygwin DLL (cygwin1.dll, OSLT) and the i18n one (cygintl.dll, OSLT).

Right, the second was for I18N; my (faint) memories suggest something like
'iconv.dll'. This had been sufficient not only for gawk but also other tools.

Janis

Kaz Kylheku

May 17, 2016, 1:55:43 PM
On 2016-05-17, Kenny McCormack <gaz...@shell.xmission.com> wrote:
> In article <201605170...@kylheku.com>,
> Kaz Kylheku <545-06...@kylheku.com> wrote:
> ...
>>Few native Windows programs are used this way. Nothing that is targeted
>>at everyday use by end users.
>
> Correct. This whole discussion is outside the realm of the day-to-day
> Windows user.
>
>>There are issues when calling *any* command line program from a shell on
>>Windows. The main one is that the command line is a single character
>>string, whose parsing is entirely up to the process which receives it.
>>The application decides how the string is delimited into arguments and
>>all the quoting rules.
>
> Not entirely true - and not really relevant to the current discussion.

Yes, not entirely.

There are classes of behavior here. Programs linked to Microsoft's C
library, for instance, which use a regular C main function, have a
consistent behavior for delimiting the argument string.

The Shell API also provides a CommandLineToArgvW function. All programs
that use it will have consistent behavior.
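
As a small illustration of the Microsoft C runtime convention, Python's standard library carries an encoder for it, subprocess.list2cmdline, which it uses on Windows to flatten an argument list into the single command-line string. The sketch below shows only the encoding direction; it makes no claim about programs that parse their command line some other way.

```python
import subprocess

# Arguments as a Unix program would receive them in argv.
argv = ["awk", '{ print "hi" }', "my file.txt"]

# Flatten into the single string a Windows process is started with,
# quoted so that the Microsoft C runtime splits it back into the same argv.
command_line = subprocess.list2cmdline(argv)
print(command_line)
```

Arguments containing whitespace are wrapped in double quotes, and embedded double quotes are escaped with a backslash. That is the form a program with a regular C main function expects.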

Consistently shitty, of course.

>>Scripting with any programs whatsoever out of CMD.EXE is rife with
>>issues.
>
> Yes. True.
>
>>Even if you have a so-called "native" Awk compiled with MinGW or
>>whatever, you cannot use it with all but the most trivial "Unixy"
>>one-liners out of CMD.EXE.
>
> Yes. True.
>
>>The special Cygwin issue used to be (IIRC) that Cygwin programs (the standard ones)
>>didn't understand a path like "C:\users\Bob". This had to be
>>"/cygdrive/c/users/Bob" under Cygwin. That isn't true any more.
>
> For me, the killer was the fact that it didn't recognize the current path
> on any drive other than the currently logged drive.

But, how does this work at all?

According to the API documentation for the Win32 GetCurrentDirectory
function:

"Each process has a single current directory that consists of two parts:
                    ^^^^^^

  * A disk designator that is either a drive letter followed by a colon,
    or a server name followed by a share name (\\servername\sharename)
  * A directory on the disk designator"

But in the CMD.EXE environment, the behavior is like DOS: we apparently
have a per-drive array of current working directories, and change
the "logged drive" not with CD but by using the drive letter as a
command.

Where in Win32 is the "logged drive" concept?

Kenny McCormack

May 17, 2016, 2:28:16 PM
In article <201605171...@kylheku.com>,
Kaz Kylheku <545-06...@kylheku.com> wrote:
...
>> For me, the killer was the fact that it didn't recognize the current path
>> on any drive other than the currently logged drive.
>
>But, how does this work at all?

Apparently, it is emulated using environment variables. In the category of
things I learned on the way to learning other things, it turns out that
Win32 programs maintain "hidden" environment variables with values like:

varName value
======= =====
C: C:\Documents and Settings\username
D: D:\foo\bar
etc

These vars are hidden, so that they don't show up when you do, e.g., "set"
at the CMD prompt.
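
A minimal sketch of how such a per-drive table could be used to expand a drive-relative path. The table is simulated with a plain dictionary holding the values listed above, and the helper name resolve is hypothetical; this models the idea, not the actual Win32 implementation.

```python
import ntpath

# Simulated per-drive table, using the values quoted in the post above.
per_drive_cwd = {
    "C:": r"C:\Documents and Settings\username",
    "D:": r"D:\foo\bar",
}

def resolve(path: str, table: dict) -> str:
    """Hypothetical helper: expand a drive-relative path such as 'D:moon'."""
    drive, rest = ntpath.splitdrive(path)
    if not drive or rest.startswith(("\\", "/")):
        return path  # no drive letter, or already fully qualified
    # Fall back to the drive's root when nothing is remembered for it.
    base = table.get(drive.upper(), drive + "\\")
    return ntpath.join(base, rest) if rest else base

print(resolve("D:moon", per_drive_cwd))
print(resolve(r"D:\moon", per_drive_cwd))
```

With the table above, D:moon expands to D:\foo\bar\moon, while the fully qualified D:\moon is returned unchanged.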

...
>But in the CMD.EXE environment, the behavior is like DOS: we apparently
>have a per-drive array of current working directories, and change
>the "logged drive" not with CD but by using the drive letter as a
>command.

I think your underlying point here is that this *is* not native behavior in
Win32, but is emulated to work the same way as it did in DOS. And I think
you're right about that...

--
The randomly generated signature file that would have appeared here is more than 4
lines in length. As such, it violates one or more Usenet RFPs. In order to remain in
compliance with said RFPs, the actual sig can be found at the following web address:
http://www.xmission.com/~gazelle/Sigs/BusyOnTheProof

Kaz Kylheku

May 17, 2016, 2:45:04 PM
On 2016-05-17, Kaz Kylheku <545-06...@kylheku.com> wrote:
> Where in Win32 is the "logged drive" concept?

It turns out this is documented neither under GetCurrentDirectory, nor
under SetCurrentDirectory.

It's documented in the code sample "Changing the Current Directory", in
a boxed Note which says:

Note: Although each process can have only one current directory, if the
application switches volumes by using the SetCurrentDirectory function,
the system remembers the last current path for each volume (drive
letter). This behavior will manifest itself only when specifying a drive
letter without a fully qualified path when changing the current
directory point of reference to a different volume. This applies to
either Get or Set operations.

It does seem like Cygwin's DOS path support could play along with this
fairly easily.
I wrote to the mailing list and it was fixed quite promptly.

By the way, the above contains yet another lie: "[T]his behavior will
manifest itself only when specifying a drive letter without a fully
qualified path when changing the current directory ..." Obviously that
is false. The behavior of maintaining a stash of current working
directories for drive letters also manifests itself when you use
relative paths with drive letters! And that fact means that a process
does not in fact have just a single current working directory; it has
one current full working directory which applies to paths which don't
include a drive letter. And it has additional ancillary working
directories that apply to relative paths which do include a drive
letter. Such paths are relative to *something* and that something
is a *directory* and is *current* to the process!

Kaz Kylheku

May 17, 2016, 3:03:26 PM
On 2016-05-17, Kenny McCormack <gaz...@shell.xmission.com> wrote:
> In article <201605171...@kylheku.com>,
> Kaz Kylheku <545-06...@kylheku.com> wrote:
> ...
>>> For me, the killer was the fact that it didn't recognize the current path
>>> on any drive other than the currently logged drive.
>>
>>But, how does this work at all?
>
> Apparently, it is emulated using environment variables. In the category of
> things I learned on the way to learning other things, it turns out that
> Win32 programs maintain "hidden" environment variables with values like:
>
> varName value
> ======= =====
> C: C:\Documents and Settings\username
> D: D:\foo\bar
> etc
>
> These vars are hidden, so that they don't show up when you do, e.g., "set"
> at the CMD prompt.

You are right! I can see them with my MinGW build of txr, which uses
GetEnvironmentStrings() for the env function:

C:\Users\kaz>txr -t "(env)"
=C:=C:\Users\kaz
=D:=D:\
=ExitCode=00000000
=P:=P:\pub
ALLUSERSPROFILE=C:\ProgramData
APPDATA=C:\Users\kaz\AppData\Roaming
[ .. snip numerous others ... ]

Looks like these are very special entries that *start* with the equal
character; so the name is not exactly "C:". It's not clear what the
name is. According to the rule that everything before the first = sign
is the name, their names are empty. They appear as multiple entries
for the empty-name variables, with an internal namespace.
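
The naming ambiguity can be made concrete with a short sketch. Both function names are hypothetical. The second rule, which treats a leading '=' as part of the name, is an assumption made for illustration and is not taken from any documentation.

```python
def split_at_first_equals(entry: str) -> tuple:
    """Usual rule: everything before the first '=' is the name."""
    name, _, value = entry.partition("=")
    return name, value

def split_after_leading_equals(entry: str) -> tuple:
    """Assumed alternative: never split at position 0, so a leading '='
    stays in the name."""
    position = entry.index("=", 1)
    return entry[:position], entry[position + 1:]

entries = [r"=C:=C:\Users\kaz", "=ExitCode=00000000",
           r"ALLUSERSPROFILE=C:\ProgramData"]

for entry in entries:
    print(split_at_first_equals(entry), split_after_leading_equals(entry))
```

Under the usual rule, the two special entries both get an empty name. Under the assumed alternative they are named =C: and =ExitCode, and ordinary entries such as ALLUSERSPROFILE split identically either way.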

(What is this =ExitCode thing?)

> I think your underlying point here is that this *is* not native behavior in
> Win32, but is emulated to work the same way as it did in DOS. And I think
> you're right about that...

I suspect that it's SetCurrentDirectory doing this with the environment
variables---a Win32 function in kernel32.dll, which makes it quite "native".
Using environment variables for the storage of these per-drive current
directories makes sense; it's a quick and dirty way to obtain
inheritance of these to spawned processes.

Look:

C:\Users\kaz>txr -t "(progn (chdir \"P:\\vim\") (env))"
=C:=C:\Users\kaz
=D:=D:\
=ExitCode=00000000
=P:=P:\vim
[ ... ]

The call to chdir (a direct wrapper for the chdir function in MinGW
(i.e. MSVCRT.DLL)) changed the value of the =P: variable. I'm
reasonably certain that chdir just calls down to SetCurrentDirectory.

Kenny McCormack

May 17, 2016, 5:34:20 PM
In article <201605171...@kylheku.com>,
Kaz Kylheku <545-06...@kylheku.com> wrote:
...
>You are right! I can see them with my MinGW build of txr, which uses
>GetEnvironmentStrings() for the env function:
>
>C:\Users\kaz>txr -t "(env)"
>=C:=C:\Users\kaz
>=D:=D:\
>=ExitCode=00000000
>=P:=P:\pub
>ALLUSERSPROFILE=C:\ProgramData
>APPDATA=C:\Users\kaz\AppData\Roaming
>[ .. snip numerous others ... ]
>
>Looks like these are very special entries that *start* with the equal
>character; so the name is not exactly "C:". It's not clear what the
>name is. According to the rule that everything before the first = sign
>is the name, their names are empty. They appear as multiple entries
>for the empty-name variables, with an internal namespace.

Yes. In fact, I also have code written in another (off-topic) programming
language, that demonstrates these variables' existence and their format.
The code, in fact, does the equivalent of gsub("=","\t") on the environment
string, so that I can then display it in a table format (with a leading tab
and a tab between the "real" variable name and its value).
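
The effect of that substitution can be sketched in a few lines. This is not the original code, only an equivalent of the gsub("=","\t") step applied to sample entries from earlier in the thread; the function name as_table_row is made up.

```python
def as_table_row(entry: str) -> str:
    # Same effect as awk's gsub("=", "\t") on one entry:
    # every '=' becomes a tab.
    return entry.replace("=", "\t")

rows = [as_table_row(entry) for entry in (
    r"=C:=C:\Users\kaz",
    r"APPDATA=C:\Users\kaz\AppData\Roaming",
)]

for row in rows:
    print(row)
```

The special entries come out with a leading tab and so line up one column to the right of ordinary variables, which makes them easy to spot in the table.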

--
The randomly generated signature file that would have appeared here is more than 4
lines in length. As such, it violates one or more Usenet RFPs. In order to remain in
compliance with said RFPs, the actual sig can be found at the following web address:
http://www.xmission.com/~gazelle/Sigs/CLC_Nutshell