zlib import in pyi_archive.py fails: Too many open files


SK

Oct 5, 2012, 1:38:23 AM
to pyins...@googlegroups.com
Hi,

I'm trying PyInstaller 2.0 and the latest git versions on Fedora 17 64-bit, and one of my Python 2.7 applications won't load. After a delay of about 20 seconds it fails with this traceback:

Traceback (most recent call last):
  File "<string>", line 49, in <module>
  File "/opt/pyinstaller/PyInstaller/loader/pyi_iu.py", line 386, in importHook
    mod = _self_doimport(nm, ctx, fqname)
  File "/opt/pyinstaller/PyInstaller/loader/pyi_iu.py", line 459, in doimport
    mod = director.getmod(nm)
  File "/opt/pyinstaller/PyInstaller/loader/pyi_iu.py", line 246, in getmod
    owner = self.shadowpath[thing] = self.__makeOwner(thing)
  File "/opt/pyinstaller/PyInstaller/loader/pyi_iu.py", line 264, in __makeOwner
    owner = klass(path)
  File "/opt/pyinstaller/PyInstaller/loader/pyi_archive.py", line 435, in __init__
    self.pyz = ZlibArchive(path)
  File "/opt/pyinstaller/PyInstaller/loader/pyi_archive.py", line 329, in __init__
    raise RuntimeError('zlib required but cannot be imported')
RuntimeError: zlib required but cannot be imported

Instrumenting pyi_archive.py, I found that the underlying exception message is:
/path/to/MyApp/bin/zlib.so: cannot open shared object file: Too many open files

My other applications don't do this. The applications use packages including
PySide, matplotlib, and psutil. The problematic application is different in that
it loads an F2PY-generated shared lib but not until needed, so I don't think that
could be the problem. It also doesn't load a large number of files.

They work with PyInstaller 1.5.1 so I can use that for now but I'm hoping to
find an answer so that I can use 2.x in the future. I'm happy to try out ideas
and patches.

TIA for any help,
Stuart

Martin Zibricky

Oct 5, 2012, 8:17:55 AM
to pyins...@googlegroups.com
SK wrote on Thu, 04 Oct 2012 at 22:38 -0700:
> My other applications don't do this. The applications use packages
> including
> PySide, matplotlib, and psutil. The problematic application is
> different in that
> it loads an F2PY-generated shared lib but not until needed, so I don't
> think that
> could be the problem. It also doesn't load a large number of files.

Hi Stuart,

How could we reproduce your issue?

Without a way to reproduce it, I don't think we can move forward.

SK

Oct 6, 2012, 10:04:51 AM
to pyins...@googlegroups.com
Hi Martin,

Thanks for responding. I dug into it and discovered the cause.

The application is using the standard "trick" to change its shared library search path:
os.environ[ 'LD_LIBRARY_PATH' ] += os.pathsep + lib_path # Add a lib search dir
os.execv( sys.argv[ 0 ], sys.argv ) # Restart with updated env

This works when bundled in PyInstaller 1.5.1 but not in 2.0 (including git tip), where the
modified environment is not inherited by the restarted application.

Perhaps this is related to ticket 465.

Any suggestions on a different way to modify the shared lib search path (other than a wrapper script)?

I'm happy to test a patch for this issue.

Thanks,
Stuart

Martin Zibricky

Oct 7, 2012, 2:30:08 PM
to pyins...@googlegroups.com
Hi Stuart,

Could you provide a code example to reproduce this issue?

Could what you are trying to do be related to
http://www.pyinstaller.org/ticket/161 ?


SK wrote on Sat, 06 Oct 2012 at 07:04 -0700:

SK

Oct 8, 2012, 3:38:51 AM
to pyins...@googlegroups.com
Hi Martin,

Pretty much as simple as I described, but here it is:

#!/usr/bin/env python

# Demo for PyInstaller 2.x bug with environment changes not being inherited by os.execv
#
# Build with /path/to/pyinstaller/pyinstaller.py this_scripts_name.py
#
# Running from Python source restarts once with the expected env var change
# Running the PyInstaller-generated binary gives an infinite loop without
#  the env var change being passed through

import os, sys

my_lib_path = '/path/to/my/lib'
ld_lib_path = os.environ.get( 'LD_LIBRARY_PATH' )
print '\nBefore:', ld_lib_path
if ld_lib_path is None:
    os.environ[ 'LD_LIBRARY_PATH' ] = my_lib_path
    print 'After:', os.environ[ 'LD_LIBRARY_PATH' ]
    print 'Restarting...'

    os.execv( sys.argv[ 0 ], sys.argv )
elif my_lib_path not in os.environ[ 'LD_LIBRARY_PATH' ]:
    os.environ[ 'LD_LIBRARY_PATH' ] += os.pathsep + my_lib_path
    print 'After:', os.environ[ 'LD_LIBRARY_PATH' ]
    print 'Restarting...'

    os.execv( sys.argv[ 0 ], sys.argv )

Martin Zibricky

Oct 9, 2012, 6:48:42 PM
to pyins...@googlegroups.com
SK wrote on Mon, 08 Oct 2012 at 00:38 -0700:
> Hi Martin,
>
> Pretty much as simple as I described, but here it is:

This is a bad example. You are not using execv() properly.

execv() restarts itself from the beginning and after restarting itself
it restarts itself again. This causes an infinite loop of creating new
processes.

You have to somehow detect if the app is running already after execv()
or before that and prevent another call to execv().

SK

Oct 9, 2012, 11:16:03 PM
to pyins...@googlegroups.com
Hi Martin,

Well, it is just illustrative code, but it is valid and does not loop infinitely when run from the Python source -- try it!

This line:

  elif my_lib_path not in os.environ[ 'LD_LIBRARY_PATH' ]:
(which I meant to write as:
  elif my_lib_path not in ld_lib_path:
but is still valid) will notice that the environment in the second execution has been updated as expected, which prevents execv from being called again. This is the whole problem with PyInstaller -- it does not follow the documented behavior that os.environ changes should carry over into the execv environment.

From the Python 2.7.3 docs for the related putenv call (bolding added):

os.putenv(varname, value)

Set the environment variable named varname to the string value. Such changes to the environment affect subprocesses started with os.system(), popen() or fork() and execv().
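
The documented behavior quoted above can be checked with a short sketch. It is written for Python 3 rather than the Python 2.7 used in this thread, and both the helper name child_sees and the variable DEMO_LIB_PATH are made up for the illustration. It sets a variable through os.environ and asks a freshly started interpreter what it sees:

```python
import os
import subprocess
import sys


def child_sees(varname, value):
    """Set varname in os.environ, then return the value a child process reports."""
    # Assigning to os.environ also updates the real process environment.
    os.environ[varname] = value
    code = "import os; print(os.environ.get(%r))" % varname
    out = subprocess.check_output([sys.executable, "-c", code])
    return out.decode().strip()


if __name__ == "__main__":
    # DEMO_LIB_PATH is a hypothetical variable name used only for this demo.
    print(child_sees("DEMO_LIB_PATH", "/path/to/my/lib"))
```

When run from plain Python source, the child reports the updated value, which is the behavior the restart trick relies on.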


I do not understand why you believe that execv should not inherit os.environ as its environment: it should, but in PyInstaller 2.0 it doesn't. If this isn't clear please let me know and I'll try again.

Stuart

Martin Zibricky

Oct 10, 2012, 9:30:21 AM
to pyins...@googlegroups.com
SK wrote on Tue, 09 Oct 2012 at 20:16 -0700:
>
> I do not understand why you believe that execv should not inherit the
> os.environ as its environment: it should, but in PyInstaller 2.0 it
> doesn't. If this isn't clear please let me know and I'll try again.

I'm sorry, my bad. I didn't realize that LD_LIBRARY_PATH is not
overridden.

PyInstaller 2.0 is working properly. There was a bug in 1.5 that was
fixed in 2.0. The motivation was to allow running apps created by
PyInstaller from another app created by PyInstaller.

Let me explain how PyInstaller works:
- A frozen app runs as two processes:
  1st sets up the environment (overrides LD_LIBRARY_PATH etc.)
  2nd runs the Python code
- When you use execv or similar, you are running the 1st process,
  which overrides LD_LIBRARY_PATH again. Then the 2nd process is
  started again.
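
The two-process behavior described above can be modeled in a few lines to show why the restart check never succeeds. This is only a toy model in plain Python, not PyInstaller code: APP_HOME, bootloader, python_code and count_restarts are hypothetical names, and the restart count is capped so that the sketch terminates.

```python
import os

APP_HOME = "/path/to/MyApp"      # hypothetical bundle directory
MY_LIB_PATH = "/path/to/my/lib"  # same placeholder as in the demo script


def bootloader(env):
    """Model of the 1st process: overrides LD_LIBRARY_PATH unconditionally."""
    child_env = dict(env)
    child_env["LD_LIBRARY_PATH"] = APP_HOME
    return child_env


def python_code(env):
    """Model of the 2nd process: return the env to re-exec with, or None if done."""
    current = env.get("LD_LIBRARY_PATH")
    if current is not None and MY_LIB_PATH in current.split(os.pathsep):
        return None  # path already present, no restart needed
    new_env = dict(env)
    new_env["LD_LIBRARY_PATH"] = (
        MY_LIB_PATH if current is None else current + os.pathsep + MY_LIB_PATH
    )
    return new_env


def count_restarts(env, limit=5):
    """Run the two-stage start repeatedly; return restarts done (capped at limit)."""
    restarts = 0
    while restarts < limit:
        next_env = python_code(bootloader(env))
        if next_env is None:
            break
        env = next_env
        restarts += 1
    return restarts
```

Because bootloader() discards the value that python_code() just set, count_restarts() always runs into its cap. That is the model's stand-in for the infinite loop seen with the frozen binary.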

What you probably need is to start just the 2nd process again. It won't
work this way. You should try a different approach. Probably os.fork()
could do what you need:
-----------

import os, sys

my_lib_path = '/path/to/my/lib'
ld_lib_path = os.environ.get('LD_LIBRARY_PATH')

print '\nBefore:', ld_lib_path
if ld_lib_path is None:
    os.environ[ 'LD_LIBRARY_PATH' ] = my_lib_path
    print 'Forking...'
elif my_lib_path not in os.environ[ 'LD_LIBRARY_PATH' ]:
    os.environ[ 'LD_LIBRARY_PATH' ] += os.pathsep + my_lib_path
    print 'Forking...'

pid = os.fork()

if pid:
    # Wait for child process.
    os.waitpid(pid, 0)
else:
    # Child code.
    pass

SK

Oct 11, 2012, 2:33:30 PM
to pyins...@googlegroups.com
Martin,

Thanks for the clarification and the ideas. I tried it but, as I expected, the os.fork approach will not work to affect the dynamic library search path even though the LD_LIBRARY_PATH change is inherited as expected. The dynamic loader sets the search path at application startup so the execv trick works but os.fork will not.
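
The inheritance half of that observation can be sketched as follows, assuming a Unix system and Python 3. The helper name fork_and_read and the variable DEMO_LIB_PATH are made up for the illustration. The sketch only shows that a forked child sees the changed variable. It does not show the loader side, because fork() continues the same process image instead of starting a new one.

```python
import os


def fork_and_read(varname, value):
    """Set varname, fork, and return the value the child process sees."""
    os.environ[varname] = value
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: report the inherited value through the pipe and exit at once.
        os.close(read_fd)
        os.write(write_fd, os.environ.get(varname, "").encode())
        os.close(write_fd)
        os._exit(0)
    # Parent: read what the child reported, then reap it.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        seen = reader.read().decode()
    os.waitpid(pid, 0)
    return seen
```

The child reports the new value, yet no new program is loaded between fork() and the child's code. A loader that fixes its search path at startup therefore gets no chance to pick up the change.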

This capability is important for some Python uses and you can find many discussions of it (e.g., http://stackoverflow.com/questions/856116/changing-ld-library-path-at-runtime-for-ctypes). The alternatives and work-arounds, such as using a wrapper script, are all problematic and inferior. If we can't find another way to accomplish this then these applications can't use PyInstaller 2.0 and this is a limitation that should be documented and, ideally, fixed. I understand that the change from PyInstaller 1.5 adds a capability but it has also broken something important.

Would it be possible to achieve both capabilities? Why can't the first process pass on the LD_LIBRARY_PATH with necessary additions to the second process? This doesn't immediately seem like a problem but I don't know much about PyInstaller internals. Thoughts?

Martin Zibricky

Oct 11, 2012, 3:07:59 PM
to pyins...@googlegroups.com
SK wrote on Thu, 11 Oct 2012 at 11:33 -0700:
> Would it be possible to achieve both capabilities? Why can't the first
> process pass on the LD_LIBRARY_PATH with necessary additions to the
> second process?

I think it is not possible without breaking other more important stuff.

The issue is how should the 1st process know about the additions to
LD_LIBRARY_PATH? The 1st process knows nothing about python, it just
prepares LD_LIBRARY_PATH for python and the frozen code.

The reason to set LD_LIBRARY_PATH is to ensure the frozen executable
works even if LD_LIBRARY_PATH is overridden on the user's machine.

> This doesn't immediately seem like a problem but I don't know much
> about PyInstaller internals. Thoughts?

What is your use case? A different solution could probably be found.

This link
http://stackoverflow.com/questions/856116/changing-ld-library-path-at-runtime-for-ctypes
mentions patchelf, which could work for you.

http://nixos.org/patchelf.html


SK

Oct 11, 2012, 5:27:09 PM
to pyins...@googlegroups.com
The use case is user-generated "plugin" libraries that have to be found at runtime
to get loaded instead of the "stub" libraries shipped with the application. You can't
know where they will be located ahead of time and it is not acceptable or multi-user
safe to modify the binaries' dynamic loading paths.

If the PyInstaller 2.0 approach blocks any ability to set the LD_LIBRARY_PATH that
the actual application sees then that seems like a serious flaw.


> The issue is how should the 1st process know about the additions to
> LD_LIBRARY_PATH? The 1st process knows nothing about python, it just
> prepares LD_LIBRARY_PATH for python and the frozen code.

Could a problem arise if you just append the LD_LIBRARY_PATH present when
the first process starts onto the one the second process sees? That way the
execv restart "trick" should work for the plugin use case.

If that isn't safe then maybe special env vars could be used to tell it what to prepend
or append to the LD_LIBRARY_PATH that it needs.

Naively, this seems workable but there may be reasons it can't be done that way.
In any event, I appreciate your giving this some thought. If you want to point me at
the relevant code I'm willing to poke around in it but that probably won't be a very
efficient way to address this issue. Of course, I'm also willing to try any patches.

Martin Zibricky

Oct 12, 2012, 7:48:31 AM
to pyins...@googlegroups.com
SK wrote on Thu, 11 Oct 2012 at 14:27 -0700:
> Could a problem arise if you just append the LD_LIBRARY_PATH present
> when
> the first process onto the one the second process sees? That way the
> execv restart
> "trick" should work for the plugin use case.

The issue is with the 1st process. It has to detect it's executing
itself.

Probably a workable workaround could be:
- the 1st process sets a special env variable
- when this variable is set then the 1st process knows it is reexecuting
itself and would keep the value of LD_LIBRARY_PATH.
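
The workaround outlined above could look roughly like this. It is written in Python only to show the decision logic, since the real bootloader is C code. The flag name DEMO_PYI_REEXEC and the function prepare_child_env are hypothetical and not part of PyInstaller.

```python
import os

REEXEC_FLAG = "DEMO_PYI_REEXEC"  # hypothetical name, not a real PyInstaller variable


def prepare_child_env(env, app_home):
    """Return the environment the 1st process would hand to the 2nd process."""
    child_env = dict(env)
    if env.get(REEXEC_FLAG) == "1":
        # Re-execution detected: keep whatever LD_LIBRARY_PATH the app set.
        return child_env
    # First launch: override LD_LIBRARY_PATH and mark the environment.
    child_env["LD_LIBRARY_PATH"] = app_home
    child_env[REEXEC_FLAG] = "1"
    return child_env
```

One open question with this sketch: a different frozen app started as a child would inherit the flag as well, so a real implementation would need some way to tell a self-restart from that case.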

The place to look for the code is ./bootloader/common/.

SK

Oct 12, 2012, 8:59:13 AM
to pyins...@googlegroups.com
On Friday, October 12, 2012 7:48:37 AM UTC-4, Martin Z wrote:
> The issue is with the 1st process. It has to detect it's executing
> itself.
>
> Probably a workable workaround could be:
> - the 1st process sets a special env variable
> - when this variable is set then the 1st process knows it is reexecuting
> itself and would keep the value of LD_LIBRARY_PATH.

OK, I'll take a look. Is there a reason that the 1st process can't always
append the current LD_LIBRARY_PATH onto the one it needs? Is there
a use case where that breaks something?

Martin Zibricky

Oct 12, 2012, 9:52:34 AM
to pyins...@googlegroups.com
SK wrote on Fri, 12 Oct 2012 at 05:59 -0700:
> OK, I'll take a look. Is there a reason that the 1st process can't
> always
> append the current LD_LIBRARY_PATH onto the one it needs? Is there
> a use case where that breaks something?

It could break the frozen app when some 3rd-party software requires
LD_LIBRARY_PATH to be set permanently.

SK

Oct 12, 2012, 4:14:40 PM
to pyins...@googlegroups.com
I do not know enough about PyInstaller internals to understand your comment.
You are currently replacing the prior LD_LIBRARY_PATH when running the
inner (2nd) process: I don't see how that is better for "3rd party software that
requires LD_LIBRARY_PATH to be set permanently". If 3rd party software
depends on LD_LIBRARY_PATH to find libraries then keeping the prior value as
the suffix seems better, not worse. (Of course we are only talking about internal/
child values of LD_LIBRARY_PATH: the setenv calls do not affect the env
for other non-child applications.)

When I go to start my PyInstaller-bundled application I would expect it to get
the current environment, including LD_LIBRARY_PATH. If the PyInstaller
bundling forces you to need to add the application home path to the front of
LD_LIBRARY_PATH, which makes sense, it still seems correct to append
the prior LD_LIBRARY_PATH onto the home path when setting up the
environment for the "inner" (2nd) process rather than discarding the prior
LD_LIBRARY_PATH. This is independent of the plugin and restart use case:
your application may be written to depend on the LD_LIBRARY_PATH value.

In any event, patching set_dynamic_library_path in bootloader/linux/utils.c
as below fixes my use case. I have not addressed AIX (where the analogous
patch would probably work) or OS X (we are not building for it yet). I took the
naive approach and simply always append the prior LD_LIBRARY_PATH,
based on my understanding as detailed above.

Feel free to use/adapt this as necessary. If you still feel that the prior
LD_LIBRARY_PATH should normally be discarded then an approach with
a special env var to signal a value that must be brought over would still
allow the plugin/restart use case to work. I encourage you to enable one
of these approaches, but if not at least I can patch and use PyInstaller
2.x now. Thanks for the guidance.

static int set_dynamic_library_path(const char* path)
{
    int rc = 0;

#ifdef AIX
    /* LIBPATH is used to look up dynamic libraries on AIX. */
    setenv("LIBPATH", path, 1);
    VS("%s\n", path);
#else
    /* LD_LIBRARY_PATH is used on other *nix platforms (except Darwin). */
    char * curpath = getenv("LD_LIBRARY_PATH");
    if ( ! curpath ) { /* Use required path only */
        rc = setenv("LD_LIBRARY_PATH", path, 1);
        VS("%s\n", path);
    } else { /* Append current path onto required path */
        char apath[ strlen(path) + strlen(curpath) + 2 ];
        strcpy(apath, path);
        strcat(apath, ":");
        strcat(apath, curpath);
        rc = setenv("LD_LIBRARY_PATH", apath, 1);
        VS("%s\n", apath);
    }
#endif /* AIX */

    return rc;
}
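
For readers who want to check the path composition without building the bootloader, here is a small Python model of the same rule. The function name compose_library_path is made up for the illustration. One deliberate difference from the C patch is flagged in the comments: the sketch also treats an empty prior value as missing, so the result never ends in a trailing colon.

```python
def compose_library_path(required, current):
    """Put the required path first and append the prior LD_LIBRARY_PATH, if any."""
    if not current:
        # None mirrors the patch's getenv() == NULL case.
        # An empty string is also skipped here, which the C patch does not do.
        return required
    # Same layout as the patch: required path, ':' separator, prior value.
    return required + ":" + current


if __name__ == "__main__":
    print(compose_library_path("/path/to/MyApp", "/path/to/my/lib"))
```

Keeping the required path first means the bundled libraries still take precedence, and the prior value only adds directories that are searched afterwards.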
