Any way to exclude data files from certain locations?

Евгений Белоусов

Aug 10, 2021, 2:04:11 PM
to PyInstaller
Greetings, everyone.

I am making a GUI application on Linux using PyGObject. I tried packaging it with PyInstaller, but the size of the produced distribution was enormous: ~800 MB in one-dir mode, and 200+ MB in one-file mode, which takes ~4.5 seconds to unpack and launch on my machine.

I soon discovered that most of that volume comes from /usr/share/icons, /usr/share/themes and some other directories being copied into the distribution wholesale. The icon packs alone take up around 600 MB on my system. I guess PyInstaller thinks they are required by Gtk, and they may be, but not in my project. I do use a few icons in the GUI, but I supply the icon files myself, so there should be no need for any system icons, let alone all the icons in the world.

Icon packs are the main concern, but there are also a lot of other files from /usr/share that get copied and that I suspect are unneeded in my case.
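One way to confirm which directories dominate the build is to total the file sizes per subdirectory. The following is a minimal sketch, not part of the original post; the path dist/myapp/share and the helper names dir_size and largest_subdirs are assumptions chosen for illustration.

```python
import os


def dir_size(path: str) -> int:
    """Total size in bytes of regular files under path (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def largest_subdirs(top: str, limit: int = 10) -> list:
    """Return (size_in_bytes, name) pairs for immediate subdirectories, largest first."""
    sizes = []
    for entry in os.scandir(top):
        if entry.is_dir(follow_symlinks=False):
            sizes.append((dir_size(entry.path), entry.name))
    return sorted(sizes, reverse=True)[:limit]


if __name__ == '__main__':
    # Assumed location of the collected data files in a one-dir build.
    target = 'dist/myapp/share'
    if os.path.isdir(target):
        for size, name in largest_subdirs(target):
            print('{:10.1f} MB  {}'.format(size / 1e6, name))
```

Running it against the one-dir output should list the heaviest directories first, which makes it easier to decide what to exclude.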

I've looked through the documentation and could not find any info on excluding data files, only on adding them. I am new to PyInstaller (and Python in general, for that matter), and so far I cannot find any way to avoid collecting all those files. I tried using the 'excludes' argument to Analysis, but either I am not using it correctly, or it is meant for something else.

It would be great if there was some way to just blacklist certain locations to prevent PyInstaller from collecting anything from them.

The only way I can currently see to deal with this problem is to simply delete those directories from the finished distribution, but then I'd have to stick with one-dir mode, which is not ideal.
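The delete-after-build workaround could be scripted roughly as follows. This is a sketch only, not something from the original post: it applies to one-dir builds, the helper name prune_dist is hypothetical, and the directory names in the usage section are assumed examples of what might be removed.

```python
import os
import shutil


def prune_dist(dist_dir: str, relative_paths: list) -> int:
    """Delete the given paths (relative to dist_dir); return how many were removed."""
    removed = 0
    for rel in relative_paths:
        target = os.path.join(dist_dir, rel)
        if os.path.islink(target) or os.path.isfile(target):
            os.remove(target)
            removed += 1
        elif os.path.isdir(target):
            shutil.rmtree(target)
            removed += 1
        # Paths that do not exist are silently ignored.
    return removed


if __name__ == '__main__':
    # Assumed layout of a one-dir build; both names below are illustrative.
    count = prune_dist('dist/myapp', ['share/icons/hicolor', 'share/themes'])
    print('Removed {} entries.'.format(count))
```

Since the pruning happens after PyInstaller has finished, it cannot shrink a one-file executable, which is the limitation mentioned above.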

I will keep digging, but any help would be much appreciated.

Евгений Белоусов

Aug 11, 2021, 10:50:50 AM
to PyInstaller
A day later, I think I have found a solution. It is kind of hacky, but it works.

Firstly, this
> there should be no need for any system icons, let alone all the icons in the world
is not entirely true. Turns out, the UI does use a couple of system icons, for things like arrows in drop-down lists. Still, this doesn't justify lugging around a whole icon theme, let alone all available themes.

Here is how I solved my problem. Since I am a newbie, it took me a while to realize that the spec file is actually just Python code that PyInstaller executes, so any Python code I add to it will be executed too. This means I can hijack the datas list after it has been populated by Analysis and before it is passed on for file collection, and make whatever modifications I want to it. I will paste my spec file below for all future strugglers.

After excluding almost everything from share/ (except for the few files that were truly required), the build size in one-file mode dropped to 53 MB, and the launch time to ~0.6 seconds, which is already quite manageable, but I suspect even more stuff could be stripped out without compromising the portability of my app.

Also, while I was working on this problem, it occurred to me that the ideal solution in my case would be just to have symlinks in share/ that point to corresponding directories in /usr/share. Or even completely do away with substituting the system paths for these locations and just have the app use whatever is available locally in the system. I mean, any graphical Linux installation is bound to have at least some UI theme and some icon pack. If it does not, then the user likely has bigger problems than my app not working. Alas, there seems to be no way to instruct PyInstaller to do anything like this: neither create links, nor collect pre-made link files (it just copies the directories they link to). Please correct me if I am wrong. If any PyInstaller maintainer reads this — please consider making it possible!

My spec file:

# -*- mode: python ; coding: utf-8 -*-


block_cipher = None

ONEFILE = True

a = Analysis(['myapp.py'],
             binaries=[],
             datas=[],
             hiddenimports=[],
             hookspath=[],
             hooksconfig={},
             runtime_hooks=[],
             excludes=[],
             win_no_prefer_redirects=False,
             win_private_assemblies=False,
             cipher=block_cipher,
             noarchive=False)


print('********************* CUSTOM SEGMENT *********************')

# Force exclude some unneeded files.

datas_tmp = []
exclude_counter = 0

to_exclude = [  # Exceptions from datas gathered by Analysis.
    '/usr/share',
]
to_add_back = [  # Exceptions from exceptions.
    '/usr/share/icons/Adwaita/scalable/',
    '/usr/share/icons/Adwaita/scalable-up-to-32/',
    '/usr/share/icons/Adwaita/index.theme',
    '/usr/share/mime/mime.cache',
]


def match_any_substring(substrings: list, in_what: str) -> bool:
    """Return True if any of the given substrings occurs in in_what."""
    return any(s in in_what for s in substrings)


# item[1] is the source path of each collected data file.
for item in a.datas:
    if (match_any_substring(to_exclude, item[1])
            and not match_any_substring(to_add_back, item[1])):
        # print('Excluding', item[1])
        exclude_counter += 1
    else:
        datas_tmp.append(item)


a.datas = datas_tmp

if exclude_counter > 0:
    print('Excluded {} data files.'.format(exclude_counter))


with open('debug_datas.txt', 'w') as f:
    print('Dumping datas to {}...'.format(f.name))
    for item in a.datas:
        f.write(repr(item) + '\n')

print('******************* END CUSTOM SEGMENT *******************')


pyz = PYZ(a.pure, a.zipped_data,
          cipher=block_cipher)


if ONEFILE:

    exe = EXE(pyz,
              a.scripts,
              a.binaries,
              a.zipfiles,
              a.datas,
              [],
              name='myapp',
              debug=False,
              bootloader_ignore_signals=False,
              strip=False,
              upx=True,
              upx_exclude=[],
              runtime_tmpdir=None,
              console=True,
              disable_windowed_traceback=False,
              target_arch=None,
              codesign_identity=None,
              entitlements_file=None)

else:

    exe = EXE(pyz,
              a.scripts,
              [],
              exclude_binaries=True,
              name='myapp',
              debug=False,
              bootloader_ignore_signals=False,
              strip=False,
              upx=True,
              console=True,
              disable_windowed_traceback=False,
              target_arch=None,
              codesign_identity=None,
              entitlements_file=None)

    coll = COLLECT(exe,
                   a.binaries,
                   a.zipfiles,
                   a.datas,
                   strip=False,
                   upx=True,
                   upx_exclude=[],
                   name='myapp')
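The filtering step from the custom segment can also be tried outside PyInstaller. Below is a minimal standalone sketch under stated assumptions: it matches by path prefix instead of by substring (a swapped-in variant, not the method used in the spec above), the helper names is_under and filter_datas are hypothetical, and the sample tuples only imitate the shape of a.datas entries.

```python
import os


def is_under(path: str, prefix: str) -> bool:
    """True if path equals prefix or lies inside the directory prefix."""
    path = os.path.normpath(path)
    prefix = os.path.normpath(prefix)
    return path == prefix or path.startswith(prefix + os.sep)


def filter_datas(datas, to_exclude, to_add_back):
    """Split datas into (kept, excluded) by source path (second tuple element)."""
    kept, excluded = [], []
    for item in datas:
        src = item[1]
        drop = (any(is_under(src, p) for p in to_exclude)
                and not any(is_under(src, p) for p in to_add_back))
        (excluded if drop else kept).append(item)
    return kept, excluded


if __name__ == '__main__':
    # Made-up entries shaped like (destination name, source path, typecode).
    sample = [
        ('share/icons/Adwaita/index.theme',
         '/usr/share/icons/Adwaita/index.theme', 'DATA'),
        ('share/icons/hicolor/index.theme',
         '/usr/share/icons/hicolor/index.theme', 'DATA'),
        ('myapp/logo.png', '/home/user/myapp/logo.png', 'DATA'),
    ]
    kept, excluded = filter_datas(
        sample,
        to_exclude=['/usr/share'],
        to_add_back=['/usr/share/icons/Adwaita/index.theme'],
    )
    print('kept:', [item[0] for item in kept])
    print('excluded:', [item[0] for item in excluded])
```

Prefix matching avoids accidental hits on unrelated paths that merely contain the string /usr/share, and it treats a directory entry the same with or without a trailing slash. In a spec file the result would be assigned back with something like a.datas = kept, as the custom segment above does with datas_tmp.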

