[OT] dataframe lib for C/C++?


进陆

unread,
Jun 17, 2017, 8:24:10 PM6/17/17
to PyData
Currently, when I try to release a tiny (about 100-line) anaconda+numpy+pandas app on MS Windows with PyInstaller, the final files take up 500 MB, which is far too large.

I know pandas is not only a dataframe library; its Excel file IO simplifies my work too. But even after searching Google and GitHub for a C/C++ library that can act like pandas's dataframe operations (sort, merge, join, group, pivot_table), I found nothing useful.

I think it is very hard to implement a fully functional dataframe library in C (or any compiled language) the way Python and R have. But maybe I am wrong. Any hints? Thanks.
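For concreteness, this is the feature set being asked about, shown with pandas itself on made-up toy data (the column names and values here are my own invention, purely illustrative):

```python
import pandas as pd

# Toy data, purely illustrative.
df = pd.DataFrame({
    "dept": ["a", "a", "b", "b"],
    "year": [2016, 2017, 2016, 2017],
    "sales": [10, 20, 30, 40],
})
names = pd.DataFrame({"dept": ["a", "b"], "name": ["Alpha", "Beta"]})

sorted_df = df.sort_values("sales", ascending=False)   # sort
merged = df.merge(names, on="dept")                    # merge/join
totals = df.groupby("dept")["sales"].sum()             # group
pivot = df.pivot_table(index="dept", columns="year",   # pivot_table
                       values="sales", aggfunc="sum")
print(totals["a"], int(pivot.loc["b", 2017]))
```

Any C/C++ replacement would need equivalents for each of these five calls.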

Nathaniel Smith

unread,
Jun 17, 2017, 8:35:09 PM6/17/17
to pyd...@googlegroups.com
On Sat, Jun 17, 2017 at 5:24 PM, 进陆 <lepto....@gmail.com> wrote:
> Currently when I try to release a tiny (about 100 lines)
> anaconda+numpy+pandas app on ms-windows with pyinstaller, the final files
> take up 500M, which is too huge.

That looks weird to me -- if you look at the official downloads for
python+numpy+pandas, they come to about 30+8+8 = 46 megabytes
total. I think you should be able to get your PyInstaller app down to
something similar. Probably the first thing to do would be to stop
using Anaconda's "mkl" builds; they use Intel's MKL library for linear
algebra, which is fast but ridiculously huge compared to builds that
use OpenBLAS or similar.

-n

--
Nathaniel J. Smith -- https://vorpus.org

进陆

unread,
Jun 17, 2017, 8:37:11 PM6/17/17
to PyData
After posting, I realized that apart from file IO and plotting, the other functions in pandas could be implemented with a database. Am I right?

On Sunday, June 18, 2017 at 8:24:10 AM UTC+8, 进陆 wrote:

进陆

unread,
Jun 18, 2017, 6:39:39 AM6/18/17
to PyData
Yes, MKL takes up more space.


On Sunday, June 18, 2017 at 8:35:09 AM UTC+8, Nathaniel Smith wrote:

Iblis Lin

unread,
Jun 18, 2017, 7:54:15 AM6/18/17
to pyd...@googlegroups.com, 进陆
Is Nuitka helpful?

(I haven't tried it yet; just offering an idea.)
> --
> You received this message because you are subscribed to the Google
> Groups "PyData" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to pydata+un...@googlegroups.com
> <mailto:pydata+un...@googlegroups.com>.
> For more options, visit https://groups.google.com/d/optout.

Stephan Hoyer

unread,
Jun 18, 2017, 3:12:42 PM6/18/17
to pyd...@googlegroups.com
SQLite is the obvious choice for this sort of application.
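A minimal sketch of that idea with the stdlib sqlite3 module (toy schema and values assumed): sort, group, and join map directly onto SQL, with no extra dependency to bundle.

```python
import sqlite3

# In-memory database standing in for the dataframe (illustrative schema).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE sales (dept TEXT, year INTEGER, amount INTEGER);
    CREATE TABLE names (dept TEXT, name TEXT);
    INSERT INTO sales VALUES ('a', 2016, 10), ('a', 2017, 20),
                             ('b', 2016, 30), ('b', 2017, 40);
    INSERT INTO names VALUES ('a', 'Alpha'), ('b', 'Beta');
""")

# group + sum, sorted descending -- roughly df.groupby("dept")["amount"].sum()
totals = con.execute(
    "SELECT dept, SUM(amount) FROM sales GROUP BY dept ORDER BY 2 DESC"
).fetchall()

# join -- roughly df.merge(names, on="dept")
joined = con.execute(
    "SELECT s.dept, n.name, s.amount FROM sales s JOIN names n USING (dept)"
).fetchall()
print(totals)   # [('b', 70), ('a', 30)]
```

Pivot tables take more SQL gymnastics (conditional aggregates), but the basic sort/merge/group operations come for free.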


Denis Akhiyarov

unread,
Jun 23, 2017, 1:59:56 AM6/23/17
to PyData
I regularly package pandas, numpy, and scipy with PyInstaller in one-file mode, compressed with UPX, on Windows. The result is less than 50 MB. Don't use the MKL builds, as pointed out above. Also create a virtual environment with just these dependencies. But I've always wondered how to break these three packages into smaller pieces, since I'm not using 95% of their APIs.

Tom Augspurger

unread,
Jun 23, 2017, 6:53:49 AM6/23/17
to pyd...@googlegroups.com
Pandas bundles some datasets for testing. We could bring the size down by excluding those from the package if size is an issue.
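In the meantime, a possible workaround on the user side: PyInstaller spec files accept an `excludes` list, which should at least keep the test modules out of the bundle (fragment only; the module names and how much this actually saves depend on your pandas/NumPy and PyInstaller versions):

```python
# myapp.spec (fragment) -- generated by `pyinstaller myapp.py`, then edited.
a = Analysis(
    ['myapp.py'],
    # Leave out test suites that only add weight to the frozen app.
    excludes=['pandas.tests', 'numpy.tests'],
)
```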

> On Jun 23, 2017, at 00:59, Denis Akhiyarov <denis.a...@gmail.com> wrote:
>
> I regularly package pandas, numpy, and scipy with PyInstaller in one-file mode, compressed with UPX, on Windows. The result is less than 50 MB. Don't use the MKL builds, as pointed out above. Also create a virtual environment with just these dependencies. But I've always wondered how to break these three packages into smaller pieces, since I'm not using 95% of their APIs.
>

Goyo

unread,
Jun 26, 2017, 4:05:36 AM6/26/17
to PyData


On Friday, June 23, 2017 at 7:59:56 AM (UTC+2), Denis Akhiyarov wrote:
> I regularly package pandas, numpy, and scipy with PyInstaller in one-file mode, compressed with UPX, on Windows. The result is less than 50 MB. Don't use the MKL builds, as pointed out above. Also create a virtual environment with just these dependencies. But I've always wondered how to break these three packages into smaller pieces, since I'm not using 95% of their APIs.

Do you build scipy yourself? The only Windows builds I found were those by Christoph Gohlke, and they depend on numpy-mkl.

Denis Akhiyarov

unread,
Jun 26, 2017, 5:17:13 PM6/26/17
to PyData
I built scipy from source on Windows, but only using MSVC, ifort, and MKL. So I was wrong: either the scipy wheels without MKL disappeared, or I actually never bundled scipy, only numpy + pandas, which is the latest setup I have.

There are OpenBLAS and ATLAS builds of scipy and numpy, but they are harder to build on Windows due to the gfortran dependency (msys, cygwin, mingw -- pick your poison).

Goyo

unread,
Jun 27, 2017, 5:37:11 AM6/27/17
to PyData
On Monday, June 26, 2017 at 11:17:13 PM (UTC+2), Denis Akhiyarov wrote:
> I built scipy from source on Windows, but only using MSVC, ifort, and MKL. So I was wrong: either the scipy wheels without MKL disappeared, or I actually never bundled scipy, only numpy + pandas, which is the latest setup I have.
>
> There are OpenBLAS and ATLAS builds of scipy and numpy, but they are harder to build on Windows due to the gfortran dependency (msys, cygwin, mingw -- pick your poison).

Yes, building scipy on Windows is not a realistic option for me; that's why I was asking. I'd rather bear the extra weight of numpy-mkl.