DataFrames in Variable Explorer: great but does not work well with large DFs


Charles Vellutini

Oct 28, 2014, 9:10:21 AM
to spyd...@googlegroups.com
Hi,
The addition of DataFrames as objects that can be viewed (and edited) in Spyder's Variable Explorer is a fantastic development. Viewing data is extremely important in serious data analysis and the related debugging. With this addition, Spyder approaches the convenience and workability of dedicated, mature statistical packages such as Stata, all with the performance and malleability of Python. In my view, a true game changer.

Now, I have noticed that the feature does not (yet) work well on large data sets. On my system (Python 3.4, 8 GB RAM), using the Variable Explorer with a DataFrame of more than roughly 100,000 rows freezes Spyder altogether. More work is needed to optimize viewing (load rows/columns only as they are viewed, or a similar strategy?). Also, I would suggest that viewing is much more important than editing, in case that helps when optimizing the feature. Editing data through a browser is not something you normally do; viewing data, on the other hand, you do all the time.
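The "load rows/columns only as they are viewed" idea can be sketched in plain Python as a tile cache: the viewer asks for one cell at a time, and the cache fetches (and keeps) only the fixed-size block of rows and columns that contains it. This is a minimal sketch under stated assumptions, not Spyder's implementation; `ViewportCache`, the tile sizes and the `fetch(r0, r1, c0, c1)` callback are hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical callback: returns rows [r0, r1) restricted to columns [c0, c1).
FetchFn = Callable[[int, int, int, int], List[List[object]]]


class ViewportCache:
    """Load only the tiles of a large table that a viewer actually displays.

    Hypothetical helper for illustration; not Spyder's actual code.
    """

    def __init__(self, n_rows: int, n_cols: int, fetch: FetchFn,
                 tile_rows: int = 500, tile_cols: int = 20) -> None:
        self.n_rows = n_rows
        self.n_cols = n_cols
        self.tile_rows = tile_rows
        self.tile_cols = tile_cols
        self._fetch = fetch
        self._tiles: Dict[Tuple[int, int], List[List[object]]] = {}
        self.fetch_calls = 0  # how many tiles have been loaded so far

    def _tile(self, tr: int, tc: int) -> List[List[object]]:
        # Fetch a tile the first time it is needed, then serve it from the cache.
        key = (tr, tc)
        if key not in self._tiles:
            r0 = tr * self.tile_rows
            c0 = tc * self.tile_cols
            r1 = min(r0 + self.tile_rows, self.n_rows)
            c1 = min(c0 + self.tile_cols, self.n_cols)
            self._tiles[key] = self._fetch(r0, r1, c0, c1)
            self.fetch_calls += 1
        return self._tiles[key]

    def cell(self, row: int, col: int) -> object:
        # Map the cell to its tile and to its offset inside that tile.
        tr, r_off = divmod(row, self.tile_rows)
        tc, c_off = divmod(col, self.tile_cols)
        return self._tile(tr, tc)[r_off][c_off]
```

With pandas, the callback could be `lambda r0, r1, c0, c1: df.iloc[r0:r1, c0:c1].values.tolist()`, so scrolling through a 100,000-row DataFrame only converts the tiles actually displayed. A real viewer would also need to evict old tiles to bound memory; that is left out of this sketch.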

Again, congratulations on this; I truly believe it is important for the Python data analysis community.



androidhrm

Oct 29, 2014, 5:50:26 AM
to spyd...@googlegroups.com

Hi,

A suggestion, even if it is not directly related: generally the "io" module is used, with streams, as high-level file objects.
Beyond that, you have two options:
- Either use ParaView. I have heard it has a Python API. Why not blend Spyder and ParaView, then...
- Or use the HDF5 format and its Python bindings. PyTables, according to its website, has a reputation for being very fast and efficient at loading data, while its metadata looks like that of pandas DataFrames. But then it is up to you, for the moment, to load your data into HDF5 stores, I suppose? Personally, the HDF5 format reminds me of both the HyperSpy library and LabVIEW's TDMS technical data format...
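To illustrate the stream-based approach with only the standard library: the generator below reads a CSV stream in fixed-size chunks, so only one chunk of rows is held in memory at a time. It assumes the data is stored as CSV (the post does not say so) and does not show PyTables or HDF5 themselves; `iter_chunks` is a hypothetical helper.

```python
import csv
import io
from itertools import islice
from typing import Iterator, List, TextIO


def iter_chunks(stream: TextIO, chunk_size: int = 10_000) -> Iterator[List[List[str]]]:
    """Yield lists of at most chunk_size parsed CSV rows from a text stream.

    Any high-level file object from the io module works here
    (open(path, newline=""), io.StringIO, ...).
    """
    reader = csv.reader(stream)
    while True:
        # islice pulls at most chunk_size rows without reading the rest of the file.
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


# Demo on an in-memory stream; a file opened with open(path, newline="") behaves the same.
demo = io.StringIO("id,value\n1,10\n2,20\n3,30\n")
for chunk in iter_chunks(demo, chunk_size=2):
    print(len(chunk), chunk[0])
```

pandas offers the same idea directly through `pd.read_csv(path, chunksize=...)`, and, for HDF5 stores written with `format="table"`, `pd.read_hdf(path, key, start=..., stop=...)` reads only a slice of rows.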


Charles Vellutini

Nov 3, 2014, 11:23:13 AM
to spyd...@googlegroups.com
Thanks, interesting.
The thing is that pandas DataFrames have become a sort of de facto standard for manipulating data, at least in my area (econometrics/statistics). For those who have not used them yet, DataFrames are awesome for preparing data. Still, I will look into what you suggested; it is always good to know there are other approaches.

Carlos Córdoba

Nov 3, 2014, 6:50:27 PM
to spyd...@googlegroups.com
Hi Charles,

Thanks for your suggestion. I found a way to load rows on demand, so I'll add that to our next release (i.e. 2.3.2). It is much snappier for DataFrames with more than 100,000 rows, but not as good for ones with 1,000,000 rows. I don't think we can do better, though :-)

Cheers,
Carlos
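Loading rows on demand can be done in more than one way. One common pattern, used by Qt item views through `QAbstractItemModel.canFetchMore()` and `fetchMore()`, is to report only the rows loaded so far and append another batch whenever the user scrolls to the bottom. The plain-Python model below illustrates that pattern; it is an assumption about a possible approach, not a description of the Spyder 2.3.2 code, and `IncrementalRows`, `batch_size` and the `fetch(start, stop)` callback are hypothetical.

```python
from typing import Callable, List


class IncrementalRows:
    """Expose a growing prefix of a large table, fetched in fixed-size batches.

    Plain-Python model of Qt's canFetchMore()/fetchMore() pattern;
    hypothetical class for illustration only.
    """

    def __init__(self, total_rows: int, fetch: Callable[[int, int], List[object]],
                 batch_size: int = 500) -> None:
        self.total_rows = total_rows
        self.batch_size = batch_size
        self._fetch = fetch
        self.rows: List[object] = []

    def row_count(self) -> int:
        # The view is only told about rows that have been loaded so far.
        return len(self.rows)

    def can_fetch_more(self) -> bool:
        return len(self.rows) < self.total_rows

    def fetch_more(self) -> int:
        """Load the next batch (e.g. when the user scrolls to the bottom)."""
        start = len(self.rows)
        stop = min(start + self.batch_size, self.total_rows)
        self.rows.extend(self._fetch(start, stop))
        return stop - start
```

With pandas, `fetch` could be `lambda a, b: df.iloc[a:b].values.tolist()`. In this sketch the cost of opening the viewer depends on the batch size rather than on the DataFrame's length, while scrolling deep into a very large frame still means fetching many batches.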


Charles Vellutini

Nov 4, 2014, 8:00:28 AM
to spyd...@googlegroups.com
Thanks for the update, Carlos; good to know.