performance (FYI)

Dirk Roorda

Sep 13, 2013, 9:56:59 AM
to poio-d...@googlegroups.com
Today I parsed a large LAF resource with graf-python.
The resource is 2 GB, of which only 0.005 GB is primary data.
So almost all of the 2 GB consists of regions, nodes, edges, and annotations.
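
For reference, here is a minimal sketch of how such a resource can be loaded with graf-python. The header file name is made up, and the attribute names on the parsed graph are from memory and may differ between graf-python versions:

    import graf

    # Parse the GrAF/LAF resource, starting from its header file
    # (the file name is hypothetical).
    parser = graf.GraphParser()
    graph = parser.parse("resource.hdr")

    # Count what was loaded; counting by iteration avoids assuming
    # that the node and edge collections support len().
    print("nodes:", sum(1 for _ in graph.nodes))
    print("edges:", sum(1 for _ in graph.edges))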

I ran graf-python on a MacBook AIR 11" (mid 2012) with 8 GB RAM.
Time and memory usage were as follows:

Memory: 
6.5 GB real
15.3 GB virtual

Time: 
real     65m56.181s
user     16m42.778s
sys     10m44.703s 

As you can see, there was a lot of waiting while the CPU did little. No wonder, given the massive amount of swap space that was needed.

By the way, I commented out the dependency specifications in the annotation files.
I ordered the filespecs in the header file in such a way that files were fed to graf-python before the files that depend on them.
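
For illustration, a small sketch of how such an ordering could be computed instead of done by hand. It assumes you have already extracted, for each annotation file, the files it depends on (the file names and the dependency table below are made up); a topological sort then yields a feed order in which every file comes after its dependencies:

    from graphlib import TopologicalSorter  # standard library, Python 3.9+

    # Hypothetical dependency table: each annotation file maps to the
    # files it depends on.
    depends_on = {
        "words.xml": [],
        "morphology.xml": ["words.xml"],
        "syntax.xml": ["words.xml", "morphology.xml"],
    }

    # static_order() lists every file after all of its dependencies,
    # which is the order the filespecs should have in the header file.
    feed_order = list(TopologicalSorter(depends_on).static_order())
    print(feed_order)  # ['words.xml', 'morphology.xml', 'syntax.xml']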

So: this is the limit of what you can comfortably do with LAF and POIO on a state-of-the-art laptop.

Dirk Roorda

Sep 13, 2013, 10:56:45 AM
to poio-d...@googlegroups.com
I forgot to mention: generating this LAF resource from the database takes just 15 minutes, including the database export beforehand and the validation afterwards.

pbouda

Sep 16, 2013, 4:02:26 AM
to poio-d...@googlegroups.com
Hi Dirk,

How exactly did you measure time and memory? I want to run the same test with one of my big files and see what comes out of it...

By the way, I created an Amazon AWS image with graf-python and other linguistic and scientific libraries pre-installed, so that I can create a virtual machine with up to 68 GB of RAM with a single mouse click (this amount of memory costs €1.80 per hour at Amazon). It runs an IPython notebook server and is based on the notebookcloud project. Here is a link, in case you want to give it a try:

http://media.cidles.eu/labs/ipython-notebooks-for-linguists/

The image and the web app are open, so you can adapt everything to your needs. It would be interesting to have something comparable within CLARIN...

Best,
Peter

Dirk Roorda

Sep 16, 2013, 9:04:21 AM
to poio-d...@googlegroups.com
Hi Peter,
On the command line (bash) I prefixed the command with "time".
In Activity Monitor on the Mac I kept inspecting the memory usage of the Python process, and I kept track of the highest value I saw there (which was at the end).
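
If you want to capture the same numbers from inside the Python process itself, here is a minimal sketch using only the standard library (the comment in the middle marks where the workload, e.g. the graf-python parse, would go):

    import os
    import resource
    import time

    start_wall = time.time()
    start_cpu = os.times()

    # ... run the workload here, e.g. parsing the LAF resource ...

    end_cpu = os.times()

    # Peak resident set size of this process. Note that ru_maxrss is
    # reported in bytes on macOS but in kilobytes on Linux.
    peak_rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

    print("real %.1fs" % (time.time() - start_wall))
    print("user %.1fs" % (end_cpu.user - start_cpu.user))
    print("sys  %.1fs" % (end_cpu.system - start_cpu.system))
    print("peak RSS:", peak_rss)
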
Later I will study your use of Amazon. Quite interesting!
But I also want to look at methods that keep everything confined to a single laptop (a researcher probably does not need all the annotation files).