yaml and compiling libYaml


Asi Sudai

Nov 21, 2013, 2:20:47 PM
to python_in...@googlegroups.com
Hi gang,

I want to write and read big data files (10 MB+), and I'm thinking about using YAML for that.
But after some testing, it seems that YAML is extremely slow at both writing and reading files of that size. Then I read about LibYAML, a C library that speeds things up when used through yaml.CLoader.

I'm using Windows 7 64-bit and couldn't find an installer for LibYAML, so I rolled up my sleeves and tried (for the first time ever) compiling the source with VS2008.
I managed to compile yaml.dll, but that's not the file type Python needs in order to import it. I need a *.pyd, so I got stuck at this point and could use some help :)

Any idea how I can compile LibYAML for 64-bit Windows and Python?
Or
What's your favorite writer/reader for big dictionary-like files (where speed and human readability matter)?

Ed Caspersen

Nov 21, 2013, 2:41:01 PM
to python_in...@googlegroups.com
If you need speed and human readability and don't want to compile anything for Windows, I would suggest JSON.

http://codersbuffet.blogspot.com/2010/03/json-vs-xml-and-python-parsing.html

I personally use YAML for most cases where I know the file size and parsing speed aren't going to be an issue. Though I couldn't tell you anything about compiling LibYAML.


Ed Caspersen


--
You received this message because you are subscribed to the Google Groups "Python Programming for Autodesk Maya" group.
To unsubscribe from this group and stop receiving emails from it, send an email to python_inside_m...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/python_inside_maya/7f96a216-9c53-4ad7-a10c-07a09c7db7b2%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Justin Israel

Nov 21, 2013, 2:42:33 PM
to python_in...@googlegroups.com

I'm not fully up on my Windows compiling, but I assume the DLL needs to be linked into the Python extension that becomes the .pyd? That is, libyaml is a library that a Python yaml package wraps.

As far as I know, YAML would be slower to parse than JSON, because it is a superset with many more formatting options and whitespace rules. JSON might be faster and just as readable.
I did a comparison of serializers, although YAML is not included and it was tested with smaller data:
http://www.justinfx.com/2012/07/25/python-2-7-3-serializer-speed-comparisons/

Megajson and ultrajson came out very fast. You could update the test to include YAML libraries, and also to run against a 10 MB data structure.
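A benchmark along those lines could start from something like the sketch below. It uses only the standard library (json and pickle), and the helper names `build_data` and `time_roundtrip` as well as the shape of the test data are made up for illustration. The object count is an assumption too: raise it until the serialized payload reaches the size you care about.

```python
import json
import pickle
import time


def build_data(n_objects):
    """Build a nested dict of lists (hypothetical stand-in for the real data)."""
    return {
        "object%d" % i: {
            "a": list(range(20)),
            "b": {"1": [float(j) for j in range(20)]},
        }
        for i in range(n_objects)
    }


def time_roundtrip(dumps, loads, data):
    """Return (seconds to serialize, seconds to parse, payload size)."""
    start = time.time()
    payload = dumps(data)
    middle = time.time()
    restored = loads(payload)
    end = time.time()
    # Sanity check: the round trip must give back the same structure.
    assert restored == data
    return middle - start, end - middle, len(payload)


# Assumed size: a few MB of JSON. Increase n_objects to get past 10 MB.
data = build_data(20000)

results = {
    "json": time_roundtrip(json.dumps, json.loads, data),
    # Protocol 2 is the binary pickle protocol mentioned in this thread.
    "pickle": time_roundtrip(lambda d: pickle.dumps(d, 2), pickle.loads, data),
}

for name, (write_s, read_s, size) in sorted(results.items()):
    print("%-8s write %.3fs  read %.3fs  %d" % (name, write_s, read_s, size))
```

Other serializers (PyYAML, ultrajson, msgpack) could be added as extra entries in `results`, assuming they are installed; they are left out here so the sketch runs without third-party packages.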


Asi Sudai

Nov 21, 2013, 7:36:54 PM
to python_in...@googlegroups.com
I gave up on using YAML. It does some overkill object parsing that I don't need in my case.

JSON almost gives me what I want. I wish there was a way to control "depth", like in pprint.

for example:
import json

data = {'object1': {'a': [1, 2, 3, 4, 5], 'b': {'1': [1, 2, 3, 4, 5, 6]}}}
print(json.dumps(data, indent=4, separators=(',', ': ')))

>> {
    "object1": {
        "a": [
            1,
            2,
            3,
            4,
            5
        ],
        "b": {
            "1": [
                1,
                2,
                3,
                4,
                5,
                6
            ]
        }
    }
}


## Would love to have control over depth or better separators, so the output would be:
{
    "object1": {
        "a": [1,2,3,4,5],
        "b": {
            "1": [1,2,3,4,5,6]
        }
    }
}
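As far as I know, json.dumps has no depth option, but a small custom formatter can produce this layout. The following is a minimal sketch, not a library feature: `dumps_compact` is a hypothetical helper that indents dicts and keeps everything else (lists, numbers, strings) on one line. It assumes all dict keys are strings, and a dict nested inside a list would also end up on a single line.

```python
import json


def dumps_compact(obj, indent=4, level=0):
    """Serialize obj to JSON, indenting dicts but keeping lists on one line."""
    if isinstance(obj, dict):
        if not obj:
            return "{}"
        pad = " " * (indent * (level + 1))
        items = []
        # Sorted keys give the same output regardless of dict ordering.
        for key, value in sorted(obj.items()):
            rendered = dumps_compact(value, indent, level + 1)
            # Assumption: key is a string, so json.dumps quotes it correctly.
            items.append("%s%s: %s" % (pad, json.dumps(key), rendered))
        closing = " " * (indent * level) + "}"
        return "{\n" + ",\n".join(items) + "\n" + closing
    # Anything that is not a dict is emitted by json on a single line.
    return json.dumps(obj, separators=(",", ":"))


data = {"object1": {"a": [1, 2, 3, 4, 5], "b": {"1": [1, 2, 3, 4, 5, 6]}}}
text = dumps_compact(data)
print(text)
```

The result is still plain JSON, so json.loads reads it back unchanged.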


Colin Alteveer

Nov 21, 2013, 8:01:30 PM
to python_in...@googlegroups.com
Asi, did you consider using pickle in binary mode (protocol version 2) to do this? What else is reading these files?



Justin Israel

Nov 21, 2013, 8:26:11 PM
to python_in...@googlegroups.com
Asi was saying that it needed to be human readable (not sure what the requirements are on that). But cPickle in binary mode would be a fast solution, right up there with ultrajson and msgpack (which I accidentally referred to as megajson in my previous reply). cPickle and msgpack are both binary formats, while ultrajson output would still be human readable.
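For reference, the binary pickle route would look roughly like this. It is a minimal sketch: the file name is made up, and it imports `pickle` so it runs on any Python version, whereas on Python 2 you would use `import cPickle as pickle` to get the fast C implementation.

```python
import os
import pickle
import tempfile

data = {"object1": {"a": [1, 2, 3, 4, 5], "b": {"1": [1, 2, 3, 4, 5, 6]}}}

# Hypothetical output location, placed in a temp directory for the example.
path = os.path.join(tempfile.mkdtemp(), "data.pkl")

# Protocol 2 is a binary format, so the file must be opened in binary mode.
with open(path, "wb") as handle:
    pickle.dump(data, handle, protocol=2)

with open(path, "rb") as handle:
    restored = pickle.load(handle)

print(restored == data)
```

The file on disk is not human readable, which is the trade-off against JSON discussed above.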


