echoprint-inverted-index gives an error when a large dump file is streamed to it

abbas hoseini

Mar 1, 2017, 8:02:17 AM

to echoprint

I wanted to test echoprint with its published database: http://echoprint-data.s3.amazonaws.com/echoprint-dump-1.json
I tried this:
cat echoprint-dump-1.json | jq -r '.[].code' | echoprint-inverted-index index.bin
and it gives this error:

Traceback (most recent call last):
  File "/usr/local/bin/echoprint-inverted-index", line 19, in <module>
    create_inverted_index(streamer(sys.stdin), args.indexfile)
  File "/usr/local/lib/python2.7/dist-packages/echoprint_server/lib.py", line 57, in create_inverted_index
    for batch_index, batch in enumerate(split_seq(songs, 65535)):
  File "/usr/local/lib/python2.7/dist-packages/echoprint_server/lib.py", line 30, in split_seq
    item = list(itertools.islice(it, size))
  File "/usr/local/lib/python2.7/dist-packages/echoprint_server/lib.py", line 78, in parsing_code_streamer
    yield decode_echoprint(line.strip())[1]
  File "/usr/local/lib/python2.7/dist-packages/echoprint_server/lib.py", line 42, in decode_echoprint
    unzipped = zlib.decompress(zipped)
zlib.error: Error -5 while decompressing data: incomplete or truncated stream

I think it only happens when the file is large; I tested it with small JSON files and it works.
Has anyone else encountered this error?
Is this a bug, or is the problem on my side?
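One way to narrow this down would be to check which lines of the jq output fail to decode on their own, before they ever reach echoprint-inverted-index. The sketch below is only a guess at what decode_echoprint does: the traceback confirms the zlib.decompress step, but the URL-safe base64 wrapping is an assumption, and find_undecodable_codes is a hypothetical helper, not part of echoprint_server.

```python
import base64
import zlib


def find_undecodable_codes(lines):
    """Return (line_number, error_message) for every code that fails to decode."""
    failures = []
    for number, line in enumerate(lines, start=1):
        code = line.strip()
        if not code:
            continue  # skip blank lines
        try:
            # Assumed encoding: URL-safe base64 wrapping a zlib stream.
            zlib.decompress(base64.urlsafe_b64decode(code))
        except (zlib.error, ValueError, TypeError) as exc:
            failures.append((number, str(exc)))
    return failures
```

Running it over the same lines that are piped into the indexer (for example, the jq output saved to a file and opened with open()) would show whether one specific code in the dump is truncated, or whether the failure only appears inside the pipeline.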

I also tried to do it through the API, in a script like this:

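import io
from pymongo import MongoClient
# create_inverted_index, parsing_code_streamer and load_inverted_index are the
# echoprint_server functions; app and args are defined elsewhere in the script.

def makeAndLoadInvertedIndex():
    client = MongoClient('localhost', 27017)
    collection = client.test.songs
    docs = collection.find({})
    codesStr = ""
    app.gids = []
    for doc in docs:
        # one code per line, in the same order as app.gids
        codesStr += str(doc['code']) + "\n"
        app.gids.append({"id": str(doc['_id'])})
    f = io.BytesIO(codesStr)
    print "submitting ...."
    create_inverted_index(parsing_code_streamer(f), args.indexfile)
    app.inverted_index = load_inverted_index(['./index.bin'])
    print "all songs submitted"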
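The script above builds every code into one string before indexing, which gets slow and memory-hungry for a large collection. A lazier variant might look like the sketch below. It assumes parsing_code_streamer only iterates over its argument and calls strip() on each item (the traceback suggests this, but I have not verified it), and iter_codes_collecting_ids is a hypothetical helper name. This would not by itself explain or fix the zlib error.

```python
def iter_codes_collecting_ids(docs, gids):
    """Yield one code per line, lazily, recording each document id in gids."""
    for doc in docs:
        # Keep gids in the same order as the codes that are yielded.
        gids.append({"id": str(doc["_id"])})
        yield str(doc["code"]) + "\n"
```

Under that assumption, the call would become create_inverted_index(parsing_code_streamer(iter_codes_collecting_ids(docs, app.gids)), args.indexfile), with app.gids set to an empty list beforehand.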