error when adding warcinfo record to WARC


laurendko

May 22, 2009, 11:13:06 AM
to warc-tools
I am using warc-tools in a Python script that walks through a
directory structure and writes a WARC record to a WARC file for each
file. The data files are written as resource records, and that works
well: a WARC file containing only resource records validates with the
warcvalidator tool. However, if I first try to add a warcinfo
record, at this line of code:

w.storeRecord(r)

I get this error:

lib/private/wrecord.c:2124: WRecord_getDataFileExtern: Assertion `
(self -> externdfile)' failed.
Aborted

From what I can tell, this is happening because it doesn't like my
trying to store a record without supplying a data file for a payload.
If I do pass it some data file at record creation, it will happily
create and write the warcinfo record with a payload, and this warc
will validate. However, it is my understanding that a warcinfo
shouldn't have a payload.

Here is some of the code:
http://dpaste.com/hold/46646/

Lauren

WARC

May 26, 2009, 1:25:13 PM
to warc-...@googlegroups.com
Hi Lauren,

First of all, thank you for your interest in the WARC-Tools.
As the WARC specs state, a WARC record = header + block.
It follows that a WARC record must have a data block after the header.
But note that this "data block" is not necessarily a
payload itself. It may, for example, be a whole
"http response", made up of the HTTP response headers followed by a
"payload".

In the case of a "WARC Info" record (for example), the data block will
contain the information about the "WARC file". And we have two ways
to define it:

1) write it to a text file and give that file to the "Record", as for
the "response record".

2) give it as a string, because our library allows filling the data
block directly from strings (which is useful in cases like WARC INFO
records)

To summarize, a WARC INFO record must have a data block, but this data
block does not have to be a "payload".
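
To make the header + block layout concrete, here is a minimal sketch in plain Python that assembles a warcinfo record by hand. It does not use the warc-tools API at all: the function name build_warcinfo_record is hypothetical, and the layout assumes the WARC/1.0 format from ISO 28500 (the warc-tools version discussed here may target a different revision of the format). The point is only that the block of a warcinfo record is a short list of "name: value" lines describing the WARC file, not a harvested payload.

```python
import uuid
from datetime import datetime, timezone

CRLF = "\r\n"


def build_warcinfo_record(fields, filename):
    """Return the bytes of one warcinfo record (hypothetical helper).

    Assumes the WARC/1.0 layout: version line, named header fields,
    an empty line, the block, then two CRLFs closing the record.
    """
    # The data block: "name: value" lines describing the WARC file itself.
    block = "".join(f"{name}: {value}{CRLF}" for name, value in fields.items())
    block_bytes = block.encode("utf-8")

    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    header_lines = [
        "WARC/1.0",
        "WARC-Type: warcinfo",
        f"WARC-Date: {now}",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Filename: {filename}",
        "Content-Type: application/warc-fields",
        # Content-Length counts the bytes of the block only.
        f"Content-Length: {len(block_bytes)}",
    ]
    # An empty line separates the header from the block.
    header_bytes = (CRLF.join(header_lines) + CRLF + CRLF).encode("utf-8")

    return header_bytes + block_bytes + (CRLF + CRLF).encode("utf-8")


if __name__ == "__main__":
    # Illustrative values only; use whatever describes your own WARC file.
    record = build_warcinfo_record(
        {"software": "example-script/0.1", "format": "WARC File Format 1.0"},
        "example.warc",
    )
    print(record.decode("utf-8"))
```

With warc-tools itself, the equivalent would be to pass the same "name: value" text to the record as a string (option 2 above) rather than writing the bytes yourself.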

N.B.: could you please tell us what field you work in? This will
help us identify users outside national libraries!

Regards
Younès

laurendko

May 27, 2009, 12:14:27 PM
to warc-tools
Hi Younès,
Thank you for your helpful reply. I will go ahead and supply content
for the data block via a string.

Lauren Ko
Digital Projects Unit
University of North Texas Libraries
http://www.library.unt.edu/digitalprojects/