could not parse pubdate from <<[188-]>> for pubdaterange
. . . . . . . . . . . . . . . . . . . . . . . .Traceback (innermost last):
File "indexerDriver.py", line 182, in ?
File "indexerDriver.py", line 121, in processFile
File "/home/gsf/svn/fac-back-opac/trunk/indexer/indexer.py", line 149, in __init__
File "/home/gsf/svn/fac-back-opac/trunk/indexer/processors.py", line 119, in pubdaterangeProcessor
IndexError: index out of range: 205
Mark mentioned he had gotten index errors from encoding issues in the
past, so I converted the whole file from MARC-8 to UTF-8 with
yaz-marcdump, but I got the same error. Obviously, a record is
screwed up somewhere. Any ideas on how to clean it up?
This might be easier to solve if I break up my MARC dump into smaller
chunks. I'll do that tomorrow if I can't figure out another way
around it.
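
For what it's worth, here is a rough sketch of how the splitting could be
done in plain Python, with no MARC library at all. It assumes the dump is
binary MARC (ISO 2709), where the first five bytes of each record give the
record length as ASCII digits. The function names, the default chunk size,
and the output file naming are all made up for illustration, and a record
with a corrupt length field will throw off everything read after it.

```python
import io


def iter_raw_records(stream):
    """Yield each raw record (bytes) from a binary MARC stream.

    Relies on the ISO 2709 leader, whose first five bytes hold the
    record length as ASCII digits.
    """
    while True:
        head = stream.read(5)
        if len(head) < 5:
            return  # end of file
        length = int(head.decode("ascii"))  # ValueError on a corrupt leader
        yield head + stream.read(length - 5)


def split_marc(stream, chunk_size):
    """Group raw records into lists of at most chunk_size records."""
    chunk = []
    for record in iter_raw_records(stream):
        chunk.append(record)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk  # final, possibly shorter, chunk


def write_chunks(path, chunk_size=50000, prefix="chunk"):
    """Write chunk files named <prefix>_000.mrc, <prefix>_001.mrc, ...

    The naming scheme is hypothetical; adjust to taste.
    """
    with io.open(path, "rb") as source:
        for n, chunk in enumerate(split_marc(source, chunk_size)):
            with io.open("%s_%03d.mrc" % (prefix, n), "wb") as out:
                out.write(b"".join(chunk))
```

Calling `write_chunks` on the full dump would leave a series of smaller
files that can be indexed one at a time, so a failure only costs one chunk's
worth of waiting.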
Gabriel
Hey, at least you bothered to dig into the code, which is probably
better than banging one's head on the keyboard.
Tests would be nice. If I'm gonna figure some out, however, I'm not
going to want to wait so long for 130,000 to come around, so I'll
redump the MARC tomorrow in chunks of 50,000. Then I'll mess with
that code you mentioned and see if I can get some output that says
what's going on.
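
The kind of output I have in mind could come from something like the sketch
below: run a processor over each record, catch whatever it raises, and note
the position of the record instead of letting the whole run die. This is not
the indexer's actual code. `find_failures` and its arguments are hypothetical
names, and the syntax is for current CPython, so it might need adjusting for
the Jython the indexer runs under.

```python
def find_failures(records, processor):
    """Run processor on every record and report the ones that blow up.

    Returns a list of (position, message) pairs, where position counts
    records from 1. Nothing here is specific to MARC.
    """
    failures = []
    for position, record in enumerate(records, start=1):
        try:
            processor(record)
        except Exception as exc:
            # Broad on purpose: the goal is only to locate bad records,
            # not to decide how to handle each kind of error.
            failures.append((position, "%s: %s" % (type(exc).__name__, exc)))
    return failures


if __name__ == "__main__":
    # Stand-in data and processor, just to show the shape of the output.
    sample = ["abcdefgh", "abc", "abcdefghij"]
    for position, message in find_failures(sample, lambda r: r[7]):
        print("record %d failed: %s" % (position, message))
```

With the position in hand it should be possible to pull the offending record
out of the chunk and look at it directly.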
Gabriel
I've started a "pymarc-indexer" branch to attempt to parse the MARC
with pymarc instead of marc4j. I've got the directory structure set
up, but not a lot of code yet. More on that in the next couple days.
> I ended up breaking the MARC record loading into smaller files and
> then running any file containing a problem record through Terry
> Reese's MarcEdit MARCValidator
> (http://oregonstate.edu/~reeset/marcedit/html/index.php). You can set
> it to remove invalid records (MarcEditor|Tools|Validate MARC Files).
> That let me index all the "valid" records and gave me a file of "bad"
> records.
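
A very rough scripted analogue of that valid/bad split might look like the
sketch below. It is not what MarcEdit does internally, and MARCValidator
surely checks far more. It only tests the basic ISO 2709 structure of a raw
record: the length and base address in the leader, the 12-byte directory
entries, and the field and record terminators. `looks_valid` and `partition`
are made-up names.

```python
RECORD_TERMINATOR = b"\x1d"
FIELD_TERMINATOR = b"\x1e"
LEADER_LENGTH = 24
DIRECTORY_ENTRY_LENGTH = 12


def looks_valid(raw):
    """Cheap structural checks on one raw ISO 2709 record (bytes)."""
    # Leader, directory terminator and record terminator: 26 bytes minimum.
    if len(raw) < LEADER_LENGTH + 2:
        return False
    # Leader bytes 0-4 are the record length, 12-16 the base address of data.
    if not raw[:5].isdigit() or not raw[12:17].isdigit():
        return False
    if int(raw[:5]) != len(raw):
        return False
    base = int(raw[12:17])
    if base <= LEADER_LENGTH or base > len(raw):
        return False
    if not raw.endswith(RECORD_TERMINATOR):
        return False
    # The directory sits between the leader and the field terminator that
    # immediately precedes the base address.
    directory = raw[LEADER_LENGTH:base - 1]
    if raw[base - 1:base] != FIELD_TERMINATOR:
        return False
    return len(directory) % DIRECTORY_ENTRY_LENGTH == 0


def partition(records):
    """Split raw records into (good, bad) lists, preserving order."""
    good, bad = [], []
    for raw in records:
        (good if looks_valid(raw) else bad).append(raw)
    return good, bad
```

The "good" list could then be written out and indexed, with the "bad" list
saved to a separate file for someone to inspect by hand.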
MarcEdit is by all accounts awesome. I've been meaning to introduce
our catalogers to it if they don't know about it already (I suspect
they do).
Gabriel
P.S. Thanks to Dan Scott for committing that fix to processors.py.
That's what I call talking *and* doing.