Failed to parse date field when indexing AIP


Ludwig Possie

Aug 3, 2017, 3:24:25 PM
to archivematica
We are currently testing Archivematica 1.6.0 and trying to process metadata contained in a CSV file. During ingest, we get an error when Archivematica tries to index the AIP. The error is:

Traceback (most recent call last):
  File "/usr/lib/archivematica/MCPClient/clientScripts/indexAIP.py", line 127, in <module>
    sys.exit(index_aip())
  File "/usr/lib/archivematica/MCPClient/clientScripts/indexAIP.py", line 101, in index_aip
    identifiers=identifiers)
  File "/usr/lib/archivematica/archivematicaCommon/elasticSearchFunctions.py", line 436, in index_aip
    try_to_index(client, aipData, 'aips', 'aip')
  File "/usr/lib/archivematica/archivematicaCommon/elasticSearchFunctions.py", line 445, in try_to_index
    return client.index(body=data, index=index, doc_type=doc_type)
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/utils.py", line 69, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/__init__.py", line 263, in index
    _make_path(index, doc_type, id), params=params, body=body)
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/transport.py", line 307, in perform_request
    status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/connection/http_urllib3.py", line 93, in perform_request
    self._raise_error(response.status, raw_data)
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/connection/base.py", line 105, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.RequestError: TransportError(400, u'MapperParsingException[failed to parse [mets.ns0:mets_dict_list.ns0:dmdSec_dict_list.ns0:mdWrap_dict_list.ns0:xmlData_dict_list.ns2:dublincore_dict_list.dc:date_unicode_list]]; nested: MapperParsingException[failed to parse date field [ 1902/01/01], tried both date format [yyyy/MM/dd HH:mm:ss||yyyy/MM/dd], and timestamp number with locale []]; nested: IllegalArgumentException[Invalid format: " 1902/01/01"]; ')

I was following this topic, and it seems this is a known issue with regard to validating dates and other fields before they are added to ElasticSearch. The topic ended last year without any resolution. Has anyone else run into this? Does anyone have suggestions for how I can resolve it? Thanks.

Sara - Artefactual

Aug 21, 2017, 3:19:13 PM
to archivematica
Hi Ludwig,

Can you provide a sample of your metadata, including the headers?

It's possible that this is caused by ElasticSearch's dynamic mapping. The description is a bit buried in the thread that you linked, so here's a recap: the first time that ElasticSearch comes across a date field, it treats the format of that value as the "correct" date string format for the field. If you later enter a date string in a different format, it throws this indexing error. For example, if your first transfer included a date like 2015-01-01, ElasticSearch would take that to be the correct format for a date; if, in a later SIP, you introduced a date range like 2001-01-01/2010-10-10, it would not be valid. This is based on ElasticSearch's Date datatype rules.

Regards,
Sara
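
One detail in the error message may be worth checking before anything else: the rejected value is quoted as " 1902/01/01", with a leading space, while the formats ElasticSearch says it tried are yyyy/MM/dd HH:mm:ss and yyyy/MM/dd. Below is a minimal sketch for scanning a metadata CSV for date cells that would not match those formats. It is not part of Archivematica; the function names, the `dc.date` column name, and the sample rows are assumptions for illustration, and Python's `strptime` is only an approximation of ElasticSearch's own date parser.

```python
import csv
import io
from datetime import datetime

# Rough strptime equivalents of the formats named in the error message
# (yyyy/MM/dd HH:mm:ss||yyyy/MM/dd). An approximation, not ElasticSearch's parser.
ASSUMED_FORMATS = ("%Y/%m/%d %H:%M:%S", "%Y/%m/%d")


def check_date(value, formats=ASSUMED_FORMATS):
    """Return (cleaned_value, problem); problem is None when the value is fine."""
    cleaned = value.strip()
    for fmt in formats:
        try:
            datetime.strptime(cleaned, fmt)
        except ValueError:
            continue
        # The value parses once stripped; flag it if stripping changed it.
        if cleaned != value:
            return cleaned, "surrounding whitespace"
        return cleaned, None
    return cleaned, "does not match the mapped date formats"


def scan_csv(handle, column="dc.date"):
    """Yield (line_number, raw_value, problem) for each suspicious date cell.

    `column` is a hypothetical header name; use whatever your CSV contains.
    """
    # Data rows start on line 2 because line 1 holds the headers.
    for line_number, row in enumerate(csv.DictReader(handle), start=2):
        raw = row.get(column) or ""
        if not raw:
            continue
        _, problem = check_date(raw)
        if problem:
            yield line_number, raw, problem


if __name__ == "__main__":
    # Made-up sample data; the second row reproduces the leading space.
    sample = io.StringIO(
        "filename,dc.title,dc.date\n"
        "objects/a.tif,Letter, 1902/01/01\n"
        "objects/b.tif,Map,1902/01/01\n"
    )
    for line_number, raw, problem in scan_csv(sample):
        print("line %d: %r (%s)" % (line_number, raw, problem))
```

If the scan only reports whitespace problems, trimming those cells in the CSV and re-running the transfer is a cheap thing to try. If it reports values in a different format altogether (such as a date range), that would point back to the mapping behaviour described above.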