I think we have found the origin of the issue.
In read_xml.c, when the XML parser calls the characters callback twice for a single record, skipWhiteSpace is called again on the second chunk (line 858).
As the libxml2 documentation warns, this happens rarely and unpredictably.
If the first chunk ends with the exponent of a number and the second chunk starts with whitespace, a tab or a \n separator, skipWhiteSpace strips those separators, the two numbers are glued together, and that causes the issue!
I am not sure this is the right patch, but I made the following change, and both my file and the "foo.xml" file are now read properly:
read_xml.c
858c858,863
< c = (const xmlChar *) skipWhiteSpace (ch, &dlen);
---
> if( data->recordStringLength == 0){
> c = (const xmlChar *) skipWhiteSpace (ch, &dlen);
> }else{
> c =ch;
> }
>
In addition, I think there is a potential bug at line 922 in the cumulateRecordData function, since the g_realloc documentation explicitly says that the pointer may be moved.
As I understand it, the beginning of the string should then be copied as well, so I added something like this:
xmlChar* rec_copy = data->recordString;   /* keep the old buffer */
data->recordString = (xmlChar *) g_malloc((data->recordStringLength + len + 1) * sizeof (xmlChar));
memcpy (data->recordString, rec_copy, data->recordStringLength * sizeof (xmlChar));   /* copy the existing prefix */
memcpy (data->recordString + data->recordStringLength, ch, len * sizeof (xmlChar));   /* append the new chunk */
g_free (rec_copy);   /* release the old buffer */
But this last change does not seem to be necessary in my test cases, which confuses me, since the pointer is sometimes moved.
What do you think about this, guys?
François