Long line issues


Allen McIntosh

26.02.2015, 07:37:39
to gg...@googlegroups.com
Does ggobi XML have any known issues with long lines?  I have a dataset with 200 columns (originally 500 but I pared it down for debugging).  Ggobi complains that some lines do not contain enough data.  Awk tells me this isn't so, and that all lines have the correct number of columns.  If I change the case labels, ggobi objects to different lines:

$ ggobi foo.xml
Error in XML parsing [line 334, column 10]: Not enough elements
Error in XML parsing [line 436, column 10]: Not enough elements
Error in XML parsing [line 511, column 10]: Not enough elements
^C

$ sed 's/Redacted/Waiting for ggobi/' <foo.xml >foo2.xml
$ ggobi foo2.xml
Error in XML parsing [line 406, column 10]: Not enough elements
Error in XML parsing [line 580, column 10]: Not enough elements
Error in XML parsing [line 604, column 10]: Not enough elements


This happens with 2.1.10-5ubuntu1 amd64 under Ubuntu 14.04.2.

I would look at this myself, but 2.1.11 won't build under Ubuntu 14.04.  (2.1.11 needs GTK 2.X development and 14.04 only has GTK 3.X development)

I'll attach the offending XML file in case I'm doing something wrong.

I expect I can get around this by running ggobi from R, but it would still be nice to be able to feed it directly.
foo.xml

Dianne Cook

26.02.2015, 08:52:16
to Allen McIntosh, gg...@googlegroups.com
Allen,

I can’t see a mistake at line 334, 436 or 511 - there are no lines because there are only 150 records.

The file looks ok to me. I read it out of ggobi into csv.

I think the warning might be triggered by something else, e.g. some columns that are all zeros. They have no variance, and that causes ggobi some pain when it tries to set up the plot space. It does manage it, though.

cheers,
Di

---------------------------
Di Cook
vis...@gmail.com



Michael Lawrence

26.02.2015, 15:22:36
to Dianne Cook, Allen McIntosh, GGobi users
I'm also getting the error, but at different lines, so I fear that something is wrong with the parser, or that file has something weird about it. Probably both. I've been fixing linking problems and cleaning up some compiler warnings this morning, but I won't be able to look into this in detail today.

Allen McIntosh

27.02.2015, 07:45:44
to gg...@googlegroups.com, aamcin...@gmail.com

Di Cook wrote:
> I can’t see a mistake at line 334, 436 or 511 - there are no lines because there are only 150 records.

Ggobi is using line numbers, not record numbers, in its diagnostics.

Allen McIntosh

27.02.2015, 07:53:01
to gg...@googlegroups.com, vis...@gmail.com, aamcin...@gmail.com


On Thursday, February 26, 2015 at 3:22:36 PM UTC-5, Michael Lawrence wrote:
> I'm also getting the error, but at different lines, so I fear that something is wrong with the parser, or that file has something weird about it. Probably both. I've been fixing linking problems and cleaning up some compiler warnings this morning, but I won't be able to look into this in detail today.

That's fine.  I figured out how to use a csv file and set colors later.

I'm not surprised at the "different lines".  That matches my observation.

Thank you both for looking at this.

Michael Lawrence

07.03.2015, 23:42:18
to Allen McIntosh, GGobi users, Dianne Cook
Btw guys, I have begun investigating the move to GTK+ 3, since mainstream Linux distros are starting to drop GTK+ 2 support. While the move from GTK+ 1.2 to 2.0 was of major benefit, GTK+ 3 drops the entire GDK drawing API, including pixmaps, etc. It's now all about cairo, which is too high-level for our needs, or at least too different from GDK in ways that matter to performance. My initial goal is just to convert the GDK calls to cairo calls in order to preserve existing functionality. I'm about half-way there right now. But from past experience, I suspect that cairo will be prohibitively slow in many cases, so there will need to be an investment in optimization, eventually.

Michael


François Gallard

25.03.2015, 13:37:20
to gg...@googlegroups.com
I am having the same issue, so I uncommented the commented-out line 1390 in read_xml.c and found that some values are concatenated when tabs are used as separators in the XML file.
For instance, when a record consists of the two values 1.2     -2.3, ggobi (or the XML library) reads it as 1.2-2.3, which causes the "Not enough elements" error.
I modified my writer for ggobi to use spaces instead of tabs as separators, and it worked.
I also had similar issues when I used nicknames for variables, so I no longer use them.

Francois

Michael Lawrence

25.03.2015, 13:46:52
to GGobi users
Sounds like this might be an issue with recent versions of libxml2?

Btw, I have been pushing hard on the rewrite to GTK3, but it's a lot of rewriting, so no promises.


François Gallard

25.03.2015, 14:08:30
to gg...@googlegroups.com
Yes, it sounds like it.

Which version were you using for development? I can try to downgrade to it and see whether that changes anything.

Francois

Michael Lawrence

25.03.2015, 14:35:57
to François Gallard, GGobi users
I am using libxml2 2.9.2, and had a similar issue with invalid record counts on that file. Di said that things were working for her, so perhaps if she's still reading this thread, she could tell us her version.

Note that libxml2 is a fairly mature library. Version 2.9.1 was released two years ago.

Michael

François Gallard

26.03.2015, 04:40:32
to Michael Lawrence, GGobi users
There were still a fair number of bugs in version 2.9.1 ...
See the release notes for libxml2 2.9.2, released in October 2014: http://www.xmlsoft.org/news.html

François Gallard

26.03.2015, 12:44:05
to gg...@googlegroups.com, law...@gmail.com
I compiled libxml2 2.9.2, 2.8.0 and 2.7.7 and built ggobi against each of them; all of these versions gave the same error.
Maybe the issue is on the ggobi side. I will investigate in this direction.

François


François Gallard

31.03.2015, 11:30:41
to gg...@googlegroups.com, law...@gmail.com
I think we have found the origin of the issue.
In read_xml.c, when the XML parser calls the Character callback function twice for a single record, the skipWhiteSpace function is called again the second time, at line 858.
This occurs rarely and unpredictably, as the libxml2 documentation warns.

If the first block ends with the exponent of a number and the second block starts with whitespace (spaces, a tab or a \n separator), then the skipWhiteSpace function removes it, so the two numbers are run together, which causes the issue!

I am not sure about the patch, but I made these modifications, and both my file and the "foo.xml" file are now read properly (in the diff below, the lines marked "<" are my modified version and the line marked ">" is the original):

read_xml.c
858,863c858
<   if (data->recordStringLength == 0) {
<     c = (const xmlChar *) skipWhiteSpace (ch, &dlen);
<   } else {
<     c = ch;
<   }
<
---
>   c = (const xmlChar *) skipWhiteSpace (ch, &dlen);
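
The effect described above can be reproduced outside GGobi with a small sketch. This is not the code from read_xml.c: skip_white_space, append_chunk, count_fields and fields_after_split are hypothetical stand-ins written for illustration, and the fixed 128-byte buffer assumes short test records only.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for skipWhiteSpace: returns a pointer past any
   leading whitespace and shrinks *len by the number of bytes skipped. */
const char *skip_white_space(const char *ch, size_t *len)
{
    while (*len > 0 && isspace((unsigned char) *ch)) {
        ch++;
        (*len)--;
    }
    return ch;
}

/* Append one parser chunk to buf, which currently holds *used bytes.
   always_skip != 0: strip leading whitespace from every chunk
                     (the behaviour reported in this thread).
   always_skip == 0: strip it only from the first chunk of a record
                     (the behaviour after the proposed patch). */
void append_chunk(char *buf, size_t *used, const char *chunk, int always_skip)
{
    size_t len = strlen(chunk);
    const char *c = chunk;

    if (always_skip || *used == 0)
        c = skip_white_space(chunk, &len);
    memcpy(buf + *used, c, len);
    *used += len;
    buf[*used] = '\0';
}

/* Count whitespace-separated fields in a NUL-terminated string. */
int count_fields(const char *s)
{
    int n = 0;

    while (*s) {
        while (*s && isspace((unsigned char) *s))
            s++;
        if (*s)
            n++;
        while (*s && !isspace((unsigned char) *s))
            s++;
    }
    return n;
}

/* Deliver one record as two chunks and report how many fields result.
   The caller must keep the combined length below 128 bytes. */
int fields_after_split(const char *first, const char *second, int always_skip)
{
    char buf[128];
    size_t used = 0;

    buf[0] = '\0';
    append_chunk(buf, &used, first, always_skip);
    append_chunk(buf, &used, second, always_skip);
    return count_fields(buf);
}
```

With these definitions, fields_after_split("1.2e+00", "\t-2.3e+00", 1) sees a single field because the tab is discarded, while passing 0 as the last argument keeps the separator and yields two fields.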



In addition, I think there is a potential bug at line 922, in the cumulateRecordData function, since the g_realloc documentation explicitly says that the pointer may be moved.
As I understand it, the beginning of the string should then be copied as well, so I added something like this:

/* Allocate a new buffer, copy the old contents, append the new chunk, then free the old buffer. */
xmlChar *rec_copy = data->recordString;
data->recordString = (xmlChar *) g_malloc ((data->recordStringLength + len + 1) * sizeof (xmlChar));
memcpy (data->recordString, rec_copy, data->recordStringLength * sizeof (xmlChar));
memcpy (data->recordString + data->recordStringLength, ch, len * sizeof (xmlChar));
g_free (rec_copy);

But this last modification does not seem to be necessary in my test cases, which confuses me, since the pointer is sometimes moved.

What do you think about this, guys?

François

Michael Lawrence

31.03.2015, 14:45:18
to François Gallard, GGobi users
On Tue, Mar 31, 2015 at 8:30 AM, François Gallard <gall...@gmail.com> wrote:
> I think we have found the origin of the issue.
> In read_xml.c, when the XML parser calls the Character callback function twice for a single record, the skipWhiteSpace function is called again the second time, at line 858.
> This occurs rarely and unpredictably, as the libxml2 documentation warns.
>
> If the first block ends with the exponent of a number and the second block starts with whitespace (spaces, a tab or a \n separator), then the skipWhiteSpace function removes it, which causes the issue!
>
> I am not sure about the patch, but I made these modifications, and both my file and the "foo.xml" file are now read properly:
>
> read_xml.c
> 858,863c858
> <   if (data->recordStringLength == 0) {
> <     c = (const xmlChar *) skipWhiteSpace (ch, &dlen);
> <   } else {
> <     c = ch;
> <   }
> <
> ---
> >   c = (const xmlChar *) skipWhiteSpace (ch, &dlen);


Seems like a reasonable solution. Thanks for spending the time looking into this issue! I committed it to the ggobi-2.1.4 branch on GitHub.



> In addition, I think there is a potential bug at line 922, in the cumulateRecordData function, since the g_realloc documentation explicitly says that the pointer may be moved.
> As I understand it, the beginning of the string should then be copied as well.

No, the whole point of realloc is that it copies the original data, so there is no need to memcpy anything. We already replace the pointer with the return value of g_realloc, so I don't see a problem here.
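
As a minimal sketch of that point, the function below grows a record buffer with the standard C realloc, used here as a stand-in for g_realloc on the assumption that both preserve the existing contents when the block moves. The name append_record_data and its signature are hypothetical and are not GGobi's cumulateRecordData.

```c
#include <stdlib.h>
#include <string.h>

/* Append len bytes from chunk to *record, which currently holds *used bytes.
   Returns 0 on success and -1 if the allocation fails.
   realloc keeps the old contents even when it moves the block, so only the
   newly arrived bytes have to be copied. */
int append_record_data(char **record, size_t *used, const char *chunk, size_t len)
{
    char *grown = realloc(*record, *used + len + 1);

    if (grown == NULL)
        return -1;              /* *record is still valid and unchanged */
    memcpy(grown + *used, chunk, len);
    *used += len;
    grown[*used] = '\0';
    *record = grown;            /* always adopt the returned pointer */
    return 0;
}
```

Starting from a NULL record and a length of 0, appending "1.2e+00" and then "\t-2.3e+00" leaves the full 16-character string in the buffer without any explicit copy of the earlier bytes. The one thing that must not be skipped is storing the returned pointer, which the existing code already does.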

François Gallard

01.04.2015, 04:03:49
to gg...@googlegroups.com, gall...@gmail.com
Thank you for sharing this excellent software. It is a pleasure to contribute, and thanks for your quick response.

François