Performance problem with setTextContent


Francisco

Jun 26, 2013, 4:34:26 AM
to fox-d...@googlegroups.com
Hi everyone,

I am using the FoX library to read data from and write data to a Microsoft Excel XML file. My program needs to write several XML files, so the workflow of my code is:

1 - parse the xml file to get a docXML in memory
2 - extract data from the docXML with the function extractDataContent
3 - modify the docXML with the function setTextContent
4 - serialize the docXML into a file
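
For readers without the attachment, the four steps above might look roughly like the sketch below. It is not the attached code: the file names and the element name "Data" are assumptions made for illustration, and only standard FoX_dom calls are used.

```fortran
program workflow_sketch
  use FoX_dom
  implicit none

  type(Node), pointer     :: doc, cell
  type(NodeList), pointer :: cells
  real    :: val
  integer :: i, ios

  ! 1 - parse the XML file into an in-memory DOM tree
  doc => parseFile("input.xml")                 ! hypothetical file name

  ! 2 - extract data from the document
  cells => getElementsByTagName(doc, "Data")    ! assumed element name
  if (getLength(cells) > 0) then
    cell => item(cells, 0)
    ! iostat is non-zero if the text cannot be read as a real number
    call extractDataContent(cell, val, iostat=ios)
  end if

  ! 3 - modify the document (this is the step reported as slow)
  do i = 0, getLength(cells) - 1
    cell => item(cells, i)
    call setTextContent(cell, "0")
  end do

  ! 4 - serialize the document to a file
  call serialize(doc, "output.xml")             ! hypothetical file name

  ! release the memory held by the DOM tree
  call destroy(doc)
end program workflow_sketch
```

In the real program steps 3 and 4 sit inside an outer loop, one pass per output file.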

I have reduced my code to a small example, which is attached. Steps 3 and 4 are repeated in a loop, and that is where I found the problem: the first iteration takes around 150 seconds and the second iteration takes only 1 second!

There seems to be a performance problem in the function setTextContent. If I replace it with getTextContent (I know it does not do the same thing, but it lets me compare the performance), both iterations are really fast.

Could anyone give me advice on this? Is there another function that could replace setTextContent and do the same or a similar thing?

Best regards,
Francisco

PS: You are free to use these functions in your programs.

source4fox.rar

Francisco

Jun 27, 2013, 8:17:20 AM
to fox-d...@googlegroups.com
I forgot to say that it happens on Windows + gfortran 4.8.0

Francisco

Jul 2, 2013, 2:14:49 PM
to fox-d...@googlegroups.com
Hi everyone,

I have been investigating this problem and would like to share some findings. I see the problem described above on Windows 7 64-bit but not on Windows XP 32-bit. Depending on the number of rows in my Microsoft Excel XML file, the program can even crash.

- Compiled and executed on Windows XP 32-bit: it works fine.
- Compiled on Windows 7 64-bit and executed on Windows XP 32-bit: it works fine.
- Compiled on Windows XP 32-bit and executed on Windows 7 64-bit: it does not work.
- Compiled and executed on Windows 7 64-bit: it does not work.
- Compiled and executed on Windows 7 64-bit with the Windows XP compatibility option: it works fine.

So my guess is that some pointer expects a 32-bit memory address and receives a 64-bit one instead, but it is only a guess. In fact, when I run 'make check' after compiling the library on Windows 7 64-bit + MSYS, several tests in the wxml directory fail. I have even tried the '-m32' flag in gfortran, but nothing changes.

Has anyone experienced the same problem?

Anna Kelbert

Jul 5, 2013, 9:40:51 PM
to fox-d...@googlegroups.com
Dear everyone,

Sorry to clog up your mailboxes. I have written a pretty elaborate code
that uses FoX to read (FoX_dom: parseFile) and write XML files
(FoX_wxml: xml_OpenFile). Now, it turns out that I need to run these
programs through a Java web service that wants to have everything done
through STDIN/STDOUT. So that, instead of specifying a file name to
write (or read) I would say

    ./z2xml <my_weird_file_format >output_xml_text

and vice versa. Can this be done with FoX? How? I couldn't find any
guidance on that online. A command such as

    call xml_OpenFile(fname, xmlfile, 6)

just sends my standard output to fname, and I couldn't find anything
other than

    doc => parseFile(fname)

to initialize the parsing. Is this conceptually impossible to do through
standard input?

Thank you very much for any help!

Anna

Andrew Walker

Jul 7, 2013, 11:20:46 AM
to fox-d...@googlegroups.com
Hi,

I've now had a chance to take a quick look at this. Working backwards:


I think the errors you see when running "make check" with gfortran 4.8.0 arise because newer versions of gfortran build executables that dump a backtrace when they abort (earlier versions needed a command-line switch to enable this). By default FoX triggers an abort in response to things like an attempt to write a malformed XML document, and the test suite exercises this behaviour. The backtrace then confuses the test harness's check for correctness. You can work around this by adding -fno-backtrace to the FFLAGS line in arch.make after running configure and before running make or make check. We should probably make the configure script test for the presence of a backtrace in this case and turn it off (by adding this flag), but that's a non-trivial little project in itself. Could you check that adding -fno-backtrace removes the errors (and, if it does not, could you send the wxml/test/failed.out file)?


I'm not sure what I would expect from building on different architectures, and I've not used a Windows machine for almost a decade, so I'm not the person to advise on Windows-specific issues. I wonder if the failure matrix simply reflects a different compiler on each machine?


Regarding setTextContent being slow: I'm not 100% sure why the difference in speed between the first and second run is so large, but I do have a much faster (though less flexible) way of achieving your aim. If you replace lines like this:

    call setTextContent(item(colList, 0),str(i-nheaders))

with lines like this:

    call setData(getFirstChild(item(colList, 0)),str(i-nheaders))

your test case runs for me in 4 seconds (compared to 200 without this change). The function setTextContent does two things: first it walks down the DOM from the node and removes any child nodes, then it adds a text node child with the requested content. In FoX, deleting a node is quite an expensive operation (lots of memory management goes on, and this hasn't had much attention from the point of view of performance). The function setData just creates a string, makes the text node point at that string, and deallocates the old string. This is all pretty fast compared with searching out all the children. The disadvantage of this approach is that you need a pointer to the text node, not its parent, and this adds complexity. I've assumed that the first child node will be the text node. One way this would go wrong is if somebody stuck an XML comment in there somewhere. I don't know if that's something you need to care about.
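
If comments (or other non-text children) are a concern, one possible guard is to look for the first child that really is a text node before calling setData, and to fall back to setTextContent otherwise. The sketch below is only an illustration: the module name dom_helpers and the subroutine name set_first_text are made up, while getFirstChild, getNextSibling, getNodeType, TEXT_NODE, setData and setTextContent come from FoX_dom.

```fortran
module dom_helpers            ! hypothetical module name
  use FoX_dom
  implicit none
contains

  ! Replace the text of the first text-node child of "parent".
  ! Falls back to the slower setTextContent if no text node is found.
  subroutine set_first_text(parent, text)
    type(Node), pointer          :: parent
    character(len=*), intent(in) :: text
    type(Node), pointer          :: child

    child => getFirstChild(parent)
    do while (associated(child))
      if (getNodeType(child) == TEXT_NODE) then
        call setData(child, text)      ! fast path: reuse the existing text node
        return
      end if
      child => getNextSibling(child)   ! skip comments and other node types
    end do

    ! No text child (for example an empty element): let FoX create one.
    call setTextContent(parent, text)
  end subroutine set_first_text

end module dom_helpers
```

With this in place, the call in the loop would become something like call set_first_text(item(colList, 0), str(i-nheaders)). Note that the sketch only touches the first text node, so it does not reproduce setTextContent exactly when an element holds several text or CDATA children.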

I hope this helps.

Cheers,


Andrew




--
You received this message because you are subscribed to the Google Groups "FoX-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to fox-discuss...@googlegroups.com.
To post to this group, send email to fox-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/fox-discuss.
For more options, visit https://groups.google.com/groups/opt_out.
 
 



--
Andrew Walker
School of Earth Sciences
University of Bristol

Andrew Walker

Jul 7, 2013, 11:37:44 AM
to fox-d...@googlegroups.com
Hi Anna,

There are two parts to this: reading with the DOM and writing with wxml. Reading is easier.

Instead of using parseFile you should be able to use parseString to pass your XML data into the DOM parser. I've just added a simple test to the GitHub repository to show how this function can be used. To do this you would need to read the data from standard input into a (possibly very large) string and then, once you think you have all the data, pass the string to parseString. I think an alternative approach, where each line from standard input is pushed into a parser, could be made to work given the current design of the SAX and DOM bits in FoX, but this would need a new subroutine in FoX_DOM (and modifications to the SAX API) to do the driving.
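
A minimal sketch of the read-everything-then-parse approach is shown below. It is not the test from the repository; the program name and the 1024-character chunk size are arbitrary choices, and it relies on deferred-length character variables (Fortran 2003), so a reasonably recent compiler is assumed.

```fortran
program stdin_to_dom
  use, intrinsic :: iso_fortran_env, only: input_unit
  use FoX_dom
  implicit none

  type(Node), pointer           :: doc
  character(len=:), allocatable :: xml
  character(len=1024)           :: chunk   ! arbitrary buffer size
  integer :: ios, nread

  ! Accumulate all of standard input into one string.
  xml = ''
  do
    read(input_unit, '(a)', advance='no', iostat=ios, size=nread) chunk
    if (is_iostat_end(ios)) exit                      ! no more input
    if (ios /= 0 .and. .not. is_iostat_eor(ios)) then
      error stop 'read error on standard input'
    end if
    xml = xml // chunk(1:nread)
    ! End of record means a full line was consumed: restore the newline.
    if (is_iostat_eor(ios)) xml = xml // new_line('a')
  end do

  ! Hand the complete document to the DOM parser.
  doc => parseString(xml)

  ! ... work with doc here ...

  call destroy(doc)
end program stdin_to_dom
```

It could then be run as ./stdin_to_dom <input.xml, with the whole document held in memory at once.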

Output is more difficult. WXML only knows about writing to files, not to arbitrary unit numbers for already opened files (which is how standard input and output show up in Fortran). I see two options. You could write the XML to a temporary file, then read it back in and send each line out on standard output. Web-service security people probably wouldn't like this, but I could imagine ways of making it relatively safe. Alternatively, I think it should be possible to create an alternative to xml_OpenFile that takes a unit number (usually 6 for standard output, I think) and returns a fully operating xmlf_t file handle. There are a few edge cases to think about, but at the end of the day (most of) the rest of wxml just ends up writing to a unit number stored in this derived type, so this should then just work, with the XML document coming out on standard output.
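
The first option (temporary file, then copy to standard output) could be sketched as follows. The file name tmp_output.xml and the tiny document written here are placeholders, and a real web service would want a unique, safely created temporary file rather than a fixed name.

```fortran
program wxml_to_stdout
  use, intrinsic :: iso_fortran_env, only: output_unit
  use FoX_wxml
  implicit none

  type(xmlf_t)        :: xf
  character(len=1024) :: chunk            ! arbitrary buffer size
  integer :: u, ios, nread
  character(len=*), parameter :: tmpname = 'tmp_output.xml'  ! placeholder name

  ! Write the document to a temporary file with wxml as usual.
  call xml_OpenFile(tmpname, xf)
  call xml_NewElement(xf, 'root')         ! placeholder content
  call xml_AddCharacters(xf, 'hello')
  call xml_EndElement(xf, 'root')
  call xml_Close(xf)

  ! Copy the temporary file to standard output, chunk by chunk.
  open(newunit=u, file=tmpname, status='old', action='read')
  do
    read(u, '(a)', advance='no', iostat=ios, size=nread) chunk
    if (is_iostat_end(ios)) exit
    if (ios /= 0 .and. .not. is_iostat_eor(ios)) then
      error stop 'read error on temporary file'
    end if
    write(output_unit, '(a)', advance='no') chunk(1:nread)
    ! End of record: finish the current output line.
    if (is_iostat_eor(ios)) write(output_unit, '(a)') ''
  end do

  ! Remove the temporary file once it has been copied.
  close(u, status='delete')
end program wxml_to_stdout
```

The copy loop uses non-advancing reads so that long lines are not truncated at the buffer size.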

I hope this helps,

Andrew





Ed Zaron

Jul 7, 2013, 8:56:04 PM
to fox-d...@googlegroups.com
Hi Anna,

Fancy meeting you here, eh?

My non-FoX approach to solving this problem is to wrap the Fortran
program in a script that handles pipes and stdin better. The script can
pipe the I/O into and out of the files as needed. I tend to use Tcl for
this sort of application. The syntax is not difficult, and it is more
than sufficient to parse some of the Fortran output to handle exceptions
and stuff.

All the best,

Ed
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Edward D. Zaron
Research Assistant Professor
Department of Civil and Environmental Engineering
Portland State University
Portland, OR 97207-0751
Phone: (503)-725-2435
FAX: (503)-725-5950
za...@cecs.pdx.edu
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
