
xmlread then xmlwrite creates extra white space


Kevin Dufendach

Feb 27, 2009, 11:57:24 PM

Using xmlread and then xmlwrite on a simple XML file results in a new
file with extra carriage returns/line breaks/whitespace. Does anyone
know how to suppress this? The answer may have something to do with
the xml:space attribute (per http://www.w3.org/TR/xml/#sec-white-space),
but I don't see that attribute in my docNode object.

Example to reproduce:

docNode = xmlread('FileName.xml')
xmlwrite('FileName.xml', docNode)

Kevin Dufendach

Feb 28, 2009, 1:26:48 PM

Example to clarify:

-----------------------------------
% Set tempFileName
tempFileName = [tempname, '.xml'];

% Create XML document
docNode = com.mathworks.xml.XMLUtils.createDocument('root_element');
docRootNode = docNode.getDocumentElement;
myNode = docNode.createElement('myNode');
docRootNode.appendChild(myNode);
xmlwrite(tempFileName, docNode); edit(tempFileName);

%re-load tempFileName, then re-save it.
docNode = xmlread(tempFileName);
xmlwrite(tempFileName, docNode); edit(tempFileName);
-----------------------------------

First, the XML file produced is:
-----------------------------------
<?xml version="1.0" encoding="utf-8"?>
<root_element>
<myNode/>
</root_element>
-----------------------------------

Then, when it is read back in and written again, the XML produced
is:
-----------------------------------
<?xml version="1.0" encoding="utf-8"?>
<root_element>

<myNode/>

</root_element>
-----------------------------------

Thanks for the help,
Kevin
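One way to see where the blank lines come from is to list the children of the root element after the file has been re-read. The block below is a minimal diagnostic sketch, not code from this thread; it assumes the extra lines stem from whitespace-only text nodes that xmlread keeps from the file's indentation, and the variable names (root, kids, child) are illustrative. If that assumption holds, text nodes containing newline (10) and space (32) characters should appear on either side of myNode.

```matlab
% Hypothetical diagnostic: write a small document, read it back,
% and list the child nodes of the root element.
tempFileName = [tempname, '.xml'];
docNode = com.mathworks.xml.XMLUtils.createDocument('root_element');
docRootNode = docNode.getDocumentElement;
docRootNode.appendChild(docNode.createElement('myNode'));
xmlwrite(tempFileName, docNode);

docNode = xmlread(tempFileName);
root = docNode.getDocumentElement;
kids = root.getChildNodes;
for k = 0:kids.getLength-1
    child = kids.item(k);
    if child.getNodeType == 3   % 3 = TEXT_NODE in the W3C DOM
        % print the text node's content as character codes so whitespace is visible
        fprintf('%d: #text, character codes [%s]\n', k, num2str(double(char(child.getData))));
    else
        fprintf('%d: %s\n', k, char(child.getNodeName));
    end
end
delete(tempFileName);   % clean up the temporary file
```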

Some One

Jul 16, 2009, 11:05:20 AM

Kevin Dufendach <krd.p...@gmail.com> wrote in message <31bb4ddc-33da-404a...@k9g2000prh.googlegroups.com>...

Hello Kevin,
I have also encountered this issue and wasn't able to find a fix, or a reason why it happens.
However, from several other posts I did put together a workaround that does the trick for me: it writes the document object to a string and then to a file, with an intermediate step that fixes an encoding issue.
I summarized the steps in a function you can use in place of the standard MATLAB xmlwrite function.
Hope this helps (if still relevant... :))

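function ST = myXMLwrite(fileName, docNode)
docNodeRoot = docNode.getDocumentElement;
% saveXML (a Xerces document method) returns a Java string whose XML
% declaration says UTF-16; relabel it as UTF-8 to match the file written below
str = strrep(char(docNode.saveXML(docNodeRoot)), 'encoding="UTF-16"', 'encoding="UTF-8"');
fid = fopen(fileName, 'w', 'n', 'UTF-8');
if fid == -1
    error('Cannot open %s for writing.', fileName);
end
fprintf(fid, '%s', str);   % writes the characters using the UTF-8 encoding set in fopen
ST = fclose(fid);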

Save the above function to myXMLwrite.m,
then use it as follows:
docNode = xmlread('somefile.xml');
% ... do XML stuff ...
myXMLwrite('somefile.xml', docNode);

Alex

Mar 18, 2011, 6:53:04 PM

Kevin Dufendach <krd.p...@gmail.com> wrote in message <9abe43ee-73cc-4294...@j12g2000vbl.googlegroups.com>...

I searched and searched to no avail. This is what I currently use:


fileNode = xmlread(xmlFileName);
rootNode = cleanXML(fileNode.getDocumentElement);


function node = cleanXML(node)
% removes the whitespace nodes from matlab xmlread
for x = 1:node.getLength
    child = node.item(x-1);
    if isa(child, 'org.apache.xerces.dom.DeferredTextImpl')
        node.removeChild(child);
        % element removed, length is no longer valid -- recurse
        cleanXML(node);
        break;
    else
        cleanXML(child);
    end
end
end

Alex

Mar 18, 2011, 6:55:05 PM

rootnode = cleanXML(docNode.getDocumentElement)
xmlwrite('FileName.xml', docNode)

Witold

Apr 12, 2011, 9:55:08 AM

I added a new function that post-processes the string returned by xmlwrite:

function XMLstring = xmlwrite_spec(varargin)
% xmlwrite_spec(DOMnode)           returns the cleaned XML string
% xmlwrite_spec(FileName, DOMnode) also writes it to FileName

% treat input
if nargin == 1
    writeFlag = false;
    FileName = '';
    DOMnode = varargin{1};
elseif nargin == 2
    writeFlag = true;
    FileName = varargin{1};
    DOMnode = varargin{2};
else
    error('Wrong number of arguments')
end

% treat output of xmlwrite: strip spaces before each newline and
% collapse doubled newlines, repeating until the string stops shrinking
XMLstring = xmlwrite(DOMnode);
while true
    length_Old = length(XMLstring);
    XMLstring = strrep(XMLstring, [char(32) char(10)], char(10));
    XMLstring = strrep(XMLstring, [char(10) char(10)], char(10));
    if length_Old == length(XMLstring)
        break
    end
end

% if demanded, write result to file
if writeFlag
    FID = fopen(FileName, 'w');
    if FID == -1
        error('Cannot open %s for writing.', FileName)
    end
    fwrite(FID, XMLstring);
    fclose(FID);
end
end
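The same kind of clean-up (spaces and tabs before a line break removed, blank lines collapsed) can also be written with regexprep instead of a strrep loop. This is a minimal sketch: the function name collapseBlankLines is hypothetical, and it assumes '\n' line endings, so output with '\r\n' endings would need the patterns adjusted. It acts on the whole string, so blank lines inside text content or CDATA sections would be collapsed as well.

```matlab
function out = collapseBlankLines(str)
% Hypothetical helper: clean up the string returned by xmlwrite.
% Assumes '\n' line endings.
% 1) drop spaces/tabs that sit right before a newline
out = regexprep(str, '[ \t]+\n', '\n');
% 2) replace any run of two or more newlines with a single newline
out = regexprep(out, '\n{2,}', '\n');
end
```

Typical use would be XMLstring = collapseBlankLines(xmlwrite(docNode)), followed by writing XMLstring to a file as in the function above.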

Adam

Jun 6, 2011, 12:02:04 PM

Hi Alex,
Your code almost works as advertised; however, I found that
if(isa(child,'org.apache.xerces.dom.DeferredTextImpl'))
node.removeChild(child);
also removes actual text content.
E.g. <test>this is a test </test>
becomes
<test/>
after running through cleanXML.

I can't think of any other way to identify the pesky whitespace.
The whitespace nodes don't seem to have a consistent length or any other uniquely identifiable property.

Any ideas?
Adam

"Alex " <kimm...@uregina.ca> wrote in message <im0ns9$dlk$1...@fred.mathworks.com>...
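One possible answer to the question above, sketched under assumptions and not taken from the thread: instead of removing every text node, check the node type (TEXT_NODE is 3 in the W3C DOM) and remove a text node only when its content is empty after strtrim. The function name removeWhitespaceNodes is hypothetical. Walking the children backwards means a removal never shifts a child that has not been visited yet, so no restart is needed.

```matlab
function node = removeWhitespaceNodes(node)
% Hypothetical helper: recursively remove text nodes that contain only
% whitespace, leaving text nodes with real content untouched.
TEXT_NODE = 3;                        % org.w3c.dom.Node.TEXT_NODE
children = node.getChildNodes;        % live DOM NodeList
% iterate from the last child to the first so removals do not shift
% the indices of children that are still to be visited
for k = children.getLength-1:-1:0
    child = children.item(k);
    if child.getNodeType == TEXT_NODE
        % strtrim strips leading/trailing whitespace; empty means whitespace only
        if isempty(strtrim(char(child.getData)))
            node.removeChild(child);
        end
    elseif child.hasChildNodes
        removeWhitespaceNodes(child); % descend into elements
    end
end
end
```

Usage would mirror Alex's post: call removeWhitespaceNodes(docNode.getDocumentElement) between xmlread and xmlwrite. Whitespace-only text is sometimes meaningful (for example the space in <p><b>a</b> <i>b</i></p>, or content under xml:space="preserve"), and this sketch removes it regardless.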

David

Dec 7, 2011, 1:27:08 AM

This works, I believe:

docNode = xmlread('FileName.xml');
docStr = xmlwrite(docNode);                    % XML as a string
docStr = regexprep(docStr, '\x0A\x0A', '\n');  % collapse each doubled newline into one
openDoc = fopen('FileName.xml', 'w');
fprintf(openDoc, '%s\n', docStr);
fclose(openDoc);

Kevin Moerman

Jan 31, 2012, 7:46:10 AM

There seem to be not just extra lines but also tabs. This removed them for me:

XML_string = xmlwrite(XDOC); %XML as string
XML_string = regexprep(XML_string, '\n\t*\n', '\n'); %removes tabs and extra lines

%Write to file
fid = fopen(save_name, 'w');
fprintf(fid, '%s\n', XML_string);
fclose(fid);


Kevin

Kevin Moerman

Jan 31, 2012, 7:49:09 AM

Kevin Moerman

Jan 31, 2012, 8:11:09 AM

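function write_XML_no_extra_lines(save_name, XDOC)
XML_string = xmlwrite(XDOC); %XML as string
XML_string = regexprep(XML_string, '\n[ \t\n]*\n', '\n'); %removes extra tabs, spaces and extra lines
fid = fopen(save_name, 'w'); %write to file, as in the previous post
fprintf(fid, '%s\n', XML_string);
fclose(fid);
end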