
extracting part of xml


puzzlecracker

Feb 16, 2006, 12:03:34 AM
Let's say I have the following XML file:
<info>
<item>
............
</item>

<item>
............
</item>

<item>
............
</item>

</info>

I want to extract each <item> in its entirety; in the example above, I
want to create 3 files, each containing just
<item>
............
</item>

I tried using XPath, but it didn't help; I'm not sure how to read out
the actual tags.

Thanks.

Jean-Francois Briere

Feb 16, 2006, 2:25:59 AM
This is how to retrieve the nodes:

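// needs: import javax.xml.xpath.*; import org.w3c.dom.NodeList;
// import org.xml.sax.InputSource;
String xpathExpr = "/info/item";
String inputFilename = "yourFile.xml";
XPath xpath = XPathFactory.newInstance().newXPath();
InputSource inputSource = new InputSource(inputFilename);
NodeList nodes = (NodeList)xpath.evaluate(xpathExpr, inputSource,
XPathConstants.NODESET);
// nodes.getLength() is the number of <item> elements found and
// nodes.item(i) is the i-th one (an org.w3c.dom.Element)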
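Getting the NodeList is only half of the question: XPath hands back DOM nodes, not markup, so each node still has to be serialized to get "the actual tags" into a file. A minimal sketch of one way to do that with only the JDK, using an identity javax.xml.transform.Transformer; the class ItemSplitter, the method split and the item1.xml, item2.xml, ... file names are illustrative assumptions, not part of the original post, and the syntax assumes Java 7 or later:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ItemSplitter {

    // Hypothetical helper: writes every node matched by xpathExpr to
    // outDir/item1.xml, outDir/item2.xml, ... and returns the files written.
    public static List<File> split(InputSource input, String xpathExpr, File outDir)
            throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(xpathExpr, input, XPathConstants.NODESET);

        // An identity transformer (no stylesheet) copies its input unchanged
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");

        List<File> written = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            File file = new File(outDir, "item" + (i + 1) + ".xml");
            // Writing to a byte stream lets the transformer encode the text
            // in the same encoding it puts in the XML declaration
            try (OutputStream stream = new FileOutputStream(file)) {
                // A DOMSource on an element serializes that element and all of its content
                transformer.transform(new DOMSource(nodes.item(i)), new StreamResult(stream));
            }
            written.add(file);
        }
        return written;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<info><item>hello</item><item>world</item><item>!</item></info>";
        File outDir = new File(".");
        for (File f : split(new InputSource(new StringReader(xml)), "/info/item", outDir)) {
            System.out.println("wrote " + f.getPath());
        }
    }
}
```

Each output file holds one <item> element with everything inside it, preceded by an XML declaration. Writing through a byte stream rather than a FileWriter keeps the bytes consistent with the encoding that declaration names.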

Regards

Denis

Feb 16, 2006, 2:38:10 AM
Have you tried Jaxen (http://jaxen.org/)?

DM

Jean-Paul

Feb 16, 2006, 2:40:56 AM
- Download the JDOM library from http://www.jdom.org.

- Import the library into your project / favorite IDE.

Given an XML file called items.xml with the following contents:
<?xml version="1.0"?>
<info>
<item>hello</item>
<item>world</item>
<item>!</item>
</info>

We will produce 3 files, named item1.xml, item2.xml and item3.xml,
with the following piece of code using the JDOM library:

import org.jdom.*;
import org.jdom.input.*;
import org.jdom.output.*;
import java.io.*;
import java.util.*;

public class XMLItemManipulator {

private List<Element> items;

public XMLItemManipulator() {
items = null;
}

public void readItems(File xmlFile) throws IOException, JDOMException {

// make sure the file exists and can be read
if(!xmlFile.exists())
throw new FileNotFoundException("cannot find the xml file");

if(!xmlFile.canRead())
throw new IOException("file exists but does not have *read* permission");

// now that we have made sure we got the file, get the objects
// necessary to read it and create an XML doc out of it.
// A parse error propagates as a JDOMException; catching and ignoring
// it would leave doc null and cause a NullPointerException below.
SAXBuilder builder = new SAXBuilder();
Document doc = builder.build(xmlFile);

// get the root element, in your case this would be <info>
Element root = doc.getRootElement();

// get the list of children of the root element
// which have the "item" tag,
// meaning that even if you had other tags that
// were children of the root, we really wouldn't care;
// handy for a heterogeneous xml file containing more
// than the "item" elements. Note that only *direct*
// children of the root are returned.
items = root.getChildren("item");
}


// now that you got the items you might want to manipulate them
// it depends on what you wanna do with them while they're in
// memory. I recommend you have a look at the JDOM doc for more info.
public void manipulateItems() {
// put some code here
}


// once you have manipulated them (or right after reading them),
// you can write each one to its own file.
// To do this, it's very simple.
public void writeItems() throws IOException {
XMLOutputter out = new XMLOutputter();
int size = items.size();

for(int counter = 0; counter < size; counter++) {
// clone the whole <item> (attributes included); the clone has
// no parent, so it can become the root of a new document
Element root = (Element) items.get(counter).clone();
Document doc = new Document(root);

// number the files from 1: item1.xml, item2.xml, item3.xml
File file = new File("item" + (counter + 1) + ".xml");

// write bytes through an OutputStream so XMLOutputter encodes
// them in the encoding it declares (UTF-8 by default); a
// FileWriter would use the platform default encoding instead
OutputStream stream = new FileOutputStream(file);
try {
out.output(doc, stream);
} finally {
stream.close(); // close every file, not just the last one
}
out.output(doc, System.out);
}
}


// testing all of this with a main method (normally you'd write
// a full test case to do this, but that's your decision)
public static void main(String[] args) {
XMLItemManipulator manip = new XMLItemManipulator();
File file = new File("items.xml");

try {
manip.readItems(file);
manip.manipulateItems(); // this is optional
manip.writeItems();

} catch(Exception e) {
e.printStackTrace();
}
}
}

There you go. Let us know how it goes.

Regards,

Jean-Paul H.

ab2...@gmail.com

Feb 16, 2006, 9:25:04 PM

Thanks, but it didn't work.

The items aren't directly under the root tag; they are scattered
throughout the doc...


<Info>

<Item>
..............
</Item>

etc
</Info>

Any suggestions?

Jean-Paul

Feb 17, 2006, 4:55:38 AM
Even so, you should be able to modify the code to make it work. This
code gives you the basics; from here, with the documentation of the
JDOM library, you should be able to find a solution on your own. Also
try reading up on how to properly manipulate XML with Java.

ab2...@gmail.com

Feb 19, 2006, 12:42:47 AM
I am wary about using JDOM in commercial software. Is it possible to
achieve the same with the standard tools that are part of 1.5?

Thanks.

James McGill

Feb 19, 2006, 2:12:09 AM
On Sat, 2006-02-18 at 21:42 -0800, ab2...@gmail.com wrote:
> I am wary about using JDOM in commercial software. Is it possible to
> achieve the same with the standard tools that are part of 1.5?

What are you trying to do? (I missed the thread.)

The JDK has reference implementations of DOM and SAX, all in JAXP, which
shares ancestry with Xerces. I prefer DOM4J, but I can't give you an
intelligent rationale other than "it's always worked well when I've
used it".

Since 1.5, it seems like it should be unnecessary to use anything
additional for XML processing, unless you need a particular
implementation for performance or compatibility reasons. But I must
admit I didn't see the original question, and I might be being naive.
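James's point also covers the follow-up about <Item> elements scattered through the document rather than sitting directly under the root: the JDK's own JAXP classes can handle it. A minimal sketch, assuming the hypothetical names ScatteredItems and extract: DOM's getElementsByTagName searches every descendant in document order (unlike JDOM's getChildren, which looks only at direct children), and importNode deep-copies each match into a fresh Document of its own. The classes used ship with the JDK since 5.0. Element names are case-sensitive, so the follow-up's <Item> needs "Item", not "item".

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ScatteredItems {

    // Hypothetical helper: returns one standalone Document per element named
    // tagName, wherever that element sits in the source tree.
    public static List<Document> extract(InputSource input, String tagName) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document source = builder.parse(input);

        // Searches all descendants of the document, not just the root's children
        NodeList matches = source.getElementsByTagName(tagName);

        List<Document> result = new ArrayList<Document>();
        for (int i = 0; i < matches.getLength(); i++) {
            Document single = builder.newDocument();
            // importNode(node, true) copies the element with its attributes and children
            single.appendChild(single.importNode(matches.item(i), true));
            result.add(single);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Info><Item>a</Item><section><Item>b</Item></section>"
                + "<x><y><Item>c</Item></y></x></Info>";
        List<Document> docs = extract(new InputSource(new StringReader(xml)), "Item");
        for (Document d : docs) {
            System.out.println(d.getDocumentElement().getTextContent());
        }
    }
}
```

Each resulting Document can then be written out with an identity javax.xml.transform.Transformer. If <Item> elements can nest inside each other, an inner one is returned both on its own and inside its parent's copy. With XPath, the equivalent descendant search is an expression such as //Item.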

ab2...@gmail.com

Feb 22, 2006, 9:45:18 PM
After I do the extraction, I save the items to files. However, when I
read them back one item at a time (using the SAX parser provided by
Eclipse), I occasionally get the following exception for some of them.
Can someone point out the problem? Thanks.

org.xml.sax.SAXParseException: Invalid byte 2 of 3-byte UTF-8 sequence.
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
at javax.xml.parsers.SAXParser.parse(Unknown Source)
at javax.xml.parsers.SAXParser.parse(Unknown Source)
at com.touchgraph.amazoncache.io.AmazonParser.parse(AmazonParser.java:33)
at com.touchgraph.amazoncache.io.AmazonCacheReader.readCache(AmazonCacheReader.java:35)
at com.touchgraph.amazoncache.io.AmazonCacheStore.getBooksFromCache(AmazonCacheStore.java:185)
at com.touchgraph.amazoncache.io.AmazonCacheStore.loadSimilarFromCache(AmazonCacheStore.java:131)
at com.touchgraph.amazoncache.io.AmazonCacheStore.getSimilarBooks(AmazonCacheStore.java:44)
at com.touchgraph.amazoncache.io.AmazonDataModel.addSimilarBooks(AmazonDataModel.java:70)
at com.touchgraph.amazoncache.io.AmazonCacheFrame$1.actionPerformed(AmazonCacheFrame.java:85)
at javax.swing.AbstractButton.fireActionPerformed(Unknown Source)
at javax.swing.AbstractButton$Handler.actionPerformed(Unknown Source)
at javax.swing.DefaultButtonModel.fireActionPerformed(Unknown Source)
at javax.swing.DefaultButtonModel.setPressed(Unknown Source)
at javax.swing.plaf.basic.BasicButtonListener.mouseReleased(Unknown Source)
at java.awt.Component.processMouseEvent(Unknown Source)
at javax.swing.JComponent.processMouseEvent(Unknown Source)
at java.awt.Component.processEvent(Unknown Source)
at java.awt.Container.processEvent(Unknown Source)
at java.awt.Component.dispatchEventImpl(Unknown Source)
at java.awt.Container.dispatchEventImpl(Unknown Source)
at java.awt.Component.dispatchEvent(Unknown Source)
at java.awt.LightweightDispatcher.retargetMouseEvent(Unknown Source)
at java.awt.LightweightDispatcher.processMouseEvent(Unknown Source)
at java.awt.LightweightDispatcher.dispatchEvent(Unknown Source)
at java.awt.Container.dispatchEventImpl(Unknown Source)
at java.awt.Window.dispatchEventImpl(Unknown Source)
at java.awt.Component.dispatchEvent(Unknown Source)
at java.awt.EventQueue.dispatchEvent(Unknown Source)
at java.awt.EventDispatchThread.pumpOneEventForHierarchy(Unknown Source)
at java.awt.EventDispatchThread.pumpEventsForHierarchy(Unknown Source)
at java.awt.EventDispatchThread.pumpEvents(Unknown Source)
at java.awt.EventDispatchThread.pumpEvents(Unknown Source)
at java.awt.EventDispatchThread.run(Unknown Source)
null

Jean-Paul

Feb 25, 2006, 7:07:03 AM
It seems that there is a problem with the way you're reading the files
back into your system. Can you show us some code?

puzzlecracker

Feb 25, 2006, 10:52:42 AM

Jean-Paul wrote:
> It seems that there is a problem with the way you're reading the files
> back into your system. Can you show us some code?
I already solved it. All I needed to do was write the files with a
different encoding.

Thanks.
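For anyone else who hits "Invalid byte 2 of 3-byte UTF-8 sequence": a likely cause (an assumption, since the code that wrote the files isn't shown) is a file whose XML declaration says UTF-8 while its bytes were written in another charset, for example by a FileWriter, which uses the platform default encoding. In ISO-8859-1, é is the single byte 0xE9; a UTF-8 decoder reads 0xE9 as the start of a 3-byte sequence and rejects the next byte when it is not a continuation byte. The sketch below (hypothetical class EncodingMismatch; it assumes Java 7 or later for StandardCharsets and multi-catch) reproduces the failure and shows that the same text encoded as UTF-8 parses:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class EncodingMismatch {

    // Always declares UTF-8 in the prolog, but encodes the text with the given charset
    public static byte[] encode(String body, Charset charset) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><item>" + body + "</item>";
        return xml.getBytes(charset);
    }

    // Returns true if the bytes parse as well-formed XML
    public static boolean parses(byte[] bytes) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors instead of printing them to stderr
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new ByteArrayInputStream(bytes));
            return true;
        } catch (SAXException | IOException e) {
            // bytes that are not valid UTF-8 end up here
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String body = "caf\u00e9"; // "café"
        System.out.println("ISO-8859-1 bytes parse: "
                + parses(encode(body, StandardCharsets.ISO_8859_1)));
        System.out.println("UTF-8 bytes parse: "
                + parses(encode(body, StandardCharsets.UTF_8)));
    }
}
```

On the writing side, the fix is to make the bytes match the declaration, e.g. by writing through an OutputStream (which both XMLOutputter and Transformer accept) or through an OutputStreamWriter created with an explicit UTF-8 charset, rather than through a FileWriter.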
