>     TTransport transport = new TFileTransport(new TStandardFile(file), true);
>     transport.open();
>     TProtocol protocol = new TBinaryProtocol(transport);
>     while (true) {
>         StreamItem doc = new StreamItem();
>         doc.read(protocol);
>         p.process(doc);
>     }
The issue here is simply buffering. Wrapping the file in a
BufferedInputStream and handing that to a TIOStreamTransport works:
    public static void parse(String filename) throws Exception
    {
        // Wrap the file in a buffered stream before giving it to thrift.
        FileInputStream fileInputStream = new FileInputStream(filename);
        BufferedInputStream bufferedInputStream = new BufferedInputStream(fileInputStream);
        TTransport transport = new TIOStreamTransport(bufferedInputStream);
        transport.open();
        try {
            TProtocol protocol = new TBinaryProtocol(transport);
            while (true) {
                StreamItem doc = new StreamItem();
                try {
                    doc.read(protocol);
                } catch (TTransportException e) {
                    if (e.getType() == TTransportException.END_OF_FILE)
                    {
                        break;  // normal end of the file
                    }
                    throw e;  // any other transport error is a real failure
                }
                System.out.println( "stream_id: " + doc.stream_id );
            }
        } finally {
            transport.close();  // also closes the underlying stream
        }
    }
If anyone would like a complete maven project that pulls down all the
dependencies, please contact me off list.
Related point of clarification: there are TWO kinds of things called
'thrift'.
1) The thrift compiler takes a text file containing thrift struct
definitions and creates a set of custom client classes for that specific
set of structs.
For example, the KBA structs are defined in this text file:
http://trec-kba.org/schemas/v1.0/kba.thrift
and you can construct the corresponding client classes for java or python
by running:
    thrift -r --gen py kba.thrift
    thrift -r --gen java kba.thrift
2) The other thing called 'thrift' is the framework library available in
each language. You need to import generic thrift components from this
library in order to use the custom client classes that were generated by
the thrift compiler. For example:
In python, you import framework components from the 'thrift' module, which
is different from the 'thrift' compiler on the command line:
    # import thrift framework components
    from thrift import Thrift
    from thrift.transport import TTransport
    from thrift.protocol import TBinaryProtocol

    # import the class generated by the thrift compiler from kba.thrift
    from kba_thrift.ttypes import StreamItem
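For the Python side, here is a minimal sketch of the same
read-until-end-of-file loop as the Java version. It assumes that
TTransport.TFileObjectTransport is available in your thrift release and
that reading past the end of the file raises EOFError; both are worth
checking against the version you installed. The names iter_stream_items
and open_binary_protocol are hypothetical helpers for illustration, not
part of thrift or the KBA code.

```python
def iter_stream_items(protocol, item_factory):
    """Yield items read from `protocol` until the transport reports end of file.

    `item_factory` is any zero-argument callable returning an object with a
    read(protocol) method, e.g. the generated StreamItem class.
    """
    while True:
        item = item_factory()
        try:
            item.read(protocol)
        except EOFError:
            # Assumption: the Python transport signals end of file this way.
            return
        yield item


def open_binary_protocol(path):
    """Open `path` and wrap it in a thrift binary protocol (hypothetical helper)."""
    # Imported here so the loop above can be exercised without thrift installed.
    from thrift.transport import TTransport
    from thrift.protocol import TBinaryProtocol

    # Python file objects are already buffered, so no extra wrapper is added.
    transport = TTransport.TFileObjectTransport(open(path, "rb"))
    return TBinaryProtocol.TBinaryProtocol(transport)
```

With the imports shown above in place, you would loop over
iter_stream_items(open_binary_protocol(filename), StreamItem) and print
doc.stream_id for each item, just as the Java version does.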
In Java the imports are more verbose, which makes it clearer that
org.apache.thrift is not the command-line 'thrift' compiler:
    // import thrift framework components
    import org.apache.thrift.transport.TTransport;
    import org.apache.thrift.protocol.TProtocol;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TIOStreamTransport;
    import org.apache.thrift.transport.TTransportException;

    // you probably also want:
    import java.io.FileInputStream;
    import java.io.BufferedInputStream;

    // import the class generated by the thrift compiler from kba.thrift
    import kba.StreamItem;
If you are wondering how to get thrift, the most common way is to
download it from http://thrift.apache.org and follow these steps:
    http://wiki.apache.org/thrift/ThriftInstallation
(Note that ./bootstrap.sh is not present in the distribution, even though
it is listed on that web page.)
If you are only using python, a shortcut is to do this:
    wget http://pypi.python.org/packages/source/t/thrift/thrift-0.8.0.tar.gz
    tar xzf thrift-0.8.0.tar.gz
    cd thrift-0.8.0
    python setup.py build
    ...
    cp -r build/lib.linux-i686-2.6/thrift/ ../your-working-directory/thrift
This constructs the 'thrift' python module, which you can treat as a
locally importable package instead of installing it system-wide. That is
useful if you do not have root on your system. (The build/lib.* directory
name depends on your platform and python version, so yours may differ.)
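If your script does not live in the directory that holds the copied
'thrift' package, one option is to add that directory to sys.path before
importing. The following is only a sketch under that assumption: it
targets Python 3 (importlib.util.find_spec is not available in Python
2.6), and use_local_packages and is_importable are hypothetical helper
names.

```python
import importlib.util
import os
import sys


def use_local_packages(directory):
    """Put `directory` at the front of sys.path so packages copied there import."""
    directory = os.path.abspath(directory)
    if directory not in sys.path:
        sys.path.insert(0, directory)
    # Make sure the import system notices packages created after startup.
    importlib.invalidate_caches()
    return directory


def is_importable(module_name):
    """Return True if a top-level module or package can be found on sys.path."""
    return importlib.util.find_spec(module_name) is not None
```

For example, calling use_local_packages("../your-working-directory") and
then is_importable("thrift") tells you whether the copied package will be
picked up. If the script already sits next to the thrift/ directory, none
of this is needed, because Python searches the script's own directory
first.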
Don't hesitate to reach out --- we'll help you get up and running.
jrf