Running Lucene on Local Disk/NFS on Alluxio


Saurabh Sharma

Nov 21, 2017, 11:13:10 AM
to Alluxio Developers
Hi Team,

We are currently migrating a data science project from web services to the big data space. Because of the project's underlying architecture, we want to use our Lucene indexes as-is rather than using Elasticsearch/Solr from Spark in Hadoop (we want one code base that covers both big data and web services). I was thinking of using Alluxio as our underlying storage layer, which can provide an abstraction over the storage space and give access to the Lucene indexes on both HDFS and local disk (for web services). That way the wrapper code can stay more or less the same, with access to the index information.

Now for HDFS things work well. We are able to transparently plug in Alluxio without changing the HDFS code.

But for the Java native file system, I see that the Java file client only gives access to a file rather than a directory.

Is there a way the Alluxio file client can give access to a directory location when called from Java code like this?

MMapDirectory directory = new MMapDirectory(Paths.get(indexFolder));

Bin Fan

Nov 21, 2017, 11:23:13 AM
to Saurabh Sharma, Alluxio Developers
"But for the Java native file system, I see that the Java file client only gives access to a file rather than a directory."

Can you elaborate? What do you mean by "gives access to a file", and what do you need for a directory?
Just FYI, Alluxio has its own filesystem API, and methods like listStatus can list all subdirectories inside a given directory.
Is this what you want?
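For reference, a minimal sketch of listing a directory through Alluxio's native Java client (this assumes the Alluxio client jar on the classpath and a running 1.x cluster; the index path is hypothetical):

```java
import alluxio.AlluxioURI;
import alluxio.client.file.FileSystem;
import alluxio.client.file.URIStatus;

public class ListIndexDir {
    public static void main(String[] args) throws Exception {
        // The client picks up master address etc. from alluxio-site.properties.
        FileSystem fs = FileSystem.Factory.get();
        // listStatus returns one URIStatus per child of the directory.
        for (URIStatus status : fs.listStatus(new AlluxioURI("/lucene/index"))) {
            System.out.println(status.getPath());
        }
    }
}
```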


--
You received this message because you are subscribed to the Google Groups "Alluxio Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to alluxio-dev+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
- Bin Fan

Software Engineer
Alluxio Inc

Saurabh Sharma

Nov 21, 2017, 12:22:51 PM
to Alluxio Developers
Lucene by default works with the Directory or java.nio.file.Path abstractions. It reads the segments file and the other index files through the Path abstraction.

I guess I am looking for a java.nio.file.Path-compliant wrapper in Alluxio, the way HDFS is wrapped by the system.

I was able to use Alluxio as a drop-in replacement for HDFS without changing a single line of code here:

HdfsDirectory mergedIndex =  new HdfsDirectory(new org.apache.hadoop.fs.Path(indexFolder), conf);
IndexReader reader = DirectoryReader.open(mergedIndex);

Here indexFolder was an alluxio://serverlink:port URL.
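For anyone reproducing this: the drop-in behavior relies on registering Alluxio's Hadoop-compatible client for the alluxio:// scheme, typically in core-site.xml (a sketch per the Alluxio docs of that era; the Alluxio client jar must also be on the classpath):

```xml
<property>
  <name>fs.alluxio.impl</name>
  <value>alluxio.hadoop.FileSystem</value>
</property>
```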

The same does not work with the local disk:
MMapDirectory directory = new MMapDirectory(Paths.get(indexFolder));
IndexReader reader = DirectoryReader.open(directory);

I could theoretically write my own wrapper over the base Alluxio filesystem (one that mimics the java.nio.file.Path functionality), but I wanted to see if such a wrapper already exists.
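As a point of comparison, the standard way such a wrapper would plug into java.nio is the FileSystemProvider SPI; the JDK's built-in zip provider demonstrates the mechanism a hypothetical Alluxio NIO wrapper would use. A self-contained sketch (Java 9+, stdlib only):

```java
import java.net.URI;
import java.nio.file.DirectoryStream;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class NioProviderDemo {
    // Build a throwaway zip through the JDK's zip FileSystemProvider and
    // list its entries with the same java.nio.file calls Lucene relies on.
    public static List<String> demo() {
        try {
            Path zip = Files.createTempFile("index", ".zip");
            Files.delete(zip); // the zip provider creates the archive itself
            URI uri = URI.create("jar:" + zip.toUri());
            List<String> names = new ArrayList<>();
            try (FileSystem fs = FileSystems.newFileSystem(uri, Map.of("create", "true"))) {
                // Write one file into the archive via the Path abstraction.
                Files.write(fs.getPath("/segments_1"), new byte[] {1, 2, 3});
                // List the root directory, again purely through java.nio.file.
                try (DirectoryStream<Path> ds = Files.newDirectoryStream(fs.getPath("/"))) {
                    for (Path p : ds) {
                        names.add(p.getFileName().toString());
                    }
                }
            }
            Files.deleteIfExists(zip);
            return names;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

An Alluxio NIO wrapper would be a FileSystemProvider for the alluxio:// scheme in the same way the zip provider handles jar: URIs, which is what would let Paths.get(...) and MMapDirectory-style code work unchanged.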

Bin Fan

Nov 21, 2017, 12:49:05 PM
to Saurabh Sharma, Alluxio Developers
Hi Saurabh,

I am not sure I fully understand your setup, so here is the part I do understand, plus some questions/suggestions:

- You have an HDFS cluster and a local hard disk, and you want your Lucene server to be able to talk to both HDFS and the local disk through the Alluxio interface, right?
- Q: Can you mount the local disk into Alluxio and use just the HDFS API to access Alluxio? See the alluxio mount CLI and unified namespace.
- Q: Alluxio currently provides a native Java FS client and an HDFS-API-compatible client; were you simply asking for a Java NIO-compatible API for the Alluxio client?
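For the mount suggestion, the CLI shape is roughly as follows (a sketch assuming a running Alluxio 1.x cluster; both paths are hypothetical):

```
$ ./bin/alluxio fs mount /mnt/local-index file:///data/lucene-indexes
```

After such a mount, the local directory appears under the same alluxio:// namespace as everything else, so the existing HDFS-compatible access path would cover it too.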

- Bin


Saurabh Sharma

Nov 22, 2017, 10:05:35 AM
to Alluxio Developers
Hi Bin,

1- The setup is like this: we are trying to create a library that can be used in both the Spark ecosystem and the web service realm, so my thought was to use Alluxio as the basic storage unit for both. For web services we will use NFS/local disk as the backing store, and in the Spark case we will use HDFS. We then create Lucene index readers over Alluxio based on the storage unit and do our work. For HDFS this works well, but for local disk I am getting errors, since the Alluxio Java FS is not compatible with java.nio.file.Path. So the use case is to use the local disk as the Lucene index store, which needs the java.nio.file.Path API to work out of the box.

2- I will try that. But will it be more performant than the HDFS API on HDFS? When we run searches on HDFS we get OK performance (about 1 second per call). That is fine for a batch API but not at all OK for a web service. I was thinking that since the local disk is faster (it supports random access), it would be better.

3- Yes, I am asking for a Java NIO-compatible API for the Alluxio client.

Regards,
Saurabh Sharma