Where are the source files indexed by a Lucene index?


James Chen

Jul 17, 2015, 3:54:46 PM
to luke-d...@googlegroups.com
Hello,

I am working on a web content management project that uses a Lucene index to search articles. The search index folder already exists and contains *.cfs, *.cfx, *.del, segments.gen, and similar files. What I am trying to do is find out where the source article files are, so that I can make minor changes to these articles.

I can open the index folder with Luke 4.0 and find all the information except the location of the article files. Are the source article files located in the same folder? *.cfs is a Compact File Set file. Does that mean all the articles are packed into this file? If so, how can I read it and pull out the articles one by one?

Any input will be appreciated.

Dmitry Kan

unread,
Jul 20, 2015, 5:52:47 AM7/20/15
to luke-d...@googlegroups.com
I'd guess your articles are indexed and can be found in the index directory in indexed and/or stored form, depending on the settings you used when indexing them.

Luke can help you change the indexed documents. Which distribution are you using? Which Lucene version do you have?

See latest releases here: https://github.com/DmitryKey/luke/releases
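
To illustrate the point about the index directory: the files named in the question (*.cfs, *.cfx, *.del, segments.gen) are all Lucene index files, not source articles. Lucene's file format documentation calls .cfs a "compound file". The sketch below is a rough aid, not part of Luke or Lucene, and it only labels the files in a folder by name. The function names and the description strings are made up for this example, and the mapping follows the Lucene 3.x file format notes as an assumption about which Lucene version wrote the index.

```python
from pathlib import Path
from typing import Dict

# Illustrative, non-exhaustive mapping based on the Lucene 3.x
# index file format documentation (assumed index version).
KNOWN_EXTENSIONS = {
    ".cfs": "compound file: a segment's index files packed together",
    ".cfx": "compound file for shared doc stores (stored fields, term vectors)",
    ".del": "deleted-documents file for a segment",
    ".gen": "segments.gen: records the latest commit generation",
}

UNKNOWN = "not a recognized Lucene index file"


def describe_index_file(name: str) -> str:
    """Return a short description for one file name from an index folder."""
    # Commit points are named segments_N and have no extension.
    if name.startswith("segments_"):
        return "segments_N: commit point listing the active segments"
    return KNOWN_EXTENSIONS.get(Path(name).suffix.lower(), UNKNOWN)


def describe_index_folder(folder: str) -> Dict[str, str]:
    """Map every regular file in the folder to its description."""
    return {
        p.name: describe_index_file(p.name)
        for p in sorted(Path(folder).iterdir())
        if p.is_file()
    }
```

If every file in the folder is recognized, there are no separate source article files in it. The article text is then available only to the extent that fields were stored at indexing time, in which case the values sit inside the stored-fields data of the index and can be viewed per document in Luke.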


James Chen

Jul 20, 2015, 10:25:46 AM
to luke-d...@googlegroups.com
Hi Dmitry,

Thank you for your reply. Actually, this is a web site built with IBM WCM, which creates the index files. I was just trying to edit the source articles directly. After combing through the folder, I did find some large binary files. Maybe these are the files that store all the articles.

Thanks,
James

Dmitry Kan

Jul 21, 2015, 3:59:39 AM
to luke-d...@googlegroups.com
Hi James,

What is the file extension of those?

James Chen

Jul 21, 2015, 9:17:47 AM
to luke-d...@googlegroups.com
Hi Dmitry,

They are files like ICM_JCR_BV348977644298952710.tmp. But today I found that the size and number of these files change dramatically over time, so they are not the files that hold the article bodies. I may try a different approach for this project. Anyway, thank you for your input.

Thanks,
James

Dmitry Kan

Jul 21, 2015, 12:22:04 PM
to luke-d...@googlegroups.com
Hi James,

Sure, feel free to send your questions about Luke anytime, or file an issue directly on Luke's GitHub: https://github.com/DmitryKey/luke

Thanks,
Dmitry