Re: [RavenDB] Possibility to do fulltext index without saving the data


Oren Eini (Ayende Rahien)

Aug 29, 2012, 6:03:02 AM
to rav...@googlegroups.com
Are you storing the actual PDF inside RavenDB?
Or just the text as a JSON doc?

On Wed, Aug 29, 2012 at 12:57 PM, Tinuz <tinuz...@gmail.com> wrote:
Hi All,

I am currently exploring RavenDB to see if it fits my needs. I am working on a project where I need to index a large number of files (doc, xls, pdf, etc.). I want my users to be able to query not only the content of a document but also its metadata. Therefore I use Tika to extract the metadata and add that information to my RavenDB document, so that I can, for example, ask Raven for the top 10 most active authors. My test app seems to work fine, but I noticed that my database is rather big. That is probably because I store the actual text of each document (which I also extract with Tika) as a property so that I can build a full-text index. I have no need to store this text in my database, but I do want to offer my users full-text search over the documents in my document store. So what I want is to index the content of, say, a PDF without saving it in the document store. Is this possible? If yes, how can I achieve it? If no, what is the best alternative?
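To make the distinction between "indexed" and "stored" concrete, here is a minimal sketch in Python. It is not RavenDB's API; the class `TextOnlyIndex`, its methods, and the sample document ids are hypothetical and exist only to illustrate an index that keeps search terms while discarding the extracted text.

```python
import re
from collections import defaultdict


class TextOnlyIndex:
    """Toy inverted index (hypothetical): maps term -> document ids, never keeps the raw text."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        # Tokenize the extracted text and record which document each term occurs in.
        # The text itself is not retained anywhere after this call returns.
        for term in re.findall(r"\w+", text.lower()):
            self.postings[term].add(doc_id)

    def search(self, query):
        # Return the ids of documents that contain every query term (AND semantics).
        terms = re.findall(r"\w+", query.lower())
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


index = TextOnlyIndex()
index.add("docs/1", "Quarterly sales report for the northern region")
index.add("docs/2", "Sales forecast and budget")
print(index.search("sales report"))
```

A search returns only document ids, so the original file would still have to be fetched from wherever the bytes are kept.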

Thanks in advance!

Tinuz

Aug 29, 2012, 7:54:13 AM
to rav...@googlegroups.com
I save the PDF as an attachment inside the document store and the text as a string value inside a document.

On Wednesday, August 29, 2012 at 12:03:26 UTC+2, Oren Eini wrote:

Oren Eini (Ayende Rahien)

Aug 29, 2012, 7:55:22 AM
to rav...@googlegroups.com
You can skip saving the attachments.

Tinuz

Aug 29, 2012, 8:03:55 AM
to rav...@googlegroups.com
Well, I do need the actual bytes (the attachments), since users may need to view the documents in MS Word, Excel, or Acrobat Reader. Storing just the text isn't enough to view a document in its native format.
So, correct me if I am wrong: there is no way to index the text without assigning it as a property on a document?

On Wednesday, August 29, 2012 at 13:55:45 UTC+2, Oren Eini wrote:

Oren Eini (Ayende Rahien)

Aug 29, 2012, 8:19:08 AM
to rav...@googlegroups.com
No, there isn't.