How to index rich documents (.pdf, .doc) in SolrNet


Dhaval950

Apr 9, 2011, 11:04:09 AM
to SolrNet
Hi all,

How do I index rich documents (.pdf, .doc) with SolrNet?
Please give me some sample code.

Also, what field type should I use for rich documents in
schema.xml?

If you have any sample code that indexes rich documents using SolrNet,
please attach it to your reply.

Thanks,

Mauricio Scheffer

Apr 9, 2011, 11:24:41 AM
to sol...@googlegroups.com
Documentation on configuring and using Solr's ExtractingRequestHandler: http://wiki.apache.org/solr/ExtractingRequestHandler
SolrNet support for this has not been released yet (you have to get the source code from GitHub and compile it), and it is currently undocumented. Right now there's only one integration test demoing this feature: https://github.com/mausch/SolrNet/blob/master/SolrNet.Tests/Integration.Sample/Tests.cs#L439
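To make the handler's contract concrete: Solr Cell is driven by query parameters on an HTTP POST to the handler, with the raw file as the request body. The C# sketch below only builds that URL. It assumes the handler is registered at /update/extract (as in Solr's example solrconfig.xml) and uses the documented `literal.id`, `extractOnly`, `extractFormat` and `commit` parameters; the class and method names are hypothetical and not part of SolrNet.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not a SolrNet API): builds the request URL for Solr's
// ExtractingRequestHandler ("Solr Cell"). Assumes the handler is registered at
// /update/extract, as in Solr's example solrconfig.xml.
public static class SolrCellRequest
{
    public static string BuildExtractUrl(string solrBaseUrl, string id, bool extractOnly)
    {
        var parameters = new List<KeyValuePair<string, string>>
        {
            // literal.id supplies the value for the schema's uniqueKey field.
            new KeyValuePair<string, string>("literal.id", id)
        };

        if (extractOnly)
        {
            // Ask Solr to run Tika and return the extracted text instead of indexing it.
            parameters.Add(new KeyValuePair<string, string>("extractOnly", "true"));
            parameters.Add(new KeyValuePair<string, string>("extractFormat", "text"));
        }
        else
        {
            // Index the document and commit so it becomes searchable right away.
            parameters.Add(new KeyValuePair<string, string>("commit", "true"));
        }

        string query = string.Join("&", parameters.Select(p =>
            Uri.EscapeDataString(p.Key) + "=" + Uri.EscapeDataString(p.Value)));
        return solrBaseUrl.TrimEnd('/') + "/update/extract?" + query;
    }
}
```

For example, `BuildExtractUrl("http://localhost:8080/solr/", "abcd", true)` yields `http://localhost:8080/solr/update/extract?literal.id=abcd&extractOnly=true&extractFormat=text`. The .pdf/.doc bytes would then be POSTed to that URL; SolrNet's own Extract call appears to do the equivalent through SolrConnection.PostStream.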

--
Mauricio

Vignesh Raj

Apr 11, 2011, 2:34:52 AM
to sol...@googlegroups.com, mauricio...@gmail.com

Hi,

What if I need to upload files along with additional information such as a description, class type, client name, document version, etc.?

Note that this information is not metadata that Tika can extract. In that case, what method would you suggest for indexing documents automatically along with this additional information?

Also, is there a way to store the entire file in Solr, the way a database holds a file as a BLOB?

Regards

Vignesh

Dhaval950

Apr 11, 2011, 7:04:24 AM
to SolrNet
Hi Mauricio,

I downloaded the SolrNet source code from the link you gave
(https://github.com/mausch/SolrNet) and compiled it. Then I referenced
the updated SolrNet.dll in my project and used the code below to extract
a particular file, but it throws an exception on the solr.Extract
statement.

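// Requires: using System; using System.IO; using SolrNet;
// `solr` is an already initialized ISolrOperations<T>
// (for example, resolved after calling Startup.Init<T>(solrUrl)).
using (var file = File.OpenRead(@"F:\DOTNET_RESUME.doc"))
{
    // "abcd" is the unique id the document would be indexed under.
    var response = solr.Extract(new ExtractParameters(file, "abcd")
    {
        ExtractOnly = true,                 // only run Tika and return the text; nothing is indexed
        ExtractFormat = ExtractFormat.Text, // plain text rather than XHTML
    });
    Console.WriteLine(response.Content);
}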

Please help.

Exception:

SolrNet.Exceptions.SolrConnectionException was unhandled
Message=Apache Tomcat/6.0.32 - Error report
HTTP Status 500 - lazy loading error

org.apache.solr.common.SolrException: lazy loading error
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:249)
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:231)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
    at org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:864)
    at org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:579)
    at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1665)
    at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.extraction.ExtractingRequestHandler'
    at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:375)
    at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:413)
    at org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:449)
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:240)
    ... 16 more
Caused by: java.lang.ClassNotFoundException: org.apache.solr.handler.extraction.ExtractingRequestHandler
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at java.net.FactoryURLClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Unknown Source)
    at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:359)
    ... 19 more
Source=SolrNet
StackTrace:
    at SolrNet.Impl.SolrConnection.PostStream(String relativeUrl, String contentType, Stream content, IEnumerable`1 parameters)
    at SolrNet.Commands.ExtractCommand.Execute(ISolrConnection connection)
    at SolrNet.Impl.SolrBasicServer`1.Send(ISolrCommand cmd)
    at SolrNet.Impl.SolrBasicServer`1.SendAndParseExtract(ISolrCommand cmd)
    at SolrNet.Impl.SolrBasicServer`1.Extract(ExtractParameters parameters)
    at SolrNet.Impl.SolrServer`1.Extract(ExtractParameters parameters)
    at SolrNetSample.Program.Main(String[] args) in \\K9SERVER\Shared\Dhaval\Solr\SOLR SOFTWARE\SolrNetSample\SolrNetSample\SolrNetSample\Program.cs:line 32
    at System.AppDomain._nExecuteAssembly(Assembly assembly, String[] args)
    at System.AppDomain.ExecuteAssembly(String assemblyFile, Evidence assemblySecurity, String[] args)
    at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly()
    at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
    at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
    at System.Threading.ThreadHelper.ThreadStart()
InnerException: System.Net.WebException
  Message=The remote server returned an error: (500) Internal Server Error.
  Source=System
  StackTrace:
    at System.Net.HttpWebRequest.GetResponse()
    at HttpWebAdapters.Adapters.HttpWebRequestAdapter.GetResponse()
    at SolrNet.Impl.SolrConnection.GetResponse(IHttpWebRequest request)
    at SolrNet.Impl.SolrConnection.PostStream(String relativeUrl, String contentType, Stream content, IEnumerable`1 parameters)
  InnerException:



Mauricio Scheffer

Apr 11, 2011, 8:17:21 AM
to sol...@googlegroups.com
See ExtractParameters.Fields to add extra metadata.
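At the HTTP level, Solr Cell accepts values that Tika cannot extract as `literal.<fieldname>=<value>` parameters; it is an assumption here that SolrNet's ExtractParameters.Fields is sent that way. The hypothetical helper below turns such metadata (description, client name, version and so on) into a query-string fragment; the field names in the example are illustrative. Each field must be declared in schema.xml, otherwise Solr will typically reject the document as having an unknown field, unless the handler remaps unknown fields (for instance via `uprefix`).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: converts application metadata that Tika cannot extract
// (description, client name, version, ...) into Solr Cell literal.* parameters.
public static class SolrCellLiterals
{
    public static string ToLiteralQuery(IDictionary<string, string> metadata)
    {
        // Sort by field name so the generated query string is deterministic,
        // then emit literal.<field>=<value> with both parts URL-escaped.
        return string.Join("&", metadata
            .OrderBy(kv => kv.Key, StringComparer.Ordinal)
            .Select(kv => "literal." + Uri.EscapeDataString(kv.Key) + "=" + Uri.EscapeDataString(kv.Value)));
    }
}
```

For example, metadata `{ version = "2", client_name = "Acme Corp" }` becomes `literal.client_name=Acme%20Corp&literal.version=2`, which would be appended to the extract request so those values are indexed alongside the text Tika extracts.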
About adding the document as a blob, you could always convert to/from base64 and store that in a string field. Whether that's advisable or not is another question. I recommend asking on the solr-user mailing list.
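A minimal sketch of the base64 approach using only the .NET base class library. Which stored, non-indexed string field would hold the result is left to the schema, so any field name is hypothetical. Base64 turns every 3 bytes into 4 characters, so the stored value is roughly a third larger than the original file.

```csharp
using System;
using System.IO;

// Sketch of the "file as base64 in a string field" idea. The Solr field that
// would store the string (a stored, non-indexed string field) is hypothetical.
public static class Base64Blob
{
    // Encode raw file bytes as a base64 string for storage in Solr.
    public static string Encode(byte[] fileBytes)
    {
        return Convert.ToBase64String(fileBytes);
    }

    // Decode the stored string back into the original bytes.
    public static byte[] Decode(string stored)
    {
        return Convert.FromBase64String(stored);
    }

    // Convenience wrapper that reads a whole file from disk and encodes it.
    public static string EncodeFile(string path)
    {
        return Encode(File.ReadAllBytes(path));
    }
}
```

For example, `Base64Blob.Encode(new byte[] { 1, 2, 3 })` returns `"AQID"`, and `Decode("AQID")` gives back the same three bytes.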

--
Mauricio

Mauricio Scheffer

Apr 11, 2011, 8:19:09 AM
to sol...@googlegroups.com
> java.lang.ClassNotFoundException:

This indicates a misconfiguration of Solr. Probably the Tika libraries are misplaced or misconfigured. See the docs or ask on the solr-user mailing list about this.
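When diagnosing this from the client side, note that SolrConnectionException.Message carries the whole Tomcat HTML error page, which buries the cause. The hypothetical helper below (not part of SolrNet) returns the innermost `Caused by:` line, which here is the ClassNotFoundException for ExtractingRequestHandler. It assumes the page contains the Java stack trace with its original line breaks, as in the report above.

```csharp
using System;
using System.Net;

// Hypothetical diagnostic helper (not part of SolrNet): pulls the innermost
// "Caused by:" line out of the HTML error page that Solr/Tomcat returns and
// that SolrConnectionException exposes as its Message.
public static class SolrErrorDiagnostics
{
    public static string RootCause(string errorPage)
    {
        const string marker = "Caused by:";
        if (string.IsNullOrEmpty(errorPage))
            return null;

        // Java prints nested causes outermost first, so the last marker is the root cause.
        int index = errorPage.LastIndexOf(marker, StringComparison.Ordinal);
        if (index < 0)
            return null;

        // Take the rest of that line only (the following lines are "at ..." frames).
        int start = index + marker.Length;
        int end = errorPage.IndexOf('\n', start);
        string line = end < 0 ? errorPage.Substring(start) : errorPage.Substring(start, end - start);

        // Undo HTML entity escaping (e.g. &lt;) and strip surrounding whitespace and '\r'.
        return WebUtility.HtmlDecode(line).Trim();
    }
}
```

Called as `SolrErrorDiagnostics.RootCause(ex.Message)` in a `catch (SolrConnectionException ex)` block, it would surface the missing-class message directly, pointing at the Solr/Tika setup rather than the SolrNet client code.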

--
Mauricio

Dhaval950

Apr 14, 2011, 8:51:40 AM
to SolrNet

Hi Mauricio,

I have upgraded from Solr 1.4 to Solr 3.1 with Tika 0.9, but I still
get the same error.

Please help me configure Tika 0.9 with Solr 3.1. I have already
followed the Solr wiki, but I don't understand how to configure it.

Thanks,


Mauricio Scheffer

Apr 14, 2011, 9:36:29 AM
to sol...@googlegroups.com
Please post all questions about the Solr server itself to the solr-user mailing list. This mailing list is for the SolrNet client.
Thanks...

--
Mauricio