Indexing Dublin Core records


Alexandre Rademaker

Dec 5, 2010, 5:36:19 PM
to Kochief

Is it possible to customize the Solr schema and the fields used for
faceted search? My data is in Dublin Core format, actually in RDF.
Which files should I look at? How hard will it be?

Cheers,
Alexandre


Gabriel Farrell

Dec 5, 2010, 5:44:12 PM
to koc...@googlegroups.com
For parsing the files for ingest, look in discovery/parsers/ for
examples. If you are indexing fields not already in
solr/conf/schema.xml, you'll have to add them there. It may take a bit
of work to write a parser, depending on how comfortable you are in
Python, but that's most of it.
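
As a rough starting point, here is a minimal sketch of the parsing step for Dublin Core in RDF/XML, using only the standard library. It does not follow the interface of the parsers in discovery/parsers/ (check those for the real shape); the DC_TO_SOLR mapping, the parse_dc_rdf function and the sample record are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical mapping from Dublin Core elements to Solr field names.
# Adjust it to whatever fields are declared in solr/conf/schema.xml.
DC_TO_SOLR = {"title": "title", "creator": "author", "date": "year", "type": "format"}


def parse_dc_rdf(xml_text):
    """Yield one dict per rdf:Description, keyed by Solr field name."""
    root = ET.fromstring(xml_text)
    for desc in root.iter("{%s}Description" % RDF_NS):
        record = defaultdict(list)
        about = desc.get("{%s}about" % RDF_NS)
        if about:
            record["id"].append(about.strip())
        for child in desc:
            # Skip anything that is not a Dublin Core element.
            if not child.tag.startswith("{%s}" % DC_NS):
                continue
            local_name = child.tag.split("}", 1)[1]
            field = DC_TO_SOLR.get(local_name)
            # Strip whitespace so values never start with a newline.
            text = (child.text or "").strip()
            if field and text:
                record[field].append(text)
        yield dict(record)


# Made-up sample record for illustration.
SAMPLE = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.fgv.br/lattes/6732277923509253#P90">
    <dc:title>
      A Dinâmica Monetária da Hiperinflação: Cagan Revisitado</dc:title>
    <dc:creator>Fernando de Holanda Barbosa</dc:creator>
    <dc:date>1997</dc:date>
    <dc:type>Article</dc:type>
  </rdf:Description>
</rdf:RDF>
"""

records = list(parse_dc_rdf(SAMPLE))
```

Each record maps a field name to a list of values, since Dublin Core elements can repeat.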


Alexandre Rademaker

Dec 5, 2010, 6:03:26 PM
to Kochief

Cool, thanks for your reply. But what about the interface? How can I
change the fields used for faceted search?

Cheers,
Alexandre

Gabriel Farrell

Dec 5, 2010, 8:12:23 PM
to koc...@googlegroups.com
The facet fields displayed can be modified in settings.py.

Alexandre Rademaker

Dec 6, 2010, 3:29:19 PM
to Kochief

Hello Gabriel,

Once again, thank you very much for your attention.

May I ask one more question? In my case, I must show more information
about authors, institutions and journals. That is, the Solr index will
contain documents not only about publications but also about people,
institutions and journals. My idea is that every document will have a
type and fields. One of the facet fields will be the type...
What do you think?

Cheers,
Alexandre



Gabriel Farrell

Dec 6, 2010, 4:49:39 PM
to koc...@googlegroups.com
Sounds fine. Solr handles heterogeneous data pretty well. The trick is
to balance specific data models against the generality of universal
fields. You won't have much to facet with if there are few shared
fields across document types.
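
One way to check that balance before indexing is to see which fields every document type has in common, since those are the natural facet candidates. This is only a sketch: the shared_fields function and the three sample documents (including their field names) are hypothetical.

```python
def shared_fields(docs):
    """Return the set of fields that appear in every document type."""
    fields_by_type = {}
    for doc in docs:
        # "id" and "type" are on every document, so leave them out.
        fields = set(doc) - {"id", "type"}
        fields_by_type.setdefault(doc["type"], set()).update(fields)
    if not fields_by_type:
        return set()
    return set.intersection(*fields_by_type.values())


# Hypothetical documents, one per type.
docs = [
    {"id": "pub1", "type": "publication", "title": "Some article",
     "author": "Some Author", "year": "1997", "institution": "FGV"},
    {"id": "per1", "type": "person", "name": "Some Author",
     "institution": "FGV"},
    {"id": "jou1", "type": "journal", "name": "Some Journal",
     "institution": "FGV"},
]

common = shared_fields(docs)
print(common)  # {'institution'}
```

Here only "institution" (plus the type itself) would be a facet that works across all three document types.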

Alexandre Rademaker

Dec 15, 2010, 2:36:49 PM
to koc...@googlegroups.com

Hello Gabriel,

I am trying to insert data directly into Solr using curl and the HTTP interface. The document that I sent was:

<add>
<doc>
  <field name="id">6732277923509253#P90</field>
  <field name="full_title">
    A Dinâmica Monetária da Hiperinflação: Cagan Revisitado</field>
  <field name="year">
    1997</field>
  <field name="collection">
    Revista de Economia Contemporânea</field>
  <field name="collection">CV Lattes Fulano</field>
  <field name="author">
    Fernando de Holanda Barbosa</field>
  <field name="format">Article</field>
  <field name="title">
    A Dinâmica Monetária da Hiperinflação: Cagan Revisitado</field>
</doc>
</add>

But the Kochief interface does not handle the document very well... Can you help me? Please see the attached screenshots.

Cheers,
Alexandre


Alexandre Rademaker
http://web.me.com/arademaker/
FirefoxScreenSnapz002.png
FirefoxScreenSnapz001.png

Gabriel Farrell

Dec 15, 2010, 3:15:58 PM
to koc...@googlegroups.com
I'm not sure if it's due to your use of curl, but the ID for the
record you created is "\n
http://www.fgv.br/lattes/6732277923509253#P90". The "\n" is a newline,
and Django is trying to create a URL with it. That's not going to
work.
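
If you generate the XML from a script instead of writing it by hand, you can strip the whitespace before it ever reaches Solr. Below is a minimal sketch using the standard library. The clean_id and build_add_xml functions are made-up names, and replacing "#" with "-" is an arbitrary choice (a "#" in a URL starts the fragment, so it is also worth keeping out of IDs that end up in URLs).

```python
import xml.etree.ElementTree as ET


def clean_id(raw):
    """Strip surrounding whitespace and replace '#' (arbitrary choice: '-')."""
    return raw.strip().replace("#", "-")


def build_add_xml(doc):
    """Build a Solr <add><doc>...</doc></add> message from a dict of fields."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, values in doc.items():
        # Accept either a single string or a list of strings per field.
        if isinstance(values, str):
            values = [values]
        for value in values:
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = clean_id(value) if name == "id" else value.strip()
    return ET.tostring(add, encoding="unicode")


doc = {
    "id": "\n    6732277923509253#P90",
    "title": "\n    A Dinâmica Monetária da Hiperinflação: Cagan Revisitado",
    "year": "\n    1997",
    "collection": ["\n    Revista de Economia Contemporânea", "CV Lattes Fulano"],
    "author": "\n    Fernando de Holanda Barbosa",
    "format": "Article",
}

xml_payload = build_add_xml(doc)
```

The resulting string can be posted to Solr's update handler in place of the handwritten file.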



Alexandre Rademaker

Dec 24, 2010, 1:27:10 PM
to koc...@googlegroups.com
Thanks Gabriel,

You were right! I removed the "\n" and the "#" character from the id and it worked! I had thought the problem was caused by inserting the document directly into Solr over HTTP, without using the ingest command.

BTW, I was looking at settings.py and found this on line 36:

DATABASE_ENGINE = 'sqlite3'

Do you keep any data in this database? 

Cheers,

Alexandre Rademaker
http://web.me.com/arademaker/

Gabriel Farrell

Dec 26, 2010, 10:37:57 AM
to koc...@googlegroups.com
Only the session data that Django keeps by default.