document mapping?

porjo38

Nov 29, 2016, 10:43:36 PM
to bleve
Hello,

I'm starting out with Bleve, and would like to use it for searching email headers. I'm struggling to understand how document mapping works. Using default settings I get results, however I don't get results when I customise the mapping.

Here is a sample program showing my custom mapping:

    package main

    import (
        "bytes"
        "fmt"
        "log"
        "net/mail"

        "github.com/blevesearch/bleve"
        "github.com/blevesearch/bleve/analysis/analyzer/keyword"
        "github.com/blevesearch/bleve/mapping"
        "github.com/blevesearch/bleve/search/query"
    )

    const email = `Subject: test email
    To: Bob Smith <b...@example.com>
    From: Jane Smith <ja...@example.com>
    Date: Tue, 29 Nov 2016 21:48:47 +1000

    This is a test email
    `

    const searchQuery = "test"

    type bleveDoc struct {
        Type string
        Data mail.Header
    }

    func main() {
        var err error
        var index bleve.Index

        // the default mapping returns results, however my custom mapping does not
        //mapping := bleve.NewIndexMapping()
        mapping := buildIndexMapping()

        index, err = bleve.NewMemOnly(mapping)
        if err != nil {
            log.Fatal(err)
        }

        var msg *mail.Message
        msg, err = mail.ReadMessage(bytes.NewReader([]byte(email)))
        if err != nil {
            log.Fatal(err)
        }

        doc := bleveDoc{"header", msg.Header}

        if err := index.Index("testdoc", doc); err != nil {
            log.Fatal(err)
        }

        fmt.Printf("Searching with query '%s'\n", searchQuery)

        bSearchRequest := bleve.NewSearchRequest(query.NewQueryStringQuery(searchQuery))
        searchResult, err := index.Search(bSearchRequest)
        if err != nil {
            log.Fatal(err)
        }

        fmt.Println("Hits:")
        for _, hit := range searchResult.Hits {
            fmt.Printf("ID %s, Locations %v, Fields %v\n", hit.ID, hit.Locations, hit.Fields)
        }
    }

    func buildIndexMapping() mapping.IndexMapping {
        mapping := bleve.NewIndexMapping()

        keywordFieldMapping := bleve.NewTextFieldMapping()
        keywordFieldMapping.Analyzer = keyword.Name

        docMapping := bleve.NewDocumentMapping()
        docMapping.AddFieldMappingsAt("Type", keywordFieldMapping)

        headerMapping := bleve.NewDocumentMapping()
        headerMapping.AddFieldMappingsAt("Subject", keywordFieldMapping)
        headerMapping.AddFieldMappingsAt("From", keywordFieldMapping)
        headerMapping.AddFieldMappingsAt("To", keywordFieldMapping)
        headerMapping.AddFieldMappingsAt("Date", keywordFieldMapping)

        docMapping.AddSubDocumentMapping("Data", headerMapping)

        mapping.AddDocumentMapping("header", docMapping)
        mapping.TypeField = "Type"

        return mapping
    }



When I run this program with default mapping, I get a hit:

    Searching with query 'test'
    Hits:
    ID testdoc, Locations map[Data.Subject:map[test:[0xc420116780]]], Fields map[]


When I try with my custom mapping, I get no hit:

    Searching with query 'test'
    Hits:


Could someone please point me in the right direction?

Ian.

Marty Schoch

Nov 29, 2016, 11:02:11 PM
to bl...@googlegroups.com
I took a quick look. I think your mapping looks good. The problem appears to be that you're using the "keyword" analyzer, which indexes the entire contents of a field as a single term. Thus "test" on its own is no longer a term in the index.
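
To make the difference concrete, here is a toy model in plain Go. It does not call bleve; `keywordTerms`, `wordTerms` and `hasTerm` are hypothetical helpers written for this illustration, not bleve's analyzers. It assumes only what is described above: a keyword-style analyzer emits the whole value as one term, while a word-splitting analyzer (modelled here as lowercasing and splitting on non-alphanumeric characters) emits one term per word.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// keywordTerms models a keyword-style analyzer: the whole value is one term.
func keywordTerms(value string) []string {
	return []string{value}
}

// wordTerms models a word-splitting analyzer: lowercase the value, then split
// on anything that is not a letter or digit. Real analyzers do more than this.
func wordTerms(value string) []string {
	return strings.FieldsFunc(strings.ToLower(value), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// hasTerm reports whether query exactly equals one of the indexed terms.
func hasTerm(terms []string, query string) bool {
	for _, t := range terms {
		if t == query {
			return true
		}
	}
	return false
}

func main() {
	subject := "test email"

	// Keyword model: only the complete value matches, so "test" misses.
	fmt.Println(hasTerm(keywordTerms(subject), "test"))
	fmt.Println(hasTerm(keywordTerms(subject), "test email"))

	// Word-splitting model: "test" is a term of its own, so it matches.
	fmt.Println(hasTerm(wordTerms(subject), "test"))
}
```

In this model the three lines print false, true and true, which mirrors why a search for "test" finds nothing once the Subject field is indexed as a single keyword.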

Also, I'm not sure if the data you're passing in to the mail header parsing stuff is working either.  I printed out the fields that bleve sees and get:

2016/11/29 22:57:04 field: &document.TextField{Name:Type, Options: INDEXED, STORE, TV, Analyzer: &{[] 0x52b180 [0x52b180 0xc4200900f8]}, Value: header, ArrayPositions: []}
2016/11/29 22:57:04 field: &document.TextField{Name:Data.Subject, Options: INDEXED, STORE, TV, Analyzer: &{[] 0x52b180 [0x52b180 0xc4200900f8]}, Value: test email To: Bob Smith <b...@example.com> From: Jane Smith <ja...@example.com> Date: Tue, 29 Nov 2016 21:48:47 +1000, ArrayPositions: [0]}

See how there are only 2 fields, and all the other header fields appear to be inside the subject?  This shows up the same regardless of which mapping is used.  I suspect the data being passed into bleve is not what you're expecting either.

I'd suggest checking the output of the data before passing into bleve, and try using "standard" instead of "keyword" analyzer.

marty

--
You received this message because you are subscribed to the Google Groups "bleve" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bleve+unsubscribe@googlegroups.com.
To post to this group, send email to bl...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bleve/e73edec8-e367-4256-88c5-0203de800306%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

porjo38

Nov 30, 2016, 1:12:49 AM
to bleve
Hi Marty,

Thanks, using 'keyword' analyser was the problem! I would also like to search on the 'Date:' field, so I've extended my example program with a NewDateTimeFieldMapping - see here:

https://play.golang.org/p/YyzYxz9gi4

When I run this, I get an error:

header map[From:[Jane Smith <ja...@example.com>] Date:[Tue, 29 Nov 2016 21:48:47 +1000] Subject:[test email] To:[Bob Smith <b...@example.com>]]
Searching with query 'Date:>"Tue, 29 Nov 2016 21:48:46 +1000"'
2016/11/30 16:03:03 parse error: invalid time: unable to parse datetime with any of the layouts
exit status 1

I'm using time layout RFC1123Z which matches the date format in the email! Where did I go wrong?

Ian.

porjo38

Nov 30, 2016, 7:24:58 AM
to bleve
I realised that my search string had the wrong format for the timestamp: I was using RFC1123Z, but it should be RFC3339 (or similar). Even after fixing the date format in the search string I still wasn't getting any results. I tried using a DateRangeQuery instead of a QueryStringQuery and that worked! So while my original issue is unresolved, I have a workaround for now.
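
For reference, converting the email-style timestamp into RFC3339 needs only the standard `time` package. This is a minimal sketch; `toRFC3339` is a hypothetical helper name, and it assumes the query side accepts RFC3339 as described above.

```go
package main

import (
	"fmt"
	"log"
	"time"
)

// toRFC3339 converts an email-style (RFC1123Z) timestamp to RFC3339.
func toRFC3339(emailDate string) (string, error) {
	t, err := time.Parse(time.RFC1123Z, emailDate)
	if err != nil {
		return "", err
	}
	return t.Format(time.RFC3339), nil
}

func main() {
	ts, err := toRFC3339("Tue, 29 Nov 2016 21:48:46 +1000")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(ts) // 2016-11-29T21:48:46+10:00
}
```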

Marty Schoch

Nov 30, 2016, 5:47:52 PM
to bl...@googlegroups.com
So, you can change which date format the query string expects by setting:

query.QueryDateTimeParser = "dateTimeParser"

I made that the first line in main() and it works (but as you point out still doesn't get any hits).

Can you give me the query string you're trying (which doesn't work) and the date range query (which does)?  That will help me try to reproduce your problem.

Thanks,
marty


porjo38

Dec 1, 2016, 7:16:50 AM
to bleve
Hi Marty,

Thanks for the tip regarding setting the QueryDateTimeParser. Below are example programs that demonstrate what I've found.

This example shows a date range query which returns a result. If you adjust the date on line 70 so that the range is outside the date in the email header (2016-11-29) then it will return no result (i.e. works as expected).

https://play.golang.org/p/y28clggnQL

This example shows use of a query string query `Date:>"2016-11-26"`, which doesn't work as expected:

https://play.golang.org/p/fxGgIhX3zN

Ian.

Marty Schoch

Dec 6, 2016, 8:25:34 AM
to bl...@googlegroups.com
Sorry for the long delay in getting back to you about this.

The problem is that your query string is restricted to the field "Date", but there is no field "Date", only "Data.Date".  Obviously "Date" or even "date" would be a much better field name, but currently there is no way to drop the parent container's name from the field name.  See also: https://github.com/blevesearch/bleve/issues/229  Changing "Date" to "Data.Date" gives the expected results.

The reason your DateRangeQuery works is that it's not restricted to any field, so it searches the "_all" field.  By default, everything is included in the _all field, so that includes Data.Date.
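
The naming scheme can be pictured with a toy model in plain Go. This is not bleve's implementation: `fieldPaths` is a hypothetical helper, and nested maps stand in for the `bleveDoc` struct and its `mail.Header`. It only illustrates the rule described above, that a nested field's name is prefixed with the name of its parent container.

```go
package main

import (
	"fmt"
	"sort"
)

// fieldPaths walks a nested document and returns dotted field names,
// prefixing each leaf with the names of its parent containers.
func fieldPaths(prefix string, doc map[string]interface{}) []string {
	var paths []string
	for name, value := range doc {
		path := name
		if prefix != "" {
			path = prefix + "." + name
		}
		if sub, ok := value.(map[string]interface{}); ok {
			paths = append(paths, fieldPaths(path, sub)...)
		} else {
			paths = append(paths, path)
		}
	}
	sort.Strings(paths)
	return paths
}

func main() {
	doc := map[string]interface{}{
		"Type": "header",
		"Data": map[string]interface{}{
			"Subject": "test email",
			"Date":    "Tue, 29 Nov 2016 21:48:47 +1000",
		},
	}
	fmt.Println(fieldPaths("", doc)) // [Data.Date Data.Subject Type]
}
```

There is no bare "Date" entry in the result, which is why a query restricted to `Date` matches nothing while one restricted to `Data.Date` does.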

marty


porjo38

Dec 6, 2016, 3:19:35 PM
to bleve
Thanks Marty, I appreciate the help.