How can I refactor this Map/Reduce index to go from 3 mapped properties to 1 (so I can search on it)?


Justin A

Apr 9, 2012, 8:48:20 AM
to rav...@googlegroups.com
Hi folks,

I've got a Map/Reduce index which seems to work OK. I'm hoping to refactor it so that the 3 mapped properties become ONE single property, which will let me search on it. This is similar to the "Orders Search in RavenDB" post and the KB article "Querying Unlike Docs".

The difference between those two posts and my index is that I have a reduce, so I'm not sure how to do the map bit.

Here's my code.

public class LogEntries_ByClientGuidClientName :
    AbstractIndexCreationTask<LogEntry, LogEntries_ByClientGuidClientName.ReduceResult>
{
    public LogEntries_ByClientGuidClientName()
    {
        Map = docs => from doc in docs
                        //where !string.IsNullOrEmpty(doc.ClientGuid) &&
                        //  !string.IsNullOrEmpty(doc.ClientName)
                        where doc.ClientGuid != null && doc.ClientName != null 
                        select new
                                    {
                                        doc.ClientGuid,
                                        ReversedClientGuid = (string)null,
                                        ClientNames = new[] {doc.ClientName}
                                    };

        Reduce = results => from result in results
                            group result by result.ClientGuid
                            into g
                            select new
                                        {
                                            ClientGuid = g.Key,
                                            ReversedClientGuid = g.Key.Reverse(),
                                            ClientNames = g.SelectMany(x => x.ClientNames).Distinct().ToArray()
                                        };

        Index(x => x.ClientNames, FieldIndexing.Analyzed);
        Index(x => x.ReversedClientGuid, FieldIndexing.Analyzed);
    }

    #region Nested type: QueryResult

    public class ReduceResult
    {
        public string ClientGuid { get; set; }
        public string ReversedClientGuid { get; set; }
        public string[] ClientNames { get; set; }

        public override string ToString()
        {
            return string.Format("{0} : {1}", ClientGuid, ClientNames != null && ClientNames.Count() == 1
                                                                ? ClientNames[0]
                                                                : ClientNames == null
                                                                    ? "no names"
                                                                    : ClientNames.Count() + " names.");
        }
    }

    #endregion
}
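A minimal sketch, in plain LINQ to Objects rather than as a RavenDB index, of what the map and reduce above compute: one row per GUID, with its reversed form and all distinct names. The `Entry`, `Reduced` and `OriginalIndexSketch` names are hypothetical. One assumption to flag: in ordinary C#, `g.Key.Reverse()` on a string binds to `Enumerable.Reverse<char>` and yields a character sequence rather than a string, so the sketch rebuilds the string explicitly; how RavenDB's server-side index compilation treats that call is not confirmed here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for LogEntry: only the two fields the index reads.
public record Entry(string ClientGuid, string ClientName);

// Mirrors the ReduceResult class above.
public class Reduced
{
    public string ClientGuid { get; set; }
    public string ReversedClientGuid { get; set; }
    public string[] ClientNames { get; set; }
}

public static class OriginalIndexSketch
{
    // Enumerable.Reverse on a string yields IEnumerable<char>, so rebuild a string from it.
    public static string ReverseString(string s) => new string(s.Reverse().ToArray());

    public static List<Reduced> Run(IEnumerable<Entry> docs)
    {
        // Map: one row per log entry that has both a GUID and a name.
        var mapped = from doc in docs
                     where doc.ClientGuid != null && doc.ClientName != null
                     select new { doc.ClientGuid, ClientNames = new[] { doc.ClientName } };

        // Reduce: one row per GUID, collecting every distinct name for that GUID.
        var reduced = from result in mapped
                      group result by result.ClientGuid into g
                      select new Reduced
                      {
                          ClientGuid = g.Key,
                          ReversedClientGuid = ReverseString(g.Key),
                          ClientNames = g.SelectMany(x => x.ClientNames).Distinct().ToArray()
                      };

        return reduced.ToList();
    }
}
```

Fed two entries sharing one GUID and a third entry with another GUID, it yields two rows, the first holding both names.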

This means my index result will then contain just ONE property, instead of 3.

Some sample data (note that #1 and #2 have the same GUID):

LogEntry #1
Client Name: PewPew
Guid: abcde.....1
Reversed Guid: 1.....edcba

LogEntry #2
Client Name: I-see-dead-people
Guid: abcde.....1
Reversed Guid: 1.....edcba

LogEntry #3
Client Name: Hi There!
Guid: abccc.....1
Reversed Guid: 1.....cccba


Results:
- abcde.....1 1.....edcba PewPew, I-see-dead-people
- abccc.....1 1.....cccba Hi There!

UX search box: abc
=> returns 2 results:
abcde.....1
abccc.....1

UX search box: Pew
=> returns 1 result:
abcde.....1

NOTE: The reversed GUID is there because people might search on the LAST GUID chars, e.g. searching for *aaa1 (if the last 4 GUID chars were a-a-a-1).
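The reversal trick can be sketched as a plain string check: testing whether a GUID ends with some characters is the same as testing whether the reversed GUID starts with the reversed characters. Against an index that stores the reversed GUID, that turns a leading-wildcard search such as `*aaa1` into a trailing-wildcard one such as `1aaa*`, which Lucene generally handles far more cheaply. `SuffixSearchSketch` and its methods are hypothetical names; this is a sketch, not RavenDB query code.

```csharp
using System;
using System.Linq;

public static class SuffixSearchSketch
{
    // Rebuilds a string from Enumerable.Reverse, which otherwise yields IEnumerable<char>.
    public static string ReverseString(string s) => new string(s.Reverse().ToArray());

    // "Does guid end with suffix?" answered as "does reversed(guid) start with reversed(suffix)?".
    // Against an index holding the reversed GUID, this is a trailing-wildcard query
    // (e.g. "1aaa*") instead of a leading-wildcard one (e.g. "*aaa1").
    public static bool EndsWithViaReversed(string guid, string suffix) =>
        ReverseString(guid).StartsWith(ReverseString(suffix), StringComparison.OrdinalIgnoreCase);
}
```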

cheers :)

Itamar Syn-Hershko

Apr 9, 2012, 8:55:54 AM
to rav...@googlegroups.com
Introduce a field "Content" and put those properties in it, and keep the field you want to reduce on in the result object you pass to the reduce.

Also, note that you might want to use a more relaxed analyzer on that Content field (i.e. not StandardAnalyzer), since it might not process your IDs nicely.
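One way to read Itamar's suggestion, sketched in plain LINQ rather than as a RavenDB index (all names here are hypothetical): the map keeps `ClientGuid` as its own field for grouping and puts every searchable value (the GUID, its reverse and the name) into a single `Content` array; the reduce groups on `ClientGuid` and only takes the distinct union of `Content`. Because map output and reduce output share one shape, running the reduce again over its own output changes nothing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a log entry.
public record LogRow(string ClientGuid, string ClientName);

// One shape for both map output and reduce output: the grouping key plus a flat list of search terms.
public record ContentRow(string ClientGuid, string[] Content);

public static class ContentFieldSketch
{
    static string ReverseString(string s) => new string(s.Reverse().ToArray());

    // Map: ClientGuid stays on its own as the key; every searchable value goes into Content.
    public static IEnumerable<ContentRow> Map(IEnumerable<LogRow> docs) =>
        from doc in docs
        where doc.ClientGuid != null && doc.ClientName != null
        select new ContentRow(doc.ClientGuid,
                              new[] { doc.ClientGuid, ReverseString(doc.ClientGuid), doc.ClientName });

    // Reduce: group on the key only and take the distinct union of the terms.
    public static IEnumerable<ContentRow> Reduce(IEnumerable<ContentRow> results) =>
        from result in results
        group result by result.ClientGuid into g
        select new ContentRow(g.Key, g.SelectMany(x => x.Content).Distinct().ToArray());
}
```

Reducing the mapped rows for a client with two names, and then reducing that output again, yields the same four terms both times.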

Justin A

Apr 9, 2012, 10:23:41 AM
to rav...@googlegroups.com
I gave that a go and got some really, really weird results, like 3 or 4 GUIDs in the Content (when there should only be one [ed. Hi Highlander!]).

This is the code I worked up:


public class LogEntries_ByClientGuidClientName :
    AbstractIndexCreationTask<LogEntry, LogEntries_ByClientGuidClientName.ReduceResult>
{
    public LogEntries_ByClientGuidClientName()
    {
        Map = docs => from doc in docs
                        where doc.ClientGuid != null && doc.ClientName != null 
                        select new
                                    {
                                        doc.ClientGuid,
                                        Content = new string[]
                                                  {
                                                    doc.ClientName
                                                  }
                                        
                                    };

        Reduce = results => from result in results
                            group result by result.ClientGuid
                            into g
                            select new
                                        {
                                            ClientGuid = g.Key,
                                            Content = new object[] { g.Key, g.Key.Reverse(), g.SelectMany(x => x.Content).Distinct().ToArray() }
                                        };

        Index(x => x.Content, FieldIndexing.Analyzed);
    }

    #region Nested type: QueryResult

    public class ReduceResult
    {
        public string ClientGuid { get; set; }
        public string[] Content { get; set; }
    }

    #endregion
}

So I did add the Content property and tried to reduce on the key. Hmm :( (I'll play with the analyzer once the index gets the correct content.)
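One plausible explanation for the repeated GUIDs, which the thread itself does not confirm: RavenDB may feed reduce output back into the reduce, so the reduce should produce the same shape as the map. Here the map's `Content` holds only names, while the reduce's `Content` holds the key, its reverse and a nested array of names (and `ReduceResult` declares `Content` as `string[]` while the reduce builds an `object[]`). The sketch below models that reduce in plain LINQ, flattening the nested values for simplicity, and compares it with a shape-preserving reduce; `Row`, `ReReduceSketch` and both method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public record Row(string ClientGuid, string[] Content);

public static class ReReduceSketch
{
    static string ReverseString(string s) => new string(s.Reverse().ToArray());

    // Modelled on the reduce above: it adds the key and reversed key on top of the incoming terms,
    // so its output differs in shape from the map output (which held only names).
    // Nested values are flattened here for simplicity.
    public static Row[] ShapeChangingReduce(IEnumerable<Row> rows) =>
        rows.GroupBy(r => r.ClientGuid)
            .Select(g => new Row(g.Key,
                new[] { g.Key, ReverseString(g.Key) }
                    .Concat(g.SelectMany(x => x.Content).Distinct())
                    .ToArray()))
            .ToArray();

    // Shape-preserving alternative: the map already emits the GUID terms, so reduce only unions.
    public static Row[] ShapePreservingReduce(IEnumerable<Row> rows) =>
        rows.GroupBy(r => r.ClientGuid)
            .Select(g => new Row(g.Key, g.SelectMany(x => x.Content).Distinct().ToArray()))
            .ToArray();
}
```

Applied twice, the shape-changing reduce lists the GUID twice, while the shape-preserving reduce returns the same terms on every pass.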

Oren Eini (Ayende Rahien)

Apr 9, 2012, 2:13:15 PM
to rav...@googlegroups.com
Justin,
I don't understand what you are trying to do, or what the problem is.
Can you write a failing test to demonstrate that?

Justin A

Apr 9, 2012, 6:14:31 PM
to rav...@googlegroups.com
Sure.

Context
Client: a person who plays pew-pew games on a gaming server, e.g. me playing Battlefield 3 on some Aussie BF3 server.
LogEntry: a single line in a log file that represents a 'state' action, e.g. connected to the server, or hack found on his computer. This is NOT any info about the client -playing- the game (e.g. turned a corner, killed another client, driving a tank...).

A client can have only 1 GUID BUT multiple names. E.g. my GUID is abcd1234... but I have 5 names: Jussy, Qwerty, PewPew, HiThere, KThxBai.
This means the GUID is the key. (I still use a string ID for the RavenDB Id, though.)

Scenario
People search for clients via a single text box. This means we can search by GUID (the full 32-char GUID, or just the last 'x' chars) or by partial name,
e.g.
o) 460c5d75a34d2c812c070f508f2ff707
o) ff707 (last 5 chars)
o) f508f2ff707 (last 11 chars <- I just randomly picked the number 11)
o) Jussy
o) Jus
o) sdsfdsfsfsdf <-- no results at all.

Arrange
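public class LogEntry
{
    // Held as a string so the sample values below can be written as plain literals.
    public string ClientGuid { get; set; }
    public string ClientName { get; set; }
    // I've omitted the game state (NewConnection, LostConnection, etc) and other meta.
}

var logEntries = new List<LogEntry>
                 {
                     new LogEntry { ClientGuid = "460c5d75a34d2c812c070f508f2ff707", ClientName = "Jussy" },
                     new LogEntry { ClientGuid = "460c5d75a34d2c812c070f508f2ff707", ClientName = "PewPew" },
                     new LogEntry { ClientGuid = "460c5d75a34d2c812caaaaaaaaaaaa", ClientName = "xxxxxx" },
                 };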

So here, a single real-life player has 2 log entries, so if we search for 'PewP' we should get 460c5d75a34d2c812c070f508f2ff707.

Itamar Syn-Hershko

Apr 9, 2012, 6:36:29 PM
to rav...@googlegroups.com
First, you don't show your query. I hope it is something along the lines of .Search("Content", "ff707*").Search("Content", "707ff*", SearchOptions.Or)

Second, try removing the Index(x => x.Content,...) line.

Third, it will be much easier to work with a failing test here.
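Itamar's suggested query, `.Search("Content", "ff707*").Search("Content", "707ff*", SearchOptions.Or)`, can be modelled in memory as an OR of two prefix tests against each client's `Content` terms: the query as typed (matching names and GUID prefixes) and the query reversed (matching GUID suffixes via the reversed-GUID term). This is a sketch assuming `Content` holds the GUID, the reversed GUID and the names, with case-insensitive comparison standing in for Lucene analysis; `ContentSearchSketch` is a hypothetical name, and the sample data comes from the Arrange section above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ContentSearchSketch
{
    public static string ReverseString(string s) => new string(s.Reverse().ToArray());

    // In-memory analogue of Search("Content", query + "*") OR Search("Content", reversed(query) + "*"):
    // the first prefix matches names and GUID prefixes, the second matches GUID suffixes
    // through the reversed-GUID term kept in Content. Case-insensitive comparison stands in for analysis.
    public static List<string> Search(IReadOnlyDictionary<string, string[]> contentByGuid, string query)
    {
        string reversedQuery = ReverseString(query);
        return contentByGuid
            .Where(kv => kv.Value.Any(term =>
                term.StartsWith(query, StringComparison.OrdinalIgnoreCase) ||
                term.StartsWith(reversedQuery, StringComparison.OrdinalIgnoreCase)))
            .Select(kv => kv.Key)
            .ToList();
    }
}
```

With the two GUIDs from the Arrange data, 'PewP', 'Jus' and 'ff707' each return only the first client, and 'sdsfdsfsfsdf' returns nothing.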

Justin A

Apr 9, 2012, 7:12:12 PM
to rav...@googlegroups.com
Sure, Itamar. I'm on a train, so when I get to work I'll post a full repo to pastie.org.

watch this space :)

Justin A

Apr 10, 2012, 12:09:47 AM
to rav...@googlegroups.com
OK, here's the repo: http://pastie.org/3759756

It needs the following 2 NuGet packages:
1. xUnit
2. Raven.Embedded

Each Fact creates a DocStore and adds a single Index + some fake data.

The index currently isn't that good; I'm missing some important concept. I'm trying to group all of a player's names (i.e. ClientName) into a single string array and search on that.

The tests are:

1. Search for all players that contain 'jus'
2. Search for an exact GUID
3. Search for players whose GUID ends with the chars '0800200c9a66'

The searching is currently like *query*, but I'm happy to make it query* (i.e. StartsWith(query)).

-me-

Ryan Heath

Apr 10, 2012, 3:42:32 AM
to rav...@googlegroups.com
I think the type of ClientNames in the reduce step is "string", not "string[]"?
Could that be a problem?

// Ryan

Itamar Syn-Hershko

Apr 10, 2012, 8:14:33 PM
to rav...@googlegroups.com
It is set as analyzed, so probably not.

I'll take a look tomorrow or Sunday.

Oren Eini (Ayende Rahien)

Apr 11, 2012, 12:02:38 PM
to rav...@googlegroups.com
Thanks for the tests, they were very useful.
They actually exposed several different bugs in RavenDB.

a) When you have QueryYourWrites + NoStaleResultsQueryListener, QueryYourWrites wins, which results in no waiting for the index.
b) When you are setting a map/reduce field as store = no, it will still consider this to be yes.

At any rate, I made sure that all your tests are passing now; check the new build.