Hi folks,
Ok, got a really weird modelling question which I'm not sure how to do.
Goal: *Smart* searching for locations, given a massive list of locations. ie. Autocomplete or SEO friendly url.
Problem: I'm not too sure how to model a Location, given that I'm cautious of having denormalized references because sometimes locations change which has a flow on effect for locations 'underneath'. (A city is under a state. a street is under a city .. which is under a state.. etc).
eg. An old neighbourhood is now split into two. Which means all streets in each one will need to be updated, also.
Denormalising would solve the problem, if I didn't need to update.
Details: Previously Ayende showed me how to do some autocomplete with a bunch of locations in a system.
A location, in -that- scenario was just a single string - the fully formatted name of the place, which came from an existing SQL Server.
eg. smith street, new york, new york.
The data was a simple export from the existing Sql Server.
My problem is: I'm a bit stuck to how I can model this. I'm starting out with a SEARCH Index and working my way up to Location object.
Sample Data
- California (State)
- San Francisco, California (City / State)
- Downtown, San Francisco, California (Neighbourhood / City / State)
- O'Farrell St, Downtown, San Francisco, California (Street / Neighbourhood / City / State)
- Post St, Downtown, San Francisco, California (Street / Neighbourhood / City / State)
- Post St, Tribeca, New York, New York (Street / Neighbourhood / City / State)
and here's some Search Queries
- post st downtown
- post st downtown san fran
- post st san fran
- post st ny
- post st ny ny
- post st tribeca
Ok. So what we currently have was to allow each location to have an alias (or aliases - just a single field, space delimeted) .. which is *inherited* as you go down.
An alias(es) are used ONLY for searching. (refer to search examples, above). *inherited* means, it's used when generating the search query ... it's not actually copied to each location. (that would be ack!!)
If the state of New York has an alias of NY, then this is what the TERMS would be for that location.
location: Post St, Tribeca, New York, New York
search query terms: "post st", "tribeca", "new york", "ny"
so this means if we add a new alias / update an alias to a state or a city, etc.. then -all- locations under it also need to be updated.
It's like we want to have a 'view' of a location which is the sum of all the location 'parts' (eg. neighbourhood name, city name, etc) and the alias(es), which is searchable.
lastly, these parts also generate a unique uri : post-st-tribeca-new-york-new-york (for example).
Ideas:
Idea #1 - Class with Id's.
So at first i was thinking about having a Location look like this :-
Id
StreetId
NeighbourhoodId
CityId
StateId
CountryType (an enum).
and possibly use Include(..) to preload all the location 'segments'.
Pros: No denormalized references means that if i have a street location .. and the neighbourhood changes name, then i do not have to find all streets that are in this neighbourhood and update. (get's messier when I go up the hierarchy).
Cons: have to include(..) for each xxxId to get the full location name.
Not sure i could make a static index with this format .. cause i need to Include in a static index?
Idea #2 - Denormalised
Id
Street
Neighbourhood
City
State
CountryType
Pros: No more including required. Can easily create a search static index.
Cons: A real PITA if anything above this location, is updated or changed.
So that's where i'm at :(
and i still stuck on how the static index would look :(
Does anyone have any suggestions about how I should tackle this?