Hotel search API


Jirka Chadima

Oct 5, 2018, 7:50:22 AM
to Winding Tree

Hello everybody,


The time has come to think about a Hotel search API. I've created a little sketch of the overall architecture this box could have; a detailed description follows below the picture.


[Image: wt-search-api.png (architecture sketch)]



Ideal flow

- Hotel writes its data to the platform with Write API.

- Write API sends notifications about hotel data changes via Update API.

- If a new hotel pops up, Subscriptions Management makes sure that the Search API is tracking all the hotel data changes.

- Subscription handler (or Resync cron job) tells the Crawler to collect the changed data via Read API.

- Crawler puts a copy of the data into Permanent storage.

- Crawler also bumps the Indexer and Price Computation components to start working with the changed data.

- Indexer re-indexes the hotel data from Permanent Storage to make search easier where possible (such as location data, description for fulltext etc.) and puts it into Indexed Storage.

- Price computation (silly name) re-computes all of the prices based on new hotel information from Permanent Storage and puts them into Price Storage. It's not really clear yet how it should know which prices to compute, though.

- OTAs (or other users of the system) post queries via the Query API and get quick responses.
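The ideal flow above can be sketched as a small event-driven pipeline. This is a minimal in-memory illustration under stated assumptions, not the planned implementation: the class and function names (`Crawler`, `Indexer`, `PriceComputation`, `fake_read_api`) are hypothetical, plain dicts stand in for the three storages, and a stub replaces the real Read API call.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for the three storages (plain dicts, not real databases).
permanent_storage: Dict[str, dict] = {}
indexed_storage: Dict[str, dict] = {}
price_storage: Dict[str, dict] = {}


class Indexer:
    """Re-indexes static data (location, description) from permanent storage."""

    def run(self, hotel_id: str) -> None:
        raw = permanent_storage[hotel_id]
        indexed_storage[hotel_id] = {
            "location": raw.get("location"),
            "description": raw.get("description", "").lower(),
        }


class PriceComputation:
    """Re-computes prices from permanent storage (here it just copies rate plans)."""

    def run(self, hotel_id: str) -> None:
        raw = permanent_storage[hotel_id]
        price_storage[hotel_id] = dict(raw.get("ratePlans", {}))


class Crawler:
    """Collects changed data, stores a raw copy, then bumps the downstream workers."""

    def __init__(self, fetch_hotel: Callable[[str], dict], workers: List) -> None:
        self.fetch_hotel = fetch_hotel  # hypothetical Read API client
        self.workers = workers

    def on_change(self, hotel_id: str) -> None:
        # Put a copy of the raw data into permanent storage first...
        permanent_storage[hotel_id] = self.fetch_hotel(hotel_id)
        # ...then bump the workers; they read from permanent storage, not from the crawler.
        for worker in self.workers:
            worker.run(hotel_id)


# Usage: a fake Read API response for one hotel.
def fake_read_api(hotel_id: str) -> dict:
    return {"location": (33.94, -118.40), "description": "Near LAX", "ratePlans": {"standard": 100}}


crawler = Crawler(fake_read_api, [Indexer(), PriceComputation()])
crawler.on_change("0xabc")  # e.g. triggered by an Update API notification or the resync job
```

Having the workers read from permanent storage, rather than receive the data from the crawler, is what lets them be re-run later (for example after the in-memory storages are lost) without touching the platform again.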


Various notes

- Prices (and all other guest-related query results) might be hard to pre-compute. It might be feasible to collect common query types and pre-compute the appropriate data for those.

- There's no decision yet on how the Query API will communicate with the outside world. The contenders are probably: a REST API with query strings, a REST API with a custom query language, or a GraphQL endpoint.

- The Query API has to offer ways of sorting the data and some relevance score for search results. We also cannot forget about pagination.

- A resync CRON job has to be in place because there's no guarantee that the outside systems are reliable or that every hotel uses the Update API.

- Indexed and Price storages should be fast, ideally in-memory databases.

- Permanent storage is in place in case the Indexed Storage and/or Price Storage get corrupted or destroyed. From it, the Search box can re-index the whole WT platform much faster than if it had to get all the data from various distributed storages.
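To make the sorting, relevance and pagination requirement more concrete, here is a minimal sketch of offset-based pagination over results sorted by a relevance score. The `score` field, the `page`/`page_size` parameters and the tie-break by `id` are assumptions for illustration only; the Query API's interface was not decided at this point.

```python
from typing import Any, Dict, List


def paginate(results: List[Dict[str, Any]], page: int, page_size: int) -> Dict[str, Any]:
    """Sort by descending relevance score, break ties by id for a stable order, return one page."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    ordered = sorted(results, key=lambda r: (-r["score"], r["id"]))
    start = (page - 1) * page_size
    items = ordered[start:start + page_size]
    has_next = start + page_size < len(ordered)
    return {"items": items, "page": page, "next_page": page + 1 if has_next else None}


hits = [{"id": "a", "score": 0.5}, {"id": "b", "score": 0.9}, {"id": "c", "score": 0.9}]
first = paginate(hits, page=1, page_size=2)
print([h["id"] for h in first["items"]], first["next_page"])  # ['b', 'c'] 2
```

The deterministic tie-break matters: without it, hotels with equal scores could show up on two pages, or on none, if the storage returns them in a different order between requests.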


There are probably hidden costs everywhere, but I think this is the architecture we should follow, even if the first version is very simple. All of the boxes can be isolated into their own microservices and scaled as needed in the future.

In the following days we will start implementing a simplistic version of this, and our focus will be the "search by location" use case, for queries such as "Give me hotels near LAX".
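For the "search by location" use case, the core operation is a radius filter sorted by distance. A minimal sketch using the haversine great-circle formula follows; the hotel records are made up, the LAX coordinates are approximate, and a real Indexed Storage would more likely rely on a geo index than on this linear scan.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_KM = 6371.0


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def hotels_near(hotels: List[dict], center: Tuple[float, float], radius_km: float) -> List[dict]:
    """Return hotels within radius_km of center, nearest first (linear scan, illustration only)."""
    with_distance = [(haversine_km(center, h["location"]), h) for h in hotels]
    return [h for d, h in sorted(with_distance, key=lambda x: x[0]) if d <= radius_km]


LAX = (33.9416, -118.4085)  # approximate airport coordinates
hotels = [
    {"id": "hotel-near", "location": (33.95, -118.40)},
    {"id": "hotel-downtown", "location": (34.05, -118.25)},
    {"id": "hotel-sf", "location": (37.77, -122.42)},
]
print([h["id"] for h in hotels_near(hotels, LAX, radius_km=30)])  # ['hotel-near', 'hotel-downtown']
```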


Cheers,

Jirka Chadima

Jakub Vysoky

Oct 6, 2018, 6:55:25 PM
to Jirka Chadima, Winding Tree
Two questions:

(1) Where is the development going to happen?

(2) Can there be mechanisms in the future to distribute validated data from the indexed storage or permanent storage, so that it is faster to spin up a new search node?

Great work! Thank you!

--
You received this message because you are subscribed to the Google Groups "Winding Tree" group.
To unsubscribe from this group and stop receiving emails from it, send an email to windingtree...@googlegroups.com.
To post to this group, send email to windi...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/windingtree/eebd73d7-7ab9-437c-8ad4-963318236c66%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



Jirka Chadima

Oct 8, 2018, 4:07:11 AM
to Jakub Vysoky, windi...@googlegroups.com
(1) On GitHub :-) I've set up a repo at https://github.com/windingtree/wt-search-api, but the name might change. I'm not sure about that right now.
(2) You mean that when you spin up a node, you provide a link to another node from which you copy the whole database? I haven't thought about that. It sounds like a potentially good idea, but I can see a lot of possible issues right away:
   - It sounds a lot like an initial database replication, and that might be a lot of data (= it might take a lot of time).
   - It very much depends on the selected database technology (= some DB dumps are faster than others).
   - It might cause a big load on the source node (not sure you want that; in an extreme case you might actually block the source database for reading).
   - It's not very decentralized. I don't see a way to distribute this across multiple source nodes.
   - It might actually be feasible to replicate the permanent storage and to compute the indexed storage locally (but its efficiency depends on the selected database and data size).

If (2) was meant for scaling, I can see a much easier way of scaling each component by itself, like a DB cluster, a cache cluster, many indexer workers etc. Anyway, let's not forget this idea; it might come into play later.
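The last bullet (replicate only the permanent storage and compute the indexed storage locally) can be sketched as follows. The dump format (a JSON list of raw hotel documents) and the helper names `load_permanent_dump` and `build_index` are hypothetical; the point is that the derived storage is a pure function of the mirrored raw data, so a new node would only need the raw copy.

```python
import json
from typing import Dict, List, Set


def load_permanent_dump(dump_json: str) -> Dict[str, dict]:
    """Load a hypothetical JSON dump (a list of raw hotel documents), keyed by hotel id."""
    docs: List[dict] = json.loads(dump_json)
    return {doc["id"]: doc for doc in docs}


def build_index(permanent: Dict[str, dict]) -> Dict[str, List[str]]:
    """Derive a simple inverted index (word -> sorted hotel ids) from the raw descriptions."""
    index: Dict[str, Set[str]] = {}
    for hotel_id, doc in permanent.items():
        for word in doc.get("description", "").lower().split():
            index.setdefault(word, set()).add(hotel_id)
    return {word: sorted(ids) for word, ids in index.items()}


dump = '[{"id": "h1", "description": "Quiet hotel near LAX"}, {"id": "h2", "description": "Airport hotel"}]'
index = build_index(load_permanent_dump(dump))
print(index["hotel"])  # ['h1', 'h2']
```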

Jirka Chadima

Augusto Lemble

Oct 8, 2018, 4:51:46 AM
to Winding Tree
I had trouble understanding the flow at first. OK, I can see four APIs, and two questions pop up:
Do we need four APIs? Can't we have only two, with write and read operations plus optional services? Like a WT Write API with an update notifications service, and a Read API with a search service. I think these services are related to each other: the search API needs access to the updates, but this can be achieved by listening to subscriptions.

I also see a lot of storages. Jirka, can you add a description of what each storage holds and how they differ from each other?

Jirka Chadima

Oct 8, 2018, 5:37:03 AM
to aug...@windingtree.com, windi...@googlegroups.com
To the public, there's only one new API - the Query API.

Having the services as separate APIs (like microservices) is a better design that allows people to choose how they will use the platform and brings many benefits to scaling and security. At this point, bundling services does not really make sense to me, but in the future, who knows. I also think that different APIs have potentially different audiences. Also, it's much harder to split things up than join them together - so if we decide to merge some of these into one monolith, we always can.

A lot of storages? There are three:
  - Permanent storage - consider it like a "mirror" to the WT platform. It should contain raw, unprocessed data downloaded via wt-read-api from whichever storage hotels use. So probably some DB that's good with JSON (PostgreSQL, MongoDB, CouchDB...)
  - Indexed storage - A storage supporting quick lookups that holds pre-computed "views" based on possible static search criteria (such as hotel location, description). By "static" I mean data that's not affected by input provided by the end user (typically dates, guests and their information). Something like ElasticSearch, Cassandra, Redis...
  - Price storage - A storage supporting quick lookups that holds pre-computed "views" based on possible dynamic search criteria (such as number of guests, their age, arrival date etc.). Something like ElasticSearch, Cassandra, Redis...

The Indexed Storage and Price storage will probably end up using the same technology. I've split them into two because the data will have to be (pre)computed in a different fashion.
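The split between static and dynamic criteria can be illustrated with two record shapes. This is only a sketch: the field names, the key layout of price records (hotel, arrival date, nights, guest count) and the toy `compute_price` rule are assumptions, not the Search API's schema. An indexed record is keyed by the hotel alone, while a price record is only meaningful for one combination of guest-provided inputs.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Tuple


@dataclass(frozen=True)
class IndexedHotel:
    """Indexed storage record: static criteria only, no end-user input involved."""
    hotel_id: str
    lat: float
    lon: float
    description: str


# Price storage key: the hotel plus dynamic, guest-provided criteria (hypothetical layout).
PriceKey = Tuple[str, date, int, int]  # (hotel_id, arrival, nights, guests)


def compute_price(base_per_night: float, nights: int, guests: int, extra_guest_fee: float) -> float:
    """Toy pricing rule: base rate per night plus a fee for each guest beyond the first."""
    return nights * (base_per_night + max(0, guests - 1) * extra_guest_fee)


indexed_storage: Dict[str, IndexedHotel] = {}
price_storage: Dict[PriceKey, float] = {}

hotel = IndexedHotel("0xabc", 33.95, -118.40, "Near LAX")
indexed_storage[hotel.hotel_id] = hotel  # one record per hotel

key: PriceKey = ("0xabc", date(2018, 12, 24), 3, 2)
price_storage[key] = compute_price(100.0, nights=3, guests=2, extra_guest_fee=20.0)
print(price_storage[key])  # 360.0
```

Every dynamic dimension in the key multiplies the number of records per hotel, which is why pre-computing prices is the harder of the two.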



--
You received this message because you are subscribed to the Google Groups "Winding Tree" group.
To unsubscribe from this group and stop receiving emails from it, send an email to windingtree...@googlegroups.com.
To post to this group, send email to windi...@googlegroups.com.

Hynek Urban

Oct 8, 2018, 8:00:47 AM
to Jiří Chadima, windi...@googlegroups.com
Hi Jirka,

just to add my feedback - what you describe seems more or less comprehensible to me and it sounds good! A few thoughts:

- The diagram doesn't show any data flowing out of permanent storage, ever. You describe it quite well in the text; so I assume the arrows leading to "Price computation" and "Indexer" should perhaps originate in the "Permanent storage"?

- It seems to me that the benefit of using REST over GraphQL in terms of consistency with other parts of the WT platform would be hard to beat.

- Finally, let me express a heretical thought: if we assume that we need to run the "resync cron job" regularly, and further that the cron job has to crawl through all the available data in WT and thus has to be reasonably fast, is there a reason to keep the data mirrored in "Permanent storage" at all? I mean, if we only use it to prepopulate the index storages in cases of corruption, and if we're able to do the same reasonably quickly from the distributed WT platform data (which is implied by the first two assumptions), does a "permanent storage" bring us more than it costs us? (Complexity, backups, ...).

I'm not convinced that my last proposition is correct; it's just a fleeting thought of mine, possibly completely misguided.

Regards,
Hynek


--
You received this message because you are subscribed to the Google Groups "Winding Tree" group.
To unsubscribe from this group and stop receiving emails from it, send an email to windingtree...@googlegroups.com.
To post to this group, send email to windi...@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.



Jirka Chadima

Oct 8, 2018, 8:51:38 AM
to Hynek Urban, windi...@googlegroups.com
> - The diagram doesn't show any data flowing out of permanent storage, ever. You describe it quite well in the text; so I assume the arrows leading to "Price computation" and "Indexer" should perhaps originate in the "Permanent storage"?
Yeah, that's true, I was never good with arrows. I tried to show the communication flow, but the Storage itself does not initiate anything. I think that the arrows into Price computation (PC) and Indexer are coming correctly from the Crawler, because Crawler would initiate the process. However, the Indexer and PC rely on the data from the permanent storage. Perhaps it's time to add a second type of arrows for data retrieval. I'll update the diagram on https://github.com/windingtree/wt-search-api/blob/master/README.md

> - It seems to me that the benefit of using REST over GraphQL in terms of consistency with other parts of the WT platform would be hard to beat.
I sort of agree, but
  - I imagine that the user base of the Query API might be quite different from that of the low-level platform (so there's no need for consistency).
  - I really don't want to invent a custom query language over REST (logical AND and OR etc.). Also, I'm not really sure GraphQL provides such expressivity at all.
  - The ideal interface for me would be SQL :-) I don't think we need to make any decision now; we should evaluate both options once we need to.
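To illustrate what "logical AND and OR" expressivity means independently of the transport (REST query strings, GraphQL or SQL), a filter can be represented as a small expression tree. The tuple-based format below is invented for this example and is not a proposal for the Query API.

```python
from typing import Any, Dict

# A filter is ("and", [filters]), ("or", [filters]) or ("eq", field, value): a hypothetical format.
Filter = tuple


def matches(hotel: Dict[str, Any], flt: Filter) -> bool:
    """Evaluate a nested AND/OR filter against one hotel document."""
    op = flt[0]
    if op == "and":
        return all(matches(hotel, sub) for sub in flt[1])
    if op == "or":
        return any(matches(hotel, sub) for sub in flt[1])
    if op == "eq":
        _, field, value = flt
        return hotel.get(field) == value
    raise ValueError(f"unknown operator: {op}")


query = ("and", [("eq", "city", "Los Angeles"), ("or", [("eq", "stars", 4), ("eq", "stars", 5)])])
print(matches({"city": "Los Angeles", "stars": 5}, query))  # True
print(matches({"city": "Los Angeles", "stars": 3}, query))  # False
```

Encoding even this small nested tree in flat query-string parameters gets awkward quickly, which is the concern about inventing a custom query language over REST.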

> - Finally, let me express a heretic thought - ....
My assumptions were a little different; let me try to explain myself a little better.

> if we assume that we need to run the "resync cron job" regularly....
I assume that every hotel would have a different schedule of the resync job. If a hotel does not use update-api, I can see a refresh happening every 2 minutes. If a hotel has update-api, I can see a refresh every hour. It's meant primarily as a failsafe.

> if we assume that the cronjob has to crawl through all the available data in WT and thus has to be reasonably fast to achieve that
I assume that we won't crawl all of the data at the same time, so if it takes a little longer for one hotel, it doesn't matter.
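The per-hotel resync schedule described above can be sketched as a priority queue of due times. The two intervals (2 minutes without update-api, 1 hour with it) come from the message; the function names and the in-memory heap are assumptions. Because each hotel has its own due time, the crawl is spread out rather than done for all hotels at once.

```python
import heapq
from typing import Dict, List, Tuple

INTERVAL_WITHOUT_UPDATE_API = 2 * 60  # seconds: hotel sends no update-api notifications
INTERVAL_WITH_UPDATE_API = 60 * 60    # seconds: resync is only a failsafe here


def resync_interval(uses_update_api: bool) -> int:
    """Pick the resync period for one hotel."""
    return INTERVAL_WITH_UPDATE_API if uses_update_api else INTERVAL_WITHOUT_UPDATE_API


def due_hotels(queue: List[Tuple[float, str]], now: float,
               uses_update_api: Dict[str, bool]) -> List[str]:
    """Pop every hotel whose resync is due and reschedule it by its own interval."""
    due = []
    while queue and queue[0][0] <= now:
        _, hotel_id = heapq.heappop(queue)
        due.append(hotel_id)
        # Intervals are positive, so a rescheduled hotel is never due again in this call.
        heapq.heappush(queue, (now + resync_interval(uses_update_api[hotel_id]), hotel_id))
    return due


uses_update_api = {"hotel-a": True, "hotel-b": False}
queue = [(0.0, "hotel-a"), (0.0, "hotel-b")]
heapq.heapify(queue)
print(due_hotels(queue, now=0.0, uses_update_api=uses_update_api))    # ['hotel-a', 'hotel-b']
print(due_hotels(queue, now=120.0, uses_update_api=uses_update_api))  # ['hotel-b']
```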

> is there a reason to keep the data mirrored in "Permanent storage" at all? I mean, if we only use it to prepopulate the index storages....
I understand your point, and it doesn't sound heretical at all. Maybe my design is overkill, I don't know. But given my prior experience with systems relying entirely on data in an unreliable 3rd-party storage, I opted to put a local data copy in the design. I think it (at least) really helps with development and debugging, because you can quickly see why the system is doing what it's doing. The major selling point for me is that, since the data in the platform can change at any time, you keep track of the source data your indexed data is based on, so you can better understand what is going on or even track down where a problem originated.
Another point where I can see a clear benefit is the price computation. If we don't know which prices we are supposed to pre-compute, every price search would result in hitting a primary, unreliable storage. I just feel that the data would quickly become an unmaintainable mess. But maybe I'm just too careful...

I was trying to design it defensively, so that the system can (at least for some time) survive if any other part breaks down. In general, I'm quite sceptical about the reliability of computers :-)



mensc...@gmail.com

Oct 9, 2018, 12:41:08 AM
to Winding Tree
Great job Jirka! Exactly what I needed. 

Regarding "duplicating the DB" from one node to another:
we talked with Robin about how it would be a good idea if anyone could run this Search Node as a service and make a business out of it; this would allow those operators to cover the cost of storage and computation (I wrote about it in my doc).
I think the owners of Nodes won't be very happy to give their DB for free to new Node runners (their new competitors). So maybe it could be a WT service for some fee (Lif?): we run our own Node and allow new Node owners to duplicate the DB from us, kind of selling the DB. Of course, they can run a Node and wait until it fills its local DB, but with our service they would save a lot of resources.


On Monday, October 8, 2018, 13:07:11 UTC+5, Jirka Chadima wrote:

mensc...@gmail.com

Oct 9, 2018, 12:59:47 AM
to Winding Tree
Regarding permanent storage:
I agree that it will be much safer to have a copy of the data in Permanent storage. There are different situations where it can be useful:
- e.g. if a Node reboots, it's faster to let Price Computation and Indexer fill the in-memory DBs from Permanent Storage rather than re-sync from the Platform.
- a Node wants to share its DB with another Node.

Maybe we could make the Permanent Storage optional (e.g. for Node runners without many storage resources), if it turns out to be worth doing.

Micro-services are awesome!

On Monday, October 8, 2018, 17:51:38 UTC+5, Jirka Chadima wrote:

Stephen Burke

Oct 9, 2018, 5:21:33 AM
to mensc...@gmail.com, windi...@googlegroups.com
We need to get some data on lead-time-to-arrival metrics for rate shoppers, metas, etc., as this should play into how the service is designed. If, for example, hotels provide preferential rates on the WT platform due to its direct nature, you'll have an unbelievable amount of shopping. Most companies use a shopping cache to handle that traffic and only hit the real data store when it appears that there is a revenue transaction.

I'll work on getting some data.
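A shopping cache of the kind described above can be sketched as a TTL cache in front of the slow source: look-to-book traffic is answered from the cache, and only a booking attempt forces a fresh read. The TTL value, the `fetch_live_price` callable and the `ShoppingCache` class are assumptions for illustration; an injectable clock is used so the behaviour can be demonstrated deterministically.

```python
import time
from typing import Callable, Dict, Hashable, Tuple


class ShoppingCache:
    """Serve shopping queries from a TTL cache; bypass it when a booking is about to happen."""

    def __init__(self, fetch_live_price: Callable[[Hashable], float], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.fetch_live_price = fetch_live_price  # hypothetical call to the real data store
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries: Dict[Hashable, Tuple[float, float]] = {}  # key -> (price, stored_at)

    def shop(self, key: Hashable) -> float:
        """Look-to-book traffic: return a cached price if it is fresh enough."""
        entry = self._entries.get(key)
        now = self.clock()
        if entry is not None and now - entry[1] < self.ttl:
            return entry[0]
        return self._refresh(key, now)

    def price_for_booking(self, key: Hashable) -> float:
        """Revenue transaction: always hit the real data store and update the cache."""
        return self._refresh(key, self.clock())

    def _refresh(self, key: Hashable, now: float) -> float:
        price = self.fetch_live_price(key)
        self._entries[key] = (price, now)
        return price


calls = []


def fetch(key):
    calls.append(key)
    return 100.0


fake_now = [0.0]
cache = ShoppingCache(fetch, ttl_seconds=300, clock=lambda: fake_now[0])
cache.shop("0xabc")
cache.shop("0xabc")               # served from the cache
cache.price_for_booking("0xabc")  # forces a live lookup
print(len(calls))  # 2
```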

--
You received this message because you are subscribed to the Google Groups "Winding Tree" group.
To unsubscribe from this group and stop receiving emails from it, send an email to windingtree...@googlegroups.com.
To post to this group, send email to windi...@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.


--
Best regards,

Stephen Burke
SVP & Practice Principal, Travel and Hospitality | Sciant AD
 
m: +420 776 491 795
e: ste...@sciant.com

Book appointments here: https://calendly.com/stephen-sciant


Jirka Chadima

Oct 9, 2018, 6:23:58 AM
to Winding Tree
We talked with Hynek about making it optional, but there are two strong points against that:

1) subscriptions - They need some sort of storage anyway (you need to keep track of what's related to which hotels etc.), so you would need some sort of local database as well.
2) incoming partial data - In some cases, you might need more data than what comes from the update-api. Imagine this: you get a notification about a change in a rate plan, so you download a fresh set of rate plans. But for some precomputed data, you also need availability, and without a local permanent storage you don't have it. So you have to get it from the off-chain storage, and that is potentially slow.
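The "incoming partial data" point can be illustrated like this: when a notification says only the rate plans changed, the node re-downloads the rate plans but takes availability from its local copy, so nothing slow is needed from off-chain storage. The notification shape, the `ratePlans`/`availability` keys and the `fetch_part` callable are hypothetical.

```python
from typing import Callable, Dict


def handle_notification(notification: Dict[str, str],
                        permanent: Dict[str, Dict[str, dict]],
                        fetch_part: Callable[[str, str], dict]) -> Dict[str, dict]:
    """Refresh only the changed part of a hotel and combine it with the locally stored rest."""
    hotel_id, part = notification["hotel"], notification["changed"]
    hotel = permanent.setdefault(hotel_id, {})
    hotel[part] = fetch_part(hotel_id, part)  # hypothetical Read API call for one data part
    # Pre-computation needs both parts; availability comes from the local copy, not off-chain.
    return {"ratePlans": hotel.get("ratePlans", {}), "availability": hotel.get("availability", {})}


permanent = {"0xabc": {"ratePlans": {"old": 90}, "availability": {"2018-12-24": 5}}}
fetched = []


def fetch_part(hotel_id, part):
    fetched.append(part)
    return {"standard": 100}


inputs = handle_notification({"hotel": "0xabc", "changed": "ratePlans"}, permanent, fetch_part)
print(fetched)                 # ['ratePlans']
print(inputs["availability"])  # {'2018-12-24': 5}
```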

Robin Gottfried

Oct 10, 2018, 4:43:17 AM
to mensc...@gmail.com, windi...@googlegroups.com
Alex,

the idea of p2p synchronisation is interesting. Still, I am missing the motivation for this feature. What problem do you want to solve by p2p sync? Who will benefit from it?

R

Alexander Menschikov

Oct 10, 2018, 5:02:56 AM
to robin.g...@fragaria.cz, windi...@googlegroups.com
Robin, 

 I wrote "the owners of Nodes won't be very happy to give their DB for free to new Node runners (their new competitors)." So I don't think P2P sync will be 100% required by market players.

The only situation in which someone would want to "sell the DB" to a new Node is, I guess, if the initial sync between a new Search Node and the Platform takes too much time, and the new owner would pay to get the DB sooner.
Winding Tree could use this situation too: run its own Search API node and provide P2P sync to others for money (maybe this could even be another product).
I don't see any need to design this feature now; it's just a suggestion of what could happen.
 



Alexander Menschikov


On Wed, Oct 10, 2018 at 13:43, Robin Gottfried <robin.g...@fragaria.cz> wrote:

Vladimir Linhart

Nov 30, 2018, 10:03:45 AM
to Winding Tree
Hello,

There is the hotel location search, but I can't find a way to do this kind of search:
- location + number of persons + dates -> give me hotels + prices

I can't find any API docs either...

Thanks for pointers,

Vladimir

Jakub Vysoky

Nov 30, 2018, 10:09:10 AM
to Vladimir Linhart, windi...@googlegroups.com
Hey Vladimir!

Our demo explorer uses a running instance of our search API. The API documentation is in the repository [1].


It is a work in progress (the search was actually integrated into the explorer yesterday), but feel free to ask more questions!

Cheers!

--
You received this message because you are subscribed to the Google Groups "Winding Tree" group.
To unsubscribe from this group and stop receiving emails from it, send an email to windingtree...@googlegroups.com.
To post to this group, send email to windi...@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Vladimir Linhart

Nov 30, 2018, 11:11:25 AM
to Jakub Vysoky, windi...@googlegroups.com
Ha, I'm keeping up with the progress it seems :-)

What I see in the Swagger docs is only hotel search, not even availability...

So it can only search hotels, right?

Jirka Chadima

Dec 3, 2018, 3:25:07 AM
to vladimir...@gmail.com, Jakub Vysoky, windi...@googlegroups.com
Hello Vladimir,

good to hear from you again. Right now, the search API is very basic and far from complete, and it can only search hotels by location (and also sort them, although we're still working on that part a little bit).

So the flow that would work right now to book a hotel would be:
1) Search by location using the Search API (https://playground-search-api.windingtree.com/docs or https://github.com/windingtree/wt-search-api/blob/develop/docs/swagger.yaml) and get hotel IDs (Ethereum addresses)
2) Get the data for those hotels from the Read API
3) Compute the price, cancellation terms and all other customer-related things client-side
4) Send a booking request to a hotel's bookingUri (Booking API instance - interface at https://mazurka-booking.windingtree.com/docs/ or https://github.com/windingtree/wt-booking-api/blob/master/docs/swagger.yaml). We have deployed the booking API only for the Mazurka hotel (0x9d226A90C5a4c711c92e23cD246e9eb77d2D21f9) in the playground, and the sample implementation is not production-ready at all.

Ad step 2 - There is a plan to make the Search API compatible with the Read API as much as possible, so you wouldn't have to talk to the Read API at all in this use case.
Ad step 3 - Our plan is to eventually support offers (computed prices) in Search API but we're far from that. See the architecture proposal on https://github.com/windingtree/wt-search-api/blob/develop/README.md#proposed-architecture for more details.
Ad step 4 - This is only one way of doing it and we might (and probably will) come up with a solution that would benefit from blockchain and/or other distributed technologies.
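Step 3 (computing the price client-side) might look roughly like this for each hotel returned by the search. The data shape is simplified and hypothetical, not the actual WT data format: a single per-night rate plus per-day counts of free rooms. Real rate plans, restrictions and cancellation terms would add much more logic.

```python
from datetime import date, timedelta
from typing import Dict, Optional


def estimate_price(rate_per_night: float, availability: Dict[date, int],
                   arrival: date, departure: date, rooms: int = 1) -> Optional[float]:
    """Return the total price for the stay, or None if any night lacks enough free rooms."""
    if departure <= arrival:
        raise ValueError("departure must be after arrival")
    total = 0.0
    night = arrival
    while night < departure:
        if availability.get(night, 0) < rooms:
            return None  # not bookable for these dates
        total += rate_per_night * rooms
        night += timedelta(days=1)
    return total


avail = {date(2018, 12, 24): 2, date(2018, 12, 25): 1}
print(estimate_price(100.0, avail, date(2018, 12, 24), date(2018, 12, 26)))           # 200.0
print(estimate_price(100.0, avail, date(2018, 12, 24), date(2018, 12, 26), rooms=2))  # None
```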

In the explorer itself, this can be done by going to the map search, looking up a hotel, then filling in guest data (the "Get estimates" button - it's probably a stupid name), and then (if you're lucky - the data is totally random right now) you should see a "Book this" button for every room where booking is possible.

I hope this clears things up at least a little bit. We're constantly adding more functionality to various pieces of the platform, so it's kind of hard to keep an up-to-date document somewhere. We hope that common use cases will emerge and that we will document them in a stable way in the near future. A good starting point for a newcomer developer is probably the document at https://github.com/windingtree/wiki/blob/master/developer-resources.md

Vladimir Linhart

Dec 10, 2018, 8:18:05 AM
to Winding Tree
Hello Jirka,

I think I get it. It seems I'll wait until availability search is part of the search API.
Are you planning to host this API or is it intended for the clients of the API to also host it?

Vladimir

Jirka Chadima

Dec 11, 2018, 3:24:47 AM
to Vladimir Linhart, windi...@googlegroups.com
> Are you planning to host this API or is it intended for the clients of the API to also host it?

We will surely host an instance to show how it works. I'm not sure if we will operate it as a full-fledged SaaS, that decision is probably a little further in the future. But it will remain open-source so it should be fairly easy to run it by anybody.