Hey folks,
Long post, sorry!
For the past week or so I’ve been playing around, trying to put together a “real-world”(ish) app that uses KSQL, in order to get a better idea of where our strengths and weaknesses are. The idea being: we need to make it super fun and easy for devs to write apps with KSQL in order to drive adoption.
The app I’ve chosen to build is modelling a simple retail business using event sourcing principles. It contains various topics containing things like: line_items, shopping basket events (add/remove), orders, and warehouse stock events (add/remove). There will be various aggregations that provide materialized views for things like: Current shopping basket states and current warehouse stock.
The app will allow the user to view the current catalogue, add/remove items to their basket, and place orders. There will be pages to show current basket state for a user (pull query) and current warehouse stock (pull query), and other pages showing live updating reports (push query to browser) for things like total orders value.
I’ve spent some time trying to build such an app using KSQL but I've hit some stumbling blocks, including:
To get shopping basket state for a user I need to execute a pull query of the form “SELECT * FROM BASKET_STATES WHERE USER_ID=?”. This is currently not possible, as we don’t support pull queries that return multiple values and have a WHERE clause on a non-key field. To work around this I have had to set up Connect to dump the aggregate table state into a JDBC table and read from that directly in the app using a JDBC client. It took me about a day to get Connect working properly (figuring out the configuration), and the app needs to use the JDBC client directly, adding another layer of complexity.
Chunked response pull queries are not easily usable in the app. The chunks do not correspond to whole rows, so some tricky parsing is necessary in the app to re-assemble the chunks into rows before they can be handled. It's not reasonable to expect the app developer to do this.
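To make the pain concrete, here is a minimal sketch of the re-assembly logic every app currently has to write itself. It assumes rows are newline-delimited in the response body (an assumption about the wire format for illustration); chunks can split a row at any byte, so the app must buffer partial rows:

```java
import java.util.ArrayList;
import java.util.List;

// Buffers HTTP chunks and emits only complete rows.
// Assumes a newline delimiter between rows (illustrative).
public class RowAssembler {
    private final StringBuilder buffer = new StringBuilder();

    // Feed one raw chunk; returns any complete rows it unlocked.
    public List<String> onChunk(String chunk) {
        buffer.append(chunk);
        List<String> rows = new ArrayList<>();
        int nl;
        while ((nl = buffer.indexOf("\n")) >= 0) {
            String row = buffer.substring(0, nl).trim();
            if (!row.isEmpty()) {
                rows.add(row);
            }
            buffer.delete(0, nl + 1); // drop the consumed row + delimiter
        }
        return rows;
    }
}
```

This is exactly the kind of plumbing a client library should hide from the app developer.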
The websocket pull query endpoint works, but is not documented, so we can’t expect users to use that currently. This means that right now there is no reasonable way for KSQL users to get pull query results in their apps.
The app needs to send messages to Kafka topics, e.g. to place an order, or to represent shopping basket events. This means the app also needs to use the Kafka client directly.
In order to create a meaningful app the user currently has to use up to 4 different clients (JDBC, Kafka, HTTP for the REST API, and possibly a WebSocket client for the ws endpoint). That's really confusing and hard to set up for the app developer. I think we need to provide a KSQL client to improve this.
Also, streaming over the current HTTP/REST or (undocumented) WebSocket API doesn't provide an effective solution for app developers. We need a better way of doing this that separates out the stream-oriented operations (push queries, pull queries, inserting) and implements them in a way more suitable for high-throughput streaming.
My next step is to create a prototype. The prototype will:
Flesh out the beginning of a KSQL client (initially in Java). The new client will allow:
* Inserting messages into streams
* Pull queries
* Push queries
The idea is for the app to do everything it needs to do to create an awesome streaming app using one client.
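As a strawman, the client surface could look something like the interface below. All names and signatures here are illustrative assumptions, not a committed API; the point is just that insert, pull, and push fit naturally into one client:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

// Hypothetical shape of the proposed KSQL client. One client covers
// all three operations, replacing the JDBC/Kafka/HTTP/WebSocket mix.
public interface KsqlClient {
    // Insert a message into a stream (replaces direct Kafka producer use).
    CompletableFuture<Void> insertInto(String stream, Map<String, Object> row);

    // Pull query: bounded result set, delivered once complete.
    CompletableFuture<List<Map<String, Object>>> pullQuery(String sql);

    // Push query: unbounded, rows delivered to the handler as they arrive;
    // closing the returned handle terminates the query.
    AutoCloseable pushQuery(String sql, Consumer<Map<String, Object>> rowHandler);
}
```

Pull queries return a future of a complete result because they are bounded; push queries take a handler because they never complete on their own.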
New server-side streaming API - this will provide a simple binary protocol (most likely over WebSockets and/or TCP) for executing and streaming the results of pull or push queries and for handling inserts. These kinds of operations are inherently stream oriented and not suitable for an HTTP/REST API.
I will try to hack together an implementation of pull queries supporting non-key selects, probably by creating a new Kafka Streams state store implementation that allows a third-party relational DB to be used for the state store. I can use something simple and embedded like Apache Derby for the prototype/quick out-of-the-box experience, but we could make this configurable by the user. This will be a hack initially.
I think if we tackle the above then it will really open up KSQL to a lot more real-world use cases, and make the app development experience really great.
--
You received this message because you are subscribed to the Google Groups "ksql-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ksql-dev+u...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/ksql-dev/7b18983f-c1ab-474b-be31-37dfb8c1f3cb%40googlegroups.com.
Thanks for taking the time to do this! This "end-user" perspective is very essential atm.

On the stumbles:

- On non-key field queries, as you might guess this needs secondary indexing. RocksDB does not offer secondary indexes atm. We could think about some form of local secondary indexing by leveraging transactions support and double-writing to two tables... but all this needs a lot of design and implementation across Kafka Streams + KSQL. For now, could we create another "table" off BASKET_STATES, aggregating by users, where we store a set/list of items keyed by user id? I know this won't be consistent with BASKET_STATES all the time... but a lot of NoSQL stores offer eventually consistent global secondary indexes, and users seem to find that at least a workable solution?
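In toy form, the suggestion is to maintain a second, derived table keyed by user id and fed by the same basket events, so the lookup becomes a plain key lookup (names are illustrative; and because the two tables are updated independently, the derived one is only eventually consistent with BASKET_STATES):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Derived table keyed by user id, built from the same basket events
// that build BASKET_STATES. Acts like a global secondary index.
public class BasketsByUser {
    private final Map<String, List<String>> itemsByUser = new HashMap<>();

    // Apply one add/remove basket event.
    public void onBasketEvent(String userId, String item, boolean added) {
        List<String> items = itemsByUser.computeIfAbsent(userId, k -> new ArrayList<>());
        if (added) {
            items.add(item);
        } else {
            items.remove(item);
        }
    }

    // "SELECT * FROM BASKET_STATES WHERE USER_ID = ?" becomes a key lookup.
    public List<String> basketFor(String userId) {
        return itemsByUser.getOrDefault(userId, List.of());
    }
}
```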
- On chunked responses: +1. This was my core concern as well, from our internal discussions; I think it got lost in translation. Even when we are returning full JSONs out of push queries now, it's not streaming anymore, i.e. each chunk is not consumable by itself (I thought that was the original intent of that API). We need to get our client story straight IMHO.
- Sorry, maybe I am missing something: can't the app do an insert statement via KSQL instead of directly talking to Kafka? +1 on abstracting Kafka away from the user.
- On separating pull queries, inserts, and push queries: again +1, they are very different, and the clean way is to have different resources for them at the server. Everything from thread pools to connection management is different.
My only suggestion for the prototype would be: can we start a KLIP first on the client redesign? I don't intend to stop your experimenting; by all means, please prototype, and it will help drive requirements. I'm just saying we need a ground-up rethink before we take the next step, to avoid getting into the same state again :)
> I will try and hack together an implementation of pull queries supporting non key selects probably by creating a new KS state store

unless you are planning to keep this consistent with your original table by hacking this into Kafka Streams itself,
Thanks for writing this up Tim! I think we should push to develop with this level of user experience in mind regularly. I have a few high-level, rough, and arguably constructive comments:

- From the API perspective, I think it's important to limit the number of public APIs and protocols that we support, designing them in tandem so that they each have clear and distinct responsibilities with little overlap. Unfortunately, this probably means we should also invest up front to get a big-picture view (cf. what Vinoth said about clients) so we don't end up with the mish-mash we have today.
- With regards to the streaming protocol, is there anything we can leverage from Kafka instead of trying to reinvent the wheel? They have years of experience building an optimized streaming protocol, and it turns out it's not easy to write a good one.
- Similarly with clients, we should take care to learn from Kafka and its experience with heavyweight clients. Kafka has tons of clients, some third party and some that don't properly implement the protocol, and it causes a massive development and support burden. Could we get away with pretty lightweight clients?
Thanks for starting this discussion, Tim. I think developing a coherent application development lifecycle is a critical piece for KSQL right now. I only have one question, which is more for my edification:

> Chunked response pull queries are not easily usable in the app. The chunks do not correspond to whole rows(s) so some tricky parsing on the app is necessary to re-assemble the chunks into rows so they can be handled by the app. It's not reasonable to expect the app developer to do this.

Can you share the example response output for pull / push queries today? The discussions here have tended to be abstract, and writing down the exact request/response would help solidify the gaps in my mind.
Great initiative!
About the "state store hack": this will be rather difficult. The current
API of Kafka Streams requires `KeyValue<Bytes, byte[]>` stores to
be plugged into the DSL operators.
Similarly, a `ReadOnlyKeyValueStore` interface is exposed via the IQ API. It
_might_ be possible to hack around both to some extent, but I'm not sure
it's actually possible to pull it off.
However, even if you can manage the first two issues, how do you know
which instance to query? Stores are partitioned by the primary key, and
Kafka Streams has support to detect which instance hosts which store
partitions based on the key. But there is no support for secondary
indexes. Hence, the only way I see this could be implemented, given
the limitations of Kafka Streams, is a lookup into all
instances/store-partitions (maybe exploiting a local secondary index to
avoid a full table scan).
Just want to point out the expected issues...
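That fan-out pattern can be sketched as a scatter-gather: query every instance in parallel and merge the partial results. The per-instance query function below is a stand-in for a real remote call (all names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Scatter-gather over all instances/store-partitions: with no secondary
// index, a non-key lookup must be sent everywhere and the results merged.
public class ScatterGather {
    public static List<String> queryAll(List<String> instances,
                                        Function<String, List<String>> queryInstance) {
        // Fan out to all instances in parallel ...
        List<CompletableFuture<List<String>>> futures = new ArrayList<>();
        for (String instance : instances) {
            futures.add(CompletableFuture.supplyAsync(() -> queryInstance.apply(instance)));
        }
        // ... then gather and merge the partial results.
        List<String> merged = new ArrayList<>();
        for (CompletableFuture<List<String>> f : futures) {
            merged.addAll(f.join());
        }
        return merged;
    }
}
```

The cost is that every query touches every instance, which is why a local secondary index per store-partition would still help: it avoids a full scan within each instance even though the fan-out remains.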
For building a KSQL client: for push queries, did you consider just
running a `KafkaConsumer` in the client and reading directly from the
result topic? Cutting the ksqlServer out of the communication path might
be desirable from a performance point of view. Furthermore, we wouldn't
need to design any protocol.
For the ksql-client: My understanding was that a transient query would
write its result into a topic. If this is not true, my proposal does of
course not work. However, considering EOS, it might actually be an
advantage to re-route the result data through an output topic.
Yes, a persistent query obviously creates an output topic. However, I
asked (I think it was Almog) recently how transient queries work, and he
explained to me that transient queries also create an (internal) output
topic, and read that output topic with a KafkaConsumer server-side to
serve the client.