Hi Ilya,
I think this can be done as follows. First, apply a window operation on the first event stream to create a windowed stream; each RDD of that stream contains the data of the last 30 minutes. Then join this windowed stream against the second event stream to create a stream of the joined data. Note that join is defined on DStreams of (key, value) pairs, so both streams may need a map to get them into that form first. The code will look something like this, probably with a few more maps and other transformations as well:
import org.apache.spark.streaming.Minutes

// Each RDD of this DStream will be the unified data of the last 30 minutes.
val windowedEventStream = firstEventStream.window(Minutes(30))
// join requires both DStreams to contain (key, value) pairs.
val joinedStream = secondEventStream.join(windowedEventStream)
joinedStream.foreachRDD { joinedRDD =>
  // do something with the joined RDD, e.g. push it to a database
}
Underneath, since all the data transformations are executed as deterministic RDD transformations, you automatically get an exactly-once guarantee for every transformation (map, join, reduce, filter, etc.) despite worker failures. For output operations like foreachRDD (shown above), however, the operation may get executed more than once. Unlike the deterministic, functional (i.e., external-side-effect-free) transformations like map and reduce, output operations are not side-effect free, since arbitrary computation can be done with the data. So it is up to the user to write the data out in a way that ensures exactly-once semantics. For example, if you are updating a database, you could design the updates to be idempotent: updating the same record twice with the same data makes no difference.
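To make that concrete, here is a minimal sketch of an idempotent output operation. It assumes the joined values have been flattened into (id, payload) string pairs and written to a hypothetical PostgreSQL table events(id TEXT PRIMARY KEY, payload TEXT); the connection URL and table name are illustrative, not part of any real setup. Because each row is keyed by id and rewritten with the same data, re-executing a batch leaves the database in the same final state.

import java.sql.DriverManager

// Flatten the (key, (value1, value2)) join output into (id, payload) pairs (illustrative).
val outputStream = joinedStream.map { case (id, (a, b)) => (id.toString, s"$a|$b") }

outputStream.foreachRDD { rdd =>
  rdd.foreachPartition { records =>
    // One connection per partition, created on the worker rather than the driver.
    val conn = DriverManager.getConnection("jdbc:postgresql://db-host/mydb") // hypothetical URL
    val stmt = conn.prepareStatement(
      "INSERT INTO events (id, payload) VALUES (?, ?) " +
      "ON CONFLICT (id) DO UPDATE SET payload = EXCLUDED.payload") // PostgreSQL upsert
    try {
      records.foreach { case (id, payload) =>
        stmt.setString(1, id)
        stmt.setString(2, payload)
        stmt.executeUpdate() // writing the same (id, payload) twice ends in the same row
      }
    } finally {
      stmt.close()
      conn.close()
    }
  }
}

Creating the connection inside foreachPartition keeps it from being serialized with the closure and amortizes the setup cost over a whole partition instead of paying it per record.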
Let me know if you need more clarifications.
TD