Hi Alek,
You are correct that this isn’t currently supported aside from the inefficient approach of using drJoin on singleton chunks. Our main goal with datadr is to provide a simple interface and analysis paradigm for key/value-style large data sets. Data frames are a special and common case of data in this paradigm, but we aren’t planning full support for the kinds of operations you would get in an RDBMS or in data-frame-specific packages like dplyr.
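In case it helps, here is roughly what that workaround looks like. This is just a sketch: dfA, dfB, and the merge column "id" are made-up names, and it assumes the merge variable is a reasonable division key.

```r
# Sketch of the current workaround: key both data frames by the merge
# variable so matching rows share a key, then pair them up with drJoin.
# dfA, dfB, and "id" are hypothetical.
library(datadr)

a <- divide(ddf(dfA), by = "id")   # one (possibly singleton) subset per "id"
b <- divide(ddf(dfB), by = "id")

# each resulting key holds list(a = <rows of dfA>, b = <rows of dfB>)
joined <- drJoin(a = a, b = b)
```

This works, but when the merge key is nearly unique you end up with a huge number of tiny subsets, which is why it is inefficient.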
However, I don’t think it would be too difficult to add, and it would be a good thing to support. The approach I have in mind is to accept input ddfs of arbitrary chunking and write a custom map/reduce procedure: the map breaks the data up either row-wise or according to a hash function applied to the columns being merged on, and the reduce performs the merge. If you have a use case you’d like to see this feature for, please file an issue on GitHub.
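To make that concrete, here is a very rough sketch of what the map and reduce expressions might look like with mrExec. Everything specific here is an assumption for illustration: it supposes the two ddfs have already been tagged with a ".src" column ("a" or "b") and combined into a single input called combined, that the merge column is named "id", and it borrows the digest package for hashing.

```r
# Rough sketch only, not an implementation: hash-partition rows on the
# merge column in the map, then merge within each hash bucket in the reduce.
# Assumes "combined" is a ddo whose values are data frames carrying a
# ".src" column ("a" or "b") and a merge column "id" (all hypothetical).
library(datadr)
library(digest)

nbuckets <- 100L  # arbitrary; controls how much data each reduce key sees

mergeMap <- expression({
  for (v in map.values) {
    # route each row to a bucket based on a hash of its merge key
    bucket <- vapply(as.character(v$id), function(x)
      strtoi(substr(digest(x), 1, 6), base = 16L) %% nbuckets, integer(1))
    for (b in unique(bucket))
      collect(b, v[bucket == b, , drop = FALSE])
  }
})

mergeReduce <- expression(
  pre = {
    parts <- list()
  },
  reduce = {
    parts <- c(parts, reduce.values)
  },
  post = {
    # split the collected pieces back out by source and merge on "id"
    a <- do.call(rbind, Filter(function(d) d$.src[1] == "a", parts))
    b <- do.call(rbind, Filter(function(d) d$.src[1] == "b", parts))
    if (!is.null(a) && !is.null(b)) {
      a$.src <- NULL
      b$.src <- NULL
      collect(reduce.key, merge(a, b, by = "id"))
    }
  }
)

res <- mrExec(combined, map = mergeMap, reduce = mergeReduce,
  packages = "digest")
```

A real implementation would need to pick the number of buckets sensibly and handle non-matching keys according to the join type, but the basic shape is just hash-partition, then merge.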
Thanks,
Ryan