Hi Shafi,
This is because MR3 does not know what containers to create until it receives the first query from Hive. The first query specifies the memory size of containers, the DaemonTasks to run in each container (such as the MR3 shuffle handler and the LLAP I/O daemon), whether containers should be reused across queries, and so on. Moreover, the containers created for the first query are not reused for the next query if cross-DAG container reuse is disabled. So DAGAppMaster has to wait for the first query in order to decide what containers to create. In essence, as an execution engine, MR3 does not know anything about its applications. (For example, we are currently looking into running both Hive and Spark with a common MR3 DAGAppMaster.)
Here is my initial thought on your use case (where I assume autoscaling is enabled). To solve your issue, two things would be needed:
1) MR3 should be extended so that the first query would create a fixed number of containers right away, instead of creating the first few containers incrementally.
2) At the same time, your application could submit a dummy query in order to let MR3 know what containers to create.
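For 2), a minimal sketch of what the dummy query could look like on the application side. It assumes you already have a DB-API 2.0 connection to HiveServer2 (for example from a Python Hive client library); warm_up and the query text are hypothetical names for illustration, not part of MR3 or Hive. Note that the query has to actually launch a DAG: depending on your Hive settings, a trivial query may be answered without creating any containers, so you may need a query that scans a small table. Also, the containers are only useful for later queries if cross-DAG container reuse is enabled.

```python
import time


def warm_up(conn, query):
    """Run one throwaway query so that MR3 receives its first DAG.

    conn:  any DB-API 2.0 connection to HiveServer2 (assumption: opened
           elsewhere with a Hive client library of your choice)
    query: a query that really launches a DAG; which table it reads is
           up to the caller
    """
    cursor = conn.cursor()
    try:
        started = time.monotonic()
        cursor.execute(query)
        # Fetch the result so the query runs to completion.
        rows = cursor.fetchall()
        elapsed = time.monotonic() - started
        print(f"warm-up query finished in {elapsed:.1f}s")
        return rows
    finally:
        cursor.close()
```

You would call this once right after the application starts, before the first real query, e.g. warm_up(conn, "SELECT COUNT(*) FROM some_small_table") with a table of your own.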
Would this approach solve your issue?
Cheers,
--- Sungwoo