On Thu, Nov 17, 2022 at 04:19:01PM +0900, Sungwoo Park wrote:
> >
> > The worker nodes are configured with 50 GB, virtual disks and the
> > /data1/k8s/dag* directories get to about 35 GB used leaving about 5 GB
> > free before things stop. Is this likely a case of simply needing a
> > bigger disk to buffer any intermediate data or do we need to tweak the
> > Hive/MR3 configuration? It looks like IT created the worker nodes
> > with 100 GB, virtual disks but only used 50 GB for the root LV so
> > expanding them to 100 GB should be quite easy. I'll try that
> > tomorrow.
>
>
> If it is a problem related to disk space, you could either 1) try
> increasing the disk space for storing intermediate data, or 2) try deleting
> vertex-local directories. For 2), please see this page:
>
>
https://mr3docs.datamonad.com/docs/k8s/performance/performance-tuning-k8s/#deleting-vertex-local-directories
>
> Depending on the scale of your datasets and also the number of concurrent
> users, allocating only 35GB for a worker node might be too small. You could
> watch disk consumption while executing a typical query and then
> (conservatively) choose the amount of disk space to allocate to each worker
> node.
>
> If it is a problem related to memory consumption by Python processes, you
> could try increasing the memory size of ContainerWorker Pod and/or
> adjusting hive.mr3.container.max.java.heap.fraction (so as to allocated
> more memory to Python processes).
Neither increasing the local, disk space to 100 GB nor deleting
vertex-local directories fixed the failure. They just delayed the
inevitable hang where no more workers could get created and the query
was essentially stuck in limbo.
Increasing the local, disk space to 250 GB helped but ultimately still
fails. Here is the partial log from Hive.
[...]
INFO : 2022-11-17 18:56:29,113 Map 1: 1764(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:56:34,116 Map 1: 1766(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:56:34,116 Map 1: 1767(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:56:34,116 Map 1: 1768(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:56:34,116 Map 1: 1770(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:56:39,119 Map 1: 1771(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:56:39,119 Map 1: 1773(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:56:39,119 Map 1: 1774(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:56:44,122 Map 1: 1776(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:56:44,123 Map 1: 1778(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:56:49,126 Map 1: 1779(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:56:49,126 Map 1: 1780(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:56:49,126 Map 1: 1781(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:56:49,126 Map 1: 1782(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:56:54,129 Map 1: 1783(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:56:54,129 Map 1: 1784(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:56:59,132 Map 1: 1785(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:56:59,132 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:56:59,132 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:57:04,135 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:57:09,138 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:57:09,138 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:57:14,141 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:57:14,141 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:57:19,145 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:57:19,145 Map 1: 1772(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:57:24,148 Map 1: 1772(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:57:24,148 Map 1: 1772(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:57:29,151 Map 1: 1772(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:57:34,154 Map 1: 1772(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:57:34,154 Map 1: 1772(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:57:39,157 Map 1: 1772(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:57:39,157 Map 1: 1772(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:57:44,160 Map 1: 1773(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:57:44,161 Map 1: 1774(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:57:49,164 Map 1: 1775(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:57:49,164 Map 1: 1775(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:57:54,167 Map 1: 1775(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:57:54,168 Map 1: 1775(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:57:59,171 Map 1: 1775(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:58:04,174 Map 1: 1775(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:58:04,174 Map 1: 1775(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:58:09,177 Map 1: 1775(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:58:09,177 Map 1: 1775(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:58:14,180 Map 1: 1775(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:58:19,183 Map 1: 1777(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:58:19,184 Map 1: 1779(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:58:19,184 Map 1: 1781(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:58:19,184 Map 1: 1782(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:58:24,187 Map 1: 1783(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:58:24,187 Map 1: 1784(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:58:24,187 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:58:29,190 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:58:29,190 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:58:34,194 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:58:39,197 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:58:39,197 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:58:44,200 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:58:44,201 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:58:49,204 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:58:54,207 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:58:54,208 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:58:59,211 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:58:59,211 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:59:04,214 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:59:09,218 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:59:09,218 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:59:14,221 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:59:14,222 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:59:19,224 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:59:24,228 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:59:24,228 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:59:29,230 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:59:29,230 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:59:34,233 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:59:39,236 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:59:39,236 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:59:44,240 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:59:44,240 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:59:49,243 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:59:49,243 Map 1: 1782(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:59:54,246 Map 1: 1708(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:59:54,246 Map 1: 1708(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:59:54,246 Map 1: 1690(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:59:59,249 Map 1: 1690(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:04,253 Map 1: 1691(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:04,253 Map 1: 1691(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:09,256 Map 1: 1691(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:09,256 Map 1: 1692(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:14,259 Map 1: 1695(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:14,259 Map 1: 1698(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:14,260 Map 1: 1701(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:14,260 Map 1: 1703(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:19,262 Map 1: 1705(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:19,263 Map 1: 1703(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:19,263 Map 1: 1698(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:24,266 Map 1: 1698(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:24,266 Map 1: 1698(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:29,268 Map 1: 1684(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:29,268 Map 1: 1673(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:29,268 Map 1: 1669(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:34,271 Map 1: 1669(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:34,271 Map 1: 1670(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:39,274 Map 1: 1670(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:39,274 Map 1: 1671(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:39,274 Map 1: 1673(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:44,276 Map 1: 1645(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:44,276 Map 1: 1647(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:49,279 Map 1: 1648(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:49,279 Map 1: 1651(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:49,279 Map 1: 1656(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:49,279 Map 1: 1658(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:49,279 Map 1: 1662(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:54,282 Map 1: 1666(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:54,282 Map 1: 1668(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:54,282 Map 1: 1671(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:54,283 Map 1: 1676(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:59,285 Map 1: 1684(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:59,286 Map 1: 1691(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:59,286 Map 1: 1695(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:59,286 Map 1: 1697(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:59,286 Map 1: 1681(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:04,289 Map 1: 1665(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:04,289 Map 1: 1670(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:04,289 Map 1: 1677(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:04,289 Map 1: 1681(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:09,292 Map 1: 1683(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:09,292 Map 1: 1684(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:09,292 Map 1: 1686(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:14,295 Map 1: 1688(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:14,295 Map 1: 1692(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:14,295 Map 1: 1695(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:14,295 Map 1: 1697(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:19,299 Map 1: 1699(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:19,299 Map 1: 1704(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:19,299 Map 1: 1708(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:24,302 Map 1: 1709(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:24,302 Map 1: 1711(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:24,302 Map 1: 1713(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:24,302 Map 1: 1714(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:29,305 Map 1: 1715(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:29,305 Map 1: 1716(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:29,305 Map 1: 1718(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:29,305 Map 1: 1721(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:34,308 Map 1: 1723(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:34,308 Map 1: 1726(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:34,308 Map 1: 1729(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:34,308 Map 1: 1732(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:34,308 Map 1: 1734(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:39,311 Map 1: 1735(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:39,311 Map 1: 1739(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:39,311 Map 1: 1701(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:39,311 Map 1: 1703(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:39,311 Map 1: 1704(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
startTime: 2022-09-10
Traceback (most recent call last):
[...]
Exception: query returned abnormal status ERROR_STATE (TGetOperationStatusResp(status=TStatus(errorCode=None, errorMessage=None, sqlState=None, infoMessages=None, statusCode=0), errorCode=2, operationCompleted=1668733302811, progressUpdateResponse=TProgressUpdateResp(status=1, progressedPercentage=0.9476490616798401, rows=[[u'Map 1 .........', u'llap', u'Failed', u'1786', u'1690', u'95', u'1', u'0', u'100'], [u'Map 5 ..........', u'container', u'Killed', u'265', u'265', u'0', u'0', u'0', u'0'], [u'Reducer 2 ', u'container', u'Killed', u'6', u'0', u'6', u'0', u'0', u'24'], [u'Reducer 4 ', u'container', u'Killed', u'6', u'0', u'0', u'6', u'0', u'0']], footerSummary=u'VERTICES: 01/04', startTime=1668724609175, headerNames=[u'VERTICES', u'MODE', u'STATUS', u'TOTAL', u'COMPLETED', u'RUNNING', u'PENDING', u'FAILED', u'KILLED']), errorMessage=u'Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Terminating unsuccessfully: Vertex failed, vertex_4073_0000_73_01. Map 1 1786 tasks 8692791 milliseconds: Failed, Some(Task unsuccessful: Map 1, task_4073_0000_73_01_001303, Rerun request, succeeded/failed/killed = 1/0/0
Rerun request, succeeded/failed/killed = 2/0/0
Rerun limit reached, succeeded/failed/killed = 3/0/0)', taskStatus=u'[{"returnValue":2,"errorMsg":"org.apache.hadoop.hive.ql.metadata.HiveException: Terminating unsuccessfully: Vertex failed, vertex_4073_0000_73_01. Map 1 1786 tasks 8692791 milliseconds: Failed, Some(Task unsuccessful: Map 1, task_4073_0000_73_01_001303, Rerun request, succeeded/failed/killed = 1/0/0\
Rerun request, succeeded/failed/killed = 2/0/0\
Rerun limit reached, succeeded/failed/killed = 3/0/0)","beginTime":1668724609109,"endTime":1668733302773,"taskId":"Stage-1","taskState":"FINISHED","taskType":"MAPRED","name":"TEZ","elapsedTime":8693664}]', operationState=5, hasResultSet=False, sqlState=u'08S01', operationStarted=1668724608827))
I've not seen an error like this before so I'm not sure what to look
for. When this happens, 2 of th2 12, worker nodes leave directories
with significant amounts of data under /data1/k8s. One node has two
dag_NNN direcotries and the other has one dag_NNN directory and an
attempt_* directory. Hopefully this will all mean something to you.
Until then, I'll start looking through logs hoping to find a clue.
> Our toughest transform will probably present different problems. It
> > processes one line of input at a time and writes the modifed row out
> > immediately. However, it builds some very, large, lookup tables that
> > can reach several GBs in total size. When I get to it, I will look to
> > see if I can delete some of the data that isn't needed after loading.
> > It's been a long time since I wrote that transform and our analysts
> > have added a lot more data to the loookup tables since then. Assuming
> > I can't reduce the memory footprint by much, do you know what types of
> > problems we might run into with it?
> >
>
> If Python processes consume too much memory, Docker containers might be
> killed.
>
> If you execute such queries infrequently, you could increase the value
> for hive.mr3.map.task.memory.mb and hive.mr3.reduce.task.memory.mb only for
> such queries (e.g., inside Beeline). Then, the Java process would consume
> less memory, thus leaving more memory for executing Python processes.
> However, I think this would require a trial-and-error approach because we
> don't control Python processes. (After all, I guess this is the reason why
> Hive stopped supporting TRANSFORM.)
I will keep this in mind when I get to that query but there's a lot
more testing and verification before then. We typically run 4
variations of this query per monthly process, but we also run ad hoc
variations of it as requested by our analysts.