Python transform causing worker disk to fill up

David Engel

Nov 17, 2022, 12:55:37 AM
to MR3
I'm now porting and testing our Python transform scripts on the new
cluster. The "easy" ones are working quite well. One, however, is
causing the disk to fill up with /data1/k8s/dag* directories to the
point where Kubernetes marks the worker nodes as unschedulable.

I'd consider this transform probably the second most difficult one we
have. It buffers some data in memory as it essentially runs a moving
window over the rows as they stream through. On the particular
dataset I'm using, the whole query gets to about 80% complete before
it bogs down and no more worker pods can get scheduled.
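
For reference, the transform is wired into Hive with the usual
TRANSFORM syntax, roughly like the sketch below (the table, columns
and script name here are made up, not our real ones):

    -- Hypothetical sketch of a streaming Python transform.
    -- The subquery's DISTRIBUTE BY / SORT BY ensures each Python process
    -- receives one key's rows in time order on stdin, so the script can
    -- keep its moving window in memory and write results to stdout.
    ADD FILE hdfs:///scripts/moving_window.py;

    SELECT TRANSFORM (device_id, event_time, value)
           USING 'python3 moving_window.py'
           AS (device_id, event_time, smoothed_value)
    FROM (
      SELECT device_id, event_time, value
      FROM events
      DISTRIBUTE BY device_id
      SORT BY device_id, event_time
    ) t;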

The worker nodes are configured with 50 GB virtual disks and the
/data1/k8s/dag* directories get to about 35 GB used, leaving about
5 GB free, before things stop. Is this likely a case of simply needing
a bigger disk to buffer the intermediate data, or do we need to tweak
the Hive/MR3 configuration? It looks like IT created the worker nodes
with 100 GB virtual disks but only used 50 GB for the root LV, so
expanding them to 100 GB should be quite easy. I'll try that
tomorrow.

Our toughest transform will probably present different problems. It
processes one line of input at a time and writes the modified row out
immediately. However, it builds some very large lookup tables that
can reach several GB in total size. When I get to it, I will look to
see if I can delete some of the data that isn't needed after loading.
It's been a long time since I wrote that transform, and our analysts
have added a lot more data to the lookup tables since then. Assuming
I can't reduce the memory footprint by much, do you know what types of
problems we might run into with it?

David
--
David Engel
da...@istwok.net

Sungwoo Park

Nov 17, 2022, 2:19:14 AM
to David Engel, MR3
The worker nodes are configured with 50 GB virtual disks and the
/data1/k8s/dag* directories get to about 35 GB used, leaving about
5 GB free, before things stop.  Is this likely a case of simply needing
a bigger disk to buffer the intermediate data, or do we need to tweak
the Hive/MR3 configuration?  It looks like IT created the worker nodes
with 100 GB virtual disks but only used 50 GB for the root LV, so
expanding them to 100 GB should be quite easy.  I'll try that
tomorrow.

If it is a problem related to disk space, you could either 1) try increasing the disk space for storing intermediate data, or 2) try deleting vertex-local directories. For 2), please see this page:

https://mr3docs.datamonad.com/docs/k8s/performance/performance-tuning-k8s/#deleting-vertex-local-directories

Depending on the scale of your datasets and the number of concurrent users, only 35 GB of disk per worker node might be too small. You could watch disk consumption while executing a typical query and then (conservatively) choose the amount of disk space to allocate to each worker node.

If it is a problem related to memory consumption by the Python processes, you could try increasing the memory size of the ContainerWorker Pod and/or adjusting hive.mr3.container.max.java.heap.fraction (so as to allocate more memory to the Python processes).
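
For example, a sketch of the second option (the value is a placeholder, and depending on your setup this parameter may need to go into hive-site.xml before starting HiveServer2 rather than into a Beeline session):

    -- Sketch only: lower the Java heap fraction so that a larger share of
    -- each ContainerWorker's memory is left for the Python child processes.
    -- 0.7 here is just an illustrative value to be tuned.
    set hive.mr3.container.max.java.heap.fraction=0.7;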

Our toughest transform will probably present different problems.  It
processes one line of input at a time and writes the modified row out
immediately.  However, it builds some very large lookup tables that
can reach several GB in total size.  When I get to it, I will look to
see if I can delete some of the data that isn't needed after loading.
It's been a long time since I wrote that transform, and our analysts
have added a lot more data to the lookup tables since then.  Assuming
I can't reduce the memory footprint by much, do you know what types of
problems we might run into with it?

If Python processes consume too much memory, Docker containers might be killed. 

If you execute such queries infrequently, you could increase the values of hive.mr3.map.task.memory.mb and hive.mr3.reduce.task.memory.mb only for such queries (e.g., inside Beeline). With a larger memory budget per task, fewer tasks run concurrently in each ContainerWorker, so the Java process consumes less memory and more is left for the Python processes. However, I think this would require a trial-and-error approach because we don't control the Python processes. (After all, I guess this is the reason why Hive stopped supporting TRANSFORM.)
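
As a sketch, with made-up numbers to be tuned by trial and error:

    -- Sketch: enlarge the per-task memory budget for this query only, so
    -- fewer tasks run concurrently in each ContainerWorker and more memory
    -- is left for the Python processes spawned by TRANSFORM.
    set hive.mr3.map.task.memory.mb=8192;
    set hive.mr3.reduce.task.memory.mb=8192;
    -- ... then run the TRANSFORM query in the same session ...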

Sungwoo


David Engel

Nov 17, 2022, 9:45:54 PM
to Sungwoo Park, MR3
On Thu, Nov 17, 2022 at 04:19:01PM +0900, Sungwoo Park wrote:
> >
> > The worker nodes are configured with 50 GB virtual disks and the
> > /data1/k8s/dag* directories get to about 35 GB used, leaving about
> > 5 GB free, before things stop. Is this likely a case of simply needing
> > a bigger disk to buffer the intermediate data, or do we need to tweak
> > the Hive/MR3 configuration? It looks like IT created the worker nodes
> > with 100 GB virtual disks but only used 50 GB for the root LV, so
> > expanding them to 100 GB should be quite easy. I'll try that
> > tomorrow.
>
>
> If it is a problem related to disk space, you could either 1) try
> increasing the disk space for storing intermediate data, or 2) try deleting
> vertex-local directories. For 2), please see this page:
>
> https://mr3docs.datamonad.com/docs/k8s/performance/performance-tuning-k8s/#deleting-vertex-local-directories
>
> Depending on the scale of your datasets and also the number of concurrent
> users, allocating only 35GB for a worker node might be too small. You could
> watch disk consumption while executing a typical query and then
> (conservatively) choose the amount of disk space to allocate to each worker
> node.
>
> If it is a problem related to memory consumption by Python processes, you
> could try increasing the memory size of ContainerWorker Pod and/or
> adjusting hive.mr3.container.max.java.heap.fraction (so as to allocate
> more memory to Python processes).

Neither increasing the local disk space to 100 GB nor deleting
vertex-local directories fixed the failure. They just delayed the
inevitable hang where no more workers could get created and the query
was essentially stuck in limbo.

Increasing the local disk space to 250 GB helped but the query still
ultimately failed. Here is the partial log from Hive.

[...]
INFO : 2022-11-17 18:56:29,113 Map 1: 1764(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:56:34,116 Map 1: 1766(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:56:34,116 Map 1: 1767(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:56:34,116 Map 1: 1768(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:56:34,116 Map 1: 1770(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:56:39,119 Map 1: 1771(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:56:39,119 Map 1: 1773(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:56:39,119 Map 1: 1774(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:56:44,122 Map 1: 1776(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:56:44,123 Map 1: 1778(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:56:49,126 Map 1: 1779(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:56:49,126 Map 1: 1780(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:56:49,126 Map 1: 1781(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:56:49,126 Map 1: 1782(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:56:54,129 Map 1: 1783(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:56:54,129 Map 1: 1784(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:56:59,132 Map 1: 1785(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:56:59,132 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:56:59,132 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:57:04,135 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:57:09,138 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:57:09,138 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:57:14,141 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:57:14,141 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:57:19,145 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:57:19,145 Map 1: 1772(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:57:24,148 Map 1: 1772(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:57:24,148 Map 1: 1772(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:57:29,151 Map 1: 1772(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:57:34,154 Map 1: 1772(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:57:34,154 Map 1: 1772(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:57:39,157 Map 1: 1772(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:57:39,157 Map 1: 1772(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:57:44,160 Map 1: 1773(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:57:44,161 Map 1: 1774(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:57:49,164 Map 1: 1775(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:57:49,164 Map 1: 1775(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:57:54,167 Map 1: 1775(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:57:54,168 Map 1: 1775(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:57:59,171 Map 1: 1775(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:58:04,174 Map 1: 1775(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:58:04,174 Map 1: 1775(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:58:09,177 Map 1: 1775(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:58:09,177 Map 1: 1775(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:58:14,180 Map 1: 1775(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:58:19,183 Map 1: 1777(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:58:19,184 Map 1: 1779(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:58:19,184 Map 1: 1781(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:58:19,184 Map 1: 1782(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:58:24,187 Map 1: 1783(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:58:24,187 Map 1: 1784(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:58:24,187 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:58:29,190 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:58:29,190 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:58:34,194 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:58:39,197 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:58:39,197 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:58:44,200 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:58:44,201 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:58:49,204 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:58:54,207 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:58:54,208 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:58:59,211 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:58:59,211 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:59:04,214 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:59:09,218 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:59:09,218 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:59:14,221 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:59:14,222 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:59:19,224 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:59:24,228 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:59:24,228 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:59:29,230 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:59:29,230 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:59:34,233 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:59:39,236 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:59:39,236 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:59:44,240 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:59:44,240 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:59:49,243 Map 1: 1786/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:59:49,243 Map 1: 1782(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:59:54,246 Map 1: 1708(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:59:54,246 Map 1: 1708(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:59:54,246 Map 1: 1690(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 18:59:59,249 Map 1: 1690(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:04,253 Map 1: 1691(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:04,253 Map 1: 1691(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:09,256 Map 1: 1691(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:09,256 Map 1: 1692(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:14,259 Map 1: 1695(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:14,259 Map 1: 1698(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:14,260 Map 1: 1701(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:14,260 Map 1: 1703(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:19,262 Map 1: 1705(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:19,263 Map 1: 1703(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:19,263 Map 1: 1698(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:24,266 Map 1: 1698(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:24,266 Map 1: 1698(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:29,268 Map 1: 1684(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:29,268 Map 1: 1673(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:29,268 Map 1: 1669(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:34,271 Map 1: 1669(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:34,271 Map 1: 1670(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:39,274 Map 1: 1670(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:39,274 Map 1: 1671(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:39,274 Map 1: 1673(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:44,276 Map 1: 1645(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:44,276 Map 1: 1647(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:49,279 Map 1: 1648(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:49,279 Map 1: 1651(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:49,279 Map 1: 1656(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:49,279 Map 1: 1658(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:49,279 Map 1: 1662(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:54,282 Map 1: 1666(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:54,282 Map 1: 1668(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:54,282 Map 1: 1671(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:54,283 Map 1: 1676(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:59,285 Map 1: 1684(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:59,286 Map 1: 1691(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:59,286 Map 1: 1695(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:59,286 Map 1: 1697(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:00:59,286 Map 1: 1681(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:04,289 Map 1: 1665(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:04,289 Map 1: 1670(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:04,289 Map 1: 1677(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:04,289 Map 1: 1681(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:09,292 Map 1: 1683(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:09,292 Map 1: 1684(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:09,292 Map 1: 1686(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:14,295 Map 1: 1688(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:14,295 Map 1: 1692(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:14,295 Map 1: 1695(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:14,295 Map 1: 1697(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:19,299 Map 1: 1699(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:19,299 Map 1: 1704(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:19,299 Map 1: 1708(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:24,302 Map 1: 1709(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:24,302 Map 1: 1711(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:24,302 Map 1: 1713(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:24,302 Map 1: 1714(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:29,305 Map 1: 1715(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:29,305 Map 1: 1716(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:29,305 Map 1: 1718(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:29,305 Map 1: 1721(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:34,308 Map 1: 1723(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:34,308 Map 1: 1726(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:34,308 Map 1: 1729(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:34,308 Map 1: 1732(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:34,308 Map 1: 1734(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:39,311 Map 1: 1735(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:39,311 Map 1: 1739(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:39,311 Map 1: 1701(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:39,311 Map 1: 1703(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
INFO : 2022-11-17 19:01:39,311 Map 1: 1704(+1786)/1786 Map 5: 265/265 Reducer 2: 0(+6)/6 Reducer 4: 0/6
startTime: 2022-09-10
Traceback (most recent call last):
[...]
Exception: query returned abnormal status ERROR_STATE (TGetOperationStatusResp(status=TStatus(errorCode=None, errorMessage=None, sqlState=None, infoMessages=None, statusCode=0), errorCode=2, operationCompleted=1668733302811, progressUpdateResponse=TProgressUpdateResp(status=1, progressedPercentage=0.9476490616798401, rows=[[u'Map 1 .........', u'llap', u'Failed', u'1786', u'1690', u'95', u'1', u'0', u'100'], [u'Map 5 ..........', u'container', u'Killed', u'265', u'265', u'0', u'0', u'0', u'0'], [u'Reducer 2 ', u'container', u'Killed', u'6', u'0', u'6', u'0', u'0', u'24'], [u'Reducer 4 ', u'container', u'Killed', u'6', u'0', u'0', u'6', u'0', u'0']], footerSummary=u'VERTICES: 01/04', startTime=1668724609175, headerNames=[u'VERTICES', u'MODE', u'STATUS', u'TOTAL', u'COMPLETED', u'RUNNING', u'PENDING', u'FAILED', u'KILLED']), errorMessage=u'Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Terminating unsuccessfully: Vertex failed, vertex_4073_0000_73_01. Map 1 1786 tasks 8692791 milliseconds: Failed, Some(Task unsuccessful: Map 1, task_4073_0000_73_01_001303, Rerun request, succeeded/failed/killed = 1/0/0
Rerun request, succeeded/failed/killed = 2/0/0
Rerun limit reached, succeeded/failed/killed = 3/0/0)', taskStatus=u'[{"returnValue":2,"errorMsg":"org.apache.hadoop.hive.ql.metadata.HiveException: Terminating unsuccessfully: Vertex failed, vertex_4073_0000_73_01. Map 1 1786 tasks 8692791 milliseconds: Failed, Some(Task unsuccessful: Map 1, task_4073_0000_73_01_001303, Rerun request, succeeded/failed/killed = 1/0/0\
Rerun request, succeeded/failed/killed = 2/0/0\
Rerun limit reached, succeeded/failed/killed = 3/0/0)","beginTime":1668724609109,"endTime":1668733302773,"taskId":"Stage-1","taskState":"FINISHED","taskType":"MAPRED","name":"TEZ","elapsedTime":8693664}]', operationState=5, hasResultSet=False, sqlState=u'08S01', operationStarted=1668724608827))

I've not seen an error like this before so I'm not sure what to look
for. When this happens, 2 of the 12 worker nodes leave directories
with significant amounts of data under /data1/k8s. One node has two
dag_NNN directories and the other has one dag_NNN directory and an
attempt_* directory. Hopefully this will all mean something to you.
Until then, I'll start looking through logs hoping to find a clue.

> > Our toughest transform will probably present different problems. It
> > processes one line of input at a time and writes the modified row out
> > immediately. However, it builds some very large lookup tables that
> > can reach several GB in total size. When I get to it, I will look to
> > see if I can delete some of the data that isn't needed after loading.
> > It's been a long time since I wrote that transform, and our analysts
> > have added a lot more data to the lookup tables since then. Assuming
> > I can't reduce the memory footprint by much, do you know what types of
> > problems we might run into with it?
> >
>
> If Python processes consume too much memory, Docker containers might be
> killed.
>
> If you execute such queries infrequently, you could increase the value
> for hive.mr3.map.task.memory.mb and hive.mr3.reduce.task.memory.mb only for
> such queries (e.g., inside Beeline). Then, the Java process would consume
> less memory, thus leaving more memory for executing Python processes.
> However, I think this would require a trial-and-error approach because we
> don't control Python processes. (After all, I guess this is the reason why
> Hive stopped supporting TRANSFORM.)

I will keep this in mind when I get to that query, but there's a lot
more testing and verification to do before then. We typically run 4
variations of this query per monthly process, but we also run ad hoc
variations of it as requested by our analysts.

Sungwoo Park

Nov 17, 2022, 11:18:21 PM
to David Engel, MR3
Neither increasing the local disk space to 100 GB nor deleting
vertex-local directories fixed the failure.  They just delayed the
inevitable hang where no more workers could get created and the query
was essentially stuck in limbo.

I have a quick question. Did you disable container reuse? Normally container reuse is enabled, and a (long-running) worker Pod can execute tasks produced from any query.
 
I've not seen an error like this before so I'm not sure what to look
for.  When this happens, 2 of the 12 worker nodes leave directories
with significant amounts of data under /data1/k8s.  One node has two
dag_NNN directories and the other has one dag_NNN directory and an
attempt_* directory.  Hopefully this will all mean something to you.
Until then, I'll start looking through logs hoping to find a clue.

NNN in dag_NNN is the id of the DAG produced by a specific query. That is, all the intermediate data produced by a query is stored under the dag_NNN directory for that NNN. Normally these dag_NNN directories are deleted after the corresponding query finishes, but the cleanup operation is not perfect and may leave some files behind.

Can you get the log of the worker Pod that left directories
with significant amounts of data (which probably executed problematic tasks)?

Another thing is that you can attach multiple local disks to each worker node. If each local disk gets its own disk bandwidth, more local disks improve performance (and maybe stability as well). For fault tolerance, Hive writes a lot of intermediate data (unlike Presto), so having more local disks is always good.

Sungwoo

David Engel

Nov 18, 2022, 12:40:55 AM
to Sungwoo Park, MR3
On Fri, Nov 18, 2022 at 01:18:20PM +0900, Sungwoo Park wrote:
> >
> > Neither increasing the local disk space to 100 GB nor deleting
> > vertex-local directories fixed the failure. They just delayed the
> > inevitable hang where no more workers could get created and the query
> > was essentially stuck in limbo.
>
> I have a quick question. Did you disable container reuse? Normally
> container reuse is enabled and a (long-running) worker Pod can execute any
> tasks produced from any queries.

No, we did not change that.

> > I've not seen an error like this before so I'm not sure what to look
> > for. When this happens, 2 of the 12 worker nodes leave directories
> > with significant amounts of data under /data1/k8s. One node has two
> > dag_NNN directories and the other has one dag_NNN directory and an
> > attempt_* directory. Hopefully this will all mean something to you.
> > Until then, I'll start looking through logs hoping to find a clue.
> >
>
> NNN in dag_NNN is the id of the DAG produced by a specific query. That is,
> all the intermediate data produced by a query is stored under dag_NNN
> directories for the same NNN. Normally these directories dag_NNN are
> deleted after the corresponding query is finished, but this cleanup
> operation is not perfect and may leave some files.

That's what I eventually concluded. Thanks for confirming.

> Can you get the log of the worker Pod that left directories
> with significant amounts of data (which probably executed problematic
> tasks)?

I will try to get you the logs tomorrow. I have some other things
running now, and I want to get all applicable logs from a fresh
server restart.

> Another thing is that you can attach multiple local disks to each worker
> node. If each local disk is allocated its own disk bandwidth, more local
> disks improve the performance (and maybe stability as well). For fault
> tolerance, Hive writes a lot of intermediate data (unlike Presto), so
> having more local disks is always good.

I don't expect adding disks will change anything. All worker nodes are
VMs in a large VM cluster with all virtual disks coming from the
same source. The available bandwidth will be the same whether there
is one virtual disk or ten.

Sungwoo Park

Nov 19, 2022, 3:32:28 AM
to David Engel, MR3
From your log, worker Pods executing tasks of Reducer 2 fail and get deleted. We see the usual symptoms such as fetch failures, task reruns, and ContainerWorker failures, all of which are probably caused by the execution of the Python processes. Since we can't control the resources allocated to Python processes, I think we should try to increase both the number of reducer tasks and their resources, as you suggested.

Here are my suggestions.

1. Check if the local disk is clean. Delete leftover dag_NNN directories if any.

2. Before executing the main query, run a few queries so as to populate the cluster with the maximum number of worker Pods. This reduces the chance of data skew and also increases the number of reducers (which is computed from the total resources available to Hive-MR3).

3. Increase the resources for reducers by adjusting hive.mr3.reduce.task.memory.mb (and hive.mr3.reduce.task.vcores). In your experiment, two reducers happened to be assigned to the same worker Pod. To prevent such bad cases, giving each reducer a larger resource allocation is a trick that keeps a single worker Pod from running two or more concurrent reducer tasks.

This trick is usually unnecessary for executing common Hive queries, but for your query, we don't know exactly what the Python processes do, so we limit each worker Pod to just one reducer task.

4. Increase the value of hive.tez.llap.min.reducer.per.executor. The default is 0.2, but you can increase it to 1.0 or even 2.0. A higher value leads to more reducer tasks (before applying auto parallelism).

5. Update the configurations for auto parallelism by setting tez.shuffle-vertex-manager.auto-parallel.max.reduction.percentage to 100 (which effectively disables auto parallelism in typical cases) and setting tez.shuffle-vertex-manager.auto-parallel.min.num.tasks to a large value (in order not to trigger auto parallelism).

In this way, hopefully you will create many reducer tasks and distribute the load across many worker Pods. In steps 3, 4, and 5, you can change the configuration parameters inside Beeline (or your client) without having to restart HiveServer2. For documentation, please see: https://mr3docs.datamonad.com/docs/k8s/performance/performance-tuning/
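
Putting steps 3, 4, and 5 together, a Beeline session could look roughly like the sketch below (the numbers are placeholders to be tuned by trial and error for your cluster):

    -- Step 3: larger reducer tasks, so each worker Pod runs fewer of them
    -- concurrently and the Python processes have more headroom.
    set hive.mr3.reduce.task.memory.mb=16384;
    set hive.mr3.reduce.task.vcores=2;
    -- Step 4: ask for more reducer tasks up front.
    set hive.tez.llap.min.reducer.per.executor=1.0;
    -- Step 5: keep auto parallelism from collapsing the reducer count.
    set tez.shuffle-vertex-manager.auto-parallel.max.reduction.percentage=100;
    set tez.shuffle-vertex-manager.auto-parallel.min.num.tasks=10000;
    -- ... then run the TRANSFORM query in the same session ...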

Also, it might help to watch the progress of the execution and, if it does not create enough reducers (e.g., only 6 reducer tasks for Reducer 2), terminate it quickly and retry after updating the configuration parameters, rather than waiting until the query fails.

As a last resort, you could try worker nodes with larger capacity (e.g., 120 GB per worker node). As I don't really know the behavior of the Python processes, this is all I can think of right now.

Cheers,

Sungwoo
