On Tue, Sep 27, 2022 at 09:31:39AM +0900, Sungwoo Park wrote:
> >
> > What is the preferred way, if any, to enable Hive transforms using the
> > new TypeScript configuration? It appears that
> > hive.security.authorization.enabled is hard-coded to true and we need
> > to set it to false. We can change it back to true after we refactor
> > our existing transforms but that is likely months down the road.
> >
>
> For using Hive transforms, please see 'Using Python scripts':
>
> https://mr3docs.datamonad.com/docs/k8s/user/use-udf/
>
> 1.
> hiveEnv.authorization in run.ts should be set
> to SQLStdConfOnlyAuthorizerFactory or SQLStdHiveAuthorizerFactory.
I'd already done that.
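Concretely, the relevant bit of my run.ts looks something like this
(other hiveEnv fields elided; only the authorization value matters here):

  const hiveEnv = {
    // ... other fields unchanged ...
    authorization: "SQLStdConfOnlyAuthorizerFactory",
  };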
> 2.
> To set hive.security.authorization.enabled to false,
> update src/server/resources/hive-site.xml before executing ts-node to
> generate run.yaml:
>
> <property>
> <name>hive.security.authorization.enabled</name>
> <value>false</value>
> </property>
That's what I expected as most users won't want to change it. Just
wanted to make sure there wasn't a better way.
> Alternatively, you could directly update
> hive.security.authorization.enabled in run.yaml (in 3 places) after
> executing ts-node.
That's what I did for testing.
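In case it's useful, this is roughly the throwaway ts-node script I used
to flip all three occurrences at once (it assumes the property appears as
embedded XML inside run.yaml; editing the three spots by hand works too):

  import * as fs from "fs";

  // flip <value>true</value> to false for this one property, everywhere
  const file = "run.yaml";
  const pattern =
    /(<name>hive\.security\.authorization\.enabled<\/name>\s*<value>)true(?=<\/value>)/g;
  const before = fs.readFileSync(file, "utf8");
  const flipped = (before.match(pattern) ?? []).length;
  fs.writeFileSync(file, before.replace(pattern, "$1false"));
  console.log(`flipped ${flipped} occurrence(s)`);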
> 3.
> If you don't mount Python scripts in worker Pods,
> mr3.container.localize.python.working.dir.unsafe should be set to true in
> mr3-site.xml. As with hive.security.authorization.enabled, you could
> update src/server/resources/mr3-site.xml before executing ts-node (or
> update run.yaml after executing ts-node).
>
> <property>
> <name>mr3.container.localize.python.working.dir.unsafe</name>
> <value>true</value>
> </property>
We will be mounting a conda environment and the scripts via NFS. That's
the reason for my follow-up question on NFS mounts.
Thanks.
> Since each VM must reserve some resources for Kubernetes, you would have
> to reduce the CPU and memory slightly and check that two workers are
> created on each VM, e.g.:
>
> const workerEnv: worker.T = {
>   workerMemoryInMb: 60 * 1024,
>   workerCores: 7.75,
>   // ... other fields unchanged
> };
Does Kubernetes have any hard-coded amounts for its overhead, or does
it try to dynamically figure out more exact values? I've already run
into this issue: MinIO recommended 8 cores, so that's what we
configured the MinIO VMs to have, and Kubernetes wouldn't schedule the
MinIO pods until I changed MinIO to request only 7 cores.
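For my own sanity, here's how I read the suggested numbers
(back-of-the-envelope only; the reserved amounts below are my guesses,
and the actual reservations depend on how the kubelet is configured):

  // per-VM budget: 16 cores / 128 GiB, two workers per VM
  const reservedCores = 0.5;            // assumed Kubernetes/system overhead
  const reservedMemoryInMb = 8 * 1024;  // assumed Kubernetes/system overhead
  const workerCores = (16 - reservedCores) / 2;                    // 7.75
  const workerMemoryInMb = (128 * 1024 - reservedMemoryInMb) / 2;  // 60 * 1024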
> Creating a single worker (with 128GB, 16 cores, 16 concurrent tasks) on
> each node is fine, but I think it will be slower than creating two smaller
> workers. Or, you could just create VMs each with 64GB and 8 cores.
For now, I'll stick with the slightly smaller requirements so 1 VM
can hold 2 workers. That is, unless you think smaller VMs with 1
worker each would be noticeably faster. Configuring more nodes would
be fine by me, but IT would probably complain.
> For running Hive-MR3 in production, you need to allocate enough resources
> to HiveServer2, Metastore, and MR3 master. The resources would depend on
> the number of concurrent users and the data size. This page has baseline
> settings for 20 concurrent users, and you could adjust them as necessary
> to meet your requirements:
>
> https://mr3docs.datamonad.com/docs/k8s/performance/performance-tuning/
Thanks. I'll take a look at that page. I currently have 2 VMs with 16
cores and 128 GiB each for running those processes, the Kubernetes
control plane, and MySQL. If needed, I can get more VMs, or more cores
and memory for the existing VMs.
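In the meantime, my tentative split across those two VMs looks like the
sketch below. The numbers are placeholders I made up and will revise
against the baselines on the tuning page:

  // two 16-core / 128 GiB VMs for everything that isn't an MR3 worker
  const hiveServer2 = { cores: 4, memoryInMb: 16 * 1024 };  // placeholder
  const metastore = { cores: 4, memoryInMb: 16 * 1024 };    // placeholder
  const mr3Master = { cores: 4, memoryInMb: 32 * 1024 };    // placeholder
  // remainder left for the Kubernetes control plane and MySQL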