Help Configuring Hive/Spark Using Typescript

David Engel

Sep 20, 2022, 6:37:38 PM
to MR3
Hi,

After concluding our evaluation on a test Kubernetes cluster, I got
temporarily redirected to other work. I'm now back to building and
configuring a production cluster with Hive, Spark and all the extras.
I'm following the instructions at
https://mr3docs.datamonad.com/docs/quick/k8s/typescript/ . I'm trying
to keep things as generic as possible until we have a basic setup
working. I'll add our extra site customizations later.

I've run into the following issues so far, three minor and one
blocking, and am in need of help.

These are minor.

1. It seems our version of Kubernetes is newer than was used when
creating the instructions. I needed to use "apiVersion:
rbac.authorization.k8s.io/v1" instead of ".../v1beta1" for the
ClusterRoleBinding and RoleBinding declarations.

2. I had to change the mode of the root workdir PV to 777 in order
for the apache-0 pod to start up. Everything else is fine with 755 and
UID 1000, but apache appears to be running as a different user that
gets mapped to nobody:nogroup.

3. What is the user/password, or where are they specified, for the
Superset web interface running on port 8080 by default?

Here is the blocker.

4. The metastore-0 pod keeps periodically shutting down and restarting
and never publishes its service on port 9851 on any cluster IP.
This is preventing the hiveserver2 pods from initializing. I don't
see any errors in the metastore-0 logs. It did login to MariaDB
and initialize the schema. I'm stumped as to what else it is
needing.

David
--
David Engel
da...@istwok.net

Sungwoo Park

Sep 20, 2022, 11:34:24 PM
to David Engel, MR3
Hello,

3. What is the user/password, or where are they specified, for the
   Superset web interface running on port 8080 by default?

The initial password is admin/admin and you can change the password after the initial login. In the current release of MR3, Superset uses SQLite and stores all its data in the directory 'superset' under the PersistentVolume (/opt/mr3-run/work-dir/superset/superset.db). If you would like to use an external database, the configuration for Superset should be revised.
 
Here is the blocker.

4. The metastore-0 pod keeps periodically shutting down and restarting
   and never publishes its service on port 9851 on any cluster IP.
   This is preventing the hiveserver2 pods from initializing.  I don't
   see any errors in the metastore-0 logs.  It did login to MariaDB
   and initialize the schema.  I'm stumped as to what else it is
   needing.

Metastore does not publish its service using a cluster IP because it is used only internally inside the Kubernetes cluster, e.g.:

$ kubectl get svc -n hivemr3
NAME                    TYPE           CLUSTER-IP       EXTERNAL-IP      PORT(S)                                        AGE
...
metastore               ClusterIP      None             <none>           9851/TCP                                       14m

If there is no problem with the Metastore configuration, could you check whether the Metastore Pod gets killed by Kubernetes for some reason, e.g., because too little memory is allocated to Metastore or because it fails to answer liveness probes?

Could you share the last part of the log just before a Metastore Pod is terminated? If the Pod is killed for some reason, the last part of the log would look like:

2022-09-21T03:32:41,780  INFO [shutdown-hook-0] metastore.HiveMetaStore: Shutting down hive metastore.
   PID TTY          TIME CMD
    91 ?        00:00:15 java
Waiting for process 91 to stop...
2022-09-21T03:32:41,793  INFO [shutdown-hook-0] metastore.HiveMetaStore: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down HiveMetaStore at metastore-0.metastore.hivemr3.svc.cluster.local/11.44.0.1
************************************************************/
   PID TTY          TIME CMD
Process 91 stopped.

Cheers,

--- Sungwoo

David Engel

Sep 21, 2022, 12:43:23 AM
to Sungwoo Park, MR3
On Wed, Sep 21, 2022 at 12:34:11PM +0900, Sungwoo Park wrote:
> Hello,
>
> 3. What is the user/password, or where are they specified, for the
> > Superset web interface running on port 8080 by default?
> >
>
> The initial password is admin/admin and you can change the password after
> the initial login. In the current release of MR3, Superset uses SQLite and
> stores all its data in the directory 'superset' under the PersistentVolume
> (/opt/mr3-run/work-dir/superset/superset.db). If you would like to use an
> external database, the configuration for Superset should be revised.

Thanks. I would have sworn I'd tried admin/admin earlier as a guess
but it seems not.

> > Here is the blocker.
> >
> > 4. The metastore-0 pod keeps periodically shutting down and restarting
> > and never publishes its service on port 9851 on any cluster IP.
> > This is preventing the hiveserver2 pods from initializing. I don't
> > see any errors in the metastore-0 logs. It did login to MariaDB
> > and initialize the schema. I'm stumped as to what else it is
> > needing.
> >
>
> Metastore does not publish its service using a cluster IP because it is
> used only internally inside the Kubernetes cluster, e.g.:
>
> $ kubectl get svc -n hivemr3
> NAME                    TYPE           CLUSTER-IP       EXTERNAL-IP      PORT(S)                                        AGE
> ...
> metastore               ClusterIP      None             <none>           9851/TCP                                       14m

I'm still relatively new to Kubernetes. I thought a CLUSTER-IP was
still allocated for internal services and an EXTERNAL-IP was only
allocated when an ingress or some other form of external access was
created.

> If there is no problem with configuring Metastore, could you check if the
> Metastore pod gets killed by Kubernetes for some reason? E.g., for
> allocating too small an amount of memory to Metastore, or for failing to
> answer liveness probes.
>
> Could you share the last part of the log just before a Metastore Pod is
> terminated? If the Pod is killed for some reason, the last part of the log
> would look like:

I've attached an entire log. There are a handful of warnings that are
all ignored. I'm guessing one or more of them are important. Could
this be a case of a subtle difference between MariaDB and MySQL? I
prefer the former so decided to use it here on the production cluster.
If it is the problem, I'll switch to MySQL tomorrow.

David

> 2022-09-21T03:32:41,780  INFO [shutdown-hook-0] metastore.HiveMetaStore: Shutting down hive metastore.
>    PID TTY          TIME CMD
>     91 ?        00:00:15 java
> Waiting for process 91 to stop...
> 2022-09-21T03:32:41,793  INFO [shutdown-hook-0] metastore.HiveMetaStore: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down HiveMetaStore at metastore-0.metastore.hivemr3.svc.cluster.local/11.44.0.1
> ************************************************************/
>    PID TTY          TIME CMD
> Process 91 stopped.
>
> Cheers,
>
> --- Sungwoo

--
David Engel
da...@istwok.net
metastore-0.log

Sungwoo Park

Sep 21, 2022, 1:53:14 AM
to David Engel, MR3
I've attached an entire log.  There are a handful of warnings that are
all ignored.  I'm guessing one or more of them are important.  Could
this be a case of a subtle difference between MariaDB and MySQL?  I
prefer the former so decided to use it here on the production cluster.
If it is the problem, I'll switch to MySQL tomorrow.

I can't figure out why Metastore fails, but as you suspect, it seems to be a problem with initializing the Metastore schema. From the log, I see:

1. You don't initialize the Metastore schema (i.e., metastoreEnv.initSchema is not set to true in run.ts).

2. However, the following log messages are usually printed only when the Metastore schema is initialized for the first time:

2022-09-21T04:21:57,625  INFO [main] metastore.HiveMetaStore: Setting location of default catalog, as it hasn't been done after upgrade
2022-09-21T04:21:58,057  INFO [main] beanutils.FluentPropertyBeanIntrospector: Error when creating PropertyDescriptor for public final void org.apache.commons.configuration2.AbstractConfiguration.setProperty(java.lang.String,java.lang.Object)! Ignoring this property.
2022-09-21T04:21:58,080  INFO [main] impl.MetricsConfig: loaded properties from hadoop-metrics2-s3a-file-system.properties
2022-09-21T04:21:58,133  INFO [main] impl.MetricsSystemImpl: Scheduled Metric snapshot period at 180 second(s). 
2022-09-21T04:21:58,134 INFO [main] impl.MetricsSystemImpl: s3a-file-system metrics system started 

I can think of two experiments.

1. Try using MySQL, which is known to work with Hive. My guess is that if there is a problem with MariaDB at all, it is due to some configuration of MariaDB rather than a difference from MySQL.

If you would like to use MariaDB, then check if the database for Metastore is properly created after initializing the Metastore schema.

2. Use DEBUG logging level and get more details (by updating typescript/src/server/resources/hive-log4j2.properties and running ts-node again).

Cheers,

--- Sungwoo

David Engel

Sep 21, 2022, 11:35:21 AM
to Sungwoo Park, MR3
On Wed, Sep 21, 2022 at 02:53:01PM +0900, Sungwoo Park wrote:
> >
> > I've attached an entire log. There are a handful of warnings that are
> > all ignored. I'm guessing one or more of them are important. Could
> > this be a case of a subtle difference between MariaDB and MySQL? I
> > prefer the former so decided to use it here on the production cluster.
> > If it is the problem, I'll switch to MySQL tomorrow.
> >
>
> I can't figure out why Metastore fails, but as you suspect, it seems to be
> a problem with initializing the Metastore schema. From the log, I see:
>
> 1. You don't initialize the Metastore schema (by
> setting metastoreEnv.initSchema to true in run.ts).

I had set initSchema to false after the first run, once the tables
had been created in the database.

> I can think of two experiments.
>
> 1. Try using MySQL which is known to work with Hive. My guess is that if
> there is a problem with MariaDB at all, it is due to some configuration of
> MariaDB, rather than the difference from MySQL.

I'm seeing the same behavior with MySQL. Here is the log snippet with
MySQL and initSchema set to true.

Starting metastore schema initialization to 3.1.0
Initialization script hive-schema-3.1.0.mysql.sql

Error: Table 'CTLGS' already exists (state=42S01,code=1050)
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
Underlying cause: java.io.IOException : Schema script failed, errorcode 2
Use --verbose for detailed stacktrace.
*** schemaTool failed ***
[WARN] Failed to create directory: /home/hive/.beeline
No such file or directory

> If you would like to use MariaDB, then check if the database for Metastore
> is properly created after initializing the Metastore schema.

As noted above, the tables in the MariaDB/MySQL database appeared to
be all created.

> 2. Use DEBUG logging level and get more details (by
> updating typescript/src/server/resources/hive-log4j2.properties and running
> ts-node again).

I will enable this soon and report back later.

David
--
David Engel
da...@istwok.net

Sungwoo Park

Sep 21, 2022, 12:08:30 PM
to David Engel, MR3

Error: Table 'CTLGS' already exists (state=42S01,code=1050)
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
Underlying cause: java.io.IOException : Schema script failed, errorcode 2
Use --verbose for detailed stacktrace.
*** schemaTool failed ***
[WARN] Failed to create directory: /home/hive/.beeline
No such file or directory

> If you would like to use MariaDB, then check if the database for Metastore
> is properly created after initializing the Metastore schema.

As noted above, the tables in the MariaDB/MySQL database appeared to
be all created.

The above log says that the initialization of the Metastore database failed. If the initialization succeeds, the log looks like this instead:

Initialization script completed
schemaTool completed

[WARN] Failed to create directory: /home/hive/.beeline
No such file or directory

My suggestion is:

1. When you set metastoreEnv.initSchema to true, can you check that the database specified in metastoreEnv.databaseName does exist? In your case, the log says that the database already exists.

2. Or, you can set metastoreEnv.initSchema to true and metastoreEnv.databaseName to a new database name.
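
For reference, a sketch of option 2 in run.ts could look like the following (all values are placeholders, and 'hive3mr3_new' is just a hypothetical database name; keep the metastoreEnv layout from your own run.ts):

// Sketch only: point Metastore at a fresh database and let schemaTool
// initialize the schema on startup.
const metastoreEnv: metastore.T = {
  kind: "internal",
  host: "",
  port: 0,
  dbType: "MYSQL",                  // or whatever matches your MariaDB/MySQL server
  databaseHost: "db.example.com",   // placeholder database host
  databaseName: "hive3mr3_new",     // a database whose schema has not been initialized yet
  userName: "hivemr3",
  password: "...",
  initSchema: true,                 // initialize the Metastore schema on startup
  resources: { cpu: 2, memoryInMb: 4 * 1024 },
  enableMetastoreDatabaseSsl: false
};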

Cheers,

--- Sungwoo

David Engel

Sep 21, 2022, 1:09:31 PM
to Sungwoo Park, MR3
Previously with MR1.4, I created the database and user and granted all
privileges on the already created database. This seemingly worked
without issue so that is what I did with MR1.5.

I just retried after dropping the database and granting the privilege
to create new databases. IOW, I started from the cleanest state I
could. Attached is the compressed metastore log with the level changed
to DEBUG. In addition, here is the metastore section from my run.ts
file.

const metastoreEnv: metastore.T = {
  kind: "internal",
  host: "",
  port: 0,
  dbType: "MYSQL",
  databaseHost: "10.1.6.155",
  databaseName: "hive3mr3",
  userName: "hivemr3",
  password: "xxx",
  initSchema: true,
  resources: {
    cpu: 2,
    memoryInMb: 4 * 1024
  },
  enableMetastoreDatabaseSsl: false
};

With this configuration, the database gets created with 74 tables.
metastore.log.gz

David Engel

Sep 21, 2022, 1:13:48 PM
to Sungwoo Park, MR3
I forgot to add that this is with mysql-server-8.0.30-0ubuntu0.22.04.1
from Ubuntu 22.04, and the only change from the default Ubuntu MySQL
configuration is to allow network connections.

Sungwoo Park

Sep 21, 2022, 1:49:58 PM
to David Engel, MR3
Your log says that the database for Metastore was initialized successfully:

Initialization script completed
schemaTool completed

The database should have 74 tables, e.g.:

mysql> show tables;
+-------------------------------+
| Tables_in_hive3mr3            |
+-------------------------------+
| AUX_TABLE                     |
...
| WRITE_SET                     |
+-------------------------------+
74 rows in set (0.01 sec)

So, everything seems ready and I think you can populate the data warehouse. It is also okay to terminate and restart Metastore with initSchema still set to true because Metastore recognizes an existing database.

If you still have a problem with Metastore, please let me know.

Cheers,

--- Sungwoo

David Engel

Sep 21, 2022, 2:26:53 PM
to Sungwoo Park, MR3
On Thu, Sep 22, 2022 at 02:49:45AM +0900, Sungwoo Park wrote:
> Your log says that the database for Metastore was initialized successfully:
>
> Initialization script completed
> schemaTool completed
>
> The database should have 74 tables, e.g.:
>
> mysql> show tables;
> +-------------------------------+
> | Tables_in_hive3mr3            |
> +-------------------------------+
> | AUX_TABLE                     |
> ...
> | WRITE_SET                     |
> +-------------------------------+
> 74 rows in set (0.01 sec)
>
> So, everything seems ready and I think you can populate the data warehouse.
> It is also okay to terminate and restart Metastore with initSchema still
> set to true because Metastore recognizes an existing database.

That's what I was doing before and still had problems.

> If you still have a problem with Metastore, please let me know.

Here are two new logs. The first is another metastore log with
initSchema now set to false. The second is from hiveserver2. It is
getting connection refused when trying to connect to metastore.
metastore.log.gz
hiveserver.log.gz

David Engel

Sep 21, 2022, 6:11:29 PM
to Sungwoo Park, MR3
Okay, I think I've identified the core problem but don't have a fix
yet.

For this cluster, we enabled TLS on our MinIO installation within
Kubernetes. I used the "auto cert" option which is supposed to
automatically work for S3 access within the cluster. For external
access to S3, I provided a custom cert and CA cert for the cluster.
This all seems to work in isolation.

Now, it appears metastore is trying to access/test the warehouse
location at s3a://hivemr3/warehouse and is unable to validate the
server's cert. I'm guessing I need to do one of the following two
things. One, specify the S3 endpoint differently (perhaps by name
instead of IP address) so the server cert can be validated using the
internal CA cert. Two, make the custom CA cert available to
metastore and hiveserver2. I don't know which is better or even
feasible.

Sungwoo Park

Sep 21, 2022, 10:52:06 PM
to David Engel, MR3
We didn't try the first option, so I don't have any comments yet. From my understanding, Metastore should still be configured somehow so as to use the cluster certificate placed in the Pod (/var/run/secrets/kubernetes.io/serviceaccount/ca.crt).

The second option is explained at this page:

For your setup, these fields should be set:

basicsEnv.s3aEndpoint
basicsEnv.s3aEnableSsl
basicsEnv.s3aCredentialProvider
secretEnv.ssl.*
secretEnv.secretEnvVars (if AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are used)
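
As a very rough sketch of where these go in run.ts (the section layout and all values here are assumptions and placeholders; check the TypeScript sources in your release for the exact field names):

const basicsEnv = {       // basics section of run.ts; type annotation omitted in this sketch
  // ... other fields unchanged ...
  s3aEndpoint: "https://minio.hivemr3.svc.cluster.local:9000",  // placeholder; use the hostname on the MinIO certificate
  s3aEnableSsl: true,
  s3aCredentialProvider: "..."   // placeholder; set to the credential provider your release expects
};
// secretEnv.ssl.* should then point to the TrustStore/KeyStore created as
// described below, and secretEnv.secretEnvVars can carry
// AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY.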

For secretEnv.ssl, you need to create TrustStore and KeyStore. 'Creating certificates and secrets' section in the above page shows how to create  TrustStore and KeyStore using a script generate-ssl.sh included in the mr3-run-k8s distribution. You need the public certificate of MinIO, and should set S3_CERTIFICATE in generate-ssl.sh.

It's also okay to create TrustStore and KeyStore manually (https://mr3docs.datamonad.com/docs/k8s/advanced/enable-ssl/), but it will take more time.

Cheers,

--- Sungwoo

David Engel

Sep 22, 2022, 6:56:05 PM
to Sungwoo Park, MR3
Ugh! Is there really no simpler way to have Hive and Metastore accept
a custom CA when accessing S3? I expect we'll need to fully configure
TLS on everything when/if we enable Ranger but I'd hoped to defer that
to way down the road. We have some non-trivial application changes
to make before we can even consider that.

For now, then, I've disabled TLS on MinIO and now have Hive working on
very trivial queries and very limited data using the default run.ts
configuration. I'll soon start applying the remaining configuration
changes we expect to need.

Sungwoo, as always, thank you very much for your prompt and detailed
help. I expect I'll probably need it again before too long. :)

David
--
David Engel
da...@istwok.net

Sungwoo Park

Sep 22, 2022, 10:31:36 PM
to David Engel, MR3
If you use Typescript code in mr3-run-k8s, enabling SSL (TLS) is actually quite simple. The hardest part is to create TrustStore and KeyStore.

The shell script generate-ssl.sh included in mr3-run-k8s makes it easy to create TrustStore and KeyStore. One could try to manually create TrustStore and KeyStore, but would eventually end up executing the commands in generate-ssl.sh.

If the Kubernetes cluster is secured and can be accessed only by admin users, it suffices to enable HTTPS on the Apache server and the external HiveServer2. If your MariaDB database for Metastore (and Ranger) runs outside the Kubernetes cluster, you need to enable SSL on the database, too. So, I think the amount of extra work for enabling SSL depends on the configuration of your cluster.
 
mr3-typescript-components-fs8.png

--- Sungwoo

David Engel

Sep 24, 2022, 12:26:31 PM
to Sungwoo Park, MR3
Thanks. I'll check out generate-ssl.sh later. Right now, I'd prefer
to get the rest of the cluster built and configured.

David
--
David Engel
da...@istwok.net

David Engel

Sep 26, 2022, 6:02:54 PM
to Sungwoo Park, MR3
On Thu, Sep 22, 2022 at 05:56:01PM -0500, David Engel wrote:
> Sungwoo, as always, thank you very much for your prompt and detailed
> help. I expect I'll probably need it again before too long. :)

What is the preferred way, if any, to enable Hive transforms using the
new, typescript configuration? It appears that
hive.security.authorization.enabled is hard-coded to true and we need
to set it to false. We can change it back to true after we refactor
our existing transforms but that is likely months down the road.

The default, worker settings are as follows:

workerMemoryInMb: 16 * 1024,
workerCores: 4,
numTasksInWorker: 4,
numMaxWorkers: 10,

What would you recommend for a cluster with 12 workers with each
worker having 16 cores and 128 GiB memory? We will mainly be running
Hive initially but I expect Spark usage to pick up over time. Also,
it's likely we'll be running Trino on some workers, at least initially
-- old habits are hard to break. If Hive performs well enough on the
new cluster, we can reduce or even eliminate the resources given to
Trino.

Would a different cores/memory ratio for workers work better? All
nodes are VMs so we have a lot of flexibility in how the resources are
allocated.

David Engel

Sep 26, 2022, 6:11:35 PM
to Sungwoo Park, MR3
I forgot a question.

What is the preferred way with the new, typescript configuration to
add additional, NFS mounts within the hiveserver2 and worker images?
Besides workdir, we have 2 more mounts we want to make available.

Sungwoo Park

Sep 26, 2022, 8:31:51 PM
to David Engel, MR3
What is the preferred way, if any, to enable Hive transforms using the
new, typescript configuration?  It appears that
hive.security.authorization.enabled is hard-coded to true and we need
to set it to false.  We can change it back to true after we refactor
our existing transforms but that is likely months down the road.

For using Hive transforms, please see 'Using Python scripts': https://mr3docs.datamonad.com/docs/k8s/user/use-udf/

1.
hiveEnv.authorization in run.ts should be set to SQLStdConfOnlyAuthorizerFactory or SQLStdHiveAuthorizerFactory.
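
For example, in run.ts (a sketch only; the other fields of the hiveEnv section are left as they are in your copy):

const hiveEnv = {                 // hiveEnv section of run.ts; type annotation omitted in this sketch
  // ... other fields unchanged ...
  authorization: "SQLStdConfOnlyAuthorizerFactory"   // or "SQLStdHiveAuthorizerFactory"
};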

2.
To set hive.security.authorization.enabled to false, update src/server/resources/hive-site.xml before executing ts-node to generate run.yaml:

<property>
  <name>hive.security.authorization.enabled</name>
  <value>false</value>
</property>

Alternatively, you could directly update hive.security.authorization.enabled in run.yaml (in 3 places) after executing ts-node.

3.
If you don't mount Python scripts in worker Pods, mr3.container.localize.python.working.dir.unsafe should be set to true in mr3-site.xml. Similarly to setting hive.security.authorization.enabled, you could update src/server/resources/mr3-site.xml before executing ts-node (or update run.yaml after executing ts-node).

<property>
  <name>mr3.container.localize.python.working.dir.unsafe</name>
  <value>true</value>
</property>

The default, worker settings are as follows:

  workerMemoryInMb: 16 * 1024,
  workerCores: 4,
  numTasksInWorker: 4,
  numMaxWorkers: 10,

What would you recommend for a cluster with 12 workers with each
worker having 16 cores and 128 GiB memory?  We will mainly be running
Hive initially but I expect Spark usage to pick up over time.  Also,
it's likely we'll be running Trino on some workers, at least initially
-- old habits are hard to break.  If Hive performs well enough on the
new cluster, we can reduce or even eliminate the resources given to
Trino.

Would a different cores/memory ratio for workers work better?  All
nodes are VMs so we have a lot of flexibility in how the resources are
allocated.

In production, I would recommend:

1) at least 6GB per core, and preferably 8GB per core.
2) 8 concurrent tasks per worker, and no more than 16 concurrent tasks per worker.

So, if each VM has 16 cores and 128 GiB memory, you could run two workers on each VM: 

const workerEnv: worker.T = {
  workerMemoryInMb: 64 * 1024,
  workerCores: 8,
  numTasksInWorker: 8,
  numMaxWorkers: 100,
  llapIoEnabled: false,
  llapIo: {
    memoryInGb: 20,
    memoryMapped: false
  },
  useSoftReference: false,
  tezIoSortMb: 1040,
  tezUnorderedOutputBufferSizeInMb: 307,
  noConditionalTaskSize: 1145044992,
  maxReducers: 1009,
  javaHeapFraction: 0.7,
  numShuffleHandlersPerWorker: 8,
  useShuffleHandlerProcess: true,
  numThreadsPerShuffleHandler: 10,
  enableShuffleSsl: false
};

As a VM should allocate a bit of resources for Kubernetes, you would have to adjust the CPU and memory slightly and check if two workers are created on each VM, e.g:

const workerEnv: worker.T = {
  workerMemoryInMb: 60 * 1024,
  workerCores: 7.75,

Creating a single worker (with 128GB, 16 cores, 16 concurrent tasks) on each node is fine, but I think it will be slower than creating two smaller workers. Or, you could just create VMs each with 64GB and 8 cores.

For running Hive-MR3 in production, you need to allocate enough resources to HiveServer2, Metastore, and MR3 master. The resources would depend on the number of concurrent users and the data size. This page has baseline settings for 20 concurrent users, and you could adjust them as necessary to meet your requirements: https://mr3docs.datamonad.com/docs/k8s/performance/performance-tuning/
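
As a rough illustration of where these resources live, each component has its own section in run.ts (the numbers below are placeholders, not the baseline values from that page):

// Sketch only: per-component resources in run.ts. All numbers are placeholders;
// take the actual baseline values from the performance-tuning page.
const metastoreResources = {
  cpu: 4,
  memoryInMb: 16 * 1024
};
// e.g. used as metastoreEnv.resources; adjust the corresponding fields in the
// HiveServer2 and MR3 master sections of run.ts in the same way.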

Cheers,

--- Sungwoo

Sungwoo Park

Sep 26, 2022, 9:13:00 PM
to David Engel, MR3
What is the preferred way with the new, typescript configuration to
add additional, NFS mounts within the hiveserver2 and worker images?
Besides workdir, we have 2 more mounts we want to make available.

This is not feasible with the current typescript code, and extending the typescript code would also be quite complicated. A practical (but somewhat ugly) solution would be to manually update run.yaml after executing ts-node.

1. We can add PersistentVolume sections in run.yaml to create new PersistentVolumes.
2. We can add PersistentVolumeClaim sections in run.yaml to create new PersistentVolumeClaims.
3. We can expand the Deployment for HiveServer2 to mount the new PersistentVolumeClaims.
4. We can update the ConfigMap for env.sh to update WORK_DIR_PERSISTENT_VOLUME_CLAIM_MOUNT_DIR.
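
As a rough illustration of steps 1 and 2, a small post-processing script can append extra documents to run.yaml after ts-node generates it (the PV/PVC spec below is a placeholder for your NFS server and paths; steps 3 and 4 still need to be edited by hand):

// post-process-run-yaml.ts -- sketch only; appends placeholder PV/PVC documents to run.yaml.
import * as fs from "fs";

const extraVolumes = `
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-extra            # placeholder name
spec:
  capacity:
    storage: 100Gi
  accessModes: ["ReadWriteMany"]
  nfs:
    server: 10.1.6.200       # placeholder NFS server
    path: /export/extra      # placeholder export path
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-extra-pvc
  namespace: hivemr3
spec:
  accessModes: ["ReadWriteMany"]
  resources:
    requests:
      storage: 100Gi
  volumeName: nfs-extra
`;

fs.appendFileSync("run.yaml", extraVolumes);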

I wonder if you could put all the data under the same NFS mount point. For example, could you put the two additional directories under the directory for workdir? It should not be a problem for running Hive-MR3 and would also make everything simpler.

Cheers,

--- Sungwoo

David Engel

Sep 26, 2022, 9:31:13 PM
to Sungwoo Park, MR3
On Tue, Sep 27, 2022 at 09:31:39AM +0900, Sungwoo Park wrote:
> >
> > What is the preferred way, if any, to enable Hive transforms using the
> > new, typescript configuration? It appears that
> > hive.security.authorization.enabled is hard-coded to true and we need
> > to set it to false. We can change it back to true after we refactor
> > our existing transforms but that is likely months down the road.
> >
>
> For using Hive transforms, please see 'Using Python scripts':
> https://mr3docs.datamonad.com/docs/k8s/user/use-udf/
>
> 1.
> hiveEnv.authorization in run.ts should be set
> to SQLStdConfOnlyAuthorizerFactory or SQLStdHiveAuthorizerFactory.

I'd already done that.

> 2.
> To set hive.security.authorization.enabled to false,
> update src/server/resources/hive-site.xml before executing ts-node to
> generate run.yaml:
>
> <property>
> <name>hive.security.authorization.enabled</name>
> <value>false</value>
> </property>

That's what I expected as most users won't want to change it. Just
wanted to make sure there wasn't a better way.

> Alternatively, you could directly update
> hive.security.authorization.enabled in run.yaml (in 3 places) after
> executing ts-node.

That's what I did for testing.

> 3.
> If you don't mount Python scripts in worker Pods,
> mr3.container.localize.python.working.dir.unsafe should be set to true in
> mr3-site.xml. Similarly to setting hive.security.authorization.enabled, you
> could update src/server/resources/mr3-site.xml before executing ts-node (or
> update run.yaml after executing ts-node).
>
> <property>
> <name>mr3.container.localize.python.working.dir.unsafe</name>
> <value>true</value>
> </property>

We will be mounting a conda installation and the scripts via NFS.
That's the reason for my follow-up question on NFS mounts.
Thanks.

> As a VM should allocate a bit of resources for Kubernetes, you would have
> to adjust the CPU and memory slightly and check if two workers are created
> on each VM, e.g:
>
> const workerEnv: worker.T = {
> workerMemoryInMb: 60 * 1024,
> workerCores: 7.75,

Does Kubernetes have any hard-coded amounts for its overhead or does
it try to dynamically figure out more exact values? I've already run
into this issue as MinIO recommended 8 cores so that's what we
configured the MinIO VMs to have. Kubernetes wouldn't schedule the
MinIO pods until I changed MinIO to only request 7 cores.

> Creating a single worker (with 128GB, 16 cores, 16 concurrent tasks) on
> each node is fine, but I think it will be slower than creating two smaller
> workers. Or, you could just create VMs each with 64GB and 8 cores.

For now, I'll stick with the slightly smaller requirements so 1 VM
can hold 2 workers. That is, unless you think smaller VMs with 1
worker each would be noticeably faster. Configuring more nodes
would be fine by me, but IT would probably complain.

> For running Hive-MR3 in production, you need to allocate enough resources
> to HiveServer2, Metastore, and MR3 master. The resources would depend on
> the number of concurrent users and the data size. This page has baseline
> settings for 20 concurrent users, and you could update adjust as necessary
> to meet your requirements:
> https://mr3docs.datamonad.com/docs/k8s/performance/performance-tuning/

Thanks. I'll take a look at that page. I currently have 2 VMs with 16
cores and 128 GiB each for running those processes, the Kubernetes
control plane and MySQL. If needed, I can get more VMs or more cores
and memory for the existing VMs.

David Engel

Sep 26, 2022, 9:45:22 PM
to Sungwoo Park, MR3
On Tue, Sep 27, 2022 at 10:12:49AM +0900, Sungwoo Park wrote:
> >
> > What is the preferred way with the new, typescript configuration to
> > add additional, NFS mounts within the hiveserver2 and worker images?
> > Besides workdir, we have 2 more mounts we want to make available.
> >
>
> This is not feasible with the current typescript code, and extending the
> typescript code would also be quite complicated. A practical (but somewhat
> ugly) solution would be to manually update run.yaml after executing ts-node.

Post-processing run.yaml is not a problem. I'm already doing that to
fix up the apiVersion for RoleBinding and ClusterRoleBinding. The
version of Kubernetes we have no longer accepts v1beta1.
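
For reference, the apiVersion fix-up is essentially the following (a sketch of what I run, not the exact script):

// fix-rbac-apiversion.ts -- newer Kubernetes releases only accept
// rbac.authorization.k8s.io/v1 for RoleBinding and ClusterRoleBinding.
import * as fs from "fs";

const runYaml = fs.readFileSync("run.yaml", "utf8");
fs.writeFileSync(
  "run.yaml",
  runYaml.replace(/rbac\.authorization\.k8s\.io\/v1beta1/g, "rbac.authorization.k8s.io/v1")
);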

> 1. We can add PersistentVolumes sections in run.yaml to create new
> PersistentVolumes.
> 2. We can add PersistentVolumeClaim sections in run.yaml to create new
> PersistentVolumeClaims.
> 3. We can expand Deployment for HiveServer2 to mount new
> PersistentVolumeClaims.
> 4. We can update ConfigMap for env.sh to
> update WORK_DIR_PERSISTENT_VOLUME_CLAIM_MOUNT_DIR.
>
> I wonder if you could put all the data in the same NFS mount points. For
> example, could you put the additional two directories under the directory
> for workdir? It should not be a problem for running Hive-Mr3 and would also
> make everything simpler.

I'll have to think hard about this. The 2, NFS mounts are the /home
directories and the "big data" mount on our client system. There is a
lot of code which expects those mounts in their "normal" places.

What about custom, Docker images? I prefer to use stock whenever
possible, but adding the 2 mounts to your Hive DockerFile might be the
simpler path.

Sungwoo Park

Sep 26, 2022, 9:47:51 PM
to David Engel, MR3
> As a VM should allocate a bit of resources for Kubernetes, you would have
> to adjust the CPU and memory slightly and check if two workers are created
> on each VM, e.g:
>
> const workerEnv: worker.T = {
>   workerMemoryInMb: 60 * 1024,
>   workerCores: 7.75,

Does Kubernetes have any hard-coded amounts for its overhead or does
it try to dynamically figure out more exact values?  I've already run
into this issue as MinIO recommended 8 cores so that's what we
configured the MinIO VMs to have.  Kubernetes wouldn't schedule the
MinIO pods until I changed MinIO to only request 7 cores.

I think Kubernetes tries to use a fixed amount of resources, but this doesn't matter because each node has a different amount of available resources anyway. This is indeed annoying, especially on public clouds like Amazon EKS where the consumption by Kubernetes varies from node to node. In on-prem clusters, however, the variation is much smaller, so the calculation is easier. On your VM cluster, I am afraid you will have to go through the same pain (but only once).

Because of this problem, workerEnv.workerCores can take fractions (e.g., 7.75).
 
> Creating a single worker (with 128GB, 16 cores, 16 concurrent tasks) on
> each node is fine, but I think it will be slower than creating two smaller
> workers. Or, you could just create VMs each with 64GB and 8 cores.

For now, I'll stick with the slightly smaller requirements so 1 VM
can hold 2 workers.  That is, unless you think smaller VMs with 1
worker each would be noticeably faster.  Configuring more nodes
would be fine by me, but IT would probably complain.

This sounds good.
 
> For running Hive-MR3 in production, you need to allocate enough resources
> to HiveServer2, Metastore, and MR3 master. The resources would depend on
> the number of concurrent users and the data size. This page has baseline
> settings for 20 concurrent users, and you could update adjust as necessary
> to meet your requirements:
> https://mr3docs.datamonad.com/docs/k8s/performance/performance-tuning/

Thanks.  I'll take a look at that page.  I currently have 2 VMs with 16
cores and 128 GiB each for running those processes, the Kubernetes
control plane and MySQL.  If needed, I can get more VMs or more cores
and memory for the existing VMs.

Using a single machine with 16 cores and 128GB memory for all of HS2, Metastore, and MR3 master could work okay and is also good for security purposes. Using two machines is also definitely okay.

--- Sungwoo

Sungwoo Park

Sep 26, 2022, 10:12:07 PM
to David Engel, MR3
> I wonder if you could put all the data in the same NFS mount points. For
> example, could you put the additional two directories under the directory
> for workdir? It should not be a problem for running Hive-Mr3 and would also
> make everything simpler.

I'll have to think hard about this.  The 2, NFS mounts are the /home
directories and the "big data" mount on our client system.  There is a
lot of code which expects those mounts in their "normal" places.

What about custom, Docker images?  I prefer to use stock whenever
possible, but adding the 2 mounts to your Hive DockerFile might be the
simpler path.

For using Python scripts, you need to create a new Docker image anyway, so another option might be to use symbolic links. The plan is:

1. When you create a Docker image, create two symbolic links, e.g.:
/home to /opt/mr3-run/work-dir/home
/big_data to /opt/mr3-run/work-dir/big_data

2. Use a single NFS mount, which has two sub-directories or symbolic links for "home" and "big_data".

In this way, we can use a single NFS mount and put all the data in normal places. If step 1 does not work (not sure if this works), we could use init-containers to create the two symbolic links when Pods start.

If you can use a single NFS mount to provide all the data, please let me know. If so, let me try step 1.

--- Sungwoo

David Engel

Sep 27, 2022, 11:16:55 AM
to Sungwoo Park, MR3
On Tue, Sep 27, 2022 at 11:11:56AM +0900, Sungwoo Park wrote:
> >
> > > I wonder if you could put all the data in the same NFS mount points. For
> > > example, could you put the additional two directories under the directory
> > > for workdir? It should not be a problem for running Hive-Mr3 and would
> > also
> > > make everything simpler.
> >
> > I'll have to think hard about this. The 2, NFS mounts are the /home
> > directories and the "big data" mount on our client system. There is a
> > lot of code which expects those mounts in their "normal" places.
> >
> > What about custom, Docker images? I prefer to use stock whenever
> > possible, but adding the 2 mounts to your Hive DockerFile might be the
> > simpler path.
> >
>
> For using Python scripts, you need to create a new Docker image anyway, so

I don't believe this is strictly true. We can make Python binaries
available over NFS. Other factors, though, will probably lead us down
this path.

> another option might be to use symbolic links. The plan is:
>
> 1. When you create a Docker image, create two symbolic links, e.g.:
> /home to /opt/mr3-run/work-dir/home
> /big_data to /opt/mr3-run/work-dir/big_data
>
> 2. Use a single NFS mount, which has two sub-directories or symbolic links
> for "home" and "big_data".
>
> In this way, we can use a single NFS mount and put all the data in normal
> places. If step 1 does not work (not sure if this works), we could use
> init-containers to create the two symbolic links when Pods start.
>
> If you can use a single NFS mount to provide all the data, please let me
> know. If so, let me try step 1.

While using a single volume is possible, I'm still not liking the
changes it would require on our client system. As a result, I think
I'm going to first try post-processing run.yaml to add the needed PVs,
PVCs and mounts.

David

> --- Sungwoo

--
David Engel
da...@istwok.net

Sungwoo Park

Sep 27, 2022, 11:37:05 AM
to David Engel, MR3
> For using Python scripts, you need to create a new Docker image anyway, so

I don't believe this is strictly true.  We can make Python binaries
available over NFS.  Other factors, though, will probably lead us down
this path.

Perhaps there is a technique here that I don't understand, as I thought that the 'python' command should be found in the default path. (Cf. my previous email of July 22.)
 
While using a single volume is possible, I'm still not liking the
changes it would require on our client system.  As a result, I think
I'm going to first try post-processing run.yaml to add the needed PVs,
PVCs and mounts.

Sounds good. For mounting multiple PVCs on HiveServer2, you could update run.yaml after executing ts-node. For mounting multiple PVCs on worker Pods, we discussed a (somewhat ugly) technique in the previous email (July 24).

You can also mount several PersistentVolumes. Either you could rebuild the Docker image from scratch after updating mr3-setup.sh, or use an ugly hack like (with /foo/bar1, /foo/bar2, /foo/bar3 in Docker image):

WORK_DIR_PERSISTENT_VOLUME_CLAIM=pvc1
WORK_DIR_PERSISTENT_VOLUME_CLAIM_MOUNT_DIR=/foo/bar1,pvc2=/foo/bar2,pvc3=/foo/bar3


However, if you would like to see an extension of MR3 for facilitating mounting multiple PVCs, please let me know your spec and we will try to implement it.

--- Sungwoo

David Engel

Sep 27, 2022, 3:49:45 PM
to Sungwoo Park, MR3
On Wed, Sep 28, 2022 at 12:36:54AM +0900, Sungwoo Park wrote:
> >
> > > For using Python scripts, you need to create a new Docker image anyway,
> > so
> >
> > I don't believe this is strictly true. We can make Python binaries
> > availabe over NFS. Other factors, though, will probably, lead us down
> > this path.
> >
>
> Perhaps this is a technique that I don't understand here, as I thought that
> 'python' command should be found in the default path. (Cf. my previous
> email of July 22)

We could run a query similar to the following:

select transform(*)
using '/home/path/to/conda/bin/python3 /home/path/to/scripts/transform.py'
from ...

> > While using a single volume is possible, I'm still not liking the
> > changes it would require on our client system. As a result, I think
> > I'm going to first try post-processing run.yaml to add the needed PVs,
> > PVCs and mounts.
> >
>
> Sounds good. For mounting multiple PVCs on HiveServer2, you could update
> run.yaml after executing ts-node. For mounting multiple PVCs on worker
> Pods, we discussed a (somewhat ugly) technique in the previous email (July
> 24).

I have the editing of run.yaml working to add the 2 new PVs/PVCs.

> You can also mount several PersistentVolumes. Either you could rebuild the
> Docker image from scratch after updating mr3-setup.sh, or use an ugly hack
> like (with /foo/bar1, /foo/bar2, /foo/bar3 in Docker image):
>
> WORK_DIR_PERSISTENT_VOLUME_CLAIM=pvc1
> WORK_DIR_PERSISTENT_VOLUME_CLAIM_MOUNT_DIR=/foo/bar1,pvc2=/foo/bar2,pvc3=/foo/bar3
>
> However, if you would like to see an extension of MR3 for facilitating
> mounting multiple PVCs, please let me know your spec and we will try to
> implement it.

We are an anachronism in still needing transforms. The only other
possible use for arbitrary NFS mounts I can think of right now is to
pass files by name into Hive UDFs. If you think that's useful to other
users, feel free to add it to your TODO list. We have a viable
workaround so I wouldn't make it a high priority. Our spec would simply
be a list of tuples including server IP/hostname, remote directory,
local directory and a read-write/read-only indication.
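
Something like the following shape, for example (names, addresses and the read-only flags are illustrative only, not a proposal for the exact run.ts API):

// Sketch of the kind of spec I have in mind.
interface ExtraNfsMount {
  server: string;        // NFS server IP or hostname
  remotePath: string;    // exported directory on the server
  mountPath: string;     // mount point inside the HiveServer2/worker Pods
  readOnly: boolean;     // read-only vs. read-write
}

// e.g. in run.ts (server address, paths, and flags are placeholders):
const extraNfsMounts: ExtraNfsMount[] = [
  { server: "10.1.6.200", remotePath: "/export/home", mountPath: "/home", readOnly: false },
  { server: "10.1.6.200", remotePath: "/export/big_data", mountPath: "/big_data", readOnly: false },
];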