_SUCCESS file persisting does not work in Spark


Denis Bolshakov

Jun 14, 2016, 6:05:00 AM
to Alluxio Users
Hello community,

We are experimenting with Alluxio. Our goal is to build whole ETL chains.

So, for example, we might have the following chain of ETL processes:
ETL1 downloads data from external storage and lands it in HDFS.
ETL2 depends on ETL1 (waits for the _SUCCESS file created by ETL1), reads the result of ETL1 through Alluxio, and saves its output through Alluxio.
ETL3 depends on ETL2 (waits for the _SUCCESS file created by ETL2) and reads data from HDFS.
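For illustration, the dependency wait between these stages can be sketched as a simple poll for the marker file. The function name, paths, and the local-filesystem check below are stand-ins; against HDFS the existence check would be `hdfs dfs -test -e "$marker"` instead:

```shell
# Sketch of an ETL dependency wait: poll until the upstream job's
# _SUCCESS marker exists, giving up after a timeout (in seconds).
# Shown against the local filesystem for illustration; for HDFS,
# replace the [ -e ... ] test with: hdfs dfs -test -e "$marker"
wait_for_success() {
  marker="$1"
  timeout="${2:-3600}"
  while [ "$timeout" -gt 0 ]; do
    if [ -e "$marker" ]; then
      return 0
    fi
    sleep 1
    timeout=$((timeout - 1))
  done
  return 1    # marker never appeared within the timeout
}
```

ETL3 would call something like `wait_for_success /data/etl2/out/_SUCCESS` (path hypothetical) before reading ETL2's output.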

So let's go back to the ETL2 spec:
It's a Spark application (not streaming).
It runs with --conf spark.executor.extraJavaOptions=-Dalluxio.user.file.writetype.default=CACHE_THROUGH
All data is persisted to HDFS successfully, but the _SUCCESS file is not. We also have other 'hidden' files related to parquet metadata, and they are not persisted either. Note that all of these missing files are present in Alluxio.
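For context, a submit command matching this spec would look roughly like the following (class and jar names are made up, and the command is printed rather than executed here). Note that this -D option only reaches the executor JVMs, not the driver:

```shell
# Hypothetical spark-submit matching the ETL2 spec above: the Alluxio
# write type is passed to executors only, via extraJavaOptions.
ALLUXIO_OPTS='-Dalluxio.user.file.writetype.default=CACHE_THROUGH'
echo spark-submit \
  --class com.example.Etl2 \
  --conf "spark.executor.extraJavaOptions=${ALLUXIO_OPTS}" \
  etl2-assembly.jar
```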

So our ETL3 process waits forever.

Could you please point out where we are going wrong with persisting the 'hidden' files, or which configuration setting is missing?
Thanks in advance!

---
Best regards,
Denis

  

Gene Pang

Jun 14, 2016, 11:10:52 AM
to Alluxio Users
Hi Denis,

I'm not sure why the _SUCCESS file is not persisted, but I was wondering more about your use case. I have a few clarifying questions:

- Which version of Alluxio are you using?
- Why does ETL3 read from HDFS and not Alluxio?
- Are all the ETL* jobs Spark jobs?

Thanks,
Gene

Denis Bolshakov

Jun 14, 2016, 11:57:33 AM
to Alluxio Users
- We use the 1.1.0 release.
- We can read from Alluxio as well (and we plan to do so), but we have to be sure that all data is persisted (including _SUCCESS files)
- They are not Spark jobs.

Bin Fan

Jun 14, 2016, 1:31:46 PM
to Denis Bolshakov, Alluxio Users
Hi Denis,

If you go to your Alluxio web UI (by default at AlluxioMasterIP:19999) and click the "Browse" tab to locate your _SUCCESS files in the Alluxio filesystem (since you mentioned those files are present in Alluxio space), what is the value of their "Persistence State"?

- Bin


Bin Fan

Jun 14, 2016, 3:43:11 PM
to Denis Bolshakov, Alluxio Users
Hi Denis,

It seems the write type setting of CACHE_THROUGH is not inherited for _SUCCESS files.
Is it possible that the _SUCCESS file is not written by your Spark executors but by something else, so it still uses the default write type MUST_CACHE (without the through-write to the under storage)?

Since 1.1, you can set alluxio.user.file.writetype.default=CACHE_THROUGH in your ~/.alluxio/alluxio-site.properties file rather than setting it through Spark, so that it is respected by all Alluxio clients. See http://www.alluxio.org/documentation/master/en/Configuration-Settings.html#configuration-properties
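A minimal sketch of creating that client-side properties file (the location follows the 1.1 docs referenced above; the value mirrors the one used in this thread):

```shell
# Create the client-side Alluxio configuration file in the home
# directory of the user running the Spark/Alluxio client processes.
mkdir -p "${HOME}/.alluxio"
cat > "${HOME}/.alluxio/alluxio-site.properties" <<'EOF'
alluxio.user.file.writetype.default=CACHE_THROUGH
EOF
# Sanity check: confirm the property landed in the file
grep writetype "${HOME}/.alluxio/alluxio-site.properties"
```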

- Bin


On Tue, Jun 14, 2016 at 11:17 AM, Denis Bolshakov <bolshak...@gmail.com> wrote:

Bin, hello, thanks for helping me. Their persistence state is shown as not persisted.

Denis

On Jun 14, 2016 at 20:31, "Bin Fan" <fanb...@gmail.com> wrote:

Denis Bolshakov

Jun 14, 2016, 4:07:54 PM
to Alluxio Users
No, I don't think the problem is related to the _SUCCESS file itself; the parquet metadata files are also not persisted. I will try setting the write type in the properties file.

Thanks,
Denis

Bin Fan

Jun 14, 2016, 5:59:50 PM
to Denis Bolshakov, Alluxio Users
Sounds good. Also, just a reminder: you need to update the properties file on every node that may run Alluxio.

Gil Vernik

Jun 15, 2016, 1:17:11 AM
to Bin Fan, Denis Bolshakov, Alluxio Users
I do see _SUCCESS files created when I use Spark with Alluxio configured against a Swift-API-based object store.

Denis Bolshakov

Jun 15, 2016, 5:16:19 AM
to Gil Vernik, Bin Fan, Alluxio Users
Bin, I checked the configuration in the properties file. We already set alluxio.user.file.writetype.default=CACHE_THROUGH there (and this is confirmed by http://our-node-with-alluxio:19999/configuration).
These properties files are synced across the Alluxio nodes.
In Spark this property is not applied, which is why we use spark.executor.extraJavaOptions.
And I think this could be an important detail - we run Alluxio on top of YARN.

Gil, could you please share your experience with Spark+Alluxio+Swift? Which resource manager do you use to run Spark? Which resource manager do you use to run Alluxio?
It's a very interesting topic.

Best regards,
Denis
 
--
//with Best Regards
--Denis Bolshakov
e-mail: bolshak...@gmail.com

Bin Fan

Jun 15, 2016, 12:41:20 PM
to Denis Bolshakov, Gil Vernik, Alluxio Users
On Wed, Jun 15, 2016 at 2:16 AM, Denis Bolshakov <bolshak...@gmail.com> wrote:
Bin, I checked the configuration in the properties file. We already set alluxio.user.file.writetype.default=CACHE_THROUGH there (and this is confirmed by http://our-node-with-alluxio:19999/configuration)

Could you be more specific here and let us know the name and location of the properties file where you set alluxio.user.file.writetype.default=CACHE_THROUGH?
This is important because when Spark uses Alluxio as a client, Spark jobs may or may not respect the configuration file, depending on whether ${ALLUXIO_HOME}/conf is on your Spark classpath.
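One way to make a conf directory visible to the Spark client JVMs is via Spark's extraClassPath settings. This is a sketch: the conf path and jar name are placeholders, and the command is printed rather than executed:

```shell
# Hypothetical: put the Alluxio conf directory on both the driver and
# executor classpaths so the Alluxio client finds alluxio-site.properties.
ALLUXIO_CONF=/opt/alluxio/conf
echo spark-submit \
  --conf "spark.driver.extraClassPath=${ALLUXIO_CONF}" \
  --conf "spark.executor.extraClassPath=${ALLUXIO_CONF}" \
  etl2-assembly.jar
```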

Denis Bolshakov

Jun 16, 2016, 3:02:52 AM
to Bin Fan, Gil Vernik, Alluxio Users
Bin, now we are starting to consider this as two issues, and I am not sure whether they are related to each other.
1. Hidden files (starting with _) are not persisted by Spark (although all other files are persisted) when using --conf spark.executor.extraJavaOptions=-Dalluxio.user.file.writetype.default=CACHE_THROUGH. Alluxio is the 1.1.0 release, running on top of YARN.
2. Setting alluxio.user.file.writetype.default=CACHE_THROUGH in the properties files does not work. Alluxio is the 1.1.0 release, running on top of YARN. Here are more details:
We have a cluster with a few nodes (about 10), all of them running Alluxio workers. Let's consider one of the workers, say the one on the node with DNS name uat-node004.
After logging in to that node, we can grep for the Alluxio processes:

-bash-4.1$ ps uax | grep alluxio

yarn      5745  0.0  0.0 103276   900 pts/1    S+   09:52   0:00 grep alluxio

yarn     42420  0.0  0.0 106108  1212 ?        Ss   Jun15   0:00 /bin/bash -c ./alluxio-yarn-setup.sh application-master -num_workers 9 -master_address uat-node005 -resource_path hdfs://nameservice1/tmp 1>/var/log/hadoop-yarn/container/application_1465799602059_0210/container_e19_1465799602059_0210_01_000001/stdout 2>/var/log/hadoop-yarn/container/application_1465799602059_0210/container_e19_1465799602059_0210_01_000001/stderr 

yarn     42425  0.0  0.0 106108  1264 ?        S    Jun15   0:00 /bin/bash ./alluxio-yarn-setup.sh application-master -num_workers 9 -master_address uat-node005 -resource_path hdfs://nameservice1/tmp

yarn     42431  0.0  0.0 106112  1340 ?        S    Jun15   0:00 /bin/bash ./integration/bin/alluxio-application-master.sh -num_workers 9 -master_address uat-node005 -resource_path hdfs://nameservice1/tmp

yarn     42452  1.3  0.1 905024 180604 ?       Sl   Jun15  13:37 /usr/java/default//bin/java -cp /data/disk7/yarn/nm/usercache/devops/appcache/application_1465799602059_0210/container_e19_1465799602059_0210_01_000001/conf/::/data/disk7/yarn/nm/usercache/devops/appcache/application_1465799602059_0210/container_e19_1465799602059_0210_01_000001/assembly/target/alluxio-assemblies-1.1.0-jar-with-dependencies.jar:/etc/hadoop/conf.cloudera.yarn:/var/run/cloudera-scm-agent/process/4087-yarn-NODEMANAGER:/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/hadoop/*:/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/hadoop/lib/*:/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/hadoop-hdfs/*:/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/hadoop-hdfs/lib/*:/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/hadoop-yarn/*:/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/hadoop-yarn/lib/*:/data/disk7/yarn/nm/usercache/devops/appcache/application_1465799602059_0210/container_e19_1465799602059_0210_01_000001/* -Dalluxio.home=/data/disk7/yarn/nm/usercache/devops/appcache/application_1465799602059_0210/container_e19_1465799602059_0210_01_000001 -Dalluxio.logs.dir=/data/disk7/yarn/nm/usercache/devops/appcache/application_1465799602059_0210/container_e19_1465799602059_0210_01_000001/logs -Dalluxio.worker.tieredstore.level0.dirs.path=/tmp/ramdisk -Dalluxio.master.hostname=uat-node005 -Dalluxio.underfs.address=hdfs://nameservice1/ -Dalluxio.worker.memory.size=10GB -Dlog4j.configuration=file:/data/disk7/yarn/nm/usercache/devops/appcache/application_1465799602059_0210/container_e19_1465799602059_0210_01_000001/conf/log4j.properties -Dorg.apache.jasper.compiler.disablejsr199=true -Djava.net.preferIPv4Stack=true -Djava.security.krb5.realm= -Djava.security.krb5.kdc= -Xmx256M alluxio.yarn.ApplicationMaster -num_workers 9 -master_address uat-node005 -resource_path hdfs://nameservice1/tmp

yarn     42705  0.0  0.0 106108  1212 ?        Ss   Jun15   0:00 /bin/bash -c ./alluxio-yarn-setup.sh alluxio-worker 1>/var/log/hadoop-yarn/container/application_1465799602059_0210/container_e19_1465799602059_0210_01_000003/stdout 2>/var/log/hadoop-yarn/container/application_1465799602059_0210/container_e19_1465799602059_0210_01_000003/stderr 

yarn     42709  0.0  0.0 106104  1260 ?        S    Jun15   0:00 /bin/bash ./alluxio-yarn-setup.sh alluxio-worker

yarn     42737  0.0  0.0 106112  1328 ?        S    Jun15   0:00 /bin/bash ./integration/bin/alluxio-worker-yarn.sh

yarn     42847  0.1  0.4 33199372 612876 ?     Sl   Jun15   1:49 /usr/java/default//bin/java -cp /data/disk0/yarn/nm/usercache/devops/appcache/application_1465799602059_0210/container_e19_1465799602059_0210_01_000003/conf/::/data/disk0/yarn/nm/usercache/devops/appcache/application_1465799602059_0210/container_e19_1465799602059_0210_01_000003/assembly/target/alluxio-assemblies-1.1.0-jar-with-dependencies.jar -Dalluxio.home=/data/disk0/yarn/nm/usercache/devops/appcache/application_1465799602059_0210/container_e19_1465799602059_0210_01_000003 -Dalluxio.logs.dir=/data/disk0/yarn/nm/usercache/devops/appcache/application_1465799602059_0210/container_e19_1465799602059_0210_01_000003/logs -Dalluxio.worker.tieredstore.level0.dirs.path=/tmp/ramdisk -Dalluxio.master.hostname=uat-node005 -Dalluxio.underfs.address=hdfs://nameservice1/ -Dalluxio.worker.memory.size=10GB -Dlog4j.configuration=file:/data/disk0/yarn/nm/usercache/devops/appcache/application_1465799602059_0210/container_e19_1465799602059_0210_01_000003/conf/log4j.properties -Dorg.apache.jasper.compiler.disablejsr199=true -Djava.net.preferIPv4Stack=true -Djava.security.krb5.realm= -Djava.security.krb5.kdc= -Dalluxio.logger.type=WORKER_LOGGER -Dalluxio.home=/data/disk0/yarn/nm/usercache/devops/appcache/application_1465799602059_0210/container_e19_1465799602059_0210_01_000003 -Dalluxio.logger.type=WORKER_LOGGER -Dalluxio.logs.dir=/var/log/hadoop-yarn/container/application_1465799602059_0210/container_e19_1465799602059_0210_01_000003 -Dalluxio.master.hostname=uat-node005 alluxio.worker.AlluxioWorker



So here we can see a running YARN container that hosts an Alluxio worker - container_e19_1465799602059_0210_01_000003.
We can also see that the classpath includes /data/disk0/yarn/nm/usercache/devops/appcache/application_1465799602059_0210/container_e19_1465799602059_0210_01_000003/conf/
And the following commands show what we have there:

-bash-4.1$ cd /data/disk0/yarn/nm/usercache/devops/appcache/application_1465799602059_0210/container_e19_1465799602059_0210_01_000003/conf/

-bash-4.1$ pwd

/data/disk0/yarn/nm/usercache/devops/appcache/application_1465799602059_0210/container_e19_1465799602059_0210_01_000003/conf

-bash-4.1$ ls

alluxio-env.sh  alluxio-site.properties  core-site.xml  hdfs-site.xml  log4j.properties  mapred-site.xml  topology.map  topology.py  workers  yarn-site.xml

-bash-4.1$ cat alluxio-site.properties | grep type | grep write

alluxio.user.file.writetype.default=CACHE_THROUGH


Now let's check the node running the master (uat-node005):

-bash-4.1$ ps aux | grep alluxio

yarn      7732  0.0  0.0 106108  1212 ?        Ss   Jun15   0:00 /bin/bash -c ./alluxio-yarn-setup.sh alluxio-master 1>/var/log/hadoop-yarn/container/application_1465799602059_0210/container_e19_1465799602059_0210_01_000002/stdout 2>/var/log/hadoop-yarn/container/application_1465799602059_0210/container_e19_1465799602059_0210_01_000002/stderr 

yarn      7736  0.0  0.0 106108  1264 ?        S    Jun15   0:00 /bin/bash ./alluxio-yarn-setup.sh alluxio-master

yarn      7748  0.0  0.0 106108  1212 ?        Ss   Jun15   0:00 /bin/bash -c ./alluxio-yarn-setup.sh alluxio-worker 1>/var/log/hadoop-yarn/container/application_1465799602059_0210/container_e19_1465799602059_0210_01_000011/stdout 2>/var/log/hadoop-yarn/container/application_1465799602059_0210/container_e19_1465799602059_0210_01_000011/stderr 

yarn      7752  0.0  0.0 106108  1264 ?        S    Jun15   0:00 /bin/bash ./alluxio-yarn-setup.sh alluxio-worker

yarn      7756  0.0  0.0 106112  1336 ?        S    Jun15   0:00 /bin/bash ./integration/bin/alluxio-master-yarn.sh

yarn      7812  0.0  0.0 106112  1336 ?        S    Jun15   0:00 /bin/bash ./integration/bin/alluxio-worker-yarn.sh

yarn      7912  0.3  0.5 33245016 686232 ?     Sl   Jun15   3:53 /usr/java/default//bin/java -cp /data/disk7/yarn/nm/usercache/devops/appcache/application_1465799602059_0210/container_e19_1465799602059_0210_01_000002/conf/::/data/disk7/yarn/nm/usercache/devops/appcache/application_1465799602059_0210/container_e19_1465799602059_0210_01_000002/assembly/target/alluxio-assemblies-1.1.0-jar-with-dependencies.jar -Dalluxio.home=/data/disk7/yarn/nm/usercache/devops/appcache/application_1465799602059_0210/container_e19_1465799602059_0210_01_000002 -Dalluxio.logs.dir=/data/disk7/yarn/nm/usercache/devops/appcache/application_1465799602059_0210/container_e19_1465799602059_0210_01_000002/logs -Dalluxio.worker.tieredstore.level0.dirs.path=/tmp/ramdisk -Dalluxio.master.hostname=uat-node005 -Dalluxio.underfs.address=hdfs://nameservice1/ -Dalluxio.worker.memory.size=10GB -Dlog4j.configuration=file:/data/disk7/yarn/nm/usercache/devops/appcache/application_1465799602059_0210/container_e19_1465799602059_0210_01_000002/conf/log4j.properties -Dorg.apache.jasper.compiler.disablejsr199=true -Djava.net.preferIPv4Stack=true -Djava.security.krb5.realm= -Djava.security.krb5.kdc= -Dalluxio.logger.type=MASTER_LOGGER -Dalluxio.home=/data/disk7/yarn/nm/usercache/devops/appcache/application_1465799602059_0210/container_e19_1465799602059_0210_01_000002 -Dalluxio.logger.type=MASTER_LOGGER -Dalluxio.logs.dir=/var/log/hadoop-yarn/container/application_1465799602059_0210/container_e19_1465799602059_0210_01_000002 alluxio.master.AlluxioMaster

yarn      7950  0.1  0.3 33160472 475616 ?     Sl   Jun15   1:30 /usr/java/default//bin/java -cp /data/disk7/yarn/nm/usercache/devops/appcache/application_1465799602059_0210/container_e19_1465799602059_0210_01_000011/conf/::/data/disk7/yarn/nm/usercache/devops/appcache/application_1465799602059_0210/container_e19_1465799602059_0210_01_000011/assembly/target/alluxio-assemblies-1.1.0-jar-with-dependencies.jar -Dalluxio.home=/data/disk7/yarn/nm/usercache/devops/appcache/application_1465799602059_0210/container_e19_1465799602059_0210_01_000011 -Dalluxio.logs.dir=/data/disk7/yarn/nm/usercache/devops/appcache/application_1465799602059_0210/container_e19_1465799602059_0210_01_000011/logs -Dalluxio.worker.tieredstore.level0.dirs.path=/tmp/ramdisk -Dalluxio.master.hostname=uat-node005 -Dalluxio.underfs.address=hdfs://nameservice1/ -Dalluxio.worker.memory.size=10GB -Dlog4j.configuration=file:/data/disk7/yarn/nm/usercache/devops/appcache/application_1465799602059_0210/container_e19_1465799602059_0210_01_000011/conf/log4j.properties -Dorg.apache.jasper.compiler.disablejsr199=true -Djava.net.preferIPv4Stack=true -Djava.security.krb5.realm= -Djava.security.krb5.kdc= -Dalluxio.logger.type=WORKER_LOGGER -Dalluxio.home=/data/disk7/yarn/nm/usercache/devops/appcache/application_1465799602059_0210/container_e19_1465799602059_0210_01_000011 -Dalluxio.logger.type=WORKER_LOGGER -Dalluxio.logs.dir=/var/log/hadoop-yarn/container/application_1465799602059_0210/container_e19_1465799602059_0210_01_000011 -Dalluxio.master.hostname=uat-node005 alluxio.worker.AlluxioWorker

yarn     18961  0.0  0.0 103276   904 pts/1    S+   09:58   0:00 grep alluxio

-bash-4.1$ cd /data/disk7/yarn/nm/usercache/devops/appcache/application_1465799602059_0210/container_e19_1465799602059_0210_01_000002/conf/

-bash-4.1$ pwd

/data/disk7/yarn/nm/usercache/devops/appcache/application_1465799602059_0210/container_e19_1465799602059_0210_01_000002/conf

-bash-4.1$ ls

alluxio-env.sh  alluxio-site.properties  core-site.xml  hdfs-site.xml  log4j.properties  mapred-site.xml  topology.map  topology.py  workers  yarn-site.xml

-bash-4.1$ cat alluxio-site.properties | grep write | grep type

alluxio.user.file.writetype.default=CACHE_THROUGH



So it looks like the configuration on the classpath for both the master and the worker, on both nodes, contains alluxio.user.file.writetype.default=CACHE_THROUGH in alluxio-site.properties.


Best regards,

Denis

Bin Fan

Jun 16, 2016, 1:27:34 PM
to Denis Bolshakov, Gil Vernik, Alluxio Users
Hi Denis,

It looks to me like you set the property alluxio.user.file.writetype.default=CACHE_THROUGH for the Alluxio daemons (master and workers).
However, alluxio.user.file.writetype.default is an Alluxio client-side configuration property, and in your setup Spark is the Alluxio client.
In other words, your Spark jobs run in different JVMs from your Alluxio daemons, and setting this value for the Alluxio daemons will not make your Spark jobs use CACHE_THROUGH.

As discussed in your previous email, you use spark.executor.extraJavaOptions to pass this property, and I was thinking maybe not every piece of your job respects spark.executor.extraJavaOptions. So I was suggesting putting alluxio-site.properties into "~/.alluxio/" or your application classpath on each machine running Spark workers.
Since Alluxio 1.1, each application using the Alluxio client jar will search "~/.alluxio/" and the application classpath for alluxio-site.properties.

In short, I think the key is to make sure all client-side applications respect alluxio.user.file.writetype.default=CACHE_THROUGH, not the daemons.

Hope I don't confuse you more.

- Bin

Denis Bolshakov

Jun 17, 2016, 9:17:26 AM
to Bin Fan, Gil Vernik, Alluxio Users
Hello Bin,

Thanks for helping,

You are correct.
So I think we have a picture now:
1. The Spark application should have the Alluxio config files on its classpath.
We've copied these configs to /opt/spark/lib/ and they worked fine.
But could you point us to documentation that describes how to do that? Or provide a detailed description of your solution.
With this, everything works fine (including persisting _SUCCESS files).
2. About our problem with _SUCCESS: I think we did not see it on HDFS because it is generated by the driver, not by an executor.

Kind regards,
Denis

Bin Fan

Jun 17, 2016, 1:28:48 PM
to Denis Bolshakov, Gil Vernik, Alluxio Users
See my inline replies

On Fri, Jun 17, 2016 at 6:17 AM, Denis Bolshakov <bolshak...@gmail.com> wrote:
Hello Bin,

Thanks for helping,

You are correct.
So I think we have a picture now:
1. The Spark application should have the Alluxio config files on its classpath.
We've copied these configs to /opt/spark/lib/ and they worked fine.
But could you point us to documentation that describes how to do that? Or provide a detailed description of your solution.
With this, everything works fine (including persisting _SUCCESS files).
Here is the documentation.

Basically, if you have any properties you want to set for all clients, put them into ${HOME}/.alluxio/ or /etc/alluxio/.

 
2. About our problem with _SUCCESS: I think we did not see it on HDFS because it is generated by the driver, not by an executor.

That's my guess too.
So an alternative is to also pass the same configuration property via spark.driver.extraJavaOptions.
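Putting the thread's conclusion together: the driver writes the _SUCCESS and parquet metadata files, so the write type must reach the driver JVM as well as the executors. A sketch of the combined submit command (class and jar names are placeholders, and the command is printed rather than executed):

```shell
# Pass the Alluxio client write type to BOTH the driver and the
# executors, so files written by either JVM use CACHE_THROUGH.
ALLUXIO_OPTS='-Dalluxio.user.file.writetype.default=CACHE_THROUGH'
echo spark-submit \
  --class com.example.Etl2 \
  --conf "spark.driver.extraJavaOptions=${ALLUXIO_OPTS}" \
  --conf "spark.executor.extraJavaOptions=${ALLUXIO_OPTS}" \
  etl2-assembly.jar
```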