permissions error in updateAttributes


wgmu...@gmail.com

Feb 12, 2016, 10:44:28 AM
to Tessera-Users
Hi ... I'm running into a similar error I found here:
https://groups.google.com/forum/#!topic/tessera-users/g6005_JM_WQ

I have some data (output of a rhipe job) I'm trying to instantiate as a ddo. When I run updateAttributes the job fails:

data <- updateAttributes(data)
---------------------------------
There were R errors, showing 30:

1(1):
R ERROR BEGIN (map):
=============

Error: PB ERROR[LOGLEVEL_E
Autokill is true and terminating job_1455286789626_0015
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
java.io.FileNotFoundException: Cannot access /tmp/tmp_output-0498f342445580fbf0bd05ce8f34b2db: No such file or directory.
In addition: Warning message:
In Rhipe:::rhwatch.runner(job = job, mon.sec = mon.sec, readback = readback, :
Job failure, deleting output: /tmp/tmp_output-0498f342445580fbf0bd05ce8f34b2db:

----------------------------------------------------------------------------------------------------------------

I've already changed the permissions of /tmp to 777. I've also tried pointing the temp directory at a folder I created with rhmkdir, using rhoptions(HADOOP.TMP.FOLDER = "/user/tessera/tmp").


----------------------------------------------------------------------------------------------------------------
permission owner group size modtime file
1 drwxrwxrwt hadoop supergroup 0 2016-02-12 14:19 /tmp/hadoop-yarn

I don't know if this helps, but here is my hadoop temp dir setting from core-site.xml

<property><name>hadoop.tmp.dir</name><value>/mnt/var/lib/hadoop/tmp</value></property>

The suggested fix

rhoptions(file.types.remove.regex = "(/_meta|/_rh_meta|/_outputs|/_SUCCESS|/_LOG|/_log|rhipe_debug|rhipe_merged_index_db)")

doesn't help. Any ideas? Here is my sessionInfo(). I'm using an EMR cluster started with AMI 3.11.

----------------------------------------------------------------------------------------------------------------
sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Amazon Linux AMI 2015.09

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=en_US.UTF-8
[9] LC_ADDRESS=en_US.UTF-8 LC_TELEPHONE=en_US.UTF-8 LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] datadr_0.7.5.9 Rhipe_0.75.2 rJava_0.9-9 codetools_0.2-14

loaded via a namespace (and not attached):
[1] Rcpp_0.12.3 lattice_0.20-33 digest_0.6.9 dplyr_0.4.3 assertthat_0.1 chron_2.3-47 grid_3.2.2
[8] R6_2.1.2 DBI_0.3.1 magrittr_1.5 data.table_1.9.6 hexbin_1.27.1 tools_3.2.2 parallel_3.2.2

wgmu...@gmail.com

Feb 12, 2016, 11:16:48 AM
to Tessera-Users, wgmu...@gmail.com
Actually I think my problem is related to the following:


Error: PB ERROR[LOGLEVEL_ERROR](google/protobuf/io/coded_stream.cc:171) A protocol message was rejected because it was too big (more than 268435456 bytes). To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.

Sorry, need to investigate more....

Ryan Hafen

Feb 12, 2016, 3:26:50 PM
to wgmu...@gmail.com, Tessera-Users, Saptarshi Guha
Interesting.  You must have some very large key-value pairs?  Unfortunately the protobuf message size is something that is hard-coded at compile-time and requires a re-compile to change.  You can see where it is set in the C code, for example, here: https://github.com/tesseradata/RHIPE/blob/master/src/main/C/display.cc#L372.  In some studies we have done, we’ve found that there is an optimal size for key-value pairs and 256MB is on the large side - I wonder if it is possible to break your data up a bit more?
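To make the "break your data up a bit more" suggestion concrete, here is a minimal sketch using datadr's divide(). The data and variable names (bySingle, site, month) are hypothetical; the idea is just that dividing by more attributes yields more, smaller subsets, so each serialized key-value pair stays well under the 256MB protobuf limit:

library(datadr)

# Hypothetical ddf previously divided by a single variable;
# re-divide by two variables to shrink each subset.
byBoth <- divide(bySingle,
  by = c("site", "month"),  # more division variables -> smaller key-value pairs
  update = TRUE)            # recompute attributes after dividing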

Also, although this isn’t your issue, it’s probably a good thing to add to the mailing list that if you do run into an issue with not being able to write to the temporary directory, you have the ability to change it with the following rhoptions, e.g.:

rhoptions(HADOOP.TMP.FOLDER = "/user/tmp")


--
You received this message because you are subscribed to the Google Groups "Tessera-Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tessera-user...@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tessera-users/179288b6-b810-45ca-83c1-5996454bf100%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

wgmu...@gmail.com

Feb 19, 2016, 4:36:17 PM
to Tessera-Users, wgmu...@gmail.com, saptars...@gmail.com
Yes ... it is a huge dataset. I was able to fix my issue with the key-value pairs ... I mistakenly thought I had divided the data by several attributes, but was only using one. What was the performance like for the 1M+ trelliscope display? I am dealing with similar data ... I have about 1.5M key-value pairs and trelliscope hangs after I have generated a simple display.

Ryan Hafen

Feb 19, 2016, 4:43:51 PM
to wgmu...@gmail.com, Tessera-Users
Glad you got that part working. A 1.5 million panel trelliscope display will probably take 30 seconds or longer to load (it has to read the 1.5 million row data frame of cognostics) but once it has loaded it should be pretty responsive. The viewer should pop up instantaneously and the delay should be after you click on the display you want to view. 30 seconds is a long time to wait, I know, and there should be a way to allow immediate exploration while everything is loading, but I haven’t given this priority as there aren’t a lot of people using it at this scale. I can look into speeding it up.



wgmu...@gmail.com

Feb 24, 2016, 2:23:05 PM
to Tessera-Users, wgmu...@gmail.com
I fixed an issue with the keys ... I had done something wrong when creating them that cut off some text, so many keys were duplicated with different values. Performance is good after the initial loading time, which is not very long with only a few cognostics. Thanks Ryan!