Hello,
I have a problem with dividing 261744 rows and 11 columns with divide():

divide(newYorkData, update = TRUE, by = "lang",
       output = hdfsConn("/paulina/hdfsFiles29", autoYes = TRUE),
       control = rhctl1)
rhctl1 <- rhipeControl(mapred = list(
  rhipe_map_buff_size = 100,
  mapred.max.split.size = 1024 * 1024,
  mapred.task.timeout = 0,
  mapred.tasktracker.map.tasks.maximum = 4,
  mapreduce.map.cpu.vcores = 2,
  mapreduce.map.memory.mb = 3072 + 3072
), jobname = "pucTest")
The first thing I noticed is that none of my divide() jobs actually run through RHIPE and Hadoop; for some reason they always run locally.
Secondly, for this medium-sized dataset I am getting the error:
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, : java.lang.OutOfMemoryError: Java heap space
I tried typing options(java.parameters = "-Xmx8000m") and also options(java.parameters = "-Xm3072") in RStudio and restarting, but it didn't work. Any ideas? I don't know whether this is a Tessera-related problem or something else.
options(java.parameters = "-Xmx6g")
"6g" means 6 GB.
- After that you can load library(rJava).
- If that is still not working, try allocating more GB if you can.
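A minimal sketch of the advice above (assuming only that the rJava package is installed): the option has to be set before anything starts the JVM, so it must come first in a fresh R session.

```r
# Set the JVM heap size BEFORE any package starts the JVM; once the JVM
# is up, changing java.parameters has no effect until R is restarted.
options(java.parameters = "-Xmx6g")   # "6g" = 6 GB maximum heap

library(rJava)
.jinit()

# Verify the limit the JVM actually picked up (returned in bytes):
rt <- .jcall("java/lang/Runtime", "Ljava/lang/Runtime;", "getRuntime")
.jcall(rt, "J", "maxMemory")
```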
But I still have a question about divide(): why is there a control argument? It looks to me as if divide() is not using it.
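On the control question, a hedged sketch of my understanding (not authoritative): in datadr the control argument carries backend-specific job settings, and it is only consulted when the input/output connections actually put the job on that backend. A divide() over in-memory data runs locally and silently ignores RHIPE settings, which would explain both observations. Here hdfsInput is an assumed ddf backed by an hdfsConn():

```r
library(datadr)
library(Rhipe)

rhctl1 <- rhipeControl(mapred = list(rhipe_map_buff_size = 100))

# control is picked up only because input and output live on HDFS,
# so the job is dispatched to the RHIPE backend:
byLang <- divide(hdfsInput, by = "lang",
                 output  = hdfsConn("/paulina/hdfsFiles29", autoYes = TRUE),
                 control = rhctl1)
```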
--
You received this message because you are subscribed to the Google Groups "Tessera-Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tessera-user...@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tessera-users/89e97d01-d71f-450e-bc15-5c08e12d9ee9%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Error in data.frame(list(), list(indices = list(c(42L, 57L)), text = "EARTHangelHOUR"), : arguments imply differing number of rows: 0, 1, 2, 7, 6, 3, 5, 11, 8, 4, 12, 9, 10, 13, 14, 16

test1 = addTransform(testNY, TimeSeparator)
ddfData = ddf(newData)
system.time({
  twitterDDF = divide(ddfData, update = TRUE, by = "lang",
                      output = hdfsConn("/paulina/hdfsFiles40", autoYes = TRUE),
                      control = rhctl1)
  test = addTransform(ddfData, TimeSeparator)
  varMeans <- recombine(test, control = rhctl1, verbose = TRUE)
})
This is my vicious circle: I want to use addTransform() after divide(), but to use divide() I need the data transformed already, because the transformation functions take care of the problem above.
The funny thing is that if I don't use the ddf() function, it converts from a data.frame normally, but locally.
When I try what you suggested:

ddfData <- ddf(newYorkData)
rhmkdir("/paulina/hdfsFiles42")
rhchmod("/paulina/hdfsFiles42", "777")
conn <- hdfsConn("/paulina/hdfsFiles42")
testNY <- convert(ddfData, conn)
ddf(newYorkData) produces a single key/value pair containing all the data, and addTransform() then tests the function on all of it. That is the vicious circle I have: I need transformed data to divide, but I need divide for a fast transform.
Do you have a solution for this? Am I understanding it correctly? Maybe addTransform() should have a logical parameter to enable or disable testing on a subset, or maybe a control parameter so it can run the test faster.
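One hedged way out of the circle, under the assumption that the raw data already sits on HDFS in multiple chunks (for example after convert()): addTransform() is deferred, so the transform can be attached first and divide() will execute it inside the MapReduce job, subset by subset. The local "test" then only touches one chunk instead of the whole dataset. A sketch, reusing the names from this thread:

```r
library(datadr)

ddfData  <- ddf(hdfsConn("/paulina/hdfsFiles42"))   # multi-chunk ddf on HDFS
withTime <- addTransform(ddfData, TimeSeparator)    # deferred; tested on ONE chunk

# TimeSeparator actually runs per subset inside the Hadoop job:
byLang <- divide(withTime, by = "lang", update = TRUE,
                 output  = hdfsConn("/paulina/hdfsFiles40", autoYes = TRUE),
                 control = rhctl1)
```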
So for now I used:

rhctl3 <- rhipeControl(mapred = list(

and the job failed with YARN killing the containers for exceeding their memory limits (log excerpt, repeated java command lines trimmed):

Container [pid=15470,containerID=container_1463693104474_0011_01_000002] is running beyond physical memory limits. Current usage: 3.0 GB of 1 GB physical memory used; 4.0 GB of 2.1 GB virtual memory used. Killing container.
  |- 15475 (java) /usr/java/jdk1.7.0_67-cloudera/bin/java ... -Xmx820m ... org.apache.hadoop.mapred.YarnChild 147.232.202.109 59648 attempt_1463693104474_0011_m_000000_0 2
  |- 15511 (RhipeMapReduce) /usr/local/lib64/R/library/Rhipe/bin/RhipeMapReduce --slave --silent --vanilla
Container killed on request. Exit code is 143

Container [pid=15561,containerID=container_1463693104474_0011_01_000003] is running beyond physical memory limits. Current usage: 1.2 GB of 1 GB physical memory used; 2.3 GB of 2.1 GB virtual memory used. Killing container.
Container killed on request. Exit code is 143

Container [pid=15631,containerID=container_1463693104474_0011_01_000004] is running beyond physical memory limits. Current usage: 1.1 GB of 1 GB physical memory used; 2.3 GB of 2.1 GB virtual memory used. Killing container.
On May 20, 2016, at 9:29 AM, Jakub Paulina <jakub.pa...@gmail.com> wrote:
I managed to solve this problem only with the global Hadoop options in Cloudera Manager, where I increased the memory for the map and reduce operations (mapreduce.map.memory.mb = 3072+3072, mapreduce.reduce.memory.mb = 3072+3072). I don't like this solution, but it works :/
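If per-job settings are preferable to the cluster-wide change, the same YARN properties can usually be passed through rhipeControl()'s mapred list. A sketch: the property names are the standard Hadoop 2 ones, but whether a given RHIPE build forwards all of them per job is an assumption worth verifying.

```r
rhctlBig <- rhipeControl(mapred = list(
  mapreduce.map.memory.mb    = 6144,       # YARN container size in MB
  mapreduce.reduce.memory.mb = 6144,
  mapreduce.map.java.opts    = "-Xmx4g",   # JVM heap inside the container;
  mapreduce.reduce.java.opts = "-Xmx4g"    # keep it well below the container size
))
```

Note the container size has to leave headroom above the -Xmx heap, since the R worker process counts against the same container limit (as the kill messages above show).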
> head(twitterData)
  lang place              text                                                                                                      id        Day Month
1 en   Queens, NY         @JayyMarley LMFAOOO you know I do \U0001f602                                                              321782606 Tue Mar
2 en   Brooklyn, NY       Thank you @AliciaSilv for your support of #EARTHangelHOUR !!!                                             917249834 Tue Mar
3 en   Brooklyn, NY       @aidenleslie Same to you. Enjoy                                                                           605652326 Tue Mar
4 en   Midtown, Manhattan in the beautiful city with my sweet heart I couldn't be happier. http://t.co/Wx0olVhuuj                   100870683 Tue Mar
5 en   Verona, NJ         @Eminem @rosenberg I neeeeeed to. http://t.co/fzppEFOivk                                                  967058612 Tue Mar
6 es   Manhattan, NY      n_earley took me to #Despaña for #lunch. Era delicioso. @ Despaña Vinos y Mas https://t.co/JumyRrIw7v     20028465  Tue Mar
  DayNumber coordLong coordLat hashtagsText   date          Hours Mins Secs
1 17        -73.8007  40.6935  NA             3907903-11-03 18    23   23
2 17        -73.9396  40.7225  EARTHangelHOUR 3907903-11-03 18    23   23
3 17        -73.9229  40.6620  NA             3907903-11-04 18    24   24
4 17        -73.9824  40.7679  NA             3907903-11-04 18    24   24
5 17        -74.2482  40.8405  NA             3907903-11-05 18    24   24
6 17        -73.9983  40.7213  Despaña, lunch 3907903-11-06 18    24   24

Or is divide() not meant to be used with MapReduce?
List of 1
 $ :List of 2
  ..$ key  : chr "Split"
  ..$ value:'data.frame': 46546 obs. of 14 variables:
  .. ..$ lang        : Factor w/ 40 levels "ar","bg","bs",..: 8 8 8 8 8 9 8 8 8 8 ...
  .. ..$ place       : chr [1:46546] "Queens, NY" "Brooklyn, NY" "Brooklyn, NY" "Midtown, Manhattan" ...
  .. ..$ text        : chr [1:46546] "@JayyMarley LMFAOOO you know I do \U0001f602" "Thank you @AliciaSilv for your support of #EARTHangelHOUR !!!" "@aidenleslie Same to you. Enjoy" "in the beautiful city with my sweet heart I couldn't be happier. http://t.co/Wx0olVhuuj" ...
  .. ..$ id          :List of 46546
  .. .. ..$ : int 321782606
  .. .. ..$ : int 917249834
  .. .. ..$ : int 605652326
  .. .. ..$ : chr "2444259996"
  .. .. ..$ : chr "2270517068"
  .. .. .. [list output truncated]
  .. ..$ Day         : Factor w/ 7 levels "Mon","Tue","Wed",..: 2 2 2 2 2 2 2 2 2 2 ...
  .. ..$ Month       : Factor w/ 12 levels "Jan","Feb","Mar",..: 3 3 3 3 3 3 3 3 3 3 ...
  .. ..$ DayNumber   : chr [1:46546] "17" "17" "17" "17" ...
  .. ..$ coordLong   : num [1:46546] -73.8 -73.9 -73.9 -74 -74.2 ...
  .. ..$ coordLat    : num [1:46546] 40.7 40.7 40.7 40.8 40.8 ...
  .. ..$ hashtagsText:List of 46546
  .. .. ..$ : logi NA
  .. .. ..$ :List of 1
  .. .. .. ..$ : chr "EARTHangelHOUR"
  .. .. ..$ : logi NA
  .. .. ..$ :List of 2
  .. .. .. ..$ : chr "Despaña"
  .. .. .. ..$ : chr "lunch"
  .. .. .. [list output truncated]
  .. ..$ date        : Date[1:46546], format: "3907903-11-03" "3907903-11-03" "3907903-11-04" "3907903-11-04" ...
  .. ..$ Hours       : chr [1:46546] "18" "18" "18" "18" ...
  .. ..$ Mins        : chr [1:46546] "23" "23" "24" "24" ...
  .. ..$ Secs        : chr [1:46546] "23" "23" "24" "24" ...
  ..- attr(*, "class")= chr [1:2] "kvPair" "list"
Also, you have one more lever to use, which is rhipe_reduce_buff_size (the number of key/value pairs emitted from the map that are loaded into a reduce buffer at a time). datadr::divide accumulates chunks in the reduce. I am thinking the "en" level of lang is emitting very large key/value pairs out of the map, so I would make that buffer small. In fact, one of my default habits with these systems is to turn the buffer sizes down to very low numbers and ramp them up once things work.

The list of IDs is wrong; I already fixed it and removed the Date format too, so it should work better now. But it's still the same :/
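The buffer advice above might look like the following sketch. The exact starting values and the output path are made up; the point is to start tiny and ramp up once a job finishes.

```r
rhctlSmall <- rhipeControl(mapred = list(
  rhipe_map_buff_size    = 10,  # k/v pairs buffered on the map side
  rhipe_reduce_buff_size = 10,  # map-output k/v pairs pulled into a reduce at a time
  mapred.task.timeout    = 0
))

twitterDDF <- divide(ddfData, by = "lang",
                     output  = hdfsConn("/paulina/hdfsFilesSmallBuf", autoYes = TRUE),
                     control = rhctlSmall)
```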
The lists within lists were definitely a problem, but that doesn't change anything about how divide() works: it can cast a data.frame to a ddf really fast locally, lists and all, without any problem. The main problem starts with divide() and HDFS, where those lists start the memory leak; when I removed them, it worked. Of course I found many bugs in my implementation, and I am not proud of it :/ I wanted to build a per-day wordcloud in Trelliscope, so now I need to find another way. Maybe if the lists are of fixed length, as you said, that will fix my problem. I will post updates on my progress. The sad thing is that I am starting to run out of time for more tuning, but I want to continue with Tessera in my final thesis. Maybe I will be able to contribute some fixes, or at least help with the documentation; the future will show me my path! :D
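For the wordcloud use case, one workaround (a plain base-R sketch, not a datadr feature) is to collapse the nested hashtag lists into a single character column before building the ddf; every cell is then an atomic string that a wordcloud step can split again later.

```r
# Collapse a list-column (each cell is NA or a nested list of strings)
# into a plain character vector, one space-separated string per row.
flattenTags <- function(tags) {
  vapply(tags, function(x) {
    if (length(x) == 0 || (length(x) == 1 && is.na(x[[1]])))
      NA_character_
    else
      paste(unlist(x), collapse = " ")
  }, character(1))
}

hashtags <- list(NA, list("EARTHangelHOUR"), list("Despaña", "lunch"))
flattenTags(hashtags)   # NA "EARTHangelHOUR" "Despaña lunch"
```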
divide() is just a helper function for the construction of distributed data frame objects (ddf), and even though it says "data frame" in the name, there are some uses of data.frames, allowed by R, that make more sense as datadr's "distributed data objects" (ddo). Lists of lists are a classic example of what Ryan was thinking about when designing distributed data objects. As a matter of fact, most of my personal uses of Trelliscope are with DDOs. What that means for you is that if you want to break out of the constraints of divide(), you can use mrExec() and define your own map/reduce functions; then you have the liberty of not being burdened by divide or transform assumptions. For example, you can just write a map that divides up that data object pretty fast.
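A sketch of that mrExec() route, under the assumptions that ddfData is a ddo/ddf on HDFS and that re-keying by language is the goal (the output path and names are hypothetical):

```r
library(datadr)

# Map: re-key each incoming chunk by language.
byLangMap <- expression({
  for (i in seq_along(map.values)) {
    d <- map.values[[i]]
    for (lg in unique(as.character(d$lang)))
      collect(lg, d[d$lang == lg, ])     # emit key = language, value = matching rows
  }
})

# Reduce: gather the per-language pieces for each key.
byLangReduce <- expression(
  pre    = { pieces <- list() },
  reduce = { pieces <- c(pieces, reduce.values) },
  post   = { collect(reduce.key, pieces) }
)

res <- mrExec(ddfData, map = byLangMap, reduce = byLangReduce,
              output = hdfsConn("/paulina/byLangMR", autoYes = TRUE))
```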
On Mon, May 23, 2016 at 9:00 AM, Jakub Paulina <jakub.pa...@gmail.com> wrote:
The lists within lists were definitely a problem, but that doesn't change anything about how divide() works: it can cast a data.frame to a ddf really fast locally, lists and all, without any problem. The main problem starts with divide() and HDFS, where those lists start the memory leak; when I removed them, it worked. Of course I found many bugs in my implementation, and I am not proud of it :/ I wanted to build a per-day wordcloud in Trelliscope, so now I need to find another way. Maybe if the lists are of fixed length, as you said, that will fix my problem. I will post updates on my progress. The sad thing is that I am starting to run out of time for more tuning, but I want to continue with Tessera in my final thesis. Maybe I will be able to contribute some fixes, or at least help with the documentation; the future will show me my path! :D