Shiny server with bigger data - in memory


Mario Deng

Jul 2, 2014, 7:37:44 AM
to shiny-...@googlegroups.com
Hello everyone,

I developed a Shiny app, and for testing (alpha versions) it gets deployed to Shiny Server. Before Shiny Server was released, it was tested by just running an R instance. And here is the trouble I am running into.
When the app runs via an R instance, everything is just fine: all data is in RAM, and Shiny is super responsive and answers the first call (via the browser) fast. But when deployed to Shiny Server, the data only gets loaded when a client first calls the app (via the browser). This results in a) slow responsiveness for the user (at the initial call) and b) as my data grows, Shiny seems unable to handle it (maybe there is an HTTP timeout) and doesn't start (or only when I'm feeling lucky).
My data is in .RData objects, which I load from the filesystem via the load() command in the global file. The files are compressed and ~5GB each. So, is there a way to keep them permanently in memory, for every user instance, so that the user doesn't have to wait long and the app doesn't time out? I really need to solve this :/
Once the app is running, the performance is perfectly fine; it's just that startup issue. And my machine isn't running out of memory.

I am running:
R version 3.1.0
shiny_0.10.0
- CentOS 6

With best regards,

Mario

Jan Stanstrup

Jul 2, 2014, 7:46:12 AM
to shiny-...@googlegroups.com
If the data is supposed to always be the same, you can "load" it before your shinyServer line. This way it is only read once.


Jan.

Mario Deng

Jul 2, 2014, 8:00:08 AM
to shiny-...@googlegroups.com
When I do so, I get "ERROR: object 'my.DT' not found", where my.DT is the data to load.

Jan Stanstrup

Jul 2, 2014, 8:38:16 AM
to shiny-...@googlegroups.com
my.DT is defined inside the "shinyServer"?
Outside "shinyServer" it doesn't know about anything that happens inside "shinyServer". You need to define the path to your file also outside "shinyServer".
See an example here: http://shiny.rstudio.com/tutorial/lesson5/ under "Finishing the app".

Mario Deng

Jul 2, 2014, 8:55:30 AM
to shiny-...@googlegroups.com
Ok,

this is confusing me. my.DT is defined nowhere. I used to load() it in global.R, because it needs to be visible in UI. When I put the load() before shinyServer(), then the objects won't be found. I tried relative and absolute paths.
Maybe the error occurs, because it's not visible to ui.R anymore?! From the shiny website

"Objects defined in global.R are similar to those defined in server.R outside shinyServer(), with one important difference: they are also visible to the code in ui.R. This is because they are loaded into the global environment of the R session; all R code in a Shiny app is run in the global environment or a child of it."

But then, how would loading my objects before shinyServer()  in server.R make a difference from loading them in global.R?

Jan Stanstrup

Jul 2, 2014, 9:02:30 AM
to shiny-...@googlegroups.com
Sorry. I missed that you already loaded it in global. So it should already only load once as far as I know. Isn't it only slow the very first time you open the app?

Mario Deng

Jul 2, 2014, 9:17:54 AM
to shiny-...@googlegroups.com
Exactly, this is what I mentioned above. It's just slow when the app is launched for the first time. I am looking for a way to get rid of that, because beyond a certain file size the app doesn't start any more, or only at random.
Splitting my data (and reloading after a user action) would be horrible.

mbh

May 28, 2015, 1:05:58 PM
to shiny-...@googlegroups.com
Hi Mario,

Did you find a solution? I'm really interested.

Thanks,
Matthieu

Joe Cheng

May 28, 2015, 5:27:57 PM
to mbh, shiny-...@googlegroups.com
The Shiny Server config directives app_init_timeout and app_idle_timeout may be relevant:

They don't make the startup any faster, but app_init_timeout can prevent a long startup from causing an error, and app_idle_timeout can be used to prevent an expensive-to-start R process from being retired.
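As a rough sketch of where these two directives might sit in a Shiny Server config file (commonly /etc/shiny-server/shiny-server.conf): the port, paths, and timeout values below are illustrative assumptions, not recommendations. Both timeouts are given in seconds.

```text
# Illustrative shiny-server.conf fragment; adjust paths and values to your setup.
run_as shiny;

server {
  listen 3838;

  location / {
    site_dir /srv/shiny-server;
    log_dir /var/log/shiny-server;

    # Allow a slow-starting app up to 10 minutes to finish loading its data.
    app_init_timeout 600;

    # Keep the R process alive for 20 hours after the last user disconnects,
    # so the next visitor does not pay the startup cost again.
    app_idle_timeout 72000;
  }
}
```

Shiny Server typically needs a restart for config changes like these to take effect.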

Ideally though, you would find a way to load less data, perhaps by doing some kind of summarizing/aggregation, and save that as your .RData or .rds file to be loaded at startup. Shiny can only ever be as fast as the instructions you give it, and loading dozens of GB of data directly into memory is going to take time.

--
You received this message because you are subscribed to the Google Groups "Shiny - Web Framework for R" group.
To unsubscribe from this group and stop receiving emails from it, send an email to shiny-discus...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/shiny-discuss/cc095ad8-0823-4401-b6a5-1872921e5d5e%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

mbh

May 28, 2015, 6:42:11 PM
to shiny-...@googlegroups.com, matthie...@bluestone.fr
Thanks Joe for this clarification.

I understand that we need time (R time, not Shiny time) to load the data into memory, but why should that happen for every user, and not just once for all users with Shiny Server?

Joe Cheng

May 28, 2015, 6:45:00 PM
to mbh, shiny-...@googlegroups.com, matthie...@bluestone.fr
That's basically what these directives let you do. The reason this isn't the default is that most apps are quick to start up and the assumption is that you don't want them sitting around consuming memory when nobody is using them. 


Carlos Sánchez

May 29, 2015, 3:06:09 AM
to shiny-...@googlegroups.com
Hi Mario,

Have you thought about putting your data in a database and pulling the data from it as needed? I'm not really sure what your app does, but in my case I needed to access 25GB of data. For my solution, I decided to go with Elasticsearch and then just query the DB to pull the data. It works really well, and you avoid the loading issue because the data is always there via Elasticsearch. There are other options besides ES that work with R, like MongoDB or Postgres, so you can choose the one you like the most.
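To illustrate the general "query on demand" idea only: the sketch below swaps in an in-memory SQLite database via the DBI and RSQLite packages, not Elasticsearch, MongoDB, or Postgres. The table, columns, and the sales_for_office() helper are hypothetical names made up for this example.

```r
library(DBI)

# Assumption: an in-memory SQLite database stands in for the real backend.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Toy table; in practice the database would already hold the full data set.
dbWriteTable(con, "sales", data.frame(
  office = c("A", "A", "B"),
  amount = c(10, 20, 5)
))

# Hypothetical helper: pull only the rows a given user action needs,
# using a parameterised query instead of loading everything into R.
sales_for_office <- function(con, office) {
  dbGetQuery(
    con,
    "SELECT office, amount FROM sales WHERE office = ?",
    params = list(office)
  )
}

office_a <- sales_for_office(con, "A")
dbDisconnect(con)
```

In a Shiny app, a call like sales_for_office() would normally live inside a reactive expression, so each user action fetches just the slice it needs.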

Carlos

Mario Deng

May 29, 2015, 3:19:26 AM
to shiny-...@googlegroups.com

Hey everyone,

just as Joe mentioned, there is no solution; it just takes time.

Using an external DB (I benchmarked Redis and MongoDB for my use case) is not feasible; both systems are way too slow. I stuck with data.table, keeping everything in memory and accepting some extra time during startup. Setting the timeout for the app to ~20h did the job in my case.

Best,

mbh

May 29, 2015, 5:29:11 AM
to shiny-...@googlegroups.com
In my case, it's working as I want it to. I put the data loading in global.R and launch the app; that takes 45 seconds.
Then I connect to the app from another computer and it's immediate; I don't have to wait 45 seconds again.
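A minimal sketch of what "data loading in global.R" could look like, with a timing message so the one-off cost shows up in the log. The file path and the object name my_dt are hypothetical; the first few lines only create a small demo file so the snippet runs on its own.

```r
# global.R is sourced once when the R process for the app starts, so objects
# created here are shared by every session served by that process.

# Demo setup only: write a small .RData file. In a real app the file already
# exists; `demo_path` and `my_dt` are hypothetical names.
demo_path <- file.path(tempdir(), "my_data.RData")
my_dt <- data.frame(id = 1:5, value = (1:5)^2)
save(my_dt, file = demo_path)
rm(my_dt)

# The part that would actually sit in global.R: load once, log the duration.
t0 <- Sys.time()
load(demo_path)
elapsed <- as.numeric(difftime(Sys.time(), t0, units = "secs"))
message("Data loaded in ", round(elapsed, 2), " s")
```

Later sessions handled by the same R process reuse my_dt without reloading it, which matches the behaviour described above.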

Wen Yu

Oct 15, 2015, 9:38:50 AM
to Shiny - Web Framework for R
Hi, can you elaborate on what you need to do to "load the data in global R"? Is it something you can do in 'server.R' or on the Shiny server side? I'm desperately looking for a solution to improve the data loading. It's killing the user experience!

Thanks!
Wen Yu

Joe Cheng

Oct 15, 2015, 12:15:27 PM
to Wen Yu, Shiny - Web Framework for R
It's not what mbh was referring to, but you can load data just once by putting it at the top of server.R. See "Objects visible across all sessions" on this page: http://shiny.rstudio.com/articles/scoping.html

If that's taking too long, if possible you should do whatever data massaging/transformation/aggregating/etc. outside of the Shiny app, and save the results as an .Rds file using saveRDS(), and load the data at the top of server.R using readRDS().
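A small sketch of that workflow, using only base R. The built-in mtcars data set stands in for the real raw data, and the file path and object names are assumptions made for illustration.

```r
# --- One-off preprocessing script, run outside the Shiny app ---
# `raw` stands in for a large raw data set; mtcars is used for illustration.
raw <- mtcars

# Aggregate down to what the app actually needs to display.
summary_df <- aggregate(mpg ~ cyl + gear, data = raw, FUN = mean)

# Hypothetical path; in practice this would live next to the app files.
rds_path <- file.path(tempdir(), "summary.rds")
saveRDS(summary_df, rds_path)

# --- Top of server.R, outside shinyServer() ---
# Runs once per R process, so every session shares the same object.
app_data <- readRDS(rds_path)
```

Unlike load(), readRDS() returns the object instead of restoring it under its saved name, so the name used inside the app is chosen at the point of loading.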

Wen Yu

Oct 15, 2015, 2:57:41 PM
to Shiny - Web Framework for R, wyu...@gmail.com
Thanks for your suggestion! I think I'm doing the global data loading already (see my code below). The data is stored with save() as a .RData file and loaded outside of the shinyServer function. However, I'm not seeing any timing difference on subsequent sessions, even in the same browser (Chrome) instance.

Thanks!
Wen

library(shiny)

# Setup code goes here; it runs just once, when the app is started.
library(lattice)

# Read the search output. This is where the data loading occurs.
load("PD_lf.RData")

shinyServer(function(input, output) {

  # ... other render code ...

  output$distPlot3 <- renderPrint({
    ctable(lf$psm5[, c(input$param_y3, "Digestion")])
  })

})

Nick Mead

Oct 19, 2015, 8:37:03 PM
to Shiny - Web Framework for R, wyu...@gmail.com
Sorry to tack onto another user's question, but I'm trying to test the suitability of Shiny for a couple of projects and I'm trying to tackle a similar issue...

I read the linked article, but I'm not clear on something.  If I want an object (eg, the output of a SQL query against our data warehouse) to be persistent across sessions, I put it at the top of my code, outside the shinyServer function.  I assume this will be created the first time the server.R code runs.  But if it's persistent across sessions, when does it refresh?  If ever?  Is there some way to force a refresh?  If so, can I refresh against a schedule?

The data in our data warehouse refreshes on a daily basis. So there is no point reloading large data sets each time a user runs a Shiny app. But at the same time, I don't want the data to persist over a period of days. Is there a recommended way to handle this?

Joe Cheng

Oct 20, 2015, 3:15:17 AM
to Nick Mead, Shiny - Web Framework for R, wyu...@gmail.com

Nick Mead

Oct 20, 2015, 4:50:03 PM
to Shiny - Web Framework for R, naj...@gmail.com, wyu...@gmail.com
Thanks Joe.  I've been playing around with reactivePoll but keep getting an error.  It looks like the data loads initially, but after the polling period, I get the following error;

Error in thisFunc() : object 'session' not found

Here is a (slightly abridged) copy of my code;

library(shiny)
library(dplyr)
library(lubridate)
library(ggplot2)

data <- reactivePoll(100000, session,

  checkFunc = function() {
    if (difftime(Sys.time(), attr(sales, "TIMESTAMP"), units = "mins") > 1) {
      return("Old")
    } else {
      return("Current")
    }
  },
  valueFunc = function() {
    sales <- read.csv("Sales_Data.pip", header = TRUE, sep = "|", quote = "\"")
    attr(sales, "TIMESTAMP") <- Sys.time()
    summary <-
      summarise(
        group_by(
          sales,
          y = as.integer(year(Sales_Date)),
          m = as.integer(month(Sales_Date)),
          o = Office,
          p = as.character(Product)
        ),
        c = n_distinct(Sales_ID)
      )
    return(summary)
  })


shinyServer(function(input, output, session) {

  output$mytable2 = renderDataTable({
    data()
  })

  output$plot1 = renderPlot({
    plot <- ggplot(data(), aes(x = m, y = c)) +
      geom_bar(stat = "identity", colour = 'black', fill = '#B1005D') +
      xlab("Sales by Product") +
      ylab("Month") +
      scale_x_discrete() +
      facet_wrap(~p) +
      theme(strip.text.x = element_text(size = 14))
    plot(plot)
  })
})

Joe Cheng

Oct 21, 2015, 5:25:41 AM
to Nick Mead, Shiny - Web Framework for R, wyu...@gmail.com
There's no "session" variable in scope where you're calling reactivePoll, since you're operating outside of any user session at that point. Try passing NULL instead.

The reason for passing a session is so the reactivePoll can automatically stop itself when the user session ends (i.e. that particular browser window navigates away or closes). Passing NULL will just keep the reactivePoll running until the R process exits, which is what you want in this case.
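For the daily-refresh case raised earlier in the thread, one possible sketch of a process-wide reactivePoll with a NULL session follows. The file name is the one from the thread; the once-a-day checkFunc, the 60-second poll interval, and the output name sales_table are assumptions chosen for illustration.

```r
library(shiny)

# Defined at the top of server.R, outside shinyServer(), so there is one
# poller per R process rather than one per user session.
sales <- reactivePoll(
  60 * 1000,  # poll once a minute
  NULL,       # no session: keeps running until the R process exits
  checkFunc = function() {
    # Cheap check whose value changes once per calendar day, so the
    # expensive valueFunc is re-run at most once a day.
    format(Sys.Date())
  },
  valueFunc = function() {
    read.csv("Sales_Data.pip", header = TRUE, sep = "|", quote = "\"")
  }
)

shinyServer(function(input, output, session) {
  # Every session reads the same cached value.
  output$sales_table <- renderTable(head(sales()))
})
```

Since valueFunc only runs when the value returned by checkFunc changes, the polling interval itself can stay short without re-reading the file each time.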

Nick Mead

Oct 21, 2015, 7:10:21 PM
to Shiny - Web Framework for R, naj...@gmail.com, wyu...@gmail.com
Thanks Joe, that works perfectly.

I could see that the session variable wasn't in scope, but removing it didn't work. Passing NULL, however, works nicely.

While it is a bit of a pain to have to code up the rules for the checkFunction, it's really nice having the ability to fine-tune the data caching frequency. 

Joe Cheng

Oct 26, 2015, 4:08:49 PM
to Nick Mead, Shiny - Web Framework for R, wyu...@gmail.com
Oh, if you're operating on a file, you don't want reactivePoll; you want reactiveFileReader, which implements checkFunc by consulting the last-modified-time of the file.

sales <- reactiveFileReader(1000, NULL, "Sales_Data.pip", read.csv, header = TRUE, sep = "|", quote = "\"")
data <- reactive({
  summarise(
    group_by(
      sales(),
      y=as.integer(year(Sales_Date)),
      m=as.integer(month(Sales_Date)),
      o=Office, 
      p=as.character(Product)
    ), 
    c=n_distinct(Sales_ID)
  )
})

And for completeness, if all your checkFunc is going to do is wait for a certain amount of time to expire, you don't even need reactivePoll. You can just use invalidateLater().

data <- reactive({
  sales <- read.csv("Sales_Data.pip", header = TRUE, sep = "|", quote = "\"")
  summary <-
    summarise(
      group_by(
        sales,
        y=as.integer(year(Sales_Date)),
        m=as.integer(month(Sales_Date)),
        o=Office,
        p=as.character(Product)
      ),
      c=n_distinct(Sales_ID)
    )
  invalidateLater(60*1000, NULL)
  return(summary)
})
