Test for errors when loading data files

Marshall Feldman

Mar 30, 2017, 10:05:40 AM
to ProjectTemplate
Today I had this disconcerting experience.

I'm working on a project that relies on various data sources that are updated and added to periodically. Because I'll be giving a paper on the topic next week, I've kept the project set up to reach out across the Internet and load the latest versions of the data sources every time I work on it.

Unfortunately, the web site for the most important data source is down today, and virtually all my other work depends on these particular data. Fortunately I haven't yet gotten around to deleting a duplicate version of the data, so I should still be able to work on the project later today. (Messiness has its virtues.)

But this raises an important set of issues. The <project home>/data/file.url that loads these data simply contains the URL of the data source (a CSV file) and a separator character. ProjectTemplate's documentation for such files makes no mention of error checking, nor does error checking seem to be discussed in the documentation for other kinds of sources.

Because ProjectTemplate does certain things "under the hood," it's not clear how to use R's standard error detection and handling facilities. For example, in my case I'd want something like this:

      url: <data source url>
      separator: ,
      if data loaded successfully
            continue
      else
            use an older version of the data
            issue a warning message

So what's the recommended way to accomplish this?

And please modify the documentation to discuss this crucial topic.

Thanks.


Kenton White

Mar 30, 2017, 10:26:22 AM
to ProjectTemplate
Hi Marshall,

This is a very interesting use case!  Working within the current ProjectTemplate flow, I would recommend using cache() to keep a cached copy of the data where the source might disappear.  If you head on over to https://github.com/johnmyleswhite/ProjectTemplate you can see much of the new work that is being done to improve the automated caching, which I think may help solve your problem.  That is also the place to discuss how we should address this problem.

The Google Groups thread, as you've noted in another post, is rather dead, and most of the conversation has moved to the GitHub page.  I monitor the group simply to help direct people to the GitHub page :)

BTW, I really like your idea of having ProjectTemplate fall back to a cached copy of the data if an error is encountered.  I think that is a strategy that might work more generally.
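
Until something like that exists in ProjectTemplate itself, the fallback idea can be roughed out in plain R with tryCatch(). This is only a sketch, not ProjectTemplate functionality: load_with_fallback(), source_url and fallback_path are hypothetical names made up for illustration, and where such a script would sit in a project (for example, an R script run while the project loads) is an assumption.

```r
# Hypothetical helper: try the live source first, then fall back to an
# older local copy and warn. Not part of ProjectTemplate.
load_with_fallback <- function(source_url, fallback_path, sep = ",") {
  fresh <- tryCatch(
    read.csv(source_url, sep = sep),
    error = function(e) NULL  # any read/connection error means "no fresh data"
  )
  if (!is.null(fresh)) {
    return(fresh)
  }
  warning("Could not load ", source_url,
          "; using the older copy at ", fallback_path, call. = FALSE)
  read.csv(fallback_path, sep = sep)
}
```

It would be called with the URL from the .url file and the path to a previously saved copy of the CSV. If the fallback file is missing as well, the second read.csv() still stops with an error, which is probably what you want.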