R tryCatch Error


Berniece Domnick


Aug 5, 2024, 12:11:38 PM

to landtecbittmo

In these ways, tryCatch can serve two purposes. In data cleaning, it is a shortcut that lets problem inputs be skipped. In debugging, it speeds things up by identifying the situations where exceptions arise. Below I demonstrate both of these use cases.

Consider the following: a function that simply divides its input by 5, and a messy list of inputs that contains both numbers and strings. For the numbers, the division function works fine. For the strings, it throws an error and halts execution.
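The original code is not shown, so here is a minimal sketch of the setup. The function name div_by_5 comes from the text. The list name nums and its six members are hypothetical. Five members are numbers and one is a string.

```r
# Division function named in the text
div_by_5 <- function(x) {
  x / 5
}

# Hypothetical messy input: five numbers and one string
nums <- list(10, 25, 3, "seven", 40, 100)

div_by_5(10)          # numeric input works
# div_by_5("seven")   # would stop with:
#   Error in x/5 : non-numeric argument to binary operator
```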

If we apply the function to the entire list of inputs, it yields no output. Instead, it throws the same error we saw when applying the function to the problem list member alone. Calling the output variable divided_out after running the sapply statement also results in an error, because the variable was never created.
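A sketch of that failure, using the same hypothetical nums list. Base R's try() is used only so the script keeps running and we can check that divided_out was never assigned. At the console you would simply see the error.

```r
div_by_5 <- function(x) x / 5
nums <- list(10, 25, 3, "seven", 40, 100)   # hypothetical input

# Remove any stale result so we can tell whether sapply() produced one
if (exists("divided_out")) rm(divided_out)

# One bad member makes the whole sapply() call fail
res <- try(divided_out <- sapply(nums, div_by_5), silent = TRUE)

inherits(res, "try-error")   # TRUE: the call failed as a whole
exists("divided_out")        # FALSE: no partial output was assigned
```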

The function could have run successfully on five of the six members of the list, yet we receive no output, because a single error makes the entire sapply call fail. Using tryCatch, we can isolate the error to only those members of the list that caused the problem. I wrap the previous call to div_by_5 in tryCatch. The first argument to tryCatch is the expression it should attempt to evaluate for the input. The error argument is a handler function that says what to do if an error is encountered. In this case, the handler returns NA whenever dividing the input fails.
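A minimal sketch of the wrapped call, again assuming the hypothetical nums list. The expression goes first. The error handler is passed by name and returns NA for any member that fails.

```r
div_by_5 <- function(x) x / 5
nums <- list(10, 25, 3, "seven", 40, 100)   # hypothetical input

divided_out <- sapply(nums, function(x) {
  tryCatch(
    div_by_5(x),               # expression to attempt
    error = function(e) NA     # handler: return NA instead of failing
  )
})

divided_out
# [1]  2.0  5.0  0.6   NA  8.0 20.0
```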

As we can see in the output, divided_out has still been generated despite the error encountered while processing the inputs. Instead of failing completely, the call produced outputs for the list members that did not throw an error, and the exception was caught and replaced with an NA.

As we saw in the first example, the error message from applying the function to the list told us there was a problem, but it did not tell us where the problem was. That is fine for a small list with six members. Once we scale up to larger inputs, however, locating all the error-causing inputs by visual inspection becomes slow and error-prone.

In this example, we have a list with 250,000 members that we want to do some math on. I introduced two bad inputs by changing those list members to strings, so that they break our division function. When we apply our function to the list, we get the same error message as before. To debug this, we could print the input list and scan it visually for deviations from what we expect. The problem is that the list is very long, so what if we zone out and miss some of the lines causing the errors? More likely still, what if the source of the error is subtler than a change of data type, e.g. a disallowed character within a string? How could we then separate out the members for which our function is failing?
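Here is a sketch of that setup. The values are random numbers. The two bad positions, 1234 and 200000, and their string contents are arbitrary choices for illustration and are not from the original. The error message itself says nothing about which member failed.

```r
div_by_5 <- function(x) x / 5

# Hypothetical data: 250,000 numbers, two of which are replaced by strings
set.seed(1)
nums2 <- as.list(runif(250000))
nums2[[1234]]   <- "oops"
nums2[[200000]] <- "12a"

# Fails on the first bad member; the message does not say *where*
res <- try(sapply(nums2, div_by_5), silent = TRUE)
conditionMessage(attr(res, "condition"))
```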

Enter tryCatch, which we can use to produce more informative errors that point us to the exact members of the input that are causing the problem. We can then subset out those members of the list and begin to develop additional cleaning steps that address these deviations from the input our function requires.

The difference here is that we apply our function to the indices of the list, sapply(1:length(nums2), ...), rather than directly to the list itself, sapply(nums2, ...). When it encounters an error, our catch code can print the index position of the failing member, which lets us locate the problem inputs. These index positions could just as easily be saved to a vector so that the problem inputs can be subset for detailed inspection.
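A sketch of the index-based approach, using the same hypothetical data as before with bad members at positions 1234 and 200000. The handler prints each failing index and also records it in bad_idx, a name chosen here for illustration. It uses seq_along() rather than 1:length(), because seq_along() also behaves correctly for an empty list.

```r
div_by_5 <- function(x) x / 5

# Same hypothetical data as before
set.seed(1)
nums2 <- as.list(runif(250000))
nums2[[1234]]   <- "oops"
nums2[[200000]] <- "12a"

bad_idx <- integer(0)   # positions that failed

# Iterate over indices, not values, so the handler knows which i failed
divided_out2 <- sapply(seq_along(nums2), function(i) {
  tryCatch(
    div_by_5(nums2[[i]]),
    error = function(e) {
      message("Error at index ", i, ": ", conditionMessage(e))
      bad_idx <<- c(bad_idx, i)   # record position in the outer vector
      NA
    }
  )
})

bad_idx           # the two failing positions
nums2[bad_idx]    # inspect the problem inputs directly
```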

Applying the function with tryCatch lets us quickly find the two list members that are causing the problem. We could then go back and add cleaning steps before applying our function, or choose to drop the NAs from the output and proceed with only the clean data.

Knowing where errors occur, and how many there are, is extremely useful in the debugging process. It lets us distinguish cases where our code fails on every member of a list from cases where only a few inputs deviate from what we expect. That distinction helps us pinpoint the source of the problem and find the best way to fix it.

Other functions exist that relate to error handling but the above are enough to get started. (The documentation for these functions will lead to all the other error-related functions for any RTFM enthusiasts.)

tryCatch returns the value of evaluating expr unless an error or a warning is signalled. In that case, you can specify the return value (such as the NA above) by supplying the corresponding handler function (see the error and warning arguments in ?tryCatch). The handlers can be functions that already exist, or you can define them inside the tryCatch() call, as I did above.
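A small sketch with separate warning and error handlers, plus the finally argument, which runs whether or not a condition occurred. The wrapper name safe_log and its return values are illustrative choices. log(-1) signals the warning "NaNs produced", and log("a") signals an error.

```r
safe_log <- function(x) {
  tryCatch(
    log(x),
    warning = function(w) {
      message("Warning caught: ", conditionMessage(w))
      NaN                      # value returned when a warning is signalled
    },
    error = function(e) {
      message("Error caught: ", conditionMessage(e))
      NA                       # value returned when an error is signalled
    },
    finally = message("safe_log() finished")   # always runs
  )
}

safe_log(100)   # normal value; no handler runs
safe_log(-1)    # warning handler runs, returns NaN
safe_log("a")   # error handler runs, returns NA
```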

Since I just lost two days of my life trying to get tryCatch working around an irr function, I thought I should share what I learned (and what is missing). FYI, irr here is an actual function from the FinCal package, which threw errors in a few cases on a large data set.

Note that these functions (possibly(), safely() and quietly() from the purrr package), unlike tryCatch(), are meant to wrap a function rather than an expression, and they return a modified function. For the OP's problem as stated, we would probably use possibly and wrap readLines directly.
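To show what "wrapping a function" means without requiring purrr, here is a hand-rolled base-R analogue. my_possibly is a hypothetical name, and this is not purrr's implementation. It takes a function and returns a new function that yields a fallback value on error. With purrr installed, the corresponding call would be purrr::possibly(readLines, otherwise = NA_character_).

```r
# Base-R analogue of purrr::possibly(): function in, modified function out
my_possibly <- function(.f, otherwise) {
  force(.f); force(otherwise)
  function(...) {
    tryCatch(.f(...), error = function(e) otherwise)
  }
}

safe_readLines <- my_possibly(readLines, otherwise = NA_character_)

tmp <- tempfile()
writeLines(c("first", "second"), tmp)

safe_readLines(tmp)   # "first" "second"
# Missing file: returns NA (file() also emits a warning, which is not caught)
safe_readLines(file.path(tempdir(), "no_such_file.txt"))
```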

I illustrate with possibly above, but we could easily imagine cases where we would want safely() instead. With safely(), we could extract the result component from each list item and skip or otherwise handle items with a non-empty error component, perhaps even differently depending on the error. quietly() is another option: it captures printed output, warnings, and messages separately.
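A base-R sketch of the safely() pattern applied to the earlier hypothetical nums list. my_safely is a hypothetical helper that mimics the shape of purrr::safely() output: a list with result and error components. It is not purrr's code. Failed items are split off by checking for a non-NULL error component.

```r
# Base-R analogue of purrr::safely(): every call returns list(result, error)
my_safely <- function(.f) {
  force(.f)
  function(...) {
    tryCatch(
      list(result = .f(...), error = NULL),
      error = function(e) list(result = NULL, error = e)
    )
  }
}

div_by_5 <- function(x) x / 5
safe_div <- my_safely(div_by_5)

nums <- list(10, 25, 3, "seven", 40, 100)   # hypothetical input
out <- lapply(nums, safe_div)

# Separate failures from successes using the error component
failed  <- vapply(out, function(r) !is.null(r$error), logical(1))
results <- vapply(out[!failed], function(r) r$result, numeric(1))

which(failed)                                  # position of the bad input
results                                        # the five successful results
conditionMessage(out[[which(failed)]]$error)   # why it failed
```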


Introduction into conditions in standard R
Throw your own conditions
Handling conditions in R
The drawbacks of tryCatch
Workaround 1: Interactive debugging
Workaround 2: withCallingHandlers + tryCatch

tryCatchLog() evaluates the expression in expr and passes all condition handlers in ... to tryCatch() as-is, while error, warning, and message conditions are logged together with the function call stack (including file names and line numbers).
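The idea behind the "withCallingHandlers + tryCatch" workaround can be sketched in base R. This is not tryCatchLog's implementation, and log_and_catch is a hypothetical name. Calling handlers run while the stack is still intact, so they can log sys.calls(). Exiting handlers passed through ... to tryCatch() then decide how to recover. This sketch does not record file names or line numbers.

```r
# Log every condition with the call stack, then let tryCatch() recover
log_and_catch <- function(expr, ...) {
  log_condition <- function(cond) {
    calls <- sys.calls()   # captured before the stack unwinds
    cat(sprintf("[%s] %s\n", class(cond)[1], conditionMessage(cond)))
    cat("  call stack:\n")
    for (cl in calls) cat("   ", deparse(cl, nlines = 1), "\n")
  }
  tryCatch(
    withCallingHandlers(
      expr,
      error   = log_condition,
      warning = log_condition,
      message = log_condition
    ),
    ...   # exiting handlers passed through as-is
  )
}

inner <- function(x) log(x)
outer <- function(x) inner(x)

# The warning is logged with the stack, then caught and replaced by NA
res <- log_and_catch(outer(-1), warning = function(w) NA)
```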

The default values of some parameters can be set globally via options to avoid passing the same parameter values in each call and to support easy reconfiguration for all calls without changing the code.

tryLog() is a short version of tryCatchLog() that traps any errors occurring during the evaluation of expr without stopping execution of the script (similar to try() in base R). Errors, warnings, and messages are logged.
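For comparison, here is the base R try() behaviour that tryLog() is described as resembling. This sketch does not assume the tryCatchLog package is installed. On error, try() returns an object of class "try-error" instead of stopping, and the original condition is stored in its "condition" attribute.

```r
# try() traps the error and returns a "try-error" object
res <- try(log("a"), silent = TRUE)

if (inherits(res, "try-error")) {
  cat("Recovered from:", conditionMessage(attr(res, "condition")), "\n")
}

# The script keeps running after the failure
ok <- try(log(100), silent = TRUE)
```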

By default, most packages are built without source reference information. Setting the environment variable R_KEEP_PKG_SOURCE=yes before installing a source package will tell R to keep the source references.

What happens when something goes wrong with your R code? What do you do? What tools do you have to address the problem? This chapter will teach you how to fix unanticipated problems (debugging), show you how functions can communicate problems and how you can take action based on those communications (condition handling), and teach you how to avoid common problems before they occur (defensive programming).

Not all problems are unexpected. When writing a function, you can often anticipate potential problems (like a non-existent file or the wrong type of input). Communicating these problems to the user is the job of conditions: errors, warnings, and messages.
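A short sketch of a function that communicates through all three condition types, and of catching each by class. The names check_input and classify are hypothetical. tryCatch() exits at the first condition that has a matching handler.

```r
check_input <- function(x) {
  if (!is.numeric(x)) stop("x must be numeric")          # error
  if (any(x < 0))     warning("negative values found")   # warning
  message("input looks fine")                            # message
  invisible(x)
}

# Report which kind of condition (if any) an expression signals first
classify <- function(expr) {
  tryCatch(
    { expr; "no condition" },
    error   = function(e) "error",
    warning = function(w) "warning",
    message = function(m) "message"
  )
}

classify(check_input("a"))   # "error"
classify(check_input(-1))    # "warning"
classify(check_input(1))     # "message"
```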

Messages are generated by message() and are used to give informative output in a way that the user can easily suppress with suppressMessages(). I often use messages to let the user know what value the function has chosen for an important missing argument.
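A sketch of that pattern. choose_degree is a hypothetical function that reports the default it picked for a missing argument. The caller can silence the report with suppressMessages() without changing the result.

```r
choose_degree <- function(x, degree) {
  if (missing(degree)) {
    degree <- 1
    message("Using degree = ", degree)   # tell the user what was chosen
  }
  degree
}

choose_degree(1:10)                     # prints "Using degree = 1"
suppressMessages(choose_degree(1:10))   # same value, no message
```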

Generally, you will start with a big block of code that you know causes the error and then slowly whittle it down to get to the smallest possible snippet that still causes the error. Binary search is particularly useful for this. To do a binary search, you repeatedly remove half of the code until you find the bug. This is fast because, with each step, you reduce the amount of code to look through by half.

recover is a step up from browser, as it allows you to enter the environment of any of the calls in the call stack. This is useful because often the root cause of the error is a number of calls back.

dump.frames is an equivalent to recover for non-interactive code. It creates a last.dump.rda file in the current working directory. Then, in a later interactive R session, you load that file, and use debugger() to enter an interactive debugger with the same interface as recover(). This allows interactive debugging of batch code.
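The workflow described above can be configured with the error option, following the pattern in the ?dump.frames help page. Set it at the top of a batch script. On error, R writes last.dump.rda to the working directory and quits with a non-zero status. The interactive follow-up steps are shown as comments because they need an existing dump file.

```r
# In the batch script: dump frames to last.dump.rda on error, then exit
options(error = quote({
  dump.frames(dumpto = "last.dump", to.file = TRUE)
  q(status = 1)
}))

# Later, in an interactive session (same working directory):
#   load("last.dump.rda")
#   debugger(last.dump)
#
# For interactive code, options(error = recover) gives the equivalent
# browser-style menu immediately when an error occurs.
```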

Unfortunately the call stacks printed by traceback(), browser() + where, and recover() are not consistent. The following table shows how the call stacks from a simple nested set of calls are displayed by the three tools.

Note that numbering is different between traceback() and where, and that recover() displays calls in the opposite order, and omits the call to stop(). RStudio displays calls in the same order as traceback() but omits the numbers.

A function might never return. This is particularly hard to debug automatically, but sometimes terminating the function and looking at the call stack is informative. Otherwise, use the basic debugging strategies described above.