Rpart.plot Install


Rode Strawther


Aug 4, 2024, 3:50:15 PM

to inprinoufin

when trying to install rpart.plot. The fix seems to be here: "Error: could not find function "install.packages" for R markdown", but the post is lacking detail. How do I "look for it [an .Rprofile with an install command in it] and remove the command"?

Hey, to fix the "could not find function "install.packages"" error when installing rpart.plot, you need to locate and edit your .Rprofile file. R runs this file at startup before the default packages are attached, so an install.packages() call in it fails: the utils package, which provides that function, is not loaded yet. Search the .Rprofile file for any lines containing install.packages and remove them or comment them out (prefix the line with #). Save the file, restart R, and then install rpart.plot again with install.packages("rpart.plot").
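If you would rather not edit the file by hand, the sketch below does the "comment out" step. It is a hypothetical helper written for this answer (comment_out_installs() is not part of any package): it keeps a .bak copy of the file, then prefixes every active line that mentions install.packages with "# ".

```r
# Hypothetical helper (not from any package): comment out every active
# install.packages() call in a profile file, keeping a .bak copy first.
comment_out_installs <- function(path, backup = TRUE) {
  lines <- readLines(path, warn = FALSE)
  # Lines that mention install.packages and are not already comments
  hit <- grepl("install.packages", lines, fixed = TRUE) & !grepl("^\\s*#", lines)
  if (backup) file.copy(path, paste0(path, ".bak"), overwrite = TRUE)
  lines[hit] <- paste0("# ", lines[hit])
  writeLines(lines, path)
  invisible(which(hit))  # line numbers that were commented out
}
```

For example, run comment_out_installs("~/.Rprofile") and then restart R. If you want to keep the command, writing it as utils::install.packages(...) also avoids the error, because a namespace-qualified call works before utils is attached.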

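Locate the .Rprofile file: .Rprofile is a hidden file (its name starts with a dot) that contains R startup settings. R looks for it first in the current working directory, such as an RStudio project folder, and then in your user home directory; a site-wide Rprofile.site file may also exist in the etc folder of your R installation. The location of the home directory depends on your operating system; running path.expand("~") in R shows it.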
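To find the file from inside R, the sketch below checks the usual candidate locations described in R's ?Startup help page and prints any line that calls install.packages. Both functions (find_profiles() and show_install_lines()) are hypothetical helpers for illustration, and the list of locations is an assumption that covers the common cases rather than every possible setup.

```r
# Sketch: list the startup profile files that exist on this machine.
# R uses R_PROFILE_USER if it is set; otherwise it looks for .Rprofile in the
# current directory, then in the home directory. Rprofile.site is site-wide.
find_profiles <- function() {
  candidates <- c(
    Sys.getenv("R_PROFILE_USER"),
    file.path(getwd(), ".Rprofile"),
    file.path(path.expand("~"), ".Rprofile"),
    file.path(R.home("etc"), "Rprofile.site")
  )
  candidates <- unique(candidates[nzchar(candidates)])
  candidates[file.exists(candidates)]
}

# Print "file:line: text" for every line that calls install.packages()
show_install_lines <- function(paths = find_profiles()) {
  for (p in paths) {
    lines <- readLines(p, warn = FALSE)
    idx <- grep("install.packages", lines, fixed = TRUE)
    if (length(idx)) cat(sprintf("%s:%d: %s\n", p, idx, lines[idx]), sep = "")
  }
  invisible(paths)
}
```

Calling show_install_lines() with no arguments reports the offending lines in every profile it finds.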


Tree models are well suited to modeling categorical response variables (e.g., soil type or land cover type). They can also model continuous responses, but be careful of overfitting.
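As a small illustration, rpart chooses the kind of tree from the type of the response when no method is given: a factor response gives a classification tree and a numeric response gives a regression tree. The sketch uses the kyphosis data shipped with rpart and the built-in mtcars data as stand-ins for your own data.

```r
library(rpart)  # rpart is a recommended package distributed with R

# Categorical (factor) response: rpart picks method = "class"
class_tree <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# Continuous (numeric) response: rpart picks method = "anova", a regression tree
reg_tree <- rpart(mpg ~ wt + hp, data = mtcars)

class_tree$method  # "class"
reg_tree$method    # "anova"

# A regression tree predicts one constant per leaf, so its fit is a step function
n_leaves <- sum(reg_tree$frame$var == "<leaf>")
length(unique(predict(reg_tree)))
```

The step-function nature of regression trees is one reason they overfit easily: each extra split adds another constant piece fitted to fewer observations.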

I had some errors when installing caret and found that you may need to remove the cli package and then reinstall it to get the latest version. The code below does this and only needs to be executed once.

CART requires additional libraries to be loaded into R, as shown below. The packages only need to be installed once in RStudio, but library() must be called again in each new R session. Also, the rpart.plot library produces much better looking trees than the standard plot() function; there is a section on it at the bottom of the page.
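One way to make the setup safe to rerun is to install only what is missing. The sketch below is an assumption about how you might do this, not the tutorial's own code; install_if_missing() is a hypothetical helper built on the real functions requireNamespace() and utils::install.packages().

```r
# Hypothetical helper: install only the packages that are not already available.
# Installing is a one-time step per machine; library() is needed every session.
install_if_missing <- function(pkgs) {
  have <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!have]
  if (length(missing) > 0) utils::install.packages(missing)
  invisible(missing)  # names of the packages that were installed
}
```

Typical use: install_if_missing(c("rpart", "rpart.plot")) once, then library(rpart) and library(rpart.plot) at the top of each script.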

The standard plot(...) function does not produce good-looking trees by default, so we'll use rpart.plot(...) to create a figure of the tree. The code below creates a simple, relatively readable tree. rpart.plot(...) has many options; take a look at "Plotting rpart trees with the rpart.plot package" for more.
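A minimal sketch of the plotting step, using the kyphosis data from rpart in place of the tutorial's data. It calls rpart.plot() when that package is installed and falls back to base plot() plus text() otherwise; extra = 104 (class probabilities plus the percentage of observations per node) is just one option chosen for illustration.

```r
library(rpart)

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

if (requireNamespace("rpart.plot", quietly = TRUE)) {
  # extra = 104: per-class probabilities and % of observations in each node
  rpart.plot::rpart.plot(fit, extra = 104)
} else {
  # Base-graphics fallback: draw the tree skeleton, then add the labels
  plot(fit, uniform = TRUE, margin = 0.1)
  text(fit, use.n = TRUE)
}
```

The fallback branch shows why rpart.plot is preferred: the base version needs two calls and still produces a plainer figure.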

If you print the tree, you will see a rather complicated version of the tree that includes all of the values that the tree was built with. Below is the result of calling print(...) for a small set of data with just 39 values.

Here you can see each node of the tree, the condition used to split the values into each branch, the number of values that ended up in each branch, the number of values that were misclassified (the "loss" column), and the predicted value. As the number of values in the tree grows, these printouts become very hard to interpret.
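To reproduce this kind of printout, the sketch below fits a classification tree to the kyphosis data (81 observations, standing in for the tutorial's 39-value dataset) and prints both the node listing and the complexity table discussed next.

```r
library(rpart)

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# One line per node: node), split, n, loss, yval, (yprob); "*" marks a leaf
print(fit)

# Complexity table, including "Root node error" and the CP / rel error / xerror columns
printcp(fit)
```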

The "Root node error" is the proportion of values that would be misclassified at the root node, before any split is made, when every value is assigned the most common class: the number of misclassified values divided by the total number of values. The "rel error" and "xerror" columns are expressed relative to this baseline. "n" is just the total number of values.
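As a worked check on the kyphosis data from rpart (64 "absent" and 17 "present" cases), the root node error can be computed by hand from the class counts and compared with the value stored in the fitted tree's frame, which is what printcp() reports.

```r
library(rpart)

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# Before any split every case is assigned the majority class ("absent"),
# so the root node error is the share of cases in the other class.
counts <- table(kyphosis$Kyphosis)
root_error <- 1 - max(counts) / sum(counts)
root_error  # 17/81, about 0.2099

# The same quantity from the fitted tree: root-node loss divided by n
fit$frame$dev[1] / fit$frame$n[1]
```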

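The CP (complexity parameter) table is important because it shows how the error changes as the tree becomes more complex, and we'll use it to "prune" trees, that is, to reduce their complexity. Each row gives a CP value, the number of splits, the training error ("rel error"), and, with the default 10-fold cross-validation, the cross-validated error ("xerror"). For regression trees, "rel error" is 1 minus the R squared value, so 1 minus "rel error" shows how much of the variability in the data the model explains; for classification trees, it is the misclassification error relative to the root node error.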
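For a regression tree, the R squared at each tree size follows directly from the CP table. The sketch below uses the built-in mtcars data as a stand-in and reads the cptable component of the fitted rpart object.

```r
library(rpart)

set.seed(1)  # xerror comes from 10-fold cross-validation, which is random
reg_tree <- rpart(mpg ~ wt + hp + disp + cyl, data = mtcars, method = "anova")

cp_tab <- reg_tree$cptable  # columns: CP, nsplit, rel error, xerror, xstd
# For a regression tree, apparent R-squared = 1 - rel error
r_squared <- 1 - cp_tab[, "rel error"]
cbind(nsplit = cp_tab[, "nsplit"], r_squared)
```

The first row is always the root (no splits, rel error 1, R squared 0); training R squared can only rise as splits are added, which is why xerror, not rel error, is used to pick the tree size.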

When we create trees using text values (e.g., common names), R converts the text into factors. A factor stores each unique text string as a level and represents each value internally by that level's integer code, and the model works with these numbers. The problem is that a prediction can return the number instead of our string. To address this, we model on the integer values of the factors and keep track of which number matches which text string.
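The sketch below shows the mapping between text, factor levels, and integer codes in base R. The species names are made up for illustration. Note that for a classification rpart tree, predict(fit, type = "class") already returns the factor labels; the lookup is useful when you only have the integer codes.

```r
# Hypothetical text labels; levels are sorted alphabetically by default
species <- factor(c("Oak", "Pine", "Maple", "Pine", "Oak"))

codes <- as.integer(species)  # integer codes used internally: 2 3 1 3 2

# Lookup table matching each code to its text string
lookup <- data.frame(code = seq_along(levels(species)),
                     name = levels(species))

# Map predicted integer codes back to the original text labels
predicted_codes <- c(3L, 1L, 2L)
levels(species)[predicted_codes]  # "Pine" "Maple" "Oak"
```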

Each split increases the fit of the model but also increases the tree's complexity, so we want to control the complexity of the tree. We can do this with the rpart.control() function, which accepts the minimum number of values a node must contain before a split is attempted (minsplit, default 20) and the complexity parameter (cp, default 0.01); any split that does not decrease the overall lack of fit by a factor of cp is not attempted. Typically we will allow for more complex trees by decreasing the cp value (e.g., try 0.002).
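A minimal sketch of growing a more complex tree with rpart.control() and then pruning it back with the CP table. The kyphosis data and the rule "prune at the CP with the lowest xerror" are illustrative choices, not the tutorial's settings.

```r
library(rpart)

set.seed(42)  # cross-validated error (xerror) uses random folds
complex_fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                     control = rpart.control(minsplit = 10, cp = 0.002))

# Number of internal (split) nodes in a fitted tree
n_splits <- function(fit) sum(fit$frame$var != "<leaf>")

# Prune back to the CP value with the lowest cross-validated error
cp_tab <- complex_fit$cptable
best_cp <- cp_tab[which.min(cp_tab[, "xerror"]), "CP"]
pruned_fit <- prune(complex_fit, cp = best_cp)

c(complex = n_splits(complex_fit), pruned = n_splits(pruned_fit))
```

Growing deliberately large and then pruning lets the cross-validated error, rather than a fixed cp guess, decide how complex the final tree should be.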

The tree diagrams are one of the best tools for evaluating trees. Another tool is a "confusion matrix": a table with the observed response values along one side and the predicted values along the other, where each cell counts how many observations with a given observed value received a given predicted value. For a perfect tree, the only counts greater than 0 would be along the diagonal. You can view these tables in R, but I recommend using MS Excel.
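A sketch of building a confusion matrix in R and exporting it to a CSV file that Excel can open; kyphosis stands in for your data and the output file name is arbitrary.

```r
library(rpart)

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

predicted <- predict(fit, type = "class")
confusion <- table(observed = kyphosis$Kyphosis, predicted = predicted)
confusion

# Share of cases on the diagonal (correctly classified)
accuracy <- sum(diag(confusion)) / sum(confusion)

# Write the table to a CSV file that can be opened in a spreadsheet
out_file <- file.path(tempdir(), "confusion_matrix.csv")
write.csv(as.data.frame.matrix(confusion), out_file)
```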

The code above shows that it is easy to create a prediction for the original data. Typically, however, we'll want to predict into a new dataset containing a grid of points that we can convert to a raster in a GIS application. The code below shows how to do this by setting the "newdata" parameter when we make a prediction. Note that the column names in the new data must exactly match the predictor columns in the dataset used to fit the original model for this to work.
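A sketch of predicting onto a regular grid with predict(..., newdata = ...). Here mtcars's wt and hp play the role of grid coordinates, which is purely an assumption for illustration; with real spatial data the columns would be your predictor layers at each grid point. The last lines show that mismatched column names make prediction fail.

```r
library(rpart)

# Regression tree on two predictors (stand-ins for gridded predictor layers)
fit <- rpart(mpg ~ wt + hp, data = mtcars)

# New data: a 25 x 25 grid whose column names match the model's predictors
grid <- expand.grid(
  wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 25),
  hp = seq(min(mtcars$hp), max(mtcars$hp), length.out = 25)
)
grid$mpg_pred <- predict(fit, newdata = grid)
head(grid)

# A column named "weight" instead of "wt" does not match, so predict() errors
bad_grid <- data.frame(weight = 3, hp = 150)
result <- tryCatch(predict(fit, newdata = bad_grid), error = function(e) e)
```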

rpart.plot produces tree plots that typically look better and allow more customization than the standard plot() function. We need to install and load the rpart.plot library, and then we can call rpart.plot() to display the tree.

In this tutorial, we demonstrate how to use the treevalues package to perform inference on a CART (Breiman et al. (1984)) tree fit using the rpart package. Throughout this tutorial, we work with an example tree fit to the Box Lunch Study dataset, which was originally provided in the visTree package and described in Venkatasubramaniam et al. (2017).

The treevalues package is designed for use with the rpart package. All trees should be built using the rpart package with the parameter model=TRUE, which saves a copy of the training data inside the fitted rpart object.

The argument cp, or complexity parameter, is a scaled version of the complexity parameter \(\lambda\) described in our manuscript. For response variable \(y\), \(\lambda = \text{cp} \times \sum_{i=1}^{n} (y_i - \bar{y})^2\). The larger the value of cp, the more the tree will be pruned.
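The sketch below illustrates both points with plain rpart (treevalues itself is not used): fitting with model = TRUE stores the model frame in the object, and \(\lambda\) is cp times the total sum of squares of \(y\), which for a regression tree is also the deviance of the root node. The mtcars data replaces the Box Lunch Study data here.

```r
library(rpart)

cp <- 0.01
# model = TRUE keeps a copy of the model frame inside the fitted object
fit <- rpart(mpg ~ wt + hp, data = mtcars, model = TRUE,
             control = rpart.control(cp = cp))

# lambda = cp * sum((y - ybar)^2)
y <- mtcars$mpg
lambda <- cp * sum((y - mean(y))^2)

# For an anova tree the root deviance is that same total sum of squares
root_ss <- fit$frame$dev[1]
```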

We begin by plotting our tree. While we could plot the tree using plot() from the rpart package, or rpart.plot() from the rpart.plot package, we instead use our own treeval.plot() function. To start, we set inferenceType=0 so that no p-values or confidence intervals are computed.

We now pass this branch into branchInference. We specify that we are interested in the difference between region 8 and its sibling by setting type="sib". The branchInference function returns a p-value for the test of the null hypothesis that regions 8 and 9 have the same mean response. It also returns a confidence interval for the true difference in means between the regions. In this case, the confidence interval includes 0 and the p-value is large.

In our framework, it is assumed that \(y_i \sim N(\mu_i, \sigma^2)\) and \(\sigma^2\) is assumed known. If no argument sigma_y is provided to the branchInference() function, the conservative estimate sd(y) is used.

By default, when type="reg", the function branchInference() conditions on the event that the exact branch passed as branch appears in the tree. As mentioned in Neufeld, Gao, and Witten (2021), inference can have higher power if we instead condition on all possible permutations of the branch (permute=TRUE). Conditioning on all possible permutations increases power and, in this case, makes the confidence interval substantially narrower.

While this seems like an argument for always setting permute=TRUE, we note that computations can be prohibitively slow for large trees when permute=TRUE. We also note that adding permute=TRUE tends to make a large difference only in trees where the overall signal is weak. Trees with strong signal and highly significant splits tend to be more stable, and in more stable trees the addition of permute=TRUE does not tend to substantially shorten confidence intervals.

We can bypass the need to specify a specific branch by making the following plot, which includes p-values for each split and confidence intervals for each region. By default, the plot makes 95% confidence intervals for the mean within each region and reports p-values for a test of no difference in means across each split. When looking at a large tree, this plot should be interpreted with care, as the p-values have not been corrected for multiple testing.

There are many ways to customize the output of treeval.plot(). If the default version is too congested, alternate values of inferenceType can be provided to customize how much information is displayed. Additional arguments provided will be passed on to rpart.plot().

Alteryx Designer Desktop includes a suite of Predictive Analytics tools that use R, an open-source language used for statistical and predictive analysis. To use the Predictive Macros in Alteryx, users must install R and the packages used by the R tool.

Then install Rattle using R's package manager. As a separate step, it is usually best to install the RGtk2 package, which will download the GTK libraries for Mac OS X and link them into R. This can take some time and is a prerequisite for loading Rattle. Start R and enter the following commands at the R prompt. R asks us to nominate a CRAN mirror; choose a nearby location.

> install.packages("RGtk2")
> install.packages("rattle")
> q()

Restart R and then load Rattle with the following two commands at the R prompt. This loads the Rattle package into the library and starts up Rattle.

> library(rattle)
> rattle()

Note on Installation of Rattle on Mountain Lion

Thanks to Joe Trubisz for this script, which he tested on several different machines and confirms works (15 November 2013).