# piecewise regression

23 views

### Christoph Ruehlemann

Mar 5, 2018, 5:04:50 AM3/5/18
Hi all,

Can't seem to figure out where the mistake is in this code for a piecewise regression model:

model <- lm(X ~ (xyz < 0.89474) * xyz + (xyz >= 0.89474) * xyz)

Its plot has an ugly V-shape added to the break point (see plot attached):

XYZ=sort(unique(xyz))
plot(XYZ ~ xyz, frame=F, type="n")
lines(XYZ, predict(model, list(xyz=XYZ)))

Does anybody spot the mistake?

Chris
Piecewise regression.tiff

### Stefan Th. Gries

Mar 5, 2018, 11:52:49 PM3/5/18
to CorpLing with R
Let me first seize this opportunity to remind everyone of a few useful
- <https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example>
- and maybe <https://cran.r-project.org/web/packages/reprex/index.html>,
which I haven't used myself.

In that spirit:

rm(list=ls(all=TRUE))
# define some data
qwe <- structure(list(x = c(0.98332368268674, 0.98332368268674,
1.52269832078749,
2.14415170816445, 2.61317313259988, 3.04701795020266, 3.64502026635785,
4.52443543717429, 5.08726114649681, 5.69698899826288, 5.49765489287782,
5.49765489287782, 4.92310364794441, 4.38372900984366, 3.77400115807759,
3.38705848291836, 2.39038795599305, 2.39038795599305, 1.86273885350318,
1.40544296467863, 0.936421540243196, 1.92136653155761, 3.31670526925304,
4.16094383323683, 5.00518239722061, 5.55628257093225, 6.28326577880718,
6.65848291835553, 7.25648523451071, 8.21797915460336, 8.69872611464968,
9.30845396641575, 8.79253039953677, 8.39386218876665, 8.39386218876665,
7.77240880138969, 7.02197452229299, 6.77573827446439, 6.31844238563984,
6.31844238563984), y = c(5.47882045322487, 5.47882045322487,
4.70223707147008, 4.45514235909355, 4.10214991284137, 4.03155142359094,
3.64325973271354, 2.54898314933178, 1.87829750145264, 2.12539221382917,
2.26658919233004, 2.26658919233004, 3.07847181871005, 3.07847181871005,
3.53736199883789, 3.78445671121441, 4.87873329459616, 4.87873329459616,
5.30232423009878, 5.83181289947705, 5.58471818710052, 5.23172574084834,
4.59633933759442, 3.25496804183614, 3.11377106333527, 3.67855897733876,
4.73753631609529, 5.65531667635096, 6.50249854735619, 8.19686228936665,
9.00874491574666, 9.43233585124927, 8.83224869262057, 8.16156304474143,
8.16156304474143, 7.06728646135967, 5.40822196397443, 5.40822196397443,
4.63163858221964, 4.63163858221964)), .Names = c("x", "y"))

# visualize
plot(qwe); grid()

# prepare for analysis
y <- qwe\$y; x <- qwe\$x # original data
splitter <- ifelse(x>=5, "hi", "lo") # breakpoint indicator
summary(model <- lm(y ~ x*splitter)) # fit and summarize the model

# prepare for adding regression lines
PREDICTOR <- seq(1, 9, 0.25) # x-axis values for predictions
PREDICTIONS <- predict( # y-axis values for predictions
model,
newdata=list(
x=PREDICTOR,
splitter=ifelse(PREDICTOR>=5, "hi", "lo")))

# augment visualization
abline(v=5) # add line for where the split occurs
lines(PREDICTOR, PREDICTIONS)

HTH,
STG

### Christoph Ruehlemann

Mar 6, 2018, 4:39:00 AM3/6/18
thanks for this, but it seems this graph too plots the two regression lines with an unwanted 'nose', doesn't it (see attachment)?

--
You received this message because you are subscribed to the Google Groups "CorpLing with R" group.
To unsubscribe from this group and stop receiving emails from it, send an email to corpling-with-r+unsubscribe@googlegroups.com.
To post to this group, send email to corpling-with-r@googlegroups.com.

--
Screen Shot 2018-03-06 at 10.35.10.png

### Alex Perrone

Mar 6, 2018, 8:48:27 AM3/6/18
No, it doesn't. Change this line

PREDICTOR <- seq(1, 9, 0.25) # x-axis values for predictions

to

PREDICTOR <- seq(1, 9, 0.01) # x-axis values for predictions

and re-plot.

### Alex Perrone

Mar 6, 2018, 8:49:22 AM3/6/18

On Tue, Mar 6, 2018 at 8:48 AM, Alex Perrone wrote:
No, it doesn't. Change this line

PREDICTOR <- seq(1, 9, 0.25) # x-axis values for predictions

to

PREDICTOR <- seq(1, 9, 0.01) # x-axis values for predictions

and re-plot.

### Stefan Th. Gries

Mar 6, 2018, 11:25:18 AM3/6/18
to CorpLing with R
Well, even

PREDICTOR <- seq(1, 9, 0.01) # x-axis values for predictions

has that nose, it's 'behind' the vertical abline. The nose is 'the connection' between the first and the second regression line - if you don't have the abline hiding it, just plot them separately:

lines(PREDICTOR[PREDICTOR>=5], PREDICTIONS[PREDICTOR>=5])
lines(PREDICTOR[PREDICTOR<5], PREDICTIONS[PREDICTOR<5])

### Christoph Ruehlemann

Mar 8, 2018, 1:58:18 PM3/8/18
The nose and the disconnect between the segments may be due to lack of precision in the way the break point is determined.
After re-determining ithe break point for my data based on the method detailed in Crawley (2007: 427), the two segments perfectly connect.

The steps involved are:

1. define a vector "breaks" for potential breaks:
2. run a for loop for piecewise regressions for all potential break points and yank out the minimal residual standard error (mse) for each model:

mse <- numeric(length(breaks))
for(i in 1:length(breaks)){
piecewise <- lm(V_indep ~ V_dep*(V_dep < breaks[i]) + V_dep*(V_dep >=breaks[i]))
mse[i] <- summary(piecewise)[6]
}
mse <- as.numeric(mse)

3. identify the break point with the least mse:

breaks[which(mse==min(mse))]

This methodology solved my problem. Perhaps it solves the problem in other data sets too.

Chris

--
You received this message because you are subscribed to the Google Groups "CorpLing with R" group.
To unsubscribe from this group and stop receiving emails from it, send an email to corpling-with-r+unsubscribe@googlegroups.com.
To post to this group, send email to corpling-with-r@googlegroups.com.