row by row comparisons in data.frame


Mark Knecht

Jul 19, 2009, 12:38:11 PM
to Bay Area R Helpers
Hi all,
I'm trying to compare a value with its previous value on a row-by-row
basis in this data.frame, but it's not working. What am I doing
wrong?

Clearly I'm doing the if statement wrong but I don't see how it's
different from the subtraction so need my eyes opened a bit this
Sunday morning.

Thanks,
Mark


TestDF =
structure(list(
Trade = 1:10,
PosType = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
EnDate = c(1040106L, 1040107L, 1040107L, 1040108L,
1040108L, 1040108L, 1040109L, 1040112L, 1040112L, 1040113L)),
.Names = c("Trade","PosType", "EnDate"),
class = "data.frame",
row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10")
)

TestDF
TestDF$PrevDate <- with(TestDF, c(0, head(EnDate, -1)))
TestDF$NewDay <- with(TestDF, if (EnDate != PrevDate) 1 else 0)
TestDF$NewDay <- with(TestDF, EnDate - PrevDate)

TestDF

Ted Dunning

Jul 19, 2009, 1:28:05 PM
to Mark Knecht, Bay Area R Helpers

with() is not a parallel (element-by-element) operation, and "if" requires a scalar (length-one) condition.  This results in this warning:

In if (EnDate != PrevDate) 1 else 0 :
  the condition has length > 1 and only the first element will be used

My own suggestion is to leave this as a boolean:

TestDF$NewDay    <- with(TestDF, EnDate != PrevDate)

This will have the same effect in arithmetic as the 1 and 0 values that you want and it has the advantage of being usable as an index vector.
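
A minimal sketch of that point, using only the EnDate values posted above (the standalone vectors are an assumption made to keep the example self-contained; in the thread they are columns of TestDF):

```r
# EnDate values copied from the original post.
EnDate <- c(1040106L, 1040107L, 1040107L, 1040108L, 1040108L,
            1040108L, 1040109L, 1040112L, 1040112L, 1040113L)

PrevDate <- c(0, head(EnDate, -1))  # previous row's date, 0 for the first row
NewDay   <- EnDate != PrevDate      # element-wise comparison: a logical vector

sum(NewDay)         # TRUE counts as 1 in arithmetic: rows that start a new day
which(NewDay)       # row numbers where a new day starts
EnDate[NewDay]      # the logical vector used directly as an index
as.integer(NewDay)  # an explicit 1/0 column, if one is really needed
```

The comparison operator is already vectorized, so no "if" is needed to build the column.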


On Sun, Jul 19, 2009 at 9:38 AM, Mark Knecht <markk...@gmail.com> wrote:
TestDF$NewDay    <- with(TestDF, if (EnDate != PrevDate) 1 else 0)



--
Ted Dunning, CTO
DeepDyve

Mark Knecht

Jul 19, 2009, 2:30:01 PM
to Ted Dunning, Bay Area R Helpers
Hi Ted,
Humm.....I'm such a newb...

OK, that works in terms of simply creating that column - which is
GOOD so THANKS - but somewhere down the road I'm going to need to do
an "if", or something equivalent. The requirement in the following
data frame is that on "new days" MargAvail = Initial-$4K, while on "!
new days" MargAvail = PrevMarg -$4K. If I use an if to try and get
there I just get the same problem.

I took a shot at trying to write values only when NewDay is TRUE or
FALSE. The code below writes Initial-$4K for all values, and then I
tried to overwrite with previous-$4K where NewDay is FALSE, but that
creates a shorter vector and R complains about that, so it seems like
I'm back to the same problem.

I'm such a newb...

- Mark

DF =
structure(list(
Trade = 1:10,
PosType = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
EnDate = c(1040106L, 1040107L, 1040107L, 1040108L, 1040108L,
1040108L, 1040109L, 1040112L, 1040112L, 1040113L),
EnTime = c(1227L, 641L, 915L, 909L, 930L, 953L, 1241L, 641L,
708L, 840L),
ExDate = c(1040106L, 1040107L, 1040107L, 1040108L, 1040108L,
1040108L, 1040109L, 1040112L, 1040112L, 1040113L),
ExTime = c(1251L, 1306L, 1300L, 1300L, 1300L, 1301L, 1311L,
1306L, 1311L, 1311L),
Pos_PL = c(-146L, 294L, 164L, 184L, 124L, 24L, -146L, 344L,
874L, 224L)),
.Names = c("Trade", "PosType", "EnDate", "EnTime", "ExDate",
"ExTime", "Pos_PL"),
class = "data.frame",
row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10")
)

DF

InitialCash = 10000
ReqMargin = 4000

DF$PLSum <- with(DF, cumsum(Pos_PL))
DF$Initial <- with(DF, InitialCash + c(0, head(cumsum(Pos_PL), -1)))
DF$PrevDate <- with(DF, c(0, head(EnDate, -1)))
DF$NewDay <- with(DF, EnDate != PrevDate)

### Fill column with Initial-$4K
DF$MargAvail <- with(DF, (Initial - ReqMargin) )
DF$PrevMarg <- with(DF, c(0, head(MargAvail, -1)))

### Overwrite with Previous-$4K - this fails because !NewDay isn't
### TRUE for every row, so the result is shorter than the data frame

DF$MargAvail <- with(DF, (PrevMarg - ReqMargin)[!NewDay] )

DF
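
The last assignment fails because the [!NewDay] subset has fewer elements than the data frame has rows. One possible way to get the column described above is sketched here. It rests on two assumptions that the post does not confirm: that "PrevMarg" means the MargAvail already computed for the previous row (so each further trade on the same day takes off another $4K), and that rows with the same EnDate are adjacent. DayStart and TradeOfDay are hypothetical helper names, not part of the original code.

```r
InitialCash <- 10000
ReqMargin   <- 4000

# Only the two columns the calculation needs, copied from the post.
DF <- data.frame(
  EnDate = c(1040106L, 1040107L, 1040107L, 1040108L, 1040108L,
             1040108L, 1040109L, 1040112L, 1040112L, 1040113L),
  Pos_PL = c(-146L, 294L, 164L, 184L, 124L, 24L, -146L, 344L, 874L, 224L)
)

# Cash available before each trade, as in the post.
DF$Initial <- InitialCash + c(0, head(cumsum(DF$Pos_PL), -1))

# Initial on the first trade of each day, repeated for all rows of that day.
DayStart <- ave(DF$Initial, DF$EnDate,
                FUN = function(x) rep(x[1], length(x)))

# 1 for the first trade of a day, 2 for the second, and so on.
TradeOfDay <- ave(seq_along(DF$EnDate), DF$EnDate, FUN = seq_along)

# New day: Initial - 4K.  Same day: previous MargAvail - 4K.
DF$MargAvail <- DayStart - ReqMargin * TradeOfDay
DF
```

Under those assumptions every row is computed in one vectorized step, with no "if" and no shorter vector to assign back.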

Mark Knecht

Jul 19, 2009, 4:11:49 PM
to Ted Dunning, Bay Area R Helpers
So here's a very simple example that does seem to work, but it feels
very brute force. It depends on R converting TRUE/FALSE to 1/0 and
wouldn't be pretty if I started trying to nest things very deeply.

I need to keep studying. This doesn't feel right.

- Mark

DF <- data.frame(cbind(a=1:4, b=1:2, c=1:8, d=1:16, e=0, f=0))
DF$Test <- with(DF, a == b)
DF$e = (DF$c*DF$d) * DF$Test + (DF$c+DF$d) * !DF$Test

DF$f = with(DF, (c*d)*Test + (c+d)*!Test)

DF
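
For comparison, a short sketch of the same choice written with ifelse(), which picks element by element between two vectors. The frame is rebuilt without cbind and without the placeholder columns, which is an editorial simplification of the code above:

```r
DF <- data.frame(a = 1:4, b = 1:2, c = 1:8, d = 1:16)
DF$Test <- DF$a == DF$b

# Arithmetic version from the post: TRUE/FALSE act as 1/0.
DF$e <- with(DF, (c * d) * Test + (c + d) * !Test)

# The same selection stated directly: c*d where Test is TRUE, c+d otherwise.
DF$f <- with(DF, ifelse(Test, c * d, c + d))

all(DF$e == DF$f)
```

The ifelse() form also nests more readably when there are more than two cases.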

Jim Porzak

Jul 19, 2009, 11:09:35 PM
to Mark Knecht, Bay Area R Helpers
Hi Mark,

## The diff() function gets you started:
diff(TestDF$EnDate)

## which() extracts indices of elements which are true:
with(TestDF, Trade[which(diff(EnDate) == 0)])

## so you can easily get a subset of dups (high or low elements) with:
(TestDF.dupsL <- with(TestDF, TestDF[which(diff(EnDate) == 0), ]))
## or
(TestDF.dupsH <- with(TestDF, TestDF[which(diff(EnDate) == 0) + 1, ]))
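
As a sketch of how diff() connects back to the original question: diff() returns one value fewer than its input, so it has to be padded to line up with the rows. Treating the first row as a new day is an assumption made here.

```r
# EnDate values copied from the original post.
EnDate <- c(1040106L, 1040107L, 1040107L, 1040108L, 1040108L,
            1040108L, 1040109L, 1040112L, 1040112L, 1040113L)

length(diff(EnDate))                  # one shorter than EnDate

# Pad the front so the flag has one entry per row.
NewDay <- c(TRUE, diff(EnDate) != 0)
NewDay
```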

BTW, most of us use the data.frame function like:

TestDF <- data.frame(Trade = 1:10,
                     PosType = 1,
                     EnDate = c(1040106L, 1040107L, 1040107L, 1040108L,
                                1040108L, 1040108L, 1040109L, 1040112L,
                                1040112L, 1040113L))
to set up a test case.

HTH,
Jim Porzak
Ancestry.com
San Francisco, CA
www.linkedin.com/in/jimporzak
use R! Group SF: www.meetup.com/R-Users/



Earl

Jul 20, 2009, 5:34:24 AM
to Bay Area R Helpers
Mark,

First, when you create a data frame there is no need to cbind
anything.

> mk <- data.frame(a=1:4, b=1:2, c=1:8, d=1:16, e=0, f=0 )

if you want a column that is true if a==b, then the easiest way is
> mk$test1 <- mk$a==mk$b
> mk
  a b c d e f test1
1 1 1 1 1 0 0  TRUE
2 2 2 2 2 0 0  TRUE
3 3 1 3 3 0 0 FALSE
4 4 2 4 4 0 0 FALSE
[...]

if you want a more complicated conditional, use the ifelse function:
# syntax: ifelse (condition, true value, false value)
> mk$test2 <- ifelse(mk$a == mk$b & mk$b == mk$c, T, F)
> mk
  a b c d e f test1 test2
1 1 1 1 1 0 0  TRUE  TRUE
2 2 2 2 2 0 0  TRUE  TRUE
3 3 1 3 3 0 0 FALSE FALSE
4 4 2 4 4 0 0 FALSE FALSE
5 1 1 5 5 0 0  TRUE FALSE
[...]

now, as near as I can tell, your 3rd line is nonsense:
e = c*d*Test + c*d*!Test
but since Test in {0,1}, e is equal to c*d by definition...???

perhaps you want to assign to e some value if a==b and some other
value otherwise?
mk$e <- ifelse(mk$a==mk$b, mk$d, mk$c)

============

now, regarding your first post, if you're trying to create 1-lagged
differences, here's a brute force approach, assuming that dates
increase in your rows:
mk[ 2:nrow(mk), ]$val1 - mk[ 1:(nrow(mk)-1), ]$val1
will produce the 1-lagged differences of val1

This is easier to see if you use a helper function like diff:


> diff(mk$c, lag=1)
[1] 1 1 1 1 1 1 1 -7 1 1 1 1 1 1 1
> mk[2:16,'c'] - mk[1:15, 'c']
[1] 1 1 1 1 1 1 1 -7 1 1 1 1 1 1 1
>
> sum(abs( diff(mk$c, lag=1) - (mk[2:16,'c'] - mk[1:15,'c'])))
[1] 0
>

Honestly, I'm having some trouble figuring out what you are trying to
do with the R code in your first post. Can you explain?

Ted Dunning

Jul 20, 2009, 11:41:50 AM
to Earl, Bay Area R Helpers

The problem is that "if" is a scalar operation.  You have two good options: you can use the ifelse function as suggested to select things, or you can use apply and its friends sapply and tapply.

There are some nice functions for grouping all rows with a particular value.
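
A brief sketch of what such grouping can look like with tapply() and split(), using the EnDate and Pos_PL values posted earlier in the thread. Which functions were meant above is not stated, so the choice of these two is an assumption.

```r
# Values copied from the data frame posted earlier in the thread.
EnDate <- c(1040106L, 1040107L, 1040107L, 1040108L, 1040108L,
            1040108L, 1040109L, 1040112L, 1040112L, 1040113L)
Pos_PL <- c(-146L, 294L, 164L, 184L, 124L, 24L, -146L, 344L, 874L, 224L)

# Apply a function to all rows that share a date.
DayPL     <- tapply(Pos_PL, EnDate, sum)     # profit/loss per day
DayTrades <- tapply(Pos_PL, EnDate, length)  # number of trades per day

# split() returns the groups themselves, one vector per date.
ByDay <- split(Pos_PL, EnDate)

DayPL
DayTrades
```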

Mark Knecht

Jul 20, 2009, 12:34:18 PM
to Ted Dunning, Earl, Bay Area R Helpers
On Mon, Jul 20, 2009 at 8:41 AM, Ted Dunning<ted.d...@gmail.com> wrote:
>
> The problem is that "if" is a scalar operation.  You have two good options,
> one is that you can use the ifelse function as suggested to select things or
> you can use apply and its friends sapply and tapply.
>
> There are some nice functions for grouping all rows with a particular value.
>
<SNIP>

> --
> Ted Dunning, CTO
> DeepDyve
>

In general the ifelse option is the solution I chose yesterday
afternoon. My problem in reading the R help files was that it wasn't
clear (to me - stupid new R user) from those docs that there was any
difference between "if (A) B else C" vs ifelse(A,B,C) when actually
there is quite a difference.
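
A small sketch of that difference, using a made-up vector x (not data from this thread). Note that in R 4.2.0 and later, passing a condition longer than one to "if" is an error rather than the warning quoted earlier.

```r
x <- c(1, 5, 10)

# ifelse() tests every element and returns a vector of the same length.
by_element <- ifelse(x > 4, "big", "small")

# if () wants a single TRUE or FALSE, so reduce the vector first,
# for example with any() or all().
overall <- if (any(x > 4)) "at least one big" else "all small"

# Running if () once per element needs an explicit loop, e.g. sapply().
looped <- sapply(x, function(v) if (v > 4) "big" else "small")

by_element
overall
looped
```

In short, "if" decides which branch of code to run once, while ifelse() chooses a value for each element.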

I think the other problem has a lot to do with how much I burden you
guys with my data and descriptions of what I'm trying to do. My
inclination was to try and make the question very self contained in
some minimal example. In doing that maybe I create a data.frame in a
way some folks consider non-standard (but in my case does seem to have
value) but unfortunately causes folks to give answers about how to
create data.frames. That's OK - it's good info and helps me learn -
but it wasn't an answer to the root question.

I'm not sure how much information anyone here wants or how deeply to
go into what I'm trying to do. I'm happy to talk about it. There's
nothing greatly secret about this part of it, but it would take time
to write and maybe people aren't interested and all that ends up being
a waste of everyone's time. For me it's not about having a big
database and extracting information. It's more about having a little
bit of data, using it to develop a strategy to handle incoming data,
and then seeing how that strategy worked out using another portion of
the data, then repeat as necessary until I exhaust the data, etc.

- Mark

Earl

Jul 20, 2009, 2:04:30 PM
to Bay Area R Helpers
Yeah, the docs regarding why you need ifelse and what exactly it does
are lacking.


Ted Dunning

Jul 20, 2009, 5:19:15 PM
to Earl, Bay Area R Helpers

Mark,

If you worry about how much you are burdening us, pay it forward by building some explanatory pages that describe the issues that you have faced and overcome.  The Excel => R transformation is probably a pretty common use case for newcomers and would be a valuable contribution.  It is hard for anybody not coming from the same place to describe your mind-set, and the language that you use would be ideal for other people in the same boat.

Mark Knecht

Jul 20, 2009, 6:20:45 PM
to Ted Dunning, Bay Area R Helpers
Reasonable comments. I'll give that some thought. I don't personally
have any place to put up web pages but maybe I should join the 21st
Century and learn to blog or something. ;-)

Take care,
Mark

Ian P Cook

Jul 20, 2009, 6:42:30 PM
to Mark Knecht, Ted Dunning, Bay Area R Helpers
As innumerable math profs have said, if you have a question, it's
likely that others do as well. I know the discussion is valuable, even
if I'm not having the exact same issue.

Might I suggest that, if you do write it up, since you're using Google
for the forum, you could just share the writeup via Google Docs and
post the link, at least as a short term solution.

-Ian
