Replacing Excel - data.frame? matrix? array? Other?

1 view
Skip to first unread message

Mark Knecht

unread,
Jun 30, 2009, 1:25:49 PM6/30/09
to Bay Area R Helpers
Hi,
I'm building an R model that is essentially an attempt to replace a
bloated Excel spreadsheet with something that is both more
maintainable as well as programmable. I have a general question about
data in R and am looking for some general guidance from anyone who
wishes to share and/or make a suggestion. I'll keep this very simple.
This is a learning exercise and I'm not really worried about the
perfect answer or whether I end up doing it a second time later. For
now I just want to get similar functionality.

OK, the spreadsheet is linked to csv files which fill the first 10
columns of a dozen worksheets in the same spreadsheet. In R I can read
this data in and I get data.frames that are 10 columns wide. I
currently have a dozen data frames.

Now, for each data.frame I need a scratch pad area just like the
extra cells to the right in Excel. How should I create this and why?

1) Make some new data.frames and if required later merge them with my
csv data.frame?

2) Make a larger matrix or array, fill it with the data from my
data.frames and leave space to work?

3) Something else completely?

I am unlikely to create anything to fill this work area other than
numerical values for now. I have very little logic (I.e. - true/false)
in my current spreadsheets and I do not create new strings. In the
short term this won't change.

If you need to ask questions please do.

Thanks in advance for any ideas.

Cheers,
Mark

Mark Knecht

unread,
Jun 30, 2009, 3:32:05 PM6/30/09
to Bay Area R Helpers

Could life be any more simple? I cannot believe how easy this language
seems to make working with data!

System1<- data.frame(System1,Profit=0,Loss=0,Winners=0,Losers=0)

And here we go - I now have 4 named columns to work within?

Or am I somehow fooling myself here?

Cheers,
Mark

Daniel Levine

unread,
Jun 30, 2009, 3:37:53 PM6/30/09
to Mark Knecht, Bay Area R Helpers
that does work, but you don't need to add columns before you use them. When you want the new column all you need to do is:

dataframe1 <- data.frame(vector1, vector2, vector3... etc)
then you have a dataframe with the vectors of data when you want to add a new column all you need to do is use the $ operator

dataframe1$new_vector <- NULL to set every element in it to empty or etc, whatever you like more

in a similar manner you can use the $ operator to access columns in the dataframe as vectors

dataframe1$vector1 will give you access to vector1 in that dataframe as a vector for performing operations

Mark Knecht

unread,
Jun 30, 2009, 3:51:01 PM6/30/09
to Daniel Levine, Bay Area R Helpers
Daniel,
Certainly, I can add columns on the fly. However in the initial
pass at this I know I'm going to need all the columns I currently have
in the spreadsheet so I've simply created a function which adds them
all at once.

Or are you saying there's a real advantage to specifically not
setting a value to 0?

I can certainly add columns on the fly should I find the need but
currently my needs are pretty much specified so I think I know the
columns and they all get filled up before I even start looking at the
charts.

I've been using the $ operator when making plots.

Cheers,
Mark

Ted Dunning

unread,
Jun 30, 2009, 5:38:22 PM6/30/09
to Mark Knecht, Bay Area R Helpers

You can define new columns of a data frame simply by referring to them on the left side of an assignment.  See below.

On Tue, Jun 30, 2009 at 10:25 AM, Mark Knecht <markk...@gmail.com> wrote:
3) Something else completely?

> x = data.frame(a=1:20)
> x
    a
1   1
2   2
3   3
...
18 18
19 19
20 20
> x$b = exp(-x$a)
> x
    a            b
1   1 3.678794e-01
2   2 1.353353e-01
...
17 17 4.139938e-08
18 18 1.522998e-08
19 19 5.602796e-09
20 20 2.061154e-09
>

Ted Dunning

unread,
Jun 30, 2009, 5:40:14 PM6/30/09
to Mark Knecht, Daniel Levine, Bay Area R Helpers

Your code will be smaller, easier to read, and faintly faster if you don't define these until you have a real value to put into them.

Earl

unread,
Jun 30, 2009, 5:53:46 PM6/30/09
to Bay Area R Helpers
If they are working variables, you also don't have to add them to a
data frame. Eg,

> d <- data.frame(x=1:20, y=runif(n=20))
> tt <- rep(0, nrow(d))
> x <- rep(0, nrow(d))
> #
> # some computations with tt, x here
> #
> d$result <- ( # some computation of x, tt, etc)
> > On Tue, Jun 30, 2009 at 12:32 PM, Mark Knecht <markkne...@gmail.com> wrote:

Mark Knecht

unread,
Jun 30, 2009, 6:11:42 PM6/30/09
to Earl, Bay Area R Helpers
Earl,
They are to some extent working variables. I probably don't have to
save them all forever but I currently have 20 data frames. The data
frames all have the same width but they vary in length from 200 to
about 5000 rows. Adding the variables to the data frame was my
beginners way of keeping everything aligned with the data that created
it. It might be slower, bigger, etc., but at least if I got there
there correctly and then go back to get it I should know what I'm
getting. This is clearly me coding my spreadsheet in R but for now I'm
comfortable as I can compare a data frame and the spreadsheet cell by
cell if I choose to to look for errors.

After all these variables are computed and placed in the data frame
I'll make charts of some of the variables against some of the stuff in
the original data frame, as well as build a bigger data frame that
uses values from all 20 frames to make decisions.

For now I'm *most* concerned about not making mistakes and getting
valid results. I fully expect that when I'm done I will have learned
enough to do better. In the meantime I really appreciate all the ideas
I'm getting. They are making an impact on even my first pass, proof of
concept code.

Thanks,
Mark

Ted Dunning

unread,
Jun 30, 2009, 7:29:23 PM6/30/09
to Mark Knecht, Earl, Bay Area R Helpers

I think that this is good practice long-term.  The only other think I would do would be to put all the data into a single data.frame with a "Company" or some such field to distinguish one from another.  If you do that, then you can compute all of the per-line values for all companies in a single statement and you can write a nice loop to do all the plots:

> x1 = data.frame(a=1:5, b=5:1)
> x1
  a b
1 1 5
2 2 4
3 3 3
4 4 2
5 5 1
> x2 = data.frame(a=seq(10,50,by=10), b=2)
> x2
   a b
1 10 2
2 20 2
3 30 2
4 40 2
5 50 2
> x = rbind(data.frame(x1, key="first"), data.frame(x2, key="second"))
> for (i in levels(x$key)) {
+   j = x$key == i
+   plot(b~a, x[j,])
+ }

Mark Knecht

unread,
Jun 30, 2009, 7:52:27 PM6/30/09
to Ted Dunning, Earl, Bay Area R Helpers
I think you have the right idea but my data isn't really quite like
that. These are results of trading systems applied to different
markets. (Mainly index futures.) so for instance I might have systems
A, B & C, and I might apply these systems to Dow, Gold and Oil
futures. It might turn out that over the same period of time - say 5
years - that system A produces 300 trades for $50K profit on the Dow,
700 trades for $45K profit on Gold and 1300 trades for $60K profit on
Oil. (completely fictional markets, returns and trades.) The data
comes form other applications and each system/market combination is
spit out as an individual csv file.

For now anyway keeping everything separate just mimics the way I do it
in Excel where I don't have nearly as much freedom in the way I
arrange the data. A worksheet in Excel is a data frame in R - that
sort of thing.

None the less the ideas are great and are really helping me see the
possibilities. I'm now using functions in a moderately successful way.
I've got multiple charts up on the screen. It takes mines in Excel to
calculate what takes 2-3 seconds in R. Yeah - I'll be there's ways to
do it better but I'm on the edge of both making a 30x speed
improvement AND I can see possibilities in terms of real time links
between R and my trading platforms that would never work in Excel.

Life is looking more and more interesting, but I still have quite a ways to go.

thanks very much,
Mark

Reply all
Reply to author
Forward
0 new messages