Re: Transform many columns of relative abundance data


Matt Espe


May 18, 2016, 5:21:52 PM

to Davis R Users' Group

Hi Zach,

It would help to know a bit more about what you are trying to accomplish. What type of regression do you plan to use? What is your response variable? And how large is "very large"?

Without knowing more, all I can say is that values constrained to lie between 0 and 1 can cause problems near the boundaries at 0 and 1, because the model will assume that values below 0 or above 1 are possible. You can transform the data to an unconstrained scale to avoid some of these concerns (but not with Box-Cox: power transformations are not what you want here).
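One common way to move proportions onto an unconstrained scale is the logit, log(p / (1 - p)), which maps (0, 1) onto the whole real line. The post does not name a specific transform, so the logit here is an assumption, as is the small `eps` used to pull exact 0s and 1s (which have no finite logit) slightly inside the interval. A minimal sketch:

```python
import math


def logit(p, eps=1e-6):
    """Map a proportion in [0, 1] onto the whole real line.

    Exact 0s and 1s have no finite logit, so p is first clamped into
    [eps, 1 - eps]. The default eps is an arbitrary, hypothetical choice.
    """
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))


def inv_logit(x):
    """Back-transform a value on the real line to a proportion in (0, 1)."""
    return 1 / (1 + math.exp(-x))


# Toy relative abundances, including both boundary values
abundances = [0.0, 0.02, 0.5, 0.98, 1.0]
print([round(logit(p), 2) for p in abundances])
```

How the boundary values are handled matters: the clamped 0s and 1s land far out in the tails, and their exact position depends entirely on the chosen `eps`.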

There are other general issues with using relative frequencies: how reliable each one is depends heavily on the total count behind it, and that total is itself subject to sampling variability. If possible, using the "raw" counts can be more robust, depending on what you are doing.
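To illustrate why the total count matters: under an assumed binomial sampling model (an assumption for illustration, not something stated in the thread), the same observed proportion carries very different uncertainty depending on how many counts it was computed from. A minimal sketch using the approximate (Wald) standard error sqrt(p(1 - p) / n):

```python
import math


def proportion_se(count, total):
    """Approximate (Wald) standard error of the proportion count / total,
    assuming the counts arise from binomial sampling."""
    p = count / total
    return math.sqrt(p * (1 - p) / total)


# The same relative abundance (0.25) computed from very different totals
for count, total in [(1, 4), (250, 1000), (25000, 100000)]:
    print(count / total, round(proportion_se(count, total), 4))
```

Once the data are reduced to proportions, all three rows look identical; the raw counts keep the information about how precise each one is.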

Matt

On Wednesday, May 18, 2016 at 12:01:16 PM UTC-7, Zachary Pierce wrote:

Hi all,
I am working with a very large dataset of microbial relative abundances (values between 0 and 1). I am TOLD that it would be wise to transform these values, particularly for regression, but there seems to be some contention on this point. Perhaps someone has some insight into that question (using relative abundance data as predictors).

Anyway, I want to transform all of the relative abundance columns using a Box-Cox procedure: first determine the best transformation, then apply that function to every specified column that contains relative abundance data.
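For reference, the procedure described above can be sketched as: for each listed column, pick the Box-Cox λ that maximizes the profile log-likelihood -(n/2) log(σ²(λ)) + (λ - 1) Σ log y, then transform that column with it. This is a minimal illustration only. The table layout (a dict mapping column names to lists), the function names, the per-column λ, and the grid from -2 to 2 are all hypothetical choices. Box-Cox is only defined for strictly positive values, so any relative abundance column containing exact zeros is rejected, which is one practical reason it fits this kind of data poorly.

```python
import math


def boxcox(y, lam):
    """Box-Cox transform of one strictly positive value."""
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1) / lam


def boxcox_loglik(values, lam):
    """Profile log-likelihood of lambda under a normal model (constants dropped).
    Assumes the transformed values are not all identical (variance > 0)."""
    n = len(values)
    z = [boxcox(v, lam) for v in values]
    mean = sum(z) / n
    var = sum((t - mean) ** 2 for t in z) / n
    return -n / 2 * math.log(var) + (lam - 1) * sum(math.log(v) for v in values)


def best_lambda(values, grid=None):
    """Choose lambda by a simple grid search (hypothetical default grid -2.0..2.0)."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if grid is None:
        grid = [i / 10 for i in range(-20, 21)]
    return max(grid, key=lambda lam: boxcox_loglik(values, lam))


def transform_columns(table, columns):
    """Transform only the named columns, each with its own estimated lambda.
    Other columns are passed through unchanged."""
    out = dict(table)
    lambdas = {}
    for col in columns:
        lam = best_lambda(table[col])
        lambdas[col] = lam
        out[col] = [boxcox(v, lam) for v in table[col]]
    return out, lambdas


# Toy table: "site" is left alone, the two abundance columns are transformed
toy = {"site": ["a", "b", "c"], "taxonA": [0.1, 0.2, 0.4], "taxonB": [0.05, 0.5, 0.25]}
transformed, lams = transform_columns(toy, ["taxonA", "taxonB"])
print(lams)
```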

Does anyone have any familiarity with this process? I'm still learning here, so go easy on me.

Thanks very much!!

Zach
