Ah, my bad - you're correct.
On Tue, Jun 20, 2017 at 4:40 PM, Emanuel Gonçalves <emanuelv...@gmail.com> wrote:
> I think what I'm looking for is:
>
>> extractRawCopyNumbers(cbsmodel, array=1, chromosome=2)
>
>
> The problem is that I have ~200 and the function seems to be taking quite some time to
> get me the results (is this because I don't have all the CBS fits computed yet?).
> Is there a way to export the raw copy numbers for the whole data set?
Yes, that's the way to do it.
That's what fit() for CbsModel uses internally, and yes, there's a
bit of overhead in each of those calls; there's quite a bit of
validation etc. going on. It's probably possible to make it faster for
the case where one wants to pull out data for all samples and all
chromosomes at once, but as far as I remember, we never
implemented that per se. If you want to do your own segmentation, see
the example at the end.
These days you can parallelize it quite easily using the future
framework, cf. http://www.aroma-project.org/howtos/parallel_processing/.
For example, here's how you can extract chromosomes in parallel:
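library("future.apply")       # provides future_lapply() in current releases
future::plan("multisession")  # "multiprocess" is deprecated in recent versions of future
array <- 1L
cns <- future_lapply(1:23, FUN = function(chr) {
  extractRawCopyNumbers(cbsmodel, array = array, chromosome = chr)
})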
If you have completed fit(), then all the CBS results (segments and
locus-level data) are saved to file for each (array, chromosome).
These "internal" data files are located in:
path <- getPath(sm)  # 'sm' is your segmentation model, e.g. 'cbsmodel' above
You can pull out, say, all Chr 1 results across all samples as:
pathnames <- dir(path = getPath(sm), pattern = ",chr01,", full.names = TRUE)
fits <- lapply(pathnames, FUN = loadObject)
locusData <- lapply(fits, FUN = `[[`, "data")
However, those files don't contain any information about the cell names
("clones" aka "probe names").
Below is a way you can get the normalized locus-level CN signals that
can be used for downstream segmentation. This example uses
Mapping10K_Xba142 data, but the idea is the same:
> clones <- getUnitNames(getUnitNamesFile(gsT))
> positions <- readDataFrame(getAromaUgpFile(gsT))
> signals <- extractMatrix(gsT)
> data <- cbind(clones, positions, signals)
> str(data)
'data.frame': 10208 obs. of 13 variables:
$ clones : Factor w/ 10208 levels "AFFX-5Q-123",..: 1 2 3 4 5013 9020 3048 8172 4263 10008 ...
$ chromosome: int NA NA NA NA 6 7 10 1 15 12 ...
$ position : int NA NA NA NA 162491313 42608794 68113302 22754354 28848178 53641300 ...
$ GSM226867 : num 5209 2042 5005 3750 1563 ...
$ GSM226868 : num 4799 1995 4726 3719 1675 ...
$ GSM226869 : num 5069 2070 4661 3450 1881 ...
$ GSM226870 : num 5502 2492 5164 3809 1828 ...
$ GSM226871 : num 5462 2298 4917 3678 2131 ...
$ GSM226872 : num 5232 1886 4671 3459 2227 ...
$ GSM226873 : num 5878 2479 5588 4372 1817 ...
$ GSM226874 : num 6363 2668 5768 4356 1396 ...
$ GSM226875 : num 4789 1765 4543 3610 1876 ...
$ GSM226876 : num 4762 1876 4614 3241 2568 ...
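Before segmenting a table like this, you will probably want to drop the
loci without a genomic position (the rows with chromosome NA above) and
order the rest along the genome. Here is a minimal sketch in base R; the
toy data frame, its clone names, and the single sample column "S1" are
made up and only mimic the structure shown above:

```r
# Toy stand-in for the 'data' frame above; all values are made up
data <- data.frame(
  clones     = c("AFFX-5Q-123", "SNP_A-1", "SNP_A-2", "SNP_A-3", "SNP_A-4"),
  chromosome = c(NA, 2L, 1L, 1L, 2L),
  position   = c(NA, 500L, 300L, 100L, 200L),
  S1         = c(5209, 1563, 1675, 1881, 1828),
  stringsAsFactors = FALSE
)

# Drop loci that have no chromosome or position annotation
keep <- !is.na(data$chromosome) & !is.na(data$position)
data <- data[keep, ]

# Order loci by chromosome and position, then split into one table per chromosome
data <- data[order(data$chromosome, data$position), ]
byChr <- split(data, data$chromosome)
```

Each element of byChr can then be handed to the segmentation method of
your choice, one chromosome at a time.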
With those data, you can run your own segmentation, but you do have
to decide how to calculate the CN ratios, i.e. what is T and what is N
in C = T / N etc.
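To illustrate one possible choice (an assumption for this sketch, not
necessarily what you want for your data): if there are no matched
normals, you could use the per-locus median across all samples as N.
The toy matrix below is made up; in practice it would be the signal
columns of the table above:

```r
# Toy signal matrix (loci x samples); all values are made up
signals <- matrix(
  c(1000, 2000, 4000,
    1000, 1000, 1000,
    2000, 2000, 8000),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("locus1", "locus2", "locus3"), c("S1", "S2", "S3"))
)

# Assumption: no matched normals, so the reference N is the
# per-locus median across all samples
N <- apply(signals, MARGIN = 1, FUN = median)

# C = T / N for every locus and sample (N is recycled down the rows)
C <- signals / N

# log2 ratios, a scale commonly used as input for segmentation
M <- log2(C)
```

With matched tumor-normal pairs you would instead divide each tumor
column by its own normal column.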
Hope this helps
Henrik