Sleuth workflow for group factors

42 views

Skip to first unread message

Asan Emirsale

unread,

Jan 24, 2024, 6:39:44 AMJan 24

to kallisto and applications

Hello!

The aim of the study to assess the change of expression between several genotypes of the same species under the same experimental conditions. There are two questions regarding this task:
1) how should (a) the matrix be designed and (b) the model fitted?
2) it is clear that in the absent of control, the design matrix should be re-leveled each time the first factor changes to fit model against new "reference" base. (a) Should the sleuth object be prepared every time base level changes? (b) Would the "reference" role of the genotype affect the test result ("A(reference) vs B", or "B(reference) vs A")?

I tried to implement this in code, following the maximalistic approach in comparing all possible pairs (i.e. "A_vs_B", "B_vs_A" etc.).

```R
metadata <- dplyr::select(metadata, c('lib', 'sample'))
metadata$sample <- as.character(metadata$sample)

head(metadata, n = 18)
```

| | lib | sample |
| ---- | ---- | ---- |
| | \<chr\> | \<chr\> |
| 1 | Q5 | A |
| 2 | Q8 | A |
| 3 | Q10 | A |
| 4 | Q1 | B |
| 5 | Q4 | B |
| 6 | Q7 | B |
| 7 | Q12 | C |
| 8 | Q14 | C |
| 9 | Q16 | C |
| 10 | Q18 | D |
| 11 | Q3 | D |
| 12 | Q6 | D |
| 13 | Q9 | E |
| 14 | Q11 | E |
| 15 | Q13 | E |
| 16 | Q15 | F |
| 17 | Q17 | F |
| 18 | Q2 | F |

```R
genotypes <- unique(metadata$sample)

# preparing sleuth object
so <- sleuth_prep(s2c, num_cores = 12, extra_bootstrap_summary = TRUE, read_bootstrap_tpm = TRUE)

for (idx in 1:length(genotypes)) {
message <- paste("Starting loop", idx, "with sample", genotypes[idx], sep = " ")
print(message)

# relevel matrix
new_s2c <- so$sample_to_covariates
new_s2c$sample <- as.factor(new_s2c$sample)

new_s2c$sample <- relevel(as.factor(new_s2c$sample), ref = genotypes[idx])

md <- model.matrix(~new_s2c$sample, new_s2c)
colnames(md)[2] <- paste('sample', genotypes[idx], sep = "")

# fit new model with the reference changed
new_fit_name <- paste("full", genotypes[idx], sep = "_")
so <- sleuth_fit(so, ~sample, fit_name = new_fit_name)

# redo LRT
so <- sleuth_lrt(so, 'reduced', new_fit_name)

for (bidx in 1:length(genotypes)) {
if (idx != bidx) {
comp <- paste(genotypes[idx], genotypes[bidx], sep = "_vs_")
print(comp)
# Perform Wald Test
new_cond <- paste("sample", genotypes[bidx], sep = "")
sleuth_wt(so, new_cond, which_model = new_fit_name)
}
}

so_label <- paste("so", "ref", genotypes[idx], sep = "_")
so_label <- paste(so_label, ".sleuth", sep = "")
sleuth_save(so, file = so_label)
models(so)
tests(so)
}
```

Best regards

Asan

Asan Emirsale

unread,

Jan 24, 2024, 6:45:41 AMJan 24

to kallisto and applications

P.S. sorry for Obsidian markdown artifacts.

Here is more friendly representation

lib sample

<chr> <chr>
1 Q5 A
2 Q8 A
3 Q10 A
4 Q1 B
5 Q4 B
6 Q7 B
7 Q12 C
8 Q14 C
9 Q16 C
10 Q18 D
11 Q3 D
12 Q6 D
13 Q9 E
14 Q11 E
15 Q13 E
16 Q15 F
17 Q17 F
18 Q2 F