Hello James,
I think that this is more likely a limitation of DiffBind than our pipeline. From the DiffBind documentation, by default DiffBind expects at least 3 replicates per experimental group:
“This call will set up any "default" contrasts by examining the project metadata factors and assuming we want to look at the differences between any two sample groups with at least three replicates in each side of the comparison (that is, any factor that has two different values where there are at least three samples that share each value.)”
This makes sense, since DiffBind performs a normalization step within the differential binding analysis to improve results, which means that it expects experimental groups to have n>1. I think we have adapted it to work with 2 samples per experimental group, but the statistics of differential binding analysis probably break down if you have n=1.
In general, any differential analysis benefits greatly from having n>1. We tend to recommend at least 3 replicates if possible, and even more if the effect you are looking for is not very pronounced. Anything below 3 replicates per experimental group will likely result in a lot of false positives and false negatives that will be hard to correct for.
With that being said, it is important to make a distinction between technical and biological replicates and how they translate to “readsets” in GenPipes.
Remember, in the ChIP-seq pipeline, your experimental groups need n>1 samples, not readsets.
So to keep the example going, you have a ChIP experiment comparing two cell lines: A and B. You sample the cell lines at 2 different points in the day, so you have: A1, A2, B1, and B2. Finally, when preparing your ChIP libraries, for each sample you have your mark (H3K27ac) plus its input. You now have 8 different samples: the H3K27ac and input libraries for each of A1, A2, B1, and B2.
Notice that so far, each sample consists of one readset. In your design file (like Mareike explained) you should have something like this and it should work:
Sample MarkName A_vs_B
A1 H3K27ac 1
B1 H3K27ac 2
A2 H3K27ac 1
B2 H3K27ac 2
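As a quick sanity check before launching the pipeline, you could tally the number of samples per group directly from the design file. Below is a minimal Python sketch, not part of GenPipes. It assumes a whitespace-separated design file laid out like the one above, with 1 and 2 marking the two groups of a contrast and 0 marking a sample that is left out of it. The function name `count_replicates` is made up for this example.

```python
from collections import Counter

# The example design from this thread, embedded as text for illustration.
DESIGN = """\
Sample MarkName A_vs_B
A1 H3K27ac 1
B1 H3K27ac 2
A2 H3K27ac 1
B2 H3K27ac 2
"""

def count_replicates(design_text):
    """Return {contrast: Counter({group_code: n_samples})} for a design table."""
    lines = [ln.split() for ln in design_text.splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    contrasts = header[2:]  # every column after Sample and MarkName
    counts = {name: Counter() for name in contrasts}
    for row in rows:
        for name, code in zip(contrasts, row[2:]):
            if code != "0":  # assumption: 0 means "not part of this contrast"
                counts[name][code] += 1
    return counts

counts = count_replicates(DESIGN)
for contrast, groups in counts.items():
    for code, n in sorted(groups.items()):
        status = "ok" if n > 1 else "too few replicates"
        print(f"{contrast}: group {code} has n={n} ({status})")
```

With the design above, both groups of A_vs_B come out with n=2, which is the minimum discussed earlier in this message.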
Now, you might be wondering what the point of readsets is and how you should use them. To extend our example, let’s assume that when you send your samples for sequencing, the technicians preparing the libraries load each of your samples in two sequencing lanes, which means that for each sample you get 2 sets of paired fastq files (or equivalent) from the sequencer: 1 pair for lane 1 and 1 pair for lane 2.
Since the only difference between them is the sequencing lane, it doesn’t make sense to consider these true biological replicates. They are just technical replicates that result from how the samples were loaded into the sequencer. Therefore, they are readsets, and you will want to merge them in the end because they don’t reflect your experimental design.
Based on the information you have shared with us so far, it seems that what you have in your examples are not actually readsets but different samples, which is why GenPipes errors out when you merge them. By merging them, you are essentially reducing your experimental groups to n=1, and therefore the differential binding analyses will not work.
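To make the difference concrete, here is a small Python sketch (an illustration only, not GenPipes code) that collapses readsets into samples and then counts the samples left in each cell line. The three-field layout and the readset names are invented for the example; a real GenPipes readset file has more columns.

```python
from collections import defaultdict

# (sample, readset, cell_line): hypothetical names for illustration only.
# Correct layout: each biological replicate is a sample with two lane readsets.
CORRECT = [
    ("A1", "A1_lane1", "A"), ("A1", "A1_lane2", "A"),
    ("A2", "A2_lane1", "A"), ("A2", "A2_lane2", "A"),
    ("B1", "B1_lane1", "B"), ("B1", "B1_lane2", "B"),
    ("B2", "B2_lane1", "B"), ("B2", "B2_lane2", "B"),
]

# Mistaken layout: biological replicates declared as readsets of one sample.
MISTAKEN = [
    ("A", "A1", "A"), ("A", "A2", "A"),
    ("B", "B1", "B"), ("B", "B2", "B"),
]

def replicates_per_group(rows):
    """Merge readsets by sample, then count distinct samples per group."""
    samples = defaultdict(set)
    for sample, _readset, group in rows:
        samples[group].add(sample)  # readsets of the same sample collapse here
    return {group: len(names) for group, names in samples.items()}

print(replicates_per_group(CORRECT))   # {'A': 2, 'B': 2}
print(replicates_per_group(MISTAKEN))  # {'A': 1, 'B': 1}
```

In the first layout, merging lanes still leaves 2 samples per cell line. In the second, the merge leaves a single sample per cell line, which is the n=1 situation described above.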
I hope this makes sense. Let us know if you still have any other questions.
Best,