Amplified pooled CRISPR sub-library prep for NGS quality control


dora.tarlungeanu

Apr 1, 2019, 11:45:59 AM
to Genome Engineering using CRISPR/Cas Systems
Dear all,

I recently started a new project and am planning to perform screens using both the human CRISPR KO whole-genome pooled library (Brunello) and two pooled KO sub-libraries (Addgene 101927 and 101928).
I am currently having trouble working out the right primers and protocols for NGS analysis of library representation after amplification.

For the whole-genome (Brunello) library I plan to use the primers from the Addgene protocols, which are the same as those used for the GeCKO library in the two-vector system (lentiGuide-Puro). I will mix eight P5 forward primers, which carry stagger regions of different lengths to avoid a monotemplate signal during sequencing, with one barcoded P7 reverse primer. The primers are built from the following parts:

P5/P7 flowcell attachment sequence
Illumina sequencing primer
Vector  priming sequence
Stagger region/barcode

P5 Fwd primers:
  F5 0 nt stagger: AATGATACGGCGACCACCGAGATCT ACACTCTTTCCCTACACGACGCTCTTCCGATCT TTGTGGAAAGGACGAAACACCG
  F5 1 nt stagger: AATGATACGGCGACCACCGAGATCT ACACTCTTTCCCTACACGACGCTCTTCCGATCT C TTGTGGAAAGGACGAAACACCG
  F5 2 nt stagger: AATGATACGGCGACCACCGAGATCT ACACTCTTTCCCTACACGACGCTCTTCCGATCT GC TTGTGGAAAGGACGAAACACCG
  F5 3 nt stagger: AATGATACGGCGACCACCGAGATCT ACACTCTTTCCCTACACGACGCTCTTCCGATCT AGC TTGTGGAAAGGACGAAACACCG
  F5 4 nt stagger: AATGATACGGCGACCACCGAGATCT ACACTCTTTCCCTACACGACGCTCTTCCGATCT CAAC TTGTGGAAAGGACGAAACACCG
  F5 6 nt stagger: AATGATACGGCGACCACCGAGATCT ACACTCTTTCCCTACACGACGCTCTTCCGATCT TGCACC TTGTGGAAAGGACGAAACACCG
  F5 7 nt stagger: AATGATACGGCGACCACCGAGATCT ACACTCTTTCCCTACACGACGCTCTTCCGATCT ACGCAAC TTGTGGAAAGGACGAAACACCG
  F5 8 nt stagger: AATGATACGGCGACCACCGAGATCT ACACTCTTTCCCTACACGACGCTCTTCCGATCT GAAGACCC TTGTGGAAAGGACGAAACACCG

P7 Rev primer: CAAGCAGAAGACGGCATACGAGAT AAGTAGAG GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT TCTACTATTCTTTCCCCTGCACTGT
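
The forward-primer layout above can also be written out programmatically, which makes it easy to see where the guide begins in Read 1 for each stagger. This is only a minimal sketch built from the sequences quoted in this post. The variable and function names are illustrative, and it assumes that Read 1 starts immediately after the Illumina sequencing primer, so that the stagger is the first thing read.

```python
# Parts of the forward (P5) primers, copied from the list above.
P5_ATTACHMENT = "AATGATACGGCGACCACCGAGATCT"
ILLUMINA_SEQ_PRIMER = "ACACTCTTTCCCTACACGACGCTCTTCCGATCT"
VECTOR_PRIMING = "TTGTGGAAAGGACGAAACACCG"
STAGGERS = ["", "C", "GC", "AGC", "CAAC", "TGCACC", "ACGCAAC", "GAAGACCC"]


def forward_primer(stagger):
    """Concatenate the four parts of a forward primer (5' to 3')."""
    return P5_ATTACHMENT + ILLUMINA_SEQ_PRIMER + stagger + VECTOR_PRIMING


def guide_offset(stagger):
    """0-based position of the first guide base in Read 1.

    Assumption: Read 1 begins at the first base of the stagger.
    """
    return len(stagger) + len(VECTOR_PRIMING)


for stagger in STAGGERS:
    print(
        f"{len(stagger)} nt stagger: primer is {len(forward_primer(stagger))} nt, "
        f"guide starts at read position {guide_offset(stagger)}"
    )
```

Under that assumption the guide starts anywhere between read positions 22 and 30 depending on the primer, so a single fixed offset cannot fit every read.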

For the sub-libraries (which are in the pMCB320 vector) I found the following primers:
Fwd primer:  aatgatacggcgaccaccgaGATCTACACGATCGGAAGAGCACACGTCTGAACTCCAGTCACgcacaaaaggaaactcaccct
Rev primer:   caagcagaagacggcatacgagat ACACGATC  ACACTCTTTCCCTACACGACGCTCTTCCGATCT CGACTCGGTGCCACTTTTTC

My questions are:
  1. I would like to sequence all three libraries in one lane. Would I therefore need a third barcode on the reverse primers (especially for the sub-libraries)?
  2. I noticed that the flowcell attachment sequences (green) are the same in both cases, but the Illumina sequencing primers (black) differ between lentiGuide-Puro and pMCB320. Should I change them to be the same, or does it depend on the type of NGS approach?
  3. Would I also need stagger regions in the forward primers for the sub-libraries?

I am new to this, so please feel free to give me your input and let me know whether the reasoning I described makes sense.

Thanks a lot in advance.
Dora

Julia Joung

Apr 3, 2019, 6:40:50 PM
to dora.tarlungeanu, Genome Engineering using CRISPR/Cas Systems
Hi Dora,

It looks like your two primer sets have the same P5 and P7 sequences, but the corresponding sequencing primer sequences are swapped. This means that Read 1 starts from the U6 promoter for your Brunello library but from the end of the gRNA sequence for your sub-libraries. Sequencing all 3 libraries in 1 lane with these primers would therefore require more NGS cycles than normal and increase the cost of your sequencing run.

Therefore, I would recommend that you redesign the primers for your sub-libraries using the format you have for the Brunello library, so that only the gRNA target region is sequenced in Read 1 (see Joung Nat Protocols 2017 for more details). You will need to include additional barcodes for your sub-libraries. However, since you will likely be sequencing your sub-libraries at lower depth (fewer reads), you will not need stagger primers, so you only need to design 1 Fwd primer and 2 Rev primers for your sub-libraries.

Best,
Julia

--
You received this message because you are subscribed to the Google Groups "Genome Engineering using CRISPR/Cas Systems" group.
To unsubscribe from this group and stop receiving emails from it, send an email to crispr+un...@googlegroups.com.
To post to this group, send email to cri...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

dora.t

Apr 4, 2019, 7:42:54 AM
to Genome Engineering using CRISPR/Cas Systems
Hi Julia,

Thanks a lot for the helpful tips. I will then proceed with the next steps.

Best wishes,
Dora


dora.t

Apr 9, 2019, 12:44:32 PM
to Genome Engineering using CRISPR/Cas Systems
Hi Julia,

I have two more questions regarding NGS of the gRNAs.

1. I was about to order the primers and, since they are fairly costly, I wanted to know whether I could use the same primers to evaluate gRNA fold change by NGS at the end of the screen.
If so, I would need to order, in addition to the primers mentioned above, seven extra P7 reverse primers with different barcodes. Is that correct? (This is based on my reading of Joung Nat Protocols 2017.)

2. For NGS of the gRNA library to determine representation, would 50 bp single-read sequencing on the MiSeq at a coverage of 100 reads per gRNA be the right approach?

Many thanks in advance.
Kind regards,
Dora

Julia Joung

Apr 12, 2019, 10:54:37 PM
to dora.t, Genome Engineering using CRISPR/Cas Systems
Hi Dora,

1. Yes, that is correct: the same set of primers is used for sgRNA library QC and library readout. You will need to order a different Rev primer for each of your screening conditions.

2. A read length of 50 bp will be too short to cover the sgRNA target sequence. For these primers, I believe you will need >76 bp, so I usually sequence 80 bp.

Best,
Julia


dora.t

May 29, 2019, 11:46:00 AM
to Genome Engineering using CRISPR/Cas Systems
Hi Julia,

Thank you for all your helpful answers.
I got my sequencing data back and now need to run the count_spacers.py script to check the representation of the gRNAs in my library.
I noticed that the script needs a fastq file as input, but I received my sequencing data as bam files. Would these also work, or should I first convert them and then run the script?
Sorry for the beginner questions, but I have just started learning the command line and Python, so all this bioinformatics is still new to me.

Thank you very much in advance.
Dora



Julia Joung

May 30, 2019, 3:59:37 PM
to dora.t, Genome Engineering using CRISPR/Cas Systems
Hi Dora,

A bam file is an alignment file that is created from the fastq files, so you may already have the fastq files somewhere. Maybe ask your sequencing core? To get fastq files from the Illumina output, the fastest way is to run bcl2fastq, which generates the fastqs and demultiplexes your sequencing run at the same time.
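
If only bam files are available, the reads can usually be recovered from them. Below is a minimal sketch that wraps `bedtools bamtofastq` (with its `-i` and `-fq` options) from Python. It assumes bedtools is installed and on the PATH; the function names and file names are placeholders.

```python
import shutil
import subprocess


def bamtofastq_command(bam_path, fastq_path):
    """Build the bedtools command line without running it."""
    return ["bedtools", "bamtofastq", "-i", bam_path, "-fq", fastq_path]


def bam_to_fastq(bam_path, fastq_path):
    """Convert a bam file to fastq; raises if bedtools is missing or fails."""
    if shutil.which("bedtools") is None:
        raise RuntimeError("bedtools was not found on the PATH")
    subprocess.run(bamtofastq_command(bam_path, fastq_path), check=True)


# Show the command that would be executed for a placeholder file pair.
print(" ".join(bamtofastq_command("library.bam", "library.fq")))
```

Calling `bam_to_fastq("library.bam", "library.fq")` would then write a fastq file that can be passed to count_spacers.py with `-f`.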

Best,
Julia


dora.t

Jun 5, 2019, 1:31:14 PM
to Genome Engineering using CRISPR/Cas Systems
Hi Julia,

Thanks a lot for the previous tips. I contacted the facility, but they could only provide the bam files, so in the end I used the bedtools bamtofastq function to convert my bam files to fastq.
I tried to run the count_spacers.py script, but this time I ran into some other issues.
I tried adjusting the KEY REGION START since I used different primers from the ones in your paper (see my first post in this thread), but I get very low guide detection (see the statistics below).

Number of perfect guide matches: 15
Number of nonperfect guide matches: 301
Number of reads where key was not found: 7291857
Number of reads processed: 7292173
Percentage of guides that matched perfectly: 4.7
Percentage of undetected guides: 100.0
Skew ratio of top 10% to bottom 10%: Not enough perfect matches to determine skew ratio

This is the command I ran:  python count_spacers.py -f BSF_0622_H72K5BBXY_1_Brun_73178_S52986.fq -o library_count.csv -i Brunello_list.csv
(I left out the no-g option since my guide sequences have the G appended at the 5' end.)

I also tried searching for specific guide sequences with grep and found many hits (see below). Each hit consists of stagger + vector + KEY + guide sequence (I highlighted the first few hits in color), and because the stagger length differs between reads, the position of the KEY REGION START varies. Do you have any suggestions on how I could adjust the KEY REGION START so that I catch all the guides, or am I missing something?

grep "CAGGGAAGAAATCACAACCA" BSF_0622_H72K5BBXY_1_Brun_73178_S52986.fq

TGCACCTTGTGGAAAGGACGAAACACCGCAGGGAAGAAATCACAACCAGTTTTAGAGCTAGAAAT
TTGAGGAAAGGACGAAACACCGCAGGGAAGAAATCACAACCAGTTTTAGAGCTAGAAATAGCAAG
GCTTGTGGAAAGGACGAAACACCGCAGGGAAGAAATCACAACCAGTTTTAGAGCTAGAAATAGCA
CTTGTGGAAAGGACGAAACACCGCAGGGAAGAAATCACAACCAGTTTTAGAGCTAGAAATAGCAA
ACGCAACTTGTGGAAAGGACGAAACACCGCAGGGAAGAAATCACAACCAGTTTTAGAGCTAGAAA
TTGTGGAAAGGACGAAACACCGCAGGGAAGAAATCACAACCAGTTTTAGAGCTAGAAATAGCAAG
TTGTGGAAAGGACGAAACACCGCAGGGAAGAAATCACAACCAGTTTTAGAGCTAGAAATAGCAAG
AGCTTGTGGAAAGGACGAAACACCGCAGGGAAGAAATCACAACCAGTTTTAGAGCTAGAAATAGC
CAACTTGTGGAAAGGACGAAACACCGCAGGGAAGAAATCACAACCAGTTTTAGAGCTAGAAATAG
CAACTTGTGGAAGGACGAAACACCGCAGGGAAGAAATCACAACCAGTTTTAGAGCTAGAAATAGC
TTGTGGAAAGGACGAAACACCGCAGGGAAGAAATCACAACCAGTTTTAGAGCTAGAAATAGCAAG
AGCTTGTGGAAAGGACGAAACACCGCAGGGAAGAAATCACAACCAGTTTTAGAGCTAGAAATAGC
GCTTGTGGAAAGGACGAAACACCGCAGGGAAGAAATCACAACCAGTTTTAGAGCTAGAAATAGCA
TGCACCTTGTGGAAAGGACGAAACACCGCAGGGAAGAAATCACAACCAGTTTTAGAGCTAGAAAT
TTGTGGAAAGGACGAAACACCGCAGGGAAGAAATCACAACCAGTTTTAGAGCTAGAAATAGCAAG
GCTTGTGGAAAGGACGAAACACCGCAGGGAAGAAATCACAACCAGTTTTAGAGCTAGAAATAGCA
GAAGACCCTTGTGGAAAGGACGAAACACCGCAGGGAAGAAATCACAACCAGTTTTAGAGCTAGAA
CAACTTGTGGAAAGGACGAAACACCGCAGGGAAGAAATCACAACCAGTTTTAGAGCTAGAAATAG
GCTTGTGGAAAGGACGAAACACCGCAGGGAAGAAATCACAACCAGTTTTAGAGCTAGAAATAGCA
TGCACCTTGTGGAAAGGACGCAACACCGCAGGGAAGAAATCACAACCAGTTTTAGAGCTAGAAAT
TGCACCTTGTGGAAAGGACGAAACACCGCAGGGAAGAAATCACAACCAGTTTTAGAGCTAGAAAT
GAAGACCCTTGTGGAAAGGACGAAACACCGCAGGGAAGAAATCACAACCAGTTTTAGAGCTAGAA
ACGCACCTTGTGGAAAGGACGAAACACCGCAGGGAAGAAATCACAACCAGTTTTAGAGCTAGAAA
TGCACCTTGTGGAAAGGACGAAACACCGCAGGGAAGAAATCACAACCAGTTTTAGAGCTAGAAAT
GAAGAGCCTTGTGGAAAGGACGAAACACCGCAGGGAAGAAATCACAACCAGTTTTAGAGCTAGAA
ACGCAACTTGTGGAAAGGACGAAACACCGCAGGGAAGAAATCACAACCAGTTTTAGAGCTAGAAA
TTGTGGAAAGGACGAAACACCGCAGGGAAGAAATCACAACCAGTTTTAGAGCTAGAAATAGCAAG
TTGTGGAAAGGACGAAACACCGCAGGGAAGAAATCACAACCAGTTTTAGAGCTAGAAATAGCAAG
AGCTTGTGGAAAGGACGAAACACCGCAGGGAAGAAATCACAACCAGTTTTAGAGCTAGAAATAGC
CAACTTGTGGAAAGGACGAAACACCGCAGGGAAGAAATCACAACCAGTTTTAGAGCTAGAAATAG
AGCTTGTGGAAAGGACGAAACACCGCAGGGAAGAAATCACAACCAGTTTTAGAGCTAGAAATAGC
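
Since the stagger shifts the guide by 0 to 8 nt, one way around a fixed start position is to locate the constant vector sequence in each read and take the 20 nt that follow it. The following is a minimal sketch of that idea and is not the logic of count_spacers.py itself. The anchor `CGAAACACCG`, the 20 nt guide length, and the function name are assumptions taken from the reads shown above.

```python
ANCHOR = "CGAAACACCG"  # end of the vector priming sequence in the reads (assumption)
GUIDE_LENGTH = 20      # assumed guide length


def extract_guide(read, anchor=ANCHOR, guide_length=GUIDE_LENGTH):
    """Return the bases that follow the anchor, or None if they cannot be read."""
    start = read.find(anchor)
    if start == -1:
        return None  # anchor missing or mutated
    guide_start = start + len(anchor)
    guide = read[guide_start:guide_start + guide_length]
    return guide if len(guide) == guide_length else None


# Three reads copied from the grep output above:
# a 6 nt stagger, no stagger, and a read with a mismatch inside the anchor.
reads = [
    "TGCACCTTGTGGAAAGGACGAAACACCGCAGGGAAGAAATCACAACCAGTTTTAGAGCTAGAAAT",
    "TTGTGGAAAGGACGAAACACCGCAGGGAAGAAATCACAACCAGTTTTAGAGCTAGAAATAGCAAG",
    "TGCACCTTGTGGAAAGGACGCAACACCGCAGGGAAGAAATCACAACCAGTTTTAGAGCTAGAAAT",
]
for read in reads:
    print(extract_guide(read))
```

The first two reads give the same guide despite the different staggers, while the third is skipped because its anchor contains a mismatch. If the guide list includes a leading G, the anchor would need to be one base shorter and the extracted length 21.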

For the sub-libraries I can't find any guides using grep, so I guess there must be something more problematic or complicated going on.

I would highly appreciate any suggestions, and thank you once again for all the tips.
Dora


dora.t

Jun 12, 2019, 4:46:05 AM
to Genome Engineering using CRISPR/Cas Systems
Dear Julia,

In the meantime I figured this one out by trying different starting positions for the KEY region, which got me 92% perfect matches with a skew ratio of 4.8 :) This made me very happy.
Number of perfect guide matches: 5676882
Number of nonperfect guide matches: 517765
Number of reads where key was not found: 1097526

Number of reads processed: 7292173
Percentage of guides that matched perfectly: 91.6
Percentage of undetected guides: 0.4
Skew ratio of top 10% to bottom 10%: 4.821428571428571
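
The reported percentages can be reproduced from the raw counts as a sanity check. This is a small sketch, assuming that the "matched perfectly" figure is computed over reads in which the key was found rather than over all reads; that assumption reproduces both sets of statistics posted in this thread. The function and key names are illustrative.

```python
def summarize(perfect, nonperfect, key_not_found):
    """Recompute summary percentages from the three raw read counts."""
    key_found = perfect + nonperfect
    total = key_found + key_not_found
    return {
        "reads_processed": total,
        "pct_perfect_of_key_found": 100.0 * perfect / key_found,
        "pct_key_not_found": 100.0 * key_not_found / total,
    }


# Raw counts copied from the two runs reported in this thread.
first_run = summarize(15, 301, 7291857)
second_run = summarize(5676882, 517765, 1097526)

for label, stats in (("first run", first_run), ("second run", second_run)):
    print(
        f"{label}: {stats['reads_processed']} reads, "
        f"{stats['pct_perfect_of_key_found']:.1f}% perfect among key-found reads, "
        f"{stats['pct_key_not_found']:.1f}% without key"
    )
```

Note that even in the second run about 15% of the reads still have no key, so the 91.6% figure describes only the reads in which the key was found.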

For the sub-libraries I still have the problem that I can't find the guide sequences when I grep for a specific guide. Do you think something major might have gone wrong with the sequencing?
Thanks again.

Dora

Julia Joung

Jun 16, 2019, 6:46:06 PM
to dora.t, Genome Engineering using CRISPR/Cas Systems
Hi Dora,

If you have trouble finding the guide sequences with grep for the sub-libraries, then something may have gone wrong with the sequencing library preparation, the sequencing quality, or the guide library generation. The problem would not be due to differences in NGS analysis.
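
One inexpensive check before repeating any wet-lab steps is to search for the reverse complement of a guide as well, since Read 1 covers the opposite strand when the amplicon is read from the end of the gRNA, as discussed earlier in the thread for the original sub-library primers. The sketch below is only illustrative; the function names and the in-memory list of reads are assumptions.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (upper-cased)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def count_orientations(reads, guide):
    """Count reads containing the guide as written and as its reverse complement."""
    guide = guide.upper()
    rc_guide = reverse_complement(guide)
    forward = sum(1 for read in reads if guide in read.upper())
    reverse = sum(1 for read in reads if rc_guide in read.upper())
    return forward, reverse


# Toy input: one read from the Brunello grep output earlier in the thread
# and the same read as it would appear if sequenced from the other end.
forward_read = "TTGTGGAAAGGACGAAACACCGCAGGGAAGAAATCACAACCAGTTTTAGAGCTAGAAATAGCAAG"
example_reads = [forward_read, reverse_complement(forward_read)]
print(count_orientations(example_reads, "CAGGGAAGAAATCACAACCA"))
```

If guides are found only in the reverse orientation, the library itself is probably fine and the reads simply need to be reverse-complemented before counting.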

Best,
Julia


Sarah P.

Jun 19, 2019, 2:01:47 PM
to Genome Engineering using CRISPR/Cas Systems
Hi Julia, 

Thank you so much for your continued help with these complicated experiments! 

I have a very quick question: do you recommend that all primers (from Joung Nat. Protocols 2017) be PAGE-purified? Given the large number of PCRs, it would be quite expensive to use PAGE-purified primers. If this step is not necessary, that would be great for us.

Thanks again,
Sarah



Julia Joung

Jun 23, 2019, 10:44:59 PM
to Sarah P., Genome Engineering using CRISPR/Cas Systems
Hi Sarah,

No problem! We do not order PAGE-purified primers. All of our primers (including NGS) are standard desalted primers.

Best,
Julia
