MOSAICS for paired end data


SR

Aug 15, 2012, 9:22:51 PM
to mosaics_u...@googlegroups.com
Hello,

I would like to try MOSAICS on my dataset. I have paired-end ChIP-seq data. How can I use it with MOSAICS, and how should I change the scripts to take the pairing into account?

Thank you!

Regards,

Sigrid

Dongjun Chung

Aug 15, 2012, 11:04:22 PM
to mosaics_u...@googlegroups.com
Hi Sigrid,

Thanks for your interest in our mosaics package.

The constructBins() function in the current Bioconductor version of the mosaics package assumes single-end tag (SET) ChIP-seq data when it processes aligned read files. To handle paired-end tag (PET) ChIP-seq data, I wrote a Perl script that converts PET aligned read files to bin-level files. It currently supports the Eland result and SAM file formats. If your file is in one of these formats, I can send you the script and you can use it as is. Otherwise, I can modify it to handle your file format. Please let me know which is your case.

Briefly, for SET data, mosaics extends each read in the 3' direction to the average fragment length. For PET data, each read pair contributes to the positions located between the two ends of the pair.
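
As a rough illustration of the difference, here is a minimal Python sketch. It is not the Perl script mentioned in this thread. The function names, the rule that a fragment adds 1 to every bin it overlaps, and the use of bin start coordinates as keys are all assumptions made for the example.

```python
from collections import Counter


def extend_set_read(pos5, strand, frag_len):
    """Single-end read: extend from the 5' end toward the 3' end by frag_len.

    Returns a 1-based inclusive (start, end) on the forward strand.
    """
    if strand == "+":
        return pos5, pos5 + frag_len - 1
    return max(1, pos5 - frag_len + 1), pos5


def bin_counts(intervals, bin_size):
    """Add 1 to every bin a fragment overlaps (assumed counting rule).

    Bins are keyed by their 0-based start coordinate: 0, bin_size, ...
    """
    counts = Counter()
    for start, end in intervals:
        first = (start - 1) // bin_size
        last = (end - 1) // bin_size
        for b in range(first, last + 1):
            counts[b * bin_size] += 1
    return counts


# SET: one read at position 100 on the + strand, assumed fragment length 200.
set_counts = bin_counts([extend_set_read(100, "+", 200)], bin_size=150)

# PET: the fragment is the span between the two outer ends of the pair,
# so no fragment length has to be assumed.
pet_counts = bin_counts([(159866412, 159866575)], bin_size=150)
```

With SET data the fragment end is an estimate that depends on the chosen average fragment length, whereas with PET data both ends are observed.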

I plan to incorporate this Perl script into the mosaics package soon, so in the next version of the package the constructBins() function will be able to process PET aligned read files as well.

Thanks,
Dongjun

Sigrid Rouam

Aug 15, 2012, 11:57:21 PM
to mosaics_u...@googlegroups.com
Hi Dongjun,

Thank you very much for your reply.
Our data consist of PET SAM files aligned using BWA.
I would be grateful if you could send me your script.

Regards,

Sigrid



Dongjun Chung

Aug 15, 2012, 11:59:48 PM
to mosaics_u...@googlegroups.com
Hi Sigrid,

This sounds great. In that case, I think you can use the script. Just to make sure, can you provide the first few lines of your file?

Thanks,
Dongjun

Sigrid Rouam

Aug 17, 2012, 3:00:23 AM
to mosaics_u...@googlegroups.com
Hi Dongjun,

We are having problems with our work server at the moment, and I am not able to retrieve the SAM files. I hope the IT department will fix this issue soon, so that I can send you the first lines of my data!

Thank you.

Sigrid Rouam

Aug 17, 2012, 5:58:14 AM
to mosaics_u...@googlegroups.com
Hello Dongjun,

Here are the first few lines of my SAM file.
I would be glad to try your normalization on it!

Thank you.
Regards,

Sigrid

@SQ     SN:1    LN:249250621
@SQ     SN:2    LN:243199373
@SQ     SN:3    LN:198022430
@SQ     SN:4    LN:191154276
@SQ     SN:5    LN:180915260
@SQ     SN:6    LN:171115067
@SQ     SN:7    LN:159138663
@SQ     SN:8    LN:146364022
@SQ     SN:9    LN:141213431
@SQ     SN:10   LN:135534747
@SQ     SN:11   LN:135006516
@SQ     SN:12   LN:133851895
@SQ     SN:13   LN:115169878
@SQ     SN:14   LN:107349540
@SQ     SN:15   LN:102531392
@SQ     SN:16   LN:90354753
@SQ     SN:17   LN:81195210
@SQ     SN:18   LN:78077248
@SQ     SN:19   LN:59128983
@SQ     SN:20   LN:63025520
@SQ     SN:21   LN:48129895
@SQ     SN:22   LN:51304566
@SQ     SN:X    LN:155270560
@SQ     SN:Y    LN:59373566
@SQ     SN:MT   LN:16569
@SQ     SN:GL000207.1   LN:4262
@SQ     SN:GL000226.1   LN:15008
@SQ     SN:GL000229.1   LN:19913
@SQ     SN:GL000231.1   LN:27386
@SQ     SN:GL000210.1   LN:27682
@SQ     SN:GL000239.1   LN:33824
@SQ     SN:GL000235.1   LN:34474
@SQ     SN:GL000201.1   LN:36148
@SQ     SN:GL000247.1   LN:36422
@SQ     SN:GL000245.1   LN:36651
@SQ     SN:GL000197.1   LN:37175
@SQ     SN:GL000203.1   LN:37498
@SQ     SN:GL000246.1   LN:38154
@SQ     SN:GL000249.1   LN:38502
@SQ     SN:GL000196.1   LN:38914
@SQ     SN:GL000248.1   LN:39786
@SQ     SN:GL000244.1   LN:39929
@SQ     SN:GL000238.1   LN:39939
@SQ     SN:GL000202.1   LN:40103
@SQ     SN:GL000234.1   LN:40531
@SQ     SN:GL000232.1   LN:40652
@SQ     SN:GL000206.1   LN:41001
@SQ     SN:GL000240.1   LN:41933
@SQ     SN:GL000236.1   LN:41934
@SQ     SN:GL000241.1   LN:42152
@SQ     SN:GL000243.1   LN:43341
@SQ     SN:GL000242.1   LN:43523
@SQ     SN:GL000230.1   LN:43691
@SQ     SN:GL000237.1   LN:45867
@SQ     SN:GL000233.1   LN:45941
@SQ     SN:GL000204.1   LN:81310
@SQ     SN:GL000198.1   LN:90085
@SQ     SN:GL000208.1   LN:92689
@SQ     SN:GL000191.1   LN:106433
@SQ     SN:GL000227.1   LN:128374
@SQ     SN:GL000228.1   LN:129120
@SQ     SN:GL000214.1   LN:137718
@SQ     SN:GL000221.1   LN:155397
@SQ     SN:GL000209.1   LN:159169
@SQ     SN:GL000218.1   LN:161147
@SQ     SN:GL000220.1   LN:161802
@SQ     SN:GL000213.1   LN:164239
@SQ     SN:GL000211.1   LN:166566
@SQ     SN:GL000199.1   LN:169874
@SQ     SN:GL000217.1   LN:172149
@SQ     SN:GL000216.1   LN:172294
@SQ     SN:GL000215.1   LN:172545
@SQ     SN:GL000205.1   LN:174588
@SQ     SN:GL000219.1   LN:179198
@SQ     SN:GL000224.1   LN:179693
@SQ     SN:GL000223.1   LN:180455
@SQ     SN:GL000195.1   LN:182896
@SQ     SN:GL000212.1   LN:186858
@SQ     SN:GL000222.1   LN:186861
@SQ     SN:GL000200.1   LN:187035
@SQ     SN:GL000193.1   LN:189789
@SQ     SN:GL000194.1   LN:191469
@SQ     SN:GL000225.1   LN:211173
@SQ     SN:GL000192.1   LN:547496
HWI-ST740_0001:1:1101:1222:2116#0       83      5       159866484       60      92M     =       159866412       -164    AGAGCGTGGAGCTGACCCTCATGTTAGTGTGATAGATTTCTAGTCACGTGTACCCACCATGAGTACAGTCAGATGCTTTAAAATTTTAGTTT    GGHGEHHGFGFFGGCGG?DCDF?FE;FHHHHHFHHHHGBGGGFBDDDDHHECHHHHHHHHHHGHHHHHHHHHHHHHHHHHHHHHHHFHHHHH    XT:A:U  NM:i:0  SM:i:37 AM:i:37 X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:92
HWI-ST740_0001:1:1101:1222:2116#0       163     5       159866412       60      88M     =       159866484       164     CCCGTTAGACTCTTCCTGAGGTCCCCGGAGTGGCACAGGGGGTTGTGGTGGAGAGTGAAGCGAAGAAGTCACAGAGCGTGGAGCTGAC        HHHHHHHHHHHHHHHHHHHHHHHHHGGGFGDFHHGHHHHHHHDHHCFGAFGEEEECGFEGHHHHHEHHCFHHHHHHEHBHHFGBEHHB        XT:A:U  NM:i:0  SM:i:37 AM:i:37 X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:88
HWI-ST740_0001:1:1101:1095:2131#0       99      19      41945325        60      92M     =       41945428        191     CATGNTTTTGAGTTTTCTGTAAAAATTCGTCAATCTCCCACTAACTTTCCTTACATTCATCCTTCGGTCTTCTAAGACCAAATCCCACATCC    CACC%DDDDDHHHHHHHHHHHHHHHHHGHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHFHFHHHHHHEHHBGFF;GGGGGGHGHGHHEGHHH    XT:A:U  NM:i:1  SM:i:37 AM:i:37 X0:i:1  X1:i:0  XM:i:1  XO:i:0  XG:i:0  MD:Z:4C87
HWI-ST740_0001:1:1101:1095:2131#0       147     19      41945428        60      88M     =       41945325        -191    CTTCCAAGCTCTCTTATCCTACAGGCCCCCCCCTTGCCTGTCCATTAGCTCCTTGTGTCATCCCTCCCTCTGATTGGTTAGTCGCGTC        ?D=D@AA>CAD?DC>B>D@>@>;EEEEED)DDDGGCGEHHBHHHHHHHHBHHHHHHHGHHFHHHFHHHHHHHHHHHHHHHHHHHHHHH        XT:A:U  NM:i:1  SM:i:37 AM:i:37 X0:i:1  X1:i:0  XM:i:1  XO:i:0  XG:i:0  MD:Z:29A58
HWI-ST740_0001:1:1101:1195:2138#0       99      16      23169966        60      92M     =       23170207        329     CCCTATGTTAAGCCCCATGCTTAAGCTTACCCAGCTTTTGCTGCATCTGATGCCTCCAATTCCCGACTATGTTAGTTGGGGCTGTGTGATGG    HGHHHHHHHHHHEHHHHHHHHGHHHHHHGFHHHHGHHHHHHHHHFHHHGHFHHEHHHFEHHFHFHHHHHHGFEHHFGGFGHHHHDFEGGGGG    XT:A:U  NM:i:0  SM:i:37 AM:i:37 X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:92
HWI-ST740_0001:1:1101:1195:2138#0       147     16      23170207        60      88M     =       23169966        -329    TCAGCTGCCAGTGTGGTTAGAATAAAAGCAGGCCGCAGAACGTGGAAGGACTTGACTTGCTGAGTCTTCCGGCCTTCATCTTTCTCCC        <>4>HHGFHHHHHHHHHHGHHHHHHHFHHHHHHIGIHGFDHFHHHHHFHEHHEHEHHHEHFHHHHHHHHHHFHHFEHHHHHHHHHHHH        XT:A:U  NM:i:1  SM:i:37 AM:i:37 X0:i:1  X1:i:0  XM:i:1  XO:i:0  XG:i:0  MD:Z:36C51
HWI-ST740_0001:1:1101:1090:2159#0       83      14      102457791       60      92M     =       102457573       -310    AAACTCTGTCTTAAAAAAATTTTTTTTATTGAAAAATAGATTGCTGAGTAGAAATGAAACCTTTCTTTCACAGAATGTCGTTCATGANCTAA    CFCGGDGHHEHHFFFFEBA??CBEA6ACDFHEGHHHHFCHEFFFFGGHHEHHEFDDE7EGGGEAFDGHFFHGHDHHEFGGFGAAA@A%?<?<    XT:A:U  NM:i:1  SM:i:37 AM:i:37 X0:i:1  X1:i:0  XM:i:1  XO:i:0  XG:i:0  MD:Z:87G4
HWI-ST740_0001:1:1101:1090:2159#0       163     14      102457573       60      88M     =       102457791       310     GGCTGAAGCGGGTGGATCACCTGAGGTCAGGAGTTTGAGACCAGCCTGACCAACATGCTGAAACCCGTCTCTACTAAAAATATATAAT        DCACABEECDFDBFA?BECEB>FFDE8D?FE;CAEEF<DEF<DCA9AAA;DEDDEHHBHHDEBEEFH?BCAADAD?BAACFDEB@D:<        XT:A:U  NM:i:1  SM:i:37 AM:i:37 X0:i:1  X1:i:0  XM:i:1  XO:i:0  XG:i:0  MD:Z:43A44
HWI-ST740_0001:1:1101:1221:2162#0       83      9       116274509       60      92M     =       116274458       -143    TCTTGAGACTGAGGATGGTGCTTTAAACAAAGGGTCCAGGGCCTGGCAGCACACATCCCTCTGGGACTGATATGTAGGGCCAGATCAGACAA    HDGHHHBHHGHFHGHEHHBFGGGDHGHHHHHHFHHHHHHHHHHHHHGHHHHHHHHHHFHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH    XT:A:U  NM:i:0  SM:i:37 AM:i:37 X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:92
HWI-ST740_0001:1:1101:1221:2162#0       163     9       116274458       60      88M     =       116274509       143     GAAGTCAGTGATCAAGCTCGGGTGTCTTATCCAACTCAAATTCAGTGATGATCTTGAGACTGAGGATGGTGCTTTAAACAAAGGGTCC        HHHHHHHHHHHHHHHHHHHHHHBHFHHHHHFHHHHHHHHHHHHHHFHHFHHHHHHHHHFHHHHFHHHHHHHHHHHDBGHFHHCFH?FG        XT:A:U  NM:i:0  SM:i:37 AM:i:37 X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:88
HWI-ST740_0001:1:1101:1125:2168#0       99      12      96944707        60      92M     =       96944869        250     AGTTNGAACCTACAAAAGATTATATCCTAGGAAAAGGTAGACTAGAAAAAAACTCTGGCCTGGGTATGGTGGCTCACGCCTTTTATCCTAGC    BCAC%GGDGDHHHGHHHHHHHHHHHHHHHHHHHHHHEHHGHHHFEHHGHHHHEHHHHFHHHHHEEFCFDEGADDFFFFFBIFEF::C:@B<    XT:A:U  NM:i:1  SM:i:37 AM:i:37 X0:i:1  X1:i:0  XM:i:1  XO:i:0  XG:i:0  MD:Z:4A87
HWI-ST740_0001:1:1101:1125:2168#0       147     12      96944869        60      88M     =       96944707        -250    AAACCATGACTCTGCAAAAAAAATACAAAAATTAGCTGGGTGTGATGACATGTGCCTGTAGTCCTAGCTACTTGGGAGGATCGCCTGA        HHHHHGHHEGGGGFEHHHHGHHHHHHHHHHHHHHHHHHHGHHHHHFHHHHHHHFHHHFHHHHHHFHHHHHHHHHHHHHHHGHFHHHHH        XT:A:U  NM:i:0  SM:i:37 AM:i:37 X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:88
HWI-ST740_0001:1:1101:1155:2173#0       99      10      5727936 60      92M     =       5727949 101     GTCTNCAGTGTTCCCTGAATAAAACTAGGGGTGAGCCTGGCGGTAATCCTGCCTAGGAGGTTTATAGCAAAGCAGTGTCCCCTAACGGTGCA    CBCC%DDDDDHHHHHHHGFHHHHHHHGHHHHEHHHEHHHHDHHDFHFHHHHHHFHHHHHHDHFEFFGECFCFFFEHGFHHHHHHGGHHFHGH    XT:A:U  NM:i:1  SM:i:37 AM:i:37 X0:i:1  X1:i:0  XM:i:1  XO:i:0  XG:i:0  MD:Z:4A87
HWI-ST740_0001:1:1101:1155:2173#0       147     10      5727949 60      88M     =       5727936 -101    CCTGAATAAAACTAGGGGTGAGCCTGGCGGTAATCCTGCCTAGGAGGTTTATAGCAAAGCAGTGTCCCCTAACGGTGCAAGAATACCC        EE>EHFHFED9EEDHHHAFFHEHFFFGHD9GGFEGGDEHHHHHHHHHHHHHHHHFHHHHHHHHHHHHHHHHHHHHHHHHHHHHGDGHI        XT:A:U  NM:i:0  SM:i:37 AM:i:37 X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:88
HWI-ST740_0001:1:1101:1099:2182#0       99      2       38974888        60      92M     =       38975118        318     GTATNATCCAATCTGAAGCCAGAAAAGTTTATAGCAACATGCCTACCCAGAGTGTTGAGTAGAACTGTATTGCTTGGGACCCTTGTCTACCA    C?BC%DDDDDHHHHHHHHHHHHGHHHHEHHHHHGHHHHHHHFHHHHHHHHFG?GEHHFHFFHFDHFHFBFHHHFFHHHHHHHHHHHHEHHEH    XT:A:U  NM:i:1  SM:i:37 AM:i:37 X0:i:1  X1:i:0  XM:i:1  XO:i:0  XG:i:0  MD:Z:4C87
HWI-ST740_0001:1:1101:1099:2182#0       147     2       38975118        60      88M     =       38974888        -318    TGTCCTTAAGACACTCTACTGATAAGATGTGAACAATCCAAACACAAGTAAGGAAAAAAAGTGTTACATACTGGAAATACCTCGATCC        HHHHHFHHGHEHCHGHHHHHHHHHHHHEHHHHHHHHHHFHHHHFHHHHHHHHHHHFHHHHFHFIGFGFFFHHHHHHHHDHHHHHHHHH        XT:A:U  NM:i:0  SM:i:37 AM:i:37 X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:88
HWI-ST740_0001:1:1101:1067:2205#0       83      8       96090111        60      12S80M  =       96089991        -200    ATGAGCAAAAGGCAAGCCAGTCTAGGTGCATGGCCAGCCCCCCGAGATGGAGCTCATGGTGAGGGACCAGCCAAGTCAATCACATCTNCCTT    %%%%%%%%%%%%%@;7@C<BBCDCFEECDC5CCFCC4HHHHHDFDG?FHCFEEEEBFDCDDBGFF@FFHHHFHHHHHHHHHFDCDCD%AAAB    XC:i:80 XT:A:U  NM:i:1  SM:i:37 AM:i:37 X0:i:1  X1:i:0  XM:i:1  XO:i:0  XG:i:0  MD:Z:75G4
HWI-ST740_0001:1:1101:1067:2205#0       163     8       96089991        60      88M     =       96090111        200     TTGTAAAGGTCATGAGAAGTGCCTGTCCCTCCAAGGGTGCAAGGGTGACCAGAAGCAAGCTCTTAGGCATGGGTGGGCTTGCTGTGGA        HHHHHHHHHHHHHHHHHEHFHHHHHFHHHHHHHHHFHDHHFEHHHBFEFFGGGIGHHHHHHGHHHHEHHHHHHDHDGGEEHHBHCGFF        XT:A:U  NM:i:0  SM:i:37 AM:i:37 X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:88
HWI-ST740_0001:1:1101:1092:2208#0       99      16      23963265        60      92M     =       23963410        233     GGTGNTGACCTGTTTGTCTCCCTGGCTGTCAGACTGTGAGAAACTTGAGGGCAGGGGCTGTACTTTATTCTTCTCTGTCTCTCTAGGGACTA    >86;%8>===FHFHHHECHFFEE?FFBFEEFHFEFEEHHH?DCD+FACBDF<EFECFE9C9=C?CD5C<0C<CC:6@0><DDDE<FCBDFCB    XT:A:U  NM:i:1  SM:i:37 AM:i:37 X0:i:1  X1:i:0  XM:i:1  XO:i:0  XG:i:0  MD:Z:4G87
HWI-ST740_0001:1:1101:1092:2208#0       147     16      23963410        60      88M     =       23963265        -233    ACGTGGTATAAAACAGGTCTTGTGACTGAATGAGGGAATGAATGAAAAAGCCCTGTCCTGCCCTCCCTGCAGAGATGGGGTCTGCTGC        @EAD>C?E<?DAADC==:HEDDFDGFEFDCC9D@;@B75BDFBC@DFD?B?FFBFFFFDDDDD9D@FBHHBHEFGACGC@BDEFFDFG        XT:A:U  NM:i:0  SM:i:37 AM:i:37 X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:88
HWI-ST740_0001:1:1101:1177:2226#0       99      1       182615131       60      92M     =       182615381       338     TTTGCAAGCTGTATCTTTCTTAATGGCAGTTCCACTTCAGAAGTCATGAAGAACATCACCCAAGACTCTTTCTGGAAATGTAAGGTTAGGGT    HHHHHHHHHHHHHHHHHHHHHHGHHHHHG9DADADCFDBFEFE?FHHFBHFCHCHHEHFCD<DFDCAFEEGEBF>CBGEGDDFFE@E7?E@3    XT:A:U  NM:i:0  SM:i:37 AM:i:37 X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:92
HWI-ST740_0001:1:1101:1177:2226#0       147     1       182615381       60      88M     =       182615131       -338    TCTCTCCTAAGCTGTGAGATCCCAGTGAGCAGAGCTGGCCACTACAAATCACAGAGTTGAGCCTAAGTCTTTCAGCTATTCCTCCCAG        <=7D@D?DE:6EBG<BDF@<DD69EEEE<CEEDBAIFGGGCGBEGFGFEGEGFHHHHFHHHHHGGGGBGHHFHHFHHHHEGHHHHHHH        XT:A:U  NM:i:1  SM:i:37 AM:i:37 X0:i:1  X1:i:0  XM:i:1  XO:i:0  XG:i:0  MD:Z:63C24
HWI-ST740_0001:1:1101:1066:2233#0       99      11      122934086       60      92M     =       122934250       252     ACGANAGACCATACCGACGCTTTGAAAACGTGACTGAGCTGGTGCACAAAACAATGGTGCCTAGAATTTTGAATTGTGCAATGTTGTGCATG    BABA%DDDDDGHHHHHHHHHHHHHHHFHHHHHHHHHHHHHHHEHHCHHFHEHHHHHHEHFCEEGDFFFFHHFHHHHHHHHHHHEHHFHHFHH    XT:A:U  NM:i:1  SM:i:37 AM:i:37 X0:i:1  X1:i:0  XM:i:1  XO:i:0  XG:i:0  MD:Z:4C87
HWI-ST740_0001:1:1101:1066:2233#0       147     11      122934250       60      88M     =       122934086       -252    AAGCAGGCGAGGTAGCTGTTATTCCTATTTAGTAGATAAATAGGCACCGATGGGGGAAAAGATTTTTTAAGGCTACACCAAAAGAACT        HHHHHHHGDHHHHHHHGGHFHHHGHHHHDBHHHHHHHHHH@HHHHFHHHHHHHFHHHHHHHHHHHHFHHHHHHHHHHHHHHHHHHHHH        XT:A:U  NM:i:0  SM:i:37 AM:i:37 X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:88
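
For readers following along, the sketch below shows one way fragment coordinates could be recovered from paired records like the ones above, using the standard SAM fields FLAG, RNAME, POS, RNEXT and TLEN. It is written in Python for illustration and is not the Perl script discussed in this thread. The function name `fragments_from_sam` is hypothetical, and the two sample records are the first pair from the file with SEQ and QUAL abbreviated to `*` and the optional tags dropped.

```python
def fragments_from_sam(lines):
    """Yield (chrom, start, end) for each properly paired fragment.

    Only the leftmost mate of a pair (TLEN > 0) is used, so every
    fragment is reported once. Coordinates are 1-based and inclusive.
    """
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines such as @SQ
        fields = line.split()
        flag = int(fields[1])
        chrom = fields[2]
        pos = int(fields[3])
        rnext = fields[6]
        tlen = int(fields[8])
        properly_paired = bool(flag & 0x2)
        if properly_paired and rnext == "=" and tlen > 0:
            yield chrom, pos, pos + tlen - 1


# First read pair from the file above, abbreviated for the example.
sample = [
    "@SQ\tSN:5\tLN:180915260",
    "HWI-ST740_0001:1:1101:1222:2116#0\t83\t5\t159866484\t60\t92M\t=\t159866412\t-164\t*\t*",
    "HWI-ST740_0001:1:1101:1222:2116#0\t163\t5\t159866412\t60\t88M\t=\t159866484\t164\t*\t*",
]

fragments = list(fragments_from_sam(sample))
```

For this pair the fragment runs from 159866412 to 159866575 on chromosome 5, which matches the end of the 92M mate that starts at 159866484.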




Dongjun Chung

Aug 18, 2012, 6:14:57 PM
to mosaics_u...@googlegroups.com
Hi Sigrid,

Thanks for providing the example lines. Your SAM file looks fine and should work well with my Perl script for processing PET data, which I have attached here.

Usage:

> perl process_readfiles_PET.pl [infile] [outdir] [format] [binsize] N N NULL

[infile]: name of aligned read file
[outdir]: directory that the bin-level file is exported to
[format]: format of aligned read file; either 'eland_result' or 'sam'
[binsize]: bin size

For example,

> perl process_readfiles_PET.pl example.sam . sam 150 N N NULL

Then, using the bin-level files produced by this Perl script, you can analyze your data in R with the following commands:

> bin <- readBins( type = c("chip","input"),  fileName=c("chip.txt_bin150.txt","input.txt_bin150.txt") )
> fit <- mosaicsFit( bin )
> peak <- mosaicsPeak( fit )
> export( peak, type="bed", filename="peak.bed" )

Please just let me know if you have any questions.

Thanks,
Dongjun

Sigrid Rouam

Aug 19, 2012, 1:42:39 AM
to mosaics_u...@googlegroups.com
Hi Dongjun,

Thank you very much. I will try. 

Regards, 
Sigrid


Sigrid Rouam

Aug 20, 2012, 9:13:29 PM
to mosaics_u...@googlegroups.com
Hi Dongjun,

I would like to try your Perl script, but I could not find it in the attachments. Could you send it again?
I have another question, about preprocessing the mappability and GC content files into bin-level files. What parameters should I use for PET data?

Thank you!

Sigrid


Dongjun Chung

Aug 21, 2012, 7:57:58 PM
to mosaics_u...@googlegroups.com
Hi Sigrid,

I have put the Perl script for processing PET ChIP-seq data, "process_readfiles_PET_v2.pl", here:

https://www.dropbox.com/sh/rt2s3bv3b9mfto0/wfjRUaXQrl

Regarding your second question, we have not yet developed a script to process mappability and GC content scores for PET data. Although it is not perfect, you can use the mappability and GC content scores for SET data as an approximation. If you have a matched control sample, you can also try a two-sample analysis without mappability and GC content. I will let you know if we have any updates on this issue.

Thanks,
Dongjun

Sigrid Rouam

Sep 5, 2012, 2:15:25 AM
to mosaics_u...@googlegroups.com
Great.
Thank you!

Sigrid

