Questions about BAM files and an error in the GenomicRegion constructor

Yunzhou Yang


Dec 3, 2019, 5:30:02 AM

to STITCH imputation

Hi Robert,

Thanks for your STITCH software.

I have a question about the input BAM files: which kind of BAM file should we give to STITCH?

After aligning the FASTQ files to the reference, I get an unsorted.bam file. I then create the final BAMs in three steps: from unsorted.bam to sorted.bam (sorted with samtools), then to sorted_RG.bam (read groups added), and finally to sorted_RG_markduplicates.bam (duplicates marked with Picard).
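A minimal sketch of the three steps described above, plus indexing the final file, written in Python so that it only builds the command lines without running them. It assumes `samtools` and Picard are on the PATH, with Picard available as a `picard` wrapper script (otherwise the same arguments go after `java -jar picard.jar`). The function name, output file names and the RGID/RGLB/RGPL/RGPU values are hypothetical placeholders; only RGSM is meant to carry the real sample name.

```python
from typing import List


def build_bam_commands(sample: str, unsorted_bam: str) -> List[List[str]]:
    """Return the post-alignment steps as argument lists, without running them.

    Output file names and the RGID/RGLB/RGPL/RGPU values are hypothetical
    placeholders; RGSM carries the sample name.
    """
    sorted_bam = f"{sample}.sorted.bam"
    rg_bam = f"{sample}.sorted_RG.bam"
    md_bam = f"{sample}.sorted_RG_markduplicates.bam"
    return [
        # 1. Coordinate-sort the aligner output.
        ["samtools", "sort", "-o", sorted_bam, unsorted_bam],
        # 2. Add a read group whose SM tag is the sample name.
        ["picard", "AddOrReplaceReadGroups", f"I={sorted_bam}", f"O={rg_bam}",
         f"RGID={sample}", "RGLB=lib1", "RGPL=ILLUMINA", "RGPU=unit1",
         f"RGSM={sample}"],
        # 3. Mark (not remove) duplicate reads and write a metrics file.
        ["picard", "MarkDuplicates", f"I={rg_bam}", f"O={md_bam}",
         f"M={sample}.dup_metrics.txt"],
        # 4. Index the final BAM so it can be queried by region.
        ["samtools", "index", md_bam],
    ]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` in order; keeping the commands as lists makes it easy to print or log them before running anything.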

Is it OK if I just use sorted.bam (without adding read groups or marking duplicates)?

Another question: I got an error (see picture), and a similar error was reported here, but I do have SNPs in the imputed region. The error is "Error in get_sampleReadsRaw_from_SeqLib(useSoftClippedBases = useSoftClippedBases, : GenomicRegion constructor: Failed to set region for NC_006103.5:499705-2844789". Sometimes I also see negative positions, like this: "Error in get_sampleReadsRaw_from_SeqLib(useSoftClippedBases = useSoftClippedBases, :GenomicRegion constructor: Failed to set region for NC_006115.5:-266-5108081". A position of -266? That seems impossible.

Any ideas?

Thanks a lot.

best

yunzhou

stitcherror_01.png

Yunzhou Yang


Dec 3, 2019, 5:35:08 AM

to STITCH imputation

Sorry, one more question.

Error in readChar(con, 5L, useBytes = TRUE) : cannot open the connection

It seems STITCH cannot find the files it needs.

best

yunzhou

On Tuesday, December 3, 2019 at 11:30:02 AM UTC+1, Yunzhou Yang wrote:

stitcherror_02.png

Robbie Davies


Dec 3, 2019, 9:51:04 AM

to Yunzhou Yang, STITCH imputation

Hi,

Re: input BAM files. Ideally, in most bioinformatics pipelines, you'd add read groups at the start and then do alignment, merging, sorting, indexing and duplicate read removal, I believe. In any case, STITCH should only really care about the sample name, and I believe it allows arbitrary read group combinations within a single BAM file, as long as that BAM file holds a single sample with a single SM tag. Whether you remove duplicate reads depends on your precise library construction method: it isn't done (I think?) for GBS-type approaches, but would be done most of the rest of the time.
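One way to check the "single sample with a single SM tag" condition is to read the `@RG` lines of each BAM header. The sketch below assumes the header is already available as text (for example from `samtools view -H file.bam`); the function names are hypothetical, and this is a pre-check of my own, not something STITCH itself runs.

```python
from typing import Set


def sample_names_from_header(header_text: str) -> Set[str]:
    """Collect the SM values from the @RG lines of a SAM/BAM header."""
    names = set()
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs after the record type.
        for field in line.split("\t")[1:]:
            if field.startswith("SM:"):
                names.add(field[3:])
    return names


def check_single_sample(header_text: str) -> str:
    """Return the sample name, or raise if the header has zero or several SM values."""
    names = sample_names_from_header(header_text)
    if len(names) != 1:
        raise ValueError(f"expected exactly one SM value, found {sorted(names)}")
    return names.pop()
```

Several `@RG` lines (for example one per lane) are accepted as long as they all share the same SM value.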

Were your BAMs sorted when you saw the error message below? The GenomicRegion constructor should be able to take negative values (for example, I just confirmed on some test data that a (fake) chromosome named "rep_1.1:-589-668" works for the GenomicRegion constructor). Also, it's weird that this would happen on one BAM but not on the others. Is there anything different about the header of that file? Can you re-index it? Can you figure out exactly which BAM(s) cause it through a process of elimination (run STITCH on each BAM for one iteration and see when it crashes)?
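To follow up on "is there anything different about the header of that file?", a quick sketch that compares the `@SQ` lines (contig names and lengths) across BAM headers and reports the ones that disagree with the most common version. It assumes each header has been saved as text, e.g. from `samtools view -H`, and the function names are hypothetical. BAMs aligned to a different assembly will usually show up here, because their contig lengths differ.

```python
from collections import Counter
from typing import Dict, List, Tuple


def contig_lengths(header_text: str) -> Tuple[Tuple[str, int], ...]:
    """Return (SN, LN) pairs from the @SQ lines of a SAM header, in header order."""
    contigs = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            # split(":", 1) keeps values that themselves contain colons intact.
            tags = dict(field.split(":", 1) for field in line.split("\t")[1:])
            contigs.append((tags["SN"], int(tags["LN"])))
    return tuple(contigs)


def odd_headers(headers: Dict[str, str]) -> List[str]:
    """Return the BAM names whose contig dictionary differs from the most common one."""
    dicts = {name: contig_lengths(text) for name, text in headers.items()}
    majority, _ = Counter(dicts.values()).most_common(1)[0]
    return sorted(name for name, d in dicts.items() if d != majority)
```

If every header matches, the returned list is empty and the process of elimination (one BAM per run) is the next step.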

For the other message: did you run out of temporary disk space? Can you set the tempdir to somewhere with more storage and see if the message goes away?
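Before pointing the temporary directory somewhere new, it can help to confirm how much space is actually free there. A minimal sketch, assuming the chosen path is then passed to STITCH as its temporary directory; the function name and the free-space threshold are hypothetical, and a sensible threshold depends on your data.

```python
import shutil
import tempfile
from typing import Optional, Sequence


def pick_tempdir(min_free_gb: float, candidates: Optional[Sequence[str]] = None) -> str:
    """Return the first candidate directory with at least min_free_gb GB free."""
    if candidates is None:
        candidates = [tempfile.gettempdir()]  # system default temp location
    for path in candidates:
        try:
            free_gb = shutil.disk_usage(path).free / 1e9
        except OSError:
            continue  # missing or inaccessible directory: try the next one
        if free_gb >= min_free_gb:
            return path
    raise RuntimeError(f"no candidate directory has {min_free_gb} GB free")
```

Listing a large scratch area first and the system default last gives a simple fallback order.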

Best,

Robbie

--
You received this message because you are subscribed to the Google Groups "STITCH imputation" group.
To unsubscribe from this group and stop receiving emails from it, send an email to stitch-imputat...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/stitch-imputation/58897929-3070-4580-a7b3-cf77ec597f2d%40googlegroups.com.

Yunzhou Yang


Dec 4, 2019, 3:38:36 AM

to STITCH imputation

Hi Robert,

Thanks for your quick answer.

I will re-index the BAMs, or run the pipeline on each BAM separately, to find out which BAM is the problem.

For the second error, I will set the temporary directory to a new location with much more storage space. I hope it works, and I will come back to report the results.

best

yunzhou

On Tuesday, December 3, 2019 at 3:51:04 PM UTC+1, Robert Davies wrote:


Yunzhou Yang


Dec 9, 2019, 4:34:04 AM

to STITCH imputation

Hi, I'm back to report progress.

The first error was probably caused by the assembly versions: I had aligned the FASTQ files to two different assembly versions, which have different coordinates. That's why STITCH threw errors. In the end I used the BAM files that had been sorted, had read groups added, and had duplicates marked.
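Since the cause turned out to be mismatched assemblies, a region from the failing error messages can be checked against the contig lengths in a BAM's header. This is a hypothetical pre-check, not what STITCH or SeqLib do internally: it parses a `chrom:start-end` string (allowing a leading minus sign, as in `NC_006115.5:-266-5108081`) and compares it with a `{contig: length}` mapping, which you could build from the `@SQ` SN/LN fields. Given Robbie's test, a negative start is flagged here only as something worth a look, not as a confirmed cause.

```python
import re
from typing import Dict

# chrom:start-end, where start or end may carry a leading minus sign.
_REGION = re.compile(r"^(?P<chrom>.+):(?P<start>-?\d+)-(?P<end>-?\d+)$")


def check_region(region: str, contigs: Dict[str, int]) -> str:
    """Compare a region string with a {contig name: length} mapping from a BAM header."""
    m = _REGION.match(region)
    if m is None:
        return "unparseable region"
    chrom, start, end = m["chrom"], int(m["start"]), int(m["end"])
    if chrom not in contigs:
        return f"{chrom} is not in this BAM's header"
    problems = []
    if start < 1:
        problems.append(f"start {start} < 1")
    if end > contigs[chrom]:
        # Typical symptom of coordinates taken from a different assembly.
        problems.append(f"end {end} > contig length {contigs[chrom]}")
    return "; ".join(problems) or "ok"
```

Running the same region against the headers of BAMs aligned to each assembly makes a coordinate mismatch visible immediately.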

The second error was caused by the memory limit. This time I gave more memory to the job for the biggest chromosome, and it finally works fine now.

Sorry to bother you, and thanks again.

yunzhou

On Wednesday, December 4, 2019 at 9:38:36 AM UTC+1, Yunzhou Yang wrote:
