Inquiry Regarding Umap and Bismap


許靖

Jun 12, 2024, 1:25:58 AM
to Ubismap
Dear Author,

I hope this message finds you well.

I have several questions regarding Umap and Bismap and would appreciate your assistance:

    When checking k-mers, did you use hg38 with soft-masking (lowercase) or hard-masking (N, analysis_set)?
    Did you utilize any decoy, alt, or patch contigs?

You have provided k250 and k150 results for T2T, but the hg38 results only go up to k100. Would you be willing to provide additional k-mer lengths to cover common sequencer read lengths, specifically:

    300
    250
    150
    100
    75
    50
    36
    24

Please notify me once these updates are available in the UCSC Table Browser or https://bismap.hoffmanlab.org/.

Thank you for your assistance.

Best regards,

Ching Hsu

Eric Roberts

Jun 13, 2024, 11:38:08 AM
to Ubismap
Hi,

I'm not the original author of the software, but I'm currently doing my best to maintain it.
As far as I know, only hard-masking ('N') was used when checking k-mers.
I don't believe any decoy, alt, or patch contigs were considered.
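
To make the soft- versus hard-masking distinction concrete, here is a toy sketch of what "only hard-masking was used" would mean for a k-mer scan. It is not Umap's actual implementation; the function name `unique_kmer_starts` is hypothetical, and it only looks at the forward strand of a short in-memory string.

```python
from collections import Counter

def unique_kmer_starts(seq, k):
    """Return 0-based starts of k-mers that occur exactly once and contain no N.

    Assumption for illustration: lowercase (soft-masked) bases are treated
    like ordinary bases, while any window containing N (hard-masked) is skipped.
    """
    seq = seq.upper()  # soft-masking is ignored
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    # Count only windows without hard-masked bases
    counts = Counter(w for w in windows if "N" not in w)
    # Windows with N were never counted, so they can never equal 1
    return [i for i, w in enumerate(windows) if counts[w] == 1]

print(unique_kmer_starts("ACGTNNacgtTTGA", 4))  # [7, 8, 9, 10]
```

In this example the lowercase `acgt` still counts as a second copy of `ACGT`, so neither copy is unique, and every window touching `NN` is dropped.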

Work is under way to support a much wider range of k-mer lengths for both T2T and hg38. When there is a big update, we will be sure to post about it here as well.

Hope that helps!

Eric

許靖

Jun 13, 2024, 10:28:56 PM
to Ubismap
I found the following links on the forum:
https://bismap.hoffmanlab.org/raw/uint/hg38all/umap/bedFiles/k150.GRCh38.Umap.bed.gz
https://bismap.hoffmanlab.org/raw/uint/hg38all/umap/bedFiles/k200.GRCh38.Umap.bed.gz
https://bismap.hoffmanlab.org/raw/uint/hg38all/umap/bedFiles/k250.GRCh38.Umap.bed.gz

Since the directory listing at https://bismap.hoffmanlab.org/raw/uint/hg38all/umap/bedFiles/ is inaccessible, it may give the impression that these files are not available.
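
For anyone who downloads one of these files, a minimal sketch for summarizing it could look like the following. It assumes the file is a gzipped BED whose first three columns are chrom, start, and end, with non-overlapping intervals and possibly a `track` header line; I have not verified the exact layout of the Umap files. The function name `covered_bases_per_chrom` and the local file name in the usage note are hypothetical.

```python
import gzip
from collections import defaultdict

def covered_bases_per_chrom(path):
    """Sum interval lengths (end - start) per chromosome in a gzipped BED file."""
    totals = defaultdict(int)
    with gzip.open(path, "rt") as handle:
        for line in handle:
            # Skip blank lines and BED header/comment lines
            if not line.strip() or line.startswith(("track", "browser", "#")):
                continue
            fields = line.split()
            chrom, start, end = fields[0], int(fields[1]), int(fields[2])
            totals[chrom] += end - start  # BED intervals are half-open
    return dict(totals)
```

Calling `covered_bases_per_chrom("k150.GRCh38.Umap.bed.gz")` on a locally saved copy would then show which contigs appear in the file, which is one way to see whether unlocalized and unplaced contigs are included.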

On the website, I found the following explanation:
we generated mappability of the hg38 genome without haplotypes but including unlocalized and unplaced contigs using the following kmers.

So, hg38 includes only primary contigs, while hg38all includes unlocalized and unplaced contigs.

There is also the question of the chrY PAR region. It seems to be masked in hg38 but not in T2T. Should the PAR region be masked or not?
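
One way to check whether a region such as the PAR is hard-masked in a given reference is to measure the fraction of N bases in it. This is only a sketch: `n_fraction` is a hypothetical helper, the sequence is assumed to be already loaded as a string, and the PAR coordinates are left as parameters rather than hard-coded.

```python
def n_fraction(seq, start, end):
    """Return the fraction of N bases in seq[start:end] (0-based, half-open)."""
    region = seq[start:end]
    if not region:
        raise ValueError("empty region")
    # Count both 'N' and 'n' by upper-casing first
    return region.upper().count("N") / len(region)
```

A value close to 1.0 for the chrY PAR coordinates would indicate hard-masking in that assembly, while a value close to 0.0 would indicate that the sequence is present.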

Also, should chrM be included in the k-mer analysis? Would you recommend including it?

On Thursday, June 13, 2024 at 11:38:08 PM [UTC+8], Eric Roberts wrote: