invariant sites in FASTA format

Zachary Hancock

Jun 13, 2020, 11:37:56 AM
to msprime-users
Hi all,

I'm trying to write out the full sequence length (including invariant sites) from tree sequences in FASTA format. We're hoping to use some common phylogenetic methods, and most of them rely on invariant sites. The invariant sites can be any symbol (I can change them in a text file later), but they need to be present. I saw some discussion of this here (https://github.com/tskit-dev/tskit/issues/338), but wasn't sure whether a method was ever implemented. To print just the variant haplotypes, I've been using:

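# One string of variant-site alleles per sample, in ts.samples() order
haps = list(ts.haplotypes())
# Label each sequence with its sample node ID and that node's population
# (look up the node by sample ID, not by loop index)
sequence_IDs = [
    f'sample_{u}_pop_{ts.node(u).population}' for u in ts.samples()
]
with open('ts_mig01.fas', 'w') as f:
    for seq_id, hap in zip(sequence_IDs, haps):
        f.write(f'>{seq_id}\n{hap}\n')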

Thanks in advance for any help!

Zach

Peter Ralph

Jun 13, 2020, 3:11:20 PM
to Zachary Hancock, msprime-users
I don't believe there's been any movement on this, no.

Jerome Kelleher

Jun 15, 2020, 3:27:37 AM
to msprim...@googlegroups.com

No, we haven't done anything around this yet. With msprime 1.0 we'll
optionally have discrete coordinates and so this type of operation will
make more sense.

Please do comment on the GitHub issue though - it helps to know if
people would like a particular feature.

Cheers,
Jerome
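
Until something like this is available in tskit, one possible workaround is to pad the variant-only haplotypes out to the full sequence length by hand. The following is a minimal sketch, not an official method: `pad_haplotype` and `write_fasta` are hypothetical helper names, and it assumes every site sits at a distinct integer position below the sequence length (with continuous coordinates, positions would need rounding and could collide). With a tree sequence, the inputs could come from `ts.haplotypes()`, `ts.tables.sites.position` and `ts.sequence_length`.

```python
def pad_haplotype(haplotype, positions, sequence_length, filler="N"):
    """Return a full-length sequence with `filler` at every non-variant position.

    Assumes `positions` holds one distinct integer-valued coordinate per
    character of `haplotype`, in the same order.
    """
    seq = [filler] * int(sequence_length)
    for allele, pos in zip(haplotype, positions):
        seq[int(pos)] = allele
    return "".join(seq)


def write_fasta(path, names, haplotypes, positions, sequence_length, filler="N"):
    """Write one padded FASTA record per (name, haplotype) pair."""
    with open(path, "w") as f:
        for name, hap in zip(names, haplotypes):
            padded = pad_haplotype(hap, positions, sequence_length, filler)
            f.write(f">{name}\n{padded}\n")


# Toy input standing in for tree sequence data: three variant sites
# on a sequence of length 10, two samples.
positions = [1, 4, 7]
haplotypes = ["010", "110"]
padded = [pad_haplotype(h, positions, 10) for h in haplotypes]
print(padded)  # ['N0NN1NN0NN', 'N1NN1NN0NN']
```

The filler symbol is arbitrary here; swap in whatever character the downstream phylogenetic tool expects for invariant sites.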