combine subsets for human subcatalog


Ulrike Löber

Jul 15, 2025, 9:53:48 AM
to gmgc-users
Dear Luis,
for a new project, we would like to run samples from different human body sites against the same GMGC subset, one containing all human body sites included in GMGC. Is it possible to pipe the subsets together or construct an "all human" subcatalog?

Yours sincerely,
Ulrike

Luis Pedro Coelho

Jul 16, 2025, 6:38:46 AM
to Ulrike Löber, gmgc-users
This is the sort of thing that would still have been a bit of a hassle a few years ago, but is quite trivial nowadays. Yes, you can just concatenate all the sequences together, making sure you are not repeating them, with something like:

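from glob import glob

# fasta_iter is assumed to be a helper that yields (header, sequence)
# pairs from a (possibly gzipped) FASTA file; it is not defined here.
seen = set()
with open('human-combined.fna', 'wt') as out:
    for fna in glob('*human*.fna.gz'):
        for h, seq in fasta_iter(fna):
            # Skip sequences whose header was already written
            if h in seen:
                continue
            seen.add(h)
            out.write(f">{h}\n{seq}\n")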
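For completeness, a self-contained version of the same idea might look like the sketch below. The fasta_iter shown here is a hypothetical stand-in for the helper used above (it is not taken from any particular library), and combine_fastas is a name introduced only for illustration. It assumes that identical headers in different subcatalog files refer to the same sequence, so deduplicating on the header alone is enough.

```python
import glob
import gzip


def fasta_iter(path):
    """Yield (header, sequence) pairs from a plain or gzipped FASTA file.

    Hypothetical stand-in for the helper used above: the header is the
    full text after '>' and multi-line sequences are joined.
    """
    opener = gzip.open if path.endswith('.gz') else open
    header, chunks = None, []
    with opener(path, 'rt') as fh:
        for line in fh:
            line = line.rstrip('\r\n')
            if line.startswith('>'):
                # A new record starts: emit the previous one, if any
                if header is not None:
                    yield header, ''.join(chunks)
                header, chunks = line[1:], []
            elif header is not None and line:
                chunks.append(line)
    # Emit the last record in the file
    if header is not None:
        yield header, ''.join(chunks)


def combine_fastas(pattern, out_path):
    """Concatenate all FASTA files matching pattern, writing each header once.

    Returns the number of sequences written.
    """
    seen = set()
    n_written = 0
    with open(out_path, 'wt') as out:
        # Sort the file list so the output order is reproducible
        for fna in sorted(glob.glob(pattern)):
            for h, seq in fasta_iter(fna):
                if h in seen:
                    continue
                seen.add(h)
                out.write(f">{h}\n{seq}\n")
                n_written += 1
    return n_written
```

With the subcatalog files in the current directory, this would be called as combine_fastas('*human*.fna.gz', 'human-combined.fna'). The set of seen headers is held in memory, which should be fine for header strings but is worth keeping in mind for very large catalogs.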

HTH, Luis

Luis Pedro Coelho | Queensland University of Technology | https://luispedro.org

