Per sample abundances for sub-catalogs


Matthieu Pichaud

Nov 8, 2025, 12:52:55 AM
to gmgc-users
Hi,
Are the per-sample abundance tables available for sub-catalogs?
I am interested in GMGC10.human-gut.95nr.fna.gz in particular.
Many thanks for your time considering this question!
Matthieu

Luis Pedro Coelho

Nov 10, 2025, 1:13:48 AM
to Matthieu Pichaud, gmgc-users
Hi Matthieu,

The short answer is that we do not provide them. It would have been a massive undertaking to do it properly, but you can get 99% of the result by filtering the original big abundance file (if that makes sense).

Best, Luis
--
You received this message because you are subscribed to the Google Groups "gmgc-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gmgc-users+...@googlegroups.com.

Luis Pedro Coelho | Queensland University of Technology | https://luispedro.org


Matthieu Pichaud

Nov 10, 2025, 2:12:19 AM
to gmgc-users

It makes perfect sense.
Thank you for the great insight!

Matthieu Pichaud

Nov 17, 2025, 3:01:46 PM
to gmgc-users
Hi, please allow me this follow-up question.

Would you advise to:
- filter from the big abundance file the counts of the genes in the sub-catalog and dismiss the other counts
or 
- sum (or aggregate somehow) the counts of the genes clustered together in the sub-catalog using GMGC.relationships.txt.gz?

Thanks a lot for your guidance!

Luis Pedro Coelho

Nov 17, 2025, 4:14:17 PM
to Matthieu Pichaud, gmgc-users
Filter from the big abundance file. Luis

On Tue, 18 Nov 2025, at 6:01 AM, Matthieu Pichaud wrote:
- filter from the big abundance file the counts of the genes in the sub-catalog and dismiss the other counts


Matthieu Pichaud

Nov 18, 2025, 3:02:49 AM
to gmgc-users
Hi Luis,
Many thanks for your time.
I understand why "Filter from the big abundance file" works for the 100% non-redundant sub-catalogs,
but for the 95% non-redundant ones, isn't the sum per cluster the right way to go?

Luis Pedro Coelho

Nov 18, 2025, 11:14:11 PM
to Matthieu Pichaud, gmgc-users
The abundances were only computed for the 95nr GMGC (what are called unigenes in the manuscript). So sequences that are not present in the nr version don't have an abundance.

The subcatalogs that we provide are subsets of the full catalog (i.e., every sequence in a subcatalog is also present in the big catalog), so the identifiers match.

Best, Luis
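
For reference, the filtering described above could be sketched roughly as follows. This is only an illustration under assumptions that the thread does not confirm: that the big abundance file is a gzipped, tab-separated table with one header line and the unigene identifier in the first column, and that the sub-catalog identifiers are the first token of each FASTA header. The function names and file paths are hypothetical.

```python
import gzip
from typing import Iterable, Iterator, Set


def fasta_ids(lines: Iterable[str]) -> Set[str]:
    """Collect sequence identifiers (first token after '>') from FASTA lines."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def filter_abundance(lines: Iterable[str], keep: Set[str],
                     sep: str = "\t") -> Iterator[str]:
    """Yield the header line, then only rows whose first column is in `keep`.

    Assumption: the first line is a header and the first column holds the
    gene identifier. Rows for genes outside the sub-catalog are dropped,
    not summed or otherwise aggregated.
    """
    it = iter(lines)
    header = next(it, None)
    if header is None:
        return
    yield header
    for line in it:
        if line.split(sep, 1)[0].strip() in keep:
            yield line


def filter_files(fasta_gz: str, abundance_gz: str, out_gz: str) -> None:
    """Stream a gzipped abundance table, keeping only sub-catalog genes."""
    with gzip.open(fasta_gz, "rt") as fa:
        keep = fasta_ids(fa)
    with gzip.open(abundance_gz, "rt") as src, gzip.open(out_gz, "wt") as dst:
        dst.writelines(filter_abundance(src, keep))
```

Because the sub-catalog identifiers match those of the full catalog, a plain set lookup on the identifier column is enough, and streaming line by line avoids loading the whole abundance table into memory. A call would look like `filter_files("GMGC10.human-gut.95nr.fna.gz", "abundance.tsv.gz", "human-gut.abundance.tsv.gz")`, where the last two file names are placeholders.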

Matthieu Pichaud

Nov 19, 2025, 2:22:43 AM
to Luis Pedro Coelho, gmgc-users
Awesome.
Thanks for taking the time.