Create new taxa in master block from selected


Santiago Ron

May 25, 2025, 2:58:03 AM
to Mesquite Project
Dear all,

It is great to see that a new version of Mesquite is out. 
I have a question regarding an option in version 3.x of the taxa association module that allowed me to select unassigned taxa in one taxa block and create those taxa in the master block. The option was "Create new taxa in master block from selected".
Screenshot 2025-05-25 at 8.55.23 AM.jpg
The taxa association module has changed in the new version. I could not find that very useful option there. Any advice will be greatly appreciated. 

Best wishes,
Santiago

Wayne Maddison

May 25, 2025, 5:42:24 PM
to Mesquite Project
Thanks, Santiago, I'm happy to get your feedback about this, because we have made some changes, and plan to add a lot more features for 4.1, in which we expect to emphasize gene tree/species tree issues. The current features are not fully settled for 4.0, so we can try to make sure it has what you need.

About the option "Create new taxa in master block from selected" — to be honest, I don't remember exactly what that option did, but when I now try it in 3.81 it seems badly broken. Could you describe in a bit more detail what you want it to do? Is your scenario that you have some contained taxa selected (e.g., genes or parasites) and you want to generate one containing taxon for each (e.g., species or hosts)?

Santiago Ron

May 26, 2025, 9:10:46 AM
to Mesquite Project
Thanks Wayne for your prompt response and help solving this issue. 
I use the taxa association module to export a concatenated matrix for several genes (export fused matrix option) using a Master block of taxa. Each terminal in my matrix is one individual. For example, I need to concatenate the 16S sequence of specimen QCAZ 23234 (from one matrix) with the ND1 sequence of the same specimen (from a different matrix). Each matrix has its own taxa block and each taxa block is linked using a taxa association with a "Master" taxa block. 
Of course, I don't want to manually enter the individual names in the Master block of taxa; in 3.x it was possible to select taxon names in, for example, the 16S taxa block and automatically create them in the Master block using "Create new taxa in master block from selected".
I hope it is clearer now. Thanks for your help. 
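
To make the workflow above concrete, here is a minimal Python sketch that runs outside Mesquite. It is not Mesquite's implementation or API: the function names `create_master_from_selected` and `fuse`, the dictionary layout, the second voucher number, and the sequences are all assumptions made up for illustration. It creates one master taxon per selected taxon of a gene block, then concatenates the linked sequences, padding with `?` where a specimen has no sequence for a gene.

```python
# Illustrative sketch only; names and data are hypothetical, not Mesquite's API.

def create_master_from_selected(association, block_name, selected):
    """Create one master taxon per selected taxon that is not yet assigned.

    association maps master taxon name -> {block name -> contained taxon name}.
    """
    already_assigned = {links.get(block_name) for links in association.values()}
    for taxon in selected:
        if taxon not in already_assigned:
            # The new master taxon simply reuses the selected taxon's name.
            association.setdefault(taxon, {})[block_name] = taxon
    return association


def fuse(association, matrices, gene_order):
    """Concatenate sequences per master taxon, filling absent genes with '?'."""
    # Assumes every matrix is aligned, so all rows of a gene share one length.
    lengths = {g: len(next(iter(matrices[g].values()))) for g in gene_order}
    fused = {}
    for master, links in association.items():
        parts = []
        for gene in gene_order:
            contained = links.get(gene)
            seq = matrices[gene].get(contained) if contained else None
            parts.append(seq if seq is not None else "?" * lengths[gene])
        fused[master] = "".join(parts)
    return fused


# Invented example data: two gene matrices, each with its own taxa block.
matrices = {
    "16S": {"QCAZ 23234": "ACGT", "QCAZ 99999": "ACGA"},
    "ND1": {"QCAZ 23234": "TTG"},
}

association = {}
create_master_from_selected(association, "16S", ["QCAZ 23234", "QCAZ 99999"])
association["QCAZ 23234"]["ND1"] = "QCAZ 23234"  # link the ND1 sequence by hand

fused = fuse(association, matrices, ["16S", "ND1"])
for name, seq in fused.items():
    print(name, seq)
```

In this sketch the specimen without an ND1 sequence still appears in the fused matrix, with missing-data symbols in the ND1 partition.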

Santiago

Wayne Maddison

May 26, 2025, 10:19:05 AM
to Mesquite Project
Thanks, Santiago. To be honest, I hid that option as I was redesigning the interface, because it wasn't working properly, but then forgot to deal with it. I'll look into it and work with you about resurrecting it properly. (Once David comes back from the field — he knows more about some of these options than I do.)

Your workflow and mine are different, which shows the diversity of ways to manage different genes for a common set of taxa. I've not encountered your use case myself, because I maintain a single block of taxa, which covers the different matrices for the different genes. One way to do it is to drop FASTA files onto existing matrices; another way was to use 3.x's Merge Taxa & Matrices from File. In 4.0 we've enhanced the tools for this. For instance, you can read a file with one gene first, then use File>Include & Merge>Quick Merge Taxa & Matrices from File to stitch them together, if the taxon names are consistent. The Process Data Files option can also stitch them together (it's designed to do it for 1000s of genes). There are more ways than this, including a new taxa block fusion that could be improved if there is interest.

One of the advantages of keeping the different matrices all associated with the same taxa block is that you don't have to worry about divergence in names and metadata. Have you found special advantages of keeping the different genes connected with different taxa blocks? — Wayne

Santiago Ron

May 27, 2025, 4:01:04 AM
to Mesquite Project
Hi Wayne,

Thanks for letting me know about alternative ways of managing multiple matrices using a single taxa block. I am not familiar with those options, but I will look into them if necessary. 
The main reason I prefer to keep different genes connected to different taxa blocks is that it allows me to also use Mesquite as a database for sequence data. For example, it is useful to keep track of GenBank accession numbers for each sequence. If I have a taxa block for each gene, I can include the accession number in the taxon name for each gene. I use the Master taxa block to format the name in a publication-ready format. The autoassign option allowed me to quickly associate taxa across taxa blocks (if they share the same voucher number) without needing identical names across all matrices.
Screenshot 2025-05-27 at 9.52.22 AM.jpg
That is the main reason for using multiple taxa blocks. If there are alternative ways of doing something similar, please let me know.
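
As a rough illustration of the voucher-based matching described above, here is a small Python sketch that runs outside Mesquite. It does not reproduce Mesquite's autoassign logic, which is not described in this thread. The regular expression, the function names `voucher_key` and `auto_assign`, the taxon names, and the `ACC…` placeholders standing in for accession numbers are all assumptions for illustration.

```python
import re

# Assumed voucher format: the collection code "QCAZ" followed by digits,
# with or without a space in between.
VOUCHER = re.compile(r"QCAZ\s*(\d+)")


def voucher_key(name):
    """Return a normalized voucher such as 'QCAZ 23234', or None if absent."""
    match = VOUCHER.search(name)
    return f"QCAZ {match.group(1)}" if match else None


def auto_assign(master_names, block_names):
    """Link each master taxon to the block taxa sharing its voucher number."""
    by_voucher = {}
    for name in block_names:
        key = voucher_key(name)
        if key is not None:
            by_voucher.setdefault(key, []).append(name)
    links = {}
    for master in master_names:
        key = voucher_key(master)
        links[master] = by_voucher.get(key, []) if key is not None else []
    return links


# Invented names: the master block holds publication-style names, while the
# gene blocks embed a placeholder accession in each taxon name.
master = ["Species A QCAZ 23234", "Species B QCAZ 99999"]
block_16s = ["QCAZ23234_16S_ACC0001", "QCAZ99999_16S_ACC0002"]
block_nd1 = ["QCAZ23234_ND1_ACC0003"]

links_16s = auto_assign(master, block_16s)
links_nd1 = auto_assign(master, block_nd1)
print(links_16s)
print(links_nd1)
```

Because matching relies only on the voucher, the names in each gene block can carry extra information without breaking the association.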
One problem with having multiple taxa blocks was that, with a large number of taxa (over 1300), Mesquite 3.x became painfully slow. However, that problem seems to be solved in the new version.

Best wishes,
Santiago

Wayne Maddison

May 28, 2025, 9:45:47 AM
to Mesquite Project
That's a clever solution to maintaining a GenBank number database. There's another way to do it, with a single taxa block, via the Has Data in Matrix column or the GenBank Number column of the List of Taxa window. These are a bit hidden, and they depend on the fact that the NEXUS format can store metadata for a taxon related to each matrix separately. Here is a screenshot of my Sanger data master file:
Screenshot 2025-05-20 at 13.48.34.png

I was just about to start describing how to get here, but I just realized that one of the most useful tools (importing GenBank numbers from a file) is in our private set of modules! Also, since you are the first person to ask about this, very few people have used it, and the user interface needs some improvement. How about this: next week I'll get back to programming, I'll polish the system a bit, and I'll start a new thread here explaining it. - Wayne

Wayne Maddison

Jun 6, 2025, 7:36:29 PM
to Mesquite Project
This discussion motivated us to look into our GenBank number recording system, and then to make some major improvements to it. We're planning to release a beta2 with these improvements within a week. Once it is released I'll come back here to describe it! -- Wayne

Santiago Ron

Jun 7, 2025, 3:34:33 AM
to Mesquite Project
Thanks Wayne for looking into this, sounds very useful.
One more question about using a single block of taxa for multiple matrices. Frequently, not all taxa have sequences for all genes. Is there any way to hide the taxa that do not have data in a given matrix? Having a taxa block for each matrix allows me to do that. I don't know if that is possible when all matrices share the same taxa block.
Many thanks for your help. 

Best wishes,
Santiago

Wayne Maddison

Jun 7, 2025, 11:22:22 AM
to Mesquite Project
No, the matrix shows all of the taxa. Do you find that to be a burden? It's never bothered me, and in fact it reminds me about missing data. But, you may like to optimize different things in your workflow.

For me, having each matrix under a different taxa block would generate so many barriers that seeing blank rows is worth it. Trivially, I can change a species name without having to redo it in every matrix. I assign metadata of various sorts to the taxa, and that plays various roles when working with matrices. For instance, I use taxa groups a lot for sorting, selecting, and selective exporting, but I'd have to set up the groups in the taxa blocks separately for each matrix. With a single taxa block I can have a tree that can interact with all of the matrices.  I use trees for selecting taxa, and a tree can apply to only one taxa block at a time (though, you could copy and paste). For instance, I can decide to export sequences from one matrix only for the selected taxa, and I can have chosen that set of taxa by a clade in a tree.

Wayne Maddison

Jun 8, 2025, 1:04:08 PM
to Mesquite Project
We've released the Import GenBank Numbers feature with the new 4.beta2!  I'll write a separate post describing it.

Santiago Ron

Jul 10, 2025, 12:33:06 PM
to Mesquite Project
Dear Wayne

Many thanks for the detailed explanation of how to add the GenBank accession numbers. It seems very useful for organizing that type of information.
Thank you also for your email explaining why it is better, under some circumstances, to have a single taxa block for several matrices. I can see the utility, especially for selecting taxa based on a tree (I have tried to do that in the past without success).
On the other hand, showing all the taxa is not ideal for me because in some of my matrices only 10% of the taxa have sequences. Also, having a separate taxa block for each matrix helps me visualize which matrices lack data by looking at the table with the taxa associations:

Screenshot 2025-07-10 at 5.14.28 PM.jpg

That option is no longer functional in version 4.x because each column header now shows only "Contained taxa" instead of the matrix name.

Screenshot 2025-07-10 at 5.22.04 PM.jpg

Do you think it would be possible to maintain my workflow under the new version? I found changes in the taxa association functionality. Or maybe it is all still there, but under different menus?

Many thanks for your help. 

Santiago

Wayne Maddison

Jul 10, 2025, 12:51:10 PM
to Mesquite Project
Hi Santiago,
Your table of which taxa have data in which matrix done via Taxa Associations is an unexpected solution! Your whole system of using TaxaAssociations for managing multiple loci is quite interesting. I had not thought of TaxaAssociations being used in this way.

The method designed for keeping track of matrix presence/absence in Mesquite, under the assumption of one taxa block, is the Has Data In Matrix column of the List of Taxa window. You can get columns for all matrices at once by choosing List>Show Columns for all matrices. What you get is this:
Screenshot 2025-07-10 at 09.39.49.png
One advantage of this system is that it displays the GenBank numbers, and you can manage other metadata in those columns.
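
For readers who want to see the presence/absence idea spelled out, here is a minimal Python sketch that runs outside Mesquite. It is not how the Has Data In Matrix column is implemented: the function names `presence_table` and `taxa_with_data`, the treatment of `?` and `-` as "no data", and the example taxa and sequences are assumptions for illustration. Under a single shared list of taxa, it records which taxa have data in which matrix and lists only the taxa with data for a given gene.

```python
# Illustrative sketch only; names and data are hypothetical, not Mesquite's API.

MISSING = "?-"  # assumed symbols for missing data and gaps


def presence_table(taxa, matrices):
    """Map each taxon to {matrix name -> True if it has any real data}."""
    table = {}
    for taxon in taxa:
        table[taxon] = {
            gene: any(ch not in MISSING for ch in rows.get(taxon, ""))
            for gene, rows in matrices.items()
        }
    return table


def taxa_with_data(table, gene):
    """List the taxa that have data in the given matrix, in table order."""
    return [taxon for taxon, row in table.items() if row[gene]]


# Invented example: one shared list of taxa, two gene matrices.
taxa = ["QCAZ 23234", "QCAZ 99999", "QCAZ 88888"]
matrices = {
    "16S": {"QCAZ 23234": "ACGT", "QCAZ 99999": "AC-T", "QCAZ 88888": "????"},
    "ND1": {"QCAZ 23234": "TTG"},
}

table = presence_table(taxa, matrices)
for taxon, row in table.items():
    print(taxon, row)
print(taxa_with_data(table, "ND1"))
```

A filtered list like the one from `taxa_with_data` is one way to get a view restricted to taxa with sequences when a matrix is sparsely filled, though only in an external script, not inside the matrix editor.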

We'll figure out a way to get your column headings back for contained taxa. We're planning to work on the TaxaAssociations management in a couple of months, and our reforms may address your other needs. If not, we'll see what we can do. I can't promise, though, because yours seems to be a unique approach.

-- Wayne

Santiago Ron

Jul 10, 2025, 2:51:03 PM
to Mesquite Project
Thanks, Wayne. Mesquite is super-flexible and lets you tackle the same job in several different ways; that's probably why so many people have gotten so much mileage out of it over the years.

Santiago