Hello,
I have a couple of questions about the latest precomputed markers.
Which CARD release was used for the Antibiotic Resistance Factors linked on the homepage (https://bitbucket.org/biobakery/shortbred/downloads/ShortBRED_CARD_2017_markers.faa.gz)? Was there any filtering or pre-processing done prior to running shortbred_identify?
Is the cluster/marker mapping available, e.g., to map markers to individual CARD sequences?
How was the CARD annotation (ARO numbers) transferred to the markers? For example, does each True Marker (CD-HIT cluster) get all the lowest-level AROs of the member sequences, or is a higher-level ARO that covers all sequences in the cluster assigned to the marker?
What about Junction and Quasi Markers? I noticed some markers have multiple ARO numbers in the marker sequence header.
And a question perhaps better suited for folks familiar with CARD: If one wanted to map ShortBRED markers to a small set of high-level AROs (like GO-Slim), would you recommend "walking up the graph" from the AROs in the marker headers?
Thanks,
Aram
You received this message because you are subscribed to the Google Groups "shortbred-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to shortbred-users+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
When True Markers or Junction Markers are reported, they are associated with a single family among the input sequences, and are identified by one of the headers for that family.
To your last question, if you have broader ABR categories of interest, then I would recommend summing the abundance of individual proteins/families within those categories, following the "is_a" logic of something like GO. I'm not sure if there are accessory files in CARD to facilitate this?
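The summing step could be sketched roughly as follows. This is only an illustration: the term IDs, the `parents` mapping (child to `is_a` parents), and the abundance values are all made up, and `ancestors` and `roll_up` are hypothetical helper names, not part of ShortBRED or CARD.

```python
from collections import defaultdict

def ancestors(term, parents):
    """Return the term plus every term reachable by following is_a edges upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, ()))
    return seen

def roll_up(abundance, parents, categories):
    """Sum per-term abundance into each chosen category that is an ancestor of the term."""
    wanted = set(categories)
    totals = defaultdict(float)
    for term, value in abundance.items():
        for category in ancestors(term, parents) & wanted:
            totals[category] += value
    return dict(totals)

# Hypothetical is_a edges (child -> parents) and abundances, for illustration only.
parents = {
    "ARO:A1": ["ARO:A"],
    "ARO:A2": ["ARO:A"],
    "ARO:B1": ["ARO:B"],
    "ARO:A": ["ARO:ROOT"],
    "ARO:B": ["ARO:ROOT"],
}
abundance = {"ARO:A1": 1.5, "ARO:A2": 2.0, "ARO:B1": 0.25}
totals = roll_up(abundance, parents, ["ARO:A", "ARO:B"])
```

Note that a term with ancestors in two chosen categories contributes to both totals, so the category sums need not add up to the overall abundance.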
Thanks,
Eric
Hi Eric,
Thanks! We should be able to match the identifiers in the marker files to the individual CARD sequences. I see now that the CARD fasta headers include one ARO.
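As a rough sketch of pulling the ARO accessions out of a header, assuming they appear in the form `ARO:` followed by digits: the example headers below are invented placeholders rather than real CARD or ShortBRED entries, and `aro_ids` is a hypothetical helper name.

```python
import re

# Assumption: accessions look like "ARO:" followed by digits.
ARO_PATTERN = re.compile(r"ARO:\d+")

def aro_ids(header):
    """Return the distinct ARO accessions in a FASTA header, in order of appearance."""
    found = []
    for match in ARO_PATTERN.findall(header):
        if match not in found:
            found.append(match)
    return found

# Hypothetical headers, for illustration only.
single = ">gb|XXX00001.1|ARO:9999991|exampleA"
multiple = ">gb|XXX00002.1|ARO:9999992|exampleB|ARO:9999993|exampleC"

single_ids = aro_ids(single)
multiple_ids = aro_ids(multiple)
```

Headers that yield more than one accession would be the ones worth checking by hand.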
> When True Markers or Junction Markers are reported, they are associated with a single family among the input sequences, and are identified by one of the headers for that family.
Is it possible that a single family can include sequences with different AROs? Perhaps unlikely, since sequences with different antibiotic resistance functions would have to be mostly similar in sequence space to be grouped together (e.g., long enzymes with different active sites?). Did you happen to see any such cases when generating markers?
> To your last question, if you have broader ABR categories of interest, then I would recommend summing the abundance of individual proteins/families within those categories, following the "is_a" logic of something like GO. I'm not sure if there are accessory files in CARD to facilitate this?
I think ARO.obo or ARO.owl contain the necessary "is_a" or "subClassOf" relationships.
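For the OBO route, a minimal sketch of collecting the `is_a` edges might look like this. It assumes the file follows the usual OBO stanza layout (`[Term]` blocks with `id:` and `is_a:` lines); the sample text is invented, `parse_is_a` is a hypothetical helper name, and trailing modifiers in braces are not handled.

```python
def parse_is_a(obo_text):
    """Map each [Term] id to the list of its is_a parents from OBO-format text."""
    parents = {}
    current = None
    in_term = False
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are of interest here.
            in_term = (line == "[Term]")
            current = None
        elif in_term and line.startswith("id:"):
            current = line[len("id:"):].strip()
            parents.setdefault(current, [])
        elif in_term and current and line.startswith("is_a:"):
            # Drop the trailing "! name" comment that OBO allows.
            target = line[len("is_a:"):].split("!")[0].strip()
            parents[current].append(target)
    return parents

# Invented OBO fragment, for illustration only.
sample = """\
format-version: 1.2

[Term]
id: ARO:9999991
name: hypothetical parent

[Term]
id: ARO:9999992
name: hypothetical child
is_a: ARO:9999991 ! hypothetical parent

[Typedef]
id: part_of
"""

edges = parse_is_a(sample)
```

The resulting child-to-parents dictionary is the kind of structure one would walk upward from the AROs in the marker headers.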
Best,
Aram
On Monday, August 28, 2017 at 8:13:57 AM UTC-7, Eric Franzosa wrote:
> Thanks,
> Eric