Help me evaluate results from glycresoft


Tung Nguyen

Apr 4, 2025, 11:47:02 PM
to GlycReSoft
Dear Sir/Madam,
Could you explain the score, charge count, isotope_fit, spacing_fit, and line score values in the results? How can I use these output values to identify and remove false positive components? The total signal is the sum of the intensities of all possible adduct forms of that component, right?

Joshua Klein

Apr 9, 2025, 10:35:18 PM
to Tung Nguyen, GlycReSoft
The scores are described in mathematical detail in section S3 of the supplementary materials for the 2018 publication in Bioinformatics at https://academic.oup.com/bioinformatics/article/34/20/3511/5001382. In general, there are three kinds of IDs that can occur with a glycan composition:

1. The glycan is assigned to a chromatographic feature that supports the assignment, with several of these scores approaching 1.0.
2. The glycan is assigned to a chromatographic feature that doesn't support the assignment: the glycan-composition-dependent scores are low, but the chromatographic feature scores like line_score and spacing_fit are high. These may reflect a mismatch between the experimental design and the glycan-dependent features (e.g. your glycans have a charge-boosting tag on them, so GlycReSoft's charge state model doesn't fit anymore), or they may simply be lower-quality assignments. Given that chemists are free to do whatever they like with their samples and I only had a half-dozen samples to learn models from, I can't easily adapt this for new data. These may be real analytes, but if the models hold they are not good fits for the assigned glycans.
3. The glycan is assigned to a low-quality chromatographic feature, with line_score, spacing_fit, and isotope_fit all being less than 0.5. These IDs are likely incorrect in general and will have low scores.
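
As a rough illustration of the three cases above, here is a minimal Python sketch that sorts a set of scores into them. The score names are the ones used in this thread; the `high` cutoff of 0.8, the treatment of `charge_count` as the glycan-dependent score, the "at least three high scores" rule, and the function and label names are all assumptions made for the example, not GlycReSoft's own logic. Only the 0.5 cutoff for case 3 comes from the description above.

```python
# Score names as used in this thread; groupings and cutoffs are illustrative.
FEATURE_SCORES = ("line_score", "spacing_fit")
GLYCAN_SCORES = ("charge_count",)  # assumption: treated as glycan-dependent
LOW_QUALITY_SCORES = ("line_score", "spacing_fit", "isotope_fit")
ALL_SCORES = ("line_score", "spacing_fit", "isotope_fit", "charge_count")


def classify_assignment(scores, high=0.8, low=0.5):
    """Sort one glycan assignment into the three cases described above.

    `scores` maps score name -> value in [0, 1]. `high` is a hypothetical
    stand-in for "approaching 1.0"; `low` is the 0.5 cutoff from case 3.
    """
    # Case 3: the chromatographic feature itself is poor.
    if all(scores[name] < low for name in LOW_QUALITY_SCORES):
        return "low_quality_feature"

    # Case 2: good feature, but the glycan-dependent scores disagree.
    feature_ok = all(scores[name] >= high for name in FEATURE_SCORES)
    glycan_low = all(scores[name] < low for name in GLYCAN_SCORES)
    if feature_ok and glycan_low:
        return "feature_without_glycan_support"

    # Case 1: several scores approach 1.0 (here: at least three of them).
    n_high = sum(1 for name in ALL_SCORES if scores[name] >= high)
    if n_high >= 3:
        return "supported"

    return "ambiguous"


example = {"line_score": 0.95, "spacing_fit": 0.9,
           "isotope_fit": 0.7, "charge_count": 0.2}
print(classify_assignment(example))
```

Assignments that land in "ambiguous" or "feature_without_glycan_support" are the ones worth inspecting by hand rather than discarding automatically.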

I provide the following as a rule of thumb in the supplementary material:
[attached image: image.png]
Suppose, though, that the scores were all completely random, each following a uniform distribution. The sum of five uniform random variables is approximately normally distributed, so I can fall back on the probability of a total score being that large given that its components were random. This is described in supplementary section S10, which provides some intuition and a graphical example.
[attached image: image.png]
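
To make the random-score argument concrete, here is a small Python sketch. It assumes the total score behaves like a sum of five independent Uniform(0, 1) values, and it compares the exact tail probability (the Irwin-Hall distribution) with the normal approximation using mean n/2 and variance n/12. The function names are made up for this example, and the exact procedure in supplementary section S10 may differ.

```python
import math


def irwin_hall_cdf(x, n=5):
    """Exact CDF of the sum of n independent Uniform(0, 1) variables."""
    if x <= 0:
        return 0.0
    if x >= n:
        return 1.0
    total = sum(
        (-1) ** k * math.comb(n, k) * (x - k) ** n
        for k in range(int(math.floor(x)) + 1)
    )
    return total / math.factorial(n)


def random_total_tail(x, n=5):
    """P(total >= x) if all n component scores were uniform random."""
    return 1.0 - irwin_hall_cdf(x, n)


def normal_tail(x, n=5):
    """Normal approximation to the same tail: mean n/2, variance n/12."""
    mean = n / 2
    sd = math.sqrt(n / 12)
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2)))


for total in (2.5, 3.5, 4.0, 4.5):
    print(total, random_total_tail(total), normal_tail(total))
```

Under these assumptions, a total of 4.0 out of 5 would arise from purely random component scores with probability 1/120, a little under 1%, so a large total is unlikely to be a chance result.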

GlycReSoft also attempts to provide an "untargeted" analyte extraction view of the data, in which chromatographic features with no known glycan are still scored. These features are scored separately under an assumed average model, but the same interpretation as above applies to them.

Yes, the total signal is the sum of intensities over time over all adducts of the glycan composition. The per-adduct total intensity is also given. This isn't the same as the integrated abundance, which I think is exported separately.
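
A minimal Python sketch of that distinction, using a made-up data layout (a dict mapping each adduct name to a list of `(time, intensity)` points). The layout, adduct names, and function names are hypothetical and not GlycReSoft's export format, and the trapezoidal area is only one common way to define an integrated abundance, shown here for contrast rather than as GlycReSoft's actual calculation.

```python
def per_adduct_signal(traces):
    """Sum of intensities over time for each adduct separately."""
    return {
        adduct: sum(intensity for _, intensity in points)
        for adduct, points in traces.items()
    }


def total_signal(traces):
    """Sum of intensities over time over all adducts of one composition."""
    return sum(per_adduct_signal(traces).values())


def trapezoid_area(points):
    """Area under one intensity-vs-time trace (assumed definition)."""
    area = 0.0
    for (t0, i0), (t1, i1) in zip(points, points[1:]):
        area += (t1 - t0) * (i0 + i1) / 2
    return area


# Hypothetical traces for one glycan composition.
traces = {
    "unmodified": [(10.0, 100.0), (10.5, 300.0), (11.0, 200.0)],
    "ammonium": [(10.0, 50.0), (10.5, 150.0)],
}

print(per_adduct_signal(traces))
print(total_signal(traces))
print(trapezoid_area(traces["unmodified"]))
```

The summed signal depends on how many scans cover the peak, while an area weights each point by the time between scans, which is why the two numbers are not interchangeable.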

If this wasn't a clear enough answer I can try to go into more detail.




--
Joshua Klein
