Some documentation for the "Predict" feature

alon_...@alumni.brown.edu

Dec 19, 2019, 3:09:48 PM
to HLAthena
Hello!

I hope all is well!

Thanks so much for making your tool publicly available in such an easy-to-use manner!

I had a quick question for you:

Below is an example output for the inputs C0303 as the allele and KAFVYDPLL as the peptide:

 seq len model_C0303 MSi_C0303 pRank.MSi_C0303 best.MSi best.MSi_allele
1 KAFVYDPLL   9    specific 0.4259282        3.577604 3.577604           C0303
  assign.MSi_ranks assign.MSi_allele
1               NA           unknown

Could you please provide definitions for each of the output columns? I can guess from your publication what each of these means, but it would be useful to have confirmation! Thanks a lot!

Alon

alon_...@alumni.brown.edu

Dec 19, 2019, 3:28:56 PM
to HLAthena
Also it would be very helpful to get some clarity on the meanings of the other fields (e.g. "Context available?", "Aggregate by peptide?", input format requirements for expression values). Thanks again for your help!

alon_...@alumni.brown.edu

Dec 19, 2019, 3:35:53 PM
to HLAthena
In addition, would you be able to clarify whether expression values affect the output scores?

For example, below I have entered a range of expression values that all seem to produce the same scores:

         seq  expr len model_A0201 MSi_A0201 model_A0203   MSi_A0203
1  KAFVYDPLL 1e-18   9    specific  0.120626    specific 0.002890745
2  KAFVYDPLL 1e-09   9    specific  0.120626    specific 0.002890745
3  KAFVYDPLL 1e-05   9    specific  0.120626    specific 0.002890745
4  KAFVYDPLL 1e-04   9    specific  0.120626    specific 0.002890745
5  KAFVYDPLL 1e-03   9    specific  0.120626    specific 0.002890745
6  KAFVYDPLL 1e-02   9    specific  0.120626    specific 0.002890745
7  KAFVYDPLL 1e-01   9    specific  0.120626    specific 0.002890745
8  KAFVYDPLL 1e+00   9    specific  0.120626    specific 0.002890745
9  KAFVYDPLL 1e+01   9    specific  0.120626    specific 0.002890745
10 KAFVYDPLL 1e+02   9    specific  0.120626    specific 0.002890745
11 KAFVYDPLL 1e+03   9    specific  0.120626    specific 0.002890745
12 KAFVYDPLL 1e+04   9    specific  0.120626    specific 0.002890745
13 KAFVYDPLL 1e+08   9    specific  0.120626    specific 0.002890745
14 KAFVYDPLL 1e+15   9    specific  0.120626    specific 0.002890745
15 KAFVYDPLL 1e+26   9    specific  0.120626    specific 0.002890745
16 KAFVYDPLL 1e+43   9    specific  0.120626    specific 0.002890745
   model_C0303 MSi_C0303 pRank.MSi_A0201 pRank.MSi_A0203 pRank.MSi_C0303
1     specific 0.4259282        6.963677        36.91725        3.577604
2     specific 0.4259282        6.963677        36.91725        3.577604
3     specific 0.4259282        6.963677        36.91725        3.577604
4     specific 0.4259282        6.963677        36.91725        3.577604
5     specific 0.4259282        6.963677        36.91725        3.577604
6     specific 0.4259282        6.963677        36.91725        3.577604
7     specific 0.4259282        6.963677        36.91725        3.577604
8     specific 0.4259282        6.963677        36.91725        3.577604
9     specific 0.4259282        6.963677        36.91725        3.577604
10    specific 0.4259282        6.963677        36.91725        3.577604
11    specific 0.4259282        6.963677        36.91725        3.577604
12    specific 0.4259282        6.963677        36.91725        3.577604
13    specific 0.4259282        6.963677        36.91725        3.577604
14    specific 0.4259282        6.963677        36.91725        3.577604
15    specific 0.4259282        6.963677        36.91725        3.577604
16    specific 0.4259282        6.963677        36.91725        3.577604
   best.MSi best.MSi_allele assign.MSi_ranks assign.MSi_allele
1  3.577604           C0303               NA           unknown
2  3.577604           C0303               NA           unknown
3  3.577604           C0303               NA           unknown
4  3.577604           C0303               NA           unknown
5  3.577604           C0303               NA           unknown
6  3.577604           C0303               NA           unknown
7  3.577604           C0303               NA           unknown
8  3.577604           C0303               NA           unknown
9  3.577604           C0303               NA           unknown
10 3.577604           C0303               NA           unknown
11 3.577604           C0303               NA           unknown
12 3.577604           C0303               NA           unknown
13 3.577604           C0303               NA           unknown
14 3.577604           C0303               NA           unknown
15 3.577604           C0303               NA           unknown
16 3.577604           C0303               NA           unknown

Thanks a lot!

Alon

Sisi Sarkizova

Dec 19, 2019, 4:06:14 PM
to alon_...@alumni.brown.edu, HLAthena
Detailed explanations of inputs and outputs coming shortly, sorry for the omission!

Sisi 

--
You received this message because you are subscribed to the Google Groups "HLAthena" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hlathena+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/hlathena/0e21b532-e5e5-4cbe-990d-af51bcb10676%40googlegroups.com.

Sisi Sarkizova

Dec 19, 2019, 5:29:08 PM
to Alon Galor, HLAthena
Hi Alon, 

Would you please take a look at the descriptions of inputs and outputs under Predict -> How to and let me know if this helps answer your questions?

Regarding predictions with expression - looking into this. 

Thanks,
Sisi

alon_...@alumni.brown.edu

Dec 19, 2019, 6:13:56 PM
to HLAthena
Thanks for adding these very clear docs so quickly, Sisi!

Another question comes to mind now that you've added information regarding upstream and downstream sequences: do you have a preferred or recommended tool for generating them? I ask here because the answer may be relevant to others on this forum and to other users of your tool, and it could be a useful addition to the help page you've created.

Also looking forward to hearing about predictions with expression!

Alon

Sisi Sarkizova

Dec 19, 2019, 8:49:09 PM
to Alon Galor, HLAthena
Hi Alon, glad descriptions were clear.

Re predictions with expression - you've actually stumbled upon a case that we have omitted, that is, predictions when expression is available but context is not. While we can compute log-likelihood scores for this case, we are missing the percentile rank component. I will work on fully supporting this scenario, but it will take a few days to generate the background distributions necessary to compute ranks. In the meantime, you can still run predictions with expression by setting 'Assign peptides to alleles by:' to 'scores'. Make sure to change the threshold accordingly (scores closer to 1 are good). I will send another update once ranks are available.
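
A minimal Python sketch of what score-based assignment could look like, for illustration only. The function name `assign_by_score` and both threshold values are hypothetical, and this is not HLAthena's actual implementation; it only shows the idea of keeping the best-scoring allele when its score clears a threshold, with higher scores being better. The example scores are the MSi values for KAFVYDPLL quoted earlier in this thread.

```python
from typing import Dict, Tuple


def assign_by_score(scores: Dict[str, float], threshold: float) -> Tuple[str, float]:
    """Return (allele, best_score); allele is 'unknown' if no score passes.

    Assumption: higher scores are better (closer to 1 is good).
    """
    if not scores:
        raise ValueError("need at least one allele score")
    best_allele = max(scores, key=scores.get)
    best_score = scores[best_allele]
    if best_score >= threshold:
        return best_allele, best_score
    return "unknown", best_score


# Per-allele scores for KAFVYDPLL, copied from the output shown in this thread.
msi = {"A0201": 0.120626, "A0203": 0.002890745, "C0303": 0.4259282}

# Hypothetical thresholds: a strict one and a lenient one.
print(assign_by_score(msi, threshold=0.9))
print(assign_by_score(msi, threshold=0.4))
```

With the strict threshold no allele passes, so the peptide stays unassigned; with the lenient one it is assigned to C0303, the best-scoring allele.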

Re finding context sequences - the approach I'd suggest would be to match the peptide sequences against a FASTA file containing the corresponding protein sequences. This will be different based on the organism (e.g. human, viral, etc.) or reference (e.g. UCSC, Ensembl, UniProt, etc.) used, and we don't yet have an automated way of supporting this task. Alternatively, if you are interested in all possible peptides from a few particular protein sequences, you can use the 'fasta input format' approach, which will tile peptides and context sequences. The limitation there is that expression values are hard to encode within the FASTA format, but that can be done in a secondary step.
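
Both suggestions above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not HLAthena code: the helper names (`parse_fasta`, `find_context`, `tile_peptides`), the flank length of 5, and the `-` padding character are all hypothetical, so check the Predict -> How to page for the flank length and padding the tool actually expects.

```python
from typing import Dict, Iterator, List, Tuple


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-formatted text into {header: sequence}."""
    records: Dict[str, str] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


def flanks(seq: str, start: int, end: int, flank: int, pad: str) -> Tuple[str, str]:
    """Upstream/downstream residues around seq[start:end], padded at protein ends."""
    up = seq[max(0, start - flank):start].rjust(flank, pad)
    down = seq[end:end + flank].ljust(flank, pad)
    return up, down


def find_context(peptide: str, proteins: Dict[str, str],
                 flank: int = 5, pad: str = "-") -> List[Tuple[str, int, str, str]]:
    """Find every exact occurrence of peptide; return (protein, start, up, down)."""
    hits = []
    for name, seq in proteins.items():
        start = seq.find(peptide)
        while start != -1:
            up, down = flanks(seq, start, start + len(peptide), flank, pad)
            hits.append((name, start, up, down))
            start = seq.find(peptide, start + 1)
    return hits


def tile_peptides(seq: str, length: int = 9,
                  flank: int = 5, pad: str = "-") -> Iterator[Tuple[str, str, str]]:
    """Yield (peptide, up, down) for every window of the given length."""
    for start in range(len(seq) - length + 1):
        end = start + length
        up, down = flanks(seq, start, end, flank, pad)
        yield seq[start:end], up, down


# Toy protein, invented for this example.
proteins = parse_fasta(">protA\nMKTAYKAFVY\nDPLLGGSW\n")
print(find_context("KAFVYDPLL", proteins))
print(next(tile_peptides(proteins["protA"])))
```

Exact string matching is the simplest choice here; a peptide that occurs in several proteins yields several hits, and deciding which one to keep (or how to attach expression values afterwards) is left to the secondary step mentioned above.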

Thanks for the feedback!
Sisi



alon_...@alumni.brown.edu

Dec 20, 2019, 11:27:21 AM
to HLAthena
Hi Sisi,

Thanks for clarifying re predictions with expression! Looking forward to the availability of the percentile rank component!

Thank you for your suggestions re finding context sequences - these are both very helpful!

Very much appreciate your hard work on this,

Alon

alon_...@alumni.brown.edu

Jan 2, 2020, 2:07:45 PM
to HLAthena
Hi Sisi,

Hope all is well!

While you're working on making percentile ranks available when expression is provided, is there a simple conversion formula that can be employed to convert a score to a percentile rank?

For example, if my MSiE_B1501 is 0.9987547, is there a way I could convert this to a percentile rank?

Thanks! 

Alon

Sisi Sarkizova

Jan 2, 2020, 3:15:02 PM
to Alon Galor, HLAthena
Hi Alon, 

The percentile ranks are computed based on the distribution of scores for each allele, so unfortunately there isn't a straightforward conversion formula. Distributions over scores for the MSiE models have already been computed, so this will be resolved next week!
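
To illustrate why no single formula exists, here is a minimal Python sketch of an empirical percentile rank. It assumes the rank is the percentage of background scores that are at least as high as the query score (so lower ranks are better, consistent with the outputs quoted earlier in this thread); the exact definition HLAthena uses is not confirmed here. The function name `percentile_rank` and both background lists are made up for the example.

```python
from bisect import bisect_left
from typing import Sequence


def percentile_rank(score: float, background_sorted: Sequence[float]) -> float:
    """Percent of background scores >= score (assumed convention: lower is better).

    background_sorted must be sorted in ascending order.
    """
    n = len(background_sorted)
    if n == 0:
        raise ValueError("background distribution is empty")
    n_below = bisect_left(background_sorted, score)  # scores strictly below
    return 100.0 * (n - n_below) / n


# Two invented per-allele background distributions (ascending order).
background_a = [0.01, 0.02, 0.05, 0.1, 0.2, 0.4, 0.6, 0.8, 0.9, 0.99]
background_b = [0.001, 0.002, 0.003, 0.005, 0.01, 0.02, 0.05, 0.1, 0.3, 0.7]

# The same score maps to different ranks under different backgrounds.
print(percentile_rank(0.5, background_a))
print(percentile_rank(0.5, background_b))
```

Because each allele has its own background distribution, the same score can correspond to very different percentile ranks, which is why the rank has to be looked up rather than computed from the score alone.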

Thanks,
Sisi







alon_...@alumni.brown.edu

Jan 2, 2020, 4:06:57 PM
to HLAthena
Thanks Sisi! By the way, do you also have plans to release a pan-allele predictor?

Sisi Sarkizova

Jan 2, 2020, 4:10:54 PM
to Alon Galor, HLAthena
Yes! Pan-allele models are already in use for some non-9-mer predictions and will be available for 2000+ HLA allele sequences from IMGT.


Sisi Sarkizova

Jan 9, 2020, 3:01:29 PM
to Alon Galor, HLAthena
Hi Alon, 

Percentile ranks for the expression-only case should be working now. Please let me know if you run into trouble.
Sisi
