Number of input and output rows does not match

Gary Chan

Aug 8, 2022, 8:41:34 PM

to HLAthena

Hi all,

I have been using the HLAthena "predict" function to score a list of ~24,000 peptides (pep), each with its own ctex_up and ctex_dn, against 5 different HLA alleles. As the website only works with 10,000 peptides at a time, I split my file into files of 9,000 rows each and used the settings below.

Assign peptides to alleles by: scores

Threshold: 0.1

Peptide column name: pep

Log-transform expr? no

Context available? yes

Aggregate by peptide? no

However, the output files consistently contain fewer rows than the input, suggesting that some of the rows were discarded.

I have tried changing the "Aggregate by peptide?" option from no to yes, reducing the number of rows to 5,000 per file, and deleting all columns except "pep", "ctex_up" and "ctex_dn", but the problem persists.

Some of the pep sequences are duplicated, but the duplicates have different ctex_up or ctex_dn values.

I have gone through the "How to" page and cannot figure out what I did wrong, so I would greatly appreciate any suggestions.

Thanks a lot!

Gary

Gary Chan

Aug 8, 2022, 10:31:01 PM

to HLAthena

oops, nvm, sorry!

I found that some of the pep or ctex sequences contained "?", so HLAthena dropped those rows. The problem is fixed now =)
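
For anyone hitting the same mismatch, a quick pre-upload check along these lines may help locate the offending rows. This is only a minimal sketch, not part of HLAthena: it assumes a tab-separated input with the column names from this thread (pep, ctex_up, ctex_dn) and assumes that only the 20 standard amino acid letters are accepted. The thread only confirms that "?" causes rows to be dropped, so adjust the allowed alphabet and delimiter to your own files.

```python
import csv
import io

# Assumption: only the 20 standard amino acid letters are accepted.
# The thread only confirms that "?" leads to dropped rows.
ALLOWED = set("ACDEFGHIKLMNPQRSTVWY")
COLUMNS = ("pep", "ctex_up", "ctex_dn")


def find_invalid_rows(handle, columns=COLUMNS, allowed=ALLOWED, delimiter="\t"):
    """Return (line_number, column, value) for each cell with an unexpected character."""
    problems = []
    reader = csv.DictReader(handle, delimiter=delimiter)
    # Line 1 of the file is the header, so data rows start at line 2.
    for line_number, row in enumerate(reader, start=2):
        for column in columns:
            value = row.get(column) or ""
            if set(value) - allowed:
                problems.append((line_number, column, value))
    return problems


# Small in-memory example; the sequences are made up for illustration.
sample = (
    "pep\tctex_up\tctex_dn\n"
    "SIINFEKL\tAAAA\tCCCC\n"
    "SIIN?EKL\tAAAA\tCCCC\n"
    "SIINFEKL\tAA?A\tCCCC\n"
)
problems = find_invalid_rows(io.StringIO(sample))
for problem in problems:
    print(problem)
```

To run it on a real file, pass an open file handle instead of the in-memory sample, for example `find_invalid_rows(open("peptides.tsv", newline=""))`, where `peptides.tsv` is a hypothetical file name.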