Hi Vartika,
The UniProt parser should work with IDs containing a hyphen. Your code works for me without any changes:
In [1]: from pyteomics import fasta
In [2]: human_fasta = fasta.IndexedUniProt("HUMAN.fasta")
In [3]: human_fasta["Q86U42-2"]
Out[3]: Protein(description={'db': 'sp', 'id': 'Q86U42-2', 'entry': 'PABP2_HUMAN', 'name': 'Isoform 2 of Polyadenylate-binding protein 2', 'gene_id': 'PABP2', 'taxon': 'HUMAN', 'OS': 'Homo sapiens', 'GN': 'PABPN1'}, sequence='MAAAAAAAAAAGAAGGRGSGPGRRRHLVPGAGGEAGEGAPGGAGDYGNGLESEELEPEELLLEPEPEPEPEEEPPRPRAPPGAPGPGPGSGAPGSQEEEEEPGLVEGDPGDGAIEDPELEAIKARVREMEEEAEKLKELQNEVEKQMNMSPPPGNAGPVIMSIEEKMEADARSIYVGNVDYGATAEELEAHFHGCGSVNRVTILCDKFSGHPKGFAYIEFSDKESVRTSLALDESLFRGRQIKVIPKRTNRPGISTTDRGFPRARYRARTTNYNSSRSRFYSGFNSRPRGRVYRSG')
This is with an old file in which this entry actually exists. The isoform doesn't seem to exist in UniProt right now, so with a freshly downloaded database I get a KeyError, too (and grep confirms the ID really isn't in the file).
Can you show what the entry looks like in your file? Or could you share a copy of the file that reproduces the problem? It doesn't have to be the full database; an excerpt is enough.
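If it helps, here is a minimal sketch in plain Python (no pyteomics needed) for checking which headers in your file mention the accession and for saving those entries as a small excerpt. The function names `find_entries` and `write_excerpt` and the output file name are made up for illustration, and the match is a simple substring search on the header line, much like grep would do:

```python
def find_entries(fasta_path, accession):
    """Yield (header, sequence) for FASTA entries whose header contains `accession`.

    This is a plain substring match on the header line, so "Q86U42"
    matches both the canonical entry and isoforms such as "Q86U42-2".
    """
    header, chunks = None, []
    with open(fasta_path) as f:
        for line in f:
            line = line.rstrip("\n")
            if line.startswith(">"):
                # A new header starts: emit the previous entry if it matched.
                if header is not None and accession in header:
                    yield header, "".join(chunks)
                header, chunks = line, []
            elif header is not None:
                chunks.append(line.strip())
    # Emit the last entry in the file if it matched.
    if header is not None and accession in header:
        yield header, "".join(chunks)


def write_excerpt(fasta_path, accession, out_path):
    """Write all matching entries to `out_path`; return how many were written."""
    count = 0
    with open(out_path, "w") as out:
        for header, sequence in find_entries(fasta_path, accession):
            out.write(header + "\n" + sequence + "\n")
            count += 1
    return count


# Hypothetical usage (file names are assumptions):
# n = write_excerpt("HUMAN.fasta", "Q86U42", "excerpt.fasta")
# print(n, "matching entries written")
```

If this writes zero entries for "Q86U42-2", the isoform is simply not in your file, and the KeyError is expected. Otherwise the resulting excerpt file is exactly what I would need to reproduce the problem.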
Best regards,
Lev
Lev Levitsky
Institute for Energy Problems of Chemical Physics RAS
Laboratory of Physical and Chemical Methods for Structure Analysis
Leninsky pr. 38, bld. 2, 119334 Moscow, Russia
tel: +7 499 1378257
fax: +7 499 1378257, +7 499 1378258