Sequence cluster files now per-polymer entity, instead of per-chain

Jose Duarte

Apr 12, 2022, 1:21:57 PM
to APIs @ RCSB PDB

RCSB.org has introduced new files containing the results of the weekly clustering of PDB protein sequences by MMseqs2 at 30%, 40%, 50%, 70%, 90%, 95%, and 100% sequence identity. These files use polymer entity identifiers instead of chain identifiers, which avoids redundancy because all chains of an entity share the same sequence. Each file is plain text with one cluster per line, sorted from largest cluster to smallest.
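
For anyone adapting a script to the new layout, here is a minimal Python sketch of reading a file with one cluster per line. It assumes that the identifiers on a line are separated by whitespace and look like `<PDB ID>_<entity ID>`; neither detail is stated above, so check a downloaded file first. The function names and the sample identifiers are placeholders made up for illustration, not real cluster data.

```python
from typing import Dict, Iterable, List


def parse_clusters(lines: Iterable[str]) -> List[List[str]]:
    """Return one list of identifiers per non-empty line.

    Assumption: identifiers within a line are whitespace-separated.
    """
    clusters = []
    for line in lines:
        members = line.split()
        if members:  # skip blank lines
            clusters.append(members)
    return clusters


def index_by_entity(clusters: List[List[str]]) -> Dict[str, int]:
    """Map each identifier to the zero-based index of its cluster."""
    return {
        entity: i
        for i, cluster in enumerate(clusters)
        for entity in cluster
    }


# Made-up placeholder identifiers in the assumed <PDB ID>_<entity ID> form,
# listed largest cluster first as described in the announcement.
sample = """\
1ABC_1 2DEF_1 3GHI_2
4JKL_1 5MNO_1
6PQR_3
"""

clusters = parse_clusters(sample.splitlines())
lookup = index_by_entity(clusters)
```

With a real file, passing the open file object to `parse_clusters` works the same way, since it only needs an iterable of lines.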

Files containing chain-based clustering will be updated only until April 12, 2022. Users should migrate to the new entity-based files as soon as possible.

This change enables more efficient delivery of sequence clustering data.

Jose
