seqrepo release


Reece Hart

Apr 13, 2020, 1:45:34 AM
to hgvs-discuss
Hi All-

biocommons.seqrepo 0.5.5 was released yesterday and there's a new data release, 2020-04-13, as of today.

This release fixes several bugs in FASTA header parsing that made some sequences appear to be missing from seqrepo.

The new release has these changes:
  • Some seqrepo aliases included the sequence description from the fasta source. These were fixed.
  • Ensembl releases <= 84 were dropped.
  • The remaining Ensembl-nn namespaces were collapsed into the Ensembl namespace.
  • The gi and genbank namespaces were dropped.
  • RefSeq updates since Jan 2019 were loaded/reloaded.
  • Ensembl sequences from releases 90-99 were loaded (into the Ensembl namespace).
  • Japanese Reference Genome v1 and v2 were added.
As a byproduct of removing sequence descriptions and redundant aliases, the aliases database shrank by about 1.5 GB (~13% of the release size).
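If you have stored namespaced aliases from an older data release, the namespace changes above might be applied with something like the sketch below. This is not part of seqrepo: `migrate_alias` is a hypothetical helper, the `namespace:alias` string form is an assumption, and the identifier in the usage note is a placeholder. It also assumes that an alias in a dropped namespace simply has no replacement, which may be too strict if the same Ensembl accession was also loaded from a later release.

```python
import re
from typing import Optional

# Assumption: older data releases used per-release namespaces written
# "Ensembl-NN", and aliases are stored as "namespace:alias" strings.
_ENSEMBL_VERSIONED = re.compile(r"^Ensembl-(\d+)$")
_DROPPED_NAMESPACES = {"gi", "genbank"}


def migrate_alias(namespaced_alias: str) -> Optional[str]:
    """Hypothetical helper: map an old-style alias to its 2020-04-13 form.

    Returns None when the namespace was dropped in this data release.
    """
    namespace, sep, alias = namespaced_alias.partition(":")
    if not sep:
        raise ValueError(f"expected 'namespace:alias', got {namespaced_alias!r}")

    # The gi and genbank namespaces were dropped outright.
    if namespace in _DROPPED_NAMESPACES:
        return None

    match = _ENSEMBL_VERSIONED.match(namespace)
    if match:
        # Ensembl releases <= 84 were dropped.
        if int(match.group(1)) <= 84:
            return None
        # Remaining Ensembl-NN namespaces collapse to plain "Ensembl".
        return f"Ensembl:{alias}"

    # Any other namespace is passed through unchanged.
    return namespaced_alias
```

For example, `migrate_alias("Ensembl-85:ENST00000000001.1")` yields `"Ensembl:ENST00000000001.1"`, while `migrate_alias("gi:12345")` yields `None`.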

-Reece
