Maximum Number of Records for Genbank?

Nathan Lemoine

Jun 28, 2014, 10:12:42 PM
to dendrop...@googlegroups.com
I'm using the GenBank interoperability functions to download sequences from GenBank. I appear to have hit a limit on how many records can be downloaded at once (somewhere right around 550 records). It's not a problem with the accession IDs themselves, because I can download smaller subsets of records no matter how I split them up. But once I get to 550 or 551 records, the server returns an HTML error page instead of XML, and parsing it fails:

<HTML><HEAD>
<TITLE>Request Error</TITLE>
</HEAD>
<BODY>
<FONT face="Helvetica">
<big><strong></strong></big><BR>
</FONT>
<blockquote>
<TABLE border=0 cellPadding=1 width="80%">
<TR><TD>
<FONT face="Helvetica">
<big>Request Error (invalid_request)</big>
<BR>
<BR>
</FONT>
</TD></TR>
<TR><TD>
<FONT face="Helvetica">
Your request could not be processed. Request could not be handled
</FONT>
</TD></TR>
<TR><TD>
<FONT face="Helvetica">
This could be caused by a misconfiguration, or possibly a malformed request.
</FONT>
</TD></TR>
<TR><TD>
<FONT face="Helvetica" SIZE=2>
<BR>
For assistance, contact your network support team.
</FONT>
</TD></TR>
</TABLE>
</blockquote>
</FONT>
</BODY></HTML>
Traceback (most recent call last):
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2827, in run_code
    exec code_obj in self.user_global_ns, self.user_ns
  File "<ipython-input-124-41b75487b2bc>", line 1, in <module>
    lep_seq = genbank.GenBankDna(ids = accGuide['COI_Accession'])
  File "/Library/Python/2.7/site-packages/dendropy/interop/genbank.py", line 365, in __init__
    email=email)
  File "/Library/Python/2.7/site-packages/dendropy/interop/genbank.py", line 348, in __init__
    email=email)
  File "/Library/Python/2.7/site-packages/dendropy/interop/genbank.py", line 130, in __init__
    verify=verify)
  File "/Library/Python/2.7/site-packages/dendropy/interop/genbank.py", line 202, in acquire
    gb_recs = GenBankResourceStore.parse_xml(string=xml_string)
  File "/Library/Python/2.7/site-packages/dendropy/interop/genbank.py", line 59, in parse_xml
    root = ElementTree.fromstring(s)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1300, in XML
    parser.feed(text)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1642, in feed
    self._raiseerror(v)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1506, in _raiseerror
    raise err
ParseError: mismatched tag: line 7, column 2

It's not a huge problem, because I think I can always split the record downloads up into smaller bits and use the acquire method, but I was just wondering if this is a real cap or if I'm doing something wrong.
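
The splitting workaround described above could be sketched roughly as follows. This is only an illustration: `batched` and `fetch_in_batches` are hypothetical helper names, not part of DendroPy, and the batch size of 200 is an arbitrary assumption chosen to stay well under the ~550 records where the failure appeared.

```python
def batched(ids, batch_size=200):
    """Yield successive lists of at most batch_size IDs (hypothetical helper)."""
    ids = list(ids)
    for start in range(0, len(ids), batch_size):
        yield ids[start:start + batch_size]


def fetch_in_batches(ids, fetch, batch_size=200):
    """Call fetch(batch) once per batch and return the results in order.

    `fetch` is any callable that accepts a list of IDs; it is passed in
    so that this sketch does not assume a particular download API.
    """
    return [fetch(batch) for batch in batched(ids, batch_size)]
```

With the call from the traceback, usage might look like `fetch_in_batches(accGuide['COI_Accession'], lambda b: genbank.GenBankDna(ids=b))`, which would give one result object per batch; how to combine those objects afterwards is left open here.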

Nate

Jeet Sukumaran

Jun 29, 2014, 11:09:10 AM
to dendrop...@googlegroups.com
We certainly have not implemented any logic to limit the number of
records on the local (DendroPy) side of things. If this is indeed what
is happening, then it must be an issue with the query protocol (we use
`efetch.fcgi`).
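
One possible explanation, offered as an assumption rather than a diagnosis: a long ID list sent in a GET query string can produce a URL that a proxy or server rejects, and NCBI's E-utilities documentation recommends HTTP POST once a request carries more than about 200 IDs. The sketch below builds (but does not send) such a POST request with the standard library. It is not how DendroPy issues the query; `build_efetch_post` is a hypothetical name, and the `db`, `rettype`, and `retmode` values are assumed defaults.

```python
from urllib.parse import urlencode
from urllib.request import Request

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_post(ids, db="nucleotide", rettype="gb", retmode="xml", email=None):
    """Build an efetch request that carries the ID list in the POST body.

    The parameter values are assumptions for illustration; adjust them to
    match the query you actually need.
    """
    params = {
        "db": db,
        "id": ",".join(ids),
        "rettype": rettype,
        "retmode": retmode,
    }
    if email:
        params["email"] = email
    # Supplying `data` makes urllib issue a POST, so the URL itself stays short.
    return Request(EFETCH_URL, data=urlencode(params).encode("ascii"))
```

The returned object could then be passed to `urllib.request.urlopen` to perform the download.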

Thank you for pointing this out: I will put it on the list of things to
fix. As we have noted before, though, we are in the middle of getting
DendroPy 4 ready for release, so, assuming that this issue can be fixed,
the fix will probably only happen after that (unless I can squeeze it in
sometime in between). Apologies for the delay.

-- jeet

