By varying start and max_results a little, I've found that
I can only retrieve results up to 49945; that is, any value of max_results
greater than one for the query below returns no results.
On Thu, Aug 25, 2011 at 2:51 PM, Paulo S. <prssoar....@gmail.com> wrote:
> Using the API, I could notice that the total results for astro-ph
> category is 105380 as presented below:
> --
> You received this message because you are subscribed to the Google Groups "arXiv api" group.
> To post to this group, send email to arxiv-api@googlegroups.com.
> To unsubscribe from this group, send email to arxiv-api+unsubscribe@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/arxiv-api?hl=en.
I didn't understand your answer. Is it possible to retrieve
more than 50000 articles by using time slices? I'm asking because
I'm retrieving 10 articles per iteration (waiting 3 seconds after each
one) and even this way I can't retrieve more than 50000. Is that
right? I've read some papers that used the API to build
collaboration networks by category, e.g. astro-ph, using all the
articles from that category, which is exactly what I'm trying to do to
carry out some experiments.
Cheers
Paulo S.
On Aug 26, 7:09 am, Thorsten S <thorsten.schwan...@gmail.com> wrote:
> The maximum number of returned search results is limited to 50000 for
> practical reasons.
> We recommend using time slices for searches that are too broad.
> Cheers
> T.
Sorry for being so insistent. The thing is that I really need to do
this as soon as possible. Can anyone explain to me how I can retrieve all
the articles from a specific category?
Thanks,
Paulo S.
The initial search is cached, and you are retrieving subsets of that search. No new search is performed when you subsequently request subsets of it. The total number of results for the lookup is therefore limited to 50000, so your approach will not work; it is also not the intended use of the API.
If you modify your original search to use time slices, then those individual lookups are separate searches and therefore you can cover the entire search space by using contiguous time slices of adequate size.
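To make the time-slice idea concrete, here is a minimal Python sketch that cuts a date range into contiguous slices and builds one query URL per slice. It rests on assumptions that are not stated in this thread: that the endpoint is http://export.arxiv.org/api/query and that the search query accepts a `submittedDate:[YYYYMMDDHHMM TO YYYYMMDDHHMM]` range filter. The helper names `time_slices` and `slice_url` are made up for illustration.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

# Assumption: query endpoint of the arXiv API (not given in this thread).
API_URL = "http://export.arxiv.org/api/query"


def time_slices(first, last, days):
    """Yield contiguous, non-overlapping (start, end) dates covering first..last."""
    start = first
    while start <= last:
        # Each slice spans `days` calendar days, clipped to the overall range.
        end = min(start + timedelta(days=days - 1), last)
        yield start, end
        start = end + timedelta(days=1)


def slice_url(category, start, end, max_results=1000):
    """Build a search URL restricted to one time slice (hypothetical helper)."""
    # Assumption: submittedDate range filter with minute resolution.
    window = f"[{start:%Y%m%d}0000 TO {end:%Y%m%d}2359]"
    query = f"cat:{category} AND submittedDate:{window}"
    params = {"search_query": query, "start": 0, "max_results": max_results}
    return f"{API_URL}?{urlencode(params)}"


if __name__ == "__main__":
    for start, end in time_slices(date(2009, 1, 1), date(2009, 3, 31), 30):
        print(slice_url("astro-ph.*", start, end))
```

Each URL is a separate search, so the 50000 limit applies per slice rather than to the whole category. Within a slice you would still page with start and max_results and pause between requests.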
However, a better approach to retrieving an entire archive's content is to query the corresponding set via the OAI-PMH interface http://arxiv.org/help/oa/index .
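For the OAI-PMH route, a harvest is a sequence of ListRecords requests chained by resumption tokens. The sketch below only builds the request URLs and extracts the token from a response; it does not fetch anything. The base URL http://export.arxiv.org/oai2 and the set name `physics:astro-ph` are assumptions on my part (check the help page above for the current values), and `list_records_url` and `next_token` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumption: OAI-PMH base URL for arXiv (not given in this thread).
OAI_URL = "http://export.arxiv.org/oai2"
# Namespace of the OAI-PMH 2.0 response envelope.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(set_spec=None, metadata_prefix="oai_dc", token=None):
    """Build a ListRecords request URL (hypothetical helper)."""
    if token:
        # In OAI-PMH, resumptionToken is an exclusive argument:
        # it must not be combined with set or metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return f"{OAI_URL}?{urlencode(params)}"


def next_token(response_xml):
    """Return the resumption token from a response, or None when the list is complete."""
    root = ET.fromstring(response_xml)
    element = root.find(f".//{OAI_NS}resumptionToken")
    if element is None or not (element.text or "").strip():
        return None
    return element.text.strip()


if __name__ == "__main__":
    # Assumption: "physics:astro-ph" is the set name for astro-ph.
    print(list_records_url(set_spec="physics:astro-ph"))
```

A harvester would request the first URL, store the records, call `next_token` on the response, and repeat with `list_records_url(token=...)` until the token comes back as None.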
I'm afraid I'm lost about this time slice approach. Could you give me
a short example showing how I'm supposed to use it (how I should build
the URL and so on)?
Thanks again,
Paulo S.
Something like the code below should get you the last week's worth of results, if I understand my own code correctly :)
It's been a couple of months since I wrote it, so it might not be perfect, but it should point you in the right direction. If you're grabbing 50k at a time, just increase the max_results and timedelta so you're consistently picking up just under 50k records in one go.
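The snippet referred to above is not preserved in this thread. As a stand-in (this is not Toby's code), here is a minimal sketch of the tuning he describes: choose the slice width so that every slice stays under the 50000 cap. The function `plan_slices` and the `count_in_range` callback are hypothetical; the callback is assumed to return the number of matching records between two dates, for example by reading totalResults from a one-result query.

```python
from datetime import date, timedelta

CAP = 50000  # limit on results per search, as stated earlier in the thread


def plan_slices(first, last, count_in_range, cap=CAP, days=365):
    """Return contiguous (start, end) date pairs covering first..last,
    halving the slice width until each slice holds fewer than `cap` records."""
    slices = []
    start = first
    while start <= last:
        width = days
        while True:
            end = min(start + timedelta(days=width - 1), last)
            # Accept the slice if it is under the cap, or if it cannot shrink further.
            if count_in_range(start, end) < cap or width == 1:
                break
            width = max(1, width // 2)
        slices.append((start, end))
        start = end + timedelta(days=1)
    return slices


if __name__ == "__main__":
    # Fake counter for demonstration: pretend 100 records are submitted per day.
    def fake_count(start, end):
        return ((end - start).days + 1) * 100

    for start, end in plan_slices(date(2009, 1, 1), date(2009, 1, 20),
                                  fake_count, cap=1000, days=32):
        print(start, end)
```

Halving is just one simple choice; it costs a few extra count queries per slice but never needs to know the submission rate in advance.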
which gives 12485 astro-ph papers in 2009:
<opensearch:totalResults xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">12485</opensearch:totalResults>
Note that you need to specify cat:astro-ph.* to search across all astro-ph subcategories.
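If you want to check a slice's size programmatically before paging through it, the count can be read from the totalResults element shown above. This is a minimal sketch using only the Python standard library; `total_results` is a hypothetical helper name, and the only thing taken from the thread is the element name and its namespace.

```python
import xml.etree.ElementTree as ET

# Namespace exactly as it appears in the totalResults element quoted above.
OPENSEARCH_NS = "http://a9.com/-/spec/opensearch/1.1/"


def total_results(feed_xml):
    """Return the integer value of opensearch:totalResults in an API response."""
    root = ET.fromstring(feed_xml)
    element = root.find(f".//{{{OPENSEARCH_NS}}}totalResults")
    if element is None or element.text is None:
        raise ValueError("no opensearch:totalResults element in the response")
    return int(element.text)
```

Calling it on a response body for a one-result query (max_results=1) is enough to learn whether a slice is below 50000 before fetching the slice itself.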