I'm having issues getting Cloudsearch functioning correctly. I have my two domains set up and the appropriate API URLs in my .ini. I have created the indexes according to this thread:
I added a cloudsearch_q and it appears to be running.
Searches no longer error as they did before this config, but they return no data, and it appears that nothing is being uploaded to Cloudsearch. I have added the IP of my server to the access policies in Cloudsearch. I see no errors on STDOUT when running searches or submitting new links.
Any ideas? I'm not sure how to even troubleshoot this issue.
In your AWS console for cloudsearch, does your domain appear to have any
documents? If not, then your uploads are not working, and we can figure out
what's breaking on that end. If so, then it's searches that are broken, and
we'll investigate that side.
On Wed, Oct 31, 2012 at 8:02 PM, tr <ril...@gmail.com> wrote:
> Hi,
> I'm having issues getting Cloudsearch functioning correctly. I have my
> two domains set up and the appropriate API url's in my .ini. I have
> created the indexes according to this thread:
> I added a cloudsearch_q and it appears to be running.
> Searches no longer error as they did before this config, but they are not
> returning data, and it appears it's not uploading any data to Cloudsearch.
> I have added the IP of my server to the access policies in Cloudsearch. I
> see no errors to STDOUT when doing searches, or submitting new links.
> Any ideas? I'm not sure how to even troubleshoot this issue.
> Thanks in advance.
> --
> You received this message because you are subscribed to the Google Groups
> "reddit-dev" group.
> To view this discussion on the web visit
> https://groups.google.com/d/msg/reddit-dev/-/x8HTcmtPUU8J.
> To post to this group, send email to reddit-dev@googlegroups.com.
> To unsubscribe from this group, send email to
> reddit-dev+unsubscribe@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/reddit-dev?hl=en.
On Monday, November 5, 2012 1:54:01 PM UTC-5, Keith Mitchell wrote:
> In your AWS console for cloudsearch, does your domain appear to have any documents? If not, then your uploads are not working, and we can figure out what's breaking on that end. If so, then it's searches that are broken, and we'll investigate that side.
Did you back fill existing documents using
cloudsearch.py:rebuild_link_index()? Is your cloudsearch_changes queue
processor running, and if so, what's in the log?
On Mon, Nov 5, 2012 at 12:59 PM, tr <ril...@gmail.com> wrote:
> Zero documents - so yes I assume an upload issue.
Forgive me, I'm far from an expert on Python. How would I run the rebuild_link_index() function manually? I have not done that, but new content is not being uploaded to Cloudsearch either.
I have "cloudsearch_q 1" in my consumer-counts file, and it is running. Is this different from the cloudsearch_changes queue processor? Where do I find the log?
On Monday, November 5, 2012 5:48:14 PM UTC-5, Keith Mitchell wrote:
> Did you back fill existing documents using cloudsearch.py:rebuild_link_index()? Is your cloudsearch_changes queue processor running, and if so, what's in the log?
On Mon, Nov 5, 2012 at 3:32 PM, tr <ril...@gmail.com> wrote:
> Forgive me I'm far from an expert on python - how would I run the
> rebuild_link_index() function manually? I have not done that, but, new
> content is not being uploaded to Cloudsearch either.
> I have "cloudsearch_q 1" in my consumer-counts file, and it is running.
> Is this different from the cloudsearch_changes queue processor? Where do I
> find the log?
> Thanks very much.
Yup, sounds like the q_proc. If you're certain it's running ("sudo initctl
list | grep reddit" should help with that, I think) then the log is the
next important bit.
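For anyone who would rather check this from a script, here is a minimal sketch that picks the reddit jobs out of the text printed by "initctl list". The line pattern is an assumption inferred from the single sample line quoted in this thread, and reddit_jobs is a hypothetical helper name, not part of the reddit code base.

```python
import re

# Matches lines such as:
#   reddit-consumer-cloudsearch_q (1) start/running, process 28212
# The pattern is inferred from that one sample line and is an assumption.
JOB_LINE = re.compile(
    r"^(?P<job>\S+)(?: \((?P<instance>[^)]*)\))? "
    r"(?P<goal>\w+)/(?P<state>[\w-]+)(?:, process (?P<pid>\d+))?"
)


def reddit_jobs(initctl_output):
    """Return {job name: (state, pid or None)} for jobs with 'reddit' in the name."""
    jobs = {}
    for line in initctl_output.splitlines():
        match = JOB_LINE.match(line.strip())
        if match and "reddit" in match.group("job"):
            pid = match.group("pid")
            jobs[match.group("job")] = (
                match.group("state"),
                int(pid) if pid else None,
            )
    return jobs


# Sample text: the first line is from this thread, the second is made up.
sample = (
    "reddit-consumer-cloudsearch_q (1) start/running, process 28212\n"
    "cron start/running, process 901\n"
)
print(reddit_jobs(sample))
```

If cloudsearch_q is missing from the result, or its state is not "running", the consumer itself is the first thing to look at.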
On Tue, Nov 6, 2012 at 12:38 PM, tr <ril...@gmail.com> wrote:
> Thanks - running it manually processed successfully, and I now see documents
> in Cloudsearch. Searches are returning results successfully.
> So it seems the issue is with the q proc? I'll poke around syslog and see
> if I see anything, otherwise if you have any suggestions let me know.
> Thanks for the help thus far.
> On Tuesday, November 6, 2012 1:27:42 PM UTC-5, Keith Mitchell wrote:
>> You can get a python shell for running reddit code in the proper context
>> by cd'ing to {reddit}/r2, then running "paster shell your_ini_file.ini".
>> From there, you can do:
>> import r2.lib.cloudsearch as cs
>> cs.rebuild_link_index()
>> (And of course, you can import anything else from the reddit code base,
>> inspect objects, load things from the database, etc.)
>> I'm not 100% sure, but I think by default the q procs will write to
>> syslog.
initctl list shows:
reddit-consumer-cloudsearch_q (1) start/running, process 28212
So it appears to be running as it should. I'm not sure what to even look for as far as logs go. A find for *log* doesn't really come up with anything significant, and I don't see anything significant in syslog either.
On Wednesday, November 7, 2012 2:30:30 PM UTC-5, Keith Mitchell wrote:
> Yup, sounds like the q_proc. If you're certain it's running ("sudo initctl list | grep reddit" should help with that, I think) then the log is the next important bit.
The output is sent to syslog via wrap-job. By default this should go to
/var/log/syslog. The default log facility is cron, so it might also be in
/var/log/cron.log. If that's not the case for you then you need to consult
your syslog documentation.
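A small sketch of how one might search those two files for a given consumer. The two paths are the ones named above; find_log_lines is a hypothetical helper, and the assumption that each log line contains the consumer's name is mine, not something confirmed in this thread.

```python
import os

# Paths mentioned in this thread; other syslog setups may log elsewhere.
CANDIDATE_LOGS = ["/var/log/syslog", "/var/log/cron.log"]


def find_log_lines(needle, paths=CANDIDATE_LOGS):
    """Yield (path, line) for every readable log line that contains needle."""
    for path in paths:
        if not os.path.isfile(path):
            continue  # this log does not exist on this machine
        try:
            with open(path, errors="replace") as handle:
                for line in handle:
                    if needle in line:
                        yield path, line.rstrip("\n")
        except OSError:
            continue  # typically a permissions problem; rerun with sudo


if __name__ == "__main__":
    for path, line in find_log_lines("cloudsearch_q"):
        print(path + ": " + line)
```

No output at all means either the files are not readable by the current user or the consumer has not logged anything under that name.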
On Wed, Nov 7, 2012 at 4:28 PM, tr <ril...@gmail.com> wrote:
> initctl list shows:
> reddit-consumer-cloudsearch_q (1) start/running, process 28212
> So it appears to be running as it should. I'm not sure what to even look
> for as far as logs go. A find for *log* doesn't really come up with
> anything significant, and I don't see anything significant in syslog either.
Attached is the syslog output after submitting a link. I don't see any mention of cloudsearch_q in there, but I do see other q procs such as scraper, vote_link, etc.
On Wednesday, November 7, 2012 8:14:52 PM UTC-5, Ricky Ramirez wrote:
> The output is sent to syslog via wrap-job. By default this should go to /var/log/syslog. The default log facility is cron, so it might also be in /var/log/cron.log. If that's not the case for you then you need to consult your syslog documentation.
> Ricky
Try modifying reddit-consumer-cloudsearch_q.conf and make the following
change to the end of the wrap-job line:
change
'run_changed()'
into
'run_changed(min_size=0)'
Then restart the q proc.
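If you prefer to script the edit rather than do it by hand, something like the following would work on the text of the conf file. This is only a sketch: patch_wrap_job is a hypothetical helper, and the example line is a made-up placeholder, since the real wrap-job line is not shown in this thread.

```python
OLD_CALL = "'run_changed()'"
NEW_CALL = "'run_changed(min_size=0)'"


def patch_wrap_job(conf_text):
    """Return conf_text with run_changed() replaced on wrap-job lines only."""
    patched = []
    for line in conf_text.splitlines(keepends=True):
        if "wrap-job" in line and OLD_CALL in line:
            line = line.replace(OLD_CALL, NEW_CALL)
        patched.append(line)
    return "".join(patched)


# Hypothetical conf line, for illustration only; the real file will differ.
example = "exec wrap-job <existing arguments> 'run_changed()'\n"
print(patch_wrap_job(example), end="")
```

Running it a second time changes nothing, because the old call no longer appears in the text.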
Cloudsearch's document upload handling is most efficient when working on
batches of documents, so we wait for a sufficiently large number of items
to queue up before processing and sending. Presumably, you're not
generating enough new/changed submissions to hit the default minimum size
of 500, so the queue proc is just waiting and waiting.
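To make the waiting behaviour concrete, here is a toy model of a minimum batch size. It is not reddit's actual consumer code; drain_if_ready and the upload callback are hypothetical names, and the only thing taken from the explanation above is the idea of holding items until a threshold of 500 is reached.

```python
def drain_if_ready(pending, min_size, upload):
    """Upload everything in pending once at least min_size items are waiting.

    Returns the number of items uploaded; 0 means the consumer keeps waiting.
    """
    if not pending or len(pending) < min_size:
        return 0
    batch = list(pending)
    pending.clear()
    upload(batch)
    return len(batch)


uploaded = []
queue = ["link_1", "link_2", "link_3"]

# With a threshold of 500, three queued items are not enough to trigger a send.
print(drain_if_ready(queue, 500, uploaded.extend))  # 0
# With min_size=0 the same three items go out immediately.
print(drain_if_ready(queue, 0, uploaded.extend))  # 3
```

On a low-traffic install the first case is the steady state, which would match the symptoms here: the consumer is running, nothing is logged, and no documents arrive.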
On Wed, Nov 7, 2012 at 7:38 PM, tr <ril...@gmail.com> wrote:
> Attached is the syslog output after submitting a link. I don't see any
> mention of cloudsearch_q in there, but other q proc's such as scraper,
> vote_link, etc.
> Thanks
On Thursday, November 8, 2012 12:45:06 PM UTC-5, Keith Mitchell wrote:
> I may have an inkling, as I think about it.
> Try modifying reddit-consumer-cloudsearch_q.conf and make the following change to the end of the wrap-job line:
> change 'run_changed()' into 'run_changed(min_size=0)'
> Then restart the q proc.
> Cloudsearch's document upload handling is most efficient when working on batches of documents, so we wait for a sufficiently large number of items to queue up before processing and sending. Presumably, you're not generating enough new/changed submissions to hit the default minimum size of 500, so the queue proc is just waiting and waiting.