Thinking Sphinx 3.1.0 - Remote Server Setup


Joram Okwaro

Mar 18, 2014, 2:06:34 AM
to thinkin...@googlegroups.com
Hi Guys,

I've already replied to a thread that I hope Pat can reply to about this, but I thought that in case he's too busy, someone here could help me out in the meantime. I'm having trouble finding concise documentation on how to set up a remote Sphinx server. My main questions relate to Thinking Sphinx 3.1.0, though maybe the answer is that I need to go back to version 2 to set this up painlessly.

1. From my understanding, I need to set up Sphinx on my remote server and also a copy of my Rails application (with Thinking Sphinx of course) in order to index my models. Is this still the case?

2. If point #1 is the case, how does the indexer index my database, which lives on the app server? Surely I don't have to set up a database on the search server too, which wouldn't make sense. I'm pretty lost as you can see :) So please help. How would this generally work? That's the big question.

Pat Allan

Mar 18, 2014, 2:18:15 AM
to thinkin...@googlegroups.com
Didn’t quite cover this in the other thread.

On 18 Mar 2014, at 5:06 pm, Joram Okwaro <joram...@gmail.com> wrote:

> Hi Guys,
>
> I've already replied to a thread that I hope Pat can reply to about this, but I thought that in case he's too busy, someone here could help me out in the meantime. I'm having trouble finding concise documentation on how to set up a remote Sphinx server. My main questions relate to Thinking Sphinx 3.1.0, though maybe the answer is that I need to go back to version 2 to set this up painlessly.
>
> 1. From my understanding, I need to set up Sphinx on my remote server and also a copy of my Rails application (with Thinking Sphinx of course) in order to index my models. Is this still the case?

Yup.

> 2. If point #1 is the case, how does the indexer index my database, which lives on the app server? Surely I don't have to set up a database on the search server too, which wouldn't make sense. I'm pretty lost as you can see :) So please help. How would this generally work? That's the big question.

You’ll need to have your database accessible remotely - and have the appropriate details in config/database.yml.

If you’re going to the effort of having Sphinx on its own server, do you have the database on its own server too? Perhaps it’s worth discussing why you want to have Sphinx on its own server?

Cheers


Pat

Joram Okwaro

Mar 18, 2014, 2:57:07 AM
to thinkin...@googlegroups.com
Hi Pat,

Thanks for the quick responses on both threads. I took on this task from a colleague of mine, so I don't yet know how much research he did on Sphinx performance. Performance was the main reason we opted for a remote server: the idea was that Sphinx was too resource-heavy and therefore a risk for our app server, which we can't afford to have slow down. I would appreciate your 2 cents on this. Otherwise, thank you once again for your help. You have helped a great deal. I'll let you know if I encounter any specific issues.

Thanks!

Pat Allan

Mar 18, 2014, 8:31:01 AM
to thinkin...@googlegroups.com
It really depends on how many records (and how much data per record) you’re indexing… Sphinx is generally pretty well-behaved, but I guess it depends on how limited the resources are on your app server. Whenever indexing happens, it will mean there’s plenty of traffic between the indexer and your database, so having them share a machine is not a bad idea (instead of adding extra external network traffic).

-- 
You received this message because you are subscribed to the Google Groups "Thinking Sphinx" group.
To unsubscribe from this group and stop receiving emails from it, send an email to thinking-sphi...@googlegroups.com.
To post to this group, send email to thinkin...@googlegroups.com.
Visit this group at http://groups.google.com/group/thinking-sphinx.
For more options, visit https://groups.google.com/d/optout.

Joram Okwaro

Mar 31, 2014, 9:47:11 AM
to thinkin...@googlegroups.com
Hi Pat,

So we still opted for a remote Sphinx server. I was able to set up the server and connect to the production database remotely. I can therefore index the production db and generate the indices on the Sphinx server. Thanks for the help once again. I am now faced with another 'big picture' issue.

1. Now that my app server will be sending search queries to searchd on the Sphinx server, I'm guessing I need to open up the port on which Sphinx runs on the Sphinx server? There's a mysql41 setting that I have set to the MySQL port on my Sphinx server. I'm assuming this is the port that I need to open to get to searchd? Is this all that's needed on this server as far as configuration is concerned?

Maybe my big picture looking at it from the app server is all wrong :)

Pat Allan

Mar 31, 2014, 7:08:34 PM
to thinkin...@googlegroups.com
Hi Joram

Yes, the mysql41 port is how Sphinx can be connected to. You’ll need to make sure that’s set and opened up to the world, and also set the address of the Sphinx server (so it binds to that address instead of 127.0.0.1, which is the default).

There was a bug with the address setting discovered recently - it’s in the Riddle gem, but you can get the latest by using the following in your Gemfile:

  gem 'riddle', '~> 1.5.10',
    :git    => 'git://github.com/pat/riddle.git',
    :branch => 'develop',
    :ref    => '0dfe38063c'

Cheers


Pat
Joram Walekhwa Okwaro

Apr 1, 2014, 10:37:41 AM
to thinkin...@googlegroups.com
Hi Pat,

Thanks for the reply. You mentioned this: "...and also set the address of the Sphinx server (so it binds to that address instead of 127.0.0.1, which is the default)." You mean the app server, right? The app server is connecting to a remote instance of searchd through MySQL, so the address of the app server is set in the MySQL configs, right?

Secondly, I have opened up MySQL to accept remote connections from my app server, but even before I tried it, I was wondering how Thinking Sphinx on my app server would be able to connect to MySQL without the details of the user I created and granted db privileges to, let alone the password. Is there an option to declare this in thinking_sphinx.yml? Unless I've got this part all wrong. This did end up being a problem, since I still don't have access. I get a ThinkingSphinx::SphinxError with the message "Access denied for user 'ubuntu'@'ip-77-77-77-77.eu-west-1.compute.internal' (using password: NO)".

Would appreciate some help. Thanks.

Joram.

Pat Allan

Apr 1, 2014, 7:35:38 PM
to thinkin...@googlegroups.com
Hi Joram

Apologies if this is covering stuff you already knew:

* In config/thinking_sphinx.yml for your production environment you’ll want to set mysql41 to the port you prefer (or don’t set it, and Sphinx will run on port 9306), and you’ll need to set address to the server Sphinx is running on.

* In config/database.yml for your production environment you’ll want to include your database connection settings.

Both of these files should be present on each server your app code is on - they’re required on your app server so it knows how to talk to both the database and Sphinx, and they’re required on your Sphinx server so Sphinx can bind itself to the appropriate address, and it can talk to the database using the correct credentials when indexing.
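
As a rough sketch of how those two settings fit together, the Ruby below parses a made-up thinking_sphinx.yml and works out the host and port that searchd would bind to and that the app would connect to. The helper sphinx_endpoint and the constants are hypothetical names for illustration, not part of Thinking Sphinx, and the address is a placeholder; the fallbacks (127.0.0.1 and 9306) are the defaults mentioned in this thread.

```ruby
require "yaml"

# Hypothetical contents of config/thinking_sphinx.yml. The address is a
# placeholder, not a value from a real deployment.
THINKING_SPHINX_YML = <<~YAML
  production:
    address: "22.22.22.222"
    mysql41: 9306
YAML

# Defaults as described in this thread: Sphinx binds to 127.0.0.1 and
# listens on port 9306 when nothing else is configured.
DEFAULT_ADDRESS = "127.0.0.1"
DEFAULT_MYSQL41_PORT = 9306

# Returns [host, port] for the given environment, falling back to the
# defaults when a setting (or the whole environment) is missing.
def sphinx_endpoint(yaml_text, environment)
  settings = YAML.safe_load(yaml_text).fetch(environment, {}) || {}
  host = settings.fetch("address", DEFAULT_ADDRESS)
  port = settings.fetch("mysql41", DEFAULT_MYSQL41_PORT)
  [host, port]
end

host, port = sphinx_endpoint(THINKING_SPHINX_YML, "production")
puts "searchd expected at #{host}:#{port}"
```

Because both servers read the same file, they arrive at the same endpoint: the Sphinx server binds to it and the app server connects to it.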

Hope this helps - do let me know if you’ve got further questions.

Cheers

— 
Pat

Joram Walekhwa Okwaro

Apr 2, 2014, 3:29:09 PM
to thinkin...@googlegroups.com
Hi Pat,

There seems to be a disconnect somewhere in the way I have understood how the setup works. I have nailed down indexing; searching is the issue now. Apologies for any repetition on my side too, but I think this will help you guide me better. I will explain what I've done so far and ask questions along the way.

Sphinx/Indexing server

I've set up the 'dumb' app on this server. The thinking_sphinx.yml file looks like this:

development:
  pid_file: "/var/run/sphinx/searchd.pid"
  indices_location: "/home/shared/db/sphinx"
  configuration_file: "/home/shared/development.sphinx.conf"
  sql_sock: /var/run/mysqld/mysqld.sock
  searchd_log: "/home/log/production.searchd.log"
  query_log: "/home/log/production.query.log"
  #mem_limit: 128M
  morphology: stem_en
  min_infix_len: 3
  enable_star: true
production:
  #pid_file: "/var/run/sphinx/searchd.pid"
  indices_location: "/home/ubuntu/projects/shared/db/sphinx"
  configuration_file: "/home/ubuntu/projects/shared/production.sphinx.conf"
  #sql_sock: /var/run/mysqld/mysqld.sock
  searchd_log: "/home/ubuntu/projects/kopo-kopo/log/production.searchd.log"
  query_log: "/home/ubuntu/projects/kopo-kopo/log/production.query.log"
  #mem_limit: 128M
  morphology: stem_en
  min_infix_len: 3
  enable_star: true

The database.yml file is set up to connect to the production rails server and hence index the production db:

production:
  adapter: postgresql
  encoding: unicode
  database: app_production
  pool: 5
  username: joram
  password: password
  host: 11.11.11.111
  port: 5432

I've installed Sphinx and MySQL on this server and run the rebuild rake task, which has created my conf files successfully. I've also opened up port 5432 (the Postgres port) on my production server to accept remote connections from my Sphinx server, and this works well, i.e. indexing is working like a charm.

Production server

On my production Rails server, I have put my Sphinx server's address in the thinking_sphinx.yml file as shown below:

development:
  morphology: stem_en
  min_infix_len: 3
  enable_star: true
production:
  address: 22.22.22.222

I'm assuming this means that Thinking Sphinx will connect to the default port 9306 on my Sphinx server (22.22.22.222). My confusion comes in here. Is the mysql41 port in my production server's configs (9306 by default in this case) supposed to be the MySQL port on my Sphinx server or the searchd port? I've done a netstat on my Sphinx server: MySQL is running on port 3306 (the default, I think) and searchd is on 9306.

I had earlier set mysql41 in the production thinking_sphinx.yml file above to the MySQL port on my Sphinx server, which is why I was getting a connection error: I was trying to connect to MySQL with no password.

So that's the BIG question of the day, Pat. What am I missing in order to connect my production server to the Sphinx server so that it can use those generated indices to search? You're doing a great job being patient with us noobs trying to figure things out. I hope I don't cause your patience to run out :)

Thanks,
Joram.

Pat Allan

Apr 2, 2014, 5:51:43 PM
to thinkin...@googlegroups.com
Hi Joram

Appreciate all the detail - it looks like you’re on track, I think. The ‘address’ setting should indeed be the IP address of the Sphinx server (and you’ll want that in your production settings of thinking_sphinx.yml on both machines).

As for the port - via the mysql41 setting - this is for Sphinx. MySQL itself is not used at all (as your database is PostgreSQL), it’s just that Sphinx communicates as if it were a MySQL server (it uses the MySQL server protocol), hence the name of this setting, and it’s also why the mysql2 gem is required. You don’t need a MySQL database or a MySQL server (so, you should probably close port 3306).
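
Since the mysql41 port is searchd's own port, one quick check from the app server is whether a plain TCP connection to it succeeds. The sketch below uses only Ruby's standard library; port_open? is a hypothetical helper name, and it only tests reachability, not whether Sphinx actually answers queries.

```ruby
require "socket"

# Returns true when a TCP connection to host:port succeeds within the
# timeout (in seconds), and false when it is refused, times out, or the
# host cannot be resolved.
def port_open?(host, port, timeout: 3)
  # The block form closes the socket automatically once the block returns.
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  false
end
```

For example, port_open?("22.22.22.222", 9306) run on the app server should return true once searchd is bound to a reachable address and the firewall allows the port (the address is the placeholder used earlier in the thread).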

Cheers

— 
Pat

Joram Okwaro

Apr 14, 2014, 11:12:17 AM
to thinkin...@googlegroups.com
Hi Pat,

Got it working. The only thing I did differently was add the listen setting to the thinking_sphinx.yml file of the dummy app on the Sphinx/search server. I had trouble connecting to my search server: I kept getting a ThinkingSphinx::ConnectionError with the message Can't connect to MySQL server on '22.22.22.222' (111). This took me a while to figure out, but I finally did. Although my rules on Amazon EC2 should have allowed me to connect to port 9306, I still couldn't. At first the listen setting wasn't there, and by default the generated production.sphinx.conf (my Sphinx configuration file) ended up with listen set to 0.0.0.0:9306:mysql41. After scouring the interwebs for other people's conf files, I realized that in order for searchd to run on port 9306, the conf file had to have 9306:mysql41 instead (http://stackoverflow.com/questions/11374558/sphinx-search-mysql-client-on-production-server). I hardcoded this in my thinking_sphinx.yml file and it did the trick.
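
The two forms of the listen value compared here can be written out as a small sketch. The helper listen_value is a hypothetical name; it only formats a string in the [address:]port[:protocol] shape and is not part of Thinking Sphinx or Riddle.

```ruby
# Builds a value for searchd's listen directive in the form
# [address:]port[:protocol]. The address is left out when nil.
def listen_value(port:, address: nil, protocol: "mysql41")
  [address, port, protocol].compact.join(":")
end

puts listen_value(port: 9306)                      # 9306:mysql41
puts listen_value(port: 9306, address: "0.0.0.0")  # 0.0.0.0:9306:mysql41
```

The first form is the one that worked in this setup; the second is what was generated by default.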

Thanks for the constant help, sir. I wouldn't have set this up without it. One last question: is there any way for me to estimate the size of my indices before I generate them? I have a massive table that throws an error at some point during indexing, which I'm assuming is caused by low disk space:

 ** [out :: 54.247.96.253] ERROR: index 'external_financial_event_core': raw_hits: write error: 1001335 of 1048451 bytes written.

Cheers,
Joram.

Pat Allan

Apr 15, 2014, 8:00:19 PM
to thinkin...@googlegroups.com
Interesting points about the listen setting… but great to know it’s now working for you.

As for size of index data… it’s tricky. Attributes are relatively easy - integers, booleans, timestamps are bytes, but string attributes vary in size, and then fields get even more complicated again, depending on infix/prefix settings and morphology settings. There are a few tips here that may help clarify things:

— 
Pat

Joram Okwaro

Apr 29, 2014, 1:25:39 PM
to thinkin...@googlegroups.com
Thanks Pat. For everyone else that may be faced with the same task, here's a blog post:

http://joramokwaro.com/setting-up-thinking-sphinx-3-1-0-in-a-remote-server/

Cheers,
Joram.