Indexing all schemas in Postgresql

201 views
Skip to first unread message

praveen

unread,
Jan 20, 2011, 3:28:31 AM1/20/11
to Thinking Sphinx
Hi Everyone,

I would like to know , how can i use thinking sphinx/sphinx to index
all the schemas in a postgres database.

I am using Postgresql Database for my Multitenant application, where i
differentiate each tenant by Postgresql shema , which means , I
connect to the particular schema , when the user requests for the
particular tenant.

To explain in detail, I have one rails application which will connect
to different schema in my postgresql database according to the tenant
id requested, to behave as a multitenant application.

Since i have only one application , and i have to use the same
thinking sphinx plugin for indexing, i am not able configure to index
tables from all the schemas i have in my postgresql database.

Please help. Thanks in advance.

Pat Allan

unread,
Jan 20, 2011, 6:53:39 PM1/20/11
to thinkin...@googlegroups.com
Hi Praveen

When you talk about different schemas, do you mean different databases, each one for a different tenant? If so, there's no simple way to do this with Thinking Sphinx. You may need to create your own script to modify the generated configuration file, or run separate searchd instances for each tenant and change the port and/or address depending on the tenant.

Or do you mean something else when you say schema? I'm not sure.

Cheers

--
Pat

> --
> You received this message because you are subscribed to the Google Groups "Thinking Sphinx" group.
> To post to this group, send email to thinkin...@googlegroups.com.
> To unsubscribe from this group, send email to thinking-sphi...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/thinking-sphinx?hl=en.
>

praveen

unread,
Jan 21, 2011, 2:16:36 AM1/21/11
to Thinking Sphinx
Thanks Pat.

We can create schemas in a postgresql database to group our tables in
that database. By default postgresql has a schema "public" which is
set by default and contains all the tables created in the database.

So , for my multitenant application , i try to create different
schamas in a database for different tenants created. So when a user
requests for a tenant , the application set the schema to the
particular tenant. so the data for the tenant is pulled from the
tables in that particular schema. So when i use sphinx to indes my
database tables, the indexer indexes only the public schema and i have
no options to specify which schema to be indexed or the sphinx itself
does not have the feature to select all the schemas available and
index them , which i can filter in my application to show search
results depending on from which tenant is the the search requested.

Can you tell if there is any option to atleast specify the schema
along with the host, usr,pass,db details in the config file??

Pat Allan

unread,
Jan 22, 2011, 7:02:41 AM1/22/11
to thinkin...@googlegroups.com
Hi Praveen

Can you select a schema via a SQL statement? If so, you could add a line in your sources something like the following:
sql_query_pre = SQL STATEMENT TO CHOOSE A SCHEMA

You will find existing sql_query_pre lines in your sources - you don't need to replace these, Sphinx will accept as many as you put in. Unfortunately, there's no way to automatically set this via Thinking Sphinx at the moment, though.

--
Pat

Pat Allan

unread,
Oct 29, 2012, 8:08:03 AM10/29/12
to thinkin...@googlegroups.com
Hi Tair

You could specify an index per tenant:

Tenant.each do |tenant|
define_index "#{class.name}_#{tenant.id}" do
# ...
end
end

While this provides a index and source per model per tenant, it doesn't take care of the sql_query_pre value. You could manually add that yourself and then use the ts:reindex task (which doesn't overwrite the configuration), but that's not particularly elegant.

Another option - mind you, this isn't elegant either, but could keep things far easier over time - is to override the generate method in ThinkingSphinx::Configuration. Keep the existing implementation, but also traverse through each source in the configuration tree and add the additional sql_query_pre statement in. Here's the current method for reference:
https://github.com/pat/thinking-sphinx/blob/master/lib/thinking_sphinx/configuration.rb#L156

Have a look at the enforce_common_attribute_type method, which could be used as a starting point:
https://github.com/pat/thinking-sphinx/blob/master/lib/thinking_sphinx/configuration.rb#L329

Hope this helps!

--
Pat

On 28/10/2012, at 1:09 AM, Tair Assimov wrote:

> Hi all. I bumped into the same problem when creating multi-tenant Rails application with PostgreSQL schemas. I tried what Pat suggested and added:
>
> sql_query_pre = SET search_path TO 2,public
>
> Where, 2 is the ID of my tenant. However, when I ran rake ts:in, Thinking Sphinx has rewritten configuration file and indexed my default (public) schema instead.
>
> Anyway, I do not think its the correct approach to update the Rails.env.sphinx.conf, since there are many tenants and you need some kind of rotation to index each tenants data. I think the easier is to create a custom script/Rake task/whatever, which will loop through all your tenants and somehow add the schema selection to the configuration file, and then finally use indexer directly:
>
> indexer --all --rotate -c PATH_TO_CONF
>
> However, in this case one tenants data will overwrite others, unless we also set the location of the index, which needs more tweaking on the Rails side to use the correct one.
>
> Pat, what is the best approach to tackle this problem? I think ideally I would want Sphinx to index all my schemas. Or in the worst case to have per-tenant sphinx configuration files with own index store and schema selection. Is there any way Thinking Sphinx can assist? Is there anything similar on the roadmap? Or am I missing something completely?
>
> Thanks all!

> To view this discussion on the web visit https://groups.google.com/d/msg/thinking-sphinx/-/zIVsCnFwZNQJ.

Tair Assimov

unread,
Dec 6, 2012, 12:17:42 PM12/6/12
to thinkin...@googlegroups.com
Oh, I almost forgot about this discussion and did not have notifications enabled. Pat, thanks for your hints. I will try the second option suggested. If I manage to find an elegant solution, I will submit a pull request. I think this is quite common use case when dealing with postgresql schemas.

Cheers.

Tiago

unread,
Jan 5, 2013, 9:21:13 AM1/5/13
to thinkin...@googlegroups.com
Did you have any success on this?

I'm trying to build a gem to help this process, but I've a doubt.
Is there already a way to define a sql_query_pre per index basis?

I'm trying to create a new 'schema' method in the index definition, so I could use like:

define_index do
  schema "tenantX"
  ...
end

And this 'shema' would append the new schema search path to the sql_query_pre.
This way would be possible to define an index for each tenant/schema.

Pat Allan

unread,
Jan 5, 2013, 8:25:36 PM1/5/13
to thinkin...@googlegroups.com
Hi Tiago

With Thinking Sphinx v3, you can do the following within an index definition:

set_property :sql_query_pre => ['CUSTOM SQL']

Keep in mind it must be an array of strings, even if there's only one string. If you put a loop around the index definition, then you could have this working easily enough without any need for patches.

And if you've not given TS v3 a go yet, make sure you read these first:
https://groups.google.com/d/msg/thinking-sphinx/QM0BlS3gg3k/s61pfCBBTUoJ
https://github.com/pat/thinking-sphinx/blob/edge/README.textile

Cheers

--
Pat

> To view this discussion on the web visit https://groups.google.com/d/msg/thinking-sphinx/-/mRB-fvyrTzIJ.

Tiago

unread,
Jan 6, 2013, 10:27:02 AM1/6/13
to thinkin...@googlegroups.com
Hi Pat,

I'm updating here to sphinx v3, I didn't know about it!

Using set_property :sql_query_pre, worked, I did a loop wrapping the index definition, and it created all indexes, with the correct sql_query_pre for each one, I named each index as "<CLASS NAME>_<TENANT ID>".
My doubt now is how to query a specific index of a model.

Would this be the case of using the middlewares? I was looking through the source, and it seems that creating one like this could work:

class ThinkingSphinx::Middlewares::SphinxQLWithTenants < ThinkingSphinx::Middlewares::SphinxQL
  def indices
    return super unless Thread.current[:tenant]
    @indices ||= ThinkingSphinx::IndexSet.new(classes, options[:indices]).select { |index| index.name =~ /_#{Thread.current[:tenant].id}$/ }
  end
end

My question is, how do I load them? =p
Or is there another way of doing this, like using scopes?

Thank you!

Pat Allan

unread,
Jan 6, 2013, 6:43:29 PM1/6/13
to thinkin...@googlegroups.com
Hi Tiago

You could take the middleware approach if you really want to, but your reliance on Thread.current makes me a little nervous. I would just add an :indices option to your searches with the appropriate index names. It's worth noting that _core will be appended on the each of each index name (and _delta for delta indices, if you're using them), so they become <CLASS NAME>_<TENANT ID>_core.

If you're always using the same indices logic, then I would wrap that in a re-usable method (or if more complex, class), which has a standard search within.

Cheers

--
Pat

> To view this discussion on the web visit https://groups.google.com/d/msg/thinking-sphinx/-/AHbV2s5Y-J8J.

Pat Allan

unread,
Jan 6, 2013, 7:13:42 PM1/6/13
to thinkin...@googlegroups.com
Or, you could add a middleware that modifies the indices and classes options of a search, inserted at the top of the stack:
https://gist.github.com/4471233

(Keep in mind that's untested, so may need some tweaking).

Reply all
Reply to author
Forward
0 new messages