Realistic limit to # of indexed records?

jer...@shopittome.com

Feb 5, 2021, 3:28:17 PM
to Thinking Sphinx
We've been using Sphinx and Thinking Sphinx since 2008 and it's always been amazing.

We generally have ~1m records indexed across a variety of fields and attributes, and we use it to filter users' selections. It's super fast and works great.

We're also using it at a larger scale, where the record count is now around 10m and growing. Indexing itself is time-consuming on this one. We use a delta index and ts:merge, but still need to do a full re-index once each day to pick up updated caches.

Is there an upper limit to the number of records we can index?

Thanks! Jeremy

Pat Allan

Feb 9, 2021, 12:31:16 AM
to 'Jeremy Meigs' via Thinking Sphinx
Hi Jeremy,

It’s great to hear that TS and Sphinx are still working well for you :)

In terms of expanding its scale in a reliable sense, I’ve a couple of thoughts:

  • You may want to consider real-time indices instead of SQL-backed indices. This removes the need for deltas and merging, and thus for full reindex calls. That said, this only works if all updates/inserts are done in ways that invoke ActiveRecord callbacks, as that’s how the real-time updates happen.
  • Sharding your larger models may also be an option? Especially if there are clear boundaries between what’s being updated: the more static records can live in shards that don’t require the full reindex, whereas the more frequently changing records are kept in other indices that get reprocessed daily. This would keep the reprocessing time down.

These two approaches could be used together, too - sharded real-time indices.
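
A rough sketch of the static-versus-active split, in plain Ruby rather than Thinking Sphinx's own index definitions. Everything named here (the Record struct, the STATIC_AFTER_DAYS cutoff of 30 days, shard_for and partition_by_shard) is a hypothetical stand-in chosen for illustration; a real setup would express the same boundary in each index's definition, and the right cutoff depends on how your data changes.

```ruby
require 'date'

# Hypothetical record shape; in a real app this would be an ActiveRecord model.
Record = Struct.new(:id, :updated_on)

# Hypothetical cutoff: records untouched for this many days count as "static".
STATIC_AFTER_DAYS = 30

# Pick the shard a record belongs to, based on how recently it changed.
def shard_for(record, today)
  (today - record.updated_on) >= STATIC_AFTER_DAYS ? :static : :active
end

# Group records by shard so that only the :active group needs daily reprocessing.
def partition_by_shard(records, today)
  records.group_by { |record| shard_for(record, today) }
end

today = Date.new(2021, 2, 9)
records = [
  Record.new(1, Date.new(2020, 6, 1)),
  Record.new(2, Date.new(2021, 2, 8)),
  Record.new(3, Date.new(2021, 1, 1))
]

shards = partition_by_shard(records, today)
puts "reprocess daily: #{shards.fetch(:active, []).map(&:id).inspect}"
puts "leave alone:     #{shards.fetch(:static, []).map(&:id).inspect}"
```

With a split like this, the daily job only has to touch the active group, which is where the reduction in reprocessing time would come from.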

As to whether there’s an upper limit of records - not that I’m aware of, but I’m not the best person to ask. It may be worth asking on Sphinx’s own forum, and/or through the Manticore team’s channels as well (given they’re a fork of Sphinx that seems to be getting far more frequent updates - and it works as a drop-in replacement for Sphinx, so Thinking Sphinx doesn’t complain at all).

Hope this helps!

— 
Pat


jer...@shopittome.com

Feb 9, 2021, 4:56:10 PM
to Thinking Sphinx
Awesome, thanks Pat,

Thanks for the suggestions. I'm using real-time indices for some models... I did find that initial indexing with real-time for 10m records takes a very very long time ;-)

Sharding may be our best approach. 

Thanks again! Jeremy