I have been using reddit to build a site for a while, and the tables have
become very large. One of the relation tables just reached about 50 million
records (the latest id is "55453433"). There are too many slow queries and
the system is not stable.
I guess reddit is using londiste for replication, and it seems that currently
reddit only splits things and relations across different DBs.
Maybe a single table at reddit has already reached billions of records.
Does postgresql still work fine at that size?
If you look at the example.ini file, you'll see lines for main_db,
comment_db, comment2_db, etc.
By default (for small installations / developer workspaces) they all point
to the same database. You can modify them to point at different databases.
The settings in that area of the .ini file describe a bit about what ends
up where.
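As a rough illustration of the idea (a sketch, not reddit's actual config loader): the Python snippet below parses a made-up .ini fragment and shows several *_db keys first pointing at one database host and then at separate hosts. The key names main_db, comment_db and comment2_db come from the message above. The value format ("dbname, host"), the hostnames and the helper db_hosts are assumptions for the example, so check the real example.ini for the actual syntax.

```python
import configparser

# Hypothetical fragments: the "dbname, host" value format is an assumption,
# not the real example.ini syntax.
SINGLE_BOX = """
[DEFAULT]
main_db = reddit, 127.0.0.1
comment_db = reddit, 127.0.0.1
comment2_db = reddit, 127.0.0.1
"""

SPLIT = """
[DEFAULT]
main_db = reddit, db-main.example.internal
comment_db = reddit, db-comment.example.internal
comment2_db = reddit, db-comment2.example.internal
"""


def db_hosts(ini_text):
    """Return {key: host} for every *_db key in the DEFAULT section."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    hosts = {}
    for key, value in parser.defaults().items():
        if key.endswith("_db"):
            _dbname, host = [part.strip() for part in value.split(",")]
            hosts[key] = host
    return hosts


single = db_hosts(SINGLE_BOX)
split = db_hosts(SPLIT)
print("distinct hosts, single box:", len(set(single.values())))
print("distinct hosts, split:", len(set(split.values())))
```

The point is only that the application keeps asking for "comment_db" and so on by name, and the .ini decides which machine that name resolves to.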
On Mon, Sep 17, 2012 at 10:56 PM, Yan Chunlu <springri...@gmail.com> wrote:
> I have been using reddit to build a site for a while, and the tables have
> become very large. One of the relation tables just reached about 50 million
> records (the latest id is "55453433"). There are too many slow queries and
> the system is not stable.
> I guess reddit is using londiste for replication, and it seems that currently
> reddit only splits things and relations across different DBs.
> Maybe a single table at reddit has already reached billions of records.
> Does postgresql still work fine at that size?
> --
> You received this message because you are subscribed to the Google Groups
> "reddit-dev" group.
> To post to this group, send email to reddit-dev@googlegroups.com.
> To unsubscribe from this group, send email to
> reddit-dev+unsubscribe@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/reddit-dev?hl=en.
> I have been using reddit to build a site for a while, and the tables have become very large. One of the relation tables just reached about 50 million records (the latest id is "55453433"). There are too many slow queries and the system is not stable.
I wouldn't expect 50 million records to be in the performance-breaking range for postgres, even with reddit's schema. What kinds of queries are you doing that are slow?
> I guess reddit is using londiste for replication, and it seems that currently reddit only splits things and relations across different DBs.
I really doubt that you're at the point where you need more than one DB machine. I'd try to fix the one you have.
> Maybe a single table at reddit has already reached billions of records. Does postgresql still work fine at that size?
Despite what you've read on the internet, reddit hasn't had just a single table for many years.
Yeah, I am aware that there could be many db engines, and that db_manager
could balance the load across those engines, which could split reads and
writes on tables.
But I did not find any code related to db partitioning, such as consistent
hashing on ids or something like what instagram is doing:
http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids...
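For reference, a minimal sketch of the kind of id scheme that post describes: a 64-bit id built from 41 bits of milliseconds since a custom epoch, 13 bits of logical shard id, and 10 bits of a per-shard sequence. This is an illustration in Python, not code from reddit or instagram. The epoch constant and the function names make_id, shard_of and timestamp_ms_of are assumptions made up for the example.

```python
import time

EPOCH_MS = 1314220021721  # custom epoch in ms (value assumed for the example)
SHARD_BITS = 13           # up to 8192 logical shards
SEQ_BITS = 10             # sequence is taken modulo 1024


def make_id(shard_id, seq, now_ms=None):
    """Pack (time, shard, sequence) into one sortable integer id."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    if not 0 <= shard_id < (1 << SHARD_BITS):
        raise ValueError("shard_id out of range")
    elapsed = now_ms - EPOCH_MS
    return (
        (elapsed << (SHARD_BITS + SEQ_BITS))
        | (shard_id << SEQ_BITS)
        | (seq % (1 << SEQ_BITS))
    )


def shard_of(obj_id):
    """Recover the logical shard from an id, so a lookup knows where to go."""
    return (obj_id >> SEQ_BITS) & ((1 << SHARD_BITS) - 1)


def timestamp_ms_of(obj_id):
    """Recover the creation time (ms since the Unix epoch) from an id."""
    return (obj_id >> (SHARD_BITS + SEQ_BITS)) + EPOCH_MS


example = make_id(shard_id=1341, seq=5001, now_ms=EPOCH_MS + 1000)
print(example, shard_of(example), timestamp_ms_of(example))
```

Because the shard is embedded in the id itself, no lookup table is needed to find which shard holds a given row, and ids still sort roughly by creation time.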
@david, thanks for the tip about data size and performance. Those slow
queries are probably my fault; I will try to debug and fix them.
I'm just curious about the future plan: when do I need to split things and
relations onto different machines, and when should I split a single table,
do sharding, etc.?
thanks!
On Wed, Sep 19, 2012 at 12:30 AM, Keith Mitchell <kemit...@reddit.com> wrote:
> If you look at the example.ini file, you'll see lines for main_db,
> comment_db, comment2_db, etc.
> By default (for small installations / developer workspaces) they all point
> to the same database. You can modify them to point at different databases.
> The settings in that area of the .ini file describe a bit about what ends
> up where.
> Yeah, I am aware that there could be many db engines, and that db_manager could balance the load across those engines, which could split reads and writes on tables.
> But I did not find any code related to db partitioning, such as consistent hashing on ids or something like what instagram is doing:
> http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids...
Not in postgres, but the data types stored in Cassandra get this automatically.
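To make "consistent hashing on ids" concrete, here is a small sketch of a hash ring in Python. It only illustrates the general technique: Cassandra places rows by hashing the partition key onto a token ring, but its real implementation differs in detail. The node names, the use of MD5, the number of virtual nodes and the HashRing class itself are assumptions for the example.

```python
import bisect
import hashlib


def _token(key):
    """Map a string to a position on the ring (a large integer)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring with virtual nodes (needs at least one node)."""

    def __init__(self, nodes, vnodes=64):
        self._vnodes = vnodes
        self._points = []  # sorted (token, node) pairs
        self._tokens = []  # tokens only, kept parallel for bisect
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each node owns several points so load spreads more evenly.
        for i in range(self._vnodes):
            self._points.append((_token(f"{node}#{i}"), node))
        self._points.sort()
        self._tokens = [t for t, _ in self._points]

    def node_for(self, key):
        # The first point clockwise from the key's token owns the key;
        # wrap around to the start of the ring if we run off the end.
        idx = bisect.bisect(self._tokens, _token(str(key))) % len(self._points)
        return self._points[idx][1]


ring = HashRing(["db1", "db2", "db3"])
before = {i: ring.node_for(i) for i in range(10000)}
ring.add("db4")
moved = [i for i in before if ring.node_for(i) != before[i]]
print("ids that changed node after adding db4:", len(moved), "of", len(before))
```

The property that matters for resharding: when db4 joins, only the ids that now belong to db4 move. Everything else stays where it was, unlike plain `id % number_of_nodes`.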