
Re: [Pgsphere-dev] GIST index concurrency concern


Oleg Bartunov

Nov 10, 2004, 12:08:29 AM

Daniel,

Concurrency is a big issue with the current implementation of GiST,
but it shouldn't bite you for READ ops!
The -hackers mailing list is a very relevant list for GiST
discussions. It's a pity that we have several times promised to work on
GiST concurrency and recovery but never got the chance :)
I see Neil has become interested in GiST concurrency, though.


Oleg
On Tue, 9 Nov 2004, Daniel Ceregatti wrote:

> Hi,
>
> It's recently come to my attention that GiST indexes suffer from
> concurrency issues. I have already developed a dating site that uses
> GiST indexes for attributes via the intarray contrib module, and for
> Earth distance/radius calculations via pg_sphere (a sketch of such a
> schema follows at the end of this message).
>
> I'm wondering if I haven't shot myself in the foot here. So far, I
> understand that a GiST index will be locked by a backend for any DML.
> Basically, I'm concerned that my database will not scale in the manner
> I was hoping, because the sites that access the database will be used
> by many concurrent users, doing some DML.
>
> I expect my site to sustain something around 1000-3000 new user
> acquisitions per day, each of which will account for an insert into 3
> GiST indexes. Additionally, there will be people updating their
> attributes and locations, but this will probably only account for a
> small fraction of the DML. We don't allow people to delete stuff.
>
> My concern now is this concurrency issue. My question is: is there
> anyone out there using a GiST index on a database where there's a lot
> of DML? Should I be concerned with this issue at all?
>
> If so, what can be done to minimize the impact of heavy DML on a GiST
> index? I've pondered rolling all DML into queues via triggers and then
> de-queuing them in one transaction every so often, say every 15
> minutes, via cron (a sketch of such a queue appears later in this
> thread). Any other suggestions?
>
> I'm posting to this list because I understand that both Oleg and Teodor
> read it, and I found no other relevant list. If I've misposted, please
> accept my apology and direct me to the appropriate list.
>
> Thanks,
>
> Daniel
>
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: ol...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majo...@postgresql.org so that your
message can get through to the mailing list cleanly
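
As a concrete illustration of the schema Daniel describes, here is a
minimal sketch, assuming the intarray and pg_sphere contrib modules are
installed. All names (profiles, attrs, location) are hypothetical, not
taken from his actual schema:

-- Hypothetical table: integer-array attributes plus a spherical position.
CREATE TABLE profiles (
    id       serial PRIMARY KEY,
    attrs    int[],   -- profile attributes, searched with intarray operators
    location spoint   -- spherical coordinates, provided by pg_sphere
);

-- GiST index on the array, using intarray's gist__int_ops opclass.
CREATE INDEX profiles_attrs_idx ON profiles USING gist (attrs gist__int_ops);

-- GiST index on the position, for distance/radius searches.
CREATE INDEX profiles_location_idx ON profiles USING gist (location);

A radius search would then test whether location falls inside an
scircle, a test the second index can accelerate.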

Oleg Bartunov

Nov 10, 2004, 5:59:58 AM

Patrick,

you didn't tell us about your setup. Have you confirmed that you've
seen a locking issue for reads? Are you sure you have no explicit locks
in your code? Any test demonstrating your problem would be great.
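
One quick way to check for stray locks, assuming PostgreSQL 7.3 or
later, is to watch the pg_locks system view while the workload runs;
blocked requests show up with granted = false:

-- Show lock requests that are currently waiting.
SELECT relation::regclass, pid, mode, granted
FROM pg_locks
WHERE NOT granted;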

Oleg
On Tue, 9 Nov 2004, Patrick Clery wrote:

> Oleg,
> Daniel and I have been collaborating on this structure for a while now.
> We are aware that GiST reads work very fast. But won't they be "paralyzed"
> when there are writes? Both of us are working on dating sites, and the main
> problem that concerns us is a very heavy traffic load. At this point I am
> planning to queue all changes to a GiST index and commit them every 10-15
> minutes (a sketch of such a queue follows this message). Is that really
> necessary? It's realistic to assume here that if
> there is a problem with locking the table for writes, it will be a problem in
> this situation because this structure is going to be hit VERY hard (and
> Daniel's situation is on an even larger scale). We hope that we can alleviate
> that with a "transaction queue", but this is not a simple fix. Have you seen
> any projects that were under a heavy load using a GiST index, and were they
> able to avoid being "paralyzed" somehow?
>
> Thanks in advance,
> Patrick
>
> On Tuesday 09 November 2004 22:08, Oleg Bartunov wrote:

> _______________________________________________
> Pgsphere-dev mailing list
> Pgsphe...@gborg.postgresql.org
> http://gborg.postgresql.org/mailman/listinfo/pgsphere-dev
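
The queue-and-flush approach Daniel and Patrick both describe might
look roughly like the sketch below, reusing the hypothetical profiles
table from earlier and assuming plpgsql is enabled. Here the
application writes to the staging table directly (a rule or trigger
could redirect DML the same way), and cron invokes the flush, e.g.
every 15 minutes via  psql -c "SELECT flush_profiles_queue()":

-- Staging table: cheap to insert into, carries no GiST indexes.
CREATE TABLE profiles_queue (
    id        int,
    attrs     int[],
    location  spoint,
    queued_at timestamptz DEFAULT now()
);

-- Move everything queued so far into the GiST-indexed table in one
-- transaction, and report how many rows were flushed.
CREATE FUNCTION flush_profiles_queue() RETURNS integer AS '
DECLARE
    n integer;
BEGIN
    -- Briefly block concurrent queue writers so the copy and the
    -- delete operate on exactly the same set of rows.
    LOCK TABLE profiles_queue IN EXCLUSIVE MODE;
    INSERT INTO profiles (id, attrs, location)
        SELECT id, attrs, location FROM profiles_queue;
    GET DIAGNOSTICS n = ROW_COUNT;
    DELETE FROM profiles_queue;
    RETURN n;
END;
' LANGUAGE plpgsql;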

Chris Albertson

Nov 10, 2004, 1:21:29 PM

> > > I expect my site to sustain something around 1000-3000 new user
> > > acquisitions per day, each of which will account for an insert
> > > into 3 GiST indexes.

Most people, when they talk about a large load on a DBMS, talk
about "transactions per second", as in "100 per second". Even if we
assume only 12-hour days, 3000 per day is only one transaction every
14 seconds. That's a trivial rate that could be handled on an older
Pentium II PC. Assume the system runs for five years at 3000/day:
that's only about 5.5 million rows (3000 x 365 x 5). In database
terms that's not much. Don't worry: your problem is well within the
limits of a small PC running PostgreSQL.

You want, of course, to place the entire process of adding a new
user inside a begin/commit transaction. This provides the kind of
"queue" you want: all of the inserts become visible together when the
"commit" happens. You will also likely want to run the user interface
in its own process or thread. Those two things will be all you need as
long as your average transaction rate remains this low. If there are
ANY explicit locks taken in your code, you need to remove them and
rethink the design.
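
Concretely, and with hypothetical table names, that advice amounts to
something like the following, where every insert that touches a
GiST-indexed table lands atomically at commit:

BEGIN;
-- All the work for one new user signup happens in one transaction.
INSERT INTO profiles (id, attrs, location)
    VALUES (42, '{1,7,13}', spoint '(0.1, 0.2)');
INSERT INTO interests (profile_id, tags)
    VALUES (42, '{2,5,8}');
INSERT INTO last_seen (profile_id, position)
    VALUES (42, spoint '(0.1, 0.2)');
COMMIT;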

Everyone always thinks they have a "large" database project.
Even a 200,000 row table is small enough that it and its index
files can be cached in RAM.

Where you might run into the kinds of problems you are thinking about
is if you had automated sensor systems (looking either down at the
Earth or up at the sky) and software to automatically extract features
and catalog them into a DBMS. If you have several of those sensors
running, you get to the high rates that drive concurrency issues. But
if you only have four or five users each doing a transaction per
second, it's not an issue. Once you get past 100 transactions per
second, you are looking at Oracle on Sun hardware and terabyte-sized
disk arrays, like we have down in the lab here. But believe me, you
need automated data collection systems to generate enough data to get
you into trouble. I run low-end stuff on my very old 500 MHz PIII.


=====
Chris Albertson
Home: 310-376-1029 chrisalbe...@yahoo.com
Cell: 310-990-7550
Office: 310-336-5189 Christopher...@aero.org
KG6OMK



