patch to add BRIN support for spherical objects

23 views
Skip to first unread message

Giuseppe Broccolo

unread,
Dec 31, 2017, 10:27:02 AM12/31/17
to pgsp...@googlegroups.com
Hi all,

I implemented BRIN indexing support for some pgsphere datatypes (actually just for spherical points and coordinates ranges ATM), similarly to what I've done for postgis datatypes the last year. BRINs could be useful for big data context as large stellar maps, etc.

I'm attaching the patch to the present mail (though, I don't know if a direct PR to the github repo is better to submit the patch). I've also added some documentation about it, with some considerations with achievable performances compared with the GiST ones (ATM the only access method supported by the extension).

Considerations are welcome :)

All the best,
Giuseppe.
add_brin_support.patch

Oleg Bartunov

unread,
Dec 31, 2017, 3:42:23 PM12/31/17
to Giuseppe Broccolo, pgsp...@googlegroups.com
Guiseppe, thanks for the patch ! Did you run any benchmarks and
comparison with GiST, Q3C ?

>
> All the best,
> Giuseppe.
>
> --
> You received this message because you are subscribed to the Google Groups
> "pgSphere users and developers" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to pgsphere+u...@googlegroups.com.
> To post to this group, send email to pgsp...@googlegroups.com.
> Visit this group at https://groups.google.com/group/pgsphere.
> To view this discussion on the web, visit
> https://groups.google.com/d/msgid/pgsphere/CAFtuf8BhcgPJVknvmvHVhDM2T%2BLTnwFJeGY1nV%2B-8JvGYUutQw%40mail.gmail.com.
> For more options, visit https://groups.google.com/d/optout.

Giuseppe Broccolo

unread,
Jan 1, 2018, 4:12:28 PM1/1/18
to obar...@gmail.com, pgsp...@googlegroups.com
Hi Oleg,

2017-12-31 21:42 GMT+01:00 Oleg Bartunov <obar...@gmail.com>:

Guiseppe, thanks for the patch ! Did you run any benchmarks and
comparison with GiST, Q3C ?

I've considered an ad-hoc sample of 10M spherical points inserted in a DB following
a spatial sorting criteria, i.e. points sequentially generated along a straight line
displaced in all the coordinates space lat -90°,+90° lon 0°,+180°. This is a very ideal
case of data distribution where BRIN indexing potential can be fully exploited.

An important note is how indexed data is stored: like GiST, BRIN indexing
algorithm is based reducing the information of the spherical spatial object
to their relative 3D bounding boxes in a cartesian coordinates system.

I've than proceeded querying points included in a spherical box with a size able to
select just one single point. The search has been made comparing cases where
BRINs, GiST or no indexes have been used. I didn't know Q3C (a Quad tree
implementation in 3D if I correctly understood) so I didn't consider this latter case.
BRIN has been created considering a relatively high granularity of the index (i.e.
summarization of no more than 16 data pages per block range).

I've then compared:

* index sizes (important considering big data contexts on not-distributed systems).
* execution times for indexes creation
* execution times of queries - the most important thing maybe

About indexes size: BRIN ~ 300kB, GiST ~ 900MB (table size ~ 450MB).

Time needed for creation: BRIN ~ 2.7s, GiST ~ 160s.

Query's execution time: BRIN ~ 6ms, GiST ~ 0.5ms, SeqScan ~ 900ms.

Obviously, I expect that BRIN performance decreases when data is not physically
sorted on disk following some spatial criteria. But in case where data size
is larger than the amount of available memory resource, BRIN are more easily
used. Though I didn't tried with Q3C indexes, I expect similar behavior like GiST.

Giuseppe.

Oleg Bartunov

unread,
Jan 1, 2018, 4:18:00 PM1/1/18
to Giuseppe Broccolo, pgsp...@googlegroups.com
I share the never-finished paper with you.

Giuseppe Broccolo

unread,
Jan 2, 2018, 11:21:07 AM1/2/18
to Oleg Bartunov, pgsp...@googlegroups.com
Hi Oleg,

2018-01-01 22:17 GMT+01:00 Oleg Bartunov <obar...@gmail.com>:
I share the never-finished paper with you.

Thank you, I'll be happy to add some info about BRIN performance handling spherical objects.

Let me know if I have then to proceed with an "official" PR on github - not sure if here works like
in PostGIS, or in PostgreSQL projects.

Regards,
Giuseppe.

Oleg Bartunov

unread,
Jan 2, 2018, 11:30:21 AM1/2/18
to Giuseppe Broccolo, pgsp...@googlegroups.com
What do you mean "PR" ?

Regards,
Giuseppe.

Giuseppe Broccolo

unread,
Jan 2, 2018, 11:34:17 AM1/2/18
to Oleg Bartunov, pgsp...@googlegroups.com
2018-01-02 17:30 GMT+01:00 Oleg Bartunov <obar...@gmail.com>:

What do you mean "PR" ?

I mean opening a pull request (PR) on github: https://github.com/pgsphere/pgsphere
(sorry if I wasn't clear before).

Giuseppe.

Oleg Bartunov

unread,
Jan 2, 2018, 11:40:08 AM1/2/18
to Giuseppe Broccolo, pgsp...@googlegroups.com
Ahah, please do. We maintain this repository

Oleg Bartunov

unread,
Jan 2, 2018, 11:43:34 AM1/2/18
to Giuseppe Broccolo, Alexander Korotkov, pgsp...@googlegroups.com
Also, I'd like to see the number of read blocks from disk for various selects. Explain (analyze, buffers)

Giuseppe Broccolo

unread,
Jan 7, 2018, 6:25:37 PM1/7/18
to Oleg Bartunov, Alexander Korotkov, pgsp...@googlegroups.com
Hi Oleg,

2018-01-02 17:43 GMT+01:00 Oleg Bartunov <obar...@gmail.com>:
Also, I'd like to see the number of read blocks from disk for various selects. Explain (analyze, buffers)

I've obtained some results querying a 10 billions spoint sample in a 2GB RAM machine, considering the
worst case each time, i.e. a shared buffer cache completely cold. As you now, BRIN scan is based on
bitmap index/heap scan. Considering different selects, with a larger number of selected spoint each time:

# of selected point    |    # of read index buffers   |   # of read heap buffers
            1                   |                   29                    |               77
         100                  |                   29                    |              132
    100000                 |                   29                    |            4576
  1000000                 |                   29                    |            7411

As you can see, the number of read buffers during heap scan increase is restrained as the number of
selected objects increases.

Giuseppe.

Oleg Bartunov

unread,
Jan 11, 2018, 8:06:05 AM1/11/18
to Giuseppe Broccolo, Alexander Korotkov, pgsp...@googlegroups.com
Guiseppe, we could provide you machine and catalog, which we used in
our benchmarks. It's be nice if you could repeat tests with your
opclass to have numbers to compare. Alexander will review your PR.

>
> Giuseppe.

Giuseppe Broccolo

unread,
Jan 11, 2018, 8:11:48 AM1/11/18
to Oleg Bartunov, Alexander Korotkov, pgsp...@googlegroups.com
Hi Oleg,

2018-01-11 14:05 GMT+01:00 Oleg Bartunov <obar...@gmail.com>:
Guiseppe, we could provide you machine and catalog, which we used in
our benchmarks. It's be nice if you could repeat tests with your
opclass to have numbers to compare. Alexander will review your PR.

Ok. Actually I'm waiting before to submit the pull request in github because
the patch currently doesn't support searches between spherical circles, but
just within spherical ranges (sbox). Do you think I should add before the support
for circles too?

Giuseppe.

Oleg Bartunov

unread,
Jan 11, 2018, 8:23:12 AM1/11/18
to Giuseppe Broccolo, Alexander Korotkov, pgsp...@googlegroups.com
Circles are needed of course, you may add it later.

>
> Giuseppe.

Giuseppe Broccolo

unread,
Jan 11, 2018, 8:25:57 AM1/11/18
to Oleg Bartunov, Alexander Korotkov, pgsp...@googlegroups.com
Hi,

Ok. Alexander, thanks for the review. Do you need I submit the pull request in the GitHub repo?

Giuseppe.
Reply all
Reply to author
Forward
0 new messages