I have first heard of Bloom filters when I started reading about Exadata.
Bloom filtering is a probabilistic algorithm which for the given set can
very quickly decide whether an item at hand is a member or not. The
algorithm can produce false postives, but not false negatives.
http://en.wikipedia.org/wiki/Bloom_filter
The main use in the database, or so I thought, was for Exadata to search
the blocks requested by the DB query. However, a recent discussion on the
Metalink community forum has produced the following:
**************************************************************************
The Oracle database makes use of Bloom filters in the following 4
situations:
- To reduce data communication between slave processes in parallel joins:
mostly in RAC
- To implement join-filter pruning: in partition pruning, the optimizer
analyzes FROM and WHERE clauses in SQL statements to eliminate unneeded
partitions when building the partition access list
- To support result caches: when you run a query, Oracle will first see
if the results of that query have already been computed and cached by
some session or user, and if so, it will retrieve the answer from the
server result cache instead of gathering all of the database blocks
- To filter members in different cells in Exadata: Exadata performs joins
between large tables and small lookup tables, a very common scenario for
data warehouses with star schemas. This is implemented using Bloom
filters as to determine whether a row is a member of the desired result
set.
You can identify a bloom pruning in a plan when you see :BF0000 in the
Pstart and Pstop columns of the execution plan and PART JOIN FILTER
CREATE in the operations column:
Two hints can be used: px_join_filter and no_px_join_filter; and there
are a couple of views for monitoring Bloom filters:
- v$sql_join_filter: information about Bloom filters; number of rows
filtered out and probed by Bloom filters
- v$pq_tqstat: check reduced communication due to usage of Bloom filters
The Oracle database makes use of Bloom filters in the following 4
situations:
- To reduce data communication between slave processes in parallel joins:
mostly in RAC
- To implement join-filter pruning: in partition pruning, the optimizer
analyzes FROM and WHERE clauses in SQL statements to eliminate unneeded
partitions when building the partition access list
- To support result caches: when you run a query, Oracle will first see
if the results of that query have already been computed and cached by
some session or user, and if so, it will retrieve the answer from the
server result cache instead of gathering all of the database blocks
- To filter members in different cells in Exadata: Exadata performs joins
between large tables and small lookup tables, a very common scenario for
data warehouses with star schemas. This is implemented using Bloom
filters as to determine whether a row is a member of the desired result
set.
You can identify a bloom pruning in a plan when you see :BF0000 in the
Pstart and Pstop columns of the execution plan and PART JOIN FILTER
CREATE in the operations column:
Two hints can be used: px_join_filter and no_px_join_filter; and there
are a couple of views for monitoring Bloom filters:
- v$sql_join_filter: information about Bloom filters; number of rows
filtered out and probed by Bloom filters
- v$pq_tqstat: check reduced communication due to usage of Bloom filters
**************************************************************************
The author is an Oracle employee, so I searched the Metalink. The search
has produced the document 912330.1 which, while less detailed than the
post above still confirms it. Of course, as with all new features, there
are "features":
Bug 12637294: BLOOM PARTITION PRUNING MAY CAUSE DEADLOCK ORA-60
Funny thing about this particular "feature" is that it is marked as fixed
in release 12.1.
There are also some good blog articles, like this one:
http://perfexpert2.wordpress.com/2011/05/21/undocumented-parameter/
--
Mladen Gogala
The Oracle Whisperer
http://mgogala.byethost5.com