Full text lookup?

csch...@gmail.com

Sep 20, 2007, 11:40:39 AM
to H2 Database
In the dim past, I could have sworn there was mention of using Lucene
or some such custom full text search engine to do full substring
indexing of strings. Am I completely out of my gourd?

Thanx,

Chris

Thomas Mueller

Sep 20, 2007, 2:17:12 PM
to h2-da...@googlegroups.com
Hi,

It is available, but not yet documented. Also, I haven't tested it in
quite a long time; I hope it still works with the newest release of
Lucene. I will add this documentation in the next release:

Full Text Search
H2 supports Lucene full text search and a native full text search implementation.

Using the Native Full Text Search
To initialize, call:

CREATE ALIAS IF NOT EXISTS FT_INIT FOR "org.h2.fulltext.FullText.init";
CALL FT_INIT();

Afterwards, you can create a full text index for a table using:

CREATE TABLE TEST(ID INT PRIMARY KEY, NAME VARCHAR);
INSERT INTO TEST VALUES(1, 'Hello World');

CALL FT_CREATE_INDEX('PUBLIC', 'TEST', NULL);

where PUBLIC is the schema, TEST is the table name. The list of column
names (comma separated) is optional, in this case all columns are
indexed. The index is updated in real time. To search the index, use
the following query:

SELECT * FROM FT_SEARCH('Hello', 0, 0);

You can also call the index from within a Java application:

org.h2.fulltext.FullText.search(conn, text, limit, offset)
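
As a rough illustration of the native steps above driven through plain JDBC, here is a minimal sketch. The class name NativeFullTextSketch and its helper methods are hypothetical and not part of H2; only the SQL statements come from the description above. It assumes an already open H2 connection, that the TEST table exists, and that no full text index has been created on it yet.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

/** Sketch only: this class and its method names are hypothetical, not part of H2. */
final class NativeFullTextSketch {

    /** Quotes a value as a SQL string literal, doubling embedded single quotes. */
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    /**
     * Builds the FT_CREATE_INDEX call. With no column names the third
     * argument is NULL (index all columns); otherwise the names are
     * passed as one comma separated string.
     */
    static String createIndexCall(String schema, String table, String... columns) {
        String columnList = columns.length == 0
                ? "NULL"
                : quote(String.join(",", columns));
        return "CALL FT_CREATE_INDEX(" + quote(schema) + ", " + quote(table)
                + ", " + columnList + ")";
    }

    /** Builds the search query with limit 0 and offset 0, as in the example above. */
    static String searchQuery(String text) {
        return "SELECT * FROM FT_SEARCH(" + quote(text) + ", 0, 0)";
    }

    /**
     * Initializes the native full text search, indexes all columns of
     * PUBLIC.TEST and returns the first column of every hit row.
     * Assumes conn is an open H2 connection and TEST already exists.
     */
    static List<String> setupAndSearch(Connection conn, String text) throws SQLException {
        List<String> hits = new ArrayList<>();
        try (Statement stat = conn.createStatement()) {
            stat.execute("CREATE ALIAS IF NOT EXISTS FT_INIT FOR "
                    + "\"org.h2.fulltext.FullText.init\"");
            stat.execute("CALL FT_INIT()");
            stat.execute(createIndexCall("PUBLIC", "TEST"));
            try (ResultSet rs = stat.executeQuery(searchQuery(text))) {
                while (rs.next()) {
                    hits.add(rs.getString(1));
                }
            }
        }
        return hits;
    }
}
```

Calling NativeFullTextSketch.setupAndSearch(conn, "Hello") would run the same statements as the SQL shown above. To index only some columns, pass their names to createIndexCall, for example createIndexCall("PUBLIC", "TEST", "NAME").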

Using the Lucene Full Text Search
To use the Lucene full text search, you first need to rename the file
FullTextLucene.java.txt to FullTextLucene.java and compile it. Also,
you need the Lucene library in the classpath.
To initialize, call:

CREATE ALIAS IF NOT EXISTS FTL_INIT FOR "org.h2.fulltext.FullTextLucene.init";
CALL FTL_INIT();

Afterwards, you can create a full text index for a table using:

CREATE TABLE TEST(ID INT PRIMARY KEY, NAME VARCHAR);
INSERT INTO TEST VALUES(1, 'Hello World');

CALL FTL_CREATE_INDEX('PUBLIC', 'TEST', NULL);

where PUBLIC is the schema, TEST is the table name. The list of column
names (comma separated) is optional, in this case all columns are
indexed. The index is updated in real time. To search the index, use
the following query:

SELECT * FROM FTL_SEARCH('Hello', 0, 0);

You can also call the index from within a Java application:

org.h2.fulltext.FullTextLucene.search(conn, text, limit, offset)


I hope this helps,
Thomas

Chris Schanck

Sep 20, 2007, 2:26:35 PM
to h2-da...@googlegroups.com
Thanks very much. Am I correct that this used to be in the docs?

BTW, I have been very happy with H2's performance on reasonable data
sets (30-40 million rows over 600-700 tables). If I get a chance, I'd
like to try the new MVCC code. You say it is beta and to be careful,
but what is the actual failure mode? Wrong answers? Lock contention?
Deadlock?
Thanks

Thomas Mueller

Sep 22, 2007, 4:08:35 AM
to h2-da...@googlegroups.com
Hi,

> this used to be in the docs?

No, as far as I know, it was never documented (unless documentation
was lost, but I don't think so).

> BTW, I have been very happy with H2's performance on reasonable data
> sets (30-40 million rows over 600-700 tables).

Good to know! Tell me if you have problems, I am always interested in
trying to improve the database or the documentation.

> MVCC. You say it is beta and to be careful,
> but what is the actual failure mode? Wrong answers? Lock contention?
> Deadlock?

Exceptions when using multiple threads and multiple connections. That
means it is not really usable currently. Most unit tests work, but not
all yet. If you have multiple connections and manually edit data (one
row after the other), it should work. I don't think it will be hard to
fix the problems; I just haven't had time yet (fixing bugs seems more
important at this time).

Thanks,
Thomas

Chris Schanck

Sep 22, 2007, 10:24:22 PM
to h2-da...@googlegroups.com
Well, I'll play with it as soon as I get a chance, but we are a little
busy at the moment. The company I work for is debuting a new deductive
database product, based on a custom N-store over a JDBC layer. For low
update concurrency and reasonably small datasets (100 million facts and
less) we have several customers who wish to use H2 because of the
all-Java implementation. Plus, we have some clients who wish to use it
for tiny data/rich logic situations, a place where H2's in-memory
implementation comes in handy. For larger (100 million+) and highly
update-concurrent installations, we have had good luck with Postgres.

Thanks,

Chris

Thomas Mueller

Sep 24, 2007, 2:45:59 PM
to h2-da...@googlegroups.com
Hi,

Thanks for your mail. If you are interested in some kind of support
plan, please let me know. There is nothing in place at the moment, but
it is the (long term) plan to provide it (in addition to the forum, of
course).

Thomas
