Full-text search engines like Apache Lucene are powerful technologies for adding efficient free-text search capabilities to applications. However, Lucene suffers from several mismatches when dealing with object domain models. Among other things, indexes have to be kept up to date, and mismatches between the index structure and the domain model, as well as query mismatches, have to be avoided.
Hibernate Search addresses these shortcomings: it indexes your domain model with the help of a few annotations, takes care of database/index synchronization and brings back regular managed objects from free-text queries. To achieve this, Hibernate Search combines the power of Hibernate and Apache Lucene.
Welcome to Hibernate Search. The following chapter will guide you through the initial steps required to integrate Hibernate Search into an existing Hibernate ORM enabled application. In case you are a Hibernate newcomer, we recommend you start here.
The Hibernate Search library is split into several modules to allow you to pick the minimal set of dependencies you need. It requires Apache Lucene, Hibernate ORM and some standard APIs such as the Java Persistence API and the Java Transaction API. Other dependencies are optional, providing additional integration points. To get the correct jar files on your classpath we highly recommend using a dependency manager such as Maven, or similar tools such as Gradle or Ivy. These alternatives are also able to consume the artifacts from the Using Maven section.
You can download zip bundles from Sourceforge containing all needed Hibernate Search dependencies. This includes, among others, the latest compatible version of Hibernate ORM. However, only the essential parts you need to start experimenting with are included. You will probably need to combine this with downloads from the other projects; for example, the Hibernate ORM distribution on Sourceforge also provides the modules to enable caching or use a connection pool.
Once you have added all required dependencies to your application you have to add a couple of properties to your Hibernate configuration file. If you are using Hibernate directly this can be done in hibernate.properties or hibernate.cfg.xml. If you are using Hibernate via JPA you can also add the properties to persistence.xml. The good news is that for standard use most properties offer a sensible default. An example persistence.xml configuration could look like this:
First you have to tell Hibernate Search which DirectoryProvider to use. This can be achieved by setting the hibernate.search.default.directory_provider property. Apache Lucene has the notion of a Directory to store the index files. Hibernate Search handles the initialization and configuration of a Lucene Directory instance via a DirectoryProvider. In this tutorial we will use a directory provider which stores the index on the file system. This will give us the ability to inspect the Lucene indexes created by Hibernate Search (e.g. via Luke). Once you have a working configuration you can start experimenting with other directory providers (see Directory configuration). You also have to specify the default base directory for all indexes via hibernate.search.default.indexBase. This defines the path where indexes are stored.
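As a rough sketch, the two settings described above can be collected in a plain java.util.Properties object before they are handed to Hibernate. The class name SearchSettingsSketch, the value filesystem and the path /var/lucene/indexes are assumptions made for illustration; how the properties reach Hibernate depends on your own bootstrap code.

```java
import java.util.Properties;

// Hypothetical helper, not part of Hibernate Search: it only gathers the two
// settings discussed above so they can be passed to your bootstrap code.
final class SearchSettingsSketch {

    static Properties fileSystemIndexSettings(String indexBase) {
        Properties settings = new Properties();
        // Assumed value: "filesystem" selects a file system based directory provider.
        settings.setProperty("hibernate.search.default.directory_provider", "filesystem");
        // Base directory under which the index directories are created.
        settings.setProperty("hibernate.search.default.indexBase", indexBase);
        return settings;
    }

    public static void main(String[] args) {
        Properties settings = fileSystemIndexSettings("/var/lucene/indexes");
        settings.forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

The same two keys can equally be written into hibernate.properties, hibernate.cfg.xml or persistence.xml; the Java form is used here only to keep the example self-contained.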
Without projections, Hibernate Search will by default execute a Lucene query in order to find the database identifiers of the entities matching the query criteria and use these identifiers to retrieve managed objects from the database. The decision for or against projection has to be made on a case-by-case basis.
This leaves us with @IndexedEmbedded. This annotation is used to index associated entities (@ManyToMany, @*ToOne, @Embedded and @ElementCollection) as part of the owning entity. This is needed since a Lucene index document is a flat data structure which does not know anything about object relations. To ensure that the author names will be searchable you have to make sure that the names are indexed as part of the book itself. On top of @IndexedEmbedded you will also have to mark the fields of the associated entity you want to have included in the index with @Field. For more details see Embedded and associated objects.
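To make the "flat data structure" point concrete, the following conceptual sketch flattens a book and its authors into a single map of field names to values. It does not use Hibernate Search at all: the classes Book and Author, the method toFlatDocument and the field name authors.name are hypothetical and chosen only for this illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Conceptual model only: hypothetical classes, no Hibernate Search involved.
final class FlatDocumentSketch {

    static final class Author {
        final String name;
        Author(String name) { this.name = name; }
    }

    static final class Book {
        final String title;
        final List<Author> authors;
        Book(String title, List<Author> authors) {
            this.title = title;
            this.authors = authors;
        }
    }

    // Flattens the object graph into field name -> values. The association is
    // gone afterwards; only the author names survive, stored on the book's
    // own document under an assumed "authors.name" field.
    static Map<String, List<String>> toFlatDocument(Book book) {
        Map<String, List<String>> document = new LinkedHashMap<>();
        List<String> titles = new ArrayList<>();
        titles.add(book.title);
        document.put("title", titles);
        List<String> names = new ArrayList<>();
        for (Author author : book.authors) {
            names.add(author.name);
        }
        document.put("authors.name", names);
        return document;
    }
}
```

A search for an author name can then match the book's document directly, which is why the associated fields have to be copied into the owning entity's index entry.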
Hibernate Search will transparently index every entity persisted, updated or removed through Hibernate ORM. However, you have to create an initial Lucene index for the data already present in your database. Once you have added the above properties and annotations it is time to trigger an initial batch index of your books. You can achieve this by using one of the following code snippets (see also Rebuilding the whole index):
After executing the above code, you should be able to see a Lucene index under /var/lucene/indexes/example.Book (or under a different path, depending on how you configured the property hibernate.search.default.indexBase).
Now it is time to execute a first search. The general approach is to create a Lucene query, either via the Lucene API (Building a Lucene query using the Lucene API) or via the Hibernate Search query DSL (Building a Lucene query with the Hibernate Search query DSL), and then wrap this query into an org.hibernate.Query in order to get all the functionality one is used to from the Hibernate API. The following code will prepare a query against the indexed fields, execute it and return a list of Book instances.
When using the @Analyzer annotation one can either specify the fully qualified classname of the analyzer to use or one can refer to an analyzer definition defined by the @AnalyzerDef annotation. In the latter case the analyzer framework with its factories approach is utilized.
The above paragraphs helped you get an overview of Hibernate Search. The next step after this tutorial is to get more familiar with the overall architecture of Hibernate Search (Architecture) and explore the basic features in more detail. Two topics which were only briefly touched on in this tutorial were analyzer configuration (Default analyzer and analyzer by class) and field bridges (Bridges). Both are important features required for more fine-grained indexing. More advanced topics cover clustering (JMS Master/Slave back end, Infinispan Directory configuration) and large index handling (Sharding indexes).
Each time an entity is inserted, updated or removed in/from the database, Hibernate Search keeps track of this event (through the Hibernate event system) and schedules an index update. All these updates are handled without you having to interact with the Apache Lucene APIs directly (see Enabling Hibernate Search and automatic indexing). Instead, the interaction with the underlying Lucene indexes is handled via so-called IndexManagers.
Each Lucene index is managed by one index manager which is uniquely identified by name. In most cases there is also a one-to-one relationship between an indexed entity and a single IndexManager. The exceptions are the use cases of index sharding and index sharing. The former can be applied when the index for a single entity becomes too big and indexing operations are slowing down the application. In this case a single entity is indexed into multiple indexes, each with its own index manager (see Sharding indexes). The latter, index sharing, is the ability to index multiple entities into the same Lucene index (see Sharing indexes).
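The idea behind index sharding can be sketched as a function that maps each entity identifier to one of several indexes. This is a minimal illustration under stated assumptions, not the strategy Hibernate Search ships: the class ShardingSketch, the method shardFor and the hash-based rule are all hypothetical.

```java
// Conceptual sketch only: a hypothetical hash-based rule for spreading the
// documents of one entity type over several indexes.
final class ShardingSketch {

    // Returns a shard number in the range [0, shardCount) for the given identifier.
    static int shardFor(String entityId, int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(entityId.hashCode(), shardCount);
    }
}
```

Because the same identifier always maps to the same shard, later updates and deletions for an entity can be routed to the index that holds its document.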
The index manager abstracts from the specific index configuration. In the case of the default index manager this includes details about the selected backend, the configured reader strategy and the chosen DirectoryProvider. These components will be discussed in greater detail later on. It is recommended that you start with the default index manager which uses different Lucene Directory types to manage the indexes (see Directory configuration). You can, however, also provide your own IndexManager implementation (see Configuring the IndexManager).
Once the index is created, you can search for entities and return lists of managed entities, saving you the tedious object to Lucene Document mapping. The same persistence context is shared between Hibernate and Hibernate Search. As a matter of fact, the FullTextSession is built on top of the Hibernate Session so that the application code can use the unified org.hibernate.Query or javax.persistence.Query APIs exactly the same way an HQL, JPA-QL or native query would do.
In the case of an ongoing transaction, the index update operation is scheduled for the transaction commit phase and discarded in case of transaction rollback. The batching scope is the transaction. There are two immediate benefits:
ACIDity: The work executed has the same scoping as the one executed by the database transaction and is executed if and only if the transaction is committed. This is not ACID in the strict sense, but ACID behavior is rarely useful for full-text search indexes since they can be rebuilt from the source at any time.
You can think of those two batch modes (no scope vs transactional) as the equivalent of the (infamous) autocommit vs transactional behavior. From a performance perspective, the in-transaction mode is recommended. The scoping choice is made transparently. Hibernate Search detects the presence of a transaction and adjusts the scoping (see Worker configuration).
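The transactional scoping described above, where index updates are queued until commit and dropped on rollback, can be modelled in a few lines. This is a conceptual sketch, not Hibernate Search code: the class TransactionalIndexQueueSketch and its methods are hypothetical, and index operations are represented as plain strings.

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual model only: queues index operations for the duration of a
// transaction and applies them in one batch on commit.
final class TransactionalIndexQueueSketch {

    private final List<String> pending = new ArrayList<>();
    private final List<String> applied = new ArrayList<>();

    // Called for every insert, update or removal inside the transaction.
    void schedule(String indexOperation) {
        pending.add(indexOperation);
    }

    // On commit the whole batch is applied to the (simulated) index at once.
    void commit() {
        applied.addAll(pending);
        pending.clear();
    }

    // On rollback the queued work is discarded and the index stays untouched.
    void rollback() {
        pending.clear();
    }

    // Returns a copy of the operations that reached the simulated index.
    List<String> applied() {
        return new ArrayList<>(applied);
    }
}
```

In the no-scope mode each operation would instead be applied as soon as it is scheduled, which corresponds to calling commit after every single change.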
Hibernate Search offers the ability to let the batched work be processed by different backends. Several backends are provided out of the box and you have the option to plug in your own. It is important to understand that in this context backend encompasses more than just the configuration option hibernate.search.default.worker.backend. This property just specifies an implementation of the BackendQueueProcessor interface (or the Backend interface, see the configuration options) which is a part of a backend configuration. In most cases, however, additional configuration settings are needed to successfully configure a specific backend setup, for example the JMS backend.
In this mode, all index update operations applied on a given node (JVM) will be executed against the Lucene directories (through the directory providers) by the same node. This mode is typically used in non-clustered environments or in clustered environments where the directory store is shared.