Basic full-text search implemented


Zef Hemel

Apr 27, 2010, 9:57:59 AM
to persis...@googlegroups.com
Hi all,

I spent part of today building an initial, naive version of
full-text search for persistence.js. To use it, simply include
persistence.search.js after including persistence.js.

Example usage:

var Note = persistence.define('Note', {
  name: "TEXT",
  text: "TEXT",
  status: "TEXT"
});
Note.textIndex('name');
Note.textIndex('text');

This sample defines a `Note` entity with three properties, of which
`name` and `text` are full-text indexed. To support this, a new database
table is created to store the index.

Searching is done as follows:

Note.search({query: "note", success: function(results) {
  console.log(results);
}});

Or you can paginate your results using `limit` and `skip` (which work
like `limit` and `skip` in QueryCollections):

Note.search({query: "note", limit: 10, skip: 10, success: function(results) {
  console.log(results);
}});

Query language
--------------
Queries can contain regular words. In addition, the `*` wildcard can be
used anywhere within a word, and the `property:` notation restricts the
search to a particular field. Examples:

* `note`
* `name: note`
* `interesting`
* `inter*`
* `important essential`

Note that currently a result is returned when _any_ word matches.
Results are ranked by the number of occurrences of the query words in
the text.
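A minimal, self-contained sketch of the query semantics described above, assuming an in-memory list of index rows shaped like `{id, prop, word, occurrences}` (a hypothetical shape modelled on the index table described in the next paragraph). `parseQuery` and `rankResults` are illustrative names, not functions exported by persistence.search.js. Terms may carry a `property:` prefix and `*` wildcards; an entity matches if any term matches, and entities are ordered by their summed occurrence counts.

```javascript
// Hypothetical sketch, not the persistence.search.js implementation.
// Parse a query into terms: an optional "property:" prefix plus a word
// that may contain "*" wildcards.
function parseQuery(query) {
  // Turn "name: note" into "name:note" so a space after the colon is allowed.
  const tokens = query.trim().replace(/:\s+/g, ":").split(/\s+/).filter(Boolean);
  return tokens.map(function (token) {
    const colon = token.indexOf(":");
    const prop = colon >= 0 ? token.slice(0, colon) : null;
    const word = (colon >= 0 ? token.slice(colon + 1) : token).toLowerCase();
    // Escape regex metacharacters (except "*"), then turn each "*" into ".*".
    const source = word.replace(/[.+?^${}()|[\]\\]/g, "\\$&").replace(/\*/g, ".*");
    return { prop: prop, pattern: new RegExp("^" + source + "$") };
  });
}

// Each index row: { id, prop, word, occurrences } (assumed shape).
function rankResults(indexRows, query) {
  const terms = parseQuery(query);
  const scores = new Map();
  for (const row of indexRows) {
    // OR semantics: a row counts if it matches any term.
    const matches = terms.some(function (t) {
      return (t.prop === null || t.prop === row.prop) && t.pattern.test(row.word);
    });
    if (matches) {
      scores.set(row.id, (scores.get(row.id) || 0) + row.occurrences);
    }
  }
  // Highest total occurrence count first; return entity ids only.
  return Array.from(scores.entries())
    .sort(function (a, b) { return b[1] - a[1]; })
    .map(function (entry) { return entry[0]; });
}

const rows = [
  { id: "a", prop: "name", word: "note", occurrences: 1 },
  { id: "a", prop: "text", word: "interesting", occurrences: 1 },
  { id: "b", prop: "text", word: "note", occurrences: 3 },
  { id: "c", prop: "text", word: "internal", occurrences: 2 }
];
console.log(rankResults(rows, "note"));       // [ 'b', 'a' ]
console.log(rankResults(rows, "name: note")); // [ 'a' ]
console.log(rankResults(rows, "inter*"));     // [ 'c', 'a' ]
```

In this sketch a row that matches several terms is counted once; the message does not say how the real implementation handles that case.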

For each table with a textIndex column, an additional
TableName_Index table is created that contains the index. This table
format is not optimal at this point: it contains the UUID of the
entity, the property name, the word and the occurrence count, and it
could be further optimized to reduce storage size. I implemented
_extremely_ simple stemming for English words (basically removing -s
endings from words and replacing -ies with -y, so that you'd find
babies when searching for baby, and vice versa).
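The index layout and stemming rule above can be sketched like this. It is an illustration under assumptions, not the persistence.search.js code: `stem` and `buildIndexRows` are hypothetical names, splitting on any character outside a-z and 0-9 is an assumed tokenizer, and each returned row mirrors the UUID / property name / word / occurrence count columns of the TableName_Index table.

```javascript
// Hypothetical sketch of the "extremely simple" stemming described above:
// "-ies" becomes "-y", otherwise a trailing "-s" is dropped.
function stem(word) {
  if (word.endsWith("ies")) {
    return word.slice(0, -3) + "y";
  }
  if (word.endsWith("s")) {
    return word.slice(0, -1);
  }
  return word;
}

// Build index rows for one entity: one row per (property, stemmed word),
// carrying how often that word occurs in the property's text.
function buildIndexRows(entityId, entity, indexedProps) {
  const rows = [];
  for (const prop of indexedProps) {
    const counts = new Map();
    const words = String(entity[prop] || "").toLowerCase().split(/[^a-z0-9]+/);
    for (const raw of words) {
      const word = stem(raw);
      if (word === "") continue; // skip empty split pieces (and a lone "s")
      counts.set(word, (counts.get(word) || 0) + 1);
    }
    for (const [word, occurrences] of counts) {
      rows.push({ id: entityId, prop: prop, word: word, occurrences: occurrences });
    }
  }
  return rows;
}

console.log(stem("babies"), stem("baby"), stem("notes")); // baby baby note
const rows = buildIndexRows("uuid-1", { name: "Babies", text: "A note about notes" }, ["name", "text"]);
console.log(rows.length); // 4
```

Because the same `stem` would be applied when indexing and when parsing a query, "baby" and "babies" both reduce to "baby" and match each other.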

Note that the API style diverges a bit from the rest of persistence.js
(search takes an object containing named arguments, rather than a
flat list of arguments). Fabio will (hopefully) send a proposal for
applying this style to the rest of persistence.js later this week.

As usual, it's all available from http://github.com/zefhemel/persistencejs

For now, what do you guys think? Any comments?

Best,

Zef

--
Zef Hemel
http://zef.me
http://twitter.com/zef


--
Subscription settings: http://groups.google.com/group/persistencejs/subscribe?hl=en

Ulrich

Apr 27, 2010, 12:45:58 PM
to persis...@googlegroups.com
Good job!
Is there a plan to add boolean queries like:
> Note.search({query: "name:note* -status:deleted", success: function(results) {
>   console.log(results);
> }});


Ulrich VACHON


Sent from my iPhone

Zef Hemel

Apr 27, 2010, 2:44:03 PM
to persis...@googlegroups.com
Hi Ulrich,

On Tue, Apr 27, 2010 at 6:45 PM, Ulrich <u.va...@gmail.com> wrote:
> Good job!

Thanks!

> Is there a plan to add boolean queries like:
>>
>> Note.search({query: "name:note* -status:deleted", success:
>> function(results) {
>>     console.log(results);
>>   }});
>

Maybe, although it isn't a high priority for me right now. I
needed basic full-text search, so I'll use what's there now and see
whether it is sufficient. If you see a chance to add this feature, feel
free to contribute.

Best,

Zef

Ulrich Vachon

Apr 27, 2010, 3:07:22 PM
to persis...@googlegroups.com
OK, it would be an honor, though I'm still "light" in JavaScript. But I'll
think about it, because I know Lucene.

Ulrich

Ulrich Vachon

Apr 27, 2010, 4:39:54 PM
to persis...@googlegroups.com
I had a quick look at the code and didn't see a normalization step. The
goal of normalization is to use "filters" so that text is processed the
same way during indexing and searching. For example, if I want to index
this expression: "The full-text search is good!"
- After applying a ToLowerCaseFilter:
- "the full-text search is good!"
- After applying a StandardFilter:
- "the full text search is good"
- After applying a StopWordFilter:
- "full text search good"
- etc...
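Ulrich's example can be expressed as a chain of small functions, each taking and returning an array of tokens, with the same chain run at index time and at query time. This is a hypothetical sketch of the idea, loosely modelled on Lucene's filter chain rather than on any persistence.js code; the function names, the whitespace tokenizer, the punctuation rule and the stop-word list are all assumptions.

```javascript
// Each filter maps an array of tokens to a new array of tokens.
function toLowerCaseFilter(tokens) {
  return tokens.map(function (t) { return t.toLowerCase(); });
}

// Rough stand-in for a "standard" filter: split on hyphens and strip
// anything that is not a Unicode letter or digit.
function standardFilter(tokens) {
  const out = [];
  for (const t of tokens) {
    for (const part of t.split("-")) {
      const cleaned = part.replace(/[^\p{L}\p{N}]/gu, "");
      if (cleaned !== "") out.push(cleaned);
    }
  }
  return out;
}

// Assumed, deliberately tiny stop-word list.
const STOP_WORDS = new Set(["the", "is", "a", "an", "and", "of", "to"]);

function stopWordFilter(tokens) {
  return tokens.filter(function (t) { return !STOP_WORDS.has(t); });
}

// Tokenize on whitespace, then apply the filters in order.
function analyze(text, filters) {
  let tokens = text.split(/\s+/).filter(Boolean);
  for (const f of filters) tokens = f(tokens);
  return tokens;
}

const chain = [toLowerCaseFilter, standardFilter, stopWordFilter];
console.log(analyze("The full-text search is good!", chain).join(" "));
// full text search good
```

Keeping each filter separate makes it easy to swap in a different stop-word list or add a stemming step without touching the others.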

Interesting?
Ulrich VACHON

PS: Maybe it's time for me to develop in JS :)

Zef Hemel

Apr 28, 2010, 7:27:35 AM
to persis...@googlegroups.com
Hi Ulrich,

> I had a quick look at the code and didn't see a normalization step. The
> goal of normalization is to use "filters" so that text is processed the
> same way during indexing and searching. For example, if I want to index
> this expression: "The full-text search is good!"

There was a normalization step, but it was not fully shared between
the indexer and the search-phrase parser. I've now factored that code
out into the normalizeWord function. Currently it removes any words
shorter than 3 characters, does basic stemming and removes some stop
words (listed in the filteredWords variable). But again, this is
very basic.
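A minimal sketch of the three rules listed above (drop words shorter than 3 characters, drop stop words, apply basic stemming), applied by one shared function on both the indexing and the query side. `normalizeWordSketch`, `normalizeText` and `FILTERED_WORDS` are hypothetical names; this is an illustrative reconstruction, not the body of normalizeWord, and the stop-word contents and the order of the checks are assumptions.

```javascript
// Hypothetical stop-word list; the real filteredWords contents may differ.
const FILTERED_WORDS = new Set(["the", "and", "for", "are", "with", "this", "that"]);

// Returns the normalized form of a word, or null if it should be ignored.
function normalizeWordSketch(word) {
  const w = word.toLowerCase();
  if (w.length < 3) return null;          // too short to be useful
  if (FILTERED_WORDS.has(w)) return null; // stop word (checked before stemming)
  // Basic stemming: "-ies" -> "-y", otherwise drop a trailing "-s".
  if (w.endsWith("ies")) return w.slice(0, -3) + "y";
  if (w.endsWith("s")) return w.slice(0, -1);
  return w;
}

// Same normalization for document text and for search phrases.
function normalizeText(text) {
  return text
    .split(/[^A-Za-z0-9]+/)
    .map(normalizeWordSketch)
    .filter(function (w) { return w !== null; });
}

const indexed = normalizeText("The babies are with the nannies");
const query = normalizeText("baby");
console.log(indexed);                    // [ 'baby', 'nanny' ]
console.log(indexed.includes(query[0])); // true
```

Sharing one function is what guarantees that a query word and an indexed word end up in the same form.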

It would be nice to implement a more generic filtering mechanism like
you propose. Lucene's system is very nice.

> PS: Maybe it's time for me to develop in JS :)

Definitely!

I changed the search API to use query collections. EntityName.search()
now returns a query collection, which means that you can now write:

Note.search("note").limit(10).skip(10).list(null, function(results) {
  console.log(results);
});

That is much cleaner than what I had before.
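The chaining style works because calls such as `limit()` and `skip()` return a collection object instead of running the query; nothing executes until `list()` is called. Below is a toy, in-memory stand-in to illustrate that design. `ToyCollection` is hypothetical and is not persistence.js's QueryCollection (which, per the thread, backs `search()` against the database); only the `list(tx, callback)` call shape mirrors the example above.

```javascript
// Toy, immutable collection: each modifier returns a new collection, and
// the work happens only when list() is called.
class ToyCollection {
  constructor(items, skipCount = 0, limitCount = Infinity) {
    this.items = items;
    this.skipCount = skipCount;
    this.limitCount = limitCount;
  }
  limit(n) {
    return new ToyCollection(this.items, this.skipCount, n);
  }
  skip(n) {
    return new ToyCollection(this.items, n, this.limitCount);
  }
  // The first argument stands in for a transaction and is ignored here.
  list(tx, callback) {
    const end = this.skipCount + this.limitCount;
    callback(this.items.slice(this.skipCount, end));
  }
}

// Pretend these are already-ranked search results.
const ranked = Array.from({ length: 25 }, function (_, i) { return "note" + (i + 1); });
new ToyCollection(ranked).limit(10).skip(10).list(null, function (results) {
  console.log(results[0], results.length); // note11 10
});
```

Because each call returns a fresh object, a base collection can be reused to build several differently paginated queries.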

Best,

Zef

Zef Hemel

May 5, 2013, 9:20:10 AM
to persis...@googlegroups.com
Perhaps a load order problem? How do you load the files?

-- Zef


On Sun, May 5, 2013 at 3:01 PM, Lorenzo Becchi <omini...@gmail.com> wrote:
Zef, thanks for your code.

Trying to add the textIndex, I get:
 Note.textIndex is not a function

persistence.search.js is loaded properly by the HTML, but the functions are not added to the object.
None of the ones provided by persistence.search.js are.
Firebug shows all of persistence's functions, but not the search ones.
What could be wrong?

Thanks
Lorenzo

--
You received this message because you are subscribed to the Google Groups "persistence.js" group.
To unsubscribe from this group and stop receiving emails from it, send an email to persistencej...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Lorenzo Becchi

May 5, 2013, 9:27:34 AM
to persis...@googlegroups.com
Zef, I feel stupid, but that doesn't help, hehe
I've been following this tutorial:
http://zef.me/3234/full-text-search-in-persistence-js


----------------------------------

Lorenzo Becchi
http://www.lorenzobecchi.com

Lorenzo Becchi

May 5, 2013, 9:31:52 AM
to persistencejs
Oops, the message was sent before it was complete... sorry

Here's what I'm doing. I'm including a bunch of your code:
----------------------------
<!-- JS library to manage DB -->
<script src="js/persistence.js" type="application/javascript"></script>
<script src="js/persistence.search.js" type="application/javascript"></script>
<script src="js/persistence.store.sql.js" type="application/javascript"></script>
<script src="js/persistence.store.websql.js" type="application/javascript"></script>
<script src="js/persistence.store.memory.js" type="application/javascript"></script>
<script src="js/persistence.jquery.js" type="application/javascript"></script>
<script src="js/persistence.jquery.mobile.js" type="application/javascript"></script>
----------------------------


Then, using jQuery, I wait for the $(document).ready event and create the DB:
----------------------------
if (window.openDatabase) {
  persistence.store.websql.config(persistence, 'Acu', 'Acu Ominiverdi', 5 * 1024 * 1024);
} else {
  persistence.store.memory.config(persistence);
}

//define Entity
Task = persistence.define('Task', {
  p_id: "INT",
  p_title: "TEXT",
  p_text: "TEXT",
  si_title: "TEXT",
  si_content: "TEXT"
});

Task.textIndex('p_title');
----------------------------


Nothing else,
and I get the error:
----------------------------
TypeError: Task.textIndex is not a function
Task.textIndex('p_title');
----------------------------

It feels so weird to me.







Zef Hemel

May 5, 2013, 11:09:10 AM
to persis...@googlegroups.com
Hi,

It turns out the search documentation (https://github.com/zefhemel/persistencejs/blob/master/docs/search.md) was slightly out of date. I've just added the missing config line there. You have to add it right after your websql config line, as described in the README:


persistence.search.config(persistence, persistence.store.websql.sqliteDialect);

HTH,

-- Zef

Lorenzo Becchi

May 8, 2013, 8:04:06 AM
to persistencejs
Thanks Zef!



