Matching text


sega

Mar 2, 2010, 3:43:09 PM

to FluidDB Discuss

Hello!

I am building an application that uses FluidDB, and every time I use the
keyword "matches" I get the following response:

header=HTTP/1.1 400 Bad Request
Transfer-Encoding: chunked
Date: Tue, 02 Mar 2010 20:15:20 GMT
X-Fluiddb-Error-Class: TParseError
Content-Type: text/html
X-Fluiddb-Request-Id: ditjscmnlknqgvxw
Server: TwistedWeb/8.2.0

The HTTP GET command that I send is:
http://sandbox.fluidinfo.com/objects?query=has+gts_bcn/TipFinder/tip+and+gts_bcn/TipFinder/question+matches+Hello
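
For reference, a query URL like the one above can be built with standard URL encoding. This is only a minimal sketch in Python: the endpoint and tag paths are copied from this post, and the helper name build_query_url is made up for illustration. It builds the URL but does not send the request.

```python
from urllib.parse import urlencode

# Endpoint taken from the post above.
BASE_URL = "http://sandbox.fluidinfo.com/objects"


def build_query_url(query: str) -> str:
    """Return the GET URL for a FluidDB query string (hypothetical helper).

    safe="/" leaves the slashes in tag paths unescaped; spaces become "+".
    """
    return BASE_URL + "?" + urlencode({"query": query}, safe="/")


url = build_query_url(
    "has gts_bcn/TipFinder/tip and gts_bcn/TipFinder/question matches Hello"
)
print(url)
```

Encoding the query this way reproduces the URL in the post, so the encoding itself is not what triggers the TParseError.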

I do not know what I am doing wrong. I have read all the FluidDB
documentation but cannot find a solution to my problem. I think my query
is built correctly, but the response is always Bad Request (TParseError).
Can anyone help me, please?

Thank you

Nicholas Tollervey

Mar 2, 2010, 6:07:57 PM

to fluiddb...@googlegroups.com

Hi Sega,

Don't worry, it isn't you. Take a look at the documents here:

http://doc.fluidinfo.com/fluidDB/queries.html

Specifically, look at the note at the bottom (referenced in the section about the "matches" query), which states:

"Text matching has not been implemented for the launch of the FluidDB private alpha. Expect it soon."

Put simply, the Fluidinfo guys have yet to enable this functionality.

Hope that helps,

Nicholas.

Terry Jones

Mar 3, 2010, 10:32:02 AM

to fluiddb...@googlegroups.com

Hi Sega,

Sorry I didn't reply yesterday - your mail somehow hit my spam folder, and
I only got Nicholas' reply! :-)

Anyway, Nicholas is quite right. We've not implemented text matching
yet. There are several people who have asked about it, and we want it
ourselves, so it's a reasonably high priority. It's also something we want
to get right, so we're a little careful about how we'll provide it.

I'd be happy to hear more about what you're doing, how important text
matching is to that, what it's preventing you doing etc. We do of course
want to be reactive to the needs of people who are using FluidDB, and the
more we know the better we can prioritize things. So please feel free to
say more, to be the squeaky wheel, etc.

Terry

sega

Mar 4, 2010, 3:03:05 PM

to FluidDB Discuss

Hello,

First of all, thank you for your replies, and sorry for not reading the
note at the bottom of the FluidDB documentation. I read "Text matching
is done with Lucene, [...]" and I thought text matching was
implemented.

It's a pity that text matching is not implemented, because it's a
basic function when you use tag values that hold text. In my case, I'm
part of a work group that is implementing a Smalltalk API for accessing
FluidDB. Our API is already functional, and we would like to build a
small application that uses the features of FluidDB. Our goal is for
this application, implemented in Smalltalk, to give something back to
the Smalltalk community. For this reason, we thought of building an
application that helps Smalltalk developers, for example by creating
a database of questions and answers, like a FAQ.

What we need from text matching is to perform case-insensitive searches,
and maybe to use regular expressions. We are not sure yet.

As for being a squeaky wheel, I have several questions about
FluidDB for which I have not found satisfactory answers. What are the
features that make FluidDB special for developing a killer application?
What can FluidDB do that a relational database cannot? When do I have to
use FluidDB instead of a relational database? And the last question:
why does FluidDB not have a way to get an object with all of its tag
values, or several of them, using only one request? At present, when an
object has to be built, many requests have to be sent to get each tag
value, loading the network unnecessarily. Are you considering a solution
for this?

Thank you,

Sergio

Terry Jones

Mar 5, 2010, 7:03:23 AM

to fluiddb...@googlegroups.com

Hi Sergio

>>>>> "Sergio" == sega <seg...@yahoo.es> writes:

Sergio> Our API is already functional, and we would like to build a small
Sergio> application that uses the features of FluidDB. Our goal is for this
Sergio> application, implemented in Smalltalk, to give something back to
Sergio> the Smalltalk community. For this reason, we thought of building an
Sergio> application that helps Smalltalk developers, for example by
Sergio> creating a database of questions and answers, like a FAQ.

Sergio> What we need from text matching is to perform case-insensitive
Sergio> searches, and maybe to use regular expressions. We are not sure
Sergio> yet.

Sounds great - and I'm really sorry we're not ready for that just yet.

Sergio> As for being a squeaky wheel, I have several questions about
Sergio> FluidDB for which I have not found satisfactory answers.

OK, I can at least try to answer some of these :-)

Sergio> What are the features that make FluidDB special for developing a
Sergio> killer application?

Let me start with a slightly different question. People often ask "what's
the killer app" for FluidDB. I think that's almost by definition
unanswerable. If we could say in advance what killer apps were, the world
(especially the start-up world) would be a very different place. Although
it seems unlikely, it might be that there are no killer apps for FluidDB
and that the real killer is what FluidDB allows overall.

This heads towards an answer to your question. FluidDB makes it possible
for applications (and hence their users) to naturally share information by
putting it into the same place. When you put information into context, it
becomes more valuable. So a first answer is that applications become more
valuable because the data they're storing into FluidDB becomes more
valuable. It's more valuable because others can customize it, can search on
it (with text matching, one of these days :-)), can organize it, etc. By
using FluidDB for storage of information, a programmer gives the users of
his/her application the possibility to do more with the information that
the program is storing on their behalf.

I'm sorry if this sounds a bit abstract. As an example, think about
information that's about place, or about people, etc. Mobile applications
that store information about place into the same (FluidDB) storage
inherently have more potential - because they can become richer by easily
taking advantage of other information that's stored on the same object,
because they let the user do more, because they can potentially interact
with other applications using the same objects, etc.

Or consider Tickery (http://tickery.net) which we built to illustrate
this. The underlying data of Tickery is in FluidDB. So while Tickery itself
isn't world-changing, it has a ton of potential for more interesting things
to emerge as others can add to the underlying data, can query on it, and
can even display the results in Tickery. See the examples on Tickery's
Advanced tab. We'll add some more to Tickery soon.

So in a way the answer is about social data. Applications (and users) using
FluidDB will benefit most if their data is in a sense social. If an
application were to put completely private tags onto objects that no-one
could find, it would probably derive no benefit from FluidDB beyond perhaps
being simpler to use, less work to set up.

Sergio> What can FluidDB do that a relational database cannot?

There are two important things we're aiming at.

One is the overall control structure of FluidDB. No-one need ask permission
to add anything (tag) to any object, and future needs or changes do not
have to be anticipated. Those are steps away from what has traditionally
been a tightly controlled world of schema, tables, database admins who
controlled the data model, and careful / hard-to-change data
organization. The FluidDB control & permissions system is designed to
change all that. See the nice articles written by Nicholas Radcliffe
(http://bit.ly/Kh7Rx) and Nicholas Tollervey (http://bit.ly/aezmA0) that
address this.

The second is that FluidDB is designed to be very simple, mainly with the
goal of being able to scale horizontally. Some people would argue that
relational databases already scale pretty well, perhaps even horizontally.
The relational world is heading in that direction, of course. (BTW, I
don't much like the whole NoSQL movement.) But that's very hard when you
have to support a query language that's as rich as SQL and applications
that can build arbitrarily complex queries. In FluidDB you can't do a
complex query. You can do a big query, a deep one, one that returns many
results, etc., but you can't do a complicated one. The complexity of your
query language bounds the complexity of the underlying architecture. If
your query language is simple, the underlying architecture can be simple(r)
and more uniform. I wrote about this at http://bit.ly/ipQmZ

So, though I realize this is a contentious statement and of course that
we're not there yet, I'd claim that FluidDB can scale more easily - because
it's designed from the beginning to scale.

Sergio> When do I have to use FluidDB instead of a relational database?

I guess you mean when should you use FluidDB. One obvious initial way to
use it is to store metadata about things (put it on the object that's
"about" the thing). E.g., use it when you want to store information about a
Twitter user but Twitter's API doesn't let you put your information into
their database. Use it when you have data that you think other
applications or users might benefit from - by augmenting it, by searching
across it, by organizing it. Use it when you suspect you'll benefit from
other information that's in there already or that might be in there. Use it
when you want more future flexibility in your data model, or when you want
to give that flexibility to others. Use it if you think your data could be
made more valuable by being co-located with other related data. Use it if
you want your users to have choice and control over their own data.

Don't use it if you want to build a private app, or you need text matching :-)

Sergio> And the last question: why does FluidDB not have a way to get an
Sergio> object with all of its tag values, or several of them, using only
Sergio> one request? At present, when an object has to be built, many
Sergio> requests have to be sent to get each tag value, loading the network
Sergio> unnecessarily. Are you considering a solution for this?

Yes. That's about to be addressed. There's a proposal I wrote up some weeks
ago that I've been waiting for feedback on. I plan to post it to the
mailing list. You're of course right that it's important - even if it
doesn't help hugely with speed, it can save hundreds of network requests.
It's a high priority.
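
To put a rough number on "hundreds of network requests", here is a small back-of-the-envelope sketch in Python. It assumes that each tag value is fetched with its own GET on a URL of the form /objects/<id>/<tag path>. The object IDs, the tip and keywords tag names, and the helper name per_tag_urls are all hypothetical. Nothing is sent over the network.

```python
def per_tag_urls(base_url, object_ids, tag_paths):
    """List one GET URL per (object, tag) pair, i.e. one request per tag value."""
    return [
        f"{base_url}/objects/{object_id}/{tag_path}"
        for object_id in object_ids
        for tag_path in tag_paths
    ]


# Hypothetical object IDs and tag paths, for illustration only.
object_ids = [f"object-{n}" for n in range(50)]
tag_paths = [
    "gts_bcn/TipFinder/question",
    "gts_bcn/TipFinder/tip",
    "gts_bcn/TipFinder/keywords",
]

urls = per_tag_urls("http://sandbox.fluidinfo.com", object_ids, tag_paths)
print(len(urls))  # 50 objects x 3 tags = 150 separate requests
```

With per-tag fetching the request count grows as objects times tags, which is why returning several tag values in one response matters even when raw speed is not the main concern.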

Thanks for all the questions. I hope the above is interesting / helpful.
Feel free to write more. And thanks a lot for working on the Smalltalk
library!

Terry

sega

Mar 10, 2010, 4:18:20 PM

to FluidDB Discuss

Hi Terry,

Sorry for not responding sooner, but lately I have been very busy.

> Sergio> Our API is already functional, and we would like to build a small
> Sergio> application that uses the features of FluidDB. Our goal is for this
> Sergio> application, implemented in Smalltalk, to give something back to
> Sergio> the Smalltalk community. For this reason, we thought of building an
> Sergio> application that helps Smalltalk developers, for example by
> Sergio> creating a database of questions and answers, like a FAQ.
>

Terry> Sounds great - and I'm really sorry we're not ready for that just yet.

I have rethought the text-matching problem and found an alternative
way to search text without using the matches operator. Reading the
FluidDB documentation again, I found the contains operator, which
matches a full string against a set of strings. So I have added a new
tag called keywords, which holds the main words of the question, and
when I do a search I build a query using the contains operator over
this set of keywords. I have tested it and it worked. In the end, the
application has not been broken :-)
However, I have another question. How can the contains operator be made
case-insensitive?
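
The keyword approach described above could be sketched roughly as follows, in Python rather than Smalltalk. Several things here are assumptions made for illustration: the full tag path gts_bcn/TipFinder/keywords, the stop-word list, the helper names, and the exact form of the contains clause. Only the idea of a keywords tag queried with contains comes from this post.

```python
import re

# Hypothetical tag path: the post names the tag "keywords" but not its namespace.
KEYWORDS_TAG = "gts_bcn/TipFinder/keywords"

# A tiny, made-up stop-word list; a real application would choose its own.
STOP_WORDS = {"a", "an", "the", "how", "do", "i", "in", "to", "is", "of"}


def extract_keywords(question):
    """Return the distinct non-stop-words of a question, sorted, case preserved."""
    words = re.findall(r"[A-Za-z0-9]+", question)
    return sorted({w for w in words if w.lower() not in STOP_WORDS})


def contains_query(words):
    """Build a query requiring every word to be present in the keywords set."""
    return " and ".join(f'{KEYWORDS_TAG} contains "{w}"' for w in words)


keywords = extract_keywords("How do I open a Workspace in Smalltalk?")
print(keywords)
print(contains_query(["Workspace", "Smalltalk"]))
```

The keywords list is what would be stored on the object as a set of strings. Because contains compares whole strings, the search terms have to match the stored keywords exactly, including case.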

> Sergio> What are the features that make FluidDB special for developing a
> Sergio> killer application?
>

Terry> Let me start with a slightly different question. People often ask "what's
Terry> the killer app" for FluidDB. I think that's almost by definition
Terry> unanswerable. If we could say in advance what killer apps were, the world
Terry> (especially the start-up world) would be a very different place. Although
Terry> it seems unlikely, it might be that there are no killer apps for FluidDB
Terry> and that the real killer is what FluidDB allows overall.
Terry>
Terry> This heads towards an answer to your question. FluidDB makes it possible
Terry> for applications (and hence their users) to naturally share information by
Terry> putting it into the same place. When you put information into context, it
Terry> becomes more valuable. So a first answer is that applications become more
Terry> valuable because the data they're storing into FluidDB becomes more
Terry> valuable. It's more valuable because others can customize it, can search on
Terry> it (with text matching, one of these days :-)), can organize it, etc. By
Terry> using FluidDB for storage of information, a programmer gives the users of
Terry> his/her application the possibility to do more with the information that
Terry> the program is storing on their behalf.

I have read your answer very attentively, and let me say that it
was interesting reading. I agree with you that FluidDB allows
information to be shared and set up more naturally and flexibly than
other databases. Moreover, FluidDB gives the freedom to group and add
information around objects, which makes it easier to create new contexts
for other applications. That is true, but I think this simple,
homogeneous and elegant (why not) FluidDB model hides a handicap for the
developer. When a developer wants to build an application, he/she does
not know the structure and information that are already in the
database. Therefore, the developer cannot benefit from data and objects
that already exist. I think the developer needs a tool to "see" this
public organization and information, to be able to put his/her
information in a suitable context and enrich the objects by sharing
his/her information. That could make it easier to build applications for
FluidDB and to grow the FluidDB database.

Terry> The second is that FluidDB is designed to be very simple, mainly with the
Terry> goal of being able to scale horizontally. Some people would argue that
Terry> relational databases already scale pretty well, perhaps even horizontally.
Terry> The relational world is heading in that direction, of course. (BTW, I
Terry> don't much like the whole NoSQL movement.) But that's very hard when you
Terry> have to support a query language that's as rich as SQL and applications
Terry> that can build arbitrarily complex queries. In FluidDB you can't do a
Terry> complex query. You can do a big query, a deep one, one that returns many
Terry> results, etc., but you can't do a complicated one. The complexity of your
Terry> query language bounds the complexity of the underlying architecture. If
Terry> your query language is simple, the underlying architecture can be simple(r)
Terry> and more uniform. I wrote about this at http://bit.ly/ipQmZ

Another interesting question about FluidDB is the query language.
FluidDB is apparently a chaotic set of objects tagged with information.
Queries are the only way to find related information and to extract a
set of objects that have information in common. At a scale as
horizontal as FluidDB's, queries have to be fast and rich in operators
to guarantee dynamic, quick applications. Moreover, the number of
requests to FluidDB has to be kept to a minimum to reduce the number of
objects transmitted. For this reason, the tag values of objects could be
sent back when a query is executed. I think the query language could
become richer, to get, select and filter exactly the information that
the developer needs. It is only an idea that I had when I read your
reply. Nothing definite.

> Sergio> When do I have to use FluidDB instead of a relational database?
>

Terry> I guess you mean when should you use FluidDB. One obvious initial way to
Terry> use it is to store metadata about things (put it on the object that's
Terry> "about" the thing). E.g., use it when you want to store information about a
Terry> Twitter user but Twitter's API doesn't let you put your information into
Terry> their database. Use it when you have data that you think other
Terry> applications or users might benefit from - by augmenting it, by searching
Terry> across it, by organizing it. Use it when you suspect you'll benefit from
Terry> other information that's in there already or that might be in there. Use it
Terry> when you want more future flexibility in your data model, or when you want
Terry> to give that flexibility to others. Use it if you think your data could be
Terry> made more valuable by being co-located with other related data. Use it if
Terry> you want your users to have choice and control over their own data.
Terry>
Terry> Don't use it if you want to build a private app, or you need text matching :-)

I agree with you completely.

> Sergio> And the last question: why does FluidDB not have a way to get an
> Sergio> object with all of its tag values, or several of them, using only
> Sergio> one request? At present, when an object has to be built, many
> Sergio> requests have to be sent to get each tag value, loading the network
> Sergio> unnecessarily. Are you considering a solution for this?
>

Terry> Yes. That's about to be addressed. There's a proposal I wrote up some weeks
Terry> ago that I've been waiting for feedback on. I plan to post it to the
Terry> mailing list. You're of course right that it's important - even if it
Terry> doesn't help hugely with speed, it can save hundreds of network requests.
Terry> It's a high priority.

Ok.

> Thanks for all the questions. I hope the above is interesting / helpful.
> Feel free to write more. And thanks a lot for working on the Smalltalk
> library!

Thanks for your time and attention.

Sergio

Terry Jones

Mar 15, 2010, 8:20:15 PM

to fluiddb...@googlegroups.com

>>>>> "Sergio" == sega <seg...@yahoo.es> writes:

Sergio> I have rethought the text-matching problem and found an
Sergio> alternative way to search text without using the matches
Sergio> operator.

OK, that's good news.

Sergio> Reading the FluidDB documentation again, I found the contains
Sergio> operator, which matches a full string against a set of strings. So
Sergio> I have added a new tag called keywords, which holds the main words
Sergio> of the question, and when I do a search I build a query using the
Sergio> contains operator over this set of keywords. I have tested it and
Sergio> it worked. In the end, the application has not been
Sergio> broken :-) However, I have another question. How can the contains
Sergio> operator be made case-insensitive?

You can't do that yet either :-) Sorry!
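
Until then, one possible client-side workaround is to normalize case before storing keywords and again before searching, so that an exact, case-sensitive comparison behaves as if it ignored case. This is a general technique, not a FluidDB feature. The Python sketch below only simulates the stored keyword set locally, and all of the names in it are made up for illustration.

```python
def normalize(word):
    """Fold case so that 'Smalltalk', 'SMALLTALK' and 'smalltalk' compare equal."""
    return word.casefold()


def normalized_keywords(words):
    """Return the set of keywords to store, all in normalized form."""
    return {normalize(w) for w in words}


def matches_keyword(stored, search_term):
    """Exact membership test, mirroring a case-sensitive set lookup.

    It appears case-insensitive only because both sides were normalized.
    """
    return normalize(search_term) in stored


stored = normalized_keywords(["Smalltalk", "Workspace"])
print(matches_keyword(stored, "SMALLTALK"))
print(matches_keyword(stored, "Pharo"))
```

The cost is that the original capitalization is lost in the keyword set, so the question text itself would still need to be kept in its own tag for display.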

Sergio> I have read your answer very attentively, and let me say that it
Sergio> was interesting reading. I agree with you that FluidDB allows
Sergio> information to be shared and set up more naturally and flexibly
Sergio> than other databases.

I would say that in some cases. There will be all sorts of situations where
using another database would be more appropriate. As far as what we like to
think of as social data, I think FluidDB is unique, and as a place to put
metadata about virtually anything, I do think it's a good choice. It may
well be that people store large amounts of data elsewhere (e.g., in Flickr,
in Amazon S3, etc) and use FluidDB to store metadata. That's fine too; in
fact that's a great use as well.

Sergio> Moreover, FluidDB gives the freedom to group and add information
Sergio> around objects, which makes it easier to create new contexts for
Sergio> other applications. That is true, but I think this simple,
Sergio> homogeneous and elegant (why not) FluidDB model hides a handicap
Sergio> for the developer. When a developer wants to build an application,
Sergio> he/she does not know the structure and information that are already
Sergio> in the database. Therefore, the developer cannot benefit from data
Sergio> and objects that already exist. I think the developer needs a tool
Sergio> to "see" this public organization and information, to be able to
Sergio> put his/her information in a suitable context and enrich the
Sergio> objects by sharing his/her information. That could make it easier
Sergio> to build applications for FluidDB and to grow the FluidDB database.

Yes. I think everyone would agree with that. Did you see Nicholas
Radcliffe's recent blog post about keeping a catalog of FluidDB About tag
conventions? See http://bit.ly/dCdlmR

That kind of information may migrate into FluidDB.

Another point is that FluidDB can help with this. Each tag and namespace
has an object associated with it that can be used to store information
about the tag or namespace. One of the tags on those objects holds the
description that's given when the tag or namespace is created. That can be
searched on (again, that needs text matching to be most useful). And
FluidDB can (in the future) also tell you which tags are most likely to be
useful, based initially on things like how heavily used they are, and based
later on more interesting things, like co-occurrence of tags on objects,
trusted tags on the object corresponding to tags, etc. I think there's a
lot of potential for that sort of thing to exist, and agree with your
comment that there will be pressure for it to exist.
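
As a rough illustration of the usage and co-occurrence signals mentioned above, here is a minimal Python sketch over a small in-memory sample. It is not FluidDB functionality: the sample objects, the tag names and both helper functions are invented for the example.

```python
from collections import Counter
from itertools import combinations


def tag_usage(objects):
    """Count how many objects carry each tag."""
    return Counter(tag for tags in objects.values() for tag in set(tags))


def tag_cooccurrence(objects):
    """Count how often each unordered pair of tags appears on the same object."""
    pairs = Counter()
    for tags in objects.values():
        pairs.update(combinations(sorted(set(tags)), 2))
    return pairs


# Made-up sample: object id -> tags present on that object.
sample = {
    "obj-1": ["alice/rating", "bob/seen", "bob/comment"],
    "obj-2": ["alice/rating", "bob/seen"],
    "obj-3": ["bob/seen"],
}

usage = tag_usage(sample)
pairs = tag_cooccurrence(sample)
print(usage.most_common(1))
print(pairs[("alice/rating", "bob/seen")])
```

In this sample, bob/seen is the most heavily used tag and it co-occurs with alice/rating on two objects. Counts like these are the kind of signal that could point a developer who is new to the data towards tags worth looking at.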

The important thing right now - and remember that these are very early days
- is that FluidDB gives a natural and obvious (suggested) place for that
kind of information to accumulate: on the object corresponding to the tags
in question. And it's flexible enough that various ways of marking up
FluidDB using FluidDB are possible. And all that can happen, perhaps in
multiple ways, without needing Fluidinfo (the company) to direct things, to
anticipate things, or to change the underlying simple/uniform architecture.

Sergio> Thanks for your time and attention.

And for yours! Please feel free to ask more. Did you have a chance to look
at the /values draft proposal I posted late Friday night?