Freebase: querying the Freebase database for a QA system

Devashish Shankar

Aug 16, 2011, 10:30:36 AM
to VIT LUG
In Freebase, data is organized into topics. Each topic has one or more
types, and related types are grouped into a domain.

The id of a topic looks something like: /domain/type/id

e.g. apple: /food/fruit/apple
and also: /food/ingredients/apple

(In addition, by default every topic is in the /en domain)

/en/apple

new york city: /en/new_york

To access the properties of a topic we have to pass the EXACT id.
On top of that, we have to know the exact name of the property we
want info about (for the apple fruit, things like "Scientific name"
or "color").

Here is one idea for how we can query Freebase for our system:

With one query we can find the list of ids for a keyword, i.e. the
query returns an array of ids of the topics that share that name.
For "apple" it would return all the topics with apple in their name, like

-apple (fruit)
-apple (tree)
-apple, inc (computer company)
-apple II (computer made by apple)
.....

Then we retrieve the list of properties of each topic. After that
we can query Freebase for the specific property.
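
A rough sketch of what those two queries could look like when built from a program instead of the Query Editor. It only builds the JSON text and does not send it anywhere. The function names are made up for illustration, the {"query": ...} envelope is an assumption about what the read service expects, and the property id in the example is a placeholder, not a real Freebase property.

```python
import json


def ids_for_name_query(name):
    """Query asking for every topic whose name matches `name`.

    In MQL, None (JSON null) means "fill in this value" and an empty
    list means "return all values", so this asks for each topic's id
    and all of its types.
    """
    return [{"name": name, "id": None, "type": []}]


def property_values_query(topic_id, property_id):
    """Query asking for all values of one property of one exact topic id."""
    return {"id": topic_id, property_id: []}


def to_envelope(query):
    """Wrap a query in a {"query": ...} envelope and serialise it to JSON text."""
    return json.dumps({"query": query})


if __name__ == "__main__":
    print(to_envelope(ids_for_name_query("apple")))
    # Placeholder property id, used only to show the shape of the query.
    print(to_envelope(property_values_query(
        "/en/apple", "/example_domain/example_type/color")))
```

The point is that the query is ordinary data (lists and dicts), so a program can fill in the keyword and the property id itself without needing Query Assist.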

PROBLEMS
How will the program know whether apple is the topic or color is the
topic? (We can probably try both, and whichever gives a relevant
result is returned as the output.)
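
The "try both" idea could be sketched like this. `lookup` is a hypothetical callable standing in for whatever ends up querying the knowledge base, and FAKE_KB is a stand-in used only so the example runs on its own.

```python
def answer(keywords, lookup):
    """Try each keyword as the topic and the other one as the property.

    `lookup(topic, prop)` is a hypothetical callable that returns a list
    of values, empty when the knowledge base has nothing relevant.
    Returns (topic, prop, values) for the first reading that gives a
    result, or None if neither reading does.
    """
    first, second = keywords
    for topic, prop in ((first, second), (second, first)):
        values = lookup(topic, prop)
        if values:
            return topic, prop, values
    return None


# Stand-in for a real knowledge base, for illustration only.
FAKE_KB = {("apple", "color"): ["red", "green"]}


def fake_lookup(topic, prop):
    """Look the pair up in FAKE_KB; unknown pairs give an empty list."""
    return FAKE_KB.get((topic, prop), [])


if __name__ == "__main__":
    print(answer(("color", "apple"), fake_lookup))
```

This costs one extra query in the worst case, and it assumes that the wrong reading really does come back empty, which may not hold for a noisy knowledge base.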

For now, I will try to work out the queries for this approach. I am
completely unfamiliar with JSON, which is causing the problems. I
haven't read the full API docs yet and there might be easier ways to do
this... The Freebase query tutorial teaches queries not from a
programmer's point of view, but from the point of view of a user who
wants to retrieve some data. So it relies a lot on the "Query Assist"
feature, which is somewhat like IntelliSense in IDEs such as Visual
Studio or Eclipse. And we obviously can't use Query Assist in a program.

Please read the Freebase query tutorial and help me out! I will also
read up on JSON and Freebase and think of ways to do this. (Many, many
applications use Freebase, so there definitely are ways to do this.)

A LITTLE OVERVIEW OF HOW TO QUERY FREEBASE (for anyone who wants to
work on this part of the application)
-----------------------------------------------------------------------------------

Freebase queries are written in MQL, which uses JSON syntax. If you
want to learn how to query Freebase, start here:
http://www.freebase.com/docs/data/introduction

Then you need an API so that a program can send a query:
http://code.google.com/p/freebase-java/wiki/GettingStarted
This tutorial shows how to use JSON from Java.

Devashish Shankar

Aug 16, 2011, 10:43:20 AM
to VIT LUG
Quote from freebase.com tutorial:

Freebase contains a lot of topics, and a lot of types and properties
that you yourself didn't design. This is quite different from
programming on a database that you have designed, or even a database
that someone on your team has designed. On Freebase, it's harder to
know what data is there, and what types and properties to use to get
at the data.

So far, we have relied a lot on the Query Assist feature of the Query
Editor for finding IDs of topics, types, and properties. This works
quite well if you already know roughly what you're looking for, but
don't know precisely the IDs (of topics, types, or properties) in
question. If you're less sure of what you want, then you need
something in addition to Query Assist.

On Aug 16, 7:30 pm, Devashish Shankar <devashish.shan...@gmail.com>
wrote:
> Then you would need an API so that a program can give a query...
> http://code.google.com/p/freebase-java/wiki/GettingStarted

Rohit Mishra

Aug 16, 2011, 10:53:13 AM
to vit...@googlegroups.com
Devashish 

Thanks a lot for updating us on the way Freebase works. Regarding the problem of how to identify whether apple is the topic or color is the topic, can we assume that nouns will be our topics and adjectives/adverbs will be the properties? I believe this rule will suffice for a large number of cases, though there will be a significant number of exceptions.
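
A minimal sketch of that rule of thumb, assuming the question has already been run through a POS tagger that emits Penn Treebank tags. The function name and the hand-tagged sample questions are made up for illustration.

```python
def split_topics_and_properties(tagged_tokens):
    """Apply the rule of thumb: nouns are topics, adjectives/adverbs are properties.

    `tagged_tokens` is a list of (word, tag) pairs using Penn Treebank
    tags: NN* = noun, JJ* = adjective, RB* = adverb.
    """
    topics = [w for w, tag in tagged_tokens if tag.startswith("NN")]
    properties = [w for w, tag in tagged_tokens if tag.startswith(("JJ", "RB"))]
    return topics, properties


# Hand-tagged examples; a real POS tagger would produce these pairs.
HOW_TALL = [("how", "WRB"), ("tall", "JJ"), ("is", "VBZ"), ("everest", "NNP")]
WHAT_COLOR = [("what", "WP"), ("is", "VBZ"), ("the", "DT"),
              ("color", "NN"), ("of", "IN"), ("apple", "NN")]

if __name__ == "__main__":
    print(split_topics_and_properties(HOW_TALL))
    print(split_topics_and_properties(WHAT_COLOR))
```

For the first question the rule picks "everest" as the topic and "tall" as the property. For the second, both "color" and "apple" are tagged as nouns, so the rule yields two topic candidates and no property, which is one of the exceptions it would need a fallback for.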

Persist with JSON, as it is the universal language of web APIs and will help you in other projects that you take up as well.

Devashish Shankar

Aug 16, 2011, 11:17:58 AM
to VIT LUG
Although color is a quality of an object, I don't think it is an
adjective. It's a noun.

I just found a query that gets the list of ids associated with the name
"apple". It returned 98 results! The first one was the most viewed
topic, apple the fruit. The second was a satellite named Apple, and the
third was a character named Apple in some comic book. It seems to do
an exact-match search (that's why Apple Inc. and Apple II were excluded).
But the remaining 95 were blank results: the name was "apple", but
there was no data. (Freebase must be full of such irrelevant topics
because anyone is free to write to it... people have even created
custom types like "food (jamie)" and added apple to them! Such types
are ignored when viewing on the website, but they are still there.)

Now I have to find a query that lists all the properties of apple.
This is more complicated because apple is associated with many
types, and each type has its own properties. Then I have to check whether
color is a property. (I don't think it is a property in Freebase... I
can find the scientific name, nutritional value, availability, and the
foods in which it is used as an ingredient, but not color!)
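
Once the types and their properties have been fetched, the check itself is simple. Here is a sketch, assuming the result has been collected into a dict of type id to property names. The sample data is illustrative only: the type ids and property names are taken from the messages in this thread and are not verified against Freebase.

```python
def all_properties(types_to_properties):
    """Flatten {type_id: [property names]} into one set of lowercased names."""
    found = set()
    for props in types_to_properties.values():
        found.update(p.lower() for p in props)
    return found


def has_property(types_to_properties, wanted):
    """True if any of the topic's types defines the wanted property."""
    return wanted.lower() in all_properties(types_to_properties)


# Illustrative data only, not fetched from Freebase.
APPLE = {
    "/food/fruit": ["Scientific name", "Availability"],
    "/food/ingredients": ["Nutritional value", "Used in dishes"],
}

if __name__ == "__main__":
    print(has_property(APPLE, "scientific name"))
    print(has_property(APPLE, "color"))
```

Matching on the display name is fragile; matching on property ids would be safer once the exact ids are known.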

The implementation is going to be tough, I guess!

Devashish Shankar

Aug 16, 2011, 11:33:17 AM
to VIT LUG
An edit: the other 95 results weren't without data... they were just
topics with very little info (like Apple the song, a piece of software, etc.)

Dheeraj Rajagopal

Aug 16, 2011, 2:47:49 PM
to vit...@googlegroups.com
Devashish, thanks for pointing out the problem. I see too much noisy data in the Freebase database.

I have a suggestion: we can try changing the knowledge base for now, to something much simpler.

What do you guys feel?
--
Regards

Dheeraj

Rohit Mishra

Aug 16, 2011, 4:22:09 PM
to vit...@googlegroups.com
Dheeraj and Devashish, 

What are our other options for a knowledge base now?

Dheeraj Rajagopal

Aug 17, 2011, 12:51:34 AM
to vit...@googlegroups.com
Rohit,


We have OpenCyc, which has a Java API and is the one predominantly used for common-sense computing projects.
There is also a relatively new knowledge base called ConceptNet, from MIT, which has a good Python API. It is better organised than OpenCyc.

We can ask the people working in Java to try OpenCyc and the people working in Python to start with ConceptNet.
--
Regards

Dheeraj

Rohit Mishra

Aug 17, 2011, 2:29:04 AM
to vit...@googlegroups.com
Dheeraj

OK, let's go ahead with them then.

Rohit

Devashish Shankar

Aug 17, 2011, 11:45:44 AM
to VIT LUG
OK, I am going ahead with OpenCyc...

On Aug 17, 11:29 am, Rohit Mishra <ro...@rohitmishra.me> wrote:
> Dheeraj
>
> Ok lets go ahead with them then.
>
> Rohit
>
> On Wed, Aug 17, 2011 at 10:21 AM, Dheeraj Rajagopal <dheeraj.go...@gmail.com

Dheeraj Rajagopal

Aug 17, 2011, 1:44:44 PM
to vit...@googlegroups.com
I'll start with the ConceptNet database. Let's meet up after CAT with the progress.
--
Regards

Dheeraj

Devashish Shankar

Sep 12, 2011, 5:31:39 PM
to VIT LUG
I have done a little research on OpenCyc. It is better suited to our
application than Freebase. OpenCyc is a knowledge base which covers a
huge amount of knowledge in a logical format (predicates). Thus, asking
queries is much simpler, as they are in predicate form. OpenCyc is the
most comprehensive KB (knowledge base) for AI applications.

But the problem is that it is NOT a database. To write an
application, we need to learn "CycL", which is the language
of OpenCyc. OpenCyc provides a good set of tutorials.

What I infer after reading the introduction slides from OpenCyc is
that the application we want to design is almost impossible to build
using a database (which has nothing for semantics). Our application
focuses on semantics (understanding "what" the user means). Thus if we
are able to make an application using Cyc, it should give accurate
results.

Any other database, on the other hand, would return values matching the
keyword, and would usually be wrong. OpenCyc's aim is to make computer
programs "understand".

That is why OpenCyc is more complicated (because it focuses on
semantics). So first we have to learn CycL, then design a program.
The program part will be even more complicated. But once done, this
should give an accurate answer (of course, the answer has to be
present in the OpenCyc KB in the first place... It is vast, but still
does not contain all knowledge).

Please add your comments. What do you guys feel? What about progress
with ConceptNet?


Dheeraj Rajagopal

Sep 13, 2011, 12:06:39 AM
to vit...@googlegroups.com
On Tue, Sep 13, 2011 at 3:01 AM, Devashish Shankar <devashis...@gmail.com> wrote:
> I have done a little research on OpenCyc. It is better suited to our
> application than Freebase. OpenCyc is a knowledge base which covers a
> huge amount of knowledge in a logical format (predicates). Thus, asking
> queries is much simpler, as they are in predicate form. OpenCyc is the
> most comprehensive KB (knowledge base) for AI applications.
>
> But the problem is that it is NOT a database

OpenCyc does have a database; the only difference is that they don't use the databases that are familiar to us. They have their own proprietary database file.


 
> To write an
> application, we need to learn "CycL", which is the language
> of OpenCyc. OpenCyc provides a good set of tutorials.
>
> What I infer after reading the introduction slides from OpenCyc is
> that the application we want to design is almost impossible to build
> using a database (which has nothing for semantics).

This is actually not true. You can design it using a database. All the data has to be stored somewhere, and a database is the most efficient way to store it. Semantics is really the way you look at the data. You can arrange the data in the database so that semantics are incorporated (that is how these knowledge bases are created).


 
> Our application
> focuses on semantics (understanding "what" the user means). Thus if we
> are able to make an application using Cyc, it should give accurate
> results.
>
> Any other database, on the other hand, would return values matching the
> keyword, and would usually be wrong. OpenCyc's aim is to make computer
> programs "understand".

OpenCyc is just a very good abstraction on top of a database, so that it can easily be queried and manipulated using their API. The way I see it, their Cyc language can be used functionally (a good knowledge and understanding of Discrete Mathematics is required) to infer knowledge, which is the basis of any AI-based application.



 

> That is why OpenCyc is more complicated (because it focuses on
> semantics). So first we have to learn CycL, then design a program.
> The program part will be even more complicated. But once done, this
> should give an accurate answer (of course, the answer has to be
> present in the OpenCyc KB in the first place... It is vast, but still
> does not contain all knowledge).

True. Only about 40% of the Cyc database is out as OpenCyc, so you can imagine that a lot more of the knowledge is still out of reach for ordinary users.


 

> Please add your comments. What do you guys feel? What about progress
> with ConceptNet?

The basic code is ready. It will be uploaded to the GitHub account soon (in another day or two). Currently,
if I give the question: what is the color of apple?
it answers: the most probable answers are (laptop, a, green, red)

I am looking for a way to remove the "laptop, a" entries.
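
One possible way to clean up such a candidate list, sketched under two assumptions that are not from the actual code: that a small stopword list is acceptable for dropping entries like "a", and that a set of allowed answers for the asked property (here, a handful of color names) is available for dropping entries like "laptop". Both word lists are placeholders.

```python
# Placeholder word lists, for illustration only.
STOPWORDS = {"a", "an", "the", "of", "is"}
KNOWN_COLORS = {"red", "green", "yellow", "blue", "black", "white"}


def filter_answers(candidates, allowed=None):
    """Drop stopwords, and optionally anything outside the `allowed` set.

    Candidates are stripped and lowercased; the original order is kept.
    """
    kept = []
    for word in candidates:
        w = word.strip().lower()
        if not w or w in STOPWORDS:
            continue
        if allowed is not None and w not in allowed:
            continue
        kept.append(w)
    return kept


if __name__ == "__main__":
    raw = ["laptop", "a", "green", "red"]
    print(filter_answers(raw))
    print(filter_answers(raw, allowed=KNOWN_COLORS))
```

The stopword filter alone removes "a" but keeps "laptop"; only the allowed-answers check removes both. A hand-written list per property will not scale, so in practice the allowed set would have to come from the knowledge base itself.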



--
Regards

Dheeraj

Dheeraj Rajagopal

Sep 13, 2011, 12:34:01 AM
to vit...@googlegroups.com
Oops! I did not see that the code has been uploaded already!

https://github.com/vitlug/QAsystem--Python-

--
Regards

Dheeraj

Devashish Shankar

Sep 13, 2011, 3:17:42 PM
to VIT LUG
OK! Thanks for your clarifications. I think the OpenCyc API will not
support direct access to the database. To get knowledge we will have
to query it through the API...

So, I have to write code that converts the question asked into a
relational form [CycL]! That seems a daunting task. But I'll look into
the API and CycL and see how to make it possible.

Is there any other way to make this application? I'll try to look into
the API and see if something like what you did with ConceptNet is
possible here...

I'll upload the POS tagging code. It's in Java.

On Sep 13, 9:34 am, Dheeraj Rajagopal <dheeraj.go...@gmail.com> wrote:
> Oops ! I did not see that the code has been uploaded already !
>
> https://github.com/vitlug/QAsystem--Python-
>
> On Tue, Sep 13, 2011 at 9:36 AM, Dheeraj Rajagopal
> <dheeraj.go...@gmail.com>wrote:
> ...