[ConceptNet Beginner] Installation of prebuilt database and tutorial for ConceptNet and querying concepts


Neelesh Dewangan

Sep 12, 2015, 10:25:30 PM
to conceptnet-users
Hello Everyone,

I am new to ConceptNet and Python. I have downloaded the prebuilt ConceptNet database provided here:
http://conceptnet5.media.mit.edu/downloads/current/

I have gone through many threads, but I couldn't find any installation guide for the prebuilt database.

I also need some tutorial references on querying concepts in the database, which would help me understand how to build programs and algorithms on top of ConceptNet.

Any kind of help is appreciated.

Thanks in advance.

David Chae

Sep 16, 2015, 10:30:14 AM
to conceptnet-users
Did you see this page?


Maybe this page will help you. I hope so.

Sincerely.



On Sunday, September 13, 2015 at 11:25:30 AM UTC+9, Neelesh Dewangan wrote:

Amirouche Boubekki

Sep 16, 2015, 10:30:14 AM
to conceptnet-users


On Sunday, September 13, 2015 at 4:25:30 AM UTC+2, Neelesh Dewangan wrote:
Hello Everyone,

Héllo Neelesh!
 
I am new to ConceptNet and Python. I have downloaded the prebuilt ConceptNet database provided here:
http://conceptnet5.media.mit.edu/downloads/current/

I have gone through many threads, but I couldn't find any installation guide for the prebuilt database.
 
Same here! In theory, I think it's possible to put the .msgpack files at the correct location in the source tree and run the right ninja build command(s), but I have not tried.

The other solution is to use the REST API.
 
What I do is load the .msgpack files into a graph database. It takes 3 hours (on my machine with an SSD). I don't implement the ConceptNet5 REST API (I probably should),
but you can still query the concepts and write your own "step" to run custom queries. I attached the load script I use.

I also need some tutorial references on querying concepts in the database, which would help me understand how to build programs and algorithms on top of ConceptNet.

Queries depend on the problem that you want to solve.

First, the article about ConceptNet is a must-read [1]. Here is a summary with extracts:

While WordNet is optimised for lexical categorisation and word-similarity
determination, and Cyc is optimised for formalised logical
reasoning, ConceptNet is optimised for making practical
context-based inferences over real-world texts.

Context-based inference methods allow ConceptNet to
perform interesting tasks such as the following:

• ‘given a story describing a series of everyday events,
where is it likely that these events will take place, what is
the mood of the story, and what are possible next
events?’ (spatial, affective, and temporal projections),

• ‘given a search query (assuming the terms are
commonsensical) where one of the terms can have
multiple meanings, which meaning is most likely?’
(contextual disambiguation),

• ‘presented with a novel concept appearing in a story,
which known concepts most closely resemble or
approximate the novel concept?’ (analogy-making).

ConceptNet embraces the ease-of-use of WordNet’s semantic
network representation, and the richness of Cyc’s content.
While WordNet excels as a lexical resource, and Cyc excels
at unambiguous logical deduction, ConceptNet’s forte is
contextual commonsense reasoning: making practical inferences
over real-world texts, such as analogy, spatial-temporal-affective
projection, and contextual disambiguation.


Tasks ConceptNet can achieve:
  • Contextual neighbourhoods (through assoc-space or SimRank?)
  • Realm-filtering
  • Topic generation
  • Analogy-making: “Stated concisely, two ConceptNet nodes are analogous if their sets of back-edges (incoming edges) overlap”
  • Projection: following a single transitive relation-type. ‘Los Angeles’ is located in ‘California’, which is located in ‘United States’, which is located on ‘Earth’ is an example of a spatial projection, since LocationOf is a transitive relation.
  • Topic gisting
  • Disambiguation and classification: similar to the approach taken by statistical classifiers, which compute a classification using cosine distance in a high-dimensional vector space. The main difference in our approach is that the dimensions of our vector space are commonsense-semantic (e.g. along dimensions of time, space, affect) rather than statistically based (e.g. features such as punctuation, keyword frequency, syntactic role).
  • Novel-concept identification: takes as input a document and a novel concept in that document. It outputs a list of things the novel concept might be, found by making analogies to known concepts.
  • Affect sensing
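
The analogy-making and projection items above can be sketched on a toy graph. This is only a minimal sketch and not ConceptNet's own code: the edge list is made up, the helper names (back_edges, analogy_score, project) are hypothetical, and scoring the overlap with a Jaccard ratio is my assumption, since the paper only says that the sets of incoming edges overlap.

```python
# Toy edges as (source, relation, target). Illustrative data only, not taken
# from ConceptNet. ("los angeles", "LocationOf", "california") is read here
# as "los angeles is located in california".
EDGES = [
    ("red", "PropertyOf", "apple"),
    ("sweet", "PropertyOf", "apple"),
    ("red", "PropertyOf", "cherry"),
    ("sweet", "PropertyOf", "cherry"),
    ("small", "PropertyOf", "cherry"),
    ("red", "PropertyOf", "fire truck"),
    ("los angeles", "LocationOf", "california"),
    ("california", "LocationOf", "united states"),
    ("united states", "LocationOf", "earth"),
]


def back_edges(node):
    """Return the incoming edges of a node as a set of (source, relation)."""
    return {(s, r) for s, r, t in EDGES if t == node}


def analogy_score(a, b):
    """Jaccard overlap of the back-edge sets (an assumed scoring choice)."""
    edges_a, edges_b = back_edges(a), back_edges(b)
    union = edges_a | edges_b
    if not union:
        return 0.0
    return len(edges_a & edges_b) / len(union)


def project(start, relation):
    """Follow a single relation type from start until it runs out."""
    path = [start]
    seen = {start}
    current = start
    while True:
        nxt = next(
            (t for s, r, t in EDGES if s == current and r == relation), None
        )
        if nxt is None or nxt in seen:  # stop at a dead end or a cycle
            return path
        path.append(nxt)
        seen.add(nxt)
        current = nxt


if __name__ == "__main__":
    print(analogy_score("apple", "cherry"))
    print(project("los angeles", "LocationOf"))
```

On this toy data "apple" is closer to "cherry" than to "fire truck", because it shares two incoming edges with the former and only one with the latter.
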
Applications of ConceptNet

  • Observes a user writing an e-mail and proactively suggests photos relevant to the user’s story.
  • Story-generator that allows a person to interactively invent a story with the system
  • Product recommendation from Amazon.com by using ConceptNet to reason about a person’s goals and desires, creating a profile of their predicted tastes.
  • Speech-based conversation understanding system that uses commonsense to gist the topics of casual conversations.
  • ‘Commonsense Predictive Text Entry’ [30] leverages ConceptNet to understand the context of a user’s mobile-phone text-message and to suggest likely word completions.
[1] http://web.media.mit.edu/~push/ConceptNet-BTTJ.pdf

If you don't know about NLP, you should read the Wikipedia article [2] and the list of major NLP tasks [3]. This sounded very abstract to me, so I started watching the Stanford NLP course on Coursera [4]. It's helpful as introductory material to get a grasp of the different low-level tasks (parsing, tokenization, data analysis, some algorithms) and some higher-level ones (question answering, summarization, information retrieval). I assume you are interested in the latter.

To be honest, I'm not sure what I am doing. I tried to skip the low-level stuff since it's already done. To start with, I assigned myself the task of extracting interesting information from BBC articles [5]. Here is what I do:

- I load every article from the tech category (I chose that category because I know the domain)
- I remove stopwords from the articles
- I link words from the articles to concepts in ConceptNet, going through some heuristics to match stems to concepts when the words don't exist in ConceptNet as is
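
The three steps above could look roughly like the following. It is a minimal sketch under loud assumptions: the stopword list, the concept set, and the suffix-stripping heuristic are all hypothetical stand-ins (a real run would use a full stopword list, a proper stemmer, and the concept names loaded from ConceptNet). Only the /c/en/... URI shape is taken from ConceptNet itself.

```python
import re

# Hypothetical stand-ins for a real stopword list and the real concept set.
STOPWORDS = {"the", "a", "an", "of", "and", "is", "are", "to", "in", "its"}
CONCEPTS = {"company", "release", "database", "server"}
SUFFIXES = ("ies", "es", "s", "ed", "ing")


def tokenize(text):
    """Lowercase the text and keep alphabetic runs only."""
    return re.findall(r"[a-z]+", text.lower())


def candidate_stems(word):
    """Yield the word itself, then crude suffix-stripped variants."""
    yield word
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            stem = word[: -len(suffix)]
            yield stem
            yield stem + "e"  # released -> releas -> release
            if suffix == "ies":
                yield stem + "y"  # companies -> compan -> company


def link_to_concepts(text):
    """Map each non-stopword to a concept URI when some variant matches."""
    links = {}
    for word in tokenize(text):
        if word in STOPWORDS:
            continue
        for candidate in candidate_stems(word):
            if candidate in CONCEPTS:
                links[word] = "/c/en/" + candidate
                break
    return links


if __name__ == "__main__":
    print(link_to_concepts("The company released its databases"))
```

Words with no matching variant are simply left out of the result, which makes the missing concepts easy to list afterwards.
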

So far, that's it. I need to replace the library I currently use with spaCy to parse the articles, because it has a better tokenizer than porter2.

I plan to use graph-theoretic algorithms (that's what people call them) like SimRank, CoSimRank, or Shortest Path [6].
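
Of the algorithms named above, shortest path is the simplest to sketch. Here is a minimal breadth-first search over an unweighted, undirected toy graph. The graph contents and the shortest_path helper are hypothetical, and a real graph database would normally provide its own traversal instead.

```python
from collections import deque

# Hypothetical undirected toy graph: node -> set of neighbours.
GRAPH = {
    "cassandra": {"database"},
    "database": {"cassandra", "server", "software"},
    "server": {"database", "computer"},
    "software": {"database", "computer"},
    "computer": {"server", "software"},
    "banana": set(),
}


def shortest_path(graph, start, goal):
    """Return one shortest path from start to goal, or None if unreachable."""
    if start == goal:
        return [start]
    parents = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        # sorted() only makes the choice between equal-length paths stable
        for neighbour in sorted(graph.get(node, ())):
            if neighbour in parents:
                continue
            parents[neighbour] = node
            if neighbour == goal:
                path = [goal]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


if __name__ == "__main__":
    print(shortest_path(GRAPH, "cassandra", "computer"))
```
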
 
I also keep an eye on "term rewriting" stuff, though I'm not at that point yet. Right now, I am trying to get a better understanding
of ConceptNet through the BBC data by walking and exploring the resulting graph to understand how it behaves or how it
can behave. There is a lot of "manual" work, which translates into both exploring the data and building dynamic programming
algorithms.

My short term goals:

- Add missing concepts or missing links between the articles and ConceptNet
- Extract significant single-word concepts, topics, brands, geographical locations, and dates from the articles
- Make sure that topics are merged into similar concepts.
- Create a hierarchy of topics.
- Create surface texts (summaries) for articles, for instance: Datastax released Cassandra

Other possible goals:

  • summarize hackernews articles which have no pre-computed category (but a lot of data) [7]
  • summarize Stack Overflow "clusters" to add a "group of questions" view that answers a question such as «How to create a REST API in Python» by listing all questions related to the topic while avoiding possible duplicates. That query can be the parent of another: «How to create a REST API in Python using Django REST framework». Clusters already exist implicitly in the SO dumps through the PostLink table (typed duplicate or related) [8]. You can group them using ML or dynamic programming, excluding duplicate questions, but the goal is really to make sense of the data, and possibly to use the algorithm on future unlabeled data.
  • Or implement vague/list search queries like «What are common pitfalls of building REST API», «What are the most common security issues in Web dev», which are not valid SO questions but can be answered by aggregating SO data. The previous point resembles the new Wikibase «Query» feature, which allows other wikis to use Wikibase's structured data as a source of knowledge through SPARQL queries.

Getting Wikibase to work with ConceptNet is an interesting topic, I think. It has another way to link items to each other, going through property nodes, which can themselves be linked to other properties; I think it's best represented as a hypergraph. I also discovered http://rest.wikimedia.org/ (down/slow at this time), which makes wikis available in various formats, among them HTML and Parsoid, a JSON representation of the wiki markup; this could maybe make the code that builds ConceptNet faster.

IMO, compared to learning programming, the path is more obscure.

[2] https://en.wikipedia.org/wiki/Natural_language_processing
[3] https://en.wikipedia.org/wiki/Natural_language_processing#Major_tasks_in_NLP
[4] https://class.coursera.org/nlp/lecture
[5] I use the full dataset: http://mlg.ucd.ie/datasets/bbc.html
[6] https://www.hackerrank.com/domains/algorithms/graph-theory
[7] https://archive.org/details/HackerNewsStoriesAndCommentsDump
[8] https://archive.org/details/stackexchange

HTH,

Neelesh Dewangan

Sep 16, 2015, 2:05:29 PM
to conceptnet-users

The error I get is: sqlite3.OperationalError: unable to open database file

Here is the traceback:

ERROR:conceptnet5:Exception on /web/c/en/brown [GET]
Traceback (most recent call last):
  File "/home/neel/Desktop/virtualEnvImages/cNetDocker/lib/python3.4/site-packages/Flask-0.10.1-py3.4.egg/flask/app.py", line 1817, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/neel/Desktop/virtualEnvImages/cNetDocker/lib/python3.4/site-packages/Flask-0.10.1-py3.4.egg/flask/app.py", line 1477, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/neel/Desktop/virtualEnvImages/cNetDocker/lib/python3.4/site-packages/Flask_Cors-2.1.0-py3.4.egg/flask_cors/extension.py", line 110, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "/home/neel/Desktop/virtualEnvImages/cNetDocker/lib/python3.4/site-packages/Flask-0.10.1-py3.4.egg/flask/app.py", line 1381, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/home/neel/Desktop/virtualEnvImages/cNetDocker/lib/python3.4/site-packages/Flask-0.10.1-py3.4.egg/flask/_compat.py", line 33, in reraise
    raise value
  File "/home/neel/Desktop/virtualEnvImages/cNetDocker/lib/python3.4/site-packages/Flask-0.10.1-py3.4.egg/flask/app.py", line 1475, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/neel/Desktop/virtualEnvImages/cNetDocker/lib/python3.4/site-packages/Flask-0.10.1-py3.4.egg/flask/app.py", line 1461, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/neel/Downloads/ConceptNet/data/conceptnet5-master/conceptnet5/web_interface/web_interface.py", line 68, in edges_for_uri
    edges = list(lookup(uri, limit=100))
  File "/home/neel/Downloads/ConceptNet/data/conceptnet5-master/conceptnet5/query.py", line 75, in lookup
    self.load_index()
  File "/home/neel/Downloads/ConceptNet/data/conceptnet5-master/conceptnet5/query.py", line 58, in load_index
    self._db_filename, self._edge_dir, self.nshards
  File "/home/neel/Downloads/ConceptNet/data/conceptnet5-master/conceptnet5/formats/sql.py", line 211, in __init__
    self._connect()
  File "/home/neel/Downloads/ConceptNet/data/conceptnet5-master/conceptnet5/formats/sql.py", line 216, in _connect
    self.dbs[i] = sqlite3.connect(filename)
sqlite3.OperationalError: unable to open database file

 

Neelesh Dewangan

Sep 17, 2015, 8:54:29 PM
to conceptnet-users
I was finally able to do it.

I was not declaring the database link correctly. It should point to the data folder into which the prebuilt database was extracted.

The easiest way to install is to download the database and vector files directly, extract the compressed files, and set up the link correctly.
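
For anyone hitting the same sqlite3.OperationalError, a quick way to check the path before ConceptNet tries to open it is sketched below. This is a generic sketch, not part of ConceptNet: open_edge_db is a hypothetical helper and the file path you pass in is whatever your own setup uses. As far as I know, SQLite raises "unable to open database file" when the directory in the path does not exist or is not accessible, so checking the path first gives a clearer message.

```python
import os
import sqlite3


def open_edge_db(db_path):
    """Open an SQLite file, failing early with a clear message on a bad path.

    Hypothetical helper for debugging; ConceptNet has its own loading code.
    """
    db_path = os.path.abspath(os.path.expanduser(db_path))
    if not os.path.isfile(db_path):
        raise FileNotFoundError(
            "No database file at %s; check that the data folder link points "
            "to where the prebuilt database was extracted" % db_path
        )
    return sqlite3.connect(db_path)
```

Calling it with the path your configuration resolves to shows immediately whether the link to the data folder is right.
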

Thanks for the help and the links, David and Amirouche!