How to train numberbatch embeddings?


janna syberia

Feb 19, 2020, 8:39:40 AM
to conceptnet-users
Hello,
I would like to train the numberbatch embeddings and have some questions.
How do I use conceptnet5.vectors?
Is there a main function that starts the training?
Do I have to build a database with the CN graph to be able to run retrofitting, or is it possible to work only with CSV files of assertions?

Thank you very much!


Robyn Speer

Feb 20, 2020, 1:38:23 PM
to conceptn...@googlegroups.com
Hello Janna,

We don't really provide an API for retrofitting with data other than ConceptNet. Retrofitting is part of the ConceptNet build process, which is run by snakemake.

The input to retrofitting is in fact a CSV file, data/assoc/reduced.csv, that's built by other steps of the process. If you were to change that file (and remove or revise the rule in the Snakefile that would rebuild it), you would be able to try retrofitting with your own data.
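
The thread doesn't spell out the layout of data/assoc/reduced.csv, so one low-risk way to learn it is to print the first few lines of a file produced by a real build before writing a replacement. A minimal sketch: the helper name `peek_lines` is made up for illustration, and nothing here assumes a particular column structure.

```python
from itertools import islice


def peek_lines(path, n=5, encoding="utf-8"):
    """Return the first n lines of a text file, without trailing newlines."""
    with open(path, encoding=encoding) as f:
        return [line.rstrip("\n") for line in islice(f, n)]


if __name__ == "__main__":
    import os

    # Path taken from the message above; it only exists after a build has run.
    path = "data/assoc/reduced.csv"
    if os.path.exists(path):
        for line in peek_lines(path):
            print(line)
    else:
        print(f"{path} not found; run the build first")
```

Reading only the head of the file keeps this cheap even if the real file is large.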

--
You received this message because you are subscribed to the Google Groups "conceptnet-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to conceptnet-use...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/conceptnet-users/93397532-37e3-4207-a6ec-5ce1317c7d45%40googlegroups.com.

janna syberia

Feb 21, 2020, 4:23:26 AM
to conceptnet-users
Hello Robyn,

Many thanks for the quick answer!
I want to work with slightly modified ConceptNet data.
But I only want to use the embeddings.
So if I change the Snakefile, can I learn embeddings without building a database?


Robyn Speer

Feb 26, 2020, 5:45:35 PM
to conceptn...@googlegroups.com
Yes -- as of version 5.7 the embeddings do not depend on the database. You can build just the embeddings with:

snakemake clean   # deletes any data already built
snakemake data/vectors/plain/numberbatch.txt.gz
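
Once that target has been built, the result can be inspected from Python. The sketch below assumes the file uses the word2vec-style text layout of the published Numberbatch downloads (a header line with row and column counts, then one term per line followed by its values). That assumption is not confirmed in this thread, and `load_vectors` is a hypothetical helper, not part of conceptnet5.

```python
import gzip


def load_vectors(path, limit=None):
    """Read a gzipped word2vec-style text file into {term: [floats]}.

    Assumed layout: a header line "<rows> <dims>", then one
    "<term> <v1> <v2> ..." line per term.
    """
    vectors = {}
    with gzip.open(path, "rt", encoding="utf-8") as f:
        _rows, dims = (int(x) for x in f.readline().split())
        for line in f:
            if not line.strip():
                continue  # tolerate blank lines
            parts = line.rstrip("\n").split(" ")
            term, values = parts[0], [float(x) for x in parts[1:]]
            if len(values) != dims:
                raise ValueError(
                    f"expected {dims} values for {term!r}, got {len(values)}"
                )
            vectors[term] = values
            if limit is not None and len(vectors) >= limit:
                break  # stop early when only a sample is wanted
    return vectors
```

For a quick look, something like `load_vectors("data/vectors/plain/numberbatch.txt.gz", limit=10)` reads only the first ten terms instead of the whole file.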


Filip Ilievski

Feb 28, 2020, 12:16:32 PM
to conceptnet-users
Hello Robyn,

Do I gather correctly that I can:
1. take a set of triples (e.g., load a subset of DBpedia),
2. process it to produce 'data/assoc/reduced.csv' in the expected predefined structure,
3. run the two snakemake lines you listed above,
and, et voilà, I have numberbatch embeddings of my data?

If so, could you please provide an example of a line in reduced.csv?

Thanks,
Filip

Robyn Speer

Mar 2, 2020, 2:33:51 PM
to conceptn...@googlegroups.com
I don't know what your actual goal is, and I feel like providing these kinds of details one at a time won't really get you closer to your goal. So here's my general suggestion:

- Run the actual ConceptNet Numberbatch build with `snakemake data/vectors/plain/numberbatch.txt.gz`.
- Look at the files it created in data/, so you have an example you can look at and figure out what you want to change.
- Change either the build rules (in Snakefile) or the code (perhaps in conceptnet5/readers/) so it does the thing you want it to do.
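
For the second step, a small script can give an overview of what the build produced before you open individual files. This is only a sketch: `list_built_files` is a hypothetical helper, and it assumes you run it from the directory that contains data/.

```python
import os


def list_built_files(root="data"):
    """Return sorted (relative_path, size_in_bytes) pairs for files under root."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.append((os.path.relpath(full, root), os.path.getsize(full)))
    return sorted(found)


if __name__ == "__main__":
    # Prints nothing if data/ does not exist yet.
    for rel_path, size in list_built_files():
        print(f"{size:>14,d}  {rel_path}")
```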



Filip Ilievski

Mar 3, 2020, 2:53:54 PM
to conceptnet-users
I want to compute embeddings of a KG that contains ConceptNet as well as other resources, like WordNet and Wikidata.
By 'actual goal', you presumably mean the intended use of these computed embeddings. They would be used in downstream tasks like QA.
Would you say that the numberbatch approach and code would work for this use case with minor adaptations?

I tried to follow your instructions, but I get errors at step 1. Is there a more extensive guide on this? (I know the CN wiki has instructions on how to set up CN from scratch, yet this is somewhat orthogonal to that.)

Filip

Robyn Speer

Mar 6, 2020, 11:33:15 AM
to conceptn...@googlegroups.com
Okay, great, that helps, because it tells me that you don't want to replace the data; you want to add new sources and integrate them with the data that's already in ConceptNet.

WordNet is already in there, so probably what you need to add is Wikidata. In the conceptnet5/readers directory, there are modules that take in a data source and turn it into a stream of ConceptNet edges (in msgpack format). For a simple example, you could look at conceptnet5/readers/emoji.py.
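
To make the idea of "a stream of edges" concrete, here is a self-contained stand-in, not the real reader API. The actual modules in conceptnet5/readers build edges with the package's own helpers and write msgpack; this sketch uses plain dicts instead. The function names, the `relation_map` argument, the "/d/wikidata" dataset label, and the simplified URI normalization are all assumptions made for illustration.

```python
def concept_uri(language, text):
    """Build a simplified '/c/<lang>/<term>' URI (lowercase, spaces to underscores).

    The real project applies more careful text normalization than this.
    """
    return "/c/{}/{}".format(language, text.strip().lower().replace(" ", "_"))


def triples_to_edges(triples, relation_map, dataset="/d/wikidata"):
    """Yield edge-like dicts for (subject, predicate, object) triples.

    Triples whose predicate has no entry in relation_map are skipped.
    """
    for subject, predicate, obj in triples:
        rel = relation_map.get(predicate)
        if rel is None:
            continue
        yield {
            "rel": rel,
            "start": concept_uri("en", subject),
            "end": concept_uri("en", obj),
            "dataset": dataset,  # hypothetical label for the new source
        }


if __name__ == "__main__":
    sample = [("Golden Retriever", "instance of", "dog breed")]
    for edge in triples_to_edges(sample, {"instance of": "/r/IsA"}):
        print(edge)
```

A real reader would follow conceptnet5/readers/emoji.py for how edges are actually constructed and written.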

You should then add a command for it to conceptnet5/readers/cli.py (following the other examples there), which will provide a "cn5-read wikidata" command that turns a dump of Wikidata into the appropriate edges.

Suppose you're going to call the result "edges/wikidata/wikidata.msgpack". You'd add "wikidata/wikidata" to DATASET_NAMES in the Snakefile, then add a "rule read_wikidata:" that describes how to build it using the "cn5-read wikidata" command, following the example of "rule read_wordnet:" or "rule read_emoji:".

Then you'd add the data source to DATASET_NAMES in the Snakefile, 


