Metadata tables in AntConc4


Martin Wynne

Sep 30, 2022, 3:51:00 AM
to AntConc-Discussion
I have built a new corpus comprising 104 novels in plain text format. The corpus was created successfully in AntConc, and I am able to query it.

I also have some file-level metadata that I would like to make use of. This is basically the full titles of the novels and a date for each one. I see that there is an option to add 'metadata tables' in the Corpus Manager tool, but I don't know what format the input files should be in.

Is it possible to add metadata in this way, and if so, what is the input file format?

Many thanks for any help,
Martin

Martin Wynne

Oct 20, 2022, 5:12:54 AM
to AntConc-Discussion
Hi,

Has anyone got any thoughts on this? Has anyone actually used the function to add 'metadata tables' to a corpus in AntConc?

Martin

Martin Wynne

Jun 19, 2023, 11:33:56 AM
to AntConc-Discussion
Hi,

I'm still hoping to get an answer to the question about how to add metadata tables in AntConc. Any ideas anyone?

Best wishes,
Martin

elco...@gmail.com

Jun 24, 2023, 9:27:44 AM
to AntConc-Discussion
Hi Martin,

So, on page 17 in AntConc's handbook (https://www.laurenceanthony.net/software/antconc/releases/AntConc420/help.pdf), it says the following about metadata tables:

"If you click on "Add File(s)" or "Add Directory", you can choose optional metadata tables that will be stored as SQLite database tables together with your raw corpus data. The information in these metadata tables must be aligned with the column names used in the existing tables of the corpus. To understand the default table structure, open the corpus database in an SQLite database reader (e.g., https://sqlitebrowser.org/) and view the different tables."

Don't really know how to work with SQLite though. Hope this helps.
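
For anyone who would rather not install a separate database reader, the same inspection can probably be done with Python's built-in sqlite3 module. This is only a minimal sketch: the function name describe_tables is made up for illustration, and the path you pass in is whatever corpus database file you have (where AntConc stores it is not stated in the handbook passage above).

```python
import sqlite3


def describe_tables(db_path):
    """Return {table_name: [column names]} for every table in an SQLite file."""
    con = sqlite3.connect(db_path)
    try:
        # sqlite_master is SQLite's built-in catalogue of tables and indexes.
        names = [
            row[0]
            for row in con.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        # PRAGMA table_info returns one row per column; index 1 is the column name.
        return {
            name: [col[1] for col in con.execute(f'PRAGMA table_info("{name}")')]
            for name in names
        }
    finally:
        con.close()
```

Calling describe_tables on a corpus database and printing the result should show which tables exist and which column names a metadata table would need to line up with.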

seralvarezuribe

Jun 25, 2023, 11:20:37 PM
to AntConc-Discussion
Thanks, elco, for the reference. I'm in a similar situation to Martin. I tried to follow those instructions without good results. It seems some more detail is needed. It would be great to see an example of adding a metadata file to an existing corpus. In my case, I have a database with several variables related to the file name of each document in my corpus, but I don't know how to format it to link them. I added a .csv file with the data through the Add File button in the advanced options in the Corpus Manager, but AntConc gave me an error saying that the table does not have the right format. Any advice would be much appreciated.

Best wishes,
Sergio

Laurence Anthony

Aug 31, 2023, 9:24:35 AM
to ant...@googlegroups.com
Hi everyone,

Sorry for the late reply about this matter. I've now created a working example of how to create a corpus with a metadata table. Here are the instructions:

1. Unzip the attached zip file. It includes a metadata table, and three corpus files:
metadata file: animal_metadata.tsv
corpus files: cat_1.txt, cat_2.txt, dog_1.txt

 I've also included a copy of the resulting corpus database that you should be able to recreate:
metadata_demo.db

2. Open AntConc and navigate to the File Menu->Corpus Manager->Raw files corpus building

3. Name your new corpus as "metadata_demo" by typing the name into the "Corpus Name" entry box.
4. Drag and drop the corpus files into the "Corpus Files" list and drag and drop the metadata file into the "Metadata Table(s)" list. You can also use the buttons if you want.
5. Click "Create" to create your new corpus.

6. In the KWIC tool, search for "the". You should find 6 hits, with 4 hits in the 'cat' files and 2 hits in the 'dog' file.
7. Now click on "Adv Search", activate the "SQL Search" option, and enter the following as a SQL search:
["animal_metadata", "animal = 'cat'", "doc_id"]
This advanced search filters all search results so that only files with a "cat" label in the "animal" column of the "animal_metadata" table are included.

8. Apply the settings, and then in the KWIC tool, search for "the" again. You should find that the results are now filtered for only files that are about "cats".
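
For those asking about the input format: the contents of animal_metadata.tsv are not reproduced in this thread, so the following is only a sketch of a plausible layout. It assumes a tab-separated file with a header row whose column names (doc_id and animal) are the ones used in the query in step 7, and it assumes which file receives which doc_id. The helper name to_tsv is made up for illustration.

```python
import csv
import io

# Hypothetical rows: the doc_id values are assumed to match the corpus's own
# doc_id values, and the animal labels are assumed from the demo file names.
ROWS = [
    {"doc_id": 0, "animal": "cat"},
    {"doc_id": 1, "animal": "cat"},
    {"doc_id": 2, "animal": "dog"},
]


def to_tsv(rows, fieldnames):
    """Serialise a list of dicts as tab-separated text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


tsv_text = to_tsv(ROWS, ["doc_id", "animal"])
```

Writing tsv_text to a file with a .tsv extension (UTF-8 encoded) would give something you could try dropping into the metadata list in step 4. Comparing it against the real animal_metadata.tsv in the zip file is the safest way to confirm the layout.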

Let me know if this demo works well. If it is fine, I'll consider making a video demo.

Regards,

Laurence.

###############################################################
Laurence ANTHONY, Ph.D.
Professor of Applied Linguistics
Faculty of Science and Engineering
Waseda University
3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
E-mail: antho...@gmail.com
WWW: http://www.laurenceanthony.net/
###############################################################


metadata_demo.zip

Martin Wynne

Sep 5, 2023, 4:25:06 AM
to Laurence Anthony, ant...@googlegroups.com
Dear Laurence,

That's great, thanks. This demo works for me. I have a few questions about how to apply this to my corpus.

Are the lines of the metadata table linked to the files by the order in which the files are loaded? I note that the metadata tables don't reference the file names, and the files don't reference the doc_id. When I try this with my corpus, they don't match up as I expected, unless I number the doc_id values from 0.

So, could you please clarify, what are the restrictions on the 'doc_id' values? Do they have to be numbers, and do they have to start with 0?

Could you please point me towards more information on the syntax of the SQL queries? Can I use wildcards? And 'NOT' operators?

Many thanks for any further assistance!

Best wishes,
Martin
-- 
Senior Researcher in Corpus Linguistics
Faculty of Linguistics, Philology and Phonetics, University of Oxford
National Co-ordinator, CLARIN-UK
martin...@ling-phil.ox.ac.uk
https://orcid.org/0000-0002-4155-0530

Laurence Anthony

Sep 7, 2023, 9:24:17 PM
to Martin Wynne, ant...@googlegroups.com
Hi Martin,

Glad to hear that the demo works.

Here are the answers to your questions:

>Are the lines of the metadata table linked to the files by the order in which the files are loaded? I note that the metadata tables don't reference the file names, and the files don't reference the doc_id. When I try this with my corpus, they don't match up as I expected, unless I number the doc_id values from 0.

No. All metadata tables are *joined* to the main "corpus" table using the "doc_id" column of the "corpus" table and the column of the metadata table that you specify in the query as the shared key. The example below is the same search for "the" as in the demo above, except that the key column of the metadata table is named "my_doc_id".
["animal_metadata", "animal = 'cat'", "my_doc_id"]
The underlying true SQL query would be as follows:
SELECT 'the' FROM corpus, animal_metadata WHERE corpus.doc_id = animal_metadata.my_doc_id AND animal_metadata.animal = 'cat'

The idea is that rather than have the user write the complex true SQL above, AntConc helps to build it from those three parameters.
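
To make the expansion concrete, here is a rough sketch in Python using the built-in sqlite3 module. It is not AntConc's actual code: the function build_filter_sql, the toy "corpus" table with a "token" column, and the sample rows are all assumptions for illustration. Only the shape of the join follows the query quoted above.

```python
import sqlite3


def build_filter_sql(table, condition, key):
    """Expand the three parameters into a join shaped like the quoted query."""
    # Plain string formatting is fine for a demo, but not for untrusted input.
    return (
        f"SELECT corpus.doc_id, corpus.token FROM corpus, {table} "
        f"WHERE corpus.doc_id = {table}.{key} AND {table}.{condition}"
    )


con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE corpus (doc_id INTEGER, token TEXT);          -- toy schema
    CREATE TABLE animal_metadata (my_doc_id INTEGER, animal TEXT);
    INSERT INTO corpus VALUES (0, 'the'), (0, 'cat'), (1, 'the'),
                              (2, 'the'), (2, 'dog');
    INSERT INTO animal_metadata VALUES (0, 'cat'), (1, 'cat'), (2, 'dog');
    """
)

sql = build_filter_sql("animal_metadata", "animal = 'cat'", "my_doc_id")
# Restrict to the search term, as a KWIC search for "the" would.
rows = con.execute(
    sql + " AND corpus.token = 'the' ORDER BY corpus.doc_id"
).fetchall()
```

In this toy data, rows contains hits only from documents 0 and 1, because document 2 is labelled 'dog' in the metadata table.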

>So, could you please clarify, what are the restrictions on the 'doc_id' values? Do they have to be numbers, and do they have to start with 0?

The metadata table doc ids have to match the corpus.doc_id values, which by default are integers starting at 0. If you loaded your own corpus, it might be the case that the doc ids have different values. You just need them to match.
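
If the ids do not line up, a quick check along the following lines may help to spot the mismatch. This is a sketch under assumptions: the helper unmatched_ids, the "novel_metadata" table, and the toy rows are invented for illustration, and the real "corpus" table will have more columns than shown here.

```python
import sqlite3


def unmatched_ids(con, meta_table, key):
    """Return (corpus ids with no metadata row, metadata ids with no corpus file)."""
    no_meta = con.execute(
        f"SELECT doc_id FROM corpus EXCEPT SELECT {key} FROM {meta_table}"
    ).fetchall()
    no_doc = con.execute(
        f"SELECT {key} FROM {meta_table} EXCEPT SELECT doc_id FROM corpus"
    ).fetchall()
    return sorted(r[0] for r in no_meta), sorted(r[0] for r in no_doc)


con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE corpus (doc_id INTEGER, token TEXT);           -- toy schema
    CREATE TABLE novel_metadata (doc_id INTEGER, title TEXT, year INTEGER);
    -- Corpus ids start at 0, but the metadata was numbered from 1.
    INSERT INTO corpus VALUES (0, 'the'), (1, 'the'), (2, 'the');
    INSERT INTO novel_metadata VALUES (1, 'Novel A', 1850),
                                      (2, 'Novel B', 1860),
                                      (3, 'Novel C', 1870);
    """
)

missing_metadata, missing_files = unmatched_ids(con, "novel_metadata", "doc_id")
```

Here missing_metadata lists corpus documents that no metadata row points to, and missing_files lists metadata rows whose id does not exist in the corpus. Both lists being empty means every id matches.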

>Could you please point me towards more information on the syntax of the SQL queries? Can I use wildcards? And 'NOT' operators?

If you look at the true SQL query above, you can see that you are affecting the part of the query after the AND. So, yes, you could use SQL wildcards and NOT operators. Here are two examples:
["animal_metadata", "animal != 'cat'", "my_doc_id"]
["animal_metadata", "animal LIKE 'cat*'", "my_doc_id"]

Does that help?

Laurence.


Martin Wynne

Sep 8, 2023, 4:41:00 AM
to Laurence Anthony, ant...@googlegroups.com
Hi Laurence,

Thanks, that's all very helpful. One thing remains unclear. I don't get any hits searching for 'the' with an advanced SQL Search for:

["animal_metadata", "animal LIKE 'cat*'", "doc_id"]

using the files for your metadata demo. The NOT operator works fine. Does the 'LIKE' query work for you?

Best wishes,
Martin

Laurence Anthony

Sep 8, 2023, 9:14:10 AM
to Martin Wynne, ant...@googlegroups.com
Hi Martin,

Sorry, I was forgetting my SQL wildcards. The correct query is:

["animal_metadata", "animal LIKE 'cat%'", "doc_id"]
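
As a quick illustration of the difference, here is a small sketch using Python's built-in sqlite3 module with an in-memory table. The table contents (including the 'catfish' value) and the helper matching_ids are made up for the demonstration. In SQLite, LIKE treats % as "any sequence of characters" and _ as "any single character", while * has no special meaning to LIKE.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE animal_metadata (doc_id INTEGER, animal TEXT)")
con.executemany(
    "INSERT INTO animal_metadata VALUES (?, ?)",
    [(0, "cat"), (1, "catfish"), (2, "dog")],  # hypothetical labels
)


def matching_ids(condition):
    """Return the doc_id values whose metadata row satisfies the condition."""
    sql = f"SELECT doc_id FROM animal_metadata WHERE {condition} ORDER BY doc_id"
    return [row[0] for row in con.execute(sql)]


star = matching_ids("animal LIKE 'cat*'")        # * is taken literally
percent = matching_ids("animal LIKE 'cat%'")     # % matches any suffix
underscore = matching_ids("animal LIKE '_og'")   # _ matches one character
negated = matching_ids("animal NOT LIKE 'cat%'")
```

With this toy table, star is empty because no label is literally "cat*", percent picks up both "cat" and "catfish", and negated leaves only the "dog" row.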


Laurence.
