Recommendation for Annotations

Marius

Mar 9, 2021, 5:03:41 AM

to inception-users

Hi,

I am planning to conduct an annotation project on a collection of around 80k tweets.

Some of the annotations relate to the entire tweet, so they are modelled as Document Metadata.

Consequently, there will be around 80k files/tweets to be imported and processed.

Are there any known problems I will run into with this approach, or should it work just fine? If there are problems, are there better alternatives? (I would like to avoid having to create a span that covers the entire tweet every time.)

Best Regards

Marius

Jan-Christoph Klie

Mar 9, 2021, 6:12:01 AM

to inception-users

Hi,

I think we do not support recommendations for document metadata yet. What kind of recommender are you planning to use? 80k tweets are a lot, and the current implementation of external recommenders might not work for you because we always send all documents. We are currently working on v2 of the external recommenders, which will fix this.

Regarding a comfortable annotation process with spans: you can use sentence-level layers for the tweets, and then creating an annotation takes just a double click. We also had a project annotating tweets where they grouped 100 tweets per file. We have a Python script for that which makes it look okay, with wrapped lines and newlines between tweets. Please let me know whether you want to go this way.
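To illustrate the grouping idea, here is a minimal Python sketch (not the script mentioned above) that splits a list of tweets into documents of 100 tweets each, with a blank line between tweets. The function names `chunk`, `build_documents` and `write_documents` and the `tweets_0000.txt` naming scheme are hypothetical.

```python
from pathlib import Path


def chunk(items, size=100):
    """Split a list into consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_documents(tweets, size=100):
    """Return (file_name, text) pairs, one per chunk of tweets.

    Tweets inside a document are separated by an empty line so that each
    tweet stays visually distinct in the annotation editor.
    """
    docs = []
    for n, group in enumerate(chunk(tweets, size)):
        name = f"tweets_{n:04d}.txt"  # hypothetical naming scheme
        docs.append((name, "\n\n".join(group) + "\n"))
    return docs


def write_documents(docs, out_dir):
    """Write (file_name, text) pairs to `out_dir` as UTF-8 text files."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, text in docs:
        (out / name).write_text(text, encoding="utf-8")
```

With 100 tweets per file, around 80k tweets would become roughly 800 plain-text documents to import instead of 80k.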

What is your time frame for this project?

Best,

Jan from INCEpTION

Marius

Mar 9, 2021, 8:51:54 AM

to inception-users

Thank you for the quick and informative reply!

Sorry, my first message was not quite clear. We do not need recommenders for document metadata, but it is good to know that this is currently impossible anyway.

We do plan to use recommenders for some span annotations, though. For now, we only use the built-in StringMatcher. We thought about training a custom recommender once a good portion of the data has been annotated. We will stick to string matching if external recommenders are infeasible for the size of our dataset.

Separate sentence-level layers sound like a good solution, yes! Also, 100 tweets per file works well.

Given the number of tweets, we will not have every tweet annotated by all annotators, but will instead distribute chunks across different subsets of annotators. INCEpTION currently has no implementation for this kind of behaviour, right?

Thanks for the offer, but to be honest, I did not quite get what exactly the Python script is used for.

The time frame is not set in stone yet; we aim to start around late March. We are unsure how long the annotation will take, as the number of available annotators is still being negotiated.

Best Regards

Marius

Jan-Christoph Klie

Mar 9, 2021, 8:59:55 AM

to inception-users

Recommenders also work with many documents; it is only training interactive (external) recommenders that is not a good idea with many documents. If you pretrain a model and only use it for prediction, it is not a problem. We have a framework for that [1]. Recommenders predict only for the document a user is currently annotating, but external ones send all documents for training purposes. If you disable training, it will work.

If you annotate tweets in INCEpTION, you will see that the line wrapping goes away once you annotate, so the end of the sentence goes off screen. I have a script that formats the documents to make this less ugly.

INCEpTION has a workload assignment feature that is disabled by default. You can specify that each document should be annotated X times, and INCEpTION then assigns annotators until that quota is reached [2]. In this mode, annotators cannot select what they want to annotate; instead, they are given a document whenever they click on the "Annotate" item in the menu.

[1]: https://github.com/inception-project/inception-external-recommender

[2]: https://inception-project.github.io/releases/0.18.2/docs/user-guide.html#sect_workload
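To make the quota idea concrete, here is a toy Python model of quota-based assignment. It is only an illustration under assumed rules (each document needs `quota` distinct annotators, and a requesting annotator gets the least-covered document they have not been given yet); it is not INCEpTION's actual implementation, and the class name `WorkloadAssigner` is hypothetical. See [2] for how the real feature behaves.

```python
from collections import defaultdict


class WorkloadAssigner:
    """Toy model: each document should receive `quota` distinct annotators.

    Hypothetical illustration only, not INCEpTION's code.
    """

    def __init__(self, documents, quota):
        self.documents = list(documents)
        self.quota = quota
        self.assigned = defaultdict(set)  # document -> annotators assigned so far

    def next_document(self, annotator):
        """Return the next document for `annotator`, or None if nothing is left."""
        candidates = [
            d for d in self.documents
            if len(self.assigned[d]) < self.quota and annotator not in self.assigned[d]
        ]
        if not candidates:
            return None
        # Prefer the document with the fewest annotators so far;
        # min() keeps the first one in list order on ties.
        doc = min(candidates, key=lambda d: len(self.assigned[d]))
        self.assigned[doc].add(annotator)
        return doc
```

With three documents and a quota of 2, six assignments fill every quota, after which `next_document` returns `None` for everyone. Annotators never choose a document themselves, which mirrors the behaviour described above.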

Marius

Mar 9, 2021, 9:29:55 AM

to inception-users

Great, thanks! The workload assignment and external recommenders sound like a perfect match then.

I see. If you would share the script, I would gladly take it.

Jan-Christoph Klie

Mar 9, 2021, 9:51:27 AM

to inception-users

You can find it in [1]. It expects a TSV file where the first column is the tweet ID and the second is the tweet itself. For reasons, I also removed emojis, but you can skip that in newer INCEpTION versions. The main idea is to split lines after 100-120 characters and add two newlines between tweets.

[1]: https://gist.github.com/jcklie/85eefc2ea713da7354814ca4c2a184aa
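The gist is the actual script. As a rough approximation of its main idea only, the Python sketch below reads `id<TAB>tweet` rows, wraps each tweet at 120 characters with `textwrap.fill`, and puts a blank line between tweets. It assumes there is no header row and no tab characters inside tweets, drops the tweet ID from the output, and skips emoji removal; the function name `format_tweets` is hypothetical.

```python
import csv
import io
import textwrap


def format_tweets(tsv_text, width=120):
    """Turn `id<TAB>tweet` rows into wrapped text with a blank line between tweets.

    Assumes no header row and no tabs inside tweets; the tweet id is read
    but not written to the output.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    blocks = []
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        _tweet_id, tweet = row[0], row[1]
        # Wrap long tweets so that no line exceeds `width` characters
        blocks.append(textwrap.fill(tweet, width=width))
    return "\n\n".join(blocks) + "\n"
```

Using `csv.QUOTE_NONE` keeps quotation marks inside tweets from being treated as CSV quoting.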
