Hi Dario,
> On 25. Oct 2021, at 21:14, Dario Bonaretti <
bonaret...@gmail.com> wrote:
>
> I need to annotate several thousands of short documents (e.g., online reviews, tweets). Can anyone advise on the best approach for importing this data?
Sounds like the "Dynamic workload" [1] management mode is for you. That mode allows you to define how many annotators should label a document before the document is considered finished. The assignment of documents to annotators happens automatically. Mind that annotators need to explicitly mark a document as finished. In the dynamic mode, marking a document as finished is the only way to get access to the next document.
> Right now, I'm uploading each review as a .txt document. In total I have some 2k reviews/documents for a total of ~8MB. Moving forward, I'll probably need to upload more batches like this, that is, 2-3k documents, each one containing one review.
>
> Does this approach make any sense or should I really try to keep all reviews on one document?
If your annotation task is to assign labels at the level of a full review, then modelling these annotations as a "document metadata" [2] layer is likely to be more convenient for your annotators than having them manually mark the span of a review to create a span-level annotation and then assign a label to that span. You will find the recently introduced "singleton" [3] option on document metadata layers very useful for this purpose.
If every review is a document, then annotators finish the reviews one by one. The action to finish a document currently involves a confirmation dialog. Having to go through that dialog hundreds of times may be annoying to the annotators.
You could reduce the frequency of the confirmation dialog by bundling, for example, 10 reviews per document. If sentence boundaries are not crucial for your annotation process, you may want to import the data in such a way that each review is treated as a single sentence. You may also want to increase the default page size in the annotation editor to match your batch size [4].
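In case it helps, here is a rough Python sketch of such a batching step. The file-naming scheme and the batch size of 10 are my own choices, and it assumes you import the result with a plain-text format that treats each line as one sentence - please check which import formats your INCEpTION version actually offers:

```python
from pathlib import Path

def write_batches(reviews, out_dir, batch_size=10):
    """Write reviews into .txt files of `batch_size` reviews each,
    one review per line (so a line-per-sentence import sees each
    review as a single sentence)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    files = []
    for i in range(0, len(reviews), batch_size):
        batch = reviews[i:i + batch_size]
        # Collapse any whitespace/newlines inside a review so that
        # each review really stays on one line.
        lines = [" ".join(r.split()) for r in batch]
        path = out / f"reviews_{i // batch_size + 1:04d}.txt"
        path.write_text("\n".join(lines) + "\n", encoding="utf-8")
        files.append(path)
    return files
```

With 2k reviews and a batch size of 10, this would give you about 200 documents - enough to keep the confirmation dialog infrequent without making individual documents unwieldy.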
If your annotation task is to annotate words/short spans within each review, then you should be fine with these mini-batches at this point. But if your task is annotating at the level of the full review, then your annotators would now face the issue of having to first create an annotation for the span of the review and then assign a label to it. Assuming you configure your annotation layer as a "sentence level" layer and that you have imported each review as "one sentence", creating an annotation is as easy as double-clicking anywhere within the sentence. But it is still a step that needs to be taken for each sentence. Also, once the annotation has been created, the default "brat (one line per sentence)" visualization mode can no longer line-wrap the sentence - which can be quite inconvenient. This problem can be avoided by switching to the "brat (break at 120 chars)" visualization mode via the preferences dialog on the annotation page. Presently, every annotator has to make that switch once manually - the project manager cannot (yet) set a default visualization mode.
> I would also like some input on another question: The reviews have some metadata which I would like to pass to inception (e.g., author's name, review's score from 1 to 5). I saw there's an experimental feature for importing metadata, would you say that's the way to go?
If you have one review per document, you can model them as a document-metadata layer. If you have "one review as a sentence", you can model them as a sentence-level layer. If you do not want to display this metadata to the annotators, you can uncheck the "enabled" checkbox of the layer. If you want to display them but not let the user edit them, you can check the "read only" checkbox.
Cheers,
-- Richard
[1]
https://inception-project.github.io/releases/21.1/docs/user-guide.html#sect_dynamic_workload
[2]
https://inception-project.github.io/releases/21.1/docs/user-guide.html#_document_metadata
[3]
https://inception-project.github.io/releases/21.1/docs/user-guide.html#_singletons
[4]
https://inception-project.github.io/releases/21.1/docs/admin-guide.html#sect_settings_annotation