Proposed improvements to BratReader

2 views
Skip to first unread message

Alain Désilets

unread,
Dec 12, 2019, 3:18:53 PM12/12/19
to dkpro-core-user
Attached is a small project that illustrates some improvements I propose to make to BratReader. I didn't put them in a Pull Request because I wanted to show the difference between the current implementation (BratReader.java) and my proposed change (BratReaderForgiving.java) and I didn't want to commit the latter only to have to rename it to BratReader.java later on.

The unit tests in BratIssuesTest.java illustrate the problems that these improvements aim to fix. 

In a nutshell, it allows you to:

1) Create a BratRead without having to provide mappings. 

- BratReader already knows mappings for the Annotations defined in the standard dkpro-core type system (ex: Person)
- If you provide a PARAM_MAPPING, those mappings are added to the default ones (Note: not working at the moment because of a bug with serialization of Mapping)
- If the .ann file contains a label that is defined in neither of the default or PARAM_MAPPING mappings, it will use a "catch-all" Annotation type (NamedEntity for the moment, but could be something else)

2) Pass a directory or file to PARAM_SOURCE_LOCATION without having to worry about things like:

- Adding *.ann at the end of the directory path (the Reader adds it automaticaly)
- Making sure to pass the .ann file as opposed to the .txt file (the Reader automatically converts it to .ann path)
- Making sure that the single file, or all the the .txt files in the directory have a corresponding .ann file (the Reader automatically creates empty .ann files for orphan .txt files)

If this seems appropriate, I will create a feature request followed by a Pull REquest.


BratIssues.tar.gz

Richard Eckart de Castilho

unread,
Dec 13, 2019, 2:31:09 AM12/13/19
to dkpro-c...@googlegroups.com
Hi Alain

> On 12. Dec 2019, at 21:18, Alain Désilets <alainde...@gmail.com> wrote:
>
> Attached is a small project that illustrates some improvements I propose to make to BratReader. I didn't put them in a Pull Request because I wanted to show the difference between the current implementation (BratReader.java) and my proposed change (BratReaderForgiving.java) and I didn't want to commit the latter only to have to rename it to BratReader.java later on.

Thanks for the suggestions! They sound like useful additions. The PARAM_LOCATION improvement in particular seems to be something that could be a general improvement across all kinds of readers. I didn't look at the changes in detail yet.

Maybe I could convince you to create a PR right away? It would facilitate review and discussion:

* I don't have to unpack/import a new project into Eclipse or apply a patch - I can review directly on the GitHub website
* line comments can be made on the GitHub website and changes can be discussed in detail
* Jenkins can be asked to automatically build and test the changes

Usually, I ask people to create a PR right away when they start working on an issue. Of course, an issue explaining the desired improvements should be created first and the issue number should be referenced in the commit comments as per [1]

There is no need to make the changes in a copy of the file and later rename that. Just make the changes directly in the affected files and commit them to the PR. GitHub will show the proposed changes and the original side-by-side such that the changes can easily be reviewed. Making the changes in a copy of the file complicates the process unnecessarily.

Does that sound reasonable to you?

Cheers,

-- Richard

[1] https://dkpro.github.io/contributing/ (Section: Preparing a pull request)

Alain Désilets

unread,
Dec 13, 2019, 6:00:44 AM12/13/19
to dkpro-c...@googlegroups.com
OK, I will issue a Feature request describing the suggested improvements and then create a PR.

On Fri, Dec 13, 2019 at 2:31 AM Richard Eckart de Castilho <richard...@gmail.com> wrote:

Thanks for the suggestions! They sound like useful additions. The PARAM_LOCATION improvement in particular seems to be something that could be a general improvement across all kinds of readers. I didn't look at the changes in detail yet.

Would you like me implement the improvements for all readers? If so, I will need some guidance in terms of which superclass of BratReader should be modified. Essentially all I did was to modify the getSourceLocation() method in BratReader.

 

Richard Eckart de Castilho

unread,
Dec 13, 2019, 6:34:27 AM12/13/19
to dkpro-c...@googlegroups.com
On 13. Dec 2019, at 12:00, Alain Désilets <alainde...@gmail.com> wrote:
>
> Would you like me implement the improvements for all readers? If so, I will need some guidance in terms of which superclass of BratReader should be modified. Essentially all I did was to modify the getSourceLocation() method in BratReader.

Let's see the PR first.

-- Richard
Reply all
Reply to author
Forward
0 new messages