Fwd: [ln] Stage: Traineeship at the EC's 'Joint Research Centre' (JRC),Terminology discovery, Deadline Extension

Camilo Thorne

Feb 11, 2015, 6:15:35 AM
to sp...@googlegroups.com

Camilo Thorne

IBM CAS Trento - Trento RISE
Piazza Manci 17, Povo di Trento
38123 (TN) - Italy

"Exegi monumentum aere perennius" 
(Horatius, Ode III-30)

---------- Forwarded message ----------
From: Thierry Hamon <ha...@limsi.fr>
Date: Fri, Feb 6, 2015 at 10:45 PM
Subject: [ln] Stage: Traineeship at the EC's 'Joint Research Centre' (JRC),Terminology discovery, Deadline Extension
To: l...@cines.fr



Date: Fri, 06 Feb 2015 16:20:39 +0100
From: Ralf Steinberger <ralf.ste...@jrc.ec.europa.eu>
Message-id: <02a901d04220$778fbdc0$66af3940$@jrc.ec.europa.eu>
X-url: http://recruitment.jrc.ec.europa.eu/?type=TR&site=IPR
X-url: http://recruitment.jrc.ec.europa.eu/?inst=3460&type=TR


The deadline for the 2 traineeship applications has been extended to 19
February 2015.


We are particularly looking for candidates who would be able to provide
programming support to analyse large datasets (20GB) for terminology
extraction, term variant analysis, etc. The current focus is on English
language texts.

-----

The European Commission's Joint Research Centre (JRC) is looking to fill
two traineeship positions in the field of:

Terminology discovery over time in the field of disaster risk
management.

If you are interested, please follow the instructions provided at the
URLs listed below. (Code: 2014-IPR-G-000-4154 - ISPRA).

Generic URL: http://recruitment.jrc.ec.europa.eu/?type=TR&site=IPR

Job description: http://recruitment.jrc.ec.europa.eu/?inst=3460&type=TR

Traineeship rules:
http://ec.europa.eu/dgs/jrc/downloads/jrc_trainee_rules_en.pdf

Conditions/eligibility: http://ec.europa.eu/dgs/jrc/index.cfm?id=5860

Application deadline:   26 February 2015

Starting date:          around April 2015

Duration:               5 months each

Remuneration:           Up to approximately 1000 Euro per month.

The EMM applications:   http://emm.newsbrief.eu/overview.html

JRC-EMM Publications:
http://langtech.jrc.ec.europa.eu/JRC_Publications.html

Two trainees have been working on this task since September. As the end
users have deemed the results of the motivated and competent team
useful, we plan to continue this work with a new set of two trainees.

DESCRIPTION OF THE ACTIVITY:

The Europe Media Monitor (EMM) group at the European Commission's Joint
Research Centre (JRC) in Ispra, Italy, is looking for two trainees to
work on a project to automatically explore the development of
terminology in the field of 'Disaster Risk Management' (DRM). The
purpose is to give the international stakeholders in that field
(e.g. the United Nations Office for Disaster Risk Reduction UN-ISDR)
concrete and countable evidence of new concepts (terms) emerging in
their field, of changing concepts and of shifts in interest over
time. The study will include both scientific publications and texts
produced by national and international governmental organisations
working in that field.

This first exploratory study will exclusively concern English language
text in the field of Disaster Risk Management, but other languages and
subject areas will be considered if the outcome of this exploratory
study is deemed concrete and useful. This work may lead to a scientific
publication co-authored by the project contributors.

A scenario to reach this goal of terminology discovery might consist of
the following steps:

(1) Manual or semi-automatic selection and collection of freely
    available documents covering the sub-areas of the life cycle of
    Disaster Risk Management (Prevention and mitigation; Preparedness;
    Response; Recovery and reconstruction);

(2) Conversion of the various file formats (e.g. HTML, PDF, MS-Word)
    into a structured text format (e.g. XML);

(3) Selection of suitable off-the-shelf software for the automatic
    extraction of terms (e.g. noun phrases);

(4) Usage of this software and, if needed, tuning of this software to
    extract lists of potential terms;

(5) Application of statistical methods to select the domain-specific
    terms and to weight or rank them;

(6) Application of statistical methods to observe trends, such as
    terms that are used more or less frequently than in previous
    observation periods;

(7) Presentation of the results (term lists, trends) in an
    easy-to-understand manner. This may also include a
    keyword-in-context presentation of the terms, or similar.
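
As a rough illustration of steps (6) and (7), the following Python
sketch compares how often each word is used in two observation periods
and prints keyword-in-context lines. It is only a minimal sketch under
stated assumptions: the function names (tokenize, relative_frequencies,
trend_ratios, kwic), the smoothing constant and the simple word
tokenizer are hypothetical choices made for this example, not part of
the announcement, and a real study would work on extracted terms
(e.g. noun phrases) rather than single words.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase word tokenizer (assumption: stands in for real term extraction)."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def relative_frequencies(documents):
    """Map each token to its share of all tokens in the given documents."""
    counts = Counter(tok for doc in documents for tok in tokenize(doc))
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()} if total else {}


def trend_ratios(earlier_docs, later_docs, smoothing=1e-6):
    """Ratio of later to earlier relative frequency; above 1 means rising use.

    The smoothing value is an arbitrary illustrative constant that avoids
    division by zero for tokens missing from one period.
    """
    earlier = relative_frequencies(earlier_docs)
    later = relative_frequencies(later_docs)
    vocabulary = set(earlier) | set(later)
    return {
        tok: (later.get(tok, 0.0) + smoothing) / (earlier.get(tok, 0.0) + smoothing)
        for tok in vocabulary
    }


def kwic(text, term, width=30):
    """Return one keyword-in-context line per occurrence of term in text."""
    lines = []
    for match in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines


if __name__ == "__main__":
    # Toy data invented for this example only.
    earlier = ["flood response and flood recovery"]
    later = ["resilience planning and flood resilience"]
    ratios = trend_ratios(earlier, later)
    for token, ratio in sorted(ratios.items(), key=lambda item: -item[1]):
        print(f"{token:12s} {ratio:.3g}")
    for line in kwic("Early warning systems reduce disaster risk.", "disaster"):
        print(line)
```

A ratio well above 1 flags a candidate emerging term, a ratio well
below 1 a term falling out of use; a ratio threshold alone says nothing
about statistical significance.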

The foreseen traineeship duration is five months, starting around March
2015. The working language in the EMM team is English.

REQUIRED QUALIFICATIONS:

The task is foreseen to be carried out jointly by two trainees who, in
combination, possess the skills or satisfy the criteria listed
below. The combination of a more linguistically inclined person and a
programmer could be fruitful.

- Mature student or post-graduate in any of the following fields (or
  similar): computational linguistics, computer science, library
  sciences, machine learning;

- Knowledge of - and experience with - freely available Language
  Technology tools (e.g. for terminology extraction, term weighting,
  categorisation);

- Experience with document format conversion (PDF, HTML, MS-Word etc. to
  text);

- Sufficient programming experience to autonomously implement all
  necessary steps (Java preferred);

- Knowledge of statistical methods for term weighting (e.g. chi-square,
  TF.IDF) and for automatic categorisation;

- Linguistic sensitivity and an interest in terminology extraction
  (what is a term?; relationships between terms);

- Ability to present the project outcome in a format suitable for DRM
  specialists who may not be very familiar with Information Technology
  (presentation; reporting; visualisation?);

- Ability to work autonomously;

- Team worker;

- Good working knowledge of English plus the ability to communicate in
  at least one other official EU language.
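
For readers unfamiliar with the term-weighting statistics named in the
list above, here is a minimal Python sketch of the chi-square score for
a 2x2 table that compares a term's count in a domain corpus with its
count in a general reference corpus. The function name chi_square, its
parameters and the example counts are hypothetical and chosen only for
illustration; the announcement does not prescribe any particular
implementation.

```python
def chi_square(term_in_domain, domain_total, term_in_reference, reference_total):
    """Chi-square statistic for one term, from a 2x2 contingency table.

    Rows: the term vs. all other tokens.
    Columns: domain corpus vs. reference corpus.
    """
    a = term_in_domain                         # term, domain corpus
    b = term_in_reference                      # term, reference corpus
    c = domain_total - term_in_domain          # other tokens, domain corpus
    d = reference_total - term_in_reference    # other tokens, reference corpus
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0  # degenerate table: no evidence either way
    return n * (a * d - b * c) ** 2 / denominator


if __name__ == "__main__":
    # Invented counts: a term seen 50 times in 100 domain tokens
    # and 5 times in 100 reference tokens.
    print(chi_square(50, 100, 5, 100))
```

A higher score means the term's distribution differs more between the
two corpora. The statistic does not indicate the direction of the
difference, so it is usually combined with a check that the term is
more frequent in the domain corpus.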

In your application, please state your interests and please provide
clear information on your skill set, by elaborating on the
above-mentioned list.  Should you apply as a ready-made team, please
nevertheless clearly state your personal skills and strengths.

THE JRC TEAM:

The Joint Research Centre (JRC; http://ec.europa.eu/dgs/jrc/) is the
scientific-technical arm of the European Commission. The approximately
2200 JRC employees working in Ispra are from all EU countries and there
are also some non-EU visitors. The working environment is multilingual,
multi-cultural and multi-disciplinary. The JRC's Europe Media Monitor
(EMM) team
(https://ec.europa.eu/jrc/en/research-topic/internet-surveillance-systems)
carries out research and development in the field of text mining
(Language Technology; Computational Linguistics) for the purposes of
media monitoring.  EMM gathers an average of almost 220,000 online news
articles per day in over 70 languages and analyses them to help its
large international user community understand and use this enormous
amount of media information. EMM is publicly accessible via
http://emm.newsbrief.eu/overview.html. The JRC is also known for having
distributed large quantities of parallel linguistic resources
(https://ec.europa.eu/jrc/en/language-technologies), including
JRC-Acquis, DGT-Acquis, JRC-Names, the Translation Memories DGT-TM,
ECDC-TM and EAC-TM, and more.

Ralf Steinberger (http://langtech.jrc.ec.europa.eu/RS.html)
European Commission - Joint Research Centre (JRC)
21027 Ispra (VA), Italy

URL - Applications: http://emm.newsbrief.eu/overview.html

URL - Resources: https://ec.europa.eu/jrc/en/language-technologies

URL - Publications:
http://langtech.jrc.ec.europa.eu/JRC_Publications.html

