Evaluation of the Machine Translation of Scientific Documents

Inria
December 31, 2022

2022-05571 - Evaluation of the Machine Translation of Scientific Documents

Contract type: Internship

Level of qualifications required: Master's or equivalent

Function: Research intern

Context

This internship will take place in the context of the ANR project MaTOS (Machine Translation for Open Science), which aims to develop new methods for automatically translating scientific documents and evaluating these translations. The project focuses on translation between English and French, a language pair for which resources are readily available and translations are of reasonable quality and coherence. The internship could lead to a PhD thesis, funded by MaTOS, starting in September 2023. The internship will be supervised by Rachel Bawden and will involve collaborations with the other project partners, notably François Yvon (CNRS).

The internship lasts 6 months, starting on 1 March 2023 at the earliest.

Assignment

The topic of this internship is the evaluation of machine translation (MT) of scientific documents. Automatic MT evaluation is a crucial component of model development and remains a challenging problem. The development of automatic metrics, which seek to replicate human judgments of translation quality, is a major area of study, and many metrics exist: from simple ones that rely on lexical overlap, such as BLEU (Papineni et al., 2002) and METEOR (Banerjee and Lavie, 2005), to those relying on more recent techniques (e.g. pre-trained neural language models), such as BERTScore (Zhang et al., 2020), BARTScore (Yuan et al., 2021), BLEURT (Sellam et al., 2020) and COMET (Rei et al., 2020). Besides the general challenges of defining MT metrics, evaluating the MT of scientific documents poses specific difficulties, one of them being the heavy use of domain-specific terms, which severely degrade translation quality when translated incorrectly. Evaluation metrics should therefore also be sensitive to three aspects specific to scientific documents: (i) the correct translation of terms, (ii) the consistent translation of terms within a document (with respect to term variants, use of acronyms, etc.) and (iii) the preservation of a logical argument across sentences and sections. Previous work has proposed complementary measures to evaluate some of these aspects, such as correct term translation (Alam et al., 2021) and lexical cohesion (Wong and Kit, 2012).
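As a purely illustrative baseline for aspect (i), correct term translation can be crudely approximated with a glossary lookup: for each glossary term found in the source segment, check whether one of its accepted translations appears in the MT output. The sketch below assumes a hypothetical bilingual glossary (a `dict` from English terms to lists of accepted French renderings) and uses exact, case-insensitive, whole-word matching; the function names `contains` and `term_accuracy` are ours, and this is not the measure proposed by Alam et al. (2021).

```python
import re
from typing import Optional


def contains(text: str, phrase: str) -> bool:
    """Case-insensitive match of `phrase` in `text`, on word boundaries."""
    pattern = r"(?<!\w)" + re.escape(phrase.lower()) + r"(?!\w)"
    return re.search(pattern, text.lower()) is not None


def term_accuracy(
    source: str, mt_output: str, glossary: dict[str, list[str]]
) -> Optional[float]:
    """Fraction of glossary terms found in `source` whose translation
    (any accepted variant) also appears in `mt_output`.

    Hypothetical exact-match proxy: no lemmatisation, no paraphrase handling.
    Returns None when no glossary term occurs in the source segment.
    """
    relevant = [term for term in glossary if contains(source, term)]
    if not relevant:
        return None
    # Count the source terms for which at least one accepted variant was produced.
    hits = sum(
        any(contains(mt_output, variant) for variant in glossary[term])
        for term in relevant
    )
    return hits / len(relevant)
```

For example, with the glossary `{"machine translation": ["traduction automatique"], "word embedding": ["plongement lexical", "plongement de mots"]}`, the source "We evaluate machine translation with a word embedding model." and the output "Nous évaluons la traduction automatique avec un modèle d'embeddings." score 0.5: the first term is rendered as expected, the second is not. Its blind spots (inflection, paraphrase, acceptable renderings missing from the glossary) are exactly the kind of limitation that motivates better metrics.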

This internship will involve exploring alternative ways of evaluating terminological aspects of scientific document translation. Inspired by the use of question-based metrics to evaluate text generation tasks (Scialom et al., 2021), one possible direction is to explore how terminologies, relation extraction and information extraction can be used to evaluate translation quality. For example: (i) can the same relations be found in a reference (human-produced) translation and in an automatically produced one? (ii) can terms be matched in similar parts of the document? (iii) how coherent is the use of terms within a document? and (iv) can the same information be extracted from an MT output as from the source or reference text? The internship will involve both the analysis of existing data, for instance from the WMT biomedical translation shared task (Bawden et al., 2020; Yeganova et al., 2021), and the use and training of neural NLP models.
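Question (iii), on the coherence of term use within a document, can be given a similarly rough first operationalisation. The sketch below assumes a sentence-aligned document and a hypothetical bilingual glossary mapping each English term to its accepted French variants; the function names `term_variants_used` and `consistency_rate` are ours. It records which variants the MT output uses for each term across the document and reports the share of terms rendered with a single variant. It ignores acronyms and variants absent from the glossary, and a term that occurs only once is trivially counted as consistent.

```python
import re
from collections import defaultdict
from typing import Optional


def contains(text: str, phrase: str) -> bool:
    """Case-insensitive match of `phrase` in `text`, on word boundaries."""
    pattern = r"(?<!\w)" + re.escape(phrase.lower()) + r"(?!\w)"
    return re.search(pattern, text.lower()) is not None


def term_variants_used(
    source_sents: list[str],
    mt_sents: list[str],
    glossary: dict[str, list[str]],
) -> dict[str, set[str]]:
    """Map each glossary term to the set of its variants used in the MT output.

    Only sentences whose source contains the term are inspected, so a variant
    is attributed to a term only where that term was actually translated.
    """
    if len(source_sents) != len(mt_sents):
        raise ValueError("expected a sentence-aligned document")
    used: dict[str, set[str]] = defaultdict(set)
    for src, mt in zip(source_sents, mt_sents):
        for term, variants in glossary.items():
            if contains(src, term):
                used[term].update(v for v in variants if contains(mt, v))
    # Terms whose translation was never recognised are left out.
    return {term: found for term, found in used.items() if found}


def consistency_rate(used: dict[str, set[str]]) -> Optional[float]:
    """Share of terms rendered with exactly one variant across the document."""
    if not used:
        return None
    return sum(len(variants) == 1 for variants in used.values()) / len(used)
```

For instance, if "word embedding" is rendered as "plongement lexical" in one sentence and as "plongement de mots" in another, while "machine translation" is always "traduction automatique", the rate is 0.5. Both renderings may be acceptable in isolation; the point is that switching between them inside one document is what a sentence-level metric would miss.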

References

Md Mahfuz Ibn Alam, Antonios Anastasopoulos, Laurent Besacier, James Cross, Matthias Gallé, Philipp Koehn, and Vassilina Nikoulina. On the evaluation of machine translation for terminology consistency, June 2021.

Satanjeev Banerjee and Alon Lavie. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, pages 65–72, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics. https://aclanthology.org/W05-0909.

Rachel Bawden, Giorgio Maria Di Nunzio, Cristian Grozea, Inigo Jauregi Unanue, Antonio Jimeno Yepes, Nancy Mah, David Martinez, Aurélie Névéol, Mariana Neves, Maite Oronoz, Olatz Perez-de Viñaspre, Massimo Piccardi, Roland Roller, Amy Siu, Philippe Thomas, Federica Vezzani, Maika Vicente Navarro, Dina Wiemann, and Lana Yeganova. Findings of the WMT 2020 biomedical translation shared task: Basque, Italian and Russian as new additional languages. In Proceedings of the Fifth Conference on Machine Translation, pages 660–687, Online, November 2020. Association for Computational Linguistics. https://aclanthology.org/2020.wmt-1.76.

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318, Philadelphia, Pennsylvania, USA, July 2002. Association for Computational Linguistics. doi: 10.3115/1073083.1073135. https://aclanthology.org/P02-1040.

Ricardo Rei, Craig Stewart, Ana C Farinha, and Alon Lavie. COMET: A neural framework for MT evaluation. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 2685–2702, Online, November 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.emnlp-main.213. https://aclanthology.org/2020.emnlp-main.213.

Thomas Scialom, Paul-Alexis Dray, Sylvain Lamprier, Benjamin Piwowarski, Jacopo Staiano, Alex Wang, and Patrick Gallinari. QuestEval: Summarization asks for fact-based evaluation. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 6594–6604, Online and Punta Cana, Dominican Republic, November 2021. Association for Computational Linguistics. doi: 10.18653/v1/2021.emnlp-main.529. https://aclanthology.org/2021.emnlp-main.529.

Thibault Sellam, Dipanjan Das, and Ankur Parikh. BLEURT: Learning robust metrics for text generation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7881–7892, Online, July 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.acl-main.704. https://aclanthology.org/2020.acl-main.704.

Billy T. M. Wong and Chunyu Kit. Extending machine translation evaluation metrics with lexical cohesion to document level. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 1060–1068, Jeju Island, Korea, July 2012. Association for Computational Linguistics. https://aclanthology.org/D12-1097.

Lana Yeganova, Dina Wiemann, Mariana Neves, Federica Vezzani, Amy Siu, Inigo Jauregi Unanue, Maite Oronoz, Nancy Mah, Aurélie Névéol, David Martinez, Rachel Bawden, Giorgio Maria Di Nunzio, Roland Roller, Philippe Thomas, Cristian Grozea, Olatz Perez-de Viñaspre, Maika Vicente Navarro, and Antonio Jimeno Yepes. Findings of the WMT 2021 biomedical translation shared task: Summaries of animal experiments as new test set. In Proceedings of the Sixth Conference on Machine Translation, pages 664–683, Online, November 2021. Association for Computational Linguistics. https://aclanthology.org/2021.wmt-1.70.

Weizhe Yuan, Graham Neubig, and Pengfei Liu. BARTScore: Evaluating generated text as text generation. In Advances in Neural Information Processing Systems, pages 27263–27277, Online, 2021. Curran Associates, Inc.

Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, and Yoav Artzi. BERTScore: Evaluating text generation with BERT. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net, 2020. https://openreview.net/forum?id=SkeHuCVFDr.

Main activities

The main activities will be to carry out research on the topic outlined above by (i) studying the existing literature, (ii) re-implementing previously proposed approaches and baselines, (iii) proposing improvements to these approaches or a novel approach, (iv) carrying out experiments and writing them up, and (v) discussing these experiments with the group.

Skills

Candidates should currently be finishing a Master 2 (second year of a Master's degree) or equivalent (e.g. engineering school) in computer science, specialising in artificial intelligence, machine learning or natural language processing.

They should have good programming skills (Python), experience with neural networks and an interest in natural language processing. A good level of written and spoken English is required, and knowledge of French is preferred.

Benefits package
  • Subsidized meals
  • Partial reimbursement of public transport costs
  • Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
  • Professional equipment available (videoconferencing, loan of computer equipment, etc.)
  • Social, cultural and sports events and activities
  • Access to vocational training
  • Social security coverage

Remuneration

This internship may be either compensated (€3.90/hour) or remunerated (SMIC, €1,678.95) depending on the candidate's situation.

General Information

  • Theme/Domain: Language, Speech and Audio; Information system (BAP E)
  • Town/city: Paris
  • Inria Center: Centre Inria de Paris
  • Starting date: 2023-03-01
  • Duration of contract: 6 months
  • Deadline to apply: 2022-12-31

Contacts

  • Inria Team: ALMANACH
  • Recruiter: Rachel Bawden / rachel.bawden@inria.fr

The keys to success

We especially welcome candidates who already have experience in NLP, who are highly motivated by the topic and the task, and who are able to propose new ideas.

About Inria

Inria is the French national research institute dedicated to digital science and technology. It employs 2,600 people. Its 200 agile project teams, generally run jointly with academic partners, include more than 3,500 scientists and engineers working to meet the challenges of digital technology, often at the interface with other disciplines. The Institute also employs talented staff in over forty different professions. 900 research support staff contribute to the preparation and development of scientific and entrepreneurial projects that have a worldwide impact.

Instructions to apply

Defence security: This position is likely to be situated in a restricted area (ZRR), as defined in Decree No. 2011-1425 relating to the protection of national scientific and technical potential (PPST). Authorisation to enter an area is granted by the director of the unit, following a favourable Ministerial decision, as defined in the decree of 3 July 2012 relating to the PPST. An unfavourable Ministerial decision in respect of a position situated in a ZRR would result in the cancellation of the appointment.

Recruitment policy: As part of its diversity policy, all Inria positions are accessible to people with disabilities.

Warning: you must enter your e-mail address in order to save your application to Inria. Applications must be submitted online on the Inria website. Processing of applications sent through other channels is not guaranteed.
