Title |
ParTNER: Paragraph Tuning for Named Entity Recognition on Clinical Cases in Spanish using mBERT + Rules |
Type |
Conference |
Subtype |
Proceedings |
Description |
2022 Iberian Languages Evaluation Forum, IberLEF 2022 |
Abstract |
Named entity recognition (NER) and normalization are crucial tasks for information extraction in the medical field. They have been tackled with approaches ranging from rule-based systems and classic machine learning methods with feature engineering to sophisticated deep learning models, most of them developed for English. In this work, we present a transfer learning approach built on multilingual BERT (mBERT) for Spanish NER of species mentions, and their normalization, in clinical cases. The model is trained on sentence-tokenized text and uses a paragraph tuning strategy at inference time. We propose that text lengths at training and inference do not need to match, and that this difference can be exploited to improve the model's performance on a given task. Our validation showed that using a context of three sentences during inference improves the F1 score by ≈1% compared to longer and shorter paragraphs, and by ≈17% compared to the whole document. We also applied simple but effective post-processing rules to the model's output, which improved the micro-F1 score by ≈28%. Our system achieved an F1 score of 0.8499 on the test set of the LivingNER 2022 shared task. © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). |
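The inference-time "paragraph tuning" described in the abstract amounts to regrouping a document's sentences into fixed-size windows (three sentences performed best) before passing them to the model. The sketch below illustrates only that regrouping step under stated assumptions: the regex sentence splitter, the non-overlapping windows, and the helper names `sentence_spans` and `paragraph_chunks` are all hypothetical and are not taken from the paper. Each chunk keeps its character offset so that entity spans predicted on a chunk can be mapped back to document coordinates.

```python
import re
from typing import List, Tuple


def sentence_spans(text: str) -> List[Tuple[int, int]]:
    """Hypothetical naive splitter (an assumption, not the paper's tokenizer).

    Returns (start, end) character spans, one per sentence, split on . ! ?
    """
    spans = []
    for m in re.finditer(r"[^.!?]+[.!?]*", text):
        start, end = m.start(), m.end()
        # Trim surrounding whitespace so each span covers only sentence text.
        while start < end and text[start].isspace():
            start += 1
        while end > start and text[end - 1].isspace():
            end -= 1
        if start < end:
            spans.append((start, end))
    return spans


def paragraph_chunks(text: str, n_sentences: int = 3) -> List[Tuple[int, str]]:
    """Group consecutive sentences into non-overlapping windows (assumed).

    Returns (offset, chunk_text) pairs; adding `offset` to a position inside
    chunk_text gives the corresponding position in the full document.
    """
    spans = sentence_spans(text)
    chunks = []
    for i in range(0, len(spans), n_sentences):
        group = spans[i:i + n_sentences]
        start, end = group[0][0], group[-1][1]
        chunks.append((start, text[start:end]))
    return chunks


doc = ("Paciente de 45 años. Refiere fiebre alta. "
       "Cultivo positivo para Salmonella. Se inicia tratamiento.")
for offset, chunk in paragraph_chunks(doc, n_sentences=3):
    print(offset, chunk)
```

A real pipeline would use a proper sentence tokenizer (the naive regex above would, for example, split abbreviated species names such as "E. coli"), and the window size would be tuned on validation data, as the abstract reports.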
Notes |
CEUR Workshop Proceedings, v. 3202 |
Location |
Coruña |
Country |
Spain |
No. of pages |
|
Vol. / Ch. |
|
Start date |
2022-09-20 |
End date |
|
ISBN/ISSN |
|