Abstract: Recently, the extraction of clinical events from unstructured medical texts has attracted considerable attention from the research community. Machine learning approaches are popular for this task because they handle sequence tagging effectively. It has been suggested previously that simple features, such as word unigrams, part-of-speech tags, and chunk tags, are sufficient for this task. We show that more careful preprocessing and feature selection can significantly improve the results. We used a conditional random field (CRF) classifier with more linguistically oriented features and outperformed the current state-of-the-art approaches. We also show that the popular and much simpler Viterbi algorithm (a hidden Markov model-based classification algorithm) can produce competitive results when its parameters are tuned using specific optimization techniques. We evaluate these algorithms on the extraction of medical events from the corpus developed for SemEval-2016 Task 12: Clinical TempEval (Temporal Evaluation), specifically on two of its subtasks: (i) event detection and (ii) event classification based on contextual modality.
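To make the HMM-based approach concrete, the following is a minimal sketch of Viterbi decoding for a first-order hidden Markov model used as a token tagger. The function name `viterbi`, the two-tag set (`O`, `EVENT`), the smoothing constant `unk_p` for unseen tokens, and all probabilities in the usage example are illustrative assumptions, not values estimated from the Clinical TempEval corpus. The parameter-tuning techniques mentioned above are not shown.

```python
import math
from typing import Dict, List, Tuple


def viterbi(
    tokens: List[str],
    tags: List[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
    unk_p: float = 1e-6,  # hypothetical fallback emission probability for unseen tokens
) -> Tuple[List[str], float]:
    """Return the most probable tag sequence and its log-probability under a first-order HMM."""
    if not tokens:
        return [], 0.0

    def log(p: float) -> float:
        # Work in log space to avoid underflow on long sentences.
        return math.log(p) if p > 0 else float("-inf")

    # best[t][tag]: log-prob of the best path that ends in `tag` at position t.
    # back[t][tag]: the previous tag on that best path (for backtracking).
    best: List[Dict[str, float]] = [{}]
    back: List[Dict[str, str]] = [{}]
    for tag in tags:
        best[0][tag] = log(start_p[tag]) + log(emit_p[tag].get(tokens[0], unk_p))

    for t in range(1, len(tokens)):
        best.append({})
        back.append({})
        for tag in tags:
            emit = log(emit_p[tag].get(tokens[t], unk_p))
            # Pick the predecessor tag that maximizes path score plus transition score.
            prev, score = max(
                ((p, best[t - 1][p] + log(trans_p[p][tag])) for p in tags),
                key=lambda pair: pair[1],
            )
            best[t][tag] = score + emit
            back[t][tag] = prev

    # Backtrack from the best final tag.
    last = max(tags, key=lambda tag: best[-1][tag])
    path = [last]
    for t in range(len(tokens) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


# Toy usage with made-up probabilities:
tags = ["O", "EVENT"]
start_p = {"O": 0.8, "EVENT": 0.2}
trans_p = {"O": {"O": 0.7, "EVENT": 0.3}, "EVENT": {"O": 0.6, "EVENT": 0.4}}
emit_p = {
    "O": {"patient": 0.4, "denies": 0.3, "pain": 0.05},
    "EVENT": {"patient": 0.01, "denies": 0.1, "pain": 0.5},
}
path, score = viterbi(["patient", "denies", "pain"], tags, start_p, trans_p, emit_p)
print(path)  # ['O', 'O', 'EVENT']
```

With these toy parameters, the decoder tags "pain" as an event and the other tokens as outside any event. A CRF replaces the generative transition and emission tables with weighted feature functions (for example, over word unigrams or part-of-speech tags), but decodes with the same dynamic-programming recurrence.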