Abstract |
In this paper, we compare the performance of a variety of machine learning algorithms, including supervised Naïve Bayes, J48, SVM, Random Tree, and Random Forest, and unsupervised KNN, for determining the type of cancer a patient is suffering from, using textual medical records. We train these classifiers on different feature sets: word unigrams and bigrams and character n-grams, represented with either a tf-idf weighting scheme or a binary feature representation. We evaluate the performance of the classifiers in terms of accuracy, precision, recall, and F-measure. The results show that Naïve Bayes and SVM achieve the best performance on this task. |