Medical Incident Report Classification using Context-based Word Embeddings

Bosch, S. (2017) Medical Incident Report Classification using Context-based Word Embeddings. Master's Thesis / Essay, Artificial Intelligence.

Abstract

The University Medical Center Groningen is one of the largest hospitals in the Netherlands, employing over 10,000 people. In a hospital of this size, incidents are bound to occur on a regular basis. Most of these incidents are reported extensively, but the time-consuming nature of analyzing their textual descriptions and the sheer number of reports make them costly to process. Therefore, this thesis proposes ways of employing machine learning techniques to process the incident reports more efficiently and effectively. More specifically, we show how context-based word embeddings, which are vector representations of words, can be used to enhance searching capabilities within the report database and how they can be combined with classifiers to predict the labels attributed to the incidents. For these purposes, we subjected several word embedding and classification techniques to a comparative analysis. To evaluate word embedding techniques, we propose a method to measure the extent to which word embeddings that are similar in vector space represent words that are associated with each other. Using this method, we find that the continuous bag-of-words architecture developed by Mikolov et al. performs best out of a selection of six different word embedding methods. Furthermore, we compare different methods for embedding entire incident reports, and find that simply averaging word embeddings in combination with an inverse document frequency weighting function yields better results than several recently introduced techniques. For the prediction of class labels, the combination of such report embeddings with multilayer perceptrons proves most promising out of a selection of eleven different combinations of classifier and input type (bag of words, word embeddings or report embeddings).
We find that three out of five tested categories are predicted well above the baseline, established by always selecting the most frequent label, whereas the other two reach at most baseline performance. Finally, to improve searching capabilities within the report database, we developed an application that uses the learned word embeddings to provide the user with search suggestions based on a search query. The Central Incident Commission, which is concerned with processing the incident reports, finds that this functionality drastically increases the efficiency and effectiveness of search queries, both saving them time and helping them find more complete patterns within the data.
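
The report-embedding approach mentioned in the abstract, averaging word embeddings with an inverse document frequency (IDF) weighting, can be illustrated with a minimal sketch. The abstract does not specify the exact weighting function, so the smoothed IDF formula below is an assumption, and the function names `idf_weights` and `embed_report` as well as the toy vectors are hypothetical and not taken from the thesis.

```python
import math
from collections import Counter


def idf_weights(tokenized_reports):
    """Smoothed IDF per word (assumed formula: ln((1 + N) / (1 + df)) + 1)."""
    n_docs = len(tokenized_reports)
    doc_freq = Counter()
    for tokens in tokenized_reports:
        # Count each word at most once per report.
        doc_freq.update(set(tokens))
    return {w: math.log((1 + n_docs) / (1 + df)) + 1.0
            for w, df in doc_freq.items()}


def embed_report(tokens, word_vectors, idf):
    """IDF-weighted mean of the word vectors; unknown words are skipped."""
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok in tokens:
        if tok in word_vectors and tok in idf:
            w = idf[tok]
            for i, x in enumerate(word_vectors[tok]):
                total[i] += w * x
            weight_sum += w
    if weight_sum == 0.0:
        # No known words: fall back to the zero vector.
        return total
    return [x / weight_sum for x in total]


if __name__ == "__main__":
    # Toy corpus and 2-dimensional toy vectors, for illustration only.
    reports = [["patient", "fell", "bed"],
               ["patient", "medication"],
               ["patient", "fell"]]
    vectors = {"patient": [1.0, 0.0], "fell": [0.0, 1.0], "bed": [0.5, 0.5]}
    idf = idf_weights(reports)
    print(embed_report(reports[0], vectors, idf))
```

Words that occur in almost every report (here "patient") receive a low weight, so rarer and presumably more informative words dominate the resulting report vector. In practice the word vectors would come from a trained embedding model rather than a hand-written dictionary.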

Item Type:	Thesis (Master's Thesis / Essay)
Supervisor name:	Wiering, M.A. and Cnossen, F.
Degree programme:	Artificial Intelligence
Thesis type:	Master's Thesis / Essay
Language:	English
Date Deposited:	15 Feb 2018 08:31
Last Modified:	02 May 2019 09:26
URI:	https://fse.studenttheses.ub.rug.nl/id/eprint/15795
