Extracting topical phrases from clinical documents

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution



In clinical documents, medical terms are often expressed as multi-word phrases. Traditional topic modelling approaches that rely on the “bag-of-words” assumption are therefore not effective at extracting topic themes from clinical documents. This paper proposes to first extract medical phrases using an off-the-shelf tool for medical concept mention extraction, and then to train a topic model that takes a hierarchy of Pitman-Yor processes as a prior for modelling the generation of phrases of arbitrary length. Experimental results on patients’ discharge summaries show that the proposed approach outperforms the state-of-the-art topical phrase extraction model on both perplexity and topic coherence, and finds more interpretable topics.
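This record gives no details of the model beyond the abstract, so the following is only a minimal sketch of the standard single-level Pitman-Yor process in its Chinese restaurant form, the building block that a hierarchy of such processes stacks. It is not the paper's model. The function name and the parameters `discount`, `concentration`, and `seed` are illustrative assumptions.

```python
import random


def sample_pitman_yor_seating(num_customers, discount, concentration, seed=0):
    """Seat customers one at a time under a Pitman-Yor Chinese restaurant process.

    Illustrative sketch only: a single-level process, not a hierarchy.
    Returns a list with the number of customers at each table.
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    if concentration <= -discount:
        raise ValueError("concentration must be greater than -discount")

    rng = random.Random(seed)
    table_sizes = []
    for _ in range(num_customers):
        if not table_sizes:
            # The first customer always opens the first table.
            table_sizes.append(1)
            continue
        num_tables = len(table_sizes)
        # Unnormalised weight of joining existing table k: size_k - discount.
        weights = [size - discount for size in table_sizes]
        # Unnormalised weight of opening a new table:
        # concentration + discount * (number of occupied tables).
        weights.append(concentration + discount * num_tables)
        choice = rng.choices(range(num_tables + 1), weights=weights)[0]
        if choice == num_tables:
            table_sizes.append(1)
        else:
            table_sizes[choice] += 1
    return table_sizes


if __name__ == "__main__":
    sizes = sample_pitman_yor_seating(1000, discount=0.5, concentration=1.0)
    print(len(sizes), "tables; largest holds", max(sizes), "customers")
```

A larger `discount` tends to produce many small tables alongside a few large ones, a power-law behaviour that is often cited as a reason for using Pitman-Yor priors with word and phrase frequencies.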



Publication date: 12 Feb 2016
Publication title: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16)
Number of pages: 7
ISBN (Electronic): 9781577357605
Original language: English
Event: 30th AAAI Conference on Artificial Intelligence, AAAI 2016 - Phoenix, United States


Conference: 30th AAAI Conference on Artificial Intelligence, AAAI 2016
Country: United States

Bibliographic note


Employable Graduates; Exploitable Research
