Abstract
Preprocessing is an essential first step in automatically generating a taxonomy from text documents, because text data is unstructured and is more inconsistent and noisier than structured data. Different combinations of preprocessing techniques have been applied in taxonomy generation to amplify pertinent information for further analysis and processing. This research investigates the impact of various preprocessing techniques on the quality of the generated taxonomy. Various combinations of preprocessing techniques were applied to taxonomy generation on two text data sets selected from different domains. The experimental results reveal that selecting a suitable combination of preprocessing techniques can improve the quality of the automatically generated taxonomy. However, applying all preprocessing techniques during generation does not guarantee high quality.
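As a rough illustration of what "combinations of preprocessing techniques" can mean in practice, the sketch below enumerates every subset of a small set of steps and applies each subset to a piece of text. The abstract does not name the techniques that were studied, so the three steps here (lowercasing, stop-word removal, and a crude plural stripper standing in for stemming), the stop-word list, and all function names are assumptions made for illustration only, not the authors' pipeline.

```python
from itertools import combinations

# Hypothetical, deliberately tiny stop-word list (assumption, not from the paper).
STOP_WORDS = {"the", "a", "an", "of", "in", "is", "and"}


def lowercase(tokens):
    """Convert every token to lower case."""
    return [t.lower() for t in tokens]


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list (case-insensitive)."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]


def strip_plural_s(tokens):
    """Crude stand-in for stemming: drop a trailing 's' from longer tokens."""
    return [t[:-1] if len(t) > 3 and t.endswith("s") else t for t in tokens]


# Illustrative registry of preprocessing steps, applied in the order listed.
STEPS = {
    "lowercase": lowercase,
    "stop_words": remove_stop_words,
    "stem": strip_plural_s,
}


def all_combinations(step_names):
    """Yield every subset of the steps, from the empty set to all of them."""
    for size in range(len(step_names) + 1):
        yield from combinations(step_names, size)


def preprocess(text, step_names):
    """Split text on whitespace, then apply the chosen steps in order."""
    tokens = text.split()
    for name in step_names:
        tokens = STEPS[name](tokens)
    return tokens


if __name__ == "__main__":
    sample = "The Taxonomies of Documents"
    for combo in all_combinations(list(STEPS)):
        print(combo, preprocess(sample, combo))
```

With three steps there are eight subsets to compare, including the empty one (no preprocessing) and the full one (every step). An experiment of the kind the abstract describes would feed the output of each subset into the taxonomy generator and score the resulting taxonomies separately.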
Original language | English |
---|---|
Title of host publication | Proceedings - 2017 13th International Conference on Emerging Technologies, ICET2017 |
Publisher | IEEE |
Number of pages | 6 |
ISBN (Electronic) | 978-1-5386-2260-5 |
Publication status | Published - 8 Feb 2018 |