Abstract

This paper presents a solution for predicting word complexity using contextual sentence information, a problem that traditional methods often struggle to address. It also introduces a user-friendly interface that dynamically assesses word complexity and provides explanations by considering both individual word features and their surrounding context.
Three distinct approaches were explored in this work. The first applied a Bidirectional Long Short-Term Memory (Bi-LSTM) model trained on linguistic and semantic features extracted from the text. The second used Bidirectional Encoder Representations from Transformers (BERT) with two separate models, one for sentence-level complexity and another for word-level complexity, whose predictions were combined to give a more context-sensitive result. The third introduced a novel method that combines XLNet embeddings with a Random Forest classifier, which processes both sentence and word embeddings to predict complexity levels.
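The third approach can be illustrated with a minimal sketch, assuming that each word embedding is concatenated with the embedding of its sentence before classification. The abstract does not state how the two embeddings are joined, so the concatenation, the `build_features` helper, the vector sizes, and the hyperparameters below are all hypothetical. Random vectors stand in for XLNet embeddings so that the example runs without downloading a model.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# The five complexity levels named in the abstract.
LEVELS = ["Very-easy", "Easy", "Medium", "Hard", "Very-hard"]


def build_features(word_vecs, sentence_vecs):
    """Join each word embedding with its sentence embedding (assumed: concatenation)."""
    return np.concatenate([word_vecs, sentence_vecs], axis=1)


rng = np.random.default_rng(0)
n_samples, dim = 200, 16  # placeholder sizes, not the real XLNet dimensions

# Random stand-ins for XLNet word and sentence embeddings (illustration only).
word_vecs = rng.normal(size=(n_samples, dim))
sentence_vecs = rng.normal(size=(n_samples, dim))
# Random class indices; real labels would come from the annotated dataset.
labels = rng.integers(0, len(LEVELS), size=n_samples)

X = build_features(word_vecs, sentence_vecs)

# Hyperparameters are arbitrary choices for the sketch.
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X, labels)

# Map predicted class indices back to the level names.
predicted = [LEVELS[i] for i in clf.predict(X[:3])]
print(predicted)
```

With real data, `word_vecs` and `sentence_vecs` would be replaced by embeddings extracted from XLNet, and the classifier would be evaluated on held-out examples rather than on its own training rows.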
A diverse dataset covering religious, biomedical, and parliamentary texts was used in this work, as it is pre-categorised into five complexity levels (Very-easy, Easy, Medium, Hard, Very-hard). To ensure balanced class representation, data augmentation techniques were applied. Evaluation showed that the XLNet-based model (the third method) outperformed the others, achieving 80% accuracy (Macro-Average F1-measure = 0.78) and excelling in particular at identifying highly complex words (F1-measure = 0.95). The BERT-based model followed closely, with an accuracy of 78% (Macro-Average F1-measure = 0.75), and the Bi-LSTM method achieved an accuracy of 63% (Macro-Average F1-measure = 0.63). The best-performing model (XLNet-based) was then selected as the engine behind a user-friendly interface created with Gradio, which detects complex words in an input sentence and provides explanations.
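The results above report both accuracy and a macro-average F1-measure. As a brief illustration of why the two can differ, the sketch below computes the macro-average F1 as the unweighted mean of per-class F1 scores. The `macro_f1` helper and the toy labels are hypothetical and are not the evaluation data from this work.

```python
from collections import Counter


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores over the given labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(labels)


# Toy example with two classes (illustration only).
y_true = ["Easy", "Easy", "Hard", "Hard"]
y_pred = ["Easy", "Hard", "Hard", "Hard"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred, ["Easy", "Hard"])
print(f"accuracy={accuracy:.2f}, macro-F1={score:.2f}")
```

Because every class contributes equally to the mean, a class that is predicted poorly lowers the macro-average F1 even when overall accuracy stays high.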
This work highlights the importance of utilising both word-level and sentence-level embeddings for effective complexity prediction. The developed models, along with the user-friendly interface, have significant potential applications in education, where they can help language learners navigate challenging vocabulary.
Original language: English
Number of pages: 1
Publication status: Unpublished - 22 Nov 2024
Event: The Second UK AI Conference 2024 - University of Birmingham, Birmingham, United Kingdom
Duration: 22 Nov 2024 to 22 Nov 2024
https://uk-ai.org/ukai2024/

Conference

Conference: The Second UK AI Conference 2024
Abbreviated title: UK AI
Country/Territory: United Kingdom
City: Birmingham
Period: 22/11/24 to 22/11/24

Title: Enhancing Word Complexity Prediction Through Contextual Analysis