Early detection of heterogeneous disaster events using social media

Viktor Pekar, Jane Binner, Hossein Najafi, Chris Hale, Vincent Schmidt

Research output: Contribution to journal › Article

Abstract

This article addresses the problem of detecting crisis‐related messages on social media, in order to improve the situational awareness of emergency services. Previous work focused on developing machine‐learning classifiers restricted to specific disasters, such as storms or wildfires. We investigate, for the first time, methods to detect such messages where the type of the crisis is not known in advance, that is, where the data are highly heterogeneous. Data heterogeneity makes it significantly harder for learning algorithms to generalize and accurately label incoming data. Our main contributions are as follows. First, we evaluate the extent of this problem in the context of disaster management, finding that the performance of traditional learners drops by up to 40% when trained and tested on heterogeneous data vis‐à‐vis homogeneous data. Then, in order to overcome data heterogeneity, we propose a new ensemble learning method and find that it performs on a par with the Gradient Boosting and AdaBoost ensemble learners. The methods are studied on a benchmark data set comprising 26 disaster events and four classification problems: detection of relevant messages, informative messages, eyewitness reports, and topical classification of messages. Finally, in a case study, we evaluate the proposed methods on a real‐world data set to assess their practical value.
Original language: English
Journal: Journal of the Association for Information Science and Technology
Early online date: 22 Mar 2019
DOIs: 10.1002/asi.24208
Publication status: E-pub ahead of print - 22 Mar 2019

Fingerprint

social media
Disasters
disaster
event
Emergency services
Adaptive boosting
Learning algorithms
Labels
Classifiers
Social media
Disaster
learning method
cause
management
learning
performance

Bibliographical note

© 2019 The Authors. Journal of the Association for Information Science and Technology published by Wiley Periodicals, Inc. on behalf of ASIS&T.

This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.

Cite this

@article{341b93eac2e746c09d608179cc89c98b,
title = "Early detection of heterogeneous disaster events using social media",
abstract = "This article addresses the problem of detecting crisis‐related messages on social media, in order to improve the situational awareness of emergency services. Previous work focused on developing machine‐learning classifiers restricted to specific disasters, such as storms or wildfires. We investigate for the first time methods to detect such messages where the type of the crisis is not known in advance, that is, the data are highly heterogeneous. Data heterogeneity causes significant difficulties for learning algorithms to generalize and accurately label incoming data. Our main contributions are as follows. First, we evaluate the extent of this problem in the context of disaster management, finding that the performance of traditional learners drops by up to 40{\%} when trained and tested on heterogeneous data vis‐{\`a}‐vis homogeneous data. Then, in order to overcome data heterogeneity, we propose a new ensemble learning method, and found this to perform on a par with the Gradient Boosting and AdaBoost ensemble learners. The methods are studied on a benchmark data set comprising 26 disaster events and four classification problems: detection of relevant messages, informative messages, eyewitness reports, and topical classification of messages. Finally, in a case study, we evaluate the proposed methods on a real‐world data set to assess its practical value.",
author = "Viktor Pekar and Jane Binner and Hossein Najafi and Chris Hale and Vincent Schmidt",
note = "{\textcopyright} 2019 The Authors. Journal of the Association for Information Science and Technology published by Wiley Periodicals, Inc. on behalf of ASIS\&T. This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.",
year = "2019",
month = "3",
day = "22",
doi = "10.1002/asi.24208",
language = "English",
journal = "Journal of the Association for Information Science and Technology",
issn = "2330-1635",
publisher = "John Wiley and Sons Ltd",

}

Early detection of heterogeneous disaster events using social media. / Pekar, Viktor; Binner, Jane; Najafi, Hossein; Hale, Chris; Schmidt, Vincent.

In: Journal of the Association for Information Science and Technology, 22.03.2019.

TY - JOUR

T1 - Early detection of heterogeneous disaster events using social media

AU - Pekar, Viktor

AU - Binner, Jane

AU - Najafi, Hossein

AU - Hale, Chris

AU - Schmidt, Vincent

N1 - © 2019 The Authors. Journal of the Association for Information Science and Technology published by Wiley Periodicals, Inc. on behalf of ASIS&T. This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.

PY - 2019/3/22

Y1 - 2019/3/22

AB - This article addresses the problem of detecting crisis‐related messages on social media, in order to improve the situational awareness of emergency services. Previous work focused on developing machine‐learning classifiers restricted to specific disasters, such as storms or wildfires. We investigate for the first time methods to detect such messages where the type of the crisis is not known in advance, that is, the data are highly heterogeneous. Data heterogeneity causes significant difficulties for learning algorithms to generalize and accurately label incoming data. Our main contributions are as follows. First, we evaluate the extent of this problem in the context of disaster management, finding that the performance of traditional learners drops by up to 40% when trained and tested on heterogeneous data vis‐à‐vis homogeneous data. Then, in order to overcome data heterogeneity, we propose a new ensemble learning method, and found this to perform on a par with the Gradient Boosting and AdaBoost ensemble learners. The methods are studied on a benchmark data set comprising 26 disaster events and four classification problems: detection of relevant messages, informative messages, eyewitness reports, and topical classification of messages. Finally, in a case study, we evaluate the proposed methods on a real‐world data set to assess its practical value.

UR - https://onlinelibrary.wiley.com/doi/abs/10.1002/asi.24208

U2 - 10.1002/asi.24208

DO - 10.1002/asi.24208

M3 - Article

JO - Journal of the Association for Information Science and Technology

JF - Journal of the Association for Information Science and Technology

SN - 2330-1635

ER -