The sensitivity of mapping methods to reference data quality: training supervised image classifications with imperfect reference data

Giles M. Foody; Mahesh Pal; Duccio Rocchini; Carol X. Garzon-Lopez; Lucy Bastin

doi:10.3390/ijgi5110199

Giles M. Foody*, Mahesh Pal, Duccio Rocchini, Carol X. Garzon-Lopez, Lucy Bastin

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

The accuracy of a map depends on the reference dataset used in its construction. Classification analyses used in thematic mapping can, for example, be sensitive to a range of sampling and data quality concerns. With particular focus on the latter, the effects of reference data quality on land cover classifications from airborne thematic mapper data are explored. Variations in sampling intensity and effort are highlighted in a dataset that is widely used in mapping and modelling studies; these may need accounting for in analyses. The quality of the labelling in the reference dataset was also a key variable influencing mapping accuracy. Accuracy varied with the amount and nature of mislabelled training cases, with the nature of the effects varying between classifiers. The largest impacts on accuracy occurred when mislabelling involved confusion between similar classes. Accuracy was also typically negatively related to the magnitude of mislabelled cases. The support vector machine (SVM), which has been claimed to be relatively insensitive to training data error, was the most sensitive of the set of classifiers investigated: overall classification accuracy declined by 8% (significant at the 95% level of confidence) with the use of a training set containing 20% mislabelled cases.
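
As an illustration only, the following minimal sketch shows one way a training set containing 20% mislabelled cases could be simulated. It is not the authors' procedure: the function name `mislabel`, the class names, and the uniform random choice of the wrong class are all assumptions made for this example. The abstract notes that the largest impacts arose when mislabelling involved similar classes, which this simple random scheme does not model.

```python
import random


def mislabel(labels, fraction, classes, seed=0):
    """Return a copy of `labels` in which `fraction` of the cases carry a wrong class.

    Hypothetical helper: the wrong class is drawn uniformly at random from
    the other classes, which is an assumption of this sketch.
    """
    rng = random.Random(seed)
    noisy = list(labels)
    n_flip = round(fraction * len(noisy))
    # Pick distinct cases to corrupt, then give each a class other than its own.
    for i in rng.sample(range(len(noisy)), n_flip):
        wrong = [c for c in classes if c != noisy[i]]
        noisy[i] = rng.choice(wrong)
    return noisy


# Hypothetical land cover classes; the paper's actual classes are not listed here.
classes = ["crop", "grass", "water"]
clean = [classes[i % 3] for i in range(100)]
noisy = mislabel(clean, 0.20, classes)
changed = sum(a != b for a, b in zip(clean, noisy))
print(changed)  # 20
```

A classifier could then be trained once on `clean` and once on `noisy`, with both models evaluated against the same correctly labelled test set, to see how accuracy responds to the mislabelled cases.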

Original language	English
Article number	199
Number of pages	20
Journal	ISPRS International Journal of Geo-Information
Volume	5
Issue number	11
DOIs	https://doi.org/10.3390/ijgi5110199
Publication status	Published - 1 Nov 2016

Bibliographical note

This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. (CC BY 4.0).

Keywords

accuracy
classification
error
land cover
remote sensing
training

Access to Document

10.3390/ijgi5110199 (Licence: CC BY 3.0)

Sensitivity of mapping methods to reference data quality
Final published version, 694 KB (Licence: CC BY 3.0)

http://www.mdpi.com/2220-9964/5/11/199 (Licence: CC BY 3.0)

Cite this

@article{6a831d797c78402b8d7e09564f0b3fb6,

title = "The sensitivity of mapping methods to reference data quality: training supervised image classifications with imperfect reference data",

abstract = "The accuracy of a map is dependent on the reference dataset used in its construction. Classification analyses used in thematic mapping can, for example, be sensitive to a range of sampling and data quality concerns. With particular focus on the latter, the effects of reference data quality on land cover classifications from airborne thematic mapper data are explored. Variations in sampling intensity and effort are highlighted in a dataset that is widely used in mapping and modelling studies; these may need accounting for in analyses. The quality of the labelling in the reference dataset was also a key variable influencing mapping accuracy. Accuracy varied with the amount and nature of mislabelled training cases with the nature of the effects varying between classifiers. The largest impacts on accuracy occurred when mislabelling involved confusion between similar classes. Accuracy was also typically negatively related to the magnitude of mislabelled cases and the support vector machine (SVM), which has been claimed to be relatively insensitive to training data error, was the most sensitive of the set of classifiers investigated, with overall classification accuracy declining by 8% (significant at 95% level of confidence) with the use of a training set containing 20% mislabelled cases.",

keywords = "accuracy, classification, error, land cover, remote sensing, training",

author = "Foody, {Giles M.} and Mahesh Pal and Duccio Rocchini and Garzon-Lopez, {Carol X.} and Lucy Bastin",

note = "This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. (CC BY 4.0). ",

year = "2016",

month = nov,

day = "1",

doi = "10.3390/ijgi5110199",

language = "English",

volume = "5",

journal = "ISPRS International Journal of Geo-Information",

issn = "2220-9964",

publisher = "MDPI AG",

number = "11",

}

TY - JOUR

T1 - The sensitivity of mapping methods to reference data quality

T2 - training supervised image classifications with imperfect reference data

AU - Foody, Giles M.

AU - Pal, Mahesh

AU - Rocchini, Duccio

AU - Garzon-Lopez, Carol X.

AU - Bastin, Lucy

N1 - This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. (CC BY 4.0).

PY - 2016/11/1

Y1 - 2016/11/1

N2 - The accuracy of a map is dependent on the reference dataset used in its construction. Classification analyses used in thematic mapping can, for example, be sensitive to a range of sampling and data quality concerns. With particular focus on the latter, the effects of reference data quality on land cover classifications from airborne thematic mapper data are explored. Variations in sampling intensity and effort are highlighted in a dataset that is widely used in mapping and modelling studies; these may need accounting for in analyses. The quality of the labelling in the reference dataset was also a key variable influencing mapping accuracy. Accuracy varied with the amount and nature of mislabelled training cases with the nature of the effects varying between classifiers. The largest impacts on accuracy occurred when mislabelling involved confusion between similar classes. Accuracy was also typically negatively related to the magnitude of mislabelled cases and the support vector machine (SVM), which has been claimed to be relatively insensitive to training data error, was the most sensitive of the set of classifiers investigated, with overall classification accuracy declining by 8% (significant at 95% level of confidence) with the use of a training set containing 20% mislabelled cases.

AB - The accuracy of a map is dependent on the reference dataset used in its construction. Classification analyses used in thematic mapping can, for example, be sensitive to a range of sampling and data quality concerns. With particular focus on the latter, the effects of reference data quality on land cover classifications from airborne thematic mapper data are explored. Variations in sampling intensity and effort are highlighted in a dataset that is widely used in mapping and modelling studies; these may need accounting for in analyses. The quality of the labelling in the reference dataset was also a key variable influencing mapping accuracy. Accuracy varied with the amount and nature of mislabelled training cases with the nature of the effects varying between classifiers. The largest impacts on accuracy occurred when mislabelling involved confusion between similar classes. Accuracy was also typically negatively related to the magnitude of mislabelled cases and the support vector machine (SVM), which has been claimed to be relatively insensitive to training data error, was the most sensitive of the set of classifiers investigated, with overall classification accuracy declining by 8% (significant at 95% level of confidence) with the use of a training set containing 20% mislabelled cases.

KW - accuracy

KW - classification

KW - error

KW - land cover

KW - remote sensing

KW - training

UR - http://www.scopus.com/inward/record.url?scp=84994156533&partnerID=8YFLogxK

U2 - 10.3390/ijgi5110199

DO - 10.3390/ijgi5110199

M3 - Article

AN - SCOPUS:84994156533

SN - 2220-9964

VL - 5

JO - ISPRS International Journal of Geo-Information

JF - ISPRS International Journal of Geo-Information

IS - 11

M1 - 199

ER -
