Parallel geostatistics for sparse and dense datasets

Benjamin R. Ingram; Dan Cornford

doi:10.1007/978-90-481-2322-3_32

Computer Science Research Group

Research output: Chapter in Book/Published conference output › Chapter

Abstract

Very large spatially-referenced datasets, for example those derived from satellite-based sensors that sample across the globe or from large monitoring networks of individual sensors, are becoming increasingly common and more widely available for use in environmental decision making. In large or dense sensor networks, huge quantities of data can be collected over short time periods. In many applications the generation of maps, or of predictions at specific locations, from the data in (near) real time is crucial. Geostatistical operations such as interpolation are vital in this map-generation process, and in emergency situations the resulting predictions need to be available almost instantly so that decision makers can make informed decisions and define risk and evacuation zones. In less time-critical applications, for example when interacting directly with the data for exploratory analysis, it is also helpful for the algorithms to respond within a reasonable time frame. Performing geostatistical analysis on such large spatial datasets can present a number of problems, particularly where maximum likelihood methods are used. Although the storage requirements of the data scale only linearly with the number of observations, the memory and time costs of the computation scale quadratically and cubically, respectively. Most modern commodity hardware has at least two processor cores, and other mechanisms for parallel computation, such as Grid-based systems, are also becoming increasingly available. However, there currently seems to be little interest in exploiting this extra processing power within geostatistics. In this paper we review the existing parallel approaches for geostatistics.
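To make the scaling claim concrete, the following is a minimal pure-Python sketch of the exact Gaussian log-likelihood that maximum likelihood variogram estimation has to evaluate repeatedly. It assumes a zero-mean field, an exponential covariance model with a sill, a range and a small nugget, and 2-D coordinates; the function names (`exp_cov`, `cholesky`, `exact_loglik`) are hypothetical and not taken from the paper. The dense n × n covariance matrix is the quadratic memory cost, and the Cholesky factorisation is the cubic time cost.

```python
import math


def exp_cov(a, b, sill=1.0, rng=1.0):
    """Exponential covariance between two 2-D points (assumed model)."""
    return sill * math.exp(-math.dist(a, b) / rng)


def cholesky(K):
    """Return lower-triangular L with K = L L^T; O(n^3) operations."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def exact_loglik(x, y, sill=1.0, rng=1.0, nugget=1e-6):
    """Exact zero-mean Gaussian log-likelihood of observations y at sites x."""
    n = len(y)
    # Dense n x n covariance matrix: memory grows quadratically with n.
    K = [[exp_cov(x[i], x[j], sill, rng) + (nugget if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)  # time grows cubically with n
    # Forward substitution solves L z = y, so z.z = y^T K^{-1} y.
    z = []
    for i in range(n):
        z.append((y[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    quad = sum(v * v for v in z)
    return -0.5 * (quad + log_det + n * math.log(2.0 * math.pi))
```

Doubling n quadruples the number of entries in `K` and multiplies the factorisation cost by roughly eight, which is what motivates the likelihood approximations the paper builds on.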
By recognising that different natural parallelisms exist and can be exploited depending on whether the dataset is sparsely or densely sampled with respect to the range of variation, we introduce two contrasting novel implementations of parallel algorithms based on approximating the data likelihood, extending the methods of Vecchia [1988] and Tresp [2000]. Using parallel maximum likelihood variogram estimation and parallel prediction algorithms, we show that computational time can be significantly reduced. We demonstrate this with both sparsely and densely sampled data on a variety of architectures, ranging from the dual-core processors found in many modern desktop computers to large multi-node supercomputers. To highlight the strengths and weaknesses of the different methods we employ synthetic data sets, and go on to show how the methods allow maximum likelihood based inference on the exhaustive Walker Lake data set.
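The Vecchia [1988] idea referred to above can be sketched as follows: the joint density is approximated by a product of conditional densities, each observation conditioned only on at most m of its nearest neighbours earlier in a fixed ordering, so each factor needs only a small m × m solve and the factors can be evaluated independently, and hence in parallel. This is a minimal pure-Python sketch under stated assumptions, not the authors' implementation: the exponential covariance, using the input order as the ordering, Euclidean nearest-neighbour selection, the parameter m and all function names are hypothetical choices. `ThreadPoolExecutor` only illustrates how the independent factors are distributed; under CPython's global interpreter lock pure-Python arithmetic does not actually run concurrently, so a real speed-up would need processes, MPI or a compiled linear algebra backend. Tresp's [2000] Bayesian committee machine, the other method the paper extends, is not shown.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def exp_cov(a, b, sill=1.0, rng=1.0):
    """Exponential covariance between two 2-D points (assumed model)."""
    return sill * math.exp(-math.dist(a, b) / rng)


def solve_spd(K, b):
    """Solve K v = b for a small symmetric positive-definite K via Cholesky."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    z = [0.0] * n
    for i in range(n):  # forward substitution: L z = b
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    v = [0.0] * n
    for i in reversed(range(n)):  # back substitution: L^T v = z
        v[i] = (z[i] - sum(L[k][i] * v[k] for k in range(i + 1, n))) / L[i][i]
    return v


def conditional_term(i, x, y, m, sill, rng, nugget):
    """log p(y_i | y at up to m nearest sites earlier in the ordering)."""
    prev = sorted(range(i), key=lambda j: math.dist(x[i], x[j]))[:m]
    var = sill + nugget
    mean = 0.0
    if prev:
        KN = [[exp_cov(x[a], x[b], sill, rng) + (nugget if a == b else 0.0)
               for b in prev] for a in prev]
        k = [exp_cov(x[i], x[j], sill, rng) for j in prev]
        w = solve_spd(KN, k)  # simple-kriging weights K_N^{-1} k
        mean = sum(wj * y[j] for wj, j in zip(w, prev))
        var -= sum(wj * kj for wj, kj in zip(w, k))
    r = y[i] - mean
    return -0.5 * (math.log(2.0 * math.pi * var) + r * r / var)


def vecchia_loglik(x, y, m=5, sill=1.0, rng=1.0, nugget=1e-6, workers=4):
    """Vecchia-style approximate log-likelihood; each factor is independent."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        terms = pool.map(
            lambda i: conditional_term(i, x, y, m, sill, rng, nugget),
            range(len(y)))
        return sum(terms)
```

With m = 0 every factor is a marginal density, so the observations are treated as independent; when m is at least n - 1 each observation conditions on all earlier ones and, by the chain rule, the product recovers the exact likelihood.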

Original language	English
Title of host publication	geoENV VII – Geostatistics for Environmental Applications
Publisher	Springer
Pages	371-381
Number of pages	11
Volume	16
ISBN (Print)	9789048123216
DOIs	https://doi.org/10.1007/978-90-481-2322-3_32
Publication status	Published - 2008

Bibliographical note

geoENV 2008, 8-10 September 2008, Southampton (UK). The original publication is available at www.springerlink.com

Keywords

spatially-referenced datasets
satellite-based sensors
monitoring networks
individual sensors
environmental decision making
generation of maps
specific locations
real-time data
geostatistical operations
interpolation
map-generation
emergency
risk
evacuation
exploratory analysis
grid based systems
data likelihood
parallel maximum likelihood variogram estimation
parallel prediction algorithms
Walker Lake data set

Cite this

@inbook{25bd69ed814b4c8c9f2705eda4f07d35,

title = "Parallel geostatistics for sparse and dense datasets",

keywords = "spatially-referenced datasets, satellite-based sensors, monitoring networks, individual sensors, environmental decision making, generation of maps, specific locations, real-time data, geostatistical operations, interpolation, map-generation, emergency, risk, evacuation, exploratory analysis, grid based systems, data likelihood, parallel maximum likelihood variogram estimation, parallel prediction algorithms, Walker Lake data set",

author = "Ingram, {Benjamin R.} and Dan Cornford",

note = "geoENV 2008, 8-10 September 2008, Southampton (UK). The original publication is available at www.springerlink.com",

year = "2008",

doi = "10.1007/978-90-481-2322-3_32",

language = "English",

isbn = "9789048123216",

volume = "16",

pages = "371--381",

booktitle = "geoENV VII – Geostatistics for Environmental Applications",

publisher = "Springer",

address = "Germany",

}

TY - CHAP

T1 - Parallel geostatistics for sparse and dense datasets

AU - Ingram, Benjamin R.

AU - Cornford, Dan

N1 - geoENV 2008, 8-10 September 2008, Southampton (UK). The original publication is available at www.springerlink.com

PY - 2008

Y1 - 2008

KW - spatially-referenced datasets

KW - satellite-based sensors

KW - monitoring networks

KW - individual sensors

KW - environmental decision making

KW - generation of maps

KW - specific locations

KW - real-time data

KW - geostatistical operations

KW - interpolation

KW - map-generation

KW - emergency

KW - risk

KW - evacuation

KW - exploratory analysis

KW - grid based systems

KW - data likelihood

KW - parallel maximum likelihood variogram estimation

KW - parallel prediction algorithms

KW - Walker Lake data set

UR - http://www.springerlink.com/content/t567xr3558222517/

U2 - 10.1007/978-90-481-2322-3_32

DO - 10.1007/978-90-481-2322-3_32

M3 - Chapter

SN - 9789048123216

VL - 16

SP - 371

EP - 381

BT - geoENV VII – Geostatistics for Environmental Applications

PB - Springer

ER -