scikit-gstat User Guide: Preface

Published on 2022-07-28



Introduction

General

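The user guide part of the scikit-gstat documentation is meant to guide users through the functionality offered by this Python module, along with a basic introduction to geostatistical concepts. The main use case is to hand this description to students who are learning geostatistics and need to use scikit-gstat. But before introducing variograms, a more general question has to be answered: what geostatistics actually is.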

Note

This user guide is an introduction to geostatistics. If you are already familiar with the topic, you can skip this section.

What is geostatistics?

The basic idea of geostatistics is to describe and estimate spatial covariance, or correlation, in a set of point data. While the main tool, the semi-variogram, is quite easy to implement and use, it rests on a number of important assumptions.

The typical application of geostatistics is interpolation. Therefore, although it works with point data, a basic concept is to understand these points as a sample of a (spatially) continuous variable that can be described as a random field $rf$ or, to be more precise, in many cases a Gaussian random field.

The most fundamental assumption in geostatistics is that any two values $x_{i}$ and $x_{i+h}$ are more similar the smaller $h$ is, where $h$ is the separating distance on the random field. In other words, close observation points will show higher covariances than distant points. If this most fundamental conceptual assumption does not hold for a specific variable, geostatistics is not the correct tool to analyse and interpolate it.

One of the easiest approaches to interpolating point data is IDW (inverse distance weighting). This technique is implemented in almost every GIS (geographic information system) software package. The fundamental conceptual model can be described as:

$$Z_{u} = \frac{\sum\nolimits_{i}^{N} w_{i} \cdot Z_{i}}{N}$$

where $Z_{u}$ is the value of $rf$ at a non-observed location with $N$ observations around it. These observations are weighted by the weights $w_{i}$, which can be calculated as:

$$w_{i} = \frac{1}{\Vert \overrightarrow{ux_{i}} \Vert}$$

where $u$ is the unobserved point and $x_{i}$ is one of the sample points. Thus, $\Vert \overrightarrow{ux_{i}} \Vert$ is the 2-norm of the vector between the two points: the Euclidean distance in the coordinate space (which is by no means limited to the $\mathbb{R}^2$ case).

This describes a concept in which a value of the random field is estimated by a distance-weighted mean of the surrounding points. Because close points should have a higher impact, the inverse of the distance is used as the weight, hence the name inverse distance weighting.
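The IDW idea can be sketched in a few lines of plain Python. This is only an illustration, not code from scikit-gstat: the function name `idw_estimate` and the `power` parameter are assumptions made for the example. The sketch also divides by the sum of the weights rather than by the number of observations, which is the common practical form and keeps the estimate within the range of the observed values.

```python
import math


def idw_estimate(u, points, values, power=1.0):
    """Estimate the value at location u by inverse distance weighting.

    `power` is an illustrative extra (power=1.0 gives the plain 1/distance weight).
    """
    weights = []
    for p, z in zip(points, values):
        d = math.dist(u, p)  # Euclidean distance (2-norm), works in any dimension
        if d == 0.0:
            return z  # u coincides with an observation: return it directly
        weights.append(1.0 / d ** power)
    # Normalise by the sum of weights so the result is a true weighted mean
    return sum(w * z for w, z in zip(weights, values)) / sum(weights)


# Toy data (made up for illustration)
points = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
values = [1.0, 3.0, 5.0]

# All three points are equidistant from (1, 1), so the estimate is the
# plain mean of the values (3.0 up to rounding).
estimate = idw_estimate((1.0, 1.0), points, values)
print(estimate)
```

Moving the target location closer to one observation pulls the estimate towards that observation's value, which is exactly the behaviour the weighting is meant to produce.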

In geostatistics this basic model still holds, but it is extended. Instead of making the weights depend exclusively on the separating distance, each weight is derived from a variance over all values that are separated by a similar distance.

This has the main advantage of incorporating the actual (co)variance found in the observations and basing the interpolation on it, but it comes at the cost of some strict assumptions about the statistical properties of the sample. Elaborating and assessing these assumptions is one of the main challenges of geostatistics.
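To make "a variance over all values that are separated by a similar distance" more concrete, here is a minimal sketch that groups point pairs into distance bins and computes one semi-variance per bin. It assumes the classical (Matheron) estimator, half the mean squared difference of the paired values. The function name `empirical_semivariogram` and the `bin_edges` parameter are illustrative and not part of the scikit-gstat API.

```python
import math
from itertools import combinations


def empirical_semivariogram(points, values, bin_edges):
    """Return one semi-variance per distance bin (None for empty bins)."""
    n_bins = len(bin_edges) - 1
    sq_diff_sums = [0.0] * n_bins
    pair_counts = [0] * n_bins
    for i, j in combinations(range(len(points)), 2):
        h = math.dist(points[i], points[j])  # separating distance of the pair
        for k in range(n_bins):
            # A pair belongs to bin k if its distance lies in (lower, upper]
            if bin_edges[k] < h <= bin_edges[k + 1]:
                sq_diff_sums[k] += (values[i] - values[j]) ** 2
                pair_counts[k] += 1
                break
    # Semi-variance: half the mean squared difference within each bin
    return [s / (2 * n) if n else None for s, n in zip(sq_diff_sums, pair_counts)]


# Toy transect (made up): values drift apart as the distance grows
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
values = [1.0, 2.0, 4.0, 7.0]
gamma = empirical_semivariogram(points, values, [0.0, 1.0, 2.0, 3.0])
print(gamma)
```

In this toy data the semi-variance increases with distance, which is the pattern expected when close observations are more similar than distant ones.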

Geostatistical Tools

Geostatistics is a wide field spanning a variety of disciplines, such as geology, biology, hydrology and geomorphology. Each discipline defines its own set of tools, and apparently its own definitions, and progress is still being made today.

It is not the objective of scikit-gstat to be a comprehensive collection of all available tools. The objective is rather to offer some common, and also some more sophisticated, tools for variogram analysis. Thus, when using scikit-gstat, you typically need another library for the actual application, such as interpolation. In most cases that will be gstools. Nevertheless, geostatistics can be split into three main fields, each with its own tools:

  • variography: with the variogram as its main tool, variography focuses on describing, visualizing and modelling covariance structures in space and time.

  • kriging: a family of interpolation methods that use a variogram to estimate the kriging weights, as sketched above.

  • geostatistical simulation: aims at generating random fields that fit a given set of observations or a pre-defined variogram or covariance function.
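As a rough illustration of the kriging entry in the list above, the following sketch derives ordinary kriging weights from a variogram model and uses them for a weighted estimate. Everything specific here is an assumption made for the example: the spherical model with its `sill` and `effective_range` values, the helper names `spherical`, `solve` and `ordinary_kriging`, and the toy data. It is not how scikit-gstat or gstools implement kriging; it only shows that the weights come from a variogram instead of from the raw inverse distance.

```python
import math


def spherical(h, sill=1.0, effective_range=10.0):
    """Spherical variogram model without nugget (parameters are assumed)."""
    if h >= effective_range:
        return sill
    r = h / effective_range
    return sill * (1.5 * r - 0.5 * r ** 3)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ordinary_kriging(u, points, values, variogram):
    """Return (estimate, weights) at location u for the given variogram."""
    n = len(points)
    # Semi-variances between all observations, plus the unbiasedness constraint
    a = [[variogram(math.dist(p, q)) for q in points] + [1.0] for p in points]
    a.append([1.0] * n + [0.0])
    # Semi-variances between the target location and each observation
    b = [variogram(math.dist(u, p)) for p in points] + [1.0]
    weights = solve(a, b)[:n]  # the last entry is the Lagrange multiplier
    return sum(w * z for w, z in zip(weights, values)), weights


# Toy data (made up for illustration)
points = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
values = [1.0, 3.0, 5.0]
estimate, weights = ordinary_kriging((1.0, 1.0), points, values, spherical)
print(estimate, weights)
```

The constraint row forces the weights to sum to one, and without a nugget the estimate at an observed location reproduces the observed value.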

Last updated on 2022-08-10