R analysis in Ecology (Keep updating)

43 minute read

Published: March 01, 2020

I will record everything I have learned about R based Ecology analysis here.

Introduction to R

Here will be recorded some methods that may only be used in ecological research. I will collect methods from the paper I read. These methods are mainly related to biogeography and community ecology, and they may not be used by me now, but they may be used by you in the future.

Introduction to R
Biodiversity
Connect plot, traits and environment
- RQL analysis
Biogeography
Taxonomic data processing based on R

Biodiversity

生物多样性是生态学研究的基础，我们将在系统发育、功能性状、地理分布等方面详细探讨如何研究生物多样性。同时也会涉及许多生物信息学的内容，那么生物信息学非常有用，据说当前生物信息学的平均年薪已经接近10w刀，所以学习相关知识对于我们未来的发展非常重要。 Jesús N. Pinto-Ledezma and Jeannine Cavender-Bares

Phylogenetic Diversity

有关该章节的相关课程：“Modern Phylogenetic Comparative Methods and Their Application in Evolutionary Biology”（Garamszegi，2014）（http://www.mpcm-evolution.org/）和 “Phylogenies in Ecology”。（Cadotte and Davies，2016）（https://www.utsc.utoronto.ca/~mcadotte/page-3/）。需要安装许多包加载许多数据：

package.names <- c('permute', 'picante', 'pez', 'car', 'vegan', 'MASS', 'ecodist', 'FD', 'adephylo', 'phytools') 
if ( ! (package.names[1] %in% installed.packages())) {install.packages(package.names[1], dependencies = T)}
#也可以这样
missing_pkgs <- package.names[which(!package.names %in% installed.packages())]
install.packages(missing_pkgs)
#设置可以搞一个循环
package.names <- c('ape', 'picante', 'pez', 'car', 'vegan', 'MASS', 'ecodist', 'FD', 'adephylo')
for (pkg in package.names) {
  if (!require(pkg, character.only = TRUE, quietly = TRUE)) {
    install.packages(pkg)
    library(pkg, character.only = TRUE)
  } 
}
#加载系统发育数据
trMB <- ape::read.tree("Rstats/ALLMB.CCESR.tre")
#查看数据的一些特征
length(trMB$tip.label)
head(trMB$tip.label)
#加载群落数据，2-5列
BBSraw <- read.csv("Data/BBSpecies.biomass.2014.csv")[2:5]
bio.dat <- BBSraw
head(bio.dat)
#Year Plot Biomass.g.m2                Species
#1 2014    2         1.42     Aristida basiramea
#2 2014    2         0.92            Cyperus sp.
#3 2014    2         0.02          Digitaria sp.
#4 2014    2         0.02 Euphorbia glyptosperma
#5 2014    2       182.12     Lespedeza capitata
#6 2014    2        22.03   Miscellaneous litter
#将空格替换为_
bio.dat$Species <- gsub(" ", "_", bio.dat$Species)
#将Plot和year用；连接起来
bio.dat$Plot.Year <- paste(bio.dat$Plot, bio.dat$Year, sep = ";", collapse = NULL)
#调整列的顺序
bio.dat <- bio.dat[, -c(1, 2)]
bio.dat <- bio.dat[, c(3, 1, 2)]
#修改错误的种名
bio.dat$Species <- gsub("Petalostemum_purpureum", "Dalea_purpurea", bio.dat$Species)
bio.dat$Species <- gsub("Petalostemum_candidum", "Dalea_candida", bio.dat$Species)
bio.dat$Species <- gsub("Petalostemum_villosum", "Dalea_pulchra", bio.dat$Species)
bio.dat$Species <- gsub("Taraxicum_officinalis", "Taraxacum_croceum", bio.dat$Species)
bio.dat$Species <- gsub("Leptoloma_cognatum", "Digitaria_ciliaris", bio.dat$Species)
bio.dat$Species <- gsub("Artemisia_.caudata._campestris", "Artemisia_caudata", bio.dat$Species)
bio.dat$Species <- gsub("Achillea_millefolium.lanulosa.", "Achillea_millefolium", bio.dat$Species)
bio.dat$Species <- gsub("Euphorbia_.supina._maculata", "Euphorbia_supina", bio.dat$Species)
bio.dat$Species <- gsub("Tragopogon_dubius_.major.", "Tragopogon_dubius", bio.dat$Species)
bio.dat$Species <- gsub("Ambrosia_artemisiifolia_elatior", "Ambrosia_artemisiifolia", bio.dat$Species)
bio.dat$Species <- gsub("Andropogon_gerardi", "Andropogon_gerardii", bio.dat$Species)
bio.dat$Species <- gsub("Erigeron_canadensis", "Erigeron_canadense", bio.dat$Species)
#使用循环语句修改错误的种名
oldsp <- c("Petalostemum_purpureum", "Petalostemum_candidum", "Petalostemum_villosum", 
           "Taraxicum_officinalis", "Leptoloma_cognatum", "Artemisia_.caudata._campestris",
           "Achillea_millefolium.lanulosa.", "Euphorbia_.supina._maculata",
           "Tragopogon_dubius_.major.", 
           "Ambrosia_artemisiifolia_elatior","Andropogon_gerardii", "Erigeron_canadensis")

newsp <- c("Dalea_purpurea", "Dalea_candida", "Dalea_pulchra", 
           "Taraxacum_croceum", "Digitaria_ciliaris", "Artemisia_caudata", 
           "Achillea_millefolium", "Euphorbia_supina", "Tragopogon_dubius", 
           "Ambrosia_artemisiifolia", "Andropogon_gerardii", "Erigeron_canadense")

for(i in 1:length(oldsp)){
  cat("FROM", oldsp[i], "TO", newsp[i], "\n")
  bio.dat$Species <- gsub(oldsp[i], newsp[i], bio.dat$Species)
}

好，到目前为止，我们已经加载了系统发育并更新了群落数据中的物种名称，但是，系统发育是Smith和Brown（2018）的完整系统发育，总共包括356305种（length(trMB $ tip.label)）。现在，我们将准备一个系统发育系统，以仅包括我们群落中存在的物种。

library("ape")
#获得bigbio的物种名录
#将bio.dat中的Species导为spnames
spnames <- unique(bio.dat$Species)
#用setdiff求向量trMB$tip.label与spnames中不同的元素
#用drop.tip去掉发育树中的特定类群，就实现了树和群落数据中物种的对齐
trMBcom <- drop.tip(trMB, setdiff(trMB$tip.label, spnames))
#看看哪些是Smith树里没有的
setdiff(spnames, trMBcom$tip.label)
#把bio.bat里的NA缺失数据通过na.omit()函数全删了并转为数据框
bio.dat <- data.frame(na.omit(bio.dat))
head(bio.dat)
#安装相关的包
library(nlme)
library(picante)
#将sample文件转变为matrix文件
BBScom <- data.frame(sample2matrix(bio.dat))
BBScom[1:10, 1:10]
#最后，我们拥有计算不同系统发育多样性指标所需的所有数据

Indicator about phylogenetic diversity

在继续之前，请再次检查我们的数据（系统发育和群落数据）是否匹配！为此，我们将使用来自picante软件包的强大功能match.phylo.com()。首先需要清理我们乱糟糟的R环境然后就可以计算各种指标

#获取存储在环境中的对象的名称
is()
rem <- is()
rem
#查看i中的缺失值
is.na(i)

#把群落有的但Smith树没有的数据给drop了
matched <- picante::match.phylo.comm(phy = trMBcom, comm = BBScom)
#现在让我们检查存储在匹配对象中的数据。
matched$comm[1:10, 1:10]
#画出整理对齐后的树
plot(matched$phy, show.tip.label = FALSE)

复杂树

PD PD即为系统发育多样性。现在准备探索一些度量指标以评估群落的表型和系统发育结构。

#群落中总分支长度之和
sum(matched$phy$edge.length)
#计算每个样地的PD
BBSpd <- pd(matched$comm, matched$phy, include.root = FALSE)
head(BBSpd)
#计算相关性，SR为每个样地的物种数量
cor.test(BBSpd$SR, BBSpd$PD)
#物种多样性与系统发育多样性一起作图
plot(BBSpd$SR, BBSpd$PD, xlab = "Species richness", ylab = "PD (millions of years)", pch = 16)

PD与SR

MPD and MNTD 两两平均距离(MPD)和两两最近平均距离(MNTD)也是衡量系统发育的重要指标之一。

#计算MPD
dist.trMB <- cophenetic(matched$phy)
#lower.tri()函数可以提取矩阵的下三角部分
dist.trMB <- dist.trMB[lower.tri(dist.trMB, diag = FALSE)]
mean(dist.trMB)
#[1] 224.8141
#计算MNTD
#cophenetic是距离矩阵？
dist.trMB2 <- cophenetic(matched$phy)
#把矩阵的对角线弄成NA
diag(dist.trMB2) <- NA
#通过apply函数，计算dist.trMB2的第2行的最小值
apply(dist.trMB2, 2, min, na.rm = TRUE)
mean(apply(dist.trMB2, 2, min, na.rm = TRUE))

可以用picante包更方便地计算MPD和NMTD

#mpd
BBSmpd <- mpd(matched$comm, cophenetic(matched$phy))
head(BBSmpd)
#mntd
BBSmntd <- mntd(matched$comm, cophenetic(matched$phy))
head(BBSmntd)

null model of phylogenetic diversity

对群落系统发育的分析可以通过评估本地群落的系统发育来推断构建本地群落的机制。但是，现在可以使用新方法，从而可以将本地和区域范围内的生态过程和历史过程之间更复杂的平衡纳入分析中。现在，让我们计算一些最常见的指标。 PD-系统发育多样性是一个或多个样品的总系统发育分支长度的总和。它有许多衍生。

SES.pd, SES.mpd and SES.mntd 为了消除群落中物种丰富度对系统发育多样性的影响，对系统发育多样性进行标准化即可得到SES.pd指数(standardized effect size of PD)。而对mpd和mntd进行标准化也可以得到相应的指标：SESmpd和SESmntd

BBScdm <- ses.pd(matched$comm, matched$phy, runs = 99)
BBScdm <- BBScdm[, c(1, 2, 6, 7)]
head(BBScdm)
#Rao的二次熵是对生态群落多样性的一种度量，可以选择考虑物种差异（例如系统发生差异）。
#Simpsons，99次零模型得到的结果
BBScdm <- ses.pd(matched$comm, matched$phy, runs = 99)
BBScdm <- BBScdm[, c(1, 2, 6, 7)]
#结果中PDobs表示系统发育多样性的观测值,SES.pd大于0表示群落中较老的物种比较多
head(BBScdm)
#MPD是群落中分类单元的两两平均距离
#SESmpd
BBSsesmpd <- ses.mpd(matched$comm, cophenetic(matched$phy), runs = 99)
BBScdm$MPD <- BBSsesmpd[, c(2)]
BBScdm$sesMPD <- BBSsesmpd[, c(6)]
BBScdm$MPDpval <- BBSsesmpd[, c(7)]
#MNTD是群落中分类单元的两两最近平均距离
#SESmntd
BBSsesmntd <- ses.mntd(matched$comm, cophenetic(matched$phy), runs = 99)
BBScdm$MNTD <- BBSsesmntd[, c(2)]
BBScdm$sesMNTD <- BBSsesmntd[, c(6)]
BBScdm$MNTDpval <- BBSsesmntd[, c(7)]

PSV, PSR and PSE 谱系变异性指数(PSV)指数能够把群落中全部物种都共同拥有的一个随机性状产生的变异水平进行量化，并反映出是如何通过系统发育亲缘关系进行变化的。系统发生物种丰富度（PSR）是样本中物种的数量乘以PSV。而PSE指数是对PSV指数进行改进的指标，结合了物种丰富度，通过改进能够融合更多的物种信息。

#PSV
BBSpsv <- psv(matched$comm, matched$phy, compute.var = TRUE)
BBScdm$PSV <- BBSpsv[, 1]
#PSR
BBSpsr <- psr(matched$comm, matched$phy, compute.var = TRUE)
BBScdm$PSR <- BBSpsr[, 1]
#PSE
BBSpse <- pse(matched$comm, matched$phy)
BBScdm$PSE <- BBSpse[, 1]

**${q}\textrm{D}(p)$** ${q}\textrm{D}(p)$是一种度量标准，用于衡量社区内物种差异的变化。该指标是对Hill指数的修改，对物种的比例按其系统发育信息进行了加权。 https://onlinelibrary.wiley.com/doi/full/10.1111/j.1600-0706.2012.20607.x

source("RFunctions/qDp.R")
BBSqDp <- qDp(matched$phy, matched$comm, q = 2)
BBScdm$qDP <-	BBSqDp
#最后我们总结上述数据
head(BBScdm, 10)
BBScdm2 <- BBScdm[, c(1, 2, 5, 6, 9, 12, 13, 14, 15)]
names(BBScdm2) <- c("SR", "PD", "RaoD", "MPD", "MNTD", "PSV", "PSR", "PSE", "qDP")
head(BBScdm2)

Biodiversity and ecosystem functions

在本节中，我们将探讨有关生物多样性及其对生态系统功能的影响的问题。为此，我们将使用Jake Gorssman提供的数据和脚本（进行一些修改）。从大佬(Grossman et al. 2017; Ecology 98:2601-14)的”Species richness and traits predict overyielding in stem growth in an early‐successional tree diversity experiment“中获得数据并整理，它们在我的github中也可以获得：

library(agricolae)
df <- read.csv("Data/BEF_Lesson_Data.csv", header = T)
#查看第一行
names(df)
#看看数据结构，df是一个具有9列140行的数据帧。这些行是FAB实验中的140个实验图。
dim(df)

我们需要知道：

plot = 每个地块的任意索引
SR = 样地的物种丰富度（1、2、5或12种）
Comp = 分类代码，对于具有相同成分的图，该代码相同。 M = 单种养殖，B = 双种养殖，F = 5种，T = 12种
PSV = 系统发生种的变异性（Helmus et al。2007）: 一种独立于物种丰富度的系统发生多样性的度量
FDis = 功能分散度（Laliberte和Legendre，2010年）: 一种独立于物种丰富度的功能多样性指标
NBE = 生物多样性净效应: 观察到的生物多样性（d_Y）减去预期生物多样性（基于单一文化（在此示例中未给出））
CE = 互补效应（Loreau and Hector 2001）: CE + SE = NBE；使用Forest Isbell中的脚本进行计算-UMN–Twin Cities
SE = 选择效应（Loreau and Hector 2001），见上文
d_Y = 地块生物量: 给定地块中树木茎生物量的平均变化量（kg）重要提示-请注意，对于纯林(?)而言，NBE，CE和SE没有价值，因为它们只能计算得出混交林(?)的数值。
The impact of land composition and species richness on biomass
Q1：茎生物量产量是否取决于地块组成和丰富度？ 首先，设置颜色方案以按颜色区分构图，然后绘制数据以查看是否存在可见趋势：
```
#设置颜色
comp.cols <- c(rep("red", 12), rep("orange", 28), rep("yellow", 10), rep("green", 1))
with(df, plot(Comp, d_Y, col = comp.cols))
```
接下来可以进行一些数学分析
```
#线性回归
m1 <- lm(d_Y ~ Comp, data = df)
summary(m1)
#anova
anova(m1)
#HSD
m1.df <- HSD.test(m1, "Comp", group = TRUE, console = TRUE)
```
Q2:那么这些超额收益（overyield，NBE）是否取决于地块的组成和丰富程度？ 请记住，现在的响应变量不是该地块产生多少生物量，而是该数字减去如果该地块中的某树的纯林的情况下的预期值。因此，这需要调整每种物种的“先天”生产力。这时不同地块的情况如何呢？
```
#绘图
with(df, plot(Comp, NBE, col = comp.cols))
#回归
m3 <- lm(NBE ~ Comp, data = df) 
summary(m3)
#anova
anova(m3)
#HSD
m3.df <- HSD.test(m3, "Comp", group = TRUE, console = TRUE)
```
所以，根据事后检验，那一块地的产量最高呢？在评估超额收益与物种丰富度的关系时，我们又会发现什么？可以比较m4与上面m2的结果：
```
with(df, plot(SR, NBE))
m4 <- lm(NBE ~ SR, data = df)
summary(m4)
anova(m4)
m4.df <- HSD.test(m4, "SR", group = TRUE, console = TRUE)
```
这有助于解释单变量控制试验在生物多样性研究中的重要性。
The complementarity and selection of biodiversity
Q3:如何在互补性和选择性方面比较不同级别的物种丰富度？ 我们现在很熟悉这一套了：首先，以图形方式进行分析；然后建立线性模型，并使用ANOVA进行评估（如果ANOVA显着），最后使用事后测试HSD：
```
#首先查看互补效应
with(df, plot(SR, CE))
m5 <- lm(CE ~ SR, data = df)
summary(m5)
anova(m5)
m5.df <- HSD.test(m5, "SR", group = TRUE, console = TRUE)
#接着查看选择效应
with(df, plot(SR, SE))
m6 <- lm(SE ~ SR, data = df)
summary(m6)
anova(m6)
m6.df <- HSD.test(m6, "SR", group = TRUE, console = TRUE)
#还可以把CE和SE的图放在一块比较，使之更易看出正负相关
with(df, plot(SR, CE, col = "blue"))
with(df, points(SR, SE, col = "red"))
abline(h = 0)
```
那么。
CE和SE如何与NBE（超额收益overyield）进行比较？
CE阳性是什么意思？CE阴性意味着什么？

SE的正值和负值如何？（这可能会造成混淆。）你是否发现超乎寻常的超额收益的证据？

#回到问题1的图
with(df, plot(SR, d_Y, ylim = c(-0.05, 0.25)))
#加条分界线
abline(h = 0)

就很明显： NBE2 Q4:那么生产力最低（或平均）的混交林是否比生产力最高的单一纯林更高？ 可以将所有纯林编码为“0”，并将所有混交林编码为“1”，然后进行t检验…

Dimensions of Biodiversity and Its Productivity

Q5:哪个方面的生物多样性维度？物种、功能性状还是系统发育可以确定最高的生产力呢？ 为了解决这个问题，我们将看到每个维度解释了多少超额收益（NBE）。对于单变量回归模型，我们可以仅使用线性模型输出中的R^2进行模型比较。结果如下：

#这是物种多样性
m4 <- lm(NBE ~ SR, data = df)
summary(m4)
#Adjusted R-squared:  0.643
#这是系统发育多样性
m7 <- lm(NBE ~ PSV, data = df) 
summary(m7)
#Adjusted R-squared:  0.02538
#这是功能多样性
m8 <- lm(NBE ~ FDis, data = df) 
summary(m8)
#Adjusted R-squared:  0.1768

Using Spectral Data to Investigate Biodiversity

在本实验中，我们随便研究一下“光谱数据”。为此，我们将使用R包spectrolab和其中存储的数据。spectrolab是一个R包，用于处理和可视化来自便携式光谱仪的数据，并建立通用的光谱信息接口。处理光谱信息/数据与处理任何其他信息（例如物种，社区）没什么不同似，因此可以用于计算任何多样性指标，即称为”光谱多样性”的生物多样性维度。 Jesús对这个算法进行了一些修改和补充，但基本上是基于spectrolab软件包的说明（https://cran.r-project.org/web/packages/spectrolab/vignettes/introduction_to_spectrolab.pdf）。也感谢感谢Dudu（JoséEduardo Meireles）使光谱数据易于处理。首先准备各种数据和软件：

#安装包
install.packages("spectrolab", dependencies = TRUE)
library(spectrolab)
#有两种方法可以使光谱进入R：
#1）将矩阵或数据框转换为光谱；
#2）从原始数据文件（格式：SVC的sig，Spectral Evolution的sed和ASD的asd）中读取光谱。 这是几个例子：
#这是一个使用spectrolab软件包提供的名为spec_matrix_meta.csv的数据集矩阵的示例。
#从csv中读取数据，但不用check.names = FALSE
dir_path <- system.file("extdata/spec_matrix_meta.csv", package = "spectrolab")
spec_csv <- read.csv(dir_path, check.names = FALSE)
head(spec_csv)
#将光谱转为R的矩阵或数据框后，就可以使用函数as.spectra()将其转换为光谱对象。
#矩阵必须在行中具有样本，在列中必须具有波长。波长列的标题必须是（数字）波长标签。
#您还应该使用name_idx参数声明哪一列具有样品名称（必填）。 如果存在其他列（样品名称和反射率除外），则必须将其索引作为meta_idxs参数传递给as.spectra。
#1,2列为元数据，3为样本列
achillea_spec = as.spectra(spec_csv, name_idx = 3, meta_idxs = c(1, 2))
#这就是光谱数据
achillea_spec
#找到自带的样本数据
dir_path <- system.file("extdata", "Acer_example", package = "spectrolab")
#读取波段数据（.sig格式）
acer_spectra <- read_spectra(path = dir_path, format = "sig")
#查看数据
acer_spectra
#查看维度？波段？
dim(acer_spectra)
#你可以访问光谱的各个组成部分。
#查看样本名称，允许重复
n <- names(achillea_spec)
#查看波长数据
w <- wavelengths(achillea_spec)
#查看反射率矩阵
r <- reflectance(achillea_spec)
#meta()代表相关的元数据（如果你有的话）
m <- meta(achillea_spec, "ssp", simplify = TRUE)

您可以使用类似于矩阵和data.frames中使用的[i，j]函数的符号对光谱信息进行子集（分波段）化。[i，]中的第一个参数匹配样本名称（行），而第二个参数[，j]匹配波长名称（列）。以下是”[“在频谱中的工作方式的一些示例： •x [1：3，]将保留x的前三个样本，即1：3为索引。 •x [“ sp_1”，]将样本名称与“ sp_1”匹配的所有条目保留在x中。 •x [，800：900]将保持800至900之间的波长。 •x [，1：5]将报错，因为波长不能通过索引子集。通过子集，你可以排除频谱开始和结尾的嘈杂区域，或者将数据限制为特定的波段。但某些光谱的分辨率可能不同于1nm，就像SVC一样。在这些情况下，最好的方法是使用光谱的最小和最大参数来标准化光谱。同时通过索引对样本进行子集的划分是可行的，通过数字或字符对波长进行子集设置也可以实现

#按波长和样本进行匹配
spec_sub_vis <- achillea_spec[ , 400:700 ]
spec_sub_byname <- achillea_spec["ACHMI_7", ]
spec_sub_byidx <- achillea_spec[ 1:3, ]
#标准化光谱
acer_spectra_trim <-  acer_spectra[ , wavelengths(acer_spectra, 400, 2400) ]
#通过参数划设子集
spec_sub_byidx[1, "405"] == spec_sub_byidx[1, 405]

也可以将光谱图转为矩阵或data.frame。也可以使用as.matrix()或as.data.frame()函数将光谱对象转换为矩阵或data.frame。因为我们经常要以特定格式（例如csv）导出数据。

#从spectra生成一个矩阵
spec_as_mat = as.matrix(achillea_spec, fix_names = "none")
spec_as_mat[1:4, 1:3]
#生成一个矩阵
spec_as_df = as.data.frame(achillea_spec, fix_names = "none", metadata = TRUE)
spec_as_df[1:4, 1:5]

Get band spectrum

绘制光谱的主要功能是plot()。它将在光谱对象中共同绘制每个光谱。您应该能够将常用的绘图参数传递给它，例如col，ylab，lwd等:

#简单绘制
plot(achillea_spec, lwd = 0.75, lty = 1, col = "grey25", main = "All Spectra")

光谱1 您还可以使用plot_quantile()绘制光谱对象的分位数。第二个参数total_prob是分位数包含的总“质量”。例如，total_prob = 0.95涵盖了光谱对象中95％的变化，即0.025至0.975分位数。如果add = TRUE，则分位数图可以单独使用，也可以添加到当前图中。

plot_quantile(achillea_spec, total_prob = 0.8, col = rgb(1, 0, 0, 0.5), lwd = 0.5, border = TRUE)
title("80% spectral quantile")

光谱分位数函数plot_regions()有助于阴影不同的光谱区域。spectrolab提供了一个default_spec_regions()矩阵作为示例，但显然需要对其进行自定义（有关详细信息，请参见plot_regions的帮助页面）。

#结合了单个光谱，分位数和阴影光谱区域的图
plot(achillea_spec, lwd = 0.25, lty = 1, col = "grey50", main = "Spectra, quantile and regions")
plot_quantile(achillea_spec, total_prob = 0.8, col = rgb(1, 0, 0, 0.25), border = FALSE, add = TRUE)
plot_regions(achillea_spec, regions = default_spec_regions(), add = TRUE)

光谱全图

Ecological indicators based on remote sensing

请注意，光谱范围为400到2400，以下我们将光谱数据分为不同的光谱。这些是VIS（可见光）或，NIR（近红外）或，SWIR1和SWIR2为短波红外。

VIS <- c(400:700)
NIR <- c(800:1300)
SWIR1 <- c(1550:1800)
SWIR2 <- c(2000:2400)
#VIS
achillea_spec_VIS <- achillea_spec[, VIS]
achillea_spec_VIS
achillea_spec_VIS_df <- as.data.frame(achillea_spec_VIS, fix_names = "none", metadata = TRUE)
achillea_spec_VIS_df[1:10, 1:5]
#可以绘制出来，其他波段也可以如法炮制
plot(achillea_spec_VIS, lwd = 0.75, lty = 1, col = "grey25", main = "Visible")
#删除前3列以进一步分析
spec_as_df_clean <- spec_as_df[, 4:2004]
spec_as_df_clean[1:10, 1:5]
#选择光谱数据
achillea_spec_VIS_df_clean <- achillea_spec_VIS_df[, 4:304]
achillea_spec_VIS_df_clean[1:10, 1:5]
#例如选择ACHMI的第一行。结果是规格为1行和2001列的data.frame。
spec_as_df_clean_ind1 <- spec_as_df_clean[1, ]
dim(spec_as_df_clean_ind1)
spec_as_df_clean_ind1

现在让我们计算一些描述性统计信息以更好地理解数据。可以自己做，但是Jesús准备了一个函数来自动进行此计算。其称为Descriptives()，您只需要输入值即可。

source("mixR.R")
Descriptives(as.numeric(spec_as_df_clean_ind1))

现在可以为其他对象重复该过程，并查看与Achillea的区别。请注意，函数Descriptives()可以在for循环中使用，但是也可以使用apply函数。此函数将特定函数应用于矩阵或data.frame。

descript_all_individuals <- apply(spec_as_df_clean, MARGIN = 1, FUN = Descriptives)
descript_all_individuals[[10]]

最后，我们将计算一些植被指数： 1.简单植被指数：SR = Rnir / Rr 其中：R =反射率，nir =波段845，r =波段665。 2.归一化植被指数：NDVI =（Rnir-Rr）/（Rnir + Rr）其中：nir = 845和r = 665。 3.光化学反射指数：PRI ＝（R531-R570）/（R531 + R570）。这些数字分别对应于531和570波段，R是反射率。 4.归一化水文指数：NDWI =（R860-R1240）/（R860 + R1240） 5.水平衡指数：WBI = R900 / R970。

SR_achillea = spec_as_df[, "845"]/spec_as_df[, "665"]
SR_achillea

Connect plot, traits and environment

RQL analysis

来自：https://heather-grab.github.io/Entom-4940/rql.html 准备环境

setwd('/Users/calice/desktop/rql')
library(mvabund)
library(lattice)
# R矩阵，即为每个地点的情况
env <- read.csv("env.csv", row.names = 1)
env
#   bio1 bio2  alt soil
# a 34.9  6.0 17.1  9.0
# b 49.6 20.2 25.9  9.9
# c 37.3  2.9 13.3  9.4
# d 52.1 15.1 19.1 10.6
# e 32.2  0.0 22.9  2.6
# f 55.1  0.0 13.6  1.6
# Q矩阵，每个物种的情况
trait <- read.csv("trait.csv", row.names = 1)
trait
#     height leaflen climb growth
# sp1    861    1.90 climb   herb
# sp2    819    1.87 climb   herb
# sp3    904    2.16 climb   herb
# sp4   1663    1.69   non   wood
# sp5   1778    1.53   sns   herb
# sp6   1151    1.91   non   herb
# L矩阵，物种在地点上的情况
dis <- read.csv("dis.csv", row.names = 1)
dis
#   sp1 sp2 sp3 sp4 sp5 sp6
# a   0   0   1  24   2   2
# b   0   0   0  16   2   3
# c   0   2   0   0   1   7
# d   0   0   0   3   0   1
# e   0   0   0   8   2   0
# f   1   0   2   7   0   0

Data exploring

# 等号
dis_sp = mvabund(dis)
plot(dis_sp)

可以根据分类环境变量将物种区分开来

env2 = env
env2$habitat = "field"
env2$habitat[1:3] <- "edge"
env2$habitat = as.factor(env2$habitat)

plot(dis_sp ~ env2$habitat, tranformation = "no")

ab2 我们可以借此了解每个物种在其特定的栖息地类型中的丰富度情况。

Multiple Generalized Linear Regression (mGLM)

了解物种分布格局/多样性与环境的关系。 dis_sp与bio1的关系：

# 此时我们认为数据是泊松分布
mod1 <- manyglm(dis_sp ~ env$bio1, family = "poisson")
plot(mod1)

glm1 残差歪了，说明不是泊松分布，可能是负二项分布？

mod2 <- manyglm(dis_sp ~ env$bio1, family = "negative_binomial")
plot(mod2)

glm2 残差平衡了不少。可以进行anova分析来探索物种的多样性格局是否与bio1有显著关联：

#bootstrap
anova(mod2, nBoot = 99)
# Time elapsed: 0 hr 0 min 0 sec
# Analysis of Deviance Table

# Model: dis_sp ~ env$bio1

# Multivariate test:
#             Res.Df Df.diff   Dev Pr(>Dev)
# (Intercept)      5                       
# env$bio1         4       1 8.042      0.4
# Arguments:
#  Test statistics calculated assuming uncorrelated response (for faster computation) 
#  P-value calculated using 99 iterations via PIT-trap resampling.

使用p.uni参数可以具体看到所有物种与bio1的调整后的相关性是否显著。

anova(mod2, p.uni = "adjusted", nBoot = 99)
# Time elapsed: 0 hr 0 min 0 sec
# Analysis of Deviance Table

# Model: dis_sp ~ env$bio1

# Multivariate test:
#             Res.Df Df.diff   Dev Pr(>Dev)
# (Intercept)      5                       
# env$bio1         4       1 8.042      0.4

# Univariate Tests:
#               sp1            sp2            sp3         
#               Dev Pr(>Dev)   Dev Pr(>Dev)   Dev Pr(>Dev)
# (Intercept)                                             
# env$bio1    3.582     0.42 0.791     0.84 0.786     0.84
#               sp4            sp5            sp6         
#               Dev Pr(>Dev)   Dev Pr(>Dev)   Dev Pr(>Dev)
# (Intercept)                                             
# env$bio1    0.197     0.84 2.148     0.57 0.538     0.84
# Arguments:
#  Test statistics calculated assuming uncorrelated response (for faster computation) 
# P-value calculated using 99 iterations via PIT-trap resampling.

multipleSDM

就是基于上面glm的一个物种分布模型，本质上是predict(glm())。模型中物种为固定效应，且通过环境相互作用。

sdm_fit = traitglm(dis_sp, env)

可以查看标准化模型系数（回归斜率）

sdm_fit$fourth
#                             bio1       bio2        alt
# as.factor.names.L..sp2  -2.146444  -1.245655  -0.905375
# as.factor.names.L..sp3 -18.360366  38.538891 -19.822485
# as.factor.names.L..sp4 -13.778650  28.875992 -12.890129
# as.factor.names.L..sp5   3.736608 -36.916878  28.603439
# as.factor.names.L..sp6   2.137445 -26.817396  21.258961
#                              soil
# as.factor.names.L..sp2   3.243161
# as.factor.names.L..sp3 -23.591447
# as.factor.names.L..sp4 -17.058409
# as.factor.names.L..sp5  38.603482
# as.factor.names.L..sp6  31.107954

绘制

# sdm_fit$fourth.corner和sdm_fit$fourth似乎没区别
a = max(abs(sdm_fit$fourth.corner))
# 设置颜色的渐变系统
colort = colorRampPalette(c("blue","white","red"))
#绘制物种与环境的关系矩阵
plot.spp = levelplot(t(as.matrix(sdm_fit$fourth.corner)), xlab="Environmental Variables",
 ylab="Species", col.regions=colort(100), at=seq(-a, a, length=100),
 scales = list( x= list(rot = 45)))
print(plot.spp)

sdm

Forth corner analysis

原先是环境，性状和物种分布三个角，现在要进行扩展

fit = traitglm(dis, env, trait)
fit$fourth
# 得到的似乎是环境与性状的关系
#                  bio1       bio2        alt      soil
# height      35.218287  -2.531766   9.834852 -42.99279
# leaflen    -24.685135  45.724439 -19.270302 -25.60843
# climbnon    -1.205911 -49.951263  34.751291  63.40392
# climbsns   -38.426865 -15.221388  15.094146  65.29489
# growthwood -43.189380  74.579155 -46.143020 -36.40022

观察残差：

plot(fit)
# nBoot最好是999什么的
anova(fit, nBoot = 10)
# Using block resampling... 
# Resampling begins for test 1.
# 	Resampling run 0 finished. Time elapsed: 0.00 minutes...
# Time elapsed: 0 hr 0 min 0 sec
# Analysis of Deviance Table

# Model  1: traitglm(L = dis, R = env, Q = trait, get.fourth = FALSE)
# Model  2: traitglm(L = dis, R = env, Q = trait)

# Multivariate test:
#                           Res.Df Df.diff   Dev Pr(>Dev)
# Main effects only             25                       
# env:trait (fourth corner)      5      20 50.18    0.182
# Arguments: P-value calculated using 10 iterations via PIT-trap block resampling.

可以使用summary考察所有因子的显著性

summary(fit, nBoot = 10)
# Coefficients: (3 not defined because of singularities)
#                 wald value Pr(>wald)  
# (Intercept)         46.704    0.0909 .
# sppsp2               0.717    0.0909 .
# sppsp3               7.238    0.0909 .
# sppsp4              14.257    0.0909 .
# sppsp5               6.968    0.0909 .
# sppsp6               3.526    0.0909 .
# bio1                 0.000    0.2727  
# bio2                 0.000    0.3636  
# alt                  0.000    0.5455  
# soil                 0.000    0.3636  
# bio1.squ             0.000    0.2727  
# bio2.squ             0.000    0.6364  
# alt.squ            259.127    0.0909 .
# soil.squ           419.920    0.0909 .
# bio1.height       2201.885    0.0909 .
# bio1.leaflen      1185.788    0.0909 .
# bio1.climbnon      101.597    0.6364  
# bio1.climbsns     1568.928    0.0909 .
# bio1.growthwood   8555.765    0.0909 .
# bio2.height        168.033    0.5455  
# bio2.leaflen      2272.804    0.0909 .
# bio2.climbnon     4643.564    0.0909 .
# bio2.climbsns      673.177    0.0909 .
# bio2.growthwood  12272.103    0.0909 .
# alt.height         648.529    0.0909 .
# alt.leaflen        996.368    0.0909 .
# alt.climbnon      3577.378    0.0909 .
# alt.climbsns       820.169    0.0909 .
# alt.growthwood   11168.552    0.0909 .
# soil.height       2315.545    0.0909 .
# soil.leaflen      1080.750    0.0909 .
# soil.climbnon     4609.975    0.0909 .
# soil.climbsns     2354.359    0.0909 .
# soil.growthwood   8399.477    0.0909 .
# --- 
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

# Test statistic:  3.781e+16, p-value: 0.0909 
# Arguments: P-value calculated using 10 resampling iterations via pit.trap resampling.

绘制热图：

a        = max(abs(fit$fourth.corner) )
colort   = colorRampPalette(c("blue","white","red")) 
plot.4th = levelplot(t(as.matrix(fit$fourth.corner)), xlab="Environmental Variables",
 ylab="Species traits", col.regions=colort(100), at=seq(-a, a, length=100),
 scales = list( x= list(rot = 45)))
print(plot.4th)

hmap

使用ggplot2绘制heatmap：

library(ggplot2)
library(reshape2)
data4heatmap <- as.matrix(fit$fourth.corner)
#                  bio1       bio2        alt      soil
# height      35.218287  -2.531766   9.834852 -42.99279
# leaflen    -24.685135  45.724439 -19.270302 -25.60843
# climbnon    -1.205911 -49.951263  34.751291  63.40392
# climbsns   -38.426865 -15.221388  15.094146  65.29489
# growthwood -43.189380  74.579155 -46.143020 -36.40022
# 数据整形
heatmap <- melt(data4heatmap)
#          Var1 Var2      value
# 1      height bio1  35.218287
# 2     leaflen bio1 -24.685135
# 3    climbnon bio1  -1.205911
# 4    climbsns bio1 -38.426865
# 5  growthwood bio1 -43.189380
# 6      height bio2  -2.531766
# 7     leaflen bio2  45.724439
# 8    climbnon bio2 -49.951263
# 9    climbsns bio2 -15.221388
# 10 growthwood bio2  74.579155
# 11     height  alt   9.834852
# 12    leaflen  alt -19.270302
# 13   climbnon  alt  34.751291
# 14   climbsns  alt  15.094146
# 15 growthwood  alt -46.143020
# 16     height soil -42.992794
# 17    leaflen soil -25.608433
# 18   climbnon soil  63.403915
# 19   climbsns soil  65.294891
# 20 growthwood soil -36.400219
ggplot(data = heatmap, aes(x = Var1,y = Var2, fill = value)) +
  geom_tile() +
  # 设置填充颜色
  scale_fill_continuous(low = "#00FF00", high = "#FF0000") +
  theme_bw() +
  theme(
    axis.ticks = element_blank(),
    panel.grid = element_blank(),
    panel.border = element_blank(),
    axis.title = element_blank(),
    # 设置x轴标注的角度
    axis.text.x = element_text(angle = 30)
  )

hmap2

Variable selection (based on LASSO)

ft1=traitglm(dis,env,trait,method="glm1path")
ft1$fourth
# LASSO各种变量筛选惩罚后把我的小数据全给惩罚没了...
#            bio1 bio2 alt soil
# height        0    0   0    0
# leaflen       0    0   0    0
# climbclimb    0    0   0    0
# climbnon      0    0   0    0
# climbsns      0    0   0    0
# growthwood    0    0   0    0

Biogeography

生物地理学结合了GIS技术、遥感和生态学的知识，不学就亏了。但以下内容都可以用ArcGIS和ENVI解决，因此也就图一乐。原作：Jesús N. Pinto-Ledezma and Jeannine Cavender-Bares

Diversity and distributions

在本实验中，你将学习R中的基本工具，以可视化物种分布，建立地理范围，以测试不同方法下生物多样性梯度的驱动力。你将需要三个数据集，它们都在我的github中： 1.物种分布数据点-live.oaks.txt 2.物种地理范围-Furnarii_ranges_geo.shp 3.环境预测因子-bio1.bil和bio12.bil

Data preparation and geographic visualization

首先是准备必要的包和地理分布数据。

#加载相关的包
library(maptools)
library(rgdal)
library(raster)
library(sp)
library(rangeBuilder)
library(sf)
library(spdep)
library(ncf)
library(ape)
library(geiger)
library(dismo)
library(spatialreg)
#加载物种分布数据
oaks <- read.table("Data/live.oaks.txt", header = TRUE)
#看一下发现都是坐标
head(oaks)
#使用maptools包将点绘制在地图（这里是简单版世界地图）上
data(wrld_simpl)
{plot(oaks[c(2:3)], col = "blue", pch = 19)
plot(wrld_simpl, add = TRUE)}

简单的分布可以在图中显示其中的一个物种，如Quercus virginiana

#提取物种列表
unique(oaks$Species)
#找到我们要的种
que_vir <- subset(oaks, oaks$Species == "Quercus_virginiana")
#绘图
{plot(que_vir$Longitude, que_vir$Latitude, pch = 15)
plot(wrld_simpl, add = TRUE)}

简单的分布2 接下来我们可以检查是否有重复的点：

oaks_dups <- duplicated(oaks[, c(2:3)])
length(which(oaks_dups == TRUE))
#看看有多少不重复的
length(which(oaks_dups == FALSE))
oaks_dups_row <- which(oaks_dups == TRUE)
length(oaks_dups_row)
#挑选出不重复的
oaks_nodups <- oaks[-oaks_dups_row,]
#查看数据纬度
dim(oaks_nodups)
head(oaks_nodups)
#不同颜色绘制看看区别
{plot(oaks$Longitude, oaks$Latitude, pch = 19, col = "red", cex = 2)
points(oaks_nodups$Longitude, oaks_nodups$Latitude, pch = 16, col = "black")}

Range of occurance

该部分可用于基于几何形状（例如最小凸多边形等）创建“简单”范围图，而无需考虑环境变量（不包括ENM或SDM）。

Based on the minimum convex polygon (Convex hull)

#在物种的记录周围创建一个多边形
oaks_hull <- convHull(oaks_nodups[, c(2:3)])
#把这个多边形画出来
{plot(oaks_hull)
points(oaks_nodups$Longitude, oaks_nodups$Latitude, pch = 16, col = "black")}

多边形边界

#只对一个种绘制简单边界
que_vir <- subset(oaks_nodups, oaks_nodups$Species == "Quercus_virginiana")
que_vir_hull <- convHull(que_vir[, c(2:3)])
{plot(que_vir_hull)
points(que_vir$Longitude, que_vir$Latitude, pch = 16, col = "black")}

最后可以画一个综合大图

#把所有树的点用黑色画出来
{plot(oaks_hull)
points(oaks_nodups$Longitude, oaks_nodups$Latitude, pch = 16, col = "black")
#把Quecus virginiana的分布用红色标出
plot(que_vir_hull, add = TRUE)
points(que_vir$Longitude, que_vir$Latitude, pch = 16, col = "red")
#加入世界地图
plot(wrld_simpl, add = TRUE)}
#注意，上面是套娃式的一整句语句

多边形边界2 但明眼人都看得出来这样绘制物种范围显然太弱智了。

Boundary based on dynamic Alpha

好的，使用简单的凸包似乎不是活橡树的好工具，现在让我们尝试另一种方法。现在，我们将使用软件包rangeBuilder中的动态alpha包。

#先画一个普通的，和上面没区别的
que_vir_alphahull <- getDynamicAlphaHull(que_vir, fraction = 0.95, coordHeaders = c("Longitude", "Latitude"),clipToCoast = 'no')[[1]]
#再来个动态Alpha的
{plot(que_vir_alphahull, lwd = 2, col = "red") 
plot(que_vir_hull, add = TRUE, lwd = 2, lty = 2)
points(que_vir$Longitude, que_vir$Latitude, pch = 16, col = "green")
plot(wrld_simpl, add = TRUE, lwd = 2)}

动态阿尔法边界好像还是很扯淡。

Biodiversity gradients (based on boundary)

在此之前，我们探讨了如何使用事件来绘制，清理和建立物种地理范围。现在将使用最大的大陆性区域性辐射值(?)（Furnariides）结合物种地理范围来探索物种多样性的地理梯度。我们使用的地理范围数据是Infraorder Furnariides（鸟类雀形目火炉亚目下的某个）。该数据可通过BirdLife International获得。当然可以使用IUCN或BIEN上的任何其他组。无论如何，你首先需要以shapefile格式下载它们的多边形。要加载这些地理分布需要用到R包maptools的函数readShapePoly。我们可以通过设置方块（栅格）然后切出每块区域的物种丰富度。 备注：以下操作均可在ArcGIS上完成

franges <- readShapePoly("Data/Franges/Furnarii_ranges_geo.shp")
str(franges)
#通过@来访问这些数据
head(franges@data)
#首先在1º纬度的空间分辨率下，使用这个鸟类的范围，在新热带区域创建一个空栅格。
neo_ras <- raster()
#extent函数可以确定空间范围
extent(neo_ras) <- extent(franges)
res(neo_ras) <- 1
neo_ras
values(neo_ras) <- 0
#使用空栅格，我们将对每个像元或像素中的物种标识进行“栅格化”。最终的栅格将是新热带地区这种鸟类的物种丰富度。
f_sr_raster <- rasterize(franges, neo_ras, field = "SCINAME", 
                         fun = function(x,...){length(unique(na.omit(x)))})
#在地图上显示
{plot(f_sr_raster)
plot(wrld_simpl, add = T)}

多样性的分布

#换颜色
colfuncYellows <- colorRampPalette(c("#d7191c", "#fdae61", "#ffffbf", "#abd9e9", "#2c7bb6"))
{plot(f_sr_raster, col = rev(colfuncYellows(100)), axes = FALSE, box = FALSE, 
     zlim = c(minValue(f_sr_raster), maxValue(f_sr_raster)), 
     xlab = "Furnariides richness", legend.width = 2)
plot(wrld_simpl, add = T)}

多样性的分布2 让我们尝试栅格化来自shp数据集的其他信息。我们将使用“RD”列中的信息，此数据对应于从尖端到系统树根部的节点数或根距(RD)（系统发育分支间的距离？），因此，将使用RD计算平均根距离(MRD)，用于衡量一个群落中物种的进化衍生性(Kerr＆Currie, 1999)，也可用于确定本地动物群主要是由早生物种还是近来起源的物种组成(Hawkins et al, 2012)。

head(franges@data)
f_MRD_raster <- rasterize(franges, neo_ras, field = "RD", fun = mean)
{plot(f_MRD_raster)
plot(wrld_simpl, add = T)}
#换颜色加标题
{plot(f_MRD_raster, col = rev(colfuncYellows(100)), axes = FALSE, box = FALSE, 
     zlim = c(minValue(f_MRD_raster), maxValue(f_MRD_raster)), 
     xlab = "Furnariides mean root distance", legend.width = 2)
plot(wrld_simpl, add = T)}

MRD

#把两张图合并
{par(mfrow = c(1, 2))
plot(f_sr_raster, col = rev(colfuncYellows(100)), axes = FALSE, box = FALSE, 
     zlim = c(minValue(f_sr_raster), maxValue(f_sr_raster)), 
     xlab = "Furnariides richness", legend.width = 2)

plot(f_MRD_raster, col = rev(colfuncYellows(100)), axes = FALSE, box = FALSE, 
     zlim = c(minValue(f_MRD_raster), maxValue(f_MRD_raster)), 
     xlab = "Furnariides mean root distance", legend.width = 2)}

MRD与生物多样性

Scale dependence problem

现在，我们将探讨生态和进化中最古老的问题之一，即数据中的“尺度依赖性”。因此，要探索这种尺度相关性，我们将对Furnariides属的范围进行栅格化，但要使用从2到4度的纬度的不同空间分辨率。

#设置2°和4°的格网
#2º
neo_ras_2dg <- raster()
extent(neo_ras_2dg) <- extent(franges)
res(neo_ras_2dg) <- 2
neo_ras_2dg
values(neo_ras_2dg) <- 0
#4º
neo_ras_4dg <- raster()
extent(neo_ras_4dg) <- extent(franges)
res(neo_ras_4dg) <- 4
neo_ras_4dg
values(neo_ras_4dg) <- 0
#分别套数据得到不同大小尺度的物种多样性
f_sr_2dg_raster <- rasterize(franges, neo_ras_2dg, field = "SCINAME", fun = function(x,...){length(unique(na.omit(x)))})
f_sr_4dg_raster <- rasterize(franges, neo_ras_4dg, field = "SCINAME", fun = function(x,...){length(unique(na.omit(x)))})
#画在一个图里
{par(mfrow = c (1, 3))
plot(f_sr_raster, main = "Furnariides richness 1dg")
plot(wrld_simpl, add = T)

plot(f_sr_2dg_raster, main = "Furnariides richness 2dg")
plot(wrld_simpl, add = T)

plot(f_sr_4dg_raster, main = "Furnariides richness 4dg")
plot(wrld_simpl, add = T)}

看看有什么区别呢？不同格网

Specie distribution model

加载与bio1（年平均温度）和bio12（年降水量）相对应的环境变量。这些数据对应于WorldClim中19个变量中的两个。我们将仅出于教育目的使用这两个变量，而不是对物种与环境的关系进行完整的评估。

#加载并绘制2个气候因子
bio1 <- raster("Data/Envi/bio1.bil")
bio1
bio12 <- raster("Data/Envi/bio12.bil")
bio12
#显示一下这是全球的气候数据
{par(mfrow = c(2, 1))
plot(bio1)
plot(bio12)}
#按这个属的鸟类的分布区域进行裁剪
bio1_neo <- crop(bio1, extent(franges))
bio12_neo <- crop(bio12, extent(franges))
#绘制
{par(mfrow = c(1, 2))
plot(bio1_neo, main = "Annual Mean Temperature")
plot(bio12_neo, main = "Annual Precipitation")}

分布地气候现在，我们将从Furnariides属的栅格中获取坐标。然后，将使用这些坐标从bio1和bio12气候层中提取信息。

#提取物种坐标
f_ras_coords <- xyFromCell(f_sr_raster, 1:length(values(f_sr_raster)))
#提取坐标对应气候数据
f_ras_bio1 <- extract(bio1_neo, f_ras_coords)
f_ras_bio12 <- extract(bio12_neo, f_ras_coords)
#提取物种多样性数据
f_ras_rich <- values(f_sr_raster)
#合到一起
fdata <- na.omit(data.frame(f_ras_coords, f_ras_rich, f_ras_bio1, f_ras_bio12))
#f_ras_rich_noNA <- ifelse(is.na(f_ras_rich), 0, f_ras_rich)
#f_ras_bio1_noNA <- ifelse(is.na(f_ras_bio1), 0, f_ras_bio1)
#f_ras_bio12_noNA <- ifelse(is.na(f_ras_bio12), 0, f_ras_bio12)
#fdata_noNA <- data.frame(f_ras_coords, f_ras_rich_noNA, f_ras_bio1_noNA, f_ras_bio12_noNA)

之后可以简单分析一下，看看相关系数什么的：

cor.test(fdata$f_ras_rich, fdata$f_ras_bio1)
cor.test(fdata$f_ras_rich, fdata$f_ras_bio12)
{par(mfrow = c(1, 2))
plot(fdata$f_ras_bio1, fdata$f_ras_rich, xlab = "Bio 1", ylab = "Richness")
plot(fdata$f_ras_bio12, fdata$f_ras_rich, xlab = "Bio 12", ylab = "Richness")}

分别相关

Spatial autocorrelation

现在，我们认为相关性不是简单的因果关系，因此要探索这种关系，我们需要构建拟合模型。为了探索这种关系，我们将首先探索一个简单的普通最小二乘回归（OLS）。（就是利用最小二乘法进行线性回归）

fols <- lm(f_ras_rich ~ f_ras_bio1 + f_ras_bio12, data = fdata)
summary(fols)
#Adjusted R-squared:  0.493

接着计算它们物种丰富度以及残差之间的空间相关性：

#计算空间自相关
autocor_SR <- ncf::correlog(fdata$x, fdata$y, z = fdata$f_ras_rich, na.rm = T, 
                         increment = 1, resamp = 1)
#绘图
{plot(autocor_SR$correlation[1:50], type = "b", pch = 1, cex = 1.2, lwd = 1.5,
     ylim = c(-1, 1), xlab = "Distance class", ylab = "Moran's I", cex.lab = 1.2, 
     cex.axis = 1.2)
     abline(h = 0)}
#接下来考察残差中的空间自相关
coords <- fdata[1:2]
coords <- as.matrix(coords)
#通过距离计算领域间的连续性
nb1.5 <- dnearneigh(coords, 0, 1.5)
#通过领域连续性为相邻列表建立空间权重
nb1.5.w <- nb2listw(nb1.5, glist = NULL, style = "W", zero.policy = TRUE)
#从OLS中提取残差
residuals_ols <- residuals(fols)
#绘图
plot(residuals_ols)
#计算单变量空间相关图。
autocor_ols_res <- correlog(fdata$x, fdata$y, z = residuals(fols), 
                            increment = 1, resamp = 1)
#并绘制残差的自相关图
{plot(autocor_ols_res$correlation[1:50], type = "b", pch = 1, cex = 1.2, lwd = 1.5,
     ylim = c(-0.5, 1), xlab = "distance", ylab = "Moran's I", cex.lab = 1.5, 
     cex.axis = 1.2)
abline(h = 0)
title(main = "OLS residuals", cex = 1.5)}

残差自相关似乎残差具有很强的空间自相关，这是一个问题，因为如果我们在残差中发现自相关，我们获得的大部分解释都可能会产生偏差。那我们接着检查这两个自相关的图。

{par(mfrow = c(2, 1))

plot(autocor_SR$correlation[1:50], type = "b", pch = 1, cex = 1.2, lwd = 1.5,
     ylim = c(-1, 1), xlab = "Distance class", ylab = "Moran's I", cex.lab = 1.2, 
     cex.axis = 1.2)
abline(h = 0)
title(main = "OLS model", cex = 1.5)
plot(autocor_ols_res$correlation[1:50], type = "b", pch = 1, cex = 1.2, lwd = 1.5,
     ylim = c(-0.5, 1), xlab = "Distance class", ylab = "Moran's I", cex.lab = 1.5, 
     cex.axis = 1.2)
abline(h = 0)
title(main = "OLS residuals", cex = 1.5)}

残差和多样性自相关存在很强的空间自相关，因此使用OLS模型的任何结论都可能存在偏差。为了尝试解决这个重要问题，我们将使用空间同时自回归误差模型估计（Aka SAR模型），该模型通过以空间权重矩阵的形式添加额外的项（自回归）来说明空间自相关。其会指定每个单元格或像素的邻域以及每个邻居的相对权重。

#Aka SAR模型
sar_nb1.5.w <- errorsarlm(fols, listw = nb1.5.w, data = fdata, quiet = FALSE, 
                          zero.policy = TRUE)
summary(sar_nb1.5.w)
#AIC: 10931, (AIC for lm: 14495)
residuals_sar_nb1.5.w <- residuals(sar_nb1.5.w)

那么它的空间自相关结果怎么样呢？

#计算
autocor_sar_nb1.5.w <- correlog(fdata$x, fdata$y, z = residuals(sar_nb1.5.w), 
                                na.rm = T, increment = 1, resamp = 1)
#绘图
{plot(autocor_sar_nb1.5.w$correlation[1:50], type = "b", pch = 4, cex = 1.2, lwd = 1.5,
     ylim = c(-0.5, 1), xlab = "distance", ylab = "Moran's I", cex.lab = 1.5, 
     cex.axis = 1.2)
abline(h = 0)
title(main = "Correlogram SARerr", cex = 1.5)}
#和线性回归的一起比较绘图
{par(mfrow = c(2, 1))
plot(autocor_ols_res$correlation[1:50], type = "b", pch = 1, cex = 1.2, lwd = 1.5,
     ylim = c(-0.5, 1), xlab = "distance", ylab = "Moran's I", cex.lab = 1.5, 
     cex.axis = 1.2)
abline(h = 0)
title(main = "OLS residuals", cex = 1.5)

plot(autocor_sar_nb1.5.w$correlation[1:50], type = "b", pch = 4, cex = 1.2, lwd = 1.5,
     ylim = c(-0.5, 1), xlab = "distance", ylab = "Moran's I", cex.lab = 1.5, 
     cex.axis = 1.2)
abline(h = 0)
title(main = "Correlogram SARerr", cex = 1.5)}

空间自相关的比较那是真的没有相关。但是很奇怪的是，SAR模型没有R square，因此我们使用别的函数：

#加载来源不明的函数
source("https://raw.githubusercontent.com/jesusNPL/BetaDivNA/master/SARr2.R")
SARr2(Lfull = sar_nb1.5.w$LL, Lnull = sar_nb1.5.w$logLik_lm.model, N = nrow(fdata))
#'log Lik.' 0.8852386 (df=4)

Species distribution and niche theory

在本实验中，我们将探索一些相关模型，这些模型使用物种级别的数据（地理分布）和大空间尺度的环境数据。我们将使用世界上最大的金刚鹦鹉——风信子金刚鹦鹉（Anodorhynchus hyacinthinus）的地理分布数据，该数据分布在玻利维亚和巴西的潘塔纳尔湿地和巴西的塞拉多之间。环境数据可从Ecoclimate（http://ecoclimate.org）获得。也可以从WorldClim获取环境数据。首先准备程序与数据：

#加载包
library(permute)
library(lattice)
library(vegan)
library(sp)
library(raster)
library(psych)
library(maps)
library(maptools)
library(kernlab)
library(dismo)
#加载气候数据
datoG0 <- read.table("Data/Environment/bio_var_CCSM_0k_global.txt", h = T)
#查看气候数据结构
head(datoG0)
datoG0[1:5, 1:5]
dim(datoG0)
str(datoG0)
summary(datoG0)
class(datoG0)
names(datoG0)
#把data.frame的数据变成raster栅格
gridded(datoG0) <- ~long+lat
class(datoG0)
#stake()函数可以改为栅格
clima0k <- stack(datoG0)
#绘制在世界地图上
plot(clima0k$bio.1)
map(add = T)

clim1 上图对应世界各地的环境变量的年平均温度即BIO1。但是，我们不需要世界各地的所有环境信息，因为我们的重点物种仅在南美分布（如果该物种位于美国或欧洲的动物园中，并不意味着该物种就存在于那些地方）。

#按经纬度裁剪
e <- extent(c(-90, -30, -60, 15))
clima0k.SA <- crop(clima0k, e)
plot(clima0k.SA$bio.9)
map(add = T)
#看看里面有几个小格子(grid)
ncell(clima0k.SA)

clim1-1

SA0k <- values(clima0k.SA)
class(SA0k)
#获得每个小格的经纬度坐标
coord.SA <- xyFromCell(clima0k.SA, 1:ncell(clima0k.SA))
#将这些小格的坐标和气候对应
SA0k <- cbind(coord.SA, SA0k)
#看看数据
SA0k[1:5,]
SA0k[1:5, 1:5]

现在，我们需要确定需要哪些变量来建模风信子金刚鹦鹉的生态位（格林纳利生态位或基础生态位）。在这里，我们可以依靠专家进行选择或使用统计工具或同时使用这两种工具。在这种情况下，我们将使用心理学中通常使用的阶乘分析。

#执行阶乘分析
fa.parallel(SA0k[, -c(1:3)])

因子分析平行分析表明，因子的数量= 5和组件的数量= 5足以进行进一步分析。

SA0k.fa <- fa(SA0k[, -c(1:3)], nfactors = 5, rotate = "varimax")
SA0k.fa
loadings(SA0k.fa)

根据FA(阶乘)的结果和专业知识，我们可以选择进行计算变量：bio1，bio2，bio15，bio4，bio2。接着我们保存变量并加载物种数据：（从GBIF中）

#保存我们选出的变量
write.table(SA0k[, c("x", "y", "bio.1", "bio.2", "bio.4", "bio.12", "bio.15")],
            row.names = FALSE, "Data/Environment/climaSA0k.txt", sep = "\t")
#作为栅格保存
writeRaster(clima0k.SA, "Data/Environment/climaSA0k.asc", format = "ascii", bylayer = T)
#清理r环境
rm(list = ls())
#直接从R中从GBIF获取我们物种的一些发生数据。考虑到所选物种的发生次数，这可能需要一些时间和网速。
hyacinth <- gbif("anodorhynchus", "hyacinthinus*", geo = FALSE)
#查看并选择有经纬度的数据列
dim(hyacinth)
colnames(hyacinth)
#整理数据
#去NA
hyacinth <- subset(hyacinth, !is.na(lon) & !is.na(lat))
dim(hyacinth)
hyacinth[1:4, c(1:5, 7:10)]
#其次，仅选择建模对象所需的列。
#请注意，使用colnames(风信子鹦鹉)可以获取列的位置，并且114、84和77列（按此顺序）分别是对应于物种名称，经度和纬度的列，这就是我们所需要的。
hyacinth_coords <- hyacinth[, c(114, 84, 77)]
head(hyacinth_coords)
#保存
dir.create("Data/OCC")
write.csv(hyacinth_coords, "Data/OCC/hyacinth_data.csv")
#再清理一下运行环境
rm(list = ls())
#合并物种分布与环境数据
hyacinth <- read.csv("Data/OCC/hyacinth_data.csv")
head(hyacinth)
#只有一个物种，就不要物种名字了
hyacinth_coords <- hyacinth[, c(3, 4)]
head(hyacinth_coords)
#加载环境数据
climSA0k <- read.table("Data/Environment/climaSA0k.txt", h = T)
class(climSA0k)
gridded(climSA0k) <- ~x + y
climSA0k <- stack(climSA0k)
climSA0k
#将其绘制在地图上
plot(climSA0k$bio.1)
points(hyacinth_coords[, "lon"], hyacinth_coords[, "lat"])
#匹配物种分布数据和气象数据
hyacinth_var <- extract(climSA0k, hyacinth_coords, cellnumbers = T)
head(hyacinth_var)
hyacinth_var <- cbind(hyacinth_coords, hyacinth_var)
hyacinth_var[1:5, ]
#把NA数据去了
hyacinth_var <- na.omit(hyacinth_var) # remove NA's
dim(hyacinth_var)
#删除单元格中重复的值
duplicated(hyacinth_var[,"cells"])
a <- which(duplicated(hyacinth_var[, "cells"]) == T)
a
hyacinth_var <- hyacinth_var[-a, ]
dim(hyacinth_var)
#保存
write.table(hyacinth_var, row.names = FALSE, "Data/OCC/hyacinth_var.txt", sep = "\t")
#画在地图上
plot(climSA0k$bio.1)
points(hyacinth_var[, "lon"], hyacinth_var[, "lat"])
#整理数据作为背景数据集
clima0k <- read.table("Data/Environment/climaSA0k.txt", h = T)
dim(clima0k)
#123就是有123个分布点
id.back <- sample(1:nrow(clima0k), 123) 
length(id.back)
dim(background)
names(background)
#保存
write.table(background, "Data/Environment/background.txt", row.names = F, sep = "\t")

Fitting and prediction of ecological niche model

有了这些数据我们就可以开始演算了。先通过环境背景数据和物种分布数据设置测试集合训练集.

#准备数据
id.ocur <- sample(1:nrow(hyacinth_var), round(0.75*nrow(hyacinth_var)))
length(id.ocur)
id.back <- sample(1:nrow(background), round(0.75*nrow(background)))
length(id.back)
#准备训练集
training <- prepareData(x = climSA0k, 
                             p = hyacinth_var[id.ocur, 1:2], 
                             b = background[id.back, 1:2], xy = T)
#准备测试集
test <- prepareData(x = climSA0k, 
                      p = hyacinth_var[-id.ocur, 1:2], 
                      b = background[-id.back,1:2], xy = T)

接着我们用不同的模型用训练集去预测其生态位分布: 可以使用各种模型来探索物种如何对这些特定变量做出响应（基于我们使用的特定算法以及地理和环境空间之间的关系），使用一个为每个变量创建响应图的响应函数，以及其他函数变量的中位数。同时，我们将评估每个模型的性能。为了评估模型的性能，我们将使用测试数据。对所有四个模型重复以上操作。使用存在/不存在数据对模型进行交叉验证。给定存在向量和不存在值的向量（或模型以及存在和不存在点以及预测变量），将计算混淆矩阵（针对变化的阈值），并为每个混淆矩阵/阈值计算模型评估统计量。最后可以为每个模型设置阈值，阈值（截止值）用于将模型预测（概率，距离或相似值）转换为二进制分数（存在或不存在）。

Bioclim model

BIOCLIM算法通过将任意位置的环境变量的值与已知发生位置（“训练地点”）的值的百分比分布进行比较，来计算位置的相似度。

Bioclim.model <- bioclim(x = training[training[, "pb"] == 1, -c(1:3)]) 
Bioclim.model
plot(Bioclim.model)

bioclim1 让我们绘制模型的响应变量或单个变量响应曲线。

response(Bioclim.model)

bioclim2 进行分布区预测

Bioclim0k <- predict(object = Bioclim.model, x = climSA0k)
plot(Bioclim0k)
#加入现在的实际分布点
points(training[training[, "pb"]==1,"x"], training[training[, "pb"]==1,"y"])

bioclim3 测试数据

Bioclim.eval <- evaluate(p = test[test[, "pb"] == 1, 1:2], 
                         a = test[test[, "pb"] == 0, 1:2], 
                         model = Bioclim.model, 
                         x = climSA0k)
Bioclim.eval
#AUC: 0.8319892 
str(Bioclim.eval)
#设置阈值
Bioclim.thr <- threshold(Bioclim.eval)
Bioclim.thr

“thr”对象中有不同的列，每个列都是阈值的度量，在本示例中，我们将使用“ spec_sens”列（物种敏感度）。在spec_sens中，阈值为灵敏度（真阳性率）和特异性（真阴性率）之和最高的值。

bio <- Bioclim.thr$spec_sens

Gower model

域算法计算任何位置的环境变量与已知的任何发生位置（“训练地点”）之间的高尔距离。对于每个变量，均采用站点与任何训练点之间的最小距离。

Gower.model <- domain(x = training[training[, "pb"] == 1, -c(1:3)])
Gower.model
response(Gower.model)
#预测
Gower0k <- predict(climSA0k, Gower.model)
plot(Gower0k)
#验证
Gower.eval <- evaluate(p = test[test[, "pb"] == 1, 1:2], 
                       a = test[test[, "pb"] == 0, 1:2], 
                       model = Gower.model, 
                       x = climSA0k)
#查看阈值
Gower.thr <- threshold(Gower.eval)
gow <- Gower.thr$spec_sens

Gower1

Support vector machine (SVM)

支持向量机是用于分类，独立性检测和回归的出色工具。支持向量机（SVMs;Vapnik,1998）将简单的线性方法应用于数据，但是在与输入空间非线性相关的高维特征空间中，但实际上，它不涉及该高维中的任何计算空间。这种简单性与在许多学习问题（分类，回归和独立性检测）上的先进技术相结合，促进了SVM的普及（Karatzoglou等，2006）。

svm.model <- ksvm(pb ~ bio.1 + bio.2 + bio.4 + bio.12 + bio.15, data = training)
#预测
svm0k <- predict(climSA0k, svm.model)
plot(svm0k)
#验证
svm.eval <- evaluate(p = test[test[, "pb"] == 1, 1:2], 
                     a = test[test[, "pb"] == 0, 1:2], 
                     model = svm.model, 
                     x = climSA0k)
#查看阈值
svm.thr <- threshold(svm.eval)
s <- svm.thr$spec_sens

Generalized Linear Model (GLM)

广义线性模型（GLM）是普通最小二乘回归的广义。使用最大似然并通过允许线性模型通过链接函数与响应变量相关以及通过允许每次测量的方差大小为其预测值的函数来拟合模型。根据GLM的指定方式，它可以等同于（多次）线性回归，逻辑回归或泊松回归。

glm.model <- glm(pb ~ bio.1 + bio.2 + bio.4 + bio.12 + bio.15, 
                  data = training, family = binomial(link = "logit"))
#预测
GLM0k <- predict(climSA0k, glm.model)
plot(GLM0k)
#验证
glm.eval <- evaluate(p = test[test[, "pb"] == 1, 1:2], 
                     a = test[test[, "pb"] == 0, 1:2], 
                     model = glm.model, 
                     x = climSA0k)
#查看阈值
glm.thr <- threshold(glm.eval)
g <- glm.thr$spec_sens

比较4张图

par(mfrow = c(2, 2))
plot(Bioclim0k, main = "bioclim")
plot(Gower0k, main = "Gower")
plot(svm0k, main = "svm")
plot(GLM0k, main = "glm")

比较四个模型的ROC值

par(mfrow = c(2, 2))
plot(Bioclim.eval, "ROC")
plot(Gower.eval, "ROC")
plot(svm.eval, "ROC")
plot(glm.eval, "ROC")

AUC 添加了阈值限制后的物种适宜分布区

par(mfrow = c(2, 2))
plot(Bioclim0k > bio, main = "Bioclim")
plot(GLM0k > g, main = "GLM")
plot(Gower0k > gow, main = "Gower")
plot(svm0k > s, main = "SVM")

可以综合这四个数值以获得平均的结果/综合的结果

thrs <- (bio + gow + s + g)
#stack()为堆叠函数
tmp <- stack(Bioclim0k, Gower0k, GLM0k, svm0k)
map.sum <- sum(tmp)
par(mfrow = c(2, 2))
plot(map.sum)
plot(map.sum > thrs)
plot(map.sum > 2)
plot(map.sum > 3)

也可以查看他们的平均结果

map.mean <- mean(tmp)
map.sd <- calc(tmp, sd)
par(mfrow = c(2, 2))
plot(map.mean)
plot(map.mean > thrs)
plot(map.mean > 0.2)
plot(map.mean > 0.3)

但是，这是一个有问题的方法，因为模型预测的值并不都在相同的范围（0到1）之间。所以您可能要先解决它。解决方案是添加权重。让我们结合按其AUC分数加权的四个模型。在这里，为了创建权重，我们减去0.5（随机期望）并将结果平方，以为更高的AUC值赋予更多权重。

auc <- sapply(list(Bioclim.eval, Gower.eval, svm.eval, glm.eval), function(x) x@auc)
w <- (auc-0.5)^2
map.mean.weight <- weighted.mean(tmp[[c("layer.1", "layer.2", "layer.3", "layer.4")]], w)
plot(map.mean.weight)
par(mfrow = c(2, 2))
plot(map.sum, main = "Sum of all models")
plot(map.mean, main = "Mean of all models")
plot(map.mean.weight, main = "Weighted mean of all models")
plot(map.sd, main = "Standard deviation of all models")

biomod2 package

一个很厉害的包…不单单可以拟合环境数据等连续性变量，还可以拟合基因标记等二项分布的变量。物种分布模型本质上就是一个概率模型，建立每个栅格（地理位置）对于的环境变量与其存在概率之间的函数关系来估计物种分布的可能性。

Prepare biomod2

#设置工作环境
setwd('workdir')
#安装biomod2
devtools::install_github('biomodhub/biomod2')
#加载
library(biomod2)
library(ggplot2)
library(gridExtra)
library(raster)
library(rasterVis)
#加载数据
ProLau_occ <- read.csv('../data/ProLau_occ.csv')
summary(ProLau_occ)
#加载各种数据...
bioclim_ZA_sub <- 
  raster::stack(
    c(
      bio_5  = '../data/worldclim_ZA/worldclim_ZA_bio_5.asc',
      bio_7  = '../data/worldclim_ZA/worldclim_ZA_bio_7.asc',
      bio_11 = '../data/worldclim_ZA/worldclim_ZA_bio_11.asc',
      bio_19 = '../data/worldclim_ZA/worldclim_ZA_bio_19.asc'
    )
  )

bioclim_ZA_sub

#整理数据
ProLau_data <- 
  BIOMOD_FormatingData(
    resp.var = ProLau_occ['Protea.laurifolia'],
    resp.xy = ProLau_occ[, c('long', 'lat')],
    expl.var = bioclim_ZA_sub,
    resp.name = "Protea.laurifolia",
    PA.nb.rep = 2,
    PA.nb.absences = 500,
    PA.strategy = 'random'
  )

运行物种分布模型

ProLau_data
#绘制伪缺席图
plot(ProLau_data)

数据分布图

#定义单个模型的选项
ProLau_opt <- 
  BIOMOD_ModelingOptions(
    GLM = list(type = 'quadratic', interaction.level = 1),
    GBM = list(n.trees = 1000),
    GAM = list(algo = 'GAM_mgcv')
  )
#跑模型
ProLau_models <- 
  BIOMOD_Modeling(
    data = ProLau_data,
    models = c("GLM", "GBM", "RF", "GAM"),
    models.options = ProLau_opt,
    NbRunEval = 2,
    DataSplit = 80,
    VarImport = 3,
    modeling.id = "demo1"
  )

#评估单个模型的质量

#获取模型评估分数
ProLau_models_scores <- get_evaluations(ProLau_models)

## ProLau_models_scores是一个包含模型分数的5维数组
dim(ProLau_models_scores)
dimnames(ProLau_models_scores)

# 绘出模型评估分数
models_scores_graph(
  ProLau_models, 
  by = "models", 
  metrics = c("ROC","TSS"), 
  xlim = c(0.5,1), 
  ylim = c(0.5,1)
)

models_scores_graph(
  ProLau_models, 
  by = "cv_run" , 
  metrics = c("ROC","TSS"), 
  xlim = c(0.5,1), 
  ylim = c(0.5,1)
)

models_scores_graph(
  ProLau_models, 
  by = "data_set", 
  metrics = c("ROC","TSS"), 
  xlim = c(0.5,1), 
  ylim = c(0.5,1)
)

#检查各个变量的重要值
(ProLau_models_var_import <- get_variables_importance(ProLau_models))

#求变量重要性的平均值
apply(ProLau_models_var_import, c(1,2), mean)

#绘制模型的响应图
ProLau_glm <- BIOMOD_LoadModels(ProLau_models, models='GLM')
ProLau_gbm <- BIOMOD_LoadModels(ProLau_models, models='GBM')
ProLau_rf <- BIOMOD_LoadModels(ProLau_models, models='RF')
ProLau_gam <- BIOMOD_LoadModels(ProLau_models, models='GAM')

glm_eval_strip <- 
  biomod2::response.plot2(
    models  = ProLau_glm,
    Data = get_formal_data(ProLau_models,'expl.var'), 
    show.variables= get_formal_data(ProLau_models,'expl.var.names'),
    do.bivariate = FALSE,
    fixed.var.metric = 'median',
    legend = FALSE,
    display_title = FALSE,
    data_species = get_formal_data(ProLau_models,'resp.var')
  )

gbm_eval_strip <- 
  biomod2::response.plot2(
    models  = ProLau_gbm,
    Data = get_formal_data(ProLau_models,'expl.var'), 
    show.variables= get_formal_data(ProLau_models,'expl.var.names'),
    do.bivariate = FALSE,
    fixed.var.metric = 'median',
    legend = FALSE,
    display_title = FALSE,
    data_species = get_formal_data(ProLau_models,'resp.var')
  )

rf_eval_strip <- 
  biomod2::response.plot2(
    models  = ProLau_rf,
    Data = get_formal_data(ProLau_models,'expl.var'), 
    show.variables= get_formal_data(ProLau_models,'expl.var.names'),
    do.bivariate = FALSE,
    fixed.var.metric = 'median',
    legend = FALSE,
    display_title = FALSE,
    data_species = get_formal_data(ProLau_models,'resp.var')
  )

gam_eval_strip <- 
  biomod2::response.plot2(
  models  = ProLau_gam,
  Data = get_formal_data(ProLau_models,'expl.var'), 
  show.variables= get_formal_data(ProLau_models,'expl.var.names'),
  do.bivariate = FALSE,
  fixed.var.metric = 'median',
  legend = FALSE,
  display_title = FALSE,
  data_species = get_formal_data(ProLau_models,'resp.var')
)

如随机森林的变量响应模型： rf_eval_strip

#运行集成模型
ProLau_ensemble_models <- 
  BIOMOD_EnsembleModeling(
    modeling.output = ProLau_models,
    em.by = 'all',
    eval.metric = 'TSS',
    eval.metric.quality.threshold = 0.8,
    models.eval.meth = c('TSS','ROC'),
    prob.mean = FALSE,
    prob.cv = TRUE, 
    committee.averaging = TRUE,
    prob.mean.weight = TRUE,
    VarImport = 0
  )

#评估集成模型质量
(ProLau_ensemble_models_scores <- get_evaluations(ProLau_ensemble_models))

Single variable model fitting and prediction

#当前适宜区？
ProLau_models_proj_current <- 
  BIOMOD_Projection(
    modeling.output = ProLau_models,
    new.env = bioclim_ZA_sub,
    proj.name = "current",
    binary.meth = "TSS",
    output.format = ".img",
    do.stack = FALSE
  )

ProLau_ensemble_models_proj_current <- 
  BIOMOD_EnsembleForecasting(
    EM.output = ProLau_ensemble_models,
    projection.output = ProLau_models_proj_current,
    binary.meth = "TSS",
    output.format = ".img",
    do.stack = FALSE
  )

#预测未来
#加载2050气候
bioclim_ZA_2050_BC45 <-
  stack(
    c(
      bio_5  = '../data/worldclim_ZA/worldclim_ZA_2050_BC45_bio_5.asc',
      bio_7  = '../data/worldclim_ZA/worldclim_ZA_2050_BC45_bio_7.asc',
      bio_11 = '../data/worldclim_ZA/worldclim_ZA_2050_BC45_bio_11.asc',
      bio_19 = '../data/worldclim_ZA/worldclim_ZA_2050_BC45_bio_19.asc'
    )
  )

ProLau_models_proj_2050_BC45 <- 
  BIOMOD_Projection(
    modeling.output = ProLau_models,
    new.env = bioclim_ZA_2050_BC45,
    proj.name = "2050_BC45",
    binary.meth = "TSS",
    output.format = ".img",
    do.stack = FALSE
  )

ProLau_ensemble_models_proj_2050_BC45 <- 
  BIOMOD_EnsembleForecasting(
    EM.output = ProLau_ensemble_models,
    projection.output = ProLau_models_proj_2050_BC45,
    binary.meth = "TSS",
    output.format = ".img",
    do.stack = FALSE
  )

#加载2079气候
bioclim_ZA_2070_BC45 <-
  stack(
    c(
      bio_5  = '../data/worldclim_ZA/worldclim_ZA_2070_BC45_bio_5.asc',
      bio_7  = '../data/worldclim_ZA/worldclim_ZA_2070_BC45_bio_7.asc',
      bio_11 = '../data/worldclim_ZA/worldclim_ZA_2070_BC45_bio_11.asc',
      bio_19 = '../data/worldclim_ZA/worldclim_ZA_2070_BC45_bio_19.asc'
    )
  )

ProLau_models_proj_2070_BC45 <- 
  BIOMOD_Projection(
    modeling.output = ProLau_models,
    new.env = bioclim_ZA_2070_BC45,
    proj.name = "2070_BC45",
    binary.meth = "TSS",
    output.format = ".img",
    do.stack = FALSE
  )

ProLau_ensemble_models_proj_2070_BC45 <- 
  BIOMOD_EnsembleForecasting(
    EM.output = ProLau_ensemble_models,
    projection.output = ProLau_models_proj_2070_BC45,
    binary.meth = "TSS",
    output.format = ".img",
    do.stack = FALSE
  )
#绘制预测图像
plot(ProLau_ensemble_models_proj_2070_BC45, str.grep = "EMca|EMwmean")

SRC

#完成物种变化范围图（Species Range Change (SRC) ）
# 加载投影坐标系
ProLau_bin_proj_current <- 
  stack( 
    c(
      ca = "Protea.laurifolia/proj_current/individual_projections/Protea.laurifolia_EMcaByTSS_mergedAlgo_mergedRun_mergedData_TSSbin.img",
      wm = "Protea.laurifolia/proj_current/individual_projections/Protea.laurifolia_EMwmeanByTSS_mergedAlgo_mergedRun_mergedData_TSSbin.img"
    )
  )

ProLau_bin_proj_2050_BC45 <- 
  stack( 
    c(
      ca = "Protea.laurifolia/proj_2050_BC45/individual_projections/Protea.laurifolia_EMcaByTSS_mergedAlgo_mergedRun_mergedData_TSSbin.img",
      wm = "Protea.laurifolia/proj_2050_BC45/individual_projections/Protea.laurifolia_EMwmeanByTSS_mergedAlgo_mergedRun_mergedData_TSSbin.img"
    )
  )

ProLau_bin_proj_2070_BC45 <- 
  stack( 
    c(
      ca = "Protea.laurifolia/proj_2070_BC45/individual_projections/Protea.laurifolia_EMcaByTSS_mergedAlgo_mergedRun_mergedData_TSSbin.img",
      wm = "Protea.laurifolia/proj_2070_BC45/individual_projections/Protea.laurifolia_EMwmeanByTSS_mergedAlgo_mergedRun_mergedData_TSSbin.img"
    )
  )
#当前至2050的SRC变化
SRC_current_2050_BC45 <- 
  BIOMOD_RangeSize(
    ProLau_bin_proj_current,
    ProLau_bin_proj_2050_BC45
  )

SRC_current_2050_BC45$Compt.By.Models

#当前至2070的SRC变化
SRC_current_2070_BC45 <- 
  BIOMOD_RangeSize(
    ProLau_bin_proj_current,
    ProLau_bin_proj_2070_BC45
  )

SRC_current_2070_BC45$Compt.By.Models

ProLau_src_map <- 
  stack(
    SRC_current_2050_BC45$Diff.By.Pixel, 
    SRC_current_2070_BC45$Diff.By.Pixel
  )
names(ProLau_src_map) <- c("ca cur-2050", "wm cur-2050", "ca cur-2070", "wm cur-2070")

my.at <- seq(-2.5, 1.5, 1)
myColorkey <- 
  list(
    at = my.at, #观察颜色变化的区域
    labels = 
      list(
        labels = c("lost", "pres", "abs","gain"), #符号与备注
        at = my.at[-1] - 0.5 #备注的位置
      )
  )

rasterVis::levelplot( 
  ProLau_src_map, 
  main = "Protea laurifolia range change",
  colorkey = myColorkey,
  col.regions=c('#f03b20', '#99d8c9', '#f0f0f0', '#2ca25f'),
  layout = c(2,2)
)

cSRC

#计算SRC分布概率的分层密度
#参考项目
ref <- subset(ProLau_bin_proj_current, "ca")

#选择研究内容
mods <- c("GLM", "GBM", "RF", "GAM")
data_set <- c("PA1", "PA2")
cv_run <- c("RUN1", "RUN2", "Full")

#构建（模型/对象）组合
groups <- 
  as.matrix(
    expand.grid(
      models = mods, 
      data_set = data_set, 
      cv_run = cv_run,
      stringsAsFactors = FALSE)
  )

#加载我们计算出的投影
all_bin_proj_files <- 
  list.files( 
    path = "Protea.laurifolia",  
    pattern = "_TSSbin.img$",
    full.names = TRUE, 
    recursive = TRUE
  )

#当前与2070年对比（删除了当前和2050年的预测）
current_and_2070_proj_files <- grep(all_bin_proj_files, pattern="2070", value=T)

#只保留与我们选择的镶嵌面组匹配的投影
selected_bin_proj_files <- 
  apply(
  groups, 1, 
    function(x){
      proj_file <- NA
      match_tab <- sapply(x, grepl, current_and_2070_proj_files)
      match_id <- which(apply(match_tab, 1, all))
      if(length(match_id)) proj_file <- current_and_2070_proj_files[match_id]
      proj_file
    }
  )

#删除不匹配的组
to_remove <- which(is.na(selected_bin_proj_files))
if(length(to_remove)){
  groups <- groups[-to_remove,]
  selected_bin_proj_files <- selected_bin_proj_files[-to_remove]
}

#生成选定投影的堆栈图
proj_groups <- stack(selected_bin_proj_files)

ProbDensFunc(
  initial = as.vector(ref),
  projections = raster::as.matrix(proj_groups),
  groups = t(groups),
  plothist = TRUE,
  resolution = 10,
  cvsn = FALSE
)

dis modelcom

Multiple variables model fitting and prediction

## setup environment ----
setwd('workdir')

## load the required packages
library(biomod2)
library(raster)
library(rasterVis)
library(gridExtra)
library(reshape2)

## read data ----
## species occurences data
data <- read.csv('../data/Larus_occ.csv', stringsAsFactors = FALSE)
head(data)
table(data$species)
spp_to_model <- unique(data$species)

## curent climatic variables
stk_current <- 
  raster::stack(
    c(
      bio_1 =  "../data/worldclim_EU/worldclim_EU_bio_1.tif",
      bio_12 = "../data/worldclim_EU/worldclim_EU_bio_12.tif",
      bio_8 =  "../data/worldclim_EU/worldclim_EU_bio_8.tif"
    ),
    RAT = FALSE
  )

## 2050 climatic variables
stk_2050_BC_45 <- 
  raster::stack(
    c(
      bio_1 =  "../data/worldclim_EU/worldclim_EU_2050_BC45_bio_1.tif",
      bio_12 = "../data/worldclim_EU/worldclim_EU_2050_BC45_bio_12.tif",
      bio_8 =  "../data/worldclim_EU/worldclim_EU_2050_BC45_bio_8.tif"
    ),
    RAT = FALSE
  )

## 2070 climatic variables
stk_2070_BC_45 <- 
  raster::stack(
    c(
      bio_1 =  "../data/worldclim_EU/worldclim_EU_2070_BC45_bio_1.tif",
      bio_12 = "../data/worldclim_EU/worldclim_EU_2070_BC45_bio_12.tif",
      bio_8 =  "../data/worldclim_EU/worldclim_EU_2070_BC45_bio_8.tif"
    ),
    RAT = FALSE
  )


## build species modelling wrapper ----
biomod2_wrapper <- function(sp){
  cat("\n> species : ", sp)
  
  ## get occurrences points
  sp_dat <- data[data$species == sp, ]
  
  ## formating the data
  sp_format <- 
    BIOMOD_FormatingData(
      resp.var = rep(1, nrow(sp_dat)), 
      expl.var = stk_current,
      resp.xy = sp_dat[, c("long", "lat")],
      resp.name = sp,
      PA.strategy = "random", 
      PA.nb.rep = 2, 
      PA.nb.absences = 1000
    )
  
  ## print formatting summary
  sp_format
  
  ## save image of input data summary
  if(!exists(sp)) dir.create(sp)
  pdf(paste(sp, "/", sp ,"_data_formated.pdf", sep="" ))
  try(plot(sp_format))
  dev.off()
  
  ## define models options
  sp_opt <- BIOMOD_ModelingOptions()
  
  ## model species
  sp_model <- BIOMOD_Modeling( 
    sp_format, 
    models = c('GLM', 'FDA', 'RF'), 
    models.options = sp_opt, 
    NbRunEval = 2, 
    DataSplit = 70, 
    Yweights = NULL, 
    VarImport = 3, 
    models.eval.meth = c('TSS', 'ROC'),
    SaveObj = TRUE,
    rescal.all.models = FALSE,
    do.full.models = FALSE,
    modeling.id = "demo2"
  )
  
  ## save some graphical outputs
  #### models scores
  pdf(paste0(sp, "/", sp , "_models_scores.pdf"))
  try(gg1 <- models_scores_graph(sp_model, metrics = c("TSS", "ROC"), by = 'models', plot = FALSE))
  try(gg2 <- models_scores_graph(sp_model, metrics = c("TSS", "ROC"), by = 'data_set', plot = FALSE))
  try(gg3 <- models_scores_graph(sp_model, metrics = c("TSS", "ROC"), by = 'cv_run', plot = FALSE))
  try(grid.arrange(gg1, gg2, gg3))
  dev.off()
  
  ## build ensemble models
  sp_ens_model <- 
    BIOMOD_EnsembleModeling(
      modeling.output = sp_model,
      em.by = 'all',
      eval.metric = 'TSS',
      eval.metric.quality.threshold = 0.4,
      models.eval.meth = c('TSS','ROC'),
      prob.mean = FALSE,
      prob.mean.weight = TRUE,
      VarImport = 0
    )
  
  ## do projections
  proj_scen <- c("current", "2050_BC_45", "2070_BC_45")
  
  for(scen in proj_scen){
    cat("\n> projections of ", scen)
    
    ## single model projections
    sp_proj <- 
      BIOMOD_Projection(
        modeling.output = sp_model,
        new.env = get(paste0("stk_", scen)),
        proj.name = scen,
        selected.models = 'all',
        binary.meth = "TSS",
        filtered.meth = NULL,
        compress = TRUE,
        build.clamping.mask = FALSE,
        do.stack = FALSE,
        output.format = ".img"
      )
    
    ## ensemble model projections
    sp_ens_proj <- 
      BIOMOD_EnsembleForecasting(
        EM.output = sp_ens_model,
        projection.output = sp_proj,
        binary.meth = "TSS",
        compress = TRUE,
        do.stack = FALSE,
        output.format = ".img"
      )
  }
  
  return(paste0(sp," modelling completed !"))
}


## launch the spiecies modelling wrapper over species list ----
if(require(snowfall)){ ## parallel computation
  ## start the cluster
  sfInit(parallel = TRUE, cpus = 5) ## here we only require 4 cpus
  sfExportAll()
  sfLibrary(biomod2)
  ## launch our wrapper in parallel
  sf_out <- sfLapply(spp_to_model, biomod2_wrapper)
  ## stop the cluster
  sfStop()
} else { ## sequencial computation
  for (sp in spp_to_model){
    biomod2_wrapper(sp)
  }
  ## or with a lapply function in sequential model
  ## all_species_bm <- lapply(spp_to_model, biomod2_wrapper)
}

## produce alpha-diversity maps ----

## current conditons
### load binary projections
f_em_wmean_bin_current <- 
  paste0(
    spp_to_model,
    "/proj_current/individual_projections/", 
    spp_to_model, "_EMwmeanByTSS_mergedAlgo_mergedRun_mergedData_TSSbin.img"
  )

### sum all projections
if(length(f_em_wmean_bin_current) >= 2){
  ## initialisation
  taxo_alpha_div_current <- raster(f_em_wmean_bin_current[1]) 
  for(f in f_em_wmean_bin_current[-1]){
    taxo_alpha_div_current <- taxo_alpha_div_current + raster(f)
  }
}

## 2050 conditons
### load binaries projections
f_em_wmean_bin_2050 <- 
  paste0(
    spp_to_model,
    "/proj_2050_BC_45/individual_projections/", 
    spp_to_model, "_EMwmeanByTSS_mergedAlgo_mergedRun_mergedData_TSSbin.img"
  )

### sum all projections
if(length(f_em_wmean_bin_2050) >= 2){
  ## initialisation
  taxo_alpha_div_2050 <- raster(f_em_wmean_bin_2050[1]) 
  for(f in f_em_wmean_bin_2050[-1]){
    taxo_alpha_div_2050 <- taxo_alpha_div_2050 + raster(f)
  }
}

## 2070 conditons
### load binaries projections
f_em_wmean_bin_2070 <- 
  paste0(
    spp_to_model,
    "/proj_2070_BC_45//individual_projections/", 
    spp_to_model, "_EMwmeanByTSS_mergedAlgo_mergedRun_mergedData_TSSbin.img"
  )

### sum all projections
if(length(f_em_wmean_bin_2070) >= 2){
  ## initialisation
  taxo_alpha_div_2070 <- raster(f_em_wmean_bin_2070[1]) 
  for(f in f_em_wmean_bin_2070){
    taxo_alpha_div_2070 <- taxo_alpha_div_2070 + raster(f)
  }
}

## plot the alpha-div maps
levelplot(
  stack(
    c(
      current = taxo_alpha_div_current, 
      in_2050 = taxo_alpha_div_2050, 
      in_2070 = taxo_alpha_div_2070 
    )
  ),
  main = expression(paste("Larus ", alpha, "-diversity")),
  par.settings = BuRdTheme
)

Species migration model: KissMig

KissMig package

KissMig意思就是“Keep it simple species migration model”，这是一个简单的基于栅格图像的物种迁移模型，它可以非常简单快速地模拟物种的迁移。

kissmig()

kissmig运行一个简单的、基于栅格的随机迁移模型来模拟物种迁移和范围转移。它使用地理区域的起源和适宜性地图迭代运行一个简单的3x3单元算法。特别是，它允许生成可访问性地图，以便在物种分布模型中轻松整合有限的迁移。

kissmig(O, S = NULL, it, type = 'FOC', signed = FALSE, pext = 1.0, pcor = 0.2, seed = NULL)

其中： O为地理起源的单一栅格图像； S为适宜性图像，可以是单一的栅格，也可以是堆栈的栅格图像，或者是别的什么； it为迭代的次数； type为输出结果的类型，‘DIS’为最终的分布图，'FOC'是迭代至第一次出现的情景，'LOC'是迭代至最后一次出现时的情景，'NOC'是出现的迭代次数； signed如果为TRUE，则符号指示单元在最后的迭代步骤后是定植(+)还是非定植(-)； pext在0与1之间，一个定殖细胞在迭代步骤之间变为非定殖，则物种局部灭绝； pcor在0与1之间，在3x3单元邻域中考虑对角的单元； seed用于设置随机数生成器种子的整数。

从最初的O开始，kissmig在由适合性层特征的异类环境中模拟it迭代步骤的迁移S。原点O的定植单元值为1，未定植细胞值为0。如果“S”包含几个适合性层以覆盖环境变化，则it应用于每一层。适宜性在0(不合适)和1(适宜性最大)之间。kissmig使用3x3算法进行物种传播/迁移。所有的细胞在以pext为概率的迭代步骤前都得到扩增，对于一个再克隆或新的定植事件，3x3邻域内的对角单元以pcor为概率考虑(pcor=0.2产生更真实的循环扩散模式——参见Nobis & Normand 2014)。对于运行时优化，signed=TRUE会生成带符号的结果，也就是说，在结果类型“FOC”、“LCO”或“NOC”的基础上，符号表示最终的分布('DIS')，其中正的值是已定植的，负的值是已定植但在最后一步迭代后未殖民化的。为了得到可重复的结果，可以使用“seed”参数设置R随机数生成器的种子。

kissmigAccess()

从kissmig中获取可访问的地图，kissmigAccess从第一次出现的kissmig输出计算可访问性映射(type = ‘FOC’)。这些地图允许在物种分布模型和宏观生态分析中整合有限的迁移。

kissmigAccess(grd, rel=FALSE)

grd是由kissmig成的第一次出现的单个光栅层 rel如果为TRUE，kissmigAccess返回最大值为1的相对值，否则返回绝对积分值。

第一次出现的kissmig映射显示栅格单元的第一个迭代步骤的值。早期定殖细胞值低，晚期定殖细胞值高。这些值与可及性正好相反，可及性在早期定植的单元中较高，在晚期定植的单元中较低。kissmigAccess只是将每个单元格的可访问性计算为单元格值与max(grd)+1之间的差值。从未被定植的单元保持不变(值为0)。

Start using KissMig

#  Michael Nobis/WSL (michael.nobis@wsl.ch), December 2014
  library(kissmig)  # loads also raster package as dependency
# 生成/下载一些地图以供以后使用
  ## 得到有着年平均正态分布的适宜性图
  ## 基于worldclim的气温图
  s <- kissmigDummyS(mean=12,sd=3, download=TRUE)
  ## 并绘制出来
  plot(s, asp=1.0, main='simple climate suitability map')

  ## 定义一个起源
  o <- kissmigOrigin(s, 8, 44.5, 0.5)
  plot(o  , asp=1.0, main='origin')
  plot(s+o, asp=1.0, main='suitability + origin')

  ## 定义土地掩膜
  landMask <- s>=0
  plot(landMask, asp=1.0, main='land mask')

kissmig1

# 运行kissmig并输出各种东西

  k <- kissmig(o, s, it=150, type='DIS')
  plot(k*landMask, asp=1.0, main='final distribution')

  k <- kissmig(o, s, it=150, type='FOC')
  plot(k*landMask, asp=1.0, main='first iteration step of occurrence')

  a <- kissmigAccess(k)
  plot(a*landMask, asp=1.0, main='accessibility based on "FOC", absolute values')

  a <- kissmigAccess(k, rel=TRUE)
  plot(a*landMask, asp=1.0, main='accessibility based on "FOC", relative values')

  k <- kissmig(o, s, it=150, type='LOC')
  plot(k*landMask, asp=1.0, main='last iteration step of occurrence')
  
  k <- kissmig(o, s, it=150, type='NOC')
  plot(k*landMask, asp=1.0, main='number of iteration steps with occurrences')

# 通过设置随机数生成器的种子来控制随机性

  ## 带有随机差异的两次运行
  k1 <- kissmig(o, s, it=150, type='DIS') # two final distributions with
  k2 <- kissmig(o, s, it=150, type='DIS') # same origin and suitability map
  plot(k1*landMask, asp=1.0, main='final distribution - no seed set - run1')
  plot(k2*landMask, asp=1.0, main='final distribution - no seed set - run2')
  plot(abs(k1-k2)*landMask, asp=1.0, main='differences between run1 and run2')

  ## 无随机差异的两次运行
  k1 <- kissmig(o, s, it=150, type='DIS', seed=123)
  k2 <- kissmig(o, s, it=150, type='DIS', seed=123)
  plot(k1*landMask, asp=1.0, main='final distribution - seed set - run1')
  plot(k2*landMask, asp=1.0, main='final distribution - same seed set - run2')
  plot(abs(k1-k2)*landMask, asp=1.0, main='stochastic differences between run1 and run2')

# 创建一个“签名”结果;符号返回最终的分布'DIS'
# creating a "signed" result; the sign returns the final distribution 'DIS' 
# 如果与“FOC”、“LOC”或“NOC”组合;用于运行时优化
# if combined with 'FOC', 'LOC', or 'NOC'; useful for runtime optimization

  k <- kissmig(o, s, it=200, type='FOC', signed=TRUE)
  plot(    k   *landMask, asp=1.0, main='signed "FOC" call, raw output')

  ## abs(k): same as type='FOC' and signed=FALSE
  plot(abs(k  )*landMask, asp=1.0, main='signed "FOC" call, absolute values')

  ## extract information of final distribution 'DIS'
  plot(abs(k>0)*landMask, asp=1.0, main='signed "FOC" call, values > 0')

  ## extract cells with 'FOC' information but no final occurrences
  plot(abs(k<0)*landMask, asp=1.0, main='signed "FOC" call, values < 0')

# --------------------------------------------------------------------------
# 使用kissmig对许多适宜性地图进行运算; 对观察变化很有用 
# 物种传播或繁殖期间的环境

  ## 单一适宜性地图显示气候变暖
  s1 <- kissmigDummyS(mean=12,sd=3) # 
  s2 <- kissmigDummyS(mean=11,sd=3) # 模拟一种气候变暖的适宜性s1
  s3 <- kissmigDummyS(mean=10,sd=3) # 模拟一种气候变暖的适宜性s2

  ## 构建具有或不具有适应性变化的堆栈数据
  s111 <- stack(s1,s1,s1) # or brick(s1,s1,s1)
  s123 <- stack(s1,s2,s3) # or brick(s1,s2,s3)

  ## 运行kissmig，无论环境是否改变
  k111 <- kissmig(o, s111, it=50, type='NOC', seed=123) # 3层=150次迭代
  k123 <- kissmig(o, s123, it=50, type='NOC', seed=123) #总步骤

  ## kissmig results
  plot(k111*landMask, asp=1.0, main='3 suitability layers, 50 it./layer, no env. changes')
  plot(k123*landMask, asp=1.0, main='3 suitability layers, 50 it./layer, climate warming')

  ## 在传播过程中随着气候变暖而变化
  gain <- (k123 >0) * (k111==0)
  same <- (k123 >0) * (k111 >0)
  loss <- (k123==0) * (k111 >0)
  plot((loss+(2*same)+(3*gain))*landMask, 
    col=colorRampPalette(c(grey(0.9),"red",grey(0.6),"green"))(4), asp=1.0, 
    main='differences between spread patterns under climate warming', legend=FALSE)
  legend("topleft", legend=c('gain', 'no change', 'loss'), pch=15, col=c('green', grey(0.6), 'red') )

kissmig2

# 探索地理空间
# example: 传播模式的大小=100:来自不同起源
# remark: 运行得慢主要是因为绘制地图较慢

  # 适宜性图和结果图
  par(mfrow=c(1,1))
  s <- kissmigDummyS(mean=14,sd=3)
  plot(s, asp=1.0, main='simple climate suitability map')
  res <- s>9999 # a map to store results

  # 探索空间
  par(mfrow=c(2,1))
  for (y in (15:17)*2.5) {
    for (x in (-4:11)*2.5) {
      
      # 定义起源并运行kissmig
      o <- kissmigOrigin(s, x, y, 2.5)
      k <- kissmig(o, s, type='FOC', it=100)
      
      # 保存扩散模式的大小(定植细胞的数量)  
      # 在单元格或结果映射(res)的原始数据(origin)中
      values(res)[values(o)==1] <- sum(values(k)>0)

      # 除了面积，这里可以添加任何评估过程，例如SDM框架中的可访问性性能
      
      # 传播模式与结果图
      plot(k*landMask, asp=1.0, main='spread pattern from single region')
      plot(res*landMask, asp=1.0, main='size of spread pattern (number of colonized cells)')
      
    }
  }

# EOF - end of file

Taxonomic data processing based on R

Introduction

The most important aspect of macroecology is data collection and processing. We often need to obtain the traits and habitat environment of various species from databases and literature. However, an important prerequisite for conducting such research is the filter species lists because the species dataset we use may include a large number of synonyms, variants, or subspecies. Belowing two software packages can be used for species list verification.

Introduction
SP2000 package
- Install SP2000
- Query species data
TPL package
- Download TPL from github
- Query species data

SP2000 package

Install SP2000

SP2000 is a package developed by Yunnan University based on the Species 2000 China Node (a database that records the proper names, synonyms, and Chinese names of almost all species in China). To use this software package, you must first go to SP2000 to rigister，then you can get a free API Key that can query 2000 queries per day. You can use various functions of the SP2000 package after obtaining this API Key.

# install
install.packages("SP2000")
# load it
library ("SP2000")
# Need API Key
set_search_key("...", db = "sp2000")

How to query： Liuyong Ding, Hao Li, Juan Tao, Jinlong Zhang, Minrui Huang, Ke Yang, Jun Wang, Chengzhi Ding, Daming He. SP2000: An open-sourced R package for querying the Catalogue of Life [J]. Biodiv Sci, 2021, 29(1): 118-122.

Query species data

# based on family
familyid <- search_family_id(query = "Anguillidae")
str(familyid)
# Next, the species ID can be obtained through the family ID
taxonid1 <- search_taxon_id(query = familyid$Anguillidae$data$record_id,name = "familyID")
# You can also search for detailed species information through scientific names
search_checklist(query = "Pinus yunnanensis")
# Obtain a global dataset
globaldata <- get_col_global(query = "Pinus yunnanensis", option = "name")
# Obtain a synonyms
find_synonyms("Pinus yunnanensis")
# Obtain species in redlist
get_redlist_china(query = "Pinus", option = "Scientific Names")

TPL package

The TPL package verifies the scientific names of species based on the Plant List 1.1 database. We can also manually download other databases and replace the original data in the TPLdata package to achieve other results.

Download TPL from github

# install from github
install.packages("devtools")
install.packages("shiny")
library("devtools")
# install relative data
install_github("gustavobio/tpldata")
install_github("gustavobio/tpl")
library(tpl)
library(tpldata)

Query species data

# based on scientific name
tpl.get("Miconia albicans")
#             id          family   genus  species
# 1 tro-20300135 Melastomataceae Miconia albicans
#   infraspecific.rank infraspecific.epithet   authorship
# 1                                          (Sw.) Steud.
#   taxonomic.status.in.tpl confidence.level source accepted.id
# 1                Accepted                M    TRO            
#               name note  original.search
# 1 Miconia albicans      Miconia albicans
# Function to remove strange spaces in a specific name
sp <- trim("Miconia albicans                                          (Sw.) Steud.")
sp
# [1] "Miconia albicans (Sw.) Steud."
# Delete the named person of the species
noauthors(sp)
# [1] "Miconia albicans"
# they have a local website
web.tpl()

Twitter Facebook LinkedIn

Yuheng Chen (Chen)

Introduction to R

Biodiversity

Phylogenetic Diversity

Indicator about phylogenetic diversity

null model of phylogenetic diversity

Biodiversity and ecosystem functions

The impact of land composition and species richness on biomass

The complementarity and selection of biodiversity

Dimensions of Biodiversity and Its Productivity

Using Spectral Data to Investigate Biodiversity

Get band spectrum

Ecological indicators based on remote sensing

Connect plot, traits and environment

RQL analysis

Data exploring

Multiple Generalized Linear Regression (mGLM)

multipleSDM

Forth corner analysis

Variable selection (based on LASSO)

Biogeography

Diversity and distributions

Data preparation and geographic visualization

Range of occurance

Based on the minimum convex polygon (Convex hull)

Boundary based on dynamic Alpha

Biodiversity gradients (based on boundary)

Scale dependence problem

Specie distribution model

Spatial autocorrelation

Species distribution and niche theory

Fitting and prediction of ecological niche model

Bioclim model

Gower model

Support vector machine (SVM)

Generalized Linear Model (GLM)

biomod2 package

Prepare biomod2

Single variable model fitting and prediction

Multiple variables model fitting and prediction

Species migration model: KissMig

KissMig package

kissmig()

kissmigAccess()

Start using KissMig

Taxonomic data processing based on R

Introduction

SP2000 package

Install SP2000

Query species data

TPL package

Download TPL from github

Query species data

Limburg Day Trip Proposal

Future travel plans

One sentence for one city

Python in E…everywhere