Machine Learning

My Reading List: Data Science

Live post with a curated list of high-quality data science posts and videos I found enlightening.

R package collinear

R package for multicollinearity management in data frames with numeric and categorical variables.

R package spatialRF

R package for spatial regression with Random Forest

R package memoria

R package to assess ecological memory in multivariate time-series.

Ecological memory at millennial time‐scales: the importance of data constraints, species longevity and niche features

Paper published in the section "Editor's Choice" of the *Ecography* journal. It received [an award](https://www.dropbox.com/s/oacsy1xqx4omv1b/2019_BMB_Ecography_b_top_downloaded.png?dl=1) for the number of downloads during the 12 months after its publication.

Potential changes in the distribution of Carnegiea gigantea under future scenarios

The goals of this study are to provide a map of actual habitat suitability (1), describe the relationships between abiotic predictors and the saguaro distribution at regional extents (2), and describe the potential effect of climate change on the spatial distribution of the saguaro (3).

Historical anthropogenic footprints in the distribution of threatened plants in China

In this study, for the first time, we linked the distribution of threatened species across China to current and historical changes in human population densities, cropland area, and pasture area since 1700 (at a 100 km × 100 km resolution). We find that variables describing historical changes in human impacts were consistently more strongly associated with proportions of threatened plants than variables describing current changes in human impacts. Notably, threatened plant species in China tend to be concentrated where historical anthropogenic impacts were relatively small, but anthropogenic activities have intensified relatively strongly since 1700.

Supporting underrepresented forests in Mesoamerica

We had three key findings. First, dry forest is the least protected biome in Mesoamerica (4.5% protected), indicating that further action to safeguard this biome is warranted. Secondly, the poor overlap between protected areas and high-value forest conservation areas found herein may provide evidence that the establishment of protected areas may not be fully accounting for tree priority rank map. Third, high percentages of forest cover and high-value forest conservation areas still need to be represented by the protected areas network. Because deforestation rates are still increasing in this region, Mesoamerica needs funding and coordinated action by policy makers, national and local governmental and non-governmental organizations, conservationists and other stakeholders.

Distribution and conservation of the relict interaction between the butterfly Agriades zullichi and its larval foodplant (Androsace vitaliana nevadensis)

Herein we investigate the distribution and conservation problems of a relict interaction in the Sierra Nevada mountains (southern Europe) between the butterfly *Agriades zullichi* —a rare and threatened butterfly— and its larval foodplant *Androsace vitaliana* subsp. *nevadensis*. We designed an intensive field survey to obtain a comprehensive presence dataset. This was used to calibrate species distribution models with absences taken at local and regional extents, analyze the potential distribution, evaluate the influence of environmental factors in different geographical contexts, and evaluate conservation threats for both organisms.

Comparing the performance of species distribution models of Zostera marina: Implications for conservation

A time series of 14-year distribution data of Zostera marina in the Ems estuary (The Netherlands) was used to build different data subsets: (1) total presence area; (2) a conservative estimate of the total presence area, defined as the area which had been occupied during at least 4 years; (3) core area, defined as the area which had been occupied during at least 2/3 of the total period; and (4–6) three random selections of monitoring years. On average, colonized and disappeared areas of the species in the Ems estuary showed remarkably similar transition probabilities of 12.7% and 12.9%, respectively. SDMs based upon machine-learning methods (Boosted Regression Trees and Random Forest) outperformed regression-based methods. Current velocity and wave exposure were the most important variables predicting the species presence for widely distributed data. Depth and sea floor slope were relevant to predict conservative presence area and core area.