Species distribution models (SDMs) are often calibrated using presence-only datasets plagued with environmental sampling bias, which reduces model accuracy. To compensate for this bias, it has been suggested that background data (or pseudoabsences) should represent the area that has been sampled. However, spatially explicit knowledge of sampling effort is rarely available. In multi-species studies, sampling effort has been inferred following the target-group (TG) approach, where the aggregated occurrence of TG species informs the selection of background data. However, little is known about the species-specific response to this type of bias correction.
The present study aims to evaluate the impacts of sampling bias and bias correction on SDM performance. To this end, we designed a realistic system of sampling bias and virtual species based on 92 terrestrial mammal species occurring in the Mediterranean basin. We manipulated presence and background data selection to calibrate four SDM types. Unbiased (unbiased presence data) and biased (biased presence data) SDMs were calibrated using randomly distributed background data. The two remaining SDM types corrected for sampling bias in presence data by selecting background data according to either real or TG-estimated sampling effort.
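As a minimal sketch of the background-selection step described above, the following Python code contrasts randomly distributed background data with background data drawn in proportion to a TG-estimated sampling effort. It is an illustration under stated assumptions, not the procedure used in the study: the grid size, the number of background points, sampling with replacement, and the function names `sample_background` and `target_group_effort` are all hypothetical, and the TG records are simulated.

```python
import numpy as np


def sample_background(effort, n_points, rng):
    """Draw background cells with probability proportional to effort.

    `effort` is a 2-D grid of non-negative sampling-effort values.
    Returns a (rows, cols) tuple of cell indices. Sampling with
    replacement is an assumption made for this sketch.
    """
    effort = np.asarray(effort, dtype=float)
    weights = effort.ravel()
    total = weights.sum()
    if total <= 0:
        raise ValueError("effort must contain at least one positive value")
    flat_idx = rng.choice(weights.size, size=n_points, replace=True,
                          p=weights / total)
    return np.unravel_index(flat_idx, effort.shape)


def target_group_effort(occurrence_cells, shape):
    """Estimate effort as the count of TG occurrence records per cell."""
    rows, cols = occurrence_cells
    effort = np.zeros(shape, dtype=float)
    # np.add.at accumulates correctly when a cell appears more than once
    np.add.at(effort, (rows, cols), 1)
    return effort


rng = np.random.default_rng(0)
shape = (20, 20)  # hypothetical study grid

# Randomly distributed background: every cell is equally likely.
uniform_bg = sample_background(np.ones(shape), 500, rng)

# Simulated TG records, all located in the western half of the grid
# (columns 0 to 9), mimicking spatially biased sampling.
tg_rows = rng.integers(0, 20, size=1000)
tg_cols = rng.integers(0, 10, size=1000)
tg_effort = target_group_effort((tg_rows, tg_cols), shape)

# TG-based background: inherits the spatial bias of the TG records.
tg_bg = sample_background(tg_effort, 500, rng)
```

In this sketch, cells without any TG record have zero weight and can never be selected as background, which is one way an effort-weighted background may behave differently in poorly sampled areas.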
Overall, environmental sampling bias had a deleterious effect on SDM performance. In addition, bias correction improved model accuracy, especially when based on spatially explicit knowledge of sampling effort. However, our results highlight important species-specific variations in susceptibility to sampling bias, which were largely explained by range size: widely distributed species were most vulnerable to sampling bias, and bias correction was even detrimental for narrow-ranging species. Furthermore, spatial discrepancies in SDM predictions suggest that bias correction effectively replaces an underestimation bias with an overestimation bias, particularly in areas of low sampling intensity. Thus, our results call for a better estimation of sampling effort in multispecies systems, and caution against the uninformed and automatic application of TG bias correction.