11/07/2025 | Press release | Distributed by Public on 11/07/2025 09:43
This report presents protocols for modelling vector habitat suitability using presence-absence data derived from sources such as VectorNet and GBIF. These datasets, while extensive, often have spatial gaps and contain few or no true absence records. To address this, the report outlines a comprehensive workflow that generates pseudo-absences and uses environmental unsuitability layers. Covariate datasets, including climatic, land cover, and topographic variables, are used to train machine learning models such as Random Forest and Boosted Regression Trees. The models are run independently and then combined into an ensemble to improve predictive robustness. Emphasis is placed on automation using R's tidymodels framework, enabling reproducible and scalable modelling pipelines. The protocols include detailed steps for data acquisition, covariate extraction, model training, evaluation, and ensemble generation. Expert validation is incorporated to ensure ecological realism and methodological rigour. The goal is to produce spatially explicit habitat suitability maps at resolutions of 1 to 5 km, suitable for surveillance planning and risk assessment. This standardised approach allows for consistent modelling across diverse vector species and geographical contexts, forming a reference methodology for future VectorNet outputs and related public health applications.
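As a rough illustration of two steps named above, pseudo-absence generation and ensembling, the following minimal sketch may help. It is written in plain Python rather than the R tidymodels pipeline the report describes, and everything in it is an assumption made for illustration: the function names, the rule that pseudo-absences must lie a minimum distance from every presence record, and the use of a weighted mean to combine model outputs. The report's actual sampling rules, including how unsuitability layers are applied, are not reproduced here.

```python
import math
import random


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two lon/lat points given in degrees."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def sample_pseudo_absences(presences, bbox, n, min_dist_km, seed=0, max_tries=100_000):
    """Draw up to n random points in bbox, each at least min_dist_km from every presence.

    Hypothetical rule for illustration only. Points are uniform in lon/lat,
    which is not exactly uniform in area; acceptable for a small study region.
    """
    lon_min, lat_min, lon_max, lat_max = bbox
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    points = []
    for _ in range(max_tries):
        if len(points) == n:
            break
        lon = rng.uniform(lon_min, lon_max)
        lat = rng.uniform(lat_min, lat_max)
        if all(haversine_km(lon, lat, p_lon, p_lat) >= min_dist_km
               for p_lon, p_lat in presences):
            points.append((lon, lat))
    return points


def ensemble_mean(predictions, weights=None):
    """Weighted mean of per-model suitability scores.

    predictions: one list of scores per model, all of the same length.
    weights: one non-negative weight per model; equal weights if omitted.
    """
    if weights is None:
        weights = [1.0] * len(predictions)
    if len(weights) != len(predictions):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    return [
        sum(w * scores[i] for w, scores in zip(weights, predictions)) / total
        for i in range(len(predictions[0]))
    ]


if __name__ == "__main__":
    presences = [(10.0, 50.0), (10.4, 50.3)]          # made-up lon/lat records
    absences = sample_pseudo_absences(presences, (9.0, 49.0, 11.0, 51.0),
                                      n=5, min_dist_km=25.0)
    print(absences)
    # Made-up scores standing in for Random Forest and BRT outputs.
    print(ensemble_mean([[0.2, 0.8], [0.4, 0.6]]))
```

A fixed random seed is used so that the pseudo-absence sample is reproducible, in keeping with the report's emphasis on reproducible pipelines. A weighted mean is only one way to combine models and is not necessarily the one the report adopts.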