Las bases de datos de ciencia ciudadana y profesionales poseen información diferente y complementaria sobre la avifauna. Galván, S., Barrientos, R. & Varela, S. 2021 Ardeola. doi: 10.13157/arla.69.1.2022

Are volunteer-based biodiversity data better or worse than those collected by professionals? Although the sub-title gives a clue, join me as I walk through this work, in which we try to answer this important question.

In recent years, the gathering of biodiversity data has shifted from scientists to other bird enthusiasts. Nowadays, most of the information on the current distribution and abundance of birds is collected by amateur or volunteer birdwatchers. Specifically, so-called ‘citizen science’ is a powerful tool for collecting big data on species occurrences across time and space (Figure 1). However, concerns have been raised about potential biases in these datasets.


Figure 1 Citizen science birdwatchers © Rafael Barrientos.

In our latest paper, we analysed six Spanish bird databases: three professional (Museo Nacional de Ciencias Naturales, MNCN; Estación Biológica de Doñana, EBD; and Inventario Español de Especies Terrestres, IEET), two based on citizen science projects (eBird and Proyecto AVIS) and one mixed (Sociedad Española de Ornitología ringing database, SEO). These databases are freely available in GBIF and other repositories, so everyone can use them for different purposes. We compared the taxonomic, temporal and spatial coverage of the databases. Then, we tested whether bird records differ between databases on the basis of species’ ranges (percentage of occupancy in the Iberian Peninsula), body sizes (weight and wingspan), breeding habitats (farmland, agroforest, forest, scrubland, wetland and cliffs) or conservation status (threatened or not).

We found an important difference in taxonomic and temporal coverage between databases. The MNCN and EBD museum collections date back to the early 1900s and have not been updated, whereas citizen science databases are the most modern and sometimes the only source of information about current species occurrence and abundance. We also detected the taxonomic uniqueness of these museum collections (Figure 2), which has been linked to the narrower geographical coverage of their records. However, although these collections are available in public repositories such as GBIF, important information such as coordinates is not included, which prevents them from being used in many scientific studies. Our results help to understand the potential limitations of these datasets for answering some scientific questions (for example, those related to species’ spatial distributions).

Figure 2 Taxonomic coverage shared between databases. The central number in each circle is the number of species shared between the databases indicated in parentheses.
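
Counting shared species of this kind comes down to set intersections between species lists. The following is a minimal sketch under stated assumptions, not the code used in the paper: the function names and the toy checklists are hypothetical and are not the study's data.

```python
from itertools import combinations


def shared_species(databases):
    """Return {(name_a, name_b): number of species present in both} for every pair."""
    return {
        (a, b): len(databases[a] & databases[b])
        for a, b in combinations(sorted(databases), 2)
    }


def unique_species(databases, name):
    """Return the species recorded in `name` and in no other database."""
    others = set().union(*(s for n, s in databases.items() if n != name))
    return databases[name] - others


# Hypothetical toy checklists, for illustration only.
toy = {
    "eBird": {"Passer domesticus", "Ciconia ciconia", "Aquila adalberti"},
    "MNCN": {"Passer domesticus", "Otis tarda"},
    "SEO": {"Passer domesticus", "Ciconia ciconia"},
}

if __name__ == "__main__":
    for pair, n in shared_species(toy).items():
        print(pair, n)
    print("Only in MNCN:", unique_species(toy, "MNCN"))
```

Species that appear in `unique_species` for one source only are what makes that source taxonomically unique in this simplified view.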

Spatial analysis of the data from citizen science and mixed databases also showed a concentration of records in some Spanish regions. These findings agree with previous studies showing a prevalence of birdwatching in urban areas, areas of high human population density and protected areas. They are also consistent with results showing that most species in all these databases select agroforest systems as breeding habitats. Together, these results suggest that these areas may have been better explored because they are more accessible. Thus, these potential biases in the coverage of open-access data also need to be considered when using them for research.

Regarding species traits, no clear differences were found between professional and citizen databases (Fig. 3). Bird body size is a key predictor of the number of species observations in all databases: bigger species are more frequently recorded, except in the SEO database, where the opposite occurs. This can be explained by the most widespread sampling method used in this database (mist nets), which is biased towards smaller species. On the other hand, species’ geographical range also matters in all databases (species are recorded more frequently because they are common or abundant), and all databases tend to record a similar number of endangered species. The exception to this general trend was the EBD database, with a large proportion of big, endangered, narrow-ranged and aquatic birds (Fig. 3). This database is managed by a research centre in a National Park that has wetlands as one of its main biotopes, which explains the higher representation of these species.

Figure 3 Boxplots for (A) geographical distribution (%), (B) weight (grams) (log-transformed). Professional databases in black, citizen databases in light grey and mixed database in dark grey.

In summary, and in answer to the initial question, our results indicate that none of the databases is perfect. We detected potential biases related to the spatial and temporal coverage of records and to species traits in all databases. Thus, depending on the research question, some databases will be more appropriate than others, although none of this raw information can be used as a realistic sample to understand actual biodiversity patterns. For every data source, whether professional or amateur, it is necessary to consider its history, sampling approaches and potential biases. Although solutions to mitigate these biases have been proposed (for instance, stratified sub-sampling, standardized protocols, filters or other statistical tools), we see a need to develop new analytical methods that allow us to combine the heterogeneous sources of biodiversity data that are already available for research.
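
As an illustration of one of the existing mitigation ideas, stratified sub-sampling, the sketch below caps the number of records kept per grid cell so that heavily visited areas do not dominate. It is a minimal sketch, not a method from the paper: the function name, the record format `(species, lat, lon)`, the 0.5 degree cell size and the cap of two records per cell are all assumptions chosen for the example.

```python
import math
import random
from collections import defaultdict


def subsample_by_cell(records, cell_deg=0.5, max_per_cell=2, seed=0):
    """Keep at most `max_per_cell` randomly chosen records per lat/lon grid cell.

    records: iterable of (species, lat, lon) tuples (hypothetical format).
    """
    rng = random.Random(seed)  # fixed seed so the sub-sample is reproducible
    cells = defaultdict(list)
    for rec in records:
        _, lat, lon = rec
        key = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        cells[key].append(rec)

    kept = []
    for key in sorted(cells):
        group = cells[key]
        if len(group) > max_per_cell:
            group = rng.sample(group, max_per_cell)
        kept.extend(group)
    return kept


# Hypothetical records: five from one well-visited cell, one from a remote cell.
toy_records = [
    ("Passer domesticus", 40.41, -3.70),
    ("Passer domesticus", 40.42, -3.71),
    ("Ciconia ciconia", 40.43, -3.69),
    ("Pica pica", 40.44, -3.72),
    ("Apus apus", 40.40, -3.68),
    ("Platalea leucorodia", 37.10, -6.40),
]

if __name__ == "__main__":
    for rec in subsample_by_cell(toy_records):
        print(rec)
```

Capping per cell evens out sampling effort across space, but it does not address temporal or trait-related biases, which is one reason methods that combine sources are still needed.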

Immersed in this new big data era, we will need to combine useful data collected by both citizen scientists and professional ornithologists, and to overcome their associated biases, to build adequate sets to answer our questions.

Image credit

Top right: Birdwatching © Cirenia Arias.
