Citizen Science in Ecology: can it Compare to Professionally-Gathered Data?

Citizen science is a big deal these days. From ornithology to oncology, astronomy to entomology, there has been a growing awareness over the last decade that citizen scientists – members of the public who volunteer to help out with scientific research, usually by gathering data – can be a powerful asset to professional researchers. Citizen science can give researchers access to armies of assistants who can generate huge datasets that would otherwise be impossible to acquire. It can help stretch budgets by employing workers who are willing to work for free, and it provides a valuable opportunity for outreach and collaboration between the professional scientific community and the public at large. Using volunteers comes with some caveats, though: they tend to have less training, to be less reliable, and sometimes to be less motivated than professional researchers. These challenges, if not adequately addressed, often cast doubt on data gathered through citizen science – and in any case create complications for those who want to employ citizen science effectively in their research.

One major step toward resolving these issues is to quantify the differences between datasets gathered by citizen scientists and those gathered by professional researchers. With that information, scientists who are designing studies can make wise choices and perhaps compensate for any deficiencies or biases that arise from the less-standardized techniques typically employed by citizen scientists. Today’s study tries to get at those questions. It compares a traditional technique for gathering species diversity data with one often employed by citizen science projects, and the results are quite interesting and more than a little unexpected.

Comparing diversity data collected using a protocol designed for volunteers with results from a professional alternative

Holt, B. G., Rioja-Nieto, R., Aaron MacNeil, M., Lupton, J., Rahbek, C. (2013), Comparing diversity data collected using a protocol designed for volunteers with results from a professional alternative. Methods in Ecology and Evolution. doi: 10.1111/2041-210X.12031

A diver participating in a volunteer survey with the Reef Environmental Education Foundation (REEF) in Monterey Bay. Credit: Pete Naylor

Today we will be looking at a marine study, in which researchers compared species diversity data collected through traditional means (professional researchers doing belt transects) with data for the same sites collected by citizen scientists (using a protocol called a roving diver transect – more on the two methods later). The reasons why they wanted to make this comparison are summarized in the introduction to the article:

A key aspect relating to the value of volunteer data is the reliability of data returned from the protocols used to collect it. For studies performed by professional scientists, underwater visual survey protocols are often designed to minimize bias, maximize precision and ensure repeatability. Due to logistical limitations, vast sections of the world’s aquatic ecosystems are rarely, or never, surveyed by professional scientists. The large pool of volunteer enthusiasts has potential to substantially augment the census capabilities of professional researchers. For example, over 8,000 surveys were performed worldwide during 2011 alone by one volunteer organization (R.E.E.F. 2012). Protocols designed for volunteers also attempt to standardize survey efforts, but must balance this requirement against the need to maintain the interest of the public. Whether data produced by such protocols are suitable for comparative studies of biological diversity remains unclear.

A constant challenge in marine science is the sheer logistical difficulty of actually getting information about things that are under water. Conducting ecological research underwater adds a tremendous amount of effort, expense, and time to a process that is rarely easy, cheap, or quick even when performed on dry land. In the past this has severely limited the ability of marine ecologists to do what they do, and if they could leverage an enthusiastic community of amateur divers to supplement their ranks then this would be a major coup. However, before marine researchers can make use of data collected by citizen scientists in their studies, they need to know whether that data is going to be as good as the data that they can gather themselves. There are some reasons to believe that it might not be, but there are some possible advantages as well. Let’s head inside and take a more detailed look.

The first thing to realize here is that we are fundamentally comparing two different protocols. The types of data-gathering protocols that work for professionals don’t always translate well to volunteer efforts, and must often be adapted. As I mentioned above, volunteer researchers typically have much less training than professional researchers. They may be less skilled at identifying species, or they may have less experience with the environment where the research is being conducted. (As it turns out this is not a major concern in this study, but it often is.) They may also be less meticulous about following a protocol that relies on strict adherence to very specific guidelines in order to be effective – for instance, the elephant dung transects that I mentioned in my last post can be corrupted if the surveyors stray even a foot from the central transect line, something which can be difficult to impress upon volunteers. And finally, researchers who wish to employ citizen scientists rely on the goodwill and enthusiasm of their volunteer assistants, and so must balance such concerns as the need to perform sampling at a time when the target organism is active with the need to not ask volunteers to turn up at four in the morning for a survey. (Unless it’s an ornithological survey. Birders are crazy.)

Either way, what this boils down to is that sampling protocols which are appropriate for professional researchers are often inappropriate for volunteers. As mentioned in the passage quoted above, volunteer survey protocols also attempt to standardize the surveying, but a compromise has to be struck. This study compares a protocol commonly used for biodiversity surveying by professional researchers with one that is designed for volunteers, so let’s see what Holt et al. have to say about the chosen protocols and the type of data that they want to gather with them. Buckle up, because this is going to be a long chunk of the paper. There are some technical terms in here, but they’re not complicated and I’ll explain them in a minute.

The techniques chosen for this study represent the most frequently used underwater visual survey methodology in published peer reviewed fish diversity studies (the belt transect) and the Roving Diver Technique (RDT) used by the Reef Environmental Education Foundation (REEF) volunteer fish survey project. … As belt transects are regularly used in professional reef fish diversity studies (Kulbicki et al. 2010), they represent a logical choice with which to compare the performance of the RDT protocol. The extent to which belt transect results are consistent to those produced using RDT protocols is therefore informative regarding the utility of vast amounts of volunteer data that are currently available and collected in the future. The objective of this study is to determine whether the two protocols differ in terms of the α (i.e. within site diversity) and β-diversity (i.e. differentiation between sites) of the communities they record and in their power to detect significant differences in these biodiversity measures between these communities. We also examine how detectability (i.e. probability to detect a species that is present in a surveyed area at the time of survey) varies between protocols, as well variation associated with sites, functional groups, taxonomic groups, survey duration and underwater visibility.

Study design

The study included a total of 144 underwater visual surveys focused on three sites, with a survey site defined by the precise location at which divers entered the water. All sites were close to Long Cay off South Caicos in the Turks & Caicos Islands (Fig. 1). The survey sites were chosen to represent habitats that might be expected to differ in fish diversity. Our study was conducted at sites that appeared to differ in terms of species richness; based on preliminary visual inspection rather than existing survey data to avoid any bias based on similarity of either of our test protocols to protocols used to collect pre-existing data. Site A comprised of primarily bare rock substrata, with very little benthic biota, and was proposed to have low diversity. Site B primarily comprised of sand with abundant soft corals and very low hard coral cover, and was proposed to have intermediate diversity. Site C represented a fairly healthy coral reef site, with relatively high hard coral cover, and was proposed to have high diversity.

Survey methodology

Surveys were completed by two teams of 12 divers over two periods of 2 weeks during the spring and autumn of 2009, with each team responsible for one study period. During each study period, all 12 divers surveyed each of the three sites twice (once using the belt transect protocol, once using the RDT protocol), with the order of surveys alternated among the sites and the protocols used, to address any possible temporal bias in data collection. Sampling effort was identical for each of the two study periods and all data were pooled together (analysis of seasonal trends in these fish communities is not within the scope of this study). For both of the survey protocols tested, dive teams were divided into buddy pairs, with one buddy pair responsible for one survey. Prior to the beginning of the study, all divers completed an intensive fish identification course, which covered over 130 species commonly occurring in the local area. It was rare for surveyors to encounter a species they could not positively identify, and on these occasions, divers took detailed notes on these fish and identified them after returning from the survey trip.

Protocols tested

Belt transect

At each site, a pair of divers conducted three 50 m long transects that were set approximately 50 m apart, parallel to the isobaths [like contour lines on a map, but under water]. For each transect, divers positioned themselves 2.5 m either side of and 2.5 m above a transect line and recorded all fish found within the 5 m wide belt transect. Once the transect line was laid out, divers waited for 1 min to allow the fish to settle before beginning the transect. Divers swam along the transect at a rate of 10 m per minute, therefore taking 5 min to complete each transect. For each species, the total number of individuals seen at each transect was recorded. Data from all three transects completed during one dive were pooled.

Roving diver transect

During these surveys, divers swam throughout a dive site for a period of approximately 45 min and recorded every fish species seen that could be positively identified. The search for fishes began as soon as the diver entered the water. Divers were encouraged to look under ledges and up in the water column. Each recorded species was assigned one of four abundance categories based on how many were seen throughout the dive [single (1); few (2–10); many (11–100) and abundant (> 100)]. For this study, sighting records were used only as presence/absence data, as no diversity metrics are currently available to include such abundance categories. In addition to fish species observations, divers also reported the time, date, bottom time, visibility, average depth, current strength and habitat category for each dive, in accordance with the REEF volunteer fish survey requirements. All RDT survey data were entered into the REEF volunteer survey project database (

Whew, OK. Some summarization is probably in order. The two survey protocols that they used were the belt transect and the roving diver transect. The belt transect is a time-honored survey method in ecology, and it goes a little something like this: You define a straight line through your study site which is the transect line, and get your surveyors to spread out to a pre-determined distance on either side of it. They then walk (or swim, since in this case they were divers) along their paths, maintaining their distance precisely, and note every time their target passes between them. (In this case, their target was any fish and they also noted its species.) You then simply compute the area that they covered (the length of the transect line times the width of the “belt” defined by the two surveyors) and you can start to build a picture of what is present in your study site and at what density. Roving diver transects are a bit less systematic. In an RDT, you basically just let two divers wander around the study site for a pre-determined length of time and find as many different types of fish as they can, noting each species that they find. You don’t have to be a scientist to think of how this could give one a biased picture of what’s present in the study site, but it is hoped that over the course of enough dives those biases will even out.
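The area-and-density arithmetic behind a belt transect can be sketched in a few lines of Python. The transect dimensions (50 m long, 5 m wide, three transects per dive) come from the protocol quoted above; the fish counts and the function names are invented purely for illustration and are not data or code from the study.

```python
def belt_area_m2(length_m, width_m, n_transects=1):
    """Total area swept by one or more identical belt transects."""
    return length_m * width_m * n_transects


def density_per_100_m2(count, area_m2):
    """Individuals per 100 square metres of surveyed area."""
    return 100.0 * count / area_m2


# Dimensions from the quoted protocol: 50 m x 5 m, three transects per dive.
area = belt_area_m2(length_m=50, width_m=5, n_transects=3)

# Hypothetical pooled counts for one dive (made up, not results from the paper).
counts = {"blue tang": 30, "stoplight parrotfish": 12, "queen angelfish": 3}

densities = {species: density_per_100_m2(n, area) for species, n in counts.items()}

print(area)  # 750 square metres surveyed per dive
for species, d in densities.items():
    print(f"{species}: {d:.1f} per 100 m^2")
```

A roving diver survey has no fixed area to divide by, which is one reason its records are easier to treat as presence/absence than as densities.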

The other main terms here are the ones that the researchers used to describe the two types of biodiversity that they wanted to measure: α-diversity (alpha diversity) and β-diversity (beta diversity). α-diversity, often measured simply as “species richness”, is pretty much just a measure of how many species are present at a site. β-diversity is the difference in the types of species that are found at two sites – i.e., how many species are found in one place and not the other, a way of seeing how much variation there is from place to place. Speaking of study sites, the researchers had three, all fairly close together. They picked three spots near South Caicos Island, in the Turks and Caicos Islands north of Hispaniola, which they thought would have different levels of α-diversity and would therefore provide a chance to try the protocols in a few different environments, as well as hopefully generating some nice β-diversity data.
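To make the two terms concrete, here is a minimal Python sketch using presence/absence species lists. The species lists are hypothetical, and the β-diversity measure used here (Jaccard dissimilarity) is just one common choice that I am assuming for illustration; it is not necessarily the metric Holt et al. used.

```python
def alpha_diversity(species_seen):
    """Species richness: the number of distinct species recorded at a site."""
    return len(species_seen)


def jaccard_dissimilarity(site_1, site_2):
    """Share of the combined species list that is NOT common to both sites.

    0.0 means identical species lists; 1.0 means no species in common.
    """
    combined = site_1 | site_2
    if not combined:
        return 0.0  # two empty lists are treated as identical
    shared = site_1 & site_2
    return 1 - len(shared) / len(combined)


# Hypothetical presence/absence records (invented, not data from the study).
site_a = {"blue tang", "sergeant major", "bluehead wrasse"}
site_c = {
    "blue tang",
    "bluehead wrasse",
    "queen angelfish",
    "stoplight parrotfish",
    "french grunt",
}

alpha_a = alpha_diversity(site_a)
alpha_c = alpha_diversity(site_c)
beta_ac = jaccard_dissimilarity(site_a, site_c)

print(alpha_a, alpha_c)
print(round(beta_ac, 3))
```

Because both functions only need to know which species were seen, they work equally well on belt transect counts and on roving diver records once everything is reduced to presence/absence.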

Long Beach, South Caicos Island. Can’t imagine why anyone would ever want to do research in a place like that.

They did their sampling (an equal amount for each protocol) and got their data. Holt et al. don’t do the best job of presenting the bones of that data in a really easy-to-grasp format, so I’ll just pull out the highlights here. The roving diver transect came up with rarefied species richness totals of 119, 124.1, and 130.2 for sites A, B, and C respectively. (Rarefied species totals are richness figures standardized to a common amount of sampling effort, so that sites and methods can be compared on an equal footing.) The belt transect surveys delivered rarefied totals of 68, 77.7, and 74.3 species for sites A, B, and C respectively. These results may appear quite different, but they’re actually not different in a statistically significant way for this study, and so we can conclude that the two methods provide at least roughly similar results, though the roving diver transects do appear more sensitive. As far as assessing α-diversity goes, both the traditional technique (belt transects) and the citizen science protocol (roving diver transects) appear more or less equally valid.
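For readers who want to see what rarefaction actually computes, here is a rough Python sketch of the classic individual-based version (Hurlbert's formula): the number of species you would expect to see if you drew a smaller random sample of individuals from the full count. This is an illustration under my own assumptions, not the paper's analysis. It needs abundance counts, whereas the roving diver records were used only as presence/absence, so the study presumably relied on a survey-based variant that is not shown here. The counts below are invented.

```python
from math import comb


def rarefied_richness(abundances, sample_size):
    """Expected number of species in a random subsample of `sample_size`
    individuals, drawn without replacement (Hurlbert's formula)."""
    total = sum(abundances)
    if not 0 <= sample_size <= total:
        raise ValueError("sample_size must be between 0 and the total count")
    all_draws = comb(total, sample_size)
    # For each species: 1 minus the chance that the subsample misses it entirely.
    return sum(
        1 - comb(total - n_i, sample_size) / all_draws for n_i in abundances
    )


# Hypothetical counts for five species at one site (not data from the study).
counts = [50, 30, 15, 4, 1]

full = rarefied_richness(counts, sum(counts))  # whole sample: every species counted
partial = rarefied_richness(counts, 20)        # standardized to 20 individuals

print(full)
print(round(partial, 2))
```

Standardizing every site to the same subsample size is what lets richness figures from unequal amounts of effort be compared fairly.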

Things are a bit different where β-diversity is concerned. The belt transect surveys did a considerably better job of detecting differences between the compositions of the sites, finding significant relationships that the roving diver transects missed. In this respect, professionally-executed sampling protocols seem to have the edge over those designed for volunteer surveyors.

This is valuable data to have, as it can help researchers choose their protocols in the future, and provides empirical backing for scientists who might wish to use data gleaned from roving diver transects to infer α-diversity in future studies. It helps to cement the legitimacy of citizen science by clarifying the ways in which volunteer-based research is (and is not) equivalent to research done by professionals. In fact, in one respect – sensitivity – the volunteer-optimized protocol may well have been superior to the standard.

I hope that studies like this will encourage even more scientists to consider including citizen science-based elements in their work. It has long been a personal belief of mine that the scientific enterprise works best when it is conducted in close cooperation with the public, and when the public has access to the scientific process at all stages – from planning and execution to the final published results. The more we learn about how citizen scientists can perform effective research, the more we can strengthen the bonds between the scientific community and the community at large.

What did you think of this study? Do you have questions about the researchers’ methods, or their results? Have you ever participated in a citizen science project yourself, or used one in your research? Let us know in the comments! We’d love to hear from you.

One comment on “Citizen Science in Ecology: can it Compare to Professionally-Gathered Data?”

  1. Henry Valz says:

    Would LOVE to compare some dives with closed circuit rebreathers. Two scientists from East Carolina University did a study probably 6-7 years ago on the topic, showing a large effect of gear.
