From bio-ontologies to academic lives: What studying biocuration can tell us about the conditions of academic work

SARAH R. DAVIES

When I arrived at the Käte Hamburger Kolleg in February 2024, my plan was to study bio-ontologies: the systems that are used to categorise and organise biological data. As a Science and Technology Studies (STS) researcher, I had been interested in biocuration for a while, and one key aspect of biocuration work is developing and applying ontologies. Exploring bio-ontologies would, I thought, give me important insights into the practice of biocuration and what it is doing to our understandings of biology, of the organisms and entities that are studied, and of ‘life’ itself.

Sarah R. Davies

Sarah R. Davies is Professor of Technosciences, Materiality, and Digital Cultures at the Department of Science and Technology Studies, University of Vienna, Austria.
Her work explores the intersections between science, technology, and society, with a particular focus on digital tools and spaces.

I am a social scientist, so delving into the nature of bio-ontologies by looking at natural science and philosophy literature about them was something of a departure for me. What I hadn’t necessarily expected was that doing so would bring me back to more sociological questions, in particular regarding the conditions of academic work. In other words, studying bio-ontologies led me to argue that these systems, which are “axioms that form a model of a portion of (a conceptualization) of reality”[1], are connected to forms of life not just in the context of biological entities, but also with regard to the researchers who create and use them.

Let me rewind a bit. What is biocuration, and what exactly are bio-ontologies? Biocuration is “the process of identifying, organising, correcting, annotating, standardising, and enriching biological data”.[2] Its “primary role … is to extract knowledge from biological data and convert it into a structured, computable form via manual, semi-automated and automated methods.”[3] This is largely done in the context of large data- and knowledgebases (such as FlyBase or UniProt), which are now central to the biosciences. Biocurators work to develop and maintain such databases, for example by reading scientific articles and extracting useful information from them, inputting data into databases, adding metadata and annotating information, and, importantly, creating and using the bio-ontologies I have already mentioned.

Bio-ontologies, then, are a means of classifying and organising biological data. They offer a ‘controlled vocabulary’ (meaning a standardised terminology), but also represent current knowledge about biological entities in that they consist of “a network of related terms, where each term denotes a specific biological phenomenon and is used as a category to classify data relevant to the study of that phenomenon.”[4] Bio-ontologies such as the Gene Ontology therefore offer not only a means of accessing knowledge and data, but also a means of investigating biological phenomena by creating, as noted on the Gene Ontology’s website, “a foundation for computational analysis of large-scale molecular biology and genetics experiments in biomedical research”.
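
For readers who find code easier to think with than prose, the idea of a ‘network of related terms’ used to classify data can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how the Gene Ontology or any real database is implemented: the identifiers (T1 to T3), the gene names, and the helper functions are all invented for the example, and the only relationship modelled is a simple ‘is a kind of’ link between a narrower term and a broader one.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One entry in a toy controlled vocabulary."""
    term_id: str   # hypothetical identifier, not a real ontology ID
    name: str      # the standardised label for the phenomenon
    parents: list = field(default_factory=list)  # IDs of broader terms


# A made-up three-term ontology: each term points to its broader term.
ONTOLOGY = {
    "T1": Term("T1", "biological process"),
    "T2": Term("T2", "metabolic process", parents=["T1"]),
    "T3": Term("T3", "carbohydrate metabolic process", parents=["T2"]),
}

# Hypothetical annotations: data items (invented gene names) filed under terms.
ANNOTATIONS = {
    "geneA": ["T3"],
    "geneB": ["T2"],
    "geneC": ["T1"],
}


def ancestors(term_id, ontology):
    """Return the IDs of all broader terms reachable from term_id."""
    found = set()
    stack = list(ontology[term_id].parents)
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(ontology[current].parents)
    return found


def items_under(term_id, ontology, annotations):
    """Return data items annotated with term_id or with any narrower term."""
    result = []
    for item, term_ids in annotations.items():
        for annotated in term_ids:
            if annotated == term_id or term_id in ancestors(annotated, ontology):
                result.append(item)
                break  # one matching annotation is enough for this item
    return sorted(result)
```

In this sketch, asking for everything under the hypothetical term T2 with `items_under("T2", ONTOLOGY, ANNOTATIONS)` returns both geneA and geneB, even though geneA was only ever annotated with the narrower term T3. That small inference is the point: because the terms are related to one another, classifying a data item under one term also says something about where it sits in the wider picture.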

AI-generated picture of a network by Pixapay.

As I looked into the nature of bio-ontologies, it became clear to me that these organisational systems for biodata are hugely important. They allow researchers in the biosciences to access current knowledge and relevant data (not always easy in the midst of a ‘data deluge’), but they also have epistemic significance. As Sabina Leonelli writes, bio-ontologies “constitute a form of scientific theorizing that has the potential to affect the direction and practice of experimental biology.”[5] The development and application of ontologies to biological data thus renders the contemporary biosciences thinkable, capturing the current state of the art and allowing researchers to extrapolate from that.

Given this significance, it is perhaps somewhat surprising that biocuration, as an area of science, often goes unnoticed by its users and by research funders. As one biocurator told me:

…we are in the background. Even researchers who heavily use these resources [databases], don’t usually know our names and don’t think about us existing. But they love the resource. And that’s actually something we’ve gotten with the booth when we were at conferences. People will come up and be like, oh you are the [resource]! Wow, you are good, awesome. They are kind of shocked that there’s humans there.[6]

Biocurators are not only ‘in the background’; they also frequently struggle to get sustained funding for their work, and generally need to build careers through a series of temporary contracts. Perhaps because databases are machine-readable and can be queried automatically, both funders and the researchers who use curated resources often seem to imagine that the work of biocuration can be readily carried out through automated means. In practice, while biocurators make use of automated tools such as text-mining, interpreting scientific literature and annotating data is a highly skilled activity that cannot be easily replicated by AI or other technologies.

Why is biocuration so under-valued despite its epistemic importance? One answer is that biocuration does not fit well with current systems of reward and evaluation within academia. Researchers are, for instance, rewarded for publishing frequently and in high-profile journals, but biocurators produce outputs other than journal articles: the data- and knowledgebases that they work on. Similarly, gaining research funding is typically seen as a sign of a successful academic, but biocurators’ work does not fit well into the categories that funders use to assess research quality (such as novelty). As Ankeny and Leonelli explain:

Value in science (be it of individual researchers or particular research projects) is largely calculated on the basis of the number of publications produced, the quality of the journals in which those publications appeared, and the impact of the publications as measured by citation indices and other measures: given that [data] donation and curation are still largely unrecognized, the value of these activities correspondingly is limited in part because it cannot be measured using traditional metrics.[7]

Studying bio-ontologies thus led me to consider the lives of their creators, and the conditions under which they work. Despite the epistemic significance of biocuration, it escapes recognition under contemporary ways of crediting and rewarding academic work – something which seems to me to be deeply unfair. Perhaps, then, we need to find new ways of valuing, funding, and rewarding the wide variety of epistemic contributions made within research, rather than relying on metrics such as number of publications and citations as the key means of assessing research?


References

[1] Bodenreider, Olivier, and Robert Stevens. 2006. “Bio-ontologies: current trends and future directions.” Briefings in Bioinformatics 7 (3): 256–74. https://doi.org/10.1093/bib/bbl027.
[2] Tang, Y. Amy, Klemens Pichler, Anja Füllgrabe, Jane Lomax, James Malone, Monica C. Munoz-Torres, Drashtti V. Vasant, Eleanor Williams, and Melissa Haendel. 2019. “Ten quick tips for biocuration.” PLoS Computational Biology 15 (5): e1006906. https://doi.org/10.1371/journal.pcbi.1006906.
[3] Quaglia, Federica, Rama Balakrishnan, Susan M Bello, and Nicole Vasilevsky. 2022. “Conference report: Biocuration 2021 Virtual Conference.” Database 2022 (January): baac027. https://doi.org/10.1093/database/baac027.
[4] Leonelli, Sabina. 2012. “Classificatory Theory in Data-intensive Science: The Case of Open Biomedical Ontologies.” International Studies in the Philosophy of Science 26 (1): 47–65. https://doi.org/10.1080/02698595.2012.653119.
[5] Ibid.
[6] Davies, Sarah R., and Constantin Holmer. 2024. “Care, collaboration, and service in academic data work: biocuration as ‘academia otherwise.’” Information, Communication & Society 27 (4): 683–701. https://doi.org/10.1080/1369118X.2024.2315285.
[7] Ankeny, Rachel A., and Sabina Leonelli. 2015. “Valuing Data in Postgenomic Biology: How Data Donation and Curation Practices Challenge the Scientific Publication System.” In Postgenomics: Perspectives on Biology after the Genome, edited by Sarah S. Richardson and Hallam Stevens, 126–49. Duke University Press. https://doi.org/10.1515/9780822375449-008.