Cross posted from Northwestern
A new award to Northwestern University Feinberg School of Medicine and the European Organization for Nuclear Research (CERN) will enhance capabilities of data management and sharing for National Institutes of Health-funded researchers through the Generalist Repository Ecosystem Initiative (GREI), led by the NIH Office of Data Science Strategy.
This modernization of the data ecosystem aligns with the NIH Strategic Plan for Data Science and includes search and discovery of NIH-funded data in generalist repositories. The GREI establishes a common set of cohesive and consistent capabilities, services, metrics, and social infrastructure across repositories, and facilitates the adoption of FAIR principles to better share and reuse data.
Zenodo joins the GREI through a partnership between Northwestern University and CERN, led by Kristi Holmes, PhD, director of Galter Health Sciences Library and Learning Center and professor of Preventive Medicine in the Division of Health and Biomedical Informatics, and Tim Smith, PhD, head of IT Communication, Education and Outreach at CERN. The Zenodo GREI team features expertise and leadership from both sites, including Jose Benito Gonzalez Lopez, PhD, head of Institutional Repositories at CERN; Lars Holm Nielsen, InvenioRDM product manager at CERN; Matthew Carson, PhD, senior data scientist and head of Digital Systems at Galter Library; and Sara Gonzales, senior data librarian at Galter Library and community manager for InvenioRDM. Additional team members will be recruited in the coming months.
Since its launch almost 10 years ago, Zenodo has served as an open, dependable home for science, enabling researchers to share and preserve a wide range of interdisciplinary research outputs. Zenodo was established through the European Commission OpenAIRE program and is operated by CERN. Zenodo houses over 2 million records and a petabyte of data, serving 15 million user visits from around the world annually.
Over the past several years, CERN and Northwestern have partnered with the Invenio Open Source Community (IOSC) to develop InvenioRDM, a turnkey, scalable repository platform with a top-of-the-class user experience, forming a strong and sustainable foundation for Zenodo. The InvenioRDM software is dedicated to offering a reliable environment for science, empowering preservation, credit, discovery, and sharing while remaining responsive to the evolving needs of the research community, including data sharing policy compliance.
“Our strong and efficient partnership with Northwestern through the InvenioRDM project has shown how effective we can be with our complementary skills and common goals,” Smith said. “The GREI allows us to take this partnership to the next level in delivering a useful service to NIH-funded researchers. We are excited that the NIH is supporting us and entrusting us with this task.”
The NIH Office of Data Science Strategy, formed in 2018 within the Division of Program Coordination, Planning, and Strategic Initiatives (DPCPSI), leads implementation of the NIH Strategic Plan for Data Science through scientific, technical, and operational collaboration with the institutes, centers, and offices that comprise NIH. DPCPSI also plans and coordinates the NIH Common Fund’s support of trans-NIH initiatives and research.
“Modern research requires collaboration and thoughtful, feature-rich technology for success,” Holmes said. “We’re thrilled to build on our longstanding partnership with CERN to advance our shared commitment to FAIR practices and we look forward to working together and with the GREI partners to achieve the goals of the program.”
The Zenodo GREI project is supported by the NIH Office of Data Science Strategy/Office of the NIH Director pursuant to OTA-21-009, “Generalist Repository Ecosystem Initiative (GREI)” through Other Transactions Agreement (OTA) Number 1 OT2 DB000013-01.
Situated within the Northwestern University Clinical and Translational Sciences (NUCATS) Institute, Galter Library is the only library embedded within a CTSA hub. NUCATS is supported, in part, by the National Institutes of Health’s National Center for Advancing Translational Sciences, Grant Number UL1TR001422.
Zenodo's vision is to enable researchers around the world to share and preserve any research output from any discipline via a seamless user experience. As a side effect, the same features that make it easy for any researcher to share and preserve their research also make it easy for spammers to misuse our service.
As Zenodo grew in popularity, our spam problem grew as well. We firmly believe in the need to make sharing and preserving research data as easy as possible, and thus we have always opted against introducing factors blocking researchers' ability to share and preserve their research instantly. So far, we have been fighting spammers with automated classification systems and manual reviews, yet with every counter-measure we've taken, spammers have adapted their methods.
Today, we're introducing yet another counter-measure to fight spammers. Content from new users will, as of today, be ranked below content from safelisted users. This means that spam will be less visible in all search results, allowing our automated classification system more time to catch the spam. In addition, we will be introducing a human review of all new users uploading content to Zenodo that will allow us to safelist new users and catch spammers. The human review is being introduced as part of our support operations, and we will also go through the backlog of existing users to safelist them. We have seeded the initial safelist with all users who logged in via ORCiD or GitHub, and users with existing uploads accepted in communities.
All in all, if you're an existing Zenodo user who hasn't logged in via ORCiD or GitHub and hasn't had any uploads accepted in a community, your records will appear at the bottom of search results. We will be working on safelisting all existing users as fast as possible.
If you're a new Zenodo user, your uploads will also appear at the bottom of search results until our manual review has safelisted you. We plan to safelist new users at least once a day on business days, but until we have worked through the backlog of existing users there might be a longer delay.
The new feature in no way limits your ability to share and preserve research results. You can still upload your data, software and publications to Zenodo, and get a DOI instantly. The new measures only make sure that spam that makes it past our automated classification system is much less visible in search results until a human review can catch the spammer.
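As a rough illustration only (this is not Zenodo's actual search code), the ranking rule described above can be sketched as a two-part sort key: safelisted owners first, then relevance within each group. The `Record` class, the `rank` function, and the user names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Record:
    title: str
    owner: str        # hypothetical user identifier
    relevance: float  # hypothetical search score, higher is better


def rank(records, safelist):
    """Order records so that safelisted owners come first.

    Nothing is removed: records from users not yet safelisted are
    only pushed below the rest, then sorted by relevance as usual.
    """
    return sorted(
        records,
        key=lambda r: (r.owner not in safelist, -r.relevance),
    )


results = rank(
    [
        Record("Spam", "mallory", 0.9),
        Record("Dataset A", "alice", 0.5),
        Record("Dataset B", "alice", 0.7),
        Record("New", "bob", 0.8),
    ],
    safelist={"alice"},
)
print([r.title for r in results])
# → ['Dataset B', 'Dataset A', 'Spam', 'New']
```

Note that the genuine new user ("bob" here) is demoted together with the spammer until the manual review safelists them, which is exactly the trade-off described above.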
Bionomia launched in August 2018 with the aim of linking natural history specimen records to the people who collected them and/or identified them to species. The two main goals are to:
(1) give credit to and improve the visibility of people who have contributed to the world’s natural history collections (see Thessen et al., 2019; McDade et al., 2011); and,
(2) encourage natural history collections data managers to incorporate these new digital annotations into their source data warehouses, which completes a round-trip of high-quality, curated annotations.
Zenodo is a key piece of infrastructure to help realize the first goal, especially for graduate students and early career researchers who desire a breadth of ways to illustrate their expertise and impact.
Bionomia uses data that is produced by the world’s natural history collections and subsequently shared with the Global Biodiversity Information Facility (GBIF). Linking specimen records to people in this data is challenging, however, because the data exchange standard in use, maintained by Biodiversity Information Standards (TDWG), typically expresses people’s names as free text, with considerable variability in the ordering of name parts, in abbreviations, and in local cultural practices. Through Bionomia, authenticated users actively disambiguate these text-based “agent strings”, as they are commonly called, into Uniform Resource Identifiers (URIs) from Wikidata or ORCID as declarations of unequivocal person identity. Free-text strings for people are thereby enhanced to become uniquely identifiable and thus better participants in the exchange of data according to the FAIR principles (Wilkinson et al., 2016).
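To make the variability problem concrete, here is a toy sketch, not Bionomia's actual matching logic, that reduces a few spellings of one name to a common key and looks up a candidate URI for a human to confirm. The `name_key` and `candidate_uri` functions and the lookup table are hypothetical; the ORCID iD is the public example record for the fictitious researcher Josiah Carberry.

```python
def name_key(agent):
    """Reduce a free-text agent string to (family name, first initial).

    Handles only two layouts: "Family, Given" and "Given Family".
    Real agent strings are far messier, which is why human
    disambiguation is needed.
    """
    agent = agent.strip()
    if not agent:
        return ("", "")
    if "," in agent:
        family, given = (part.strip() for part in agent.split(",", 1))
    else:
        # Separate fused initials such as "J.S." before splitting.
        *given_parts, family = agent.replace(".", ". ").split()
        given = " ".join(given_parts)
    return (family.lower(), given[:1].lower())


# Hypothetical table of already-disambiguated people.
KNOWN = {("carberry", "j"): "https://orcid.org/0000-0002-1825-0097"}


def candidate_uri(agent):
    """Return a candidate URI for review, or None if nothing matches."""
    return KNOWN.get(name_key(agent))


for text in ["Carberry, Josiah S.", "J.S. Carberry", "Josiah Carberry"]:
    print(text, "->", candidate_uri(text))
```

A key this coarse will also collide for different people who share a family name and initial, so it can only suggest candidates; the declaration of identity still has to come from an authenticated user.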
When a user first logs in to Bionomia via OAuth using their ORCID credentials, they are presented with an interface to claim the records of natural history specimens they collected or identified. Over 175M records are downloaded and refreshed from GBIF into Bionomia every few weeks, then processed and pre-indexed so that finding their own records among them, ordinarily a daunting task for any user, becomes a pleasing experience. The reason for this refresh cycle is to keep pace with the continuous activities that occur upstream in the world’s museums and collections in which researchers deposit their physical specimens. Each user has a “Settings & Integrations” section in their Bionomia account where they have the option to archive their data to Zenodo. Behind the scenes, Bionomia makes use of Zenodo’s well-documented REST API: users authenticate using OAuth (most often using their ORCID credentials once again), and Bionomia then auto-deposits versioned archives of their claimed specimen records as both CSV and JSON-LD files. The mechanics of this interaction are made seamless for users (Figures 1–4); they complete the process with a few clicks, and no typing or form submission is required of them. Within moments, they have a new freely accessible (Creative Commons Zero v1.0 Universal) entry in Zenodo of resource type “dataset” with their ORCID iD clearly indicated, a handful of keywords, a formatted description, a title, a referenceable citation (Figures 5 & 6), and a DataCite DOI. Whenever specimen records are newly attributed to or claimed by users who have enabled this integration between Bionomia and Zenodo, their datasets are automatically constructed anew, pushed to Zenodo, and new versions appear (Figure 7). If a user additionally configures their ORCID profile to accept DataCite as a trusted party, their new dataset entry appears in ORCID soon afterward alongside their publications and affiliations (Figure 8).
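For readers curious what such a deposit might look like, the following is a minimal sketch and not Bionomia's actual code. It assumes the deposition endpoint and metadata field names (`upload_type`, `creators`, `license`) from Zenodo's REST API documentation; the helper names, keywords, description text, and the `cc-zero` licence identifier are illustrative assumptions. The request is only prepared, never sent.

```python
import json
import urllib.request

# Assumption: production base URL; Zenodo also offers a sandbox for testing.
ZENODO_API = "https://zenodo.org/api"


def build_metadata(name, orcid, n_records):
    """Build a deposition metadata payload for a claimed-specimens dataset."""
    return {
        "metadata": {
            "upload_type": "dataset",
            "title": "Natural history specimens collected and/or "
                     "identified and deposited",
            "description": f"{n_records} specimen records attributed to {name}.",
            "creators": [{"name": name, "orcid": orcid}],
            "keywords": ["specimen", "natural history"],  # illustrative
            "license": "cc-zero",  # assumed identifier for CC0
        }
    }


def create_deposition_request(token, payload):
    """Prepare (but do not send) the POST that creates a new deposition."""
    return urllib.request.Request(
        f"{ZENODO_API}/deposit/depositions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # OAuth access token
        },
        method="POST",
    )


payload = build_metadata("Carberry, Josiah", "0000-0002-1825-0097", 3)
request = create_deposition_request("EXAMPLE-TOKEN", payload)
print(request.get_method(), request.full_url)
```

In a real integration the prepared request would be sent with `urllib.request.urlopen(request)`, the CSV and JSON-LD files would be uploaded to the new deposition, and the deposition would then be published; those steps are omitted here, and their details should be checked against the current Zenodo API documentation.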
To date, 84 users of Bionomia have integrated their account with Zenodo and are collectively archiving 552,659 specimen records. While this may seem like a small number of active integrations, this is a valuable service for each of them. Many researchers were newly introduced to this novel workflow – archiving specimen-based data is an uncommon practice in our domain – and are now appreciative of Zenodo’s mission, user-friendly design, flexibility, and openness. Why not join this growing open science movement, by claiming your specimen data?
As a tropical botanist, a lot of my effort goes into the fundamental aspects of natural history data: the collection, identification and naming of plant specimens. Without these actions we can have no understanding of the natural world around us. However, until now it has been virtually impossible to keep track of these actions, claim them as work and then see how my particular efforts have contributed to other scientists’ research. By using Bionomia to claim the specimens that I have collected and identified in GBIF, for the first time others can see what I have done and see how I have contributed to our understanding of the natural world. Then, importantly, as an early career researcher I can include all this information as a citation & DOI using Zenodo in my CV or on my personal webpage.
Goodwin, Zoë. 2022. Natural history specimens collected and/or identified and deposited. [Data set]. Zenodo. https://doi.org/10.5281/zenodo.3581428
When I started studying insects in Central America, I had the opportunity to collect, process, and identify thousands of specimens in different entomological collections. However, I didn’t have the means or even know how to digitize and mobilize collection data and many of these activities went unnoticed. It wasn’t until I started my research as a PhD student in the US, that I was able to learn about the digitization workflows and the importance of sharing data in global aggregators such as GBIF. At the same time, I was able to learn about Bionomia and I was instantly engaged. Being able to see the details about my work in collections, and see how the specimens are connected to other collectors or taxonomists, became a strong motivation to continue digitizing and sharing data. Moreover, the ability to compile this information via Zenodo and make it citable has been a great way to make my contributions visible through my ORCID researcher profile, also allowing me to keep records of the progress made during my doctoral program.
Orellana, K. Samanta. 2022. Natural history specimens collected and/or identified and deposited. [Data set]. Zenodo. https://doi.org/10.5281/zenodo.3942155
McDade, L.A., D.R. Maddison, R. Guralnick, H.A. Piwowar, M.L. Jameson, K.M. Helgen, P.S. Herendeen, A. Hill, and M.L. Vis. 2011. Biology needs a modern assessment system for professional productivity. BioScience 61(8): 619–625. https://doi.org/10.1525/bio.2011.61.8.8
Thessen, A.E., M. Woodburn, D. Koureas, D. Paul, M. Conlon, D.P. Shorthouse and S. Ramdeen. 2019. Proper attribution for curation and maintenance of research collections: Metadata recommendations of the RDA/TDWG Working Group. Data Science Journal 18(1): 54. http://doi.org/10.5334/dsj-2019-054
Wilkinson, M., M. Dumontier, I. Aalbersberg et al. 2016. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3: 160018. https://doi.org/10.1038/sdata.2016.18