How it Works

This application is developed and maintained by David P. Shorthouse using specimen data periodically downloaded from the Global Biodiversity Information Facility (GBIF) and authentication provided by ORCID. It was launched in August 2018 as a submission to the annual Ebbe Nielsen Challenge. Since then, Wikidata identifiers have been integrated to capture the names and the dates of birth and death of deceased biologists, which helps maximize downstream data integration and engagement.

Inner Workings

The approximately 194M specimen records processed in this project have content in their recordedBy (collector) or identifiedBy (determiner) Darwin Core fields and are refreshed every few weeks. Names of collectors and determiners are parsed and cleaned using the test-driven dwc_agent Ruby gem, which is freely available for integration in other projects. The similarity of people's names is scored using a graph theory method outlined by R.D.M. Page and incorporated as a method in the dwc_agent gem. These scores help expand the search for candidate specimens, which are presented in order from most to least probable. If you declared alternate names in your ORCID account, such as a maiden name, or if aliases are mentioned in a Wikidata profile, these are also used to search for candidate specimen records. Processing this many specimen records is an intensive though repeatable process that uses MIT-licensed, open source code.
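As a rough illustration of the idea of scoring and ranking name variants, the Python sketch below splits a raw recordedBy string into individual names and ranks candidate given names against a target. It is not the dwc_agent gem and it does not implement the graph theory method by R.D.M. Page; the function names and the token-by-token scoring rule are assumptions invented for this example.

```python
import re

def split_agents(recorded_by):
    """Split a raw recordedBy/identifiedBy string into individual name strings."""
    parts = re.split(r"\s*(?:;|\||&|\band\b)\s*", recorded_by)
    return [p.strip() for p in parts if p.strip()]

def given_tokens(given):
    """Break a given-name string into lowercase tokens: 'David P.' -> ['david', 'p']."""
    return [t.lower() for t in re.split(r"[\s.\-]+", given) if t]

def given_name_score(a, b):
    """Hypothetical score in [0, 1] for how compatible two given names are.

    Tokens are compared position by position. Matching full names earn 1.0,
    an initial that agrees with a name or another initial earns 0.5, and any
    conflict rules the pair out. The total is divided by the longer token count.
    """
    ta, tb = given_tokens(a), given_tokens(b)
    if not ta or not tb:
        return 0.0
    score = 0.0
    for x, y in zip(ta, tb):
        if x == y and len(x) > 1:
            score += 1.0
        elif x[0] == y[0] and (len(x) == 1 or len(y) == 1):
            score += 0.5
        else:
            return 0.0
    return score / max(len(ta), len(tb))

def rank_candidates(given, candidates):
    """Return (score, candidate) pairs with a non-zero score, most probable first."""
    scored = [(given_name_score(given, c), c) for c in candidates]
    return sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)

names = split_agents("D.P. Shorthouse; R. Page & A. Smith")
ranked = rank_candidates("David P.", ["R.", "D.", "David P."])
```

Here a bare initial such as "D." remains a candidate for "David P." but ranks below the full match, while "R." is excluded outright. A real scorer would also have to handle family names, particles, and transliteration.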

Specimens used in the analyses for published papers are found by downloading the cited data packages (less than 100MB zipped) that GBIF serves on behalf of the research community. The records in those packages are then reattributed to their collectors or determiners. The outcome is comparable to the common practice in taxonomy whereby authors of new species descriptions acknowledge the critical role collectors play in museum and biodiversity science.
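One plausible way to picture the reattribution step is a join on gbifID between the rows of a cited data package and the claims people have already made. The Python sketch below assumes a tab-delimited occurrence file with a gbifID column and a hypothetical in-memory claims mapping; the identifiers "person-a" and "person-b", the sample rows, and the function name are placeholders, not Bloodhound's actual data model.

```python
import csv
import io
from collections import defaultdict

def reattribute(occurrence_file, claims):
    """Credit each claimant with the package records they collected or identified.

    occurrence_file: file-like object with tab-delimited rows and a gbifID column.
    claims: hypothetical mapping of gbifID -> list of (person, role) tuples.
    """
    credited = defaultdict(list)
    for row in csv.DictReader(occurrence_file, delimiter="\t"):
        for person, role in claims.get(row["gbifID"], []):
            credited[person].append((row["gbifID"], role))
    return dict(credited)

# Made-up sample rows standing in for a downloaded data package.
package = io.StringIO(
    "gbifID\tscientificName\n"
    "101\tAraneus diadematus\n"
    "102\tPardosa moesta\n"
    "103\tLycosa sp.\n"
)
claims = {
    "101": [("person-a", "collector")],
    "103": [("person-a", "determiner"), ("person-b", "collector")],
}
credit = reattribute(package, claims)
```

Record 102 has no claim and so earns nobody credit, while record 103 credits two different people in two different roles.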

Citation & Archival

From the settings panel in your account, you may connect with Zenodo in two clicks using your ORCID credentials. Once you make this set-it-and-forget-it connection, Bloodhound pushes your specimen data into this industry-recognized, stable, long-term archive and mints a new DataCite DOI. Your Zenodo token is cached in Bloodhound and, in any week in which you have made new claims, a new version of your specimen data is pushed to the archive on your behalf. You will also receive a DataCite DOI badge on your Bloodhound profile page and a formatted citation for your professional resume. The versioned data packages stored in Zenodo each consist of a CSV file and a JSON-LD document, preparing the way for future Linked Data integrations. If you accept DataCite as a trusted organization in your ORCID account, you will receive a new formatted work entry there for your specimen dataset.
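To make the two-file package concrete, the Python sketch below serializes a handful of claims as a CSV file and a companion JSON-LD document. The column names, the use of the schema.org Dataset type, and the function name are assumptions for illustration only; the actual layout of the files Bloodhound deposits in Zenodo is not described here.

```python
import csv
import io
import json

def build_package(person_name, claims):
    """Return (csv_text, jsonld_text) for a list of claim dicts.

    Each claim is a dict with hypothetical keys 'gbifID' and 'role'.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["gbifID", "role"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(claims)

    # Assumed JSON-LD shape using schema.org vocabulary.
    document = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": f"Specimens collected or identified by {person_name}",
        "creator": {"@type": "Person", "name": person_name},
        "description": f"{len(claims)} claimed specimen records",
    }
    return buffer.getvalue(), json.dumps(document, indent=2)

csv_text, jsonld_text = build_package(
    "Example Person",
    [{"gbifID": "101", "role": "collector"}, {"gbifID": "103", "role": "determiner"}],
)
```

Pairing a flat CSV with a JSON-LD description keeps the data easy to open in a spreadsheet while still giving Linked Data tools a typed, machine-readable summary.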