For Collection Data Managers

Background

Bloodhound trainers attribute specimen records to the collectors and determiners represented in your dataset(s) by linking the natural history specimen records you publish to GBIF to those people's Wikidata Q numbers or ORCID IDs. People with ORCID IDs can also claim records for the specimens they themselves collected or identified. Wikidata and ORCID identifiers come with resources and services that are useful for collections, ranging from disambiguating people's names to gauging the impact your collection has on the academic community.

Engaging With Your Community

Bloodhound trainers are a welcoming, international group of enthusiasts who are driven to help you attribute specimen records to the collectors and determiners represented in your dataset(s). They work to enhance entries in Wikidata by adding links and attributes, such as birth and death dates, for deceased natural historians. They are also advocates of ORCID and can help you campaign for its adoption at your institution. The easiest way to seek help or guidance, or to appreciate the scope of their activities, is to follow @BloodhoundTrack on Twitter. Come thank the trainers for their efforts and share in conversations that will lead to rewarding new connections.

Data round trip

Incorporating Enhancements

Every few weeks, Bloodhound refreshes a subset of the Darwin Core data you publish to GBIF. See "How it works" for more details.

Search for your dataset(s) and find the link to a Frictionless Data package. These zipped, UTF-8 encoded relational files are similar to the Darwin Core Archives you produced for GBIF, but they represent many:many relationships more efficiently. A wide range of open software libraries in many programming languages can read, validate, and process Frictionless Data. You can also extract the zipped package and import the UTF-8 encoded csv files into any spreadsheet software, provided the files are not excessively large.
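Before importing the csv files into a spreadsheet, it can help to check how large they are. The following is a minimal Python sketch, not part of Bloodhound itself, that uses only the standard library to list each csv file in a downloaded package along with its header columns and row count. The function name summarise_csv_files and the local file path are hypothetical.

```python
import csv
import io
import zipfile


def summarise_csv_files(zip_path):
    """Return {member name: (header columns, data row count)} for each csv in the archive."""
    summary = {}
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue
            with archive.open(name) as raw:
                # The package files are UTF-8 encoded; newline="" lets csv handle line endings.
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                reader = csv.reader(text)
                header = next(reader, [])
                row_count = sum(1 for _ in reader)
            summary[name] = (header, row_count)
    return summary
```

Calling summarise_csv_files("my-dataset.zip") on a package you have downloaded (the file name here is a placeholder) returns a dictionary you can inspect to decide whether a spreadsheet is a practical tool for the files.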

The packages contain a standard datapackage.json metadata file and three csv files: users.csv, occurrences.csv, and attributions.csv. The datapackage.json metadata file contains a "created" timestamp for when the package was last produced. These packages are typically regenerated once every few weeks, but if you would like a more up-to-date version, please create a ticket. The users.csv file lists the unique users who were attributed, or who have claimed, specimen records in your dataset. It also contains their full names, aliases, and ORCID IDs or Wikidata Q numbers, plus birth and death dates for users with Wikidata Q numbers. The occurrences.csv file contains the subset of Darwin Core fields from your specimen records for which attributions have been made. Finally, the attributions.csv file is a join table for the other two csv files and also contains columns for who made the attribution, their ORCID ID, and a timestamp for when they made the attribution.
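As an illustration of how the join table ties the other two files together, here is a hedged Python sketch using only the standard library. It assumes the four files sit at the root of the zip archive, and the key column names (id, gbifID, user_id, occurrence_id) are assumptions made for illustration; check the field names listed in your own datapackage.json and pass the real ones as arguments.

```python
import csv
import io
import json
import zipfile


def load_package(zip_path):
    """Read datapackage.json and the three csv files from a zipped package."""
    tables = {}
    with zipfile.ZipFile(zip_path) as archive:
        # Assumption: the files are stored at the root of the archive.
        with archive.open("datapackage.json") as handle:
            metadata = json.load(handle)
        for name in ("users", "occurrences", "attributions"):
            with archive.open(f"{name}.csv") as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                tables[name] = list(csv.DictReader(text))
    return metadata, tables


def join_attributions(tables, user_key="id", occurrence_key="gbifID",
                      user_fk="user_id", occurrence_fk="occurrence_id"):
    """Yield (user row, occurrence row, attribution row) for each attribution.

    The default column names are hypothetical placeholders.
    """
    users = {row[user_key]: row for row in tables["users"]}
    occurrences = {row[occurrence_key]: row for row in tables["occurrences"]}
    for attribution in tables["attributions"]:
        user = users.get(attribution[user_fk])
        occurrence = occurrences.get(attribution[occurrence_fk])
        # Skip attributions whose user or occurrence is missing from the package.
        if user is not None and occurrence is not None:
            yield user, occurrence, attribution
```

After metadata, tables = load_package(path), the value of metadata.get("created") tells you how fresh the package is, and iterating over join_attributions(tables) gives one combined record per attribution that you could write back out or compare against your collection management system.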

Assessing Data Quality

In the set of "Help Others" pages where specimen records are attributed to collectors and determiners, there are tabs to Fix and Visualize records. Here, a collector's birth and death dates are cross-referenced against the dates on their specimen records. Countries on maps and date ranges on charts can also be clicked to apply dynamic filters. In time, and as more attributions are made, data quality reports like these on individuals' specimen records may be rolled up into dataset-level reports. These may be included as additional csv files in your Frictionless Data packages so that you can investigate and repair problem records in your collection management system as required.
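A similar cross-reference can be approximated locally once you have birth, death, and collection dates side by side. The sketch below is an illustration only and is not Bloodhound's implementation: it compares years rather than full dates, assumes dates begin with a four-digit year (as ISO 8601 dates do), and the function names are hypothetical.

```python
from typing import Optional


def parse_year(value: Optional[str]) -> Optional[int]:
    """Return the leading four-digit year of a date string, or None if absent."""
    value = (value or "").strip()
    if len(value) >= 4 and value[:4].isdigit():
        return int(value[:4])
    return None


def date_problem(event_date: Optional[str], born: Optional[str],
                 died: Optional[str]) -> Optional[str]:
    """Flag a collection date that falls outside the collector's lifespan.

    Returns "before birth", "after death", or None when no problem is
    detected or when the collection date cannot be parsed.
    """
    year = parse_year(event_date)
    if year is None:
        return None
    birth_year = parse_year(born)
    death_year = parse_year(died)
    if birth_year is not None and year < birth_year:
        return "before birth"
    if death_year is not None and year > death_year:
        return "after death"
    return None
```

Because the comparison is at year level, a specimen collected later in the same year that the collector died is not flagged. A stricter check would need full dates on both sides.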

Reconcile

OpenRefine reconciliation endpoint:
https://api.bloodhound-tracker.net/reconcile
Recommended Use

The endpoint works best when there is a single name in a person column. Other columns, such as the family collected or identified and/or the date collected or identified, may optionally be used to adjust the score of returned results. Dates of birth and death (when known) are cross-referenced against the date column you use. Try out the Bloodhound ID endpoint, among others, on the Reconciliation service test bench.
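Outside OpenRefine, the endpoint can in principle be queried directly. The sketch below assumes the service follows the generic Reconciliation Service API convention of a form-encoded "queries" parameter holding a JSON batch, and that it accepts POST requests; neither is confirmed here. The property identifier used in the usage note ("date") is a hypothetical placeholder, so consult the service's own manifest for the identifiers it actually recognises.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://api.bloodhound-tracker.net/reconcile"


def build_queries(rows):
    """Build a reconciliation query batch from (name, properties) pairs.

    Each properties dict maps a property id to a value; the ids are
    service-specific and any shown in examples are placeholders.
    """
    batch = {}
    for index, (name, properties) in enumerate(rows):
        query = {"query": name}
        if properties:
            query["properties"] = [
                {"pid": pid, "v": value} for pid, value in properties.items()
            ]
        batch[f"q{index}"] = query
    return batch


def reconcile(rows, endpoint=ENDPOINT):
    """POST the batch as a form-encoded "queries" parameter and decode the JSON reply."""
    body = urllib.parse.urlencode({"queries": json.dumps(build_queries(rows))})
    request = urllib.request.Request(endpoint, data=body.encode("utf-8"))
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, build_queries([("Mary Agnes Chase", {"date": "1912-05-03"})]) produces a one-query batch keyed "q0". Keeping one name per query mirrors the recommendation above to have a single name in the person column.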