Bloodhound data are exposed as CSV or JSON-LD documents on publicly shared profile pages. Individual occurrence records are exposed as JSON-LD documents. Attributions and claims are shared as Frictionless Data packages for every dataset.
User Data from a Public Profile
Where 0000-0001-7618-5230 is a user's ORCID identifier: https://orcid.org/0000-0001-7618-5230
Where Q1035 is a user's Wikidata Q number: https://www.wikidata.org/wiki/Q1035
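As a minimal sketch, the two identifier schemes above can be validated and mapped to their canonical URLs. The regular expressions and the helper function are illustrative assumptions, not part of Bloodhound's API:

```python
import re

# Illustrative patterns for the two identifier schemes described above.
ORCID_RE = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")  # e.g. 0000-0001-7618-5230
WIKIDATA_RE = re.compile(r"^Q\d+$")                        # e.g. Q1035

def profile_url(identifier: str) -> str:
    """Map a raw ORCID or Wikidata identifier to its canonical URL."""
    if ORCID_RE.match(identifier):
        return f"https://orcid.org/{identifier}"
    if WIKIDATA_RE.match(identifier):
        return f"https://www.wikidata.org/wiki/{identifier}"
    raise ValueError(f"Unrecognized identifier: {identifier}")
```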
Datasets as Frictionless Data Packages
Where 74f92761-3a24-4d85-9bfb-00b1fee0119d is a GBIF dataset identifier: https://gbif.org/dataset/74f92761-3a24-4d85-9bfb-00b1fee0119d
List of Public Profiles
All Claims and Attributions (Public Profiles)
The above gzipped CSV includes a header, "Subject, Predicate, Object", with rows expressed as, for example, "https://gbif.org/occurrence/1801358422, http://rs.tdwg.org/dwc/iri/identifiedBy, https://orcid.org/0000-0001-9008-0611"
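The claims file can be streamed as subject–predicate–object triples with Python's standard library. This sketch compresses the example row shown above in memory to stand in for the downloaded file; in practice you would pass the downloaded .csv.gz path to gzip.open directly:

```python
import csv
import gzip
import io

# Stand-in for the downloaded file, built from the example row above.
sample = (
    "Subject, Predicate, Object\n"
    "https://gbif.org/occurrence/1801358422, "
    "http://rs.tdwg.org/dwc/iri/identifiedBy, "
    "https://orcid.org/0000-0001-9008-0611\n"
)
gz_bytes = gzip.compress(sample.encode("utf-8"))

# Stream the gzipped CSV; skipinitialspace drops the padding after commas.
with gzip.open(io.BytesIO(gz_bytes), mode="rt") as fh:
    reader = csv.reader(fh, skipinitialspace=True)
    header = next(reader)
    triples = [tuple(row) for row in reader]
```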
The above gzipped folder includes four CSV files, each with the header "agent_id, family, given, recordings_count, recordings_year_range, recordings_institutionCode, recordings_countryCode, determinations_count", produced using https://doi.org/10.15468/dl.ejwnnv. Here "recordings_count" is the total number of collected specimens, "recordings_institutionCode" is a pipe-delimited list of institutionCode entries, and "recordings_countryCode" is a pipe-delimited list of ISO Alpha-2 country codes. The "agent_id" column is a local identifier and is not persistent.
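A sketch of reading one of these CSVs and expanding the pipe-delimited columns into lists; the sample row is invented for illustration and only the column names come from the description above:

```python
import csv
import io

# Invented sample row matching the header described above.
sample = (
    "agent_id,family,given,recordings_count,recordings_year_range,"
    "recordings_institutionCode,recordings_countryCode,determinations_count\n"
    "123,Smith,Jane,42,1975-2001,CM|USNM,US|CA,7\n"
)

rows = []
for row in csv.DictReader(io.StringIO(sample)):
    # Cast the counts to integers and split the pipe-delimited lists.
    row["recordings_count"] = int(row["recordings_count"])
    row["determinations_count"] = int(row["determinations_count"])
    row["recordings_institutionCode"] = row["recordings_institutionCode"].split("|")
    row["recordings_countryCode"] = row["recordings_countryCode"].split("|")
    rows.append(row)
```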
Unverified, Unauthenticated Agents
The above gzipped CSV includes a header, "agents, gbifIDs_recordedBy, gbifIDs_identifiedBy", and was constructed from https://doi.org/10.15468/dl.rohj3n using a Scala / Apache Spark script. The gbifIDs_recordedBy and gbifIDs_identifiedBy columns each contain an array of GBIF IDs. The "agents" column is reproduced as presented on GBIF and will require additional parsing.
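A sketch of splitting the ID-array columns into lists of integers. The sample row is invented, and the serialization of the arrays (pipe-delimited here) is an assumption; adjust the split to match the real file:

```python
import csv
import io

# Invented sample row. The pipe-delimited serialization of the ID arrays
# is an assumption for illustration only.
sample = (
    "agents,gbifIDs_recordedBy,gbifIDs_identifiedBy\n"
    '"Smith, J.; Jones, B.",1801358422|1801358423,1801358424\n'
)

records = []
for row in csv.DictReader(io.StringIO(sample)):
    # The "agents" value is left raw; it still needs parsing, as noted above.
    for col in ("gbifIDs_recordedBy", "gbifIDs_identifiedBy"):
        row[col] = [int(i) for i in row[col].split("|") if i]
    records.append(row)
```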
The MIT-licensed code is available on GitHub. Technologies at play include: Apache Spark to group occurrence records by raw entries in recordedBy and identifiedBy and to import them into MySQL; Neo4j to store the similarity scores between structurally similar people names; Elasticsearch to aid in searching people names once parsed and cleaned; Redis to coordinate the processing queues; and Sinatra/Ruby for the application layer. A stand-alone Ruby gem, dwc_agent, may be used to parse people names and additionally score them for structural similarity.