Appends authorship and year to one or more taxonomic name columns using
GBIF (preferred) and ITIS (fallback) as reference sources. For each
specified column a new <column>_cite column is appended containing
names in the format "Genus species (Author, Year)". Intended as
the final step in the taxon_validate -> taxon_spellcheck
workflow.
Arguments
- data
A data frame.
- columns
Column name or
c()of column names to append authorship to, supplied either unquoted (species) or quoted ("species").- source
Character. Taxonomic reference source. One of
"both"(default),"gbif", or"itis". When"both", GBIF authorship is preferred and ITIS is used as a fallback when GBIF returns no valid authorship or a malformed result.- drop_na
Logical. If
TRUE, rows withNAin the column are dropped before look-up. Default isFALSE.
Value
The input data frame with one additional character column appended per
entry in columns, named <column>_cite. Rows where
authorship cannot be found retain the original canonical name in the cite
column unchanged. A report tibble is attached as
attr(result, "cite_report") with columns:
columnName of the column processed.
nameCanonical name for which no authorship was found.
nNumber of rows containing that name.
An empty tibble is attached when authorship is found for all names. A console message per column reports the number of names resolved and lists any names without authorship.
Details
Authorship look-up follows the same logic as pass 5 of
taxon_validate:
GBIF
name_backboneis queried withstrict = TRUE.HIGHERRANKresults are accepted only when the canonical name matches the input exactly.Malformed authorship strings starting with a comma or punctuation are rejected and treated as missing.
Synonym chains are followed up to three steps via
name_usage()to reach the accepted name authorship.When GBIF has no valid authorship and
source = "both", ITISitis_getrecordis queried as a fallback.
Authorship is stripped from input values before lookup (parenthetical
suffixes matching \s*\(.*\)\s*$ are removed), so columns
already containing authorship from a prior taxon_validate
call are handled correctly.
Results are memoised for the duration of the session. Only unique
non-NA values are looked up, so performance scales with the number
of distinct names rather than total rows.
Requires rgbif for GBIF lookups, taxize for ITIS lookups, and memoise. Informative errors are raised if required packages are not installed.
Note
This function queries external web services (GBIF via rgbif and/or ITIS via taxize) and requires an active internet connection with reliable access to those servers. Performance on unstable or restricted connections (e.g. public WiFi, VPN, or firewalled networks) may be slow or produce incomplete results. Results are memoised for the duration of the session; running on a stable connection first and retaining the session will avoid repeated API calls for the same names.
Connectivity can be tested before appending authorship:
# Test ITIS connectivity
taxize::get_tsn("Homo sapiens", accepted = FALSE, verbose = TRUE,
messages = TRUE, ask = FALSE)
# Test GBIF connectivity
rgbif::name_backbone(name = "Homo sapiens", strict = TRUE)See also
taxon_validate for resolving synonyms and validating names
before appending authorship,
taxon_spellcheck for correcting misspellings before
appending authorship,
taxon_add for appending higher taxonomic rank columns
alongside authorship,
italicize for formatting cited names for ggplot2
display as the final step in the workflow.
Examples
df <- data.frame(
species = c("Homo sapiens", "Panthera leo", "Canis lupus")
)
# \donttest{
if (requireNamespace("rgbif", quietly = TRUE) &&
requireNamespace("taxize", quietly = TRUE)) {
# Append authorship to a single column
taxon_cite(df, species)
# Append authorship to multiple columns
df2 <- data.frame(
genus = c("Homo", "Panthera"),
species = c("Homo sapiens", "Panthera leo")
)
taxon_cite(df2, c(genus, species))
# Use GBIF only
taxon_cite(df, species, source = "gbif")
# Inspect names where no authorship was found
result <- taxon_cite(df, species)
attr(result, "cite_report")
# Full workflow
df |>
taxon_validate(column = species) |>
taxon_spellcheck(column = species, update = TRUE) |>
taxon_add(column = species, ranks = c(family, order)) |>
taxon_cite(columns = species)
}
#> [taxon_cite] looking up authorship for 3 unique name(s) in 'species'
#> [taxon_cite] species: 1 / 3
#> [taxon_cite] species: 3 / 3
#> [taxon_cite] 'species_cite' appended -- 3 / 3 name(s) with authorship
#> [taxon_cite] looking up authorship for 2 unique name(s) in 'genus'
#> [taxon_cite] genus: 1 / 2
#> [taxon_cite] genus: 2 / 2
#> [taxon_cite] 'genus_cite' appended -- 2 / 2 name(s) with authorship
#> [taxon_cite] looking up authorship for 2 unique name(s) in 'species'
#> [taxon_cite] species: 1 / 2
#> [taxon_cite] species: 2 / 2
#> [taxon_cite] 'species_cite' appended -- 2 / 2 name(s) with authorship
#> [taxon_cite] looking up authorship for 3 unique name(s) in 'species'
#> [taxon_cite] species: 1 / 3
#> [taxon_cite] species: 3 / 3
#> [taxon_cite] 'species_cite' appended -- 3 / 3 name(s) with authorship
#> [taxon_cite] looking up authorship for 3 unique name(s) in 'species'
#> [taxon_cite] species: 1 / 3
#> [taxon_cite] species: 3 / 3
#> [taxon_cite] 'species_cite' appended -- 3 / 3 name(s) with authorship
#> [taxon_spellcheck] no validation_report provided -- running taxon_validate internally
#> [taxon_validate] column 'species' detected rank: species -- 3 unique name(s) to process
#> [taxon_validate] pass 1: ITIS strict + synonym (3 valid name(s))
#> [taxon_validate] ITIS: 1 / 3
#> [taxon_validate] ITIS: 3 / 3
#> [taxon_validate] ITIS: 3 strict, 0 synonym, 0 unmatched
#> [taxon_validate] pass 5: authorship lookup (3 resolved names)
#> [taxon_validate] column 'species' detected rank: species -- 3 unique name(s) to process
#> [taxon_validate] pass 1: ITIS strict + synonym (3 valid name(s))
#> [taxon_validate] ITIS: 1 / 3
#> [taxon_validate] ITIS: 3 / 3
#> [taxon_validate] ITIS: 3 strict, 0 synonym, 0 unmatched
#> [taxon_validate] pass 5: authorship lookup (3 resolved names)
#> [taxon_spellcheck] taxon_validate complete -- applying corrections
#> [taxon_spellcheck] no issues found for column 'species'
#> [taxon_add] added column 'family' (3 / 3 values resolved)
#> [taxon_add] added column 'order' (3 / 3 values resolved)
#> [taxon_cite] looking up authorship for 3 unique name(s) in 'species'
#> [taxon_cite] species: 1 / 3
#> [taxon_cite] species: 3 / 3
#> [taxon_cite] 'species_cite' appended -- 3 / 3 name(s) with authorship
#> species family order species_cite
#> 1 Homo sapiens Hominidae Primates Homo sapiens (Linnaeus, 1758)
#> 2 Panthera leo Felidae Carnivora Panthera leo (Linnaeus, 1758)
#> 3 Canis lupus Canidae Carnivora Canis lupus (Linnaeus, 1758)
# }
