Identifies and optionally corrects misspelled taxonomic names using
suggestions from a prior taxon_validate report, or by
running taxon_validate internally if no report is provided.
Names flagged as misspellings or phantoms that have a suggestion are
reported, and the suggestions are applied when update = TRUE; names with no
suggestion are flagged for
manual review. When corrections are applied, genus columns detected by
taxon_column are updated automatically from the corrected
binomial. A spellcheck report is attached to the result as an attribute.
Usage
taxon_spellcheck(
  data,
  column,
  source = "both",
  update = FALSE,
  parallel = FALSE,
  max_synonym_depth = 3,
  validation_report = NULL
)
Arguments
- data
A data frame.
- column
Column name or c() of column names to check, supplied either unquoted (species) or quoted ("species").
- source
Character. Taxonomic reference source passed to the internal taxon_validate call if validation_report is NULL. One of "both" (default), "gbif", or "itis".
- update
Logical. If TRUE, confirmed corrections are applied to each column in place for names with status "misspelling" or "phantom" that have a non-NA suggestion. Genus columns detected by taxon_column are updated automatically for corrected rows by deriving the genus from the first word of the corrected binomial. Default is FALSE.
- parallel
Logical. If TRUE, enables parallel processing in the internal taxon_validate call. Default is FALSE.
- max_synonym_depth
Integer. Maximum synonym redirect steps passed to the internal taxon_validate call. Default is 3.
- validation_report
Optional. A validation report tibble from a prior taxon_validate call (i.e. attr(result, "validation_report")). If NULL (default), taxon_validate is run internally on the first column in column and its report is used.
Value
The input data frame, with names in column corrected to their
canonical form (authorship stripped) where update = TRUE and
corrections were available. A spellcheck report tibble is attached as
attr(result, "spellcheck_report") with columns:
- column
Name of the column checked.
- original
The original name as it appeared in the data.
- suggestion
The suggested canonical correction (authorship stripped), or NA if no suggestion is available.
- confidence
NA in the current implementation; reserved for future use.
- source
Source of the suggestion ("taxon_validate"), or NA for names requiring manual review.
- n
Number of rows containing the original name.
- status
One of "misspelling" (suggestion available), "phantom" (name lacks authorship or publication data with a suggestion), or "unmatched" (no match found in any source).
Only names with issues appear in the report. Names confirmed as valid are not included.
Details
When validation_report is NULL, taxon_validate
is run internally on the first column in column only. Passing a
pre-computed report via attr(validated, "validation_report") avoids
redundant API calls when taxon_validate has already been run.
Corrections are matched using match() on canonical names
(authorship stripped from both the input column and the suggestion before
comparison). Corrected values are written as canonical names without
authorship; use taxon_cite to append authorship after
correction.
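The matching step described above can be sketched in base R. This is a minimal illustration, not the package's internal code: strip_authorship() is a hypothetical helper that assumes the canonical name is simply the first two words, and the report data frame is hand-built rather than real taxon_validate output.

```r
# Hypothetical helper (an assumption, not part of the package): keep only the
# first two whitespace-separated words, dropping any trailing authorship.
strip_authorship <- function(x) {
  parts <- strsplit(trimws(x), "\\s+")
  out <- vapply(parts, function(p) paste(head(p, 2), collapse = " "), character(1))
  out[is.na(x)] <- NA_character_
  out
}

# Illustrative data only
species <- c("Homo sapians", "Panthera leo (Linnaeus, 1758)", "Canis lupus")
report <- data.frame(
  original   = "Homo sapians",
  suggestion = "Homo sapiens Linnaeus, 1758",
  stringsAsFactors = FALSE
)

# Compare canonical forms of the column against canonical forms of flagged names
idx <- match(strip_authorship(species), strip_authorship(report$original))
hit <- !is.na(idx)

# Write back the canonical suggestion (authorship stripped) for matched rows only
corrected <- species
corrected[hit] <- strip_authorship(report$suggestion[idx[hit]])
corrected
```

Rows without a match are left exactly as they were, including any authorship they carried. The two-word heuristic would mishandle trinomials and hybrid names, so it is only suitable as an illustration.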
When update = TRUE, genus columns detected by
taxon_column are updated for corrected rows by extracting
the first word of the corrected binomial. The update applies only to rows
containing valid binomial names and never modifies the source column itself.
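As a rough sketch of the genus update described above, the following base R snippet derives the genus from the first word of a corrected binomial. The data, the corrected_rows vector, and the two-word binomial test are all assumptions made for illustration; the package's own detection logic in taxon_column may differ.

```r
# Illustrative data: 'species' has already been corrected, 'genus' is stale
df <- data.frame(
  species = c("Homo sapiens", "Panthera leo", "Canis"),
  genus   = c("Homo", "Panthara", "Canis"),
  stringsAsFactors = FALSE
)

# Hypothetical flag marking rows whose species value was corrected
corrected_rows <- c(FALSE, TRUE, FALSE)

# Simplified binomial test (an assumption): exactly two words
is_binomial <- grepl("^\\S+\\s+\\S+$", df$species)

# Update the genus only for corrected rows that hold a binomial
rows <- corrected_rows & is_binomial
df$genus[rows] <- sub("\\s.*$", "", df$species[rows])
df$genus
```

Only the genus column is touched; the species column that supplied the corrected name is left as is.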
Names with status "unmatched" or phantoms without a suggestion are
listed separately in the console output for manual review and appear in
the report with NA in the suggestion column.
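To work with the report programmatically, the names needing manual review can be separated from those with a usable suggestion. The sketch below uses a hand-built data frame with the documented report columns; the names and values in it are invented for illustration and are not real output.

```r
# Illustrative stand-in for attr(result, "spellcheck_report")
report <- data.frame(
  column     = "species",
  original   = c("Homo sapians", "Pantera leo", "Abcus defus"),
  suggestion = c("Homo sapiens", "Panthera leo", NA),
  confidence = NA,
  source     = c("taxon_validate", "taxon_validate", NA),
  n          = c(2L, 1L, 1L),
  status     = c("misspelling", "misspelling", "unmatched"),
  stringsAsFactors = FALSE
)

# Names that update = TRUE could correct: flagged status plus a non-NA suggestion
applicable <- report[
  report$status %in% c("misspelling", "phantom") & !is.na(report$suggestion),
]

# Names left for manual review: no suggestion available
manual_review <- report[is.na(report$suggestion), ]
manual_review$original
```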
Note
When validation_report is NULL, this function calls
taxon_validate internally, which queries GBIF and/or ITIS
web services and requires an active internet connection with reliable
access to those servers. To avoid a network dependency during spellchecking,
run taxon_validate separately on a stable connection first and
pass its report via attr(result, "validation_report"); this also avoids
repeated API calls.
Connectivity can be tested before running spellcheck:
# Test ITIS connectivity
taxize::get_tsn("Homo sapiens", accepted = FALSE, verbose = TRUE,
                messages = TRUE, ask = FALSE)
# Test GBIF connectivity
rgbif::name_backbone(name = "Homo sapiens", strict = TRUE)
See also
taxon_validate for the underlying validation and synonym
resolution used to generate correction suggestions,
taxon_cleaner for standardising name formatting before
spellchecking,
taxon_column for detecting genus columns updated
automatically when update = TRUE,
taxon_add for appending higher taxonomic rank columns after
spellchecking,
taxon_cite for appending authorship after corrections are
applied.
Examples
df <- data.frame(
  species = c("Homo sapiens", "Panthera leo", "Canis lupus")
)
# \donttest{
if (requireNamespace("rgbif", quietly = TRUE) &&
    requireNamespace("taxize", quietly = TRUE)) {
  # Check spelling and report suggestions without applying corrections
  taxon_spellcheck(df, column = species)
  # Apply confirmed corrections to the column
  taxon_spellcheck(df, column = species, update = TRUE)
  # Pass a pre-computed validation report to avoid re-running taxon_validate
  validated <- taxon_validate(df, column = species)
  taxon_spellcheck(df, column = species,
                   validation_report = attr(validated, "validation_report"))
  # Inspect the spellcheck report
  result <- taxon_spellcheck(df, column = species)
  attr(result, "spellcheck_report")
  # Check multiple columns at once
  df2 <- data.frame(
    species = c("Homo sapiens", "Panthera leo"),
    genus = c("Homo", "Panthara")
  )
  taxon_spellcheck(df2, column = c(species, genus))
}
#> [taxon_spellcheck] no validation_report provided -- running taxon_validate internally
#> [taxon_validate] column 'species' detected rank: species -- 3 unique name(s) to process
#> [taxon_validate] pass 1: ITIS strict + synonym (3 valid name(s))
#> [taxon_validate] ITIS: 1 / 3
#> [taxon_validate] ITIS: 3 / 3
#> [taxon_validate] ITIS: 3 strict, 0 synonym, 0 unmatched
#> [taxon_validate] pass 5: authorship lookup (3 resolved names)
#> [taxon_spellcheck] taxon_validate complete -- applying corrections
#> [taxon_spellcheck] no issues found for column 'species'
#> [taxon_spellcheck] no validation_report provided -- running taxon_validate internally
#> [taxon_validate] column 'species' detected rank: species -- 3 unique name(s) to process
#> [taxon_validate] pass 1: ITIS strict + synonym (3 valid name(s))
#> [taxon_validate] ITIS: 1 / 3
#> [taxon_validate] ITIS: 3 / 3
#> [taxon_validate] ITIS: 3 strict, 0 synonym, 0 unmatched
#> [taxon_validate] pass 5: authorship lookup (3 resolved names)
#> [taxon_spellcheck] taxon_validate complete -- applying corrections
#> [taxon_spellcheck] no issues found for column 'species'
#> [taxon_validate] column 'species' detected rank: species -- 3 unique name(s) to process
#> [taxon_validate] pass 1: ITIS strict + synonym (3 valid name(s))
#> [taxon_validate] ITIS: 1 / 3
#> [taxon_validate] ITIS: 3 / 3
#> [taxon_validate] ITIS: 3 strict, 0 synonym, 0 unmatched
#> [taxon_validate] pass 5: authorship lookup (3 resolved names)
#> [taxon_spellcheck] no issues found for column 'species'
#> [taxon_spellcheck] no validation_report provided -- running taxon_validate internally
#> [taxon_validate] column 'species' detected rank: species -- 3 unique name(s) to process
#> [taxon_validate] pass 1: ITIS strict + synonym (3 valid name(s))
#> [taxon_validate] ITIS: 1 / 3
#> [taxon_validate] ITIS: 3 / 3
#> [taxon_validate] ITIS: 3 strict, 0 synonym, 0 unmatched
#> [taxon_validate] pass 5: authorship lookup (3 resolved names)
#> [taxon_spellcheck] taxon_validate complete -- applying corrections
#> [taxon_spellcheck] no issues found for column 'species'
#> [taxon_spellcheck] no validation_report provided -- running taxon_validate internally
#> [taxon_validate] column 'species' detected rank: species -- 2 unique name(s) to process
#> [taxon_validate] pass 1: ITIS strict + synonym (2 valid name(s))
#> [taxon_validate] ITIS: 1 / 2
#> [taxon_validate] ITIS: 2 / 2
#> [taxon_validate] ITIS: 2 strict, 0 synonym, 0 unmatched
#> [taxon_validate] pass 5: authorship lookup (2 resolved names)
#> [taxon_spellcheck] taxon_validate complete -- applying corrections
#> [taxon_spellcheck] no issues found for column 'species'
#> [taxon_spellcheck] no issues found for column 'genus'
#>        species    genus
#> 1 Homo sapiens     Homo
#> 2 Panthera leo Panthara
# }
