Infers the taxonomic rank (e.g. species, genus,
family) of one or more columns by pattern matching on the column names.
Returns a named character vector of detected ranks, one per
input column. Useful for verifying rank assignments before calling
taxon_sort or taxon_add.
Arguments
- data
A data frame.
- columns
A column name, or a c() of column names, to check. Names can be supplied
unquoted (genus) or quoted ("genus").
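The columns argument accepts both bare and quoted names. The sketch below shows one way to handle that kind of dual input in base R, using substitute() to capture the unevaluated argument. The helper resolve_columns() is hypothetical and is not part of this package, which may use tidyselect or another mechanism instead.

```r
# Hypothetical helper, not part of the package: turn an unquoted or quoted
# column reference (or a c() of them) into a character vector of names.
resolve_columns <- function(data, columns) {
  # Capture the argument as written, without evaluating it
  expr <- substitute(columns)

  # Unwrap c(...) into its individual arguments; otherwise treat as one item
  items <- if (is.call(expr) && identical(expr[[1]], as.name("c"))) {
    as.list(expr)[-1]
  } else {
    list(expr)
  }

  # Strings are kept as-is; bare symbols are converted to their names
  cols <- vapply(items, function(x) {
    if (is.character(x)) x else deparse(x)
  }, character(1))

  # Fail early if any requested column does not exist in the data
  unknown <- setdiff(cols, names(data))
  if (length(unknown) > 0) {
    stop("Column(s) not found: ", paste(unknown, collapse = ", "))
  }
  cols
}

df <- data.frame(genus = character(), site = character())
resolve_columns(df, genus)
resolve_columns(df, c(genus, "site"))
```

A limitation of this sketch: a variable holding a character vector of names would be read as a literal column name, because the argument is never evaluated.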
Value
A named character vector the same length as columns, where names
are the input column names and values are the detected rank as a lowercase
string (e.g. "family", "genus"). Returns NA for any
column whose name does not match a recognised taxonomic rank pattern.
Details
Detection uses a two-tier approach applied to the lowercased column name:
- Strong match: the column name contains a full taxonomic keyword
(scientificname, species, genus, family, order, class, phylum, kingdom, or
taxon). Strong patterns are checked first and take priority.
- Weak match: for columns not assigned by a strong match, substrings of
length 3–5 derived from the strong keywords are checked, and the rank of the
first matching keyword is assigned.
Detection is based on column names only; column values are not inspected.
For content-based detection across all columns in a data frame, use
taxon_column instead.
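To make the two-tier rule concrete, here is a minimal base-R sketch of name-only detection. It is not the package's implementation: detect_rank() and rank_keywords are hypothetical names, and the weak tier assumes the substrings are the 3- to 5-character prefixes of each keyword, tried longest first and then in keyword order. The real function may derive or order its weak patterns differently.

```r
# Hypothetical sketch of two-tier, name-only rank detection.
# Assumption (not from the package source): weak patterns are the 5-, 4- and
# 3-character prefixes of each keyword, tried longest first.
rank_keywords <- c("scientificname", "species", "genus", "family", "order",
                   "class", "phylum", "kingdom", "taxon")

# Return the keyword paired with the first pattern found in `name`, or NA
first_hit <- function(name, patterns, keywords) {
  found <- vapply(patterns, function(p) grepl(p, name, fixed = TRUE), logical(1))
  if (any(found)) keywords[which(found)[1]] else NA_character_
}

detect_rank <- function(col_names) {
  lowered <- tolower(col_names)

  # Weak patterns: all length-5 prefixes, then length-4, then length-3
  lens <- rep(5:3, each = length(rank_keywords))
  weak_keywords <- rep(rank_keywords, times = 3)
  weak_patterns <- substr(weak_keywords, 1, lens)

  ranks <- vapply(lowered, function(nm) {
    # Tier 1: a full keyword appears in the name
    strong <- first_hit(nm, rank_keywords, rank_keywords)
    if (!is.na(strong)) return(strong)
    # Tier 2: only reached when no strong match was found
    first_hit(nm, weak_patterns, weak_keywords)
  }, character(1))

  # Name the result by the original (not lowercased) column names
  names(ranks) <- col_names
  ranks
}

# Values: "genus", "family", "order", NA, "family"
detect_rank(c("genus", "family_name", "my_order", "site", "Fam_ID"))
```

Because weak patterns are short, they can match unrelated names: in this sketch, record_id is assigned "order" through the prefix "ord". Checking strong matches first only protects names that contain a full keyword.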
See also
- taxon_column for detecting taxonomic columns across an entire data frame
using both name and content patterns.
- taxon_sort for sorting columns into standard taxonomic rank order.
- taxon_add for appending higher taxonomic rank columns.
- taxon_validate for validation; it uses this function internally to detect
column rank.
Examples
df <- data.frame(
genus = character(),
family_name = character(),
my_order = character(),
site = character()
)
# Detect rank of a single column
taxon_rank(df, genus)
#> genus
#> "genus"
# Detect ranks of multiple columns
taxon_rank(df, c(genus, family_name, my_order))
#> genus family_name my_order
#> "genus" "family" "order"
# NA returned for columns with no recognisable rank pattern
taxon_rank(df, c(genus, site))
#> genus site
#> "genus" NA
