The combined Japan Octopodoidea data set (museum) after full
taxonomic cleaning, validation, synonym resolution, rank enrichment,
authorship appending, and italic formatting. Represents the final stage
of the datamuseum workflow and is intended for direct use in
analysis and visualisation.
Format
A data frame with 2,222 rows and 20 variables:
- SciName
Validated scientific name in accepted nomenclature, canonical form without authorship.
- Genus
Genus name, updated by taxon_validate where the primary name changed.
- Family
Family name.
- order
Taxonomic order, appended by taxon_add.
- phylum
Taxonomic phylum, appended by taxon_add.
- Year
Year of occurrence record.
- Latitude
Decimal latitude, filtered to [25, 50].
- Longitude
Decimal longitude, filtered to [125, 150].
- Country
Country name or code as recorded in the source data set.
- Prefecture
State, province, or region as recorded in the source data set.
- Precise Location
Locality description as recorded in the source data set.
- Source
Institution code or group abbreviation identifying the collecting institution.
- Data Frame
Character. Identifies the source data set for each row. One of "GBIF", "InvBase", "BISMAL", "OBIS", or "NSMT".
- catalogNumber
Museum lot identification code.
- individualCount
Specimen count per lot.
- Family_cite
Family name with authorship appended by taxon_cite. Enteroctopodidae authorship added manually as it could not be resolved automatically.
- Genus_cite
Genus name with authorship appended by taxon_cite.
- SciName_cite
Scientific name with authorship appended by taxon_cite.
- Genus_cite_italic
Plotmath italic expression for Genus_cite, suitable for use in ggplot2 via italicize.
- SciName_cite_italic
Plotmath italic expression for SciName_cite, suitable for use in ggplot2 via italicize.
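The sketch below illustrates two of the column conventions above using base R only. It assumes the Latitude and Longitude bounds are inclusive, and it uses a hypothetical plotmath string; the exact text that italicize produces is not stated here, so the label format is an assumption. The toy data frame occ is invented for illustration and is not part of the data set.

```r
# Toy occurrence records (hypothetical values, not taken from the data set).
occ <- data.frame(
  Latitude  = c(35.1, 24.0, 43.2),
  Longitude = c(139.7, 127.5, 151.0)
)

# Bounding box matching the documented ranges; inclusive bounds are assumed.
in_box <- occ$Latitude >= 25 & occ$Latitude <= 50 &
  occ$Longitude >= 125 & occ$Longitude <= 150

# Only the first record falls inside both ranges.
occ_japan <- occ[in_box, , drop = FALSE]

# Hypothetical plotmath label in the style of Genus_cite_italic.
label <- "italic('Callistoctopus')~'Taki, 1964'"

# parse() converts the string into a plotmath expression that plotting
# functions can render with the genus in italics.
expr <- parse(text = label)
```

In ggplot2, strings of this kind are commonly rendered by passing a parsing function to a scale, for example labels = function(x) parse(text = x), though the intended usage in this package may differ.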
Source
Derived from museum. Full source CSVs (raw, trimmed, and
Japan-filtered) are available at
https://github.com/btorgovitsky00/datamuseum.
Original sources:
Global Biodiversity Information Facility (GBIF). GBIF.org (30 March 2026) GBIF Occurrence Download. https://www.gbif.org doi:10.15468/dl.2379hj
Invert-E-Base. Downloaded 30 March 2026. https://invertebase.org
Biological Information System for Marine Life (BISMAL). Downloaded 30 March 2026. https://www.godac.jamstec.go.jp/bismal/e/
Ocean Biodiversity Information System (OBIS). Downloaded 30 March 2026. https://obis.org
National Museum of Nature and Science, Japan (NSMT). Data obtained directly from the museum, early 2024. https://www.kahaku.go.jp/english/
Details
Processing proceeds in the following steps from museum:
- taxon_cleaner applied to SciName in place with drop_na = TRUE, removing uncertain names and reducing the data set from 2,633 to 2,222 observations.
- Octopus vulgaris manually corrected to Octopus sinensis to reflect current accepted taxonomy for the Pacific form.
- taxon_validate applied to SciName with update_related = TRUE to resolve synonyms and update related taxonomic columns.
- taxon_spellcheck applied with update = TRUE using the pre-computed validation report.
- Pinnoctopus manually corrected to Callistoctopus across all columns, as this generic synonym is not resolved automatically by taxon_validate.
- taxon_add appends order and phylum with sort = TRUE.
- taxon_cite appends authorship to Family, Genus, and SciName.
- "Muusoctopus small in mature" removed as an informal morphospecies name not representing a valid taxon.
- Enteroctopodidae authorship added manually as it could not be resolved by taxon_cite.
- italicize applied to Genus_cite and SciName_cite.
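The manual corrections listed above can be sketched in base R as follows. This is only an illustration under stated assumptions: the toy data frame and its rows are hypothetical, the package functions (taxon_cleaner, taxon_validate, and so on) are deliberately not called because their signatures are not given here, and the actual processing script may differ.

```r
# Toy stand-in for the museum data; all rows are hypothetical.
toy <- data.frame(
  SciName = c("Octopus vulgaris", "Pinnoctopus minor",
              "Muusoctopus small in mature", "Amphioctopus fangsiao"),
  Genus   = c("Octopus", "Pinnoctopus", "Muusoctopus", "Amphioctopus"),
  stringsAsFactors = FALSE
)

# Manual correction: Octopus vulgaris -> Octopus sinensis.
toy$SciName[toy$SciName == "Octopus vulgaris"] <- "Octopus sinensis"

# Manual correction: Pinnoctopus -> Callistoctopus in every character column.
char_cols <- vapply(toy, is.character, logical(1))
toy[char_cols] <- lapply(
  toy[char_cols],
  function(x) sub("^Pinnoctopus", "Callistoctopus", x)
)

# Remove the informal morphospecies name.
toy <- toy[toy$SciName != "Muusoctopus small in mature", , drop = FALSE]
```

Anchoring the pattern with ^ restricts the replacement to values that begin with the genus name, which avoids touching unrelated text in locality or source columns.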
See also
museum for the combined pre-validation data set,
taxon_cleaner for the cleaning function applied during
processing,
taxon_validate for the validation function applied during
processing,
taxon_spellcheck for the spellcheck function applied during
processing,
taxon_add for the rank enrichment function applied during
processing,
taxon_cite for the authorship appending function applied
during processing,
italicize for the italic formatting function applied during
processing.
