Combined Octopodoidea occurrence records for Japan, produced by merging the
five Japan-filtered source data sets (GBIF_Japan,
InvBase_Japan, BISMAL_Japan,
OBIS_Japan, and NSMT_Japan) via rbind.
Records with a missing or duplicated catalogNumber are removed using
deduplicate, and individual-level records are reconstructed
from aggregated specimen counts using duplicate on the
individualCount field. See museum_taxon for the
taxonomically validated and enriched version.
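The merge step stacks rows from sources that share a column layout and records which source each row came from. The following is a minimal base-R sketch of that pattern. The small data frames are hypothetical placeholders for the real source data sets, which carry all the columns listed under Format, and tag_source is a hypothetical helper, not part of the package.

```r
# Hypothetical placeholder data standing in for two of the five sources;
# the real GBIF_Japan, NSMT_Japan, etc. have the full column set.
gbif <- data.frame(
  SciName = c("taxon_1", "taxon_2"),
  catalogNumber = c("A1", "A2"),
  individualCount = c(1L, 3L),
  stringsAsFactors = FALSE
)
nsmt <- data.frame(
  SciName = "taxon_3",
  catalogNumber = NA_character_,
  individualCount = 2L,
  stringsAsFactors = FALSE
)

# Hypothetical helper: add a "Data Frame" column naming the source.
tag_source <- function(df, label) {
  df[["Data Frame"]] <- label  # scalar is recycled to every row
  df
}

# Stack the tagged sources; rbind() matches data frame columns by name.
combined <- rbind(tag_source(gbif, "GBIF"), tag_source(nsmt, "NSMT"))
```

Because rbind() on data frames matches columns by name, every source must carry the same set of columns before stacking.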
Format
A data frame with 2,633 rows and 13 variables:
- SciName
Scientific name as recorded in the source data set.
- Genus
Genus name.
- Family
Family name.
- Year
Year of occurrence record.
- Latitude
Decimal latitude, filtered to [25, 50].
- Longitude
Decimal longitude, filtered to [125, 150].
- Country
Country name or code as recorded in the source data set.
- Prefecture
State, province, or region as recorded in the source data set.
- Precise Location
Locality description as recorded in the source data set.
- Source
Institution code or group abbreviation identifying the collecting institution.
- Data Frame
Character. Identifies the source data set for each row. One of
"GBIF", "InvBase", "BISMAL", "OBIS", or "NSMT".
- catalogNumber
Museum lot identification code used for duplicate detection. Rows with
NA in this field were removed during deduplication.
- individualCount
Specimen count per lot. Used to expand rows via
duplicate to reconstruct individual-level records.
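The Latitude and Longitude ranges listed above amount to a bounding-box filter. The following is a minimal base-R sketch under two stated assumptions: the bounds are inclusive, and rows with missing coordinates are dropped. Neither detail is specified here. in_japan_box is a hypothetical helper and the coordinates are placeholder values.

```r
# Placeholder coordinates for illustration only.
pts <- data.frame(
  Latitude  = c(35.6, 24.9, 43.1, NA),
  Longitude = c(139.7, 131.0, 151.2, 140.0)
)

# Hypothetical helper. Assumptions: inclusive bounds, NA coordinates dropped.
in_japan_box <- function(df, lat = c(25, 50), lon = c(125, 150)) {
  keep <- !is.na(df$Latitude) & !is.na(df$Longitude) &
    df$Latitude  >= lat[1] & df$Latitude  <= lat[2] &
    df$Longitude >= lon[1] & df$Longitude <= lon[2]
  df[keep, , drop = FALSE]
}

filtered <- in_japan_box(pts)  # keeps only the first row
```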
Source
Derived from GBIF_Japan, InvBase_Japan,
BISMAL_Japan, OBIS_Japan, and
NSMT_Japan. Full source CSVs (raw, trimmed, and
Japan-filtered) are available at
https://github.com/btorgovitsky00/datamuseum.
Original sources:
Global Biodiversity Information Facility (GBIF). GBIF.org (30 March 2026) GBIF Occurrence Download. https://www.gbif.org doi:10.15468/dl.2379hj
Invert-E-Base. Downloaded 30 March 2026. https://invertebase.org
Biological Information System for Marine Life (BISMAL). Downloaded 30 March 2026. https://www.godac.jamstec.go.jp/bismal/e/
Ocean Biodiversity Information System (OBIS). Downloaded 30 March 2026. https://obis.org
National Museum of Nature and Science, Japan (NSMT). Data obtained directly from the museum, early 2024. https://www.kahaku.go.jp/english/
Details
Processing proceeds in the following steps:
1. The five Japan-filtered data sets are combined via rbind, with a
Data Frame column added to identify the source of each row, producing
2,707 observations.
2. deduplicate is applied on catalogNumber with drop_na = TRUE, removing
608 rows with missing catalogNumber and 143 duplicate rows and leaving
1,956 observations. Duplicate records are accessible via
attr(museum, "duplicates").
3. duplicate is applied on individualCount to expand aggregated specimen
counts into individual-level records, increasing the row count from
1,956 to 2,633.
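The deduplication and expansion steps can be approximated in base R as follows. This is a minimal sketch, not the package's deduplicate() and duplicate(), whose full signatures are not given here. dedup_sketch and expand_sketch are hypothetical helpers, and the data are placeholders. The sketch makes three assumptions: the first occurrence of each catalogNumber is kept, removed repeats are stored in a "duplicates" attribute (mirroring attr(museum, "duplicates")), and missing or zero counts expand to a single row.

```r
# Placeholder records: one missing catalogNumber, one repeated lot ("L2").
recs <- data.frame(
  catalogNumber   = c("L1", "L2", "L2", NA, "L3"),
  individualCount = c(2L, 1L, 1L, 4L, 3L),
  stringsAsFactors = FALSE
)

# Step 2 analogue: drop rows with NA in `col` (like drop_na = TRUE), keep the
# first row per value (assumption), and attach the removed repeats.
dedup_sketch <- function(df, col) {
  df <- df[!is.na(df[[col]]), , drop = FALSE]
  dup <- duplicated(df[[col]])
  out <- df[!dup, , drop = FALSE]
  attr(out, "duplicates") <- df[dup, , drop = FALSE]
  out
}

# Step 3 analogue: repeat each row `col` times.
expand_sketch <- function(df, col) {
  n <- df[[col]]
  n[is.na(n) | n < 1] <- 1L  # assumption: missing/zero counts stay one row
  df[rep(seq_len(nrow(df)), times = n), , drop = FALSE]
}

deduped  <- dedup_sketch(recs, "catalogNumber")      # 3 rows, 1 duplicate kept aside
expanded <- expand_sketch(deduped, "individualCount")  # 2 + 1 + 3 = 6 rows
```

Applied to the full data, the same bookkeeping accounts for the counts above: 2,707 - 608 - 143 = 1,956 rows before expansion.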
See also
GBIF_Japan, InvBase_Japan,
BISMAL_Japan, OBIS_Japan,
NSMT_Japan for the individual source data sets,
deduplicate for the deduplication function applied during
processing,
duplicate for the row expansion function applied during
processing,
museum_taxon for the taxonomically validated and enriched
version.
