Combined Octopodoidea occurrence records for Japan, produced by merging the five Japan-filtered source data sets (GBIF_Japan, InvBase_Japan, BISMAL_Japan, OBIS_Japan, and NSMT_Japan) via rbind. Duplicate records are removed by applying deduplicate to the catalogNumber field, and individual-level records are reconstructed from aggregated specimen counts by applying duplicate to the individualCount field. See museum_taxon for the taxonomically validated and enriched version.

Usage

museum

Format

A data frame with 2,633 rows and 13 variables:

SciName

Scientific name as recorded in the source data set.

Genus

Genus name.

Family

Family name.

Year

Year of occurrence record.

Latitude

Decimal latitude, filtered to [25, 50].

Longitude

Decimal longitude, filtered to [125, 150].

Country

Country name or code as recorded in the source data set.

Prefecture

State, province, or region as recorded in the source data set.

Precise Location

Locality description as recorded in the source data set.

Source

Institution code or group abbreviation identifying the collecting institution.

Data Frame

Character. Identifies the source data set for each row. One of "GBIF", "InvBase", "BISMAL", "OBIS", or "NSMT".

catalogNumber

Museum lot identification code used for duplicate detection. Rows with NA in this field were removed during deduplication.

individualCount

Specimen count per lot. Used to expand rows via duplicate to reconstruct individual-level records.
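
Two of the column names above, Precise Location and Data Frame, contain a space, so they have to be accessed with [[ ]] or backticks. The sketch below illustrates this, together with a check against the documented coordinate bounds. It uses a small hypothetical stand-in data frame (toy, with invented values) rather than the real museum object, so it runs without the package installed.

```r
# Hypothetical stand-in for `museum`; all values are invented for illustration.
toy <- data.frame(
  SciName = c("Octopus sp.", "Amphioctopus sp.", "Octopus sp."),
  Latitude = c(35.1, 26.4, 43.0),
  Longitude = c(139.7, 127.8, 141.3),
  `Data Frame` = c("GBIF", "NSMT", "OBIS"),
  check.names = FALSE,        # keep the space in "Data Frame"
  stringsAsFactors = FALSE
)

# Column names containing a space need [[ ]] or backticks.
by_source <- table(toy[["Data Frame"]])
nsmt_rows <- toy[toy$`Data Frame` == "NSMT", ]

# Check that every coordinate lies inside the documented bounding box:
# latitude in [25, 50] and longitude in [125, 150].
in_box <- toy$Latitude >= 25 & toy$Latitude <= 50 &
  toy$Longitude >= 125 & toy$Longitude <= 150
all(in_box)
```

The same expressions should apply to museum itself once the package is loaded, with toy replaced by museum.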

Source

Derived from GBIF_Japan, InvBase_Japan, BISMAL_Japan, OBIS_Japan, and NSMT_Japan. Full source CSVs (raw, trimmed, and Japan-filtered) are available at https://github.com/btorgovitsky00/datamuseum.

Original sources:

Global Biodiversity Information Facility (GBIF). GBIF.org (30 March 2026) GBIF Occurrence Download. https://www.gbif.org doi:10.15468/dl.2379hj

Invert-E-Base. Downloaded 30 March 2026. https://invertebase.org

Biological Information System for Marine Life (BISMAL). Downloaded 30 March 2026. https://www.godac.jamstec.go.jp/bismal/e/

Ocean Biodiversity Information System (OBIS). Downloaded 30 March 2026. https://obis.org

National Museum of Nature and Science, Japan (NSMT). Data obtained directly from the museum, early 2024. https://www.kahaku.go.jp/english/

Details

Processing proceeds in the following steps:

  1. The five Japan-filtered data sets are combined via rbind with a Data Frame column added to identify the source of each row, producing 2,707 observations.

  2. deduplicate is applied on catalogNumber with drop_na = TRUE, which removes 608 rows with a missing catalogNumber and 143 duplicate rows, leaving 1,956 observations (2,707 - 608 - 143). Duplicate records remain accessible via attr(museum, "duplicates").

  3. duplicate is applied on individualCount to expand aggregated specimen counts to individual-level records, increasing the row count from 1,956 to 2,633.
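
The three steps above can be sketched in base R on toy data. This is only an approximation of the logic, not the package's implementation: it does not call deduplicate or duplicate, and it assumes that the first occurrence of each catalogNumber is the one kept and that individualCount holds non-missing positive integers. The data frames gbif and nsmt and all their values are hypothetical.

```r
# Toy stand-ins for two Japan-filtered source data sets (invented values).
gbif <- data.frame(
  catalogNumber = c("A1", "A2", NA),
  individualCount = c(1, 3, 2),
  stringsAsFactors = FALSE
)
nsmt <- data.frame(
  catalogNumber = c("A2", "B7"),
  individualCount = c(3, 2),
  stringsAsFactors = FALSE
)

# Step 1: tag each row with its source data set, then stack with rbind.
gbif[["Data Frame"]] <- "GBIF"
nsmt[["Data Frame"]] <- "NSMT"
combined <- rbind(gbif, nsmt)

# Step 2: drop rows with a missing catalogNumber, keep the first occurrence
# of each catalogNumber (an assumption), and set the repeats aside.
has_id <- combined[!is.na(combined$catalogNumber), ]
is_dup <- duplicated(has_id$catalogNumber)
deduped <- has_id[!is_dup, ]
attr(deduped, "duplicates") <- has_id[is_dup, ]

# Step 3: repeat each row individualCount times to get one row per specimen.
row_index <- rep(seq_len(nrow(deduped)), times = deduped$individualCount)
expanded <- deduped[row_index, ]
rownames(expanded) <- NULL

nrow(combined)  # 5 rows after stacking
nrow(deduped)   # 3 rows after removing 1 NA row and 1 duplicate
nrow(expanded)  # 6 rows, the sum of individualCount in deduped
```

In the toy run the expanded row count equals sum(deduped$individualCount), which mirrors how the real data grow from 1,956 lots to 2,633 individual-level records.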

See also

GBIF_Japan, InvBase_Japan, BISMAL_Japan, OBIS_Japan, NSMT_Japan for the individual source data sets,

deduplicate for the deduplication function applied during processing,

duplicate for the row expansion function applied during processing,

museum_taxon for the taxonomically validated and enriched version.