Expands a data frame by repeating each row the number of times given in a count column. Useful for reconstructing individual-level data from aggregated or frequency-weighted data frames.
Value
A data frame in which each row i of data
appears n_col[i] times (or once if n_col[i] is NA
and drop_na = FALSE; rows with NA counts are removed if
drop_na = TRUE). Row names are not reset, so repeated rows
carry suffixed names such as 1.1 and 1.2. The n_col
column is retained in the output.
Details
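Expansion is performed via rep(seq_len(nrow(data)), times = n_col),
so the copies of each row are contiguous and the original row order is preserved.
NA counts are replaced with 1 prior to expansion when
drop_na = FALSE, so the retained n_col column shows 1 for those rows.
A console message reports the final row count of the expanded data frame.
When drop_na = TRUE, a second message reports how many rows with NA counts were removed.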
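The mechanism described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: expand_rows() is a hypothetical helper, it takes the count column as a string rather than a bare name, and it omits the console messages.

```r
# Hypothetical helper that mirrors the documented behaviour with base R only.
# Assumption: n_col is passed as a character string naming the count column.
expand_rows <- function(data, n_col, drop_na = FALSE) {
  counts <- data[[n_col]]
  if (drop_na) {
    # Remove rows whose count is NA before expanding
    data <- data[!is.na(counts), , drop = FALSE]
  } else {
    # Treat NA counts as a single repetition
    data[[n_col]][is.na(counts)] <- 1
  }
  # Repeat each row index by its count, then subset the data frame with it
  idx <- rep(seq_len(nrow(data)), times = data[[n_col]])
  data[idx, , drop = FALSE]
}

df_na <- data.frame(group = c("A", "B", "C"), n = c(2, NA, 3))

expanded <- expand_rows(df_na, "n")
nrow(expanded)
#> [1] 6
rownames(expanded)
#> [1] "1"   "1.1" "2"   "3"   "3.1" "3.2"
```

Because rep() rejects NA in its times argument, the NA counts have to be replaced or dropped before the index vector is built. Calling rownames(expanded) <- NULL afterwards renumbers the rows 1 to 6 if the suffixed names are not wanted.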
See also
deduplicate for the inverse operation,
rep for the underlying row repetition mechanism.
Examples
df <- data.frame(
  group = c("A", "B", "C"),
  value = c(10, 20, 30),
  n = c(3, 1, 2)
)
# Expand so each row repeats n times
duplicate(df, n_col = n)
#> [duplicate] dataset expanded to 6 rows based on 'n'
#>     group value n
#> 1       A    10 3
#> 1.1     A    10 3
#> 1.2     A    10 3
#> 2       B    20 1
#> 3       C    30 2
#> 3.1     C    30 2
# NA counts default to 1 repetition
df_na <- data.frame(
  group = c("A", "B", "C"),
  n = c(2, NA, 3)
)
duplicate(df_na, n_col = n)
#> [duplicate] dataset expanded to 6 rows based on 'n'
#>     group n
#> 1       A 2
#> 1.1     A 2
#> 2       B 1
#> 3       C 3
#> 3.1     C 3
#> 3.2     C 3
# Drop rows with NA counts instead
duplicate(df_na, n_col = n, drop_na = TRUE)
#> [duplicate] 1 NA row(s) removed from numbering column
#> [duplicate] dataset expanded to 5 rows based on 'n'
#>     group n
#> 1       A 2
#> 1.1     A 2
#> 3       C 3
#> 3.1     C 3
#> 3.2     C 3
