Sample rows from a dataset, with an optional seed for reproducibility. Optimizers use this for consistent train/validation splits and demo selection.

Usage

sample_dataset(dataset, n = NULL, seed = NULL, replace = FALSE)

Arguments

dataset

A data frame to sample from.

n

Number of rows to sample. If NULL or greater than nrow(dataset), returns the full dataset.

seed

Random seed for reproducibility. If NULL, no seed is set and the sampled rows can differ between calls.

replace

Whether to sample with replacement. Default is FALSE.

Value

A data frame containing the sampled rows.
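The documented behavior can be approximated in base R. The following is a minimal sketch, not the package's source: the name sample_dataset_sketch is hypothetical, and the use of set.seed() and sample.int() is an assumption about how such a function might work.

```r
# Hypothetical re-implementation for illustration only; not the package's code.
sample_dataset_sketch <- function(dataset, n = NULL, seed = NULL, replace = FALSE) {
  # Return the full dataset when n is missing or exceeds the available rows.
  if (is.null(n) || n > nrow(dataset)) {
    return(dataset)
  }
  # Seed the RNG only when a seed is supplied (assumption: set.seed is used).
  if (!is.null(seed)) {
    set.seed(seed)
  }
  # Draw row indices, then subset while keeping the data frame structure.
  idx <- sample.int(nrow(dataset), size = n, replace = replace)
  dataset[idx, , drop = FALSE]
}

df <- data.frame(x = 1:10, y = letters[1:10])
a <- sample_dataset_sketch(df, n = 5, seed = 42)
b <- sample_dataset_sketch(df, n = 5, seed = 42)
full <- sample_dataset_sketch(df, n = 20)
```

Note that set.seed() changes the global random number generator state, so a sketch like this affects any random draws made afterwards in the same session. Whether the real function restores the previous state is not stated in this page.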

Examples

df <- tibble::tibble(x = 1:10, y = letters[1:10])

# Deterministic sampling
sample1 <- sample_dataset(df, n = 5, seed = 42)
sample2 <- sample_dataset(df, n = 5, seed = 42)
identical(sample1, sample2)  # TRUE
#> [1] TRUE

# Unseeded sampling (result may differ on each call)
sample3 <- sample_dataset(df, n = 5)
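
The description mentions consistent train/validation splits. As a rough illustration of that idea, the sketch below builds a seeded split in base R. The helper split_train_val and its train_frac argument are hypothetical and are not part of the package; how the optimizers actually call sample_dataset is not shown in this page.

```r
# Hypothetical helper, base R only; not part of the package.
split_train_val <- function(dataset, train_frac = 0.8, seed = 42) {
  # A fixed seed makes the same rows land in the training set on every call.
  set.seed(seed)
  n_train <- floor(train_frac * nrow(dataset))
  train_idx <- sample.int(nrow(dataset), size = n_train)
  # Rows not drawn for training form the validation set
  # (assumes n_train is at least 1, so negative indexing is well defined).
  list(
    train = dataset[train_idx, , drop = FALSE],
    val = dataset[-train_idx, , drop = FALSE]
  )
}

df <- data.frame(x = 1:10, y = letters[1:10])
split1 <- split_train_val(df)
split2 <- split_train_val(df)
```

Because both calls use the same seed, split1 and split2 contain the same rows, which is the property a seeded sampler provides for repeated optimizer runs.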