Getting Started with measure.sec

What You’ll Learn

By the end of this tutorial, you will be able to:

Load and convert SEC chromatogram data into the measure format
Build a recipe that processes your detector signals
Apply calibration to convert retention time to molecular weight
Calculate MW averages (Mn, Mw, Mz, dispersity)

Time to complete: ~15 minutes

Prerequisites

Before starting, you should have:

Basic R knowledge (data frames, pipes, functions)
R and RStudio installed

No prior SEC/GPC knowledge is required (we’ll cover the basics).

Overview

measure.sec provides preprocessing steps for Size Exclusion Chromatography (SEC) and Gel Permeation Chromatography (GPC) data analysis. It extends the measure package using the recipes framework.

What is SEC/GPC?

Size Exclusion Chromatography (SEC), also known as Gel Permeation Chromatography (GPC), separates molecules by size. Larger molecules are excluded from the pores of the column packing and elute earlier; smaller molecules enter the pores and elute later. This lets you determine molecular weight averages (Mn, Mw, Mz) and dispersity (the breadth of the distribution).

SEC data consists of chromatograms (detector response vs. elution time). Common detectors include RI (concentration), UV (chromophores), and light scattering (absolute MW). This package processes these signals to extract molecular weight information.

Workflow Overview

The typical SEC analysis workflow follows these steps:

┌─────────────────────────────────────────────────────────────────────────┐
│                        SEC Analysis Workflow                            │
└─────────────────────────────────────────────────────────────────────────┘

  ┌──────────────┐     ┌──────────────┐     ┌──────────────┐
  │   Raw Data   │────▶│ Preprocess   │────▶│  Calibrate   │
  │ (CSV/Export) │     │  Signals     │     │   MW Scale   │
  └──────────────┘     └──────────────┘     └──────────────┘
         │                    │                    │
         │                    │                    │
         ▼                    ▼                    ▼
   • Detector signals   • Baseline correct  • Apply standards
   • Elution times      • Align detectors   • Or use MALS for
   • Sample metadata    • Convert units       absolute MW
                                                   │
                                                   ▼
                        ┌──────────────────────────────────────┐
                        │          Calculate Results           │
                        │  • MW averages (Mn, Mw, Mz)         │
                        │  • Dispersity                        │
                        │  • MW distribution                   │
                        │  • Aggregate/fragment %              │
                        └──────────────────────────────────────┘

In measure.sec, each box becomes one or more recipe steps. You chain these steps together into a reproducible analysis pipeline.

Basic Workflow Overview

A typical SEC analysis starts with raw detector signals, converts them to the measure format, applies baseline correction, processes detector signals with appropriate normalization factors, then either applies calibration from standards (conventional) or uses light scattering for absolute molecular weight (MALS).

flowchart TD
    A[Raw Chromatogram<br>RI, UV, MALS signals] --> B[step_measure_input_long<br>Convert to measure format]
    B --> C[step_sec_baseline<br>Baseline correction]
    C --> D[step_sec_ri / step_sec_uv<br>Detector processing with dn/dc or ε]
    D --> E{Calibration<br>method?}
    E -->|"Standards available<br>(same polymer type)"| F[step_sec_conventional_cal<br>Polynomial fit to standards]
    E -->|"MALS detector<br>(absolute MW needed)"| G[step_sec_mals<br>Angular extrapolation<br>Zimm/Debye/Berry]
    F --> H[step_sec_mw_averages<br>Calculate Mn, Mw, Mz]
    G --> I[Absolute MW & Rg<br>directly from MALS]
    H --> J[Results<br>MW averages, distributions, plots]
    I --> J

    style A fill:#e1f5fe
    style J fill:#c8e6c9

Choose conventional calibration when you have narrow MW standards of the same polymer type. Choose MALS when you need absolute MW without polymer-specific standards or when analyzing unknown polymers.

Installation

# Install from GitHub
# install.packages("pak")
pak::pak("JamesHWade/measure")
pak::pak("JamesHWade/measure-sec")

Setup

library(measure)
#> Loading required package: recipes
#> Loading required package: dplyr
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
#> 
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stats':
#> 
#>     step
library(measure.sec)
library(recipes)
library(dplyr)
library(ggplot2)

The Data Model

SEC data in measure.sec uses the measure package’s nested tibble structure. Understanding this structure is key to working effectively with the package.

measure_tbl: A Single Chromatogram

A measure_tbl is a tibble (data frame) representing a single chromatogram with two required columns:

location: The x-axis values—typically elution time (minutes) or elution volume (mL)
value: The y-axis values—detector response (mV, AU, or processed units)

Think of it as one line on a chromatogram plot. For example, an RI detector signal from one injection is stored as a measure_tbl.

measure_list: Multiple Chromatograms (Internal Format)

A measure_list is a list column containing multiple measure_tbl objects. This is the internal format used by measure.sec recipe steps—you typically won’t create this yourself. Instead, step_measure_input_long() converts your raw data into this format automatically.

After conversion, your data will look like this:

┌──────────────────────────────────────────────────────────────┐
│  sample_id   known_mw   dn_dc       ri                      │
├──────────────────────────────────────────────────────────────┤
│  "PS-50K"    50000      0.185       <measure_list[1]>       │
│  "PS-100K"   100000     0.185       <measure_list[1]>       │
│  "PMMA-75K"  75000      0.084       <measure_list[1]>       │
└──────────────────────────────────────────────────────────────┘
                                           │
                                           ▼
                            Each entry contains a measure_tbl:
                              location │ value
                              ─────────┼────────
                              5.0      │ 0.002
                              5.1      │ 0.015
                              5.2      │ 0.089
                              ...      │ ...

This nested structure has several advantages:

Tidy organization: Each row is one sample with all its metadata and chromatogram(s)
Batch processing: Apply the same recipe to many samples at once
Multiple detectors: Store RI, UV, and MALS signals as separate nested columns
Metadata preservation: Sample properties like dn_dc or known_mw travel with the chromatogram
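
The nesting idea can be sketched in base R without any package code. The example below uses a hypothetical two-sample data frame and a plain list column; the real measure_tbl and measure_list classes carry additional structure that this sketch does not reproduce.

```r
# Long-format data: one row per time point, two hypothetical samples
long <- data.frame(
  sample_id    = rep(c("A", "B"), each = 3),
  elution_time = rep(c(5.0, 5.1, 5.2), times = 2),
  ri_signal    = c(0.002, 0.015, 0.089, 0.001, 0.010, 0.050)
)

# Build one small (location, value) table per sample
chromatograms <- lapply(
  split(long, long$sample_id),
  function(d) data.frame(location = d$elution_time, value = d$ri_signal)
)

# One row per sample, with the chromatogram stored in a list column
nested <- data.frame(sample_id = names(chromatograms))
nested$ri <- unname(chromatograms)

nrow(nested)      # one row per sample
nested$ri[[1]]    # the chromatogram for sample "A"
```

Each row of nested now describes one sample, and the chromatogram sits alongside it instead of being spread over many rows.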

Example Dataset

The package includes sec_triple_detect, a synthetic multi-detector SEC dataset in long format (one row per time point). This is a good starting point for learning the workflow before analyzing your own data.

# Load the example dataset
data(sec_triple_detect, package = "measure.sec")

# View the structure - this is LONG format data (one row per time point)
# The signal columns (ri_signal, uv_signal, mals_signal) are numeric vectors
glimpse(sec_triple_detect)
#> Rows: 24,012
#> Columns: 11
#> $ sample_id        <chr> "PS-1K", "PS-1K", "PS-1K", "PS-1K", "PS-1K", "PS-1K",…
#> $ sample_type      <chr> "standard", "standard", "standard", "standard", "stan…
#> $ polymer_type     <chr> "polystyrene", "polystyrene", "polystyrene", "polysty…
#> $ elution_time     <dbl> 5.00, 5.01, 5.02, 5.03, 5.04, 5.05, 5.06, 5.07, 5.08,…
#> $ ri_signal        <dbl> 6.926392e-04, 0.000000e+00, 3.199253e-04, 4.197175e-0…
#> $ uv_signal        <dbl> 0.0002034583, 0.0000000000, 0.0000000000, 0.000000000…
#> $ mals_signal      <dbl> 3.370385e-05, 3.483481e-05, 3.102092e-05, 3.261962e-0…
#> $ known_mw         <dbl> 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000,…
#> $ known_dispersity <dbl> 1.05, 1.05, 1.05, 1.05, 1.05, 1.05, 1.05, 1.05, 1.05,…
#> $ dn_dc            <dbl> 0.185, 0.185, 0.185, 0.185, 0.185, 0.185, 0.185, 0.18…
#> $ extinction_coef  <dbl> 1.2, 1.2, 1.2, 1.2, 1.2, 1.2, 1.2, 1.2, 1.2, 1.2, 1.2…

The dataset contains:

12 polymer samples: polystyrene (PS), PMMA, PEG, and copolymers
2,001 time points per sample: giving 24,012 total rows
Three detector signals: RI, UV, and MALS (as numeric columns)
Known molecular weights: For validating your analysis
Optical constants: dn/dc and extinction coefficients (needed for quantitative analysis)

# View the unique samples in the dataset
# Each sample_id represents one injection; the chromatogram spans many rows
sec_triple_detect |>
  distinct(sample_id, sample_type, polymer_type) |>
  print(n = 12)
#> # A tibble: 12 × 3
#>    sample_id sample_type polymer_type
#>    <chr>     <chr>       <chr>       
#>  1 PS-1K     standard    polystyrene 
#>  2 PS-10K    standard    polystyrene 
#>  3 PS-50K    standard    polystyrene 
#>  4 PS-100K   standard    polystyrene 
#>  5 PS-500K   standard    polystyrene 
#>  6 PMMA-Low  sample      pmma        
#>  7 PMMA-Med  sample      pmma        
#>  8 PMMA-High sample      pmma        
#>  9 PEG-5K    sample      peg         
#> 10 PEG-20K   sample      peg         
#> 11 Copoly-A  sample      copolymer   
#> 12 Copoly-B  sample      copolymer

Basic Workflow: RI Detector Analysis

Let’s walk through a complete analysis of a polystyrene sample using the RI detector. This demonstrates the core pattern you’ll use for all SEC analysis.

# Select a single polystyrene standard for this example
# In practice, you'd often process many samples at once
ps_sample <- sec_triple_detect |>
  filter(sample_id == "PS-50K")

# View sample info - note this is still long format (many rows per sample)
ps_sample |>
  select(sample_id, polymer_type, known_mw, elution_time, ri_signal) |>
  head()
#> # A tibble: 6 × 5
#>   sample_id polymer_type known_mw elution_time ri_signal
#>   <chr>     <chr>           <dbl>        <dbl>     <dbl>
#> 1 PS-50K    polystyrene     50000         5     0       
#> 2 PS-50K    polystyrene     50000         5.01  0.000279
#> 3 PS-50K    polystyrene     50000         5.02  0       
#> 4 PS-50K    polystyrene     50000         5.03  0       
#> 5 PS-50K    polystyrene     50000         5.04  0.000842
#> 6 PS-50K    polystyrene     50000         5.05  0.000483

Step 1: Create a Recipe

Recipes define a sequence of preprocessing steps. Think of a recipe as a blueprint for your analysis—it describes what to do, but doesn’t do it yet. This separation lets you define the workflow once and apply it to many samples.

# Start a recipe with your data
# The formula lists the chromatogram columns (left) and the sample identifier (right)
# sample_id identifies which rows belong to each chromatogram
rec <- recipe(
  ri_signal + elution_time + dn_dc ~ sample_id,
  data = ps_sample
) |>
  update_role(sample_id, new_role = "id") |>
  # Convert the ri_signal column to measure format
  # This step tells recipes how to interpret your chromatogram data
  step_measure_input_long(
    ri_signal,
    location = vars(elution_time),
    col_name = "ri"
  )

Step 2: Add Preprocessing Steps

Chain additional steps using the pipe (|>). Each step transforms the data in sequence:

rec <- recipe(
  ri_signal + elution_time + dn_dc ~ sample_id,
  data = ps_sample
) |>
  update_role(sample_id, new_role = "id") |>
  # First: convert raw signal to measure format
  step_measure_input_long(
    ri_signal,
    location = vars(elution_time),
    col_name = "ri"
  ) |>
  # Second: correct the baseline (removes drift and offset)
  step_sec_baseline(measures = "ri") |>
  # Third: process RI signal using the sample's dn/dc value
  # Dividing by dn/dc converts the RI signal to concentration-proportional units
  step_sec_ri(measures = "ri", dn_dc_column = "dn_dc")
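
To make the baseline and dn/dc steps less abstract, here is a minimal base R sketch on a synthetic chromatogram. It assumes a simple linear baseline fitted to peak-free regions at both ends of the run, which is a stand-in for illustration and not necessarily the algorithm step_sec_baseline() uses. The peak shape, drift values, and region limits are made up.

```r
# Synthetic chromatogram: a Gaussian peak on a linearly drifting baseline
time   <- seq(5, 25, by = 0.01)
peak   <- exp(-(time - 15)^2 / (2 * 0.5^2))
signal <- peak + 0.02 + 0.001 * time   # offset + drift (made-up values)

# Fit a straight line through peak-free regions at both ends of the run
is_baseline <- time < 8 | time > 22
fit <- lm(signal ~ time, subset = is_baseline)
baseline <- predict(fit, newdata = data.frame(time = time))

# Subtract the fitted baseline to recover the peak
corrected <- signal - baseline

# Divide by dn/dc so the trace is proportional to concentration
dn_dc <- 0.185  # polystyrene value listed in the example dataset
conc_like <- corrected / dn_dc

max(abs(corrected - peak))  # residual error of the baseline correction
```

In the recipe, the same two ideas are expressed declaratively: step_sec_baseline() handles the drift and offset, and step_sec_ri() applies the per-sample dn/dc value.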

Step 3: Prep and Bake

Two functions execute your recipe:

prep(): Learns any required parameters from the training data (like baseline fit coefficients)
bake(): Applies the transformations to produce results

# Prep: Learn parameters from the data
prepped <- prep(rec)

# Bake: Apply transformations (new_data = NULL means use the training data)
result <- bake(prepped, new_data = NULL)

# View the processed data - ri now contains the baseline-corrected,
# concentration-converted chromatogram
result |>
  select(sample_id, ri)
#> # A tibble: 1 × 2
#>   sample_id          ri
#>   <chr>          <meas>
#> 1 PS-50K    [2,001 × 2]

Why two steps? This design lets you prep once on training data (like calibration standards), then bake on new samples without re-learning parameters. It also makes your analysis reproducible.
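
The prep/bake split can be pictured with a small base R analogy. The functions prep_center and bake_center below are hypothetical and only mimic the idea of learning a parameter once and reusing it; they are not part of recipes or measure.

```r
# "prep": learn a parameter (here, a mean) from training data
prep_center <- function(train) {
  mu <- mean(train)
  # "bake": apply the learned parameter to the training data or to new data
  function(new_data = train) new_data - mu
}

bake_center <- prep_center(c(1, 2, 3))  # learns mu = 2 once

bake_center()            # training data, centered with mu = 2
bake_center(c(10, 20))   # new data, centered with the same mu
```

The second call does not recompute the mean, which is the property that makes a prepped recipe safe to reuse on new samples.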

✓ Checkpoint: You’ve successfully converted raw detector data into the measure format, applied baseline correction, and processed the RI signal with dn/dc. Your result tibble now contains a processed ri column ready for calibration.

Molecular Weight Averages

The most common outputs from SEC analysis are molecular weight averages:

Mn (number-average): Emphasizes lower MW species
Mw (weight-average): Emphasizes higher MW species
Mz (z-average): Even more sensitive to high MW species
Dispersity (Mw/Mn): Measures breadth of the MW distribution (1.0 = monodisperse)

Use step_sec_mw_averages() to calculate these. This step requires that the x-axis (location) values already represent log₁₀(MW)—which is what step_sec_conventional_cal() provides. See the Calibration section below for the complete workflow.
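
As a rough illustration of what these averages mean, the sketch below computes them in base R from a tiny, made-up calibrated chromatogram. It uses the standard slice formulas for a concentration detector: Mn = Σh / Σ(h/M), Mw = Σ(hM) / Σh, and Mz = Σ(hM²) / Σ(hM), where h is the detector height and M the molecular weight of each slice. The function name mw_averages_sketch and the data are hypothetical, and this is not necessarily how step_sec_mw_averages() is implemented.

```r
# Hypothetical helper: MW averages from slice heights and log10(MW) values.
# Assumes slices are equally spaced in elution time and that `height` is
# proportional to concentration (for example, a baseline-corrected RI trace).
mw_averages_sketch <- function(log_mw, height) {
  m  <- 10^log_mw
  mn <- sum(height) / sum(height / m)
  mw <- sum(height * m) / sum(height)
  mz <- sum(height * m^2) / sum(height * m)
  c(mn = mn, mw = mw, mz = mz, dispersity = mw / mn)
}

# Three made-up slices at 10^4, 10^5, and 10^6 g/mol
avg <- mw_averages_sketch(log_mw = c(4, 5, 6), height = c(1, 2, 1))
round(avg, 2)
```

For this deliberately broad toy distribution, Mn is about 33,058, Mw is 302,500, and Mz is about 843,058, so Mn < Mw < Mz and the dispersity is about 9.15.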

Calibration Curves

Most SEC analysis requires calibration to convert retention time to molecular weight. The most common approach uses narrow standards—polymers with known molecular weights and low dispersity—to build a calibration curve.

Note: If you have a light scattering detector (MALS), you can determine absolute molecular weights without calibration. See vignette("triple-detection") for details.

# Load polystyrene narrow standards
# These are well-characterized polymers used to build the calibration curve
data(sec_ps_standards, package = "measure.sec")

# View the standards - each has a known peak molecular weight (Mp)
sec_ps_standards |>
  select(standard_name, mp, log_mp, retention_time) |>
  print(n = 8)
#> # A tibble: 16 × 4
#>   standard_name      mp log_mp retention_time
#>   <chr>           <dbl>  <dbl>          <dbl>
#> 1 PS-3150000    3150000   6.50           11.2
#> 2 PS-1870000    1870000   6.27           11.6
#> 3 PS-1090000    1090000   6.04           12.1
#> 4 PS-630000      630000   5.80           12.6
#> 5 PS-430000      430000   5.63           13.2
#> 6 PS-216000      216000   5.33           13.8
#> 7 PS-120000      120000   5.08           14.3
#> 8 PS-67500        67500   4.83           15.0
#> # ℹ 8 more rows

# Visualize the calibration curve
# The relationship between log(MW) and retention time is typically linear
# or slightly curved, so we fit a polynomial
ggplot(sec_ps_standards, aes(retention_time, log_mp)) +
  geom_point(size = 3, color = "#2E86AB") +
  geom_smooth(
    method = "lm",
    formula = y ~ poly(x, 3),
    se = TRUE,
    color = "#A23B72",
    fill = "#A23B72",
    alpha = 0.2
  ) +
  labs(
    x = "Retention Time (min)",
    y = expression(log[10](M[p])),
    title = "Polystyrene Calibration Curve"
  ) +
  theme_minimal()
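
The fit behind the plot can also be reproduced in base R. The sketch below uses only the eight standards printed above (the full dataset has 16), fits a cubic polynomial with lm(), and returns NA outside the range covered by those standards. The helper predict_log_mw is hypothetical and is not the package's calibration code.

```r
# The eight standards printed above (retention time in min, log10 Mp)
standards <- data.frame(
  retention_time = c(11.2, 11.6, 12.1, 12.6, 13.2, 13.8, 14.3, 15.0),
  log_mp         = c(6.50, 6.27, 6.04, 5.80, 5.63, 5.33, 5.08, 4.83)
)

# Cubic polynomial: log10(M) as a function of retention time
cal_fit <- lm(log_mp ~ poly(retention_time, 3), data = standards)

# Predict log10(M) only inside the range spanned by the standards
predict_log_mw <- function(rt) {
  out <- rep(NA_real_, length(rt))
  inside <- rt >= min(standards$retention_time) &
    rt <= max(standards$retention_time)
  out[inside] <- predict(
    cal_fit,
    newdata = data.frame(retention_time = rt[inside])
  )
  out
}

# 10.0 min is outside the fitted range, so the first value is NA
log_mw_pred <- predict_log_mw(c(10.0, 13.0, 15.0))
log_mw_pred
```

Returning NA outside the fitted range is a design choice of this sketch: a cubic polynomial can swing sharply once it leaves the data, so extrapolated molecular weights should be treated with suspicion.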

Apply the calibration using step_sec_conventional_cal():

# Prepare standards in the format expected by the calibration step
# Needs columns: retention (time/volume) and log_mw
ps_cal <- sec_ps_standards |>
  select(retention = retention_time, log_mw = log_mp)

rec_cal <- recipe(
  ri_signal + elution_time + dn_dc ~ sample_id,
  data = ps_sample
) |>
  update_role(sample_id, new_role = "id") |>
  step_measure_input_long(
    ri_signal,
    location = vars(elution_time),
    col_name = "ri"
  ) |>
  step_sec_baseline(measures = "ri") |>
  # Apply conventional calibration using polystyrene standards
  # This converts retention time to log10(MW) on the x-axis
  # fit_type options: "linear", "quadratic", "cubic" (most common)
  step_sec_conventional_cal(
    standards = ps_cal,
    fit_type = "cubic"
  ) |>
  # Calculate MW averages from the calibrated chromatogram
  # The calibration step converted location values to log10(MW)
  step_sec_mw_averages()

prepped_cal <- prep(rec_cal)
#> Warning: Standard at 12.58 has 14.4% MW deviation.
#> ℹ Consider removing outlier standards or using a different fit type.
#> Warning: 1037 points (51.8%) are outside calibration range.
#> ℹ Calibration range: 11.15 to 20.79
result_cal <- bake(prepped_cal, new_data = NULL)

# View molecular weight results
# New columns are added with mw_ prefix
result_cal |>
  select(sample_id, mw_mn, mw_mw, mw_mz, mw_dispersity)
#> # A tibble: 1 × 5
#>   sample_id      mw_mn   mw_mw   mw_mz mw_dispersity
#>   <chr>          <dbl>   <dbl>   <dbl>         <dbl>
#> 1 PS-50K    419927756. 8.24e21 7.81e23       1.96e13
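
The second warning from prep() is worth checking by hand. The sketch below counts how many points of a chromatogram fall outside the reported calibration range of 11.15 to 20.79 minutes. It assumes a time axis from 5.00 to 25.00 minutes in 0.01 minute steps, which matches the 2,001 points and the first elution times shown earlier but is otherwise an assumption.

```r
# Assumed time axis: 5.00 to 25.00 min in 0.01 min steps (2,001 points)
elution_time <- (500:2500) / 100

# Calibration range reported in the prep() warning
cal_range <- c(11.15, 20.79)

# Flag points that the calibration curve does not cover
outside <- elution_time < cal_range[1] | elution_time > cal_range[2]
sum(outside)    # number of points outside the calibration range
mean(outside)   # fraction of points outside the calibration range

# Keep only the calibrated window before interpreting MW values
calibrated_time <- elution_time[!outside]
range(calibrated_time)
```

Roughly half of the points fall outside the range, in line with the warning. Molecular weights assigned to those points come from extrapolating the polynomial and can dominate the averages.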

Important: Conventional calibration assumes your sample has similar hydrodynamic behavior to your standards. Polystyrene standards give reasonable, polystyrene-equivalent values for other flexible polymers in THF, but for proteins in aqueous SEC, use protein standards or light scattering.

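✓ Checkpoint: You’ve run a complete SEC workflow, from raw signal to molecular weight averages. Your result_cal tibble contains Mn, Mw, Mz, and dispersity calculated with conventional calibration. Before interpreting the numbers, check the prep() warnings: in the run above, 51.8% of the points lie outside the calibration range, and the reported averages are orders of magnitude above the known MW of 50,000 for PS-50K, which most likely reflects extrapolation of the cubic fit rather than a real measurement.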

Available Steps

The package provides a comprehensive set of recipe steps. Here’s a quick reference organized by function:

Preprocessing

step_sec_baseline(): SEC-optimized baseline correction
step_sec_detector_delay(): Correct inter-detector delays

Detector Processing

step_sec_ri(): RI detector with dn/dc
step_sec_uv(): UV detector with extinction coefficient
step_sec_mals(), step_sec_lals(), step_sec_rals(): Light scattering
step_sec_dls(): Dynamic light scattering
step_sec_viscometer(): Differential viscometer

Molecular Weight

step_sec_mw_averages(): Mn, Mw, Mz, dispersity
step_sec_mw_fractions(): MW fractions above/below cutoffs
step_sec_mw_distribution(): Differential/cumulative MWD
step_sec_conventional_cal(): Narrow standard calibration
step_sec_universal_cal(): Universal calibration

Composition & Protein

step_sec_uv_ri_ratio(): UV/RI ratio for heterogeneity
step_sec_composition(): Copolymer composition
step_sec_aggregates(): HMWS/monomer/LMWS quantitation
step_sec_protein(): Complete protein SEC workflow

Troubleshooting

Common issues and quick fixes:

Problem	Solution
“Column not found”	Check column names match exactly (case-sensitive)
“No measure columns found”	Add step_measure_input_long() at the start of your recipe
NA values in MW results	Check calibration range covers your retention times
Recipe won’t prep	Try prepping with fewer steps to isolate the issue

# Debugging tips:
names(your_data)                    # Check column names
measure::find_measure_cols(result)  # Find measure columns after bake
result$ri[[1]] |> summary()         # Inspect chromatogram data
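
For the “Column not found” case, a quick check with setdiff() shows which expected columns are missing. The data frame below is a made-up example with a deliberately mis-capitalized column name; the list of required columns simply mirrors the ones used in this tutorial.

```r
# Columns the tutorial's recipe expects
required <- c("sample_id", "elution_time", "ri_signal", "dn_dc")

# Hypothetical data with a mis-capitalized column name (RI_signal)
your_data <- data.frame(
  sample_id    = "A",
  elution_time = 5.0,
  RI_signal    = 0.1,
  dn_dc        = 0.185
)

# Column names are case-sensitive, so "ri_signal" is reported as missing
missing_cols <- setdiff(required, names(your_data))
missing_cols
```

Running this check before building the recipe usually points straight at the name that needs fixing.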

Next Steps

Now that you understand the basics, explore these vignettes for specialized workflows:

Vignette	Use when you need to…
Multi-Detector SEC	Integrate multiple detectors (RI + UV + LS)
MALS Detection	Get absolute MW and radius of gyration
LALS/RALS Detection	Use single-angle light scattering
Protein SEC	Analyze aggregates (HMWS/monomer/LMWS)
Copolymer Composition	Determine composition via UV/RI ratio
Calibration Management	Save, load, and reuse calibrations
System Suitability	Set up QC checks and column monitoring
Exporting Results	Generate summary tables and reports

You can also browse all available functions with:

# See all SEC/GPC steps registered with measure
measure::measure_steps(techniques = "SEC/GPC")

Session Info

sessionInfo()
#> R version 4.5.2 (2025-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.3 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
#>  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
#>  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
#> [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
#> 
#> time zone: UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] ggplot2_4.0.1          measure.sec_0.0.0.9000 measure_0.0.1.9002    
#> [4] recipes_1.3.1          dplyr_1.1.4           
#> 
#> loaded via a namespace (and not attached):
#>  [1] gtable_0.3.6        xfun_0.56           bslib_0.10.0       
#>  [4] lattice_0.22-7      vctrs_0.7.1         tools_4.5.2        
#>  [7] generics_0.1.4      parallel_4.5.2      tibble_3.3.1       
#> [10] pkgconfig_2.0.3     Matrix_1.7-4        data.table_1.18.2.1
#> [13] RColorBrewer_1.1-3  S7_0.2.1            desc_1.4.3         
#> [16] lifecycle_1.0.5     compiler_4.5.2      farver_2.1.2       
#> [19] textshaping_1.0.4   codetools_0.2-20    htmltools_0.5.9    
#> [22] class_7.3-23        sass_0.4.10         yaml_2.3.12        
#> [25] prodlim_2025.04.28  tidyr_1.3.2         pillar_1.11.1      
#> [28] pkgdown_2.2.0       jquerylib_0.1.4     MASS_7.3-65        
#> [31] cachem_1.1.0        gower_1.0.2         rpart_4.1.24       
#> [34] nlme_3.1-168        parallelly_1.46.1   lava_1.8.2         
#> [37] tidyselect_1.2.1    digest_0.6.39       future_1.69.0      
#> [40] purrr_1.2.1         listenv_0.10.0      labeling_0.4.3     
#> [43] splines_4.5.2       fastmap_1.2.0       grid_4.5.2         
#> [46] cli_3.6.5           magrittr_2.0.4      utf8_1.2.6         
#> [49] survival_3.8-3      future.apply_1.20.1 withr_3.0.2        
#> [52] scales_1.4.0        lubridate_1.9.4     timechange_0.4.0   
#> [55] rmarkdown_2.30      globals_0.19.0      nnet_7.3-20        
#> [58] timeDate_4052.112   ragg_1.5.0          evaluate_1.0.5     
#> [61] knitr_1.51          hardhat_1.4.2       mgcv_1.9-3         
#> [64] rlang_1.1.7         Rcpp_1.1.1          glue_1.8.0         
#> [67] ipred_0.9-15        jsonlite_2.0.0      R6_2.6.1           
#> [70] systemfonts_1.3.1   fs_1.6.6