This function crawls all hyperlinks found at a given URL, scrapes each page, and processes the data into a tibble, which is saved to a parquet file.

Usage

crawl(
  url,
  index_create = TRUE,
  aggressive = FALSE,
  overwrite = FALSE,
  pkg_version = NULL,
  pkg_name = NULL,
  service = "openai"
)

Arguments

url

A character string with the URL to be scraped.

index_create

A logical value indicating whether to create an index. Default is TRUE.

aggressive

A logical value indicating whether to use aggressive link crawling. Default is FALSE.

overwrite

A logical value indicating whether to overwrite scraped pages and the index if they already exist. Default is FALSE.

pkg_version

The package version number. Default is NULL.

pkg_name

The package name. Default is NULL.

service

The service to use for scraping. Default is "openai". Options are "openai" and "local".

Value

NULL. The resulting tibble is saved into a parquet file.
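
Examples

A minimal usage sketch. The URL, package name, and version below are illustrative; running this requires network access, and the default "openai" service additionally requires an OpenAI API key, so the "local" service is used here.

```r
# Illustrative values -- substitute the site and package you want to index.
crawl(
  url = "https://example.com/docs/",   # assumed example site, not a real target
  index_create = TRUE,
  aggressive = FALSE,
  overwrite = TRUE,
  pkg_name = "examplepkg",
  pkg_version = "1.0.0",
  service = "local"                    # avoids needing an OpenAI API key
)
```

The function returns NULL; the scraped pages and index are written to a parquet file as a side effect.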