This function scrapes all hyperlinks found at a given URL, processes the scraped pages into a tibble, and saves the resulting tibble to a parquet file.
Usage
crawl(
  url,
  index_create = TRUE,
  aggressive = FALSE,
  overwrite = FALSE,
  pkg_version = NULL,
  pkg_name = NULL,
  service = "openai"
)
Arguments
- url
A character string with the URL to be scraped.
- index_create
A logical value indicating whether to create an index. Default is TRUE.
- aggressive
A logical value indicating whether to use aggressive link crawling. Default is FALSE.
- overwrite
A logical value indicating whether to overwrite scraped pages and index if they already exist. Default is FALSE.
- pkg_version
The package version number. Default is NULL.
- pkg_name
The package name. Default is NULL.
- service
The service to use for scraping. Default is "openai". Options are "openai" and "local".
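A minimal usage sketch. The URL below is illustrative, and the example assumes the package exporting `crawl()` is attached and, for `service = "openai"`, that a valid OpenAI API key is configured in the environment:

```r
# Crawl a documentation site, build an index, and overwrite any
# previously scraped pages (illustrative URL; not from this page).
crawl(
  url = "https://example.com/docs/",
  index_create = TRUE,
  aggressive = FALSE,
  overwrite = TRUE,
  service = "openai"
)

# To avoid sending data to OpenAI, switch to the local service:
# crawl(url = "https://example.com/docs/", service = "local")
```

With `aggressive = FALSE`, crawling stays conservative about which links it follows; set it to TRUE only when the default misses pages you need.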