Create Scrape
Scrape a URL with the provided configuration and get its content.
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
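A minimal sketch of building this header in Python. The helper name `auth_headers` is hypothetical, and the `Content-Type` value assumes the body is sent as JSON.

```python
def auth_headers(token: str) -> dict[str, str]:
    """Build request headers carrying the Bearer token."""
    token = token.strip()
    if not token:
        raise ValueError("auth token must not be empty")
    return {
        "Authorization": f"Bearer {token}",
        # Assumption: the request body is JSON.
        "Content-Type": "application/json",
    }
```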
Body
The URL to start scraping from.
Time to wait in milliseconds before starting the scraping.
Formats in which you want the content.
Available options: html, markdown, text, json, raw_pdf, screenshot
Option to remove certain CSS selectors from the content. Optionally, you can also pass a JSON stringified array of specific selectors you want to remove. The CSS selectors removed when this option is set to default are ['nav','footer','script','style','noscript','svg',[role=alert],[role=banner],[role=dialog],[role=alertdialog],[role=region][aria-label*=skip i],[aria-modal=true]]
Available options: default, none, array
Actions to perform on the page before getting the content.
- Wait
- Click
- Fill Input
- Scroll
Residential country to load the request from.
Supported values are:
- US (United States)
- CA (Canada)
- IT (Italy)
- IN (India)
- GB (United Kingdom)
- JP (Japan)
- MX (Mexico)
- AU (Australia)
- ID (Indonesia)
- UA (UAE)
- RU (Russia)
- RANDOM
Some operations, like scraping Google Search and Google News, support all countries.
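A client-side check against the list above might look like the following sketch. The function name `normalize_country` is hypothetical, and the check only covers the general list; operations that support all countries would need to bypass it.

```python
# Country codes copied from the supported values listed above.
SUPPORTED_COUNTRIES = {
    "US", "CA", "IT", "IN", "GB", "JP",
    "MX", "AU", "ID", "UA", "RU", "RANDOM",
}

def normalize_country(code: str) -> str:
    """Upper-case a country code and verify it is in the supported list."""
    value = code.strip().upper()
    if value not in SUPPORTED_COUNTRIES:
        raise ValueError(f"unsupported country: {code!r}")
    return value
```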
Specify the HTML transformer to use, if any. Postlight's Mercury Parser library is used to remove ads and other unwanted content from the scraped content.
Available options: postlight, none
Option to remove images from the scraped content. Defaults to false.
List of class names to remove from the content.
When defining json as a format, you can use this parameter to specify the parser to use. Parsers are useful to extract structured content from web pages. Olostep has a few parsers built in for most common web pages, and you can also create your own parsers.
With this option, you can get all the links present on the page you scrape. Links are always returned as absolute URLs.
Configuration for screen size. Preset dimensions are available through screen_type: desktop (1920x1080), mobile (414x896), or default (768x1024).
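The presets above can be kept in a small lookup table on the client side. This is a sketch only: the function name and the `width`/`height` keys are hypothetical, and only the dimensions come from the text above.

```python
# Preset dimensions as (width, height), taken from the description above.
SCREEN_PRESETS = {
    "desktop": (1920, 1080),
    "mobile": (414, 896),
    "default": (768, 1024),
}

def resolve_screen_size(screen_type: str = "default") -> dict[str, int]:
    """Return the pixel dimensions for a named screen_type preset."""
    if screen_type not in SCREEN_PRESETS:
        raise ValueError(f"unknown screen_type: {screen_type!r}")
    width, height = SCREEN_PRESETS[screen_type]
    return {"width": width, "height": height}
```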
User-defined metadata. Not yet supported.
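The body fields above can be assembled as in the following sketch. This section does not show the field names, so every key used here (`url_to_scrape`, `wait_before_scraping`, `formats`, `remove_css_selectors`, `country`, `remove_images`) is an assumption to check against the actual request schema. Only the allowed format values and the JSON stringified selector array come from the text above.

```python
import json
from typing import Any, Optional

ALLOWED_FORMATS = {"html", "markdown", "text", "json", "raw_pdf", "screenshot"}

def build_scrape_body(
    url: str,
    formats: list[str],
    wait_ms: int = 0,
    selectors_to_remove: Optional[list[str]] = None,
    country: Optional[str] = None,
    remove_images: bool = False,
) -> dict[str, Any]:
    """Assemble a request body; all key names are assumed, not confirmed."""
    unknown = set(formats) - ALLOWED_FORMATS
    if unknown:
        raise ValueError(f"unsupported formats: {sorted(unknown)}")
    if wait_ms < 0:
        raise ValueError("wait_ms must be non-negative")
    body: dict[str, Any] = {
        "url_to_scrape": url,
        "wait_before_scraping": wait_ms,  # milliseconds
        "formats": list(formats),
        "remove_images": remove_images,
    }
    if selectors_to_remove is not None:
        # Specific selectors are passed as a JSON stringified array.
        body["remove_css_selectors"] = json.dumps(selectors_to_remove)
    if country is not None:
        body["country"] = country
    return body
```

Optional fields are left out of the body unless set, so the service's own defaults apply.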
Response
Successful response with the scrape initiation details.
Scrape ID
The kind of object. "scrape" for this endpoint.
Creation time as a Unix epoch timestamp.
User-defined metadata.
The URL that was scraped.
Number of credits consumed by this request. Populated after execution completes. Credits are the source of truth for billing.
Estimated cost in USD for this request. Populated after execution completes. Calculated from credits consumed and your plan rate. The estimate is about 99% accurate; credits_consumed is the authoritative value.
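A sketch of reading these response fields from a decoded JSON object. Apart from `credits_consumed`, which is named above, the keys `id` and `object` are assumed names, and the sample values in the usage lines are made up for illustration.

```python
from typing import Any

def summarize_scrape(response: dict[str, Any]) -> dict[str, Any]:
    """Extract the scrape ID and the authoritative billing value."""
    if response.get("object") != "scrape":
        raise ValueError("response is not a scrape object")
    # credits_consumed is only populated after execution completes.
    credits = response.get("credits_consumed")
    return {
        "id": response.get("id"),
        "billing_ready": credits is not None,
        "credits_consumed": credits,
    }

# Hypothetical responses, before and after execution completes.
pending = summarize_scrape({"id": "scrape_123", "object": "scrape"})
done = summarize_scrape(
    {"id": "scrape_123", "object": "scrape", "credits_consumed": 1}
)
```

Billing logic should key off `credits_consumed` rather than the USD estimate, since the text above names credits as the source of truth.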