Olostep is a web search, scraping, and crawling API — an API to search, extract, and structure web data. This guide shows how to use Olostep with Apify Actors to build reliable web data pipelines end‑to‑end.

What you can build

Scrape Website

Extract content from any single URL in Markdown, HTML, JSON, or Text

Batch Scrape URLs

Process large lists of URLs in parallel with structured outputs

Create Crawl

Discover and scrape linked pages to build complete datasets

Create Map

Extract all URLs from a website (sitemap-like discovery)

AI-powered Answers

Ask questions and get structured JSON answers with sources

Quick start

1) Install Apify CLI

npm install -g apify-cli
apify --version

2) Get your Olostep API key

From the Olostep Dashboard → API Keys.

3) Run the Olostep Actor locally

cd olostep-tools/integrations/apify
apify run
The default local input file lives at olostep-tools/integrations/apify/storage/key_value_stores/default/INPUT.json. Example input:
{
  "operation": "scrape",
  "apiKey": "YOUR_OLOSTEP_API_KEY",
  "url_to_scrape": "https://example.com",
  "formats": "markdown"
}

4) Deploy to Apify (cloud)

apify login
apify push
Then open Apify Console → Actors → run the actor with your desired input.

Run in Apify Console (step by step)

  1. Open your Actor in Apify Console → Source → Input.
  2. In the Manual tab you’ll see a visible “Olostep API Key” field. Paste your key from the Olostep Dashboard.
  3. Choose an operation (defaults to “scrape”).
  4. Fill the relevant fields (for “scrape”, set “URL to Scrape”).
  5. Click Save → Start.
  6. When the run finishes, open the Dataset tab to download results (JSON/CSV/Excel).
Notes:
  • For “URL to Scrape”, you can paste with or without scheme. If missing, the actor automatically prepends https://.
  • If a site is heavy in JavaScript and you see a timeout, set “Wait Before Scraping” to 2000–5000 ms and run again.
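
The scheme auto‑prepend described above behaves like this sketch (a hypothetical helper for illustration, not the actor's actual source):

```javascript
// Hypothetical sketch of the actor's URL normalization: if the pasted
// value has no http:// or https:// scheme, https:// is prepended.
function normalizeUrl(input) {
  const trimmed = input.trim();
  return /^https?:\/\//i.test(trimmed) ? trimmed : `https://${trimmed}`;
}
```

So pasting example.com and https://example.com into "URL to Scrape" produce the same request.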

Available operations

Scrape Website

Extract content from a single URL. Great for page‑level automation.
  • operation (constant, default: "scrape"): must be "scrape"
  • apiKey (string, required): your Olostep API key (sent as a Bearer token)
  • url_to_scrape (string, required): the URL to scrape; https:// is prepended if the scheme is missing
  • formats (dropdown, default: "markdown"): one of Markdown, HTML, JSON, Text
  • country (string, optional): country code (e.g., "US", "GB", "CA")
  • wait_before_scraping (integer, optional): wait time in ms for JavaScript rendering (0–10000)
  • parser (string, optional): parser ID (e.g., "@olostep/amazon-product")
Output fields:
  • id, url, status, formats
  • markdown_content / html_content / json_content / text_content
  • hosted URLs (if available), page metadata
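
Downstream code can pick the right content field from each dataset item; a minimal sketch assuming the field names listed above (the helper name is ours):

```javascript
// Map each output format to the corresponding dataset field
// (markdown_content, html_content, json_content, text_content).
const FORMAT_FIELDS = {
  markdown: "markdown_content",
  html: "html_content",
  json: "json_content",
  text: "text_content",
};

// Return the content for the requested format, or null if the item
// does not carry that field.
function pickContent(item, format = "markdown") {
  const field = FORMAT_FIELDS[format.toLowerCase()];
  if (!field) throw new Error(`Unknown format: ${format}`);
  return item[field] ?? null;
}
```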

Batch Scrape URLs

Process many URLs at once with consistent formatting and structure.
  • operation (constant, default: "batch"): must be "batch"
  • apiKey (string, required): your Olostep API key
  • batch_array (text, required): JSON array of objects with url and optional custom_id, e.g. [{"url":"https://example.com","custom_id":"site1"}]
  • formats (dropdown, default: "markdown"): one of Markdown, HTML, JSON, Text
  • country (string, optional): country code
  • wait_before_scraping (integer, optional): wait time in ms for JS sites
  • parser (string, optional): parser ID
Output fields:
  • batch_id, status, total_urls, created_at, formats, country, parser, urls[]
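
Because batch_array is passed as a JSON string, it is easiest to build it programmatically; a sketch (the helper name and the site1, site2, … ID scheme are ours):

```javascript
// Build the batch_array input value from a plain list of URLs.
// custom_id defaults to site1, site2, ... so results can be matched
// back to their source URLs later.
function buildBatchArray(urls) {
  const entries = urls.map((url, i) => ({ url, custom_id: `site${i + 1}` }));
  return JSON.stringify(entries);
}
```

The returned string goes directly into the batch_array field of the actor input.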

Create Crawl

Follow links and scrape multiple pages from a start URL.
  • operation (constant, default: "crawl"): must be "crawl"
  • apiKey (string, required): your Olostep API key
  • start_url (string, required): starting URL for the crawl
  • max_pages (integer, default: 10): maximum number of pages to crawl
  • follow_links (boolean): follow on‑page links
  • formats (dropdown, default: "markdown"): one of Markdown, HTML, JSON, Text
  • country (string, optional): country code
  • parser (string, optional): parser ID
Output fields:
  • crawl_id, object, status, start_url, max_pages, follow_links, created, formats

Create Map

Discover all URLs on a website and prepare for later batch scraping.
  • operation (constant, default: "map"): must be "map"
  • apiKey (string, required): your Olostep API key
  • website_url (string, required): the website to map
  • search_query (string, optional): query filter
  • top_n (integer, optional): limit the number of URLs returned
  • include_patterns (string, optional): include glob(s), e.g. "/products/**"
  • exclude_patterns (string, optional): exclude glob(s), e.g. "/admin/**"
Output fields:
  • map_id, object, website_url, total_urls, urls[], search_query, top_n
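
The include/exclude patterns behave like path globs. As a rough illustration of that semantics (our own matcher, not Olostep's implementation), "**" spans path segments while "*" stays within one:

```javascript
// Convert a simple path glob ("/products/**") to a RegExp:
// "**" matches any number of path segments, "*" matches within one.
function globToRegExp(glob) {
  let re = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob[i];
    if (ch === "*") {
      if (glob[i + 1] === "*") { re += ".*"; i++; }
      else re += "[^/]*";
    } else {
      re += ch.replace(/[.+^${}()|[\]\\?]/g, "\\$&");
    }
  }
  return new RegExp("^" + re + "$");
}

// Filter a list of absolute URLs by include/exclude globs on the path.
function filterUrls(urls, { include, exclude } = {}) {
  const inc = include ? globToRegExp(include) : null;
  const exc = exclude ? globToRegExp(exclude) : null;
  return urls.filter((u) => {
    const path = new URL(u).pathname;
    return (!inc || inc.test(path)) && !(exc && exc.test(path));
  });
}
```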

Copy‑paste JSON examples (Console → Input → JSON)

Scrape

{
  "operation": "scrape",
  "apiKey": "YOUR_OLOSTEP_API_KEY",
  "url_to_scrape": "https://www.wikipedia.org",
  "formats": "markdown",
  "wait_before_scraping": 2000
}

Batch

{
  "operation": "batch",
  "apiKey": "YOUR_OLOSTEP_API_KEY",
  "batch_array": "[{\"url\":\"https://example.com\",\"custom_id\":\"site1\"},{\"url\":\"https://olostep.com\",\"custom_id\":\"site2\"}]",
  "formats": "json"
}

Crawl

{
  "operation": "crawl",
  "apiKey": "YOUR_OLOSTEP_API_KEY",
  "start_url": "https://docs.example.com",
  "max_pages": 50,
  "follow_links": true,
  "formats": "markdown"
}

Map

{
  "operation": "map",
  "apiKey": "YOUR_OLOSTEP_API_KEY",
  "website_url": "https://example.com",
  "include_patterns": "/blog/**",
  "top_n": 200
}

Answers

{
  "operation": "answers",
  "apiKey": "YOUR_OLOSTEP_API_KEY",
  "task": "What is the latest funding round of Olostep? Provide company, round, date, amount.",
  "json": "{\"company\":\"\",\"round\":\"\",\"date\":\"\",\"amount\":\"\"}"
}

Example workflows

Product catalog pipeline:
  1. Create Map → include “/products/**”
  2. Parse URLs → build batch array
  3. Batch Scrape URLs → formats: JSON
  4. Send to Google Sheets / Airtable

Scheduled monitoring:
  1. Schedule the actor (daily)
  2. Scrape Website → formats: Markdown
  3. Summarize with an LLM
  4. Notify on Slack

Docs/blog sync:
  1. Create Crawl (blog/docs)
  2. Store outputs in Notion
  3. Refresh weekly with a Schedule

Specialized parsers

Olostep supports parsers to structure data for popular sites.

Amazon Product

@olostep/amazon-product → title, price, rating, reviews, images, variants

Google Search

@olostep/google-search → results, titles, snippets, URLs

Google Maps

@olostep/google-maps → business info, reviews, ratings, location

More Parsers

Explore email extractors, social handle finders, calendar link extractors, and more

Best practices

  • Prefer Batch Scrape for many URLs: faster, cheaper, easier to monitor, and kinder to rate limits.
  • JS‑heavy sites: increase wait_before_scraping (e.g., 2000–5000 ms).
  • Avoid unnecessary runs: check for changes first and keep deduplication state.
  • Use hosted output URLs to stay under payload size limits in Apify flows.
  • Batch/Crawl/Map return IDs; retrieve results later or chain runs with a delay.
  • On a 504 or transient timeout, the actor automatically retries once after a short wait; you can also raise “Wait Before Scraping” to 2000–5000 ms for JS‑heavy pages.
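
The retry‑once behavior can be approximated in your own client code like this (a sketch; the delay value and blanket catch are our assumptions, and a production version would retry only on transient errors):

```javascript
// Run an async operation; on failure, wait briefly and retry exactly once.
// A second failure propagates to the caller.
async function withOneRetry(fn, waitMs = 2000) {
  try {
    return await fn();
  } catch (err) {
    await new Promise((resolve) => setTimeout(resolve, waitMs));
    return await fn();
  }
}
```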

Troubleshooting

Authentication errors:
  • Check the API key in your Olostep dashboard
  • Remove trailing spaces
  • Re‑enter the key in the Apify input form
Timeouts or empty content:
  • Increase the wait time
  • Verify the URL is public / not login‑gated
  • Try a different output format
Rate limits:
  • Space runs out via a schedule
  • Prefer Batch for many URLs
  • Upgrade your Olostep plan if needed
Unexpected results:
  • Try the country parameter
  • Adjust the wait time and parser
  • Contact support for guidance

Pricing

Olostep charges by API usage (independent of Apify):
  • Scrapes → per scrape
  • Batches → per URL
  • Crawls → per page
  • Maps → per operation
See https://olostep.com/pricing.

Security

  • Your API key is sent as a Bearer token at runtime.
  • Do not commit keys to version control; Apify stores inputs in Key‑Value Store.
  • In local development, keep keys in storage/key_value_stores/default/INPUT.json (gitignored).

Support