# Crawl Settings
SpiderPro offers extensive configuration options to customize your crawl.
## Basic Settings

### Max Pages

The maximum number of pages to crawl.

| Plan | Limit |
|---|---|
| Free | 500 pages |
| Pro | 100,000 pages |
### User Agent

The browser identity sent to servers. Options:

- Googlebot - Simulate Google's crawler
- Chrome - Standard browser (default)
- Custom - Define your own string
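
A descriptive custom value such as `MyCompanyBot/1.0 (+https://example.com/bot-info)` (a made-up string following the common bot convention) tells site owners who is crawling them and where to find out more.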
### Concurrent Requests

Number of simultaneous connections (1-10).

> **WARNING:** High values may trigger rate limiting. Start with 3-5.
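
To illustrate what this setting controls, here is a rough sketch of bounded concurrency using Python's standard thread pool; `max_workers` plays the role of the Concurrent Requests value, and the URLs are placeholders.

```python
# Illustration only: a bounded thread pool approximates "concurrent requests".
from concurrent.futures import ThreadPoolExecutor

import requests

urls = ["https://example.com/page-1", "https://example.com/page-2"]  # placeholders

def fetch(url):
    return url, requests.get(url, timeout=10).status_code

# max_workers ~ Concurrent Requests; 3-5 is a polite starting point.
with ThreadPoolExecutor(max_workers=4) as pool:
    for url, status in pool.map(fetch, urls):
        print(url, status)
```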
## Advanced Settings

### JavaScript Rendering

Enable full JavaScript execution using Chromium.

When to enable:

- React, Vue, or Angular sites
- Single-page applications (SPAs)
- Pages that load content dynamically

When to disable:

- Static HTML sites
- Speed is the priority
- Server-side rendered sites
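
If you're unsure whether a site needs rendering, one rough check is to fetch the raw HTML without a browser and look for content you expect to see; if it's missing, the page probably builds it with JavaScript. A minimal sketch (the URL and marker text are placeholders):

```python
# Heuristic check: is the expected content already in the raw, unrendered HTML?
import requests

url = "https://example.com/products/1"  # placeholder
marker = "Add to cart"                  # text you expect on the rendered page

html = requests.get(url, timeout=10).text
if marker in html:
    print("Content is in the raw HTML - JavaScript rendering is likely unnecessary.")
else:
    print("Content missing from raw HTML - enable JavaScript rendering.")
```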
### Custom Headers

Add custom HTTP headers:

```json
{
  "Authorization": "Bearer token123",
  "Accept-Language": "en-US"
}
```
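
Outside SpiderPro, the same headers map directly onto an ordinary HTTP request; a minimal sketch with the `requests` library (the token and URL are placeholders):

```python
# The same custom headers attached to a single request, for illustration.
import requests

headers = {
    "Authorization": "Bearer token123",  # placeholder token
    "Accept-Language": "en-US",
}
response = requests.get("https://example.com/account", headers=headers, timeout=10)
print(response.status_code)
```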
## URL Filters

### Include Patterns

Only crawl URLs matching these patterns:
```
/blog/*
/products/*
```
### Exclude Patterns

Skip URLs matching these patterns:
```
/admin/*
/api/*
*.pdf
```
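
The exact matching rules aren't spelled out here, but the patterns behave like shell-style globs applied to the URL path. A sketch of the idea using Python's `fnmatch` (illustrative only; how `*` interacts with `/` may differ in SpiderPro):

```python
# Glob-style URL filtering: roughly how include/exclude patterns combine.
from fnmatch import fnmatch
from urllib.parse import urlparse

include = ["/blog/*", "/products/*"]
exclude = ["/admin/*", "/api/*", "*.pdf"]

def should_crawl(url):
    path = urlparse(url).path
    if not any(fnmatch(path, pattern) for pattern in include):
        return False                      # not in an included section
    return not any(fnmatch(path, pattern) for pattern in exclude)

print(should_crawl("https://example.com/blog/hello"))           # True
print(should_crawl("https://example.com/blog/whitepaper.pdf"))  # False (*.pdf)
print(should_crawl("https://example.com/admin/users"))          # False
```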
## Extraction Settings

### CSS Selectors

Extract content using CSS selectors:
```css
h1, .product-title, meta[name="description"]
```
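
To see what those selectors pull out of a page, here is a rough equivalent using BeautifulSoup; the HTML fragment is made up, and SpiderPro's own extraction engine may behave differently.

```python
# Rough equivalent of CSS-selector extraction, for illustration only.
from bs4 import BeautifulSoup

html = """
<html><head><meta name="description" content="Acme catalog"></head>
<body><h1>Acme Store</h1><p class="product-title">Blue Widget</p></body></html>
"""
soup = BeautifulSoup(html, "html.parser")
for element in soup.select('h1, .product-title, meta[name="description"]'):
    print(element.name, element.get("content") or element.get_text(strip=True))
```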
### XPath

Extract using XPath expressions:
```xpath
//div[@class="price"]/text()
```
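
For comparison, the same expression run through lxml against a made-up fragment:

```python
# Running the same XPath expression with lxml, for illustration.
from lxml import html

doc = html.fromstring('<div class="price">$19.99</div><div class="price">$5.00</div>')
print(doc.xpath('//div[@class="price"]/text()'))  # ['$19.99', '$5.00']
```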
### Regex

Extract using regular expressions:
```regex
\$[\d,]+\.?\d*
```
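
In Python terms, that pattern behaves like this (the sample text is made up):

```python
# The price pattern applied with Python's re module, for illustration.
import re

price_pattern = re.compile(r"\$[\d,]+\.?\d*")
text = "Was $1,299.99, now $999 - save $300.99 today"
print(price_pattern.findall(text))  # ['$1,299.99', '$999', '$300.99']
```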
## Performance Tips

- Start small - Test with 100 pages first
- Use filters - Exclude irrelevant sections
- Disable JS - When not needed
- Schedule wisely - Crawl during low-traffic hours