# Crawl Settings
SpiderPro offers extensive configuration options to customize your crawl.
## Basic Settings

### Max Pages

The maximum number of pages to crawl.

| Plan | Limit |
|---|---|
| Free | 500 pages |
| Pro | 100,000 pages |
### User Agent

The browser identity sent to servers. Options:

- Googlebot - Simulate Google's crawler
- Chrome - Standard browser (default)
- Custom - Define your own string
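
A descriptive custom value such as `MyCompanyBot/1.0 (+https://example.com/bot-info)` (a made-up string following the common bot convention) tells site owners who is crawling them and where to find out more.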
### Concurrent Requests

Number of simultaneous connections (1-10).

> **WARNING:** High values may trigger rate limiting. Start with 3-5.
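
To illustrate what this setting controls, here is a rough sketch of bounded concurrency using Python's standard thread pool; `max_workers` plays the role of the Concurrent Requests value, and the URLs are placeholders.

```python
# Illustration only: a bounded thread pool approximates "concurrent requests".
from concurrent.futures import ThreadPoolExecutor

import requests

urls = ["https://example.com/page-1", "https://example.com/page-2"]  # placeholders

def fetch(url):
    return url, requests.get(url, timeout=10).status_code

# max_workers ~ Concurrent Requests; 3-5 is a polite starting point.
with ThreadPoolExecutor(max_workers=4) as pool:
    for url, status in pool.map(fetch, urls):
        print(url, status)
```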
## Advanced Settings

### JavaScript Rendering

Enable full JavaScript execution using Chromium.

When to enable:

- React, Vue, or Angular sites
- Single-page applications (SPAs)
- Pages that load content dynamically

When to disable:

- Static HTML sites
- Speed is the priority
- Server-side rendered sites
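
If you're unsure whether a site needs rendering, one rough check is to fetch the raw HTML without a browser and look for content you expect to see; if it's missing, the page probably builds it with JavaScript. A minimal sketch (the URL and marker text are placeholders):

```python
# Heuristic check: is the expected content already in the raw, unrendered HTML?
import requests

url = "https://example.com/products/1"  # placeholder
marker = "Add to cart"                  # text you expect on the rendered page

html = requests.get(url, timeout=10).text
if marker in html:
    print("Content is in the raw HTML - JavaScript rendering is likely unnecessary.")
else:
    print("Content missing from raw HTML - enable JavaScript rendering.")
```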
### Custom Headers

Add custom HTTP headers:

```json
{
  "Authorization": "Bearer token123",
  "Accept-Language": "en-US"
}
```
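
Outside SpiderPro, the same headers map directly onto an ordinary HTTP request; a minimal sketch with the `requests` library (the token and URL are placeholders):

```python
# The same custom headers attached to a single request, for illustration.
import requests

headers = {
    "Authorization": "Bearer token123",  # placeholder token
    "Accept-Language": "en-US",
}
response = requests.get("https://example.com/account", headers=headers, timeout=10)
print(response.status_code)
```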
## URL Filters

### Include Patterns

Only crawl URLs matching these patterns:
```
/blog/*
/products/*
```
### Exclude Patterns

Skip URLs matching these patterns:
```
/admin/*
/api/*
*.pdf
```
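
The exact matching rules aren't spelled out here, but the patterns behave like shell-style globs applied to the URL path. A sketch of the idea using Python's `fnmatch` (illustrative only; how `*` interacts with `/` may differ in SpiderPro):

```python
# Glob-style URL filtering: roughly how include/exclude patterns combine.
from fnmatch import fnmatch
from urllib.parse import urlparse

include = ["/blog/*", "/products/*"]
exclude = ["/admin/*", "/api/*", "*.pdf"]

def should_crawl(url):
    path = urlparse(url).path
    if not any(fnmatch(path, pattern) for pattern in include):
        return False                      # not in an included section
    return not any(fnmatch(path, pattern) for pattern in exclude)

print(should_crawl("https://example.com/blog/hello"))           # True
print(should_crawl("https://example.com/blog/whitepaper.pdf"))  # False (*.pdf)
print(should_crawl("https://example.com/admin/users"))          # False
```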
## Extraction Settings

### CSS Selectors

Extract content using CSS selectors:
```css
h1, .product-title, meta[name="description"]
```
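
To see what those selectors pull out of a page, here is a rough equivalent using BeautifulSoup; the HTML fragment is made up, and SpiderPro's own extraction engine may behave differently.

```python
# Rough equivalent of CSS-selector extraction, for illustration only.
from bs4 import BeautifulSoup

html = """
<html><head><meta name="description" content="Acme catalog"></head>
<body><h1>Acme Store</h1><p class="product-title">Blue Widget</p></body></html>
"""
soup = BeautifulSoup(html, "html.parser")
for element in soup.select('h1, .product-title, meta[name="description"]'):
    print(element.name, element.get("content") or element.get_text(strip=True))
```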
### XPath

Extract using XPath expressions:
```xpath
//div[@class="price"]/text()
```
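
For comparison, the same expression run through lxml against a made-up fragment:

```python
# Running the same XPath expression with lxml, for illustration.
from lxml import html

doc = html.fromstring('<div class="price">$19.99</div><div class="price">$5.00</div>')
print(doc.xpath('//div[@class="price"]/text()'))  # ['$19.99', '$5.00']
```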
### Regex

Extract using regular expressions:
```regex
\$[\d,]+\.?\d*
```
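
In Python terms, that pattern behaves like this (the sample text is made up):

```python
# The price pattern applied with Python's re module, for illustration.
import re

price_pattern = re.compile(r"\$[\d,]+\.?\d*")
text = "Was $1,299.99, now $999 - save $300.99 today"
print(price_pattern.findall(text))  # ['$1,299.99', '$999', '$300.99']
```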
## Performance Tips

- Start small - Test with 100 pages first
- Use filters - Exclude irrelevant sections
- Disable JS - When not needed
- Schedule wisely - Crawl during low-traffic hours