
Crawl Settings

SpiderPro offers extensive configuration options to customize your crawl.

Basic Settings

Max Pages

Maximum number of pages to crawl.

Plan limits:

  • Free: 500 pages
  • Pro: 100,000 pages

User Agent

The User-Agent string that identifies the crawler to servers. Options:

  • Googlebot - Simulate Google's crawler
  • Chrome - Standard browser (default)
  • Custom - Define your own
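
For intuition, here is what each choice puts in the User-Agent header of every request. The strings below are illustrative, not the exact values SpiderPro sends:

python
import requests

# Illustrative User-Agent strings; SpiderPro's exact values may differ.
USER_AGENTS = {
    "Googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Chrome": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Custom": "MyCrawler/1.0 (+https://example.com/bot)",  # hypothetical custom identity
}

# Every crawl request carries the chosen identity in its User-Agent header.
response = requests.get("https://example.com", headers={"User-Agent": USER_AGENTS["Chrome"]})
print(response.status_code)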

Concurrent Requests

Number of simultaneous connections (1-10).

WARNING

High values may trigger rate limiting. Start with 3-5.
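
Conceptually, this setting caps how many requests are in flight at the same time. A minimal asyncio sketch of the mechanism, as an illustration rather than SpiderPro's internals:

python
import asyncio
import aiohttp

CONCURRENT_REQUESTS = 3  # the setting above; 3-5 is a safe starting point

async def crawl(urls):
    semaphore = asyncio.Semaphore(CONCURRENT_REQUESTS)

    async def fetch(session, url):
        # At most CONCURRENT_REQUESTS coroutines pass this point at once.
        async with semaphore:
            async with session.get(url) as resp:
                return await resp.text()

    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(fetch(session, u) for u in urls))

pages = asyncio.run(crawl(["https://example.com/a", "https://example.com/b"]))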

Advanced Settings

JavaScript Rendering

Enable full JavaScript execution using Chromium.

When to enable:

  • React, Vue, Angular sites
  • Single Page Applications (SPA)
  • Dynamic content loading

When to disable:

  • Static HTML sites
  • Speed is a priority
  • Server-side rendered sites
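
Under the hood, rendering means loading each page in headless Chromium and reading the DOM after scripts have run, instead of parsing the raw HTML response. Roughly what that amounts to, sketched here with Playwright for intuition only (SpiderPro does this internally):

python
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()  # headless Chromium
    page = browser.new_page()
    page.goto("https://example.com")
    html = page.content()  # DOM serialized after JavaScript has executed
    browser.close()

# With rendering disabled, the crawler parses the raw HTTP response body
# instead: much faster, but client-rendered content is missed.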

Custom Headers

Add custom HTTP headers:

json
{
  "Authorization": "Bearer token123",
  "Accept-Language": "en-US"
}
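
These headers are attached to every request the crawler sends. In plain Python terms, a single request with the headers above looks roughly like this:

python
import requests

# The same headers as the JSON example above.
headers = {
    "Authorization": "Bearer token123",
    "Accept-Language": "en-US",
}

response = requests.get("https://example.com/page", headers=headers)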

URL Filters

Include Patterns

Only crawl URLs matching these patterns:

/blog/*
/products/*

Exclude Patterns

Skip URLs matching these patterns:

/admin/*
/api/*
*.pdf
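
To see how these glob-style patterns behave, here is a Python sketch using fnmatch. It assumes exclude patterns take precedence over includes, a common convention; SpiderPro's exact matching rules (for example, whether * crosses path separators) may differ:

python
from fnmatch import fnmatch
from urllib.parse import urlparse

INCLUDE = ["/blog/*", "/products/*"]
EXCLUDE = ["/admin/*", "/api/*", "*.pdf"]

def should_crawl(url):
    path = urlparse(url).path
    # Assumed precedence: any exclude match wins, then an include must match.
    if any(fnmatch(path, pattern) for pattern in EXCLUDE):
        return False
    return any(fnmatch(path, pattern) for pattern in INCLUDE)

print(should_crawl("https://example.com/blog/hello"))        # True
print(should_crawl("https://example.com/api/v1/items"))      # False
print(should_crawl("https://example.com/files/report.pdf"))  # False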

Extraction Settings

CSS Selectors

Extract content using CSS selectors:

css
h1, .product-title, meta[name="description"]
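
For a feel of what that selector list returns, here it is applied with a generic HTML parser (BeautifulSoup in this sketch; SpiderPro applies the selectors for you):

python
from bs4 import BeautifulSoup

html = """
<html><head><meta name="description" content="A sample page"></head>
<body><h1>Welcome</h1><span class="product-title">Blue Widget</span></body></html>
"""

soup = BeautifulSoup(html, "html.parser")
# The same selector list as above: element, class, and attribute selectors.
for node in soup.select('h1, .product-title, meta[name="description"]'):
    print(node.get("content") or node.get_text())
# A sample page / Welcome / Blue Widget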

XPath

Extract using XPath expressions:

xpath
//div[@class="price"]/text()
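
The same idea with the XPath expression above, sketched with lxml:

python
from lxml import html

page = '<html><body><div class="price">$19.99</div></body></html>'
tree = html.fromstring(page)
# text() selects the text nodes inside each matching div.
print(tree.xpath('//div[@class="price"]/text()'))  # ['$19.99']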

Regex

Extract using regular expressions:

regex
\$[\d,]+\.?\d*
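
That pattern matches dollar amounts such as $1,299.99. A quick check in Python:

python
import re

text = "Was $1,299.99, now $999 - save $300.99 today"
print(re.findall(r"\$[\d,]+\.?\d*", text))
# ['$1,299.99', '$999', '$300.99']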

Performance Tips

  1. Start small - Test with 100 pages first
  2. Use filters - Exclude irrelevant sections
  3. Disable JS - When not needed
  4. Schedule wisely - Crawl during low-traffic hours

SpiderPro - The professional SEO crawler