The Pointer Crawler enables you to automatically gather and analyze content from your product, creating a comprehensive knowledge base for AI-powered features.

Prerequisites

Installation

Install the Pointer CLI globally using npm:
npm install -g pointer-cli
Verify the installation:
pointer --version
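To pick up new CLI releases later, a standard global npm update should work:
npm update -g pointer-cli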

Authentication

Create an API key

1. Navigate to API Keys

   Go to your Keys settings in the Pointer dashboard.

2. Generate new key

   Click Create new key and provide:
     • Name: Descriptive identifier (e.g., “CLI Production”)
     • Description: Optional context about key usage
     • Expiration: Optional expiry date (defaults to never expire)

3. Copy your secret key

   Save the generated key immediately - it won’t be shown again. Keys follow the format:
   pt_sec_*****************************************

Configure authentication

Set your secret key using one of these methods:
export POINTER_SECRET_KEY="pt_sec_your_key_here"
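Alternatively, pass the key for a single invocation with the global -s, --secret-key option (listed under Global options below), for example:
pointer status --secret-key "pt_sec_your_key_here"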
Environment variables are recommended for security. Command-line options may expose keys in shell history.

Core workflow

Step 1: Initialize your website

Start by adding your website to the crawler configuration:
pointer init
The interactive prompt will guide you through:
  1. Entering a friendly name for identification
  2. Providing your website URL
  3. Confirming the configuration

Step 2: Scrape your content

Begin the automated content collection:
pointer scrape
Choose between Headless (fast) and Browser (with authentication) scraping modes.
The CLI saves your progress automatically. If interrupted, it will offer to resume from where it left off.
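Before uploading, you can review what was collected locally with the list command (no authentication required):
pointer list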

Step 3: Upload for analysis

Send your scraped content to Pointer for processing:
pointer upload
The CLI will:
  1. Display a summary of collected data
  2. Confirm the upload scope
  3. Transfer content to your knowledge base
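Once the upload finishes, you can check processing with the status command covered in the command reference below:
pointer status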

Command reference

Primary commands

Command            Description                                 Authentication
pointer init       Add a website to crawl                      Required
pointer scrape     Collect content from configured websites    Required
pointer upload     Transfer scraped data to Pointer            Required
pointer status     Check crawl processing status               Required
pointer list       View local scraped data                     Not required
pointer cleanup    Remove all local data                       Not required
pointer purge      Delete server-side crawl data               Required

Global options

Available for all commands:
Option                    Description
-s, --secret-key <key>    API secret key (overrides environment variable)
-v, --version             Display CLI version
--help                    Show command help

Scraping options

Configure pointer scrape behavior:
Option                       Description                                           Default
--max-pages <number>         Maximum pages to crawl                                200
--concurrency <number>       Parallel page processing                              1
--fast                       Use fast crawl mode                                   Interactive prompt
--no-pii-protection          Disable PII detection                                 PII protection enabled
--pii-sensitivity <level>    Set detection level (low/medium/high)                 Interactive prompt
--exclude-routes <patterns>  Comma-separated routes to exclude                     None
--include-routes <patterns>  Comma-separated routes to include (whitelist mode)    None
--bearer-token <token>       Bearer token for API authentication                   None
--headers <json>             Custom headers as JSON string                         None
--cookies-file <path>        Path to cookies JSON file                             None
--browser-path <path>        Path to custom Chrome executable                      System default
--log-level <level>          Logging verbosity                                     info

Excluding routes

The --exclude-routes flag allows you to specify routes that should be excluded from scraping. This is useful for avoiding admin panels, API endpoints, or specific file types.
# Exclude a single route
pointer scrape --exclude-routes "/admin"

# Exclude multiple routes (comma-separated)
pointer scrape --exclude-routes "/admin,/api,/private"

# Use glob patterns to exclude multiple matching routes
pointer scrape --exclude-routes "/admin/*,/api/*,*.pdf"
Pattern types:
  • Exact match: /admin - excludes only the exact path
  • Wildcard patterns:
    • /admin/* - excludes all paths starting with /admin/
    • *.pdf - excludes all PDF files
    • /api/*/docs - excludes paths like /api/v1/docs, /api/v2/docs
The exclusion check is performed on the URL path only (not the full URL). Patterns are case-sensitive, and the start URL cannot be excluded.
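Because matching happens on the path alone, pass path patterns rather than full URLs. For example, to skip a hypothetical /blog section you would write:
# Match on the URL path, not the full URL
pointer scrape --exclude-routes "/blog/*"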

Including routes (whitelist mode)

The --include-routes flag allows you to specify which routes should be included in scraping. When used, ONLY matching routes will be scraped.
# Only scrape product pages
pointer scrape --include-routes "/products/*"

# Only scrape specific sections
pointer scrape --include-routes "/blog/*,/docs/*,/tutorials/*"

# Combine with exclude for fine-grained control
pointer scrape --include-routes "/api/*" --exclude-routes "/api/internal/*,*.pdf"
Include vs Exclude Logic:
  • If --include-routes is specified, a URL must match at least one include pattern to be scraped
  • If both --include-routes and --exclude-routes are specified:
    1. URL must match an include pattern
    2. URL must NOT match any exclude pattern
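For example, combining one include pattern with one exclude pattern filters URLs as follows (the paths shown are illustrative):
pointer scrape --include-routes "/docs/*" --exclude-routes "/docs/internal/*"
# /docs/getting-started -> scraped (matches an include pattern, no exclude match)
# /docs/internal/notes  -> skipped (matches an exclude pattern)
# /pricing              -> skipped (matches no include pattern)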

Authentication options

Bearer token authentication

Use for APIs that require bearer token authentication:
pointer scrape --bearer-token "sk-proj-abc123xyz789"
This adds the header: Authorization: Bearer sk-proj-abc123xyz789
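As with your secret key, consider reading the token from an environment variable so it does not end up in shell history (API_TOKEN below is a placeholder for wherever you store it):
pointer scrape --bearer-token "$API_TOKEN"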

Custom headers

Add any custom headers required by the target website:
# Single header
pointer scrape --headers '{"X-API-Key": "my-api-key"}'

# Multiple headers
pointer scrape --headers '{"X-API-Key": "key123", "X-Client-ID": "client456"}'

# Headers with authentication
pointer scrape --headers '{"Authorization": "Basic dXNlcjpwYXNz", "Accept": "application/json"}'

Cookies file

Load cookies from a JSON file for session-based authentication:
pointer scrape --cookies-file ./cookies.json
The cookies file should be in Playwright’s cookie format:
[
  {
    "name": "session_id",
    "value": "abc123xyz",
    "domain": ".example.com",
    "path": "/",
    "expires": -1,
    "httpOnly": true,
    "secure": true,
    "sameSite": "Lax"
  },
  {
    "name": "auth_token", 
    "value": "token456",
    "domain": ".example.com",
    "path": "/"
  }
]
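If a session-based scrape is not picking up your login, it is worth confirming the file is valid JSON first; with jq installed, a quick check is:
jq . ./cookies.json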

Combined examples

Scraping a protected API documentation

pointer scrape \
  --include-routes "/api/v2/docs/*" \
  --exclude-routes "*.pdf,*.zip" \
  --bearer-token "your-api-token" \
  --headers '{"Accept": "text/html"}' \
  --max-pages 100

Scraping an e-commerce site with login

  1. First, save your cookies after a manual login:
# Use browser mode to log in manually
pointer scrape --mode browser --save-session
  2. Then use the saved cookies for subsequent scrapes:
pointer scrape \
  --include-routes "/products/*,/categories/*" \
  --exclude-routes "/products/*/reviews,/checkout/*" \
  --cookies-file ./scraped-data/.auth/yoursite.json \
  --concurrency 5

Scraping with multiple authentication methods

pointer scrape \
  --bearer-token "api-token-123" \
  --headers '{"X-Client-Version": "2.0", "Accept-Language": "en-US"}' \
  --include-routes "/api/*" \
  --log-level debug

Using a custom browser executable

# Use a specific Chrome installation
pointer scrape \
  --browser-path "/Applications/Google Chrome Canary.app/Contents/MacOS/Google Chrome Canary" \
  --include-routes "/app/*" \
  --max-pages 50

# Linux example with custom Chromium build
pointer scrape \
  --browser-path "/usr/bin/chromium-browser" \
  --cookies-file ./cookies.json \
  --log-level debug

Best practices

Automation examples

While the CLI is designed for interactive use, automation is supported for CI/CD pipelines:
# Automated crawling with predetermined settings
pointer scrape --max-pages 100 --concurrency 5 --fast --no-pii-protection

# Direct status check for specific crawl
pointer status --crawl-id abc123 --pages

# Skip confirmations for scripted cleanup
pointer purge --crawl-id abc123 --force
Use automation options carefully. Interactive mode provides safety confirmations and validation that prevent common errors.
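As a sketch, a scheduled CI job could chain the documented commands like this, assuming POINTER_SECRET_KEY is provided as a pipeline secret (pointer upload may still ask for confirmation, so verify the flow in your own pipeline):
#!/usr/bin/env sh
set -e
# Crawl with predetermined settings, then push the results to Pointer
pointer scrape --max-pages 100 --concurrency 5 --fast --no-pii-protection
pointer upload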

Troubleshooting

Authentication errors

If you encounter authentication issues:
  1. Verify your API key is valid in the dashboard
  2. Check environment variable is set correctly: echo $POINTER_SECRET_KEY
  3. Ensure the key hasn’t expired
  4. Confirm you have necessary permissions

Crawling interruptions

The crawler automatically saves progress. If interrupted:
pointer scrape
# Will prompt: "Resume from where it left off?"

Upload limitations

  • Maximum 500 pages per upload (API limit)
  • Large crawls are automatically truncated
  • Use --max-pages to control crawl size upfront
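For example, capping the crawl at the upload limit up front avoids truncation later:
pointer scrape --max-pages 500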

Next steps

After successfully crawling and uploading your content:
  1. View your enriched knowledge base in the Knowledge section
  2. Configure AI features to leverage the collected data
  3. Monitor analytics to understand content usage
  4. Set up regular crawls to keep knowledge current