The Pointer Crawler enables you to automatically gather and analyze content from your product, creating a comprehensive knowledge base for AI-powered features.

Prerequisites

Installation

Install the Pointer CLI globally using npm:
npm install -g pointer-cli
Verify the installation:
pointer --version
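To pick up new CLI releases later, a standard global npm update should work:
npm update -g pointer-cli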

Authentication

Create an API key

1. Navigate to API Keys

   Go to your Keys settings in the Pointer dashboard.

2. Generate new key

   Click Create new key and provide:
     • Name: Descriptive identifier (e.g., “CLI Production”)
     • Description: Optional context about key usage
     • Expiration: Optional expiry date (defaults to never expire)

3. Copy your secret key

   Save the generated key immediately - it won’t be shown again. Keys follow the format:
   pt_sec_*****************************************

Configure authentication

Set your secret key using one of these methods:
export POINTER_SECRET_KEY="pt_sec_your_key_here"
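Alternatively, pass the key for a single invocation with the global -s, --secret-key option (listed under Global options below), for example:
pointer status --secret-key "pt_sec_your_key_here"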
Environment variables are recommended for security. Command-line options may expose keys in shell history.

Core workflow

Step 1: Initialize your website

Start by adding your website to the crawler configuration:
pointer init
The interactive prompt will guide you through:
  1. Entering a friendly name for identification
  2. Providing your website URL
  3. Confirming the configuration

Step 2: Scrape your content

Begin the automated content collection:
pointer scrape
Choose between Headless (fast) and Browser (with authentication) scraping modes.
The CLI saves your progress automatically. If interrupted, it will offer to resume from where it left off.
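Before uploading, you can review what was collected locally with the list command (no authentication required):
pointer list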

Step 3: Upload for analysis

Send your scraped content to Pointer for processing:
pointer upload
The CLI will:
  1. Display a summary of collected data
  2. Confirm the upload scope
  3. Transfer content to your knowledge base
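Once the upload finishes, you can check processing with the status command covered in the command reference below:
pointer status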

Command reference

Primary commands

Command            Description                                 Authentication
pointer init       Add a website to crawl                      Required
pointer scrape     Collect content from configured websites    Required
pointer upload     Transfer scraped data to Pointer            Required
pointer status     Check crawl processing status               Required
pointer list       View local scraped data                     Not required
pointer cleanup    Remove all local data                       Not required
pointer purge      Delete server-side crawl data               Required

Global options

Available for all commands:
Option                    Description
-s, --secret-key <key>    API secret key (overrides environment variable)
-v, --version             Display CLI version
--help                    Show command help

Scraping options

Configure pointer scrape behavior:
Option                       Description                                           Default
--max-pages <number>         Maximum pages to crawl                                200
--concurrency <number>       Parallel page processing                              1
--fast                       Use fast crawl mode                                   Interactive prompt
--no-pii-protection          Disable PII detection                                 PII protection enabled
--pii-sensitivity <level>    Set detection level (low/medium/high)                 Interactive prompt
--exclude-routes <patterns>  Comma-separated routes to exclude                     None
--include-routes <patterns>  Comma-separated routes to include (whitelist mode)    None
--bearer-token <token>       Bearer token for API authentication                   None
--headers <json>             Custom headers as JSON string                         None
--cookies-file <path>        Path to cookies JSON file                             None
--browser-path <path>        Path to custom Chrome executable                      System default
--log-level <level>          Logging verbosity                                     info

Excluding routes

The --exclude-routes flag allows you to specify routes that should be excluded from scraping. This is useful for avoiding admin panels, API endpoints, or specific file types.
# Exclude a single route
pointer scrape --exclude-routes "/admin"

# Exclude multiple routes (comma-separated)
pointer scrape --exclude-routes "/admin,/api,/private"

# Use glob patterns to exclude multiple matching routes
pointer scrape --exclude-routes "/admin/*,/api/*,*.pdf"
Pattern types:
  • Exact match: /admin - excludes only the exact path
  • Wildcard patterns:
    • /admin/* - excludes all paths starting with /admin/
    • *.pdf - excludes all PDF files
    • /api/*/docs - excludes paths like /api/v1/docs, /api/v2/docs
The exclusion check is performed on the URL path only (not the full URL). Patterns are case-sensitive, and the start URL cannot be excluded.
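Because matching happens on the path alone, pass path patterns rather than full URLs. For example, to skip a hypothetical /blog section you would write:
# Match on the URL path, not the full URL
pointer scrape --exclude-routes "/blog/*"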

Including routes (whitelist mode)

The --include-routes flag allows you to specify which routes should be included in scraping. When used, ONLY matching routes will be scraped.
# Only scrape product pages
pointer scrape --include-routes "/products/*"

# Only scrape specific sections
pointer scrape --include-routes "/blog/*,/docs/*,/tutorials/*"

# Combine with exclude for fine-grained control
pointer scrape --include-routes "/api/*" --exclude-routes "/api/internal/*,*.pdf"
Include vs Exclude Logic:
  • If --include-routes is specified, a URL must match at least one include pattern to be scraped
  • If both --include-routes and --exclude-routes are specified:
    1. URL must match an include pattern
    2. URL must NOT match any exclude pattern
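For example, combining one include pattern with one exclude pattern filters URLs as follows (the paths shown are illustrative):
pointer scrape --include-routes "/docs/*" --exclude-routes "/docs/internal/*"
# /docs/getting-started -> scraped (matches an include pattern, no exclude match)
# /docs/internal/notes  -> skipped (matches an exclude pattern)
# /pricing              -> skipped (matches no include pattern)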

Authentication options

Bearer token authentication

Use for APIs that require bearer token authentication:
pointer scrape --bearer-token "sk-proj-abc123xyz789"
This adds the header: Authorization: Bearer sk-proj-abc123xyz789
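As with your secret key, consider reading the token from an environment variable so it does not end up in shell history (API_TOKEN below is a placeholder for wherever you store it):
pointer scrape --bearer-token "$API_TOKEN"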

Custom headers

Add any custom headers required by the target website:
# Single header
pointer scrape --headers '{"X-API-Key": "my-api-key"}'

# Multiple headers
pointer scrape --headers '{"X-API-Key": "key123", "X-Client-ID": "client456"}'

# Headers with authentication
pointer scrape --headers '{"Authorization": "Basic dXNlcjpwYXNz", "Accept": "application/json"}'

Cookies file

Load cookies from a JSON file for session-based authentication:
pointer scrape --cookies-file ./cookies.json
The cookies file should be in Playwright’s cookie format:
[
  {
    "name": "session_id",
    "value": "abc123xyz",
    "domain": ".example.com",
    "path": "/",
    "expires": -1,
    "httpOnly": true,
    "secure": true,
    "sameSite": "Lax"
  },
  {
    "name": "auth_token", 
    "value": "token456",
    "domain": ".example.com",
    "path": "/"
  }
]
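If a session-based scrape is not picking up your login, it is worth confirming the file is valid JSON first; with jq installed, a quick check is:
jq . ./cookies.json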

Combined examples

Scraping a protected API documentation

pointer scrape \
  --include-routes "/api/v2/docs/*" \
  --exclude-routes "*.pdf,*.zip" \
  --bearer-token "your-api-token" \
  --headers '{"Accept": "text/html"}' \
  --max-pages 100

Scraping an e-commerce site with login

  1. First, save your cookies after a manual login:
# Use browser mode to log in manually
pointer scrape --mode browser --save-session
  2. Then use the saved cookies for subsequent scrapes:
pointer scrape \
  --include-routes "/products/*,/categories/*" \
  --exclude-routes "/products/*/reviews,/checkout/*" \
  --cookies-file ./scraped-data/.auth/yoursite.json \
  --concurrency 5

Scraping with multiple authentication methods

pointer scrape \
  --bearer-token "api-token-123" \
  --headers '{"X-Client-Version": "2.0", "Accept-Language": "en-US"}' \
  --include-routes "/api/*" \
  --log-level debug

Using a custom browser executable

# Use a specific Chrome installation
pointer scrape \
  --browser-path "/Applications/Google Chrome Canary.app/Contents/MacOS/Google Chrome Canary" \
  --include-routes "/app/*" \
  --max-pages 50

# Linux example with custom Chromium build
pointer scrape \
  --browser-path "/usr/bin/chromium-browser" \
  --cookies-file ./cookies.json \
  --log-level debug

Best practices

Automation examples

While the CLI is designed for interactive use, automation is supported for CI/CD pipelines:
# Automated crawling with predetermined settings
pointer scrape --max-pages 100 --concurrency 5 --fast --no-pii-protection

# Direct status check for specific crawl
pointer status --crawl-id abc123 --pages

# Skip confirmations for scripted cleanup
pointer purge --crawl-id abc123 --force
Use automation options carefully. Interactive mode provides safety confirmations and validation that prevent common errors.
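As a sketch, a scheduled CI job could chain the documented commands like this, assuming POINTER_SECRET_KEY is provided as a pipeline secret (pointer upload may still ask for confirmation, so verify the flow in your own pipeline):
#!/usr/bin/env sh
set -e
# Crawl with predetermined settings, then push the results to Pointer
pointer scrape --max-pages 100 --concurrency 5 --fast --no-pii-protection
pointer upload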

Troubleshooting

Authentication errors

If you encounter authentication issues:
  1. Verify your API key is valid in the dashboard
  2. Check environment variable is set correctly: echo $POINTER_SECRET_KEY
  3. Ensure the key hasn’t expired
  4. Confirm you have necessary permissions

Crawling interruptions

The crawler automatically saves progress. If interrupted:
pointer scrape
# Will prompt: "Resume from where it left off?"

Upload limitations

  • Maximum 500 pages per upload (API limit)
  • Large crawls are automatically truncated
  • Use --max-pages to control crawl size upfront
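For example, capping the crawl at the upload limit up front avoids truncation later:
pointer scrape --max-pages 500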

Next steps

After successfully crawling and uploading your content:
  1. View your enriched knowledge base in the Knowledge section
  2. Configure AI features to leverage the collected data
  3. Monitor analytics to understand content usage
  4. Set up regular crawls to keep knowledge current