
# DDoS Protection Challenge Module

A Cloudflare-style "Under Attack" mode for aproxy that protects your service from DDoS attacks, aggressive scraping, and automated bots.

## How It Works

This module implements a multi-layered defense system:

### 1. Challenge-Response System
When a visitor without a valid token requests a protected page, they see a security challenge page instead of the actual content. With the default challenge type, the visitor clicks a "Verify I'm Human" button to proceed.

### 2. Honeypot Detection
The challenge page includes a hidden link that's invisible to humans but may be discovered by automated scrapers and bots. If this link is accessed, the IP is immediately banned for the configured duration.

### 3. Token-Based Validation
Upon successfully completing the challenge, users receive a cookie with a cryptographic token. This token remains valid for the configured duration (default: 24 hours), so legitimate users don't have to solve challenges repeatedly.
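
As an illustration of what the validation cookie might look like, here is a minimal sketch that builds a `Set-Cookie` header value. The function name `build_cookie` and the `Path=/` attribute are assumptions made for this example; the `HttpOnly` and `SameSite=Lax` flags, the cookie name, and the 24-hour lifetime come from this README. The module's actual code may differ.

```lua
-- Hypothetical helper, not the module's real implementation.
-- Builds the value of a Set-Cookie header for the validation token.
local function build_cookie(name, token, max_age, secure)
    local parts = {
        name .. '=' .. token,
        'Path=/',                -- assumption: cookie applies site-wide
        'Max-Age=' .. max_age,   -- lifetime in seconds (token_duration)
        'HttpOnly',
        'SameSite=Lax',
    }
    if secure then
        -- Only appropriate when the site is served over HTTPS only
        parts[#parts + 1] = 'Secure'
    end
    return table.concat(parts, '; ')
end

print(build_cookie('aproxy_token', 'abc123', 86400, false))
```

Under OpenResty, a string like this would typically be assigned to `ngx.header['Set-Cookie']`.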

### 4. IP Banning
IPs that trigger the honeypot are temporarily banned from all protected paths. The ban duration is configurable (default: 1 hour).
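
The four layers above can be summarized as one per-request decision. The sketch below is a simplified model, not the module's source: plain Lua tables stand in for the nginx shared dictionaries, the function names (`decide`, `ban`, `issue`) are hypothetical, and checking bans before tokens is an assumption.

```lua
-- Hypothetical model of the request flow. `bans` and `tokens` map a key
-- (IP or token string) to its expiry time in seconds.

-- True when `key` is present in `store` and has not expired yet.
local function is_live(store, key, now)
    local expires_at = key and store[key]
    return expires_at ~= nil and expires_at > now
end

-- Returns 'banned', 'pass', or 'challenge' for one request.
local function decide(bans, tokens, ip, token, now)
    if is_live(bans, ip, now) then
        return 'banned'      -- honeypot was triggered from this IP
    end
    if is_live(tokens, token, now) then
        return 'pass'        -- challenge already solved
    end
    return 'challenge'       -- show the challenge page
end

-- Honeypot hit: ban the IP for `ban_duration` seconds.
local function ban(bans, ip, now, ban_duration)
    bans[ip] = now + ban_duration
end

-- Challenge solved: remember the token for `token_duration` seconds.
local function issue(tokens, token, now, token_duration)
    tokens[token] = now + token_duration
end
```

In a real deployment the shared dictionaries handle expiry themselves, so the explicit timestamps here only serve to keep the example runnable outside nginx.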

## Why This Helps With DDoS/Scraping

- **Computational Cost**: Most DDoS attacks and scrapers make thousands of requests, and every request that reaches your application costs resources. This module intercepts unverified requests at the proxy, before they reach your backend.
- **Bot Detection**: Automated tools often don't execute JavaScript or render pages properly. The challenge page requires interaction, filtering out most bots.
- **Honeypot Trap**: Scrapers that parse HTML for links will likely find and follow the honeypot link, getting themselves banned.
- **Rate Limiting Effect**: Even sophisticated bots that can solve the challenge have to do extra work, effectively rate-limiting them.

## Configuration

### Nginx Setup

**REQUIRED**: Add these shared dictionaries to your nginx/OpenResty configuration:

```nginx
http {
    # Shared dictionary for banned IPs
    lua_shared_dict aproxy_bans 10m;

    # Shared dictionary for valid tokens
    lua_shared_dict aproxy_tokens 10m;

    # ... rest of your config
}
```

### aproxy Configuration

Add to your `conf.lua`:

```lua
return {
    version = 1,
    wantedScripts = {
        ['ddos_protection_challenge'] = {
            ban_duration = 3600,        -- 1 hour ban for honeypot triggers
            token_duration = 86400,     -- 24 hour token validity
            cookie_name = 'aproxy_token',
            shared_dict_bans = 'aproxy_bans',
            shared_dict_tokens = 'aproxy_tokens',
            protected_paths = {         -- Optional: specific paths to protect
                '/api/.*',              -- Protect all API endpoints
                '/search',              -- Protect search endpoint
            },
        }
    }
}
```

**Protect Specific Paths Only**: By default, if `protected_paths` is not configured or is empty, the challenge applies to ALL requests. You can configure specific paths to protect expensive endpoints while leaving static assets unprotected:

```lua
-- Protect only expensive API endpoints
protected_paths = {'/api/.*', '/search'}

-- This allows static assets, images, etc. to pass through freely
-- while requiring challenge for costly operations
```

**Challenge Types**: Choose from three different challenge mechanisms:

```lua
-- Option 1: Simple button (default) - easiest for users
challenge_type = 'button'

-- Option 2: Multiple-choice question - better bot filtering
challenge_type = 'question'

-- Option 3: Proof-of-work - computational challenge, strongest protection
challenge_type = 'pow'
pow_difficulty = 4  -- Number of leading zeros (4 = ~1-3 seconds)
```

### Configuration Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `ban_duration` | number | 3600 | How long to ban IPs (in seconds) that trigger the honeypot |
| `token_duration` | number | 86400 | How long tokens remain valid after passing challenge (in seconds) |
| `cookie_name` | string | `aproxy_token` | Name of the validation cookie |
| `shared_dict_bans` | string | `aproxy_bans` | Name of nginx shared dict for banned IPs |
| `shared_dict_tokens` | string | `aproxy_tokens` | Name of nginx shared dict for valid tokens |
| `protected_paths` | list | `{}` (all paths) | List of PCRE regex patterns for paths to protect. If empty or omitted, all paths are protected |
| `challenge_type` | string | `button` | Type of challenge: `button`, `question`, or `pow` |
| `pow_difficulty` | number | 4 | Proof-of-work difficulty (leading zeros). Only used when `challenge_type` is `pow` |
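
To make the defaults in the table concrete, the following sketch merges a user configuration over them and rejects an unknown `challenge_type`. The helper name `with_defaults` and the validation step are assumptions for illustration; the module may handle missing or invalid options differently.

```lua
-- Defaults as documented in the table above.
local DEFAULTS = {
    ban_duration = 3600,
    token_duration = 86400,
    cookie_name = 'aproxy_token',
    shared_dict_bans = 'aproxy_bans',
    shared_dict_tokens = 'aproxy_tokens',
    protected_paths = {},
    challenge_type = 'button',
    pow_difficulty = 4,
}

local VALID_TYPES = { button = true, question = true, pow = true }

-- Hypothetical helper: returns the merged config, or nil plus an error.
local function with_defaults(cfg)
    local out = {}
    for key, value in pairs(DEFAULTS) do
        out[key] = value
    end
    for key, value in pairs(cfg or {}) do
        out[key] = value   -- user-supplied values override defaults
    end
    if not VALID_TYPES[out.challenge_type] then
        return nil, 'unknown challenge_type: ' .. tostring(out.challenge_type)
    end
    return out
end
```

For example, `with_defaults({ challenge_type = 'pow', pow_difficulty = 5 })` keeps the documented ban and token durations while switching to a stronger proof-of-work challenge.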

## Special Endpoints

This module uses two special endpoints:

- `/__aproxy_challenge_verify` - Challenge form submission endpoint (POST)
- `/__aproxy_challenge_trap` - Honeypot link that bans IPs (GET)

⚠️ **Warning**: Don't create routes with these paths in your application.

## User Experience

### First Visit
1. User visits your site
2. Sees a security check page with a "Verify I'm Human" button
3. Clicks the button
4. Gets redirected to their original destination
5. Cookie is set for 24 hours (configurable)

### Subsequent Visits
- Users with valid cookies pass through immediately
- No challenge shown until cookie expires

### Bots/Scrapers
- Simple bots see the challenge page and likely fail to proceed
- HTML parsers might find and follow the honeypot link → IP banned
- Sophisticated bots have to solve the challenge, slowing them down significantly

## Challenge Types

The module supports three different types of challenges, allowing you to experiment with different DDoS mitigation strategies:

### 1. Button Challenge (`challenge_type = 'button'`)

**How it works**: Users see a simple page with a "Verify I'm Human" button. Click the button to pass.

**Pros**:
- Easiest for legitimate users
- Minimal friction for human visitors (a single click)
- Fast (instant)

**Cons**:
- Can be bypassed by sophisticated bots that can interact with forms
- Minimal computational cost for attackers

**Best for**: General protection where UX is priority

```lua
challenge_type = 'button'
```

### 2. Question Challenge (`challenge_type = 'question'`)

**How it works**: Users must answer a simple multiple-choice question (e.g., "What is 7 + 5?", "How many days in a week?")

**Pros**:
- Harder for simple bots to bypass
- Still easy for humans
- Moderate filtering of automated tools

**Cons**:
- Requires human interaction
- Can be annoying if cookies expire frequently
- Sophisticated bots with NLP can solve these

**Best for**: Balancing security and UX, filtering out simple scrapers

```lua
challenge_type = 'question'
```

### 3. Proof-of-Work Challenge (`challenge_type = 'pow'`)

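**How it works**: The client's browser must find a nonce such that the SHA-256 hash of the challenge plus the nonce starts with a given number of zeros (hexadecimal digits, which is why each extra zero multiplies the work by 16). JavaScript solves this automatically in the background.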

**Pros**:
- Strong protection against volumetric attacks
- Requires actual computational cost from attacker
- Transparent to user (happens automatically in ~1-3 seconds)
- Bots must burn CPU time to access your site

**Cons**:
- Requires JavaScript enabled
- Uses client CPU (battery drain on mobile)
- Slower than other methods (configurable)
- Can be bypassed by distributed attackers (but at higher cost)

**Best for**: Sites under active attack, expensive endpoints, maximum protection

```lua
challenge_type = 'pow'
pow_difficulty = 4  -- Difficulty levels:
                     -- 3 = ~0.1 seconds (light)
                     -- 4 = ~1-3 seconds (moderate, default)
                     -- 5 = ~10-30 seconds (strong)
                     -- 6 = ~few minutes (very strong)
```

**How PoW difficulty works**: The `pow_difficulty` setting determines how many leading zeros the hash must have. Each additional zero makes the challenge ~16x harder:
- Difficulty 3: Client tries ~4,000 hashes (0.1s on modern device)
- Difficulty 4: Client tries ~65,000 hashes (1-3s)
- Difficulty 5: Client tries ~1,000,000 hashes (10-30s)

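This creates real computational cost for attackers: at difficulty 4 and browser-like hashing speed, a bot that has to solve a fresh challenge for each of 1,000 requests per second would burn roughly 1,000-3,000 seconds of CPU time for every second of traffic.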

**Security**: The server verifies the proof-of-work by computing `SHA-256(challenge + nonce)` and checking that it has the required leading zeros. Bots cannot bypass this by submitting random nonces.
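
The difficulty arithmetic and the server-side check can be sketched in a few lines. This is an illustration under assumptions, not the module's code: the function names are hypothetical, and the hash function is passed in as `sha256_hex` so the example runs without OpenResty (under OpenResty it could wrap `resty.sha256` together with `resty.string.to_hex`).

```lua
-- Expected number of attempts: one in 16^difficulty hex digests qualifies.
local function expected_hashes(difficulty)
    local n = 1
    for _ = 1, difficulty do
        n = n * 16
    end
    return n
end

-- True when `hex_digest` starts with `difficulty` '0' characters.
local function has_leading_zeros(hex_digest, difficulty)
    return string.sub(hex_digest, 1, difficulty) == string.rep('0', difficulty)
end

-- Verifies a submitted nonce. `sha256_hex` is an injected function that
-- returns the lowercase hex SHA-256 digest of its string argument.
local function verify_pow(sha256_hex, challenge, nonce, difficulty)
    return has_leading_zeros(sha256_hex(challenge .. nonce), difficulty)
end

print(expected_hashes(3), expected_hashes(4), expected_hashes(5))
```

Verification costs the server a single hash, while the client needs about `expected_hashes(difficulty)` attempts on average; that asymmetry is the point of the scheme.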

## Path-Based Protection

You can configure the module to protect only specific paths, which is useful for:

- **Protecting expensive endpoints** while leaving static assets unrestricted
- **Selective protection** for API routes that cause high computational cost
- **Hybrid approach** where public pages are open but authenticated/search endpoints are protected

### Example Use Cases

**Protect only API endpoints:**
```lua
protected_paths = {'/api/.*'}
-- Static assets, homepage, etc. pass through freely
-- Only /api/* routes require the challenge
```

**Protect multiple expensive operations:**
```lua
protected_paths = {
    '/api/.*',              -- All API routes
    '/search',              -- Search endpoint
    '/.well-known/webfinger', -- Webfinger (can be DB-heavy)
}
```

**Protect everything (default):**
```lua
protected_paths = {}
-- OR simply omit the protected_paths config entirely
-- All requests require challenge verification
```

### Important Notes on Path Protection

1. **Special endpoints always work**: The challenge verification (`/__aproxy_challenge_verify`) and honeypot (`/__aproxy_challenge_trap`) endpoints always function regardless of `protected_paths` configuration.

2. **IP bans are path-specific**: A banned IP can still access unprotected paths; bans apply only to protected paths. This is intentional: you probably don't want to prevent banned IPs from loading CSS/images.

3. **Token applies everywhere**: Once a user passes the challenge for a protected path, their token is valid for ALL protected paths. They don't need to solve the challenge separately for each path.

4. **Use PCRE regex**: Patterns are PCRE regular expressions, so you can use advanced patterns like `^/api/v[0-9]+/search$` for complex matching.
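
The matching rule described in these notes might look like the sketch below. The helper names are hypothetical and the module's real code may differ. Under OpenResty the sketch uses `ngx.re.find` for PCRE matching; outside it, it falls back to Lua patterns, which only approximate PCRE (for instance, `-` is a special character in Lua patterns).

```lua
-- Returns a match position, or nil when `pattern` does not match `path`.
local function find(path, pattern)
    if ngx and ngx.re then
        return ngx.re.find(path, pattern, 'jo')  -- PCRE, compiled and cached
    end
    return string.find(path, pattern)            -- plain-Lua approximation
end

-- Hypothetical helper: should this request path be challenged?
local function is_protected(path, protected_paths)
    if protected_paths == nil or #protected_paths == 0 then
        return true   -- nothing configured: protect everything
    end
    for _, pattern in ipairs(protected_paths) do
        if find(path, pattern) then
            return true
        end
    end
    return false
end

print(is_protected('/api/v1/users', {'/api/.*', '/search'}))
print(is_protected('/static/app.css', {'/api/.*', '/search'}))
```

Note that this sketch performs an unanchored search, so a pattern such as `/search` would also match `/foo/search`; anchoring with `^` and `$` avoids that.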

## Security Considerations

1. **Cookie Security**: Cookies are set with the `HttpOnly` and `SameSite=Lax` flags. Consider adding the `Secure` flag if your site is served over HTTPS only.

2. **Shared Dictionary Size**: Size the shared dictionaries appropriately:
   - Each banned IP takes ~100 bytes
   - Each token takes ~100 bytes
   - 10MB can store ~100,000 entries

3. **IP Address Source**: Uses `ngx.var.remote_addr`. If behind a proxy/load balancer, configure nginx to use the correct IP. Otherwise every visitor appears to come from the proxy's address, and a single honeypot hit bans them all:
   ```nginx
   set_real_ip_from 10.0.0.0/8;  # Your proxy IP range
   real_ip_header X-Forwarded-For;
   ```
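
The shared dictionary sizing estimate is simple division, shown here as a rough sketch. The function name `dict_capacity` is hypothetical, and the ~100 bytes per entry is this README's estimate; real shared dictionaries carry some allocator overhead, so treat the result as an upper bound.

```lua
-- Rough number of entries that fit in a shared dict of `size_mb` megabytes,
-- assuming each entry takes `bytes_per_entry` bytes.
local function dict_capacity(size_mb, bytes_per_entry)
    return math.floor(size_mb * 1024 * 1024 / bytes_per_entry)
end

print(dict_capacity(10, 100))
```

If you expect more concurrent tokens or bans than this, increase the `lua_shared_dict` sizes accordingly.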