aproxy/scripts/ddos_protection_challenge.README.md
2025-11-23 17:48:57 -03:00

# DDoS Protection Challenge Module
A Cloudflare-style "Under Attack" mode for aproxy that protects your service from DDoS attacks, aggressive scraping, and automated bots.
## How It Works
This module implements a multi-layered defense system:
### 1. Challenge-Response System
When an unverified visitor (without a valid token) accesses your site, they see a security challenge page instead of the actual content. The visitor must click a "Verify I'm Human" button to prove they're not a bot.
### 2. Honeypot Detection
The challenge page includes a hidden link that's invisible to humans but may be discovered by automated scrapers and bots. If this link is accessed, the IP is immediately banned for the configured duration.
### 3. Token-Based Validation
Upon successfully completing the challenge, users receive a cookie with a cryptographic token. This token remains valid for the configured duration (default: 24 hours), so legitimate users don't have to solve challenges repeatedly.
### 4. IP Banning
IPs that trigger the honeypot are temporarily banned and cannot access your service. The ban duration is configurable.
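
The honeypot and ban layers can be pictured as a key/value store with per-key expiry. The sketch below is an illustration only, not the module's actual code: `new_fake_dict`, `ban_ip`, and `is_banned` are hypothetical names, and the fake dict merely imitates the `set(key, value, exptime)` / `get(key)` shape of an OpenResty shared dictionary such as `ngx.shared.aproxy_bans` so the example runs in plain Lua.

```lua
-- Stand-in for an nginx shared dict so the sketch runs outside OpenResty.
-- `now` is a function returning the current time in seconds.
local function new_fake_dict(now)
  local store = {}
  local dict = {}

  function dict:set(key, value, exptime)
    store[key] = { value = value, expires = now() + exptime }
    return true
  end

  function dict:get(key)
    local entry = store[key]
    if entry == nil then
      return nil
    end
    if now() >= entry.expires then
      store[key] = nil -- expired: behave as if the key was never set
      return nil
    end
    return entry.value
  end

  return dict
end

-- Would be called when a request hits the honeypot link.
local function ban_ip(bans, ip, ban_duration)
  return bans:set(ip, true, ban_duration)
end

-- Would be called before serving a protected path.
local function is_banned(bans, ip)
  return bans:get(ip) == true
end
```

Because the entry expires on its own, no cleanup job is needed: once `ban_duration` seconds have passed, the lookup simply stops finding the IP.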
## Why This Helps With DDoS/Scraping
- **Computational Cost**: Most DDoS attacks and scrapers make thousands of requests, and every request that reaches your application has a computational cost. This module intercepts unverified requests before they reach your backend.
- **Bot Detection**: Automated tools often don't execute JavaScript or render pages properly. The challenge page requires interaction, filtering out most bots.
- **Honeypot Trap**: Scrapers that parse HTML for links will likely find and follow the honeypot link, getting themselves banned.
- **Rate Limiting Effect**: Even sophisticated bots that can solve the challenge have to do extra work, effectively rate-limiting them.
## Configuration
### Nginx Setup
**REQUIRED**: Add these shared dictionaries to your nginx/OpenResty configuration:
```nginx
http {
    # Shared dictionary for banned IPs
    lua_shared_dict aproxy_bans 10m;

    # Shared dictionary for valid tokens
    lua_shared_dict aproxy_tokens 10m;

    # ... rest of your config
}
```
### aproxy Configuration
Add to your `conf.lua`:
```lua
return {
    version = 1,
    wantedScripts = {
        ['ddos_protection_challenge'] = {
            ban_duration = 3600,       -- 1 hour ban for honeypot triggers
            token_duration = 86400,    -- 24 hour token validity
            cookie_name = 'aproxy_token',
            shared_dict_bans = 'aproxy_bans',
            shared_dict_tokens = 'aproxy_tokens',
            protected_paths = {        -- Optional: specific paths to protect
                '/api/.*',             -- Protect all API endpoints
                '/search',             -- Protect search endpoint
            },
        }
    }
}
```
**Protect Specific Paths Only**: By default, if `protected_paths` is not configured or is empty, the challenge applies to ALL requests. You can configure specific paths to protect expensive endpoints while leaving static assets unprotected:
```lua
-- Protect only expensive API endpoints
protected_paths = {'/api/.*', '/search'}
-- This allows static assets, images, etc. to pass through freely
-- while requiring challenge for costly operations
```
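
A sketch of how the `protected_paths` check could behave. It assumes each pattern is searched for anywhere in the request path (as an unanchored `ngx.re.find` would do in OpenResty); the README does not say whether the module anchors patterns. `is_protected` and the injected `find` function are hypothetical names, and `lua_find` is only a plain-Lua stand-in for a PCRE matcher.

```lua
-- Returns true when `path` should be challenged.
-- `find(path, pattern)` must return a non-nil value on a match. In OpenResty
-- it could wrap ngx.re.find; here it is injected so the sketch is testable.
local function is_protected(path, protected_paths, find)
  -- Unset or empty list: protect everything (the documented default).
  if protected_paths == nil or #protected_paths == 0 then
    return true
  end
  for _, pattern in ipairs(protected_paths) do
    if find(path, pattern) then
      return true
    end
  end
  return false
end

-- Plain-Lua stand-in matcher. Lua patterns are NOT PCRE; this only works
-- because '/api/.*' and '/search' mean the same thing in both syntaxes.
local function lua_find(path, pattern)
  return string.find(path, pattern)
end

local paths = { '/api/.*', '/search' }
print(is_protected('/api/v1/users', paths, lua_find))   -- true
print(is_protected('/static/app.css', paths, lua_find)) -- false
print(is_protected('/static/app.css', {}, lua_find))    -- true
```

If matching is indeed unanchored, a pattern like `/search` would also match a path such as `/blog/search-tips`; writing it as `^/search$` restricts it to the exact endpoint.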
**Challenge Types**: Choose from three different challenge mechanisms:
```lua
-- Option 1: Simple button (default) - easiest for users
challenge_type = 'button'
-- Option 2: Multiple-choice question - better bot filtering
challenge_type = 'question'
-- Option 3: Proof-of-work - computational challenge, strongest protection
challenge_type = 'pow'
pow_difficulty = 4 -- Number of leading zeros (4 = ~1-3 seconds)
```
### Configuration Options
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `ban_duration` | number | 3600 | How long to ban IPs (in seconds) that trigger the honeypot |
| `token_duration` | number | 86400 | How long tokens remain valid after passing challenge (in seconds) |
| `cookie_name` | string | `aproxy_token` | Name of the validation cookie |
| `shared_dict_bans` | string | `aproxy_bans` | Name of nginx shared dict for banned IPs |
| `shared_dict_tokens` | string | `aproxy_tokens` | Name of nginx shared dict for valid tokens |
| `protected_paths` | list | `{}` (all paths) | List of PCRE regex patterns for paths to protect. If empty or omitted, all paths are protected |
| `challenge_type` | string | `button` | Type of challenge: `button`, `question`, or `pow` |
| `pow_difficulty` | number | 4 | Proof-of-work difficulty (leading zeros). Only used when `challenge_type` is `pow` |
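
One way to read the table is as a set of fallback values merged under whatever the user supplies. The following is a minimal sketch under that assumption; `DEFAULTS`, `VALID_TYPES`, and `with_defaults` are hypothetical names, and the module's real option handling may differ.

```lua
-- Default values copied from the options table above.
local DEFAULTS = {
  ban_duration = 3600,
  token_duration = 86400,
  cookie_name = 'aproxy_token',
  shared_dict_bans = 'aproxy_bans',
  shared_dict_tokens = 'aproxy_tokens',
  protected_paths = {},
  challenge_type = 'button',
  pow_difficulty = 4,
}

local VALID_TYPES = { button = true, question = true, pow = true }

-- Returns a new table with every option filled in, or raises an error
-- when challenge_type is not one of the three documented values.
local function with_defaults(user_cfg)
  local cfg = {}
  for key, default in pairs(DEFAULTS) do
    local value = user_cfg[key]
    if value == nil then
      value = default
    end
    cfg[key] = value
  end
  if not VALID_TYPES[cfg.challenge_type] then
    error('unknown challenge_type: ' .. tostring(cfg.challenge_type))
  end
  return cfg
end

local cfg = with_defaults({ challenge_type = 'pow', ban_duration = 60 })
print(cfg.challenge_type, cfg.ban_duration, cfg.token_duration)
```

Failing loudly on an unknown `challenge_type` is a design choice of this sketch: a typo in the config is caught at load time rather than silently falling back to a weaker challenge.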
## Special Endpoints
This module uses two special endpoints:
- `/__aproxy_challenge_verify` - Challenge form submission endpoint (POST)
- `/__aproxy_challenge_trap` - Honeypot link that bans IPs (GET)

⚠️ **Warning**: Don't create routes with these paths in your application.
## User Experience
### First Visit
1. User visits your site
2. Sees a security check page with a "Verify I'm Human" button
3. Clicks the button
4. Gets redirected to their original destination
5. Cookie is set for 24 hours (configurable)
### Subsequent Visits
- Users with valid cookies pass through immediately
- No challenge shown until cookie expires
### Bots/Scrapers
- Simple bots see the challenge page and likely fail to proceed
- HTML parsers might find and click the honeypot link → IP banned
- Sophisticated bots have to solve the challenge, slowing them down significantly
## Challenge Types
The module supports three different types of challenges, allowing you to experiment with different DDoS mitigation strategies:
### 1. Button Challenge (`challenge_type = 'button'`)
**How it works**: Users see a simple page with a "Verify I'm Human" button and click it to pass.

**Pros**:
- Easiest for legitimate users
- Minimal friction for human visitors (a single click)
- Fast (instant)

**Cons**:
- Can be bypassed by sophisticated bots that can interact with forms
- Minimal computational cost for attackers

**Best for**: General protection where UX is the priority
```lua
challenge_type = 'button'
```
### 2. Question Challenge (`challenge_type = 'question'`)
**How it works**: Users must answer a simple multiple-choice question (e.g., "What is 7 + 5?", "How many days in a week?").

**Pros**:
- Harder for simple bots to bypass
- Still easy for humans
- Moderate filtering of automated tools

**Cons**:
- Requires human interaction
- Can be annoying if cookies expire frequently
- Sophisticated bots with NLP can solve these

**Best for**: Balancing security and UX, filtering out simple scrapers
```lua
challenge_type = 'question'
```
### 3. Proof-of-Work Challenge (`challenge_type = 'pow'`)
**How it works**: The client's browser must find a nonce whose SHA-256 hash starts with a required number of zeros. JavaScript on the challenge page solves this automatically in the background.

**Pros**:
- Strong protection against volumetric attacks
- Imposes a real computational cost on the attacker
- Transparent to the user (happens automatically in ~1-3 seconds at the default difficulty)
- Bots must burn CPU time to access your site

**Cons**:
- Requires JavaScript to be enabled
- Uses client CPU (battery drain on mobile)
- Slower than the other methods (how much depends on `pow_difficulty`)
- Can be bypassed by distributed attackers (but at higher cost)

**Best for**: Sites under active attack, expensive endpoints, maximum protection
```lua
challenge_type = 'pow'
pow_difficulty = 4  -- Difficulty levels:
                    --   3 = ~0.1 seconds (light)
                    --   4 = ~1-3 seconds (moderate, default)
                    --   5 = ~10-30 seconds (strong)
                    --   6 = ~few minutes (very strong)
```
**How PoW difficulty works**: The `pow_difficulty` setting determines how many leading zeros (hexadecimal digits) the hash must have. Each additional zero makes the challenge ~16x harder, so at difficulty n a client needs about 16^n attempts on average:

- Difficulty 3: Client tries ~4,000 hashes (0.1s on a modern device)
- Difficulty 4: Client tries ~65,000 hashes (1-3s)
- Difficulty 5: Client tries ~1,000,000 hashes (10-30s)

This creates real computational cost for attackers: at difficulty 4, a bot that has to solve a fresh challenge for each of 1,000 requests per second would burn 1,000-3,000 seconds of CPU time for every second of traffic.
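
The arithmetic behind these figures can be checked with a few lines of Lua. This is a sketch: `expected_hashes` and `cpu_seconds` are hypothetical helper names, and the rate of 32,768 hashes per second is an assumed value chosen so that difficulty 4 lands inside the 1-3 second range quoted above.

```lua
-- Expected number of attempts before a hash starts with `difficulty` zero
-- hex digits. Each hex digit has 16 possible values, so one attempt
-- succeeds with probability 1 / 16^difficulty.
local function expected_hashes(difficulty)
  return 16 ^ difficulty
end

-- CPU-seconds needed to solve `solves` challenges on a machine that
-- computes `hashes_per_second` hashes per second.
local function cpu_seconds(difficulty, hashes_per_second, solves)
  return expected_hashes(difficulty) / hashes_per_second * solves
end

for difficulty = 3, 6 do
  print(difficulty, string.format('%.0f', expected_hashes(difficulty)))
end

-- Assumed hash rate (hypothetical): 32768 hashes/s, i.e. 2 s per solve
-- at difficulty 4, so 1000 solves cost 2000 CPU-seconds.
print(cpu_seconds(4, 32768, 1000))
```

These are averages: an individual solve can finish much sooner or take several times longer, since each attempt is an independent guess.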
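**Security**: The server verifies the proof-of-work itself by computing `SHA-256(challenge + nonce)` and checking that the result has the required leading zeros. Submitting random nonces does not bypass this: a random nonce passes only about once in 16^n tries at difficulty n, which is no cheaper than doing the work.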
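
The server-side check described here could look roughly like the sketch below. It is not the module's code: `has_leading_zeros` and `verify_pow` are hypothetical names, and the hash function is passed in as `sha256_hex` because plain Lua has no built-in SHA-256.

```lua
-- True when `hex_digest` starts with at least `difficulty` '0' characters.
local function has_leading_zeros(hex_digest, difficulty)
  return hex_digest:sub(1, difficulty) == string.rep('0', difficulty)
end

-- Recomputes the hash on the server instead of trusting the client.
-- `sha256_hex(data)` must return the hex-encoded SHA-256 of `data`.
local function verify_pow(challenge, nonce, difficulty, sha256_hex)
  -- Reject missing or malformed form fields before hashing.
  if type(challenge) ~= 'string' or type(nonce) ~= 'string' then
    return false
  end
  return has_leading_zeros(sha256_hex(challenge .. nonce), difficulty)
end
```

Under OpenResty, `sha256_hex` could be built from the `resty.sha256` and `resty.string` modules of lua-resty-string (hash with `update`/`final`, then encode with `to_hex`). Whether the module uses those libraries is an assumption.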
## Path-Based Protection
You can configure the module to protect only specific paths, which is useful for:
- **Protecting expensive endpoints** while leaving static assets unrestricted
- **Selective protection** for API routes that cause high computational cost
- **Hybrid approach** where public pages are open but authenticated/search endpoints are protected
### Example Use Cases
**Protect only API endpoints:**
```lua
protected_paths = {'/api/.*'}
-- Static assets, homepage, etc. pass through freely
-- Only /api/* routes require the challenge
```
**Protect multiple expensive operations:**
```lua
protected_paths = {
    '/api/.*',                  -- All API routes
    '/search',                  -- Search endpoint
    '/.well-known/webfinger',   -- Webfinger (can be DB-heavy)
}
```
**Protect everything (default):**
```lua
protected_paths = {}
-- OR simply omit the protected_paths config entirely
-- All requests require challenge verification
```
### Important Notes on Path Protection
1. **Special endpoints always work**: The challenge verification (`/__aproxy_challenge_verify`) and honeypot (`/__aproxy_challenge_trap`) endpoints always function regardless of `protected_paths` configuration.
2. **IP bans are path-specific**: A banned IP can still access unprotected paths, because bans are only enforced on protected paths. This is intentional: you probably don't want to prevent banned IPs from loading CSS or images.
3. **Token applies everywhere**: Once a user passes the challenge for a protected path, their token is valid for ALL protected paths. They don't need to solve the challenge separately for each path.
4. **Use PCRE regex**: Patterns are PCRE regular expressions, so you can use advanced patterns like `^/api/v[0-9]+/search$` for complex matching.
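
Taken together, the notes above imply an order of checks for each request. The sketch below is one reading of that order, not the module's actual control flow: `decide` is a hypothetical function, the `req` fields are assumed to be computed by the caller, and letting a ban take precedence over a valid token is an assumption the README does not confirm.

```lua
local VERIFY_PATH = '/__aproxy_challenge_verify'
local TRAP_PATH = '/__aproxy_challenge_trap'

-- req fields (all assumed to be precomputed by the caller):
--   path            request path (string)
--   is_protected    path matches protected_paths, or the list is empty
--   is_banned       client IP is in the ban dictionary
--   has_valid_token request carries a cookie with a known, unexpired token
local function decide(req)
  -- Note 1: the special endpoints work regardless of protected_paths.
  if req.path == TRAP_PATH then return 'ban' end
  if req.path == VERIFY_PATH then return 'verify' end
  -- Note 2: unprotected paths are served even to banned IPs.
  if not req.is_protected then return 'pass' end
  if req.is_banned then return 'deny' end
  -- Note 3: one token is accepted on every protected path.
  if req.has_valid_token then return 'pass' end
  return 'challenge'
end

print(decide({ path = '/style.css', is_protected = false, is_banned = true })) -- pass
print(decide({ path = '/api/users', is_protected = true }))                    -- challenge
```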
## Security Considerations
2. **Cookie Security**: Cookies are set with the `HttpOnly` and `SameSite=Lax` flags. Consider adding the `Secure` flag if your site is served over HTTPS only.
3. **Shared Dictionary Size**: Size the shared dictionaries appropriately:
   - Each banned IP takes ~100 bytes
   - Each token takes ~100 bytes
   - 10MB can store ~100,000 entries
4. **IP Address Source**: Uses `ngx.var.remote_addr`. If behind a proxy/load balancer, configure nginx to use the correct IP:
```nginx
set_real_ip_from 10.0.0.0/8; # Your proxy IP range
real_ip_header X-Forwarded-For;
```
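
For the cookie flags mentioned under Cookie Security, the header value could be assembled as in this sketch. `build_cookie` is a hypothetical helper; `HttpOnly` and `SameSite=Lax` come from this README, while `Path=/` and `Max-Age` are assumptions about what the module sends.

```lua
-- Builds a Set-Cookie header value. Passing `secure = true` adds the Secure
-- attribute, which tells browsers to send the cookie over HTTPS only.
local function build_cookie(name, token, max_age, secure)
  local parts = {
    name .. '=' .. token,
    'Path=/',                               -- assumption, not from the README
    string.format('Max-Age=%d', max_age),   -- assumption, mirrors token_duration
    'HttpOnly',
    'SameSite=Lax',
  }
  if secure then
    parts[#parts + 1] = 'Secure'
  end
  return table.concat(parts, '; ')
end

print(build_cookie('aproxy_token', 'abc123', 86400, true))
-- aproxy_token=abc123; Path=/; Max-Age=86400; HttpOnly; SameSite=Lax; Secure
```

Keeping `Secure` optional matters for plain-HTTP deployments: a browser will not send a `Secure` cookie over HTTP, so visitors there would be challenged on every request.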