Built-in Data Protection
Overview
Webrix provides built-in data protection guardrails powered by advanced pattern detection and machine learning models. These guardrails help you automatically detect and protect sensitive data, prevent prompt injection attacks, and maintain security compliance - all without requiring external API keys or third-party services.
Key Features
The built-in guardrails system provides comprehensive protection across multiple categories:
- Prompt Safety: Detect and block prompt injection and system prompt extraction attempts
- Secrets Protection: Automatically identify and mask or block API keys, passwords, and authentication tokens
- PII Detection: Recognize and protect personally identifiable information like emails and phone numbers
- Financial Data: Detect and secure credit card numbers and other financial information
- Government IDs: Protect sensitive government-issued identifiers like Social Security Numbers
- Network Information: Identify and mask IP addresses and other network infrastructure data
Getting Started
Accessing Guardrails Settings
- Log in to your Webrix admin panel
- Navigate to Settings → Guardrails & Data Protection
- Toggle on Enable Data Protection to activate the guardrails system
All settings are applied in real time and do not require a restart of your services.
Configuration Guide
Enabling Data Protection
The master toggle at the top of the page enables or disables all guardrails. When disabled, no data protection rules are applied.
Prompt Safety
Prompt Safety guardrails use pattern matching and heuristic analysis to detect malicious attempts to compromise your AI system.
Prompt Injection Protection
Detects attempts to inject malicious instructions that could override system behavior.
- Status: Disabled by default
- Default Threshold: 0.7
- Action: Blocks requests when confidence exceeds threshold
- Use Cases:
  - Prevent users from overriding system instructions
  - Block attempts to extract sensitive prompts
  - Protect against jailbreak attempts
Configuration:
Enable: Toggle on/off
Threshold: 0.0 - 1.0 (higher = more strict)
System Prompt Extraction
Detects attempts to extract or leak system prompts and internal instructions.
- Status: Disabled by default
- Default Threshold: 0.7
- Action: Blocks requests when confidence exceeds threshold
- Use Cases:
  - Protect proprietary system instructions
  - Prevent exposure of internal logic
  - Block prompt leakage attempts
Configuration:
Enable: Toggle on/off
Threshold: 0.0 - 1.0 (higher = more strict)
Secrets Protection
Automatically detects API keys, passwords, authentication tokens, and other credentials using regex patterns and entropy analysis.
API Keys, Passwords and Secrets
- Status: Enabled by default
- Default Mode: Mask
- Detection Methods:
  - Pattern matching for common API key formats (AWS, GitHub, Stripe, etc.)
  - High-entropy string detection
  - Bearer token identification
  - JWT detection
Modes:
- Mask: Replaces detected secrets with [REDACTED-SECRET] tokens
- Block: Rejects the entire request when secrets are detected
Use Cases:
- Prevent accidental credential exposure in logs
- Protect API keys from being leaked in tool outputs
- Maintain compliance with security policies
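To illustrate how pattern matching and entropy analysis can work together, here is a minimal Python sketch. The specific patterns (an AWS access key ID prefix, a `Bearer` header value, and the three-part JWT shape), the 20-character candidate length, and the 4.0 bits-per-character cutoff are assumptions chosen for illustration, not Webrix's actual rules; `mask_secrets` is a hypothetical name.

```python
import math
import re
from collections import Counter

# Illustrative patterns only (assumptions, not Webrix's actual rule set)
SECRET_PATTERNS = [
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),               # AWS access key ID shape
    re.compile(r"\bBearer\s+[A-Za-z0-9\-._~+/]+=*"),   # Bearer token in a header value
    re.compile(r"\beyJ[\w-]+\.[\w-]+\.[\w-]+"),        # JWT: header.payload.signature
]

# Long runs of key-like characters become entropy-check candidates
CANDIDATE = re.compile(r"[A-Za-z0-9+/_\-=]{20,}")
ENTROPY_CUTOFF = 4.0  # bits per character; hypothetical value


def shannon_entropy(s: str) -> float:
    """Average bits of information per character in s."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def mask_secrets(text: str) -> str:
    # 1. Known secret formats are always masked
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED-SECRET]", text)

    # 2. Remaining long tokens are masked only if they look random
    def mask_if_random(m: re.Match) -> str:
        token = m.group(0)
        if shannon_entropy(token) >= ENTROPY_CUTOFF:
            return "[REDACTED-SECRET]"
        return token

    return CANDIDATE.sub(mask_if_random, text)


print(mask_secrets("Authorization: Bearer abc.def"))  # Authorization: [REDACTED-SECRET]
```

Entropy alone misses short or low-randomness passwords and can flag other random-looking strings such as hashes, which is why a sketch like this pairs it with explicit format patterns.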
Personal Identity (PII Guard)
Detects personally identifiable information using regex patterns and format validation.
Email Addresses
- Status: Disabled by default
- Default Mode: Mask
- Detection: Standard RFC 5322 email format
- Replacement: [REDACTED-EMAIL]
Use Cases:
- GDPR compliance
- Protect user privacy in logs
- Prevent email harvesting
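A minimal sketch of email masking follows. It assumes a simplified regex rather than the full RFC 5322 grammar (quoted local parts, comments, and IP-literal domains are not handled), and `mask_emails` is a hypothetical helper, not a Webrix API.

```python
import re

# Simplified address pattern: local part, "@", one or more domain labels, TLD
EMAIL = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}\b")


def mask_emails(text: str) -> str:
    # Replace every matched address with the documented redaction token
    return EMAIL.sub("[REDACTED-EMAIL]", text)


print(mask_emails("Contact jane.doe+news@example.co.uk today"))
# Contact [REDACTED-EMAIL] today
```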
Phone Numbers
- Status: Disabled by default
- Default Mode: Mask
- Detection: US, UK, and international formats (E.164)
- Replacement: [REDACTED-PHONE]
- Supported Formats:
  - (555) 123-4567
  - 555-123-4567
  - 555.123.4567
  - +1-555-123-4567
Use Cases:
- Protect customer contact information
- TCPA compliance
- Privacy law adherence
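The sketch below matches the four formats listed above, with an optional leading country code. It is a simplification under stated assumptions: it does not cover UK or general E.164 numbers such as `+44 20 7946 0958`, and `mask_phones` is a hypothetical name.

```python
import re

PHONE = re.compile(
    r"(?<!\d)"                     # don't start in the middle of a longer number
    r"(?:\+\d{1,3}[-.\s]?)?"       # optional country code, e.g. "+1-"
    r"(?:\(\d{3}\)\s?|\d{3}[-.])"  # area code: "(555) " or "555-" / "555."
    r"\d{3}[-.]\d{4}(?!\d)"        # exchange and line number
)


def mask_phones(text: str) -> str:
    return PHONE.sub("[REDACTED-PHONE]", text)


print(mask_phones("Call +1-555-123-4567 or 555.123.4567"))
# Call [REDACTED-PHONE] or [REDACTED-PHONE]
```

Requiring a separator after the area code keeps bare ten-digit runs (order numbers, IDs) from being masked, at the cost of missing unformatted phone numbers.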
Financial Information
Detects and protects financial data to maintain PCI DSS compliance.
Credit Card Numbers
- Status: Disabled by default
- Default Mode: Mask
- Detection: Luhn algorithm validation for major card networks
- Replacement: [REDACTED-CREDIT-CARD]
- Supported Cards:
  - Visa
  - Mastercard
  - American Express
  - Discover
Use Cases:
- PCI DSS compliance
- Prevent card data exposure
- Protect against data breaches
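Luhn validation keeps arbitrary digit runs (order numbers, timestamps) from being redacted as card numbers. The sketch below first applies a simplified issuer-prefix rule for the four listed networks and then the Luhn checksum. The prefix table is an approximation (it omits, for example, Mastercard's 2-series range and 19-digit cards), and all names are hypothetical.

```python
import re

# 13-19 digits, optionally separated by single spaces or hyphens
CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")

# Simplified prefixes and lengths (approximation):
# Visa 4 (13/16 digits), Mastercard 51-55 (16), Amex 34/37 (15), Discover 6011/65 (16)
NETWORKS = re.compile(r"^(?:4\d{12}(?:\d{3})?|5[1-5]\d{14}|3[47]\d{13}|6(?:011|5\d{2})\d{12})$")


def luhn_valid(number: str) -> bool:
    total = 0
    # From the right, double every second digit; subtract 9 from two-digit results
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def mask_cards(text: str) -> str:
    def mask_if_card(m: re.Match) -> str:
        digits = re.sub(r"[ -]", "", m.group(0))
        if NETWORKS.match(digits) and luhn_valid(digits):
            return "[REDACTED-CREDIT-CARD]"
        return m.group(0)  # not a plausible card number: leave untouched

    return CANDIDATE.sub(mask_if_card, text)


print(mask_cards("Card: 4111 1111 1111 1111."))  # Card: [REDACTED-CREDIT-CARD].
```

`4111 1111 1111 1111` is a widely used test number that passes the checksum; changing its last digit makes it fail, so the text is left as-is.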
Government IDs
Protects government-issued identification numbers.
Social Security Numbers
- Status: Enabled by default
- Default Mode: Mask
- Detection: SSN format validation (XXX-XX-XXXX)
- Replacement: [REDACTED-SSN]
- Supported Formats:
  - 123-45-6789
  - 123456789
  - 123.45.6789
Use Cases:
- HIPAA compliance
- Prevent identity theft
- Protect sensitive personal data
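Here is a sketch of SSN matching for the three listed formats. The backreference `\2` requires both separators to be identical, so a mixed form like `123-45.6789` is not matched. The extra validity rules (no `000`, `666`, or `9xx` area number, no `00` group, no `0000` serial) follow Social Security Administration numbering rules, but whether Webrix applies them is an assumption; `mask_ssns` is hypothetical.

```python
import re

# Area, optional separator, group, the same separator again, serial
SSN = re.compile(r"\b(\d{3})([-.]?)(\d{2})\2(\d{4})\b")


def mask_ssns(text: str) -> str:
    def mask_if_valid(m: re.Match) -> str:
        area, _sep, group, serial = m.groups()
        # Numbers that are never issued are left alone (assumed extra check)
        if area in ("000", "666") or area.startswith("9") or group == "00" or serial == "0000":
            return m.group(0)
        return "[REDACTED-SSN]"

    return SSN.sub(mask_if_valid, text)


print(mask_ssns("123456789 and 123.45.6789"))  # [REDACTED-SSN] and [REDACTED-SSN]
```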
Network & Infrastructure
Detects network-related information that could pose security risks.
IP Addresses
- Status: Disabled by default
- Default Mode: Mask
- Detection: IPv4 address format
- Replacement: [REDACTED-IP-ADDRESS]
Use Cases:
- Protect internal network topology
- Prevent infrastructure reconnaissance
- Reduce exposure of infrastructure details
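For IPv4, one common approach is to find dotted-quad candidates with a loose regex and then validate each one with Python's standard `ipaddress` module, so out-of-range strings like `999.1.1.1` are left alone. This is a sketch only: `mask_ipv4` is a hypothetical name, and the source does not say Webrix works this way.

```python
import ipaddress
import re

# Loose candidate: four dot-separated groups of 1-3 digits
IPV4_CANDIDATE = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")


def mask_ipv4(text: str) -> str:
    def mask_if_valid(m: re.Match) -> str:
        try:
            # Raises AddressValueError for octets > 255 and similar malformed input
            ipaddress.IPv4Address(m.group(0))
        except ipaddress.AddressValueError:
            return m.group(0)
        return "[REDACTED-IP-ADDRESS]"

    return IPV4_CANDIDATE.sub(mask_if_valid, text)


print(mask_ipv4("Host 10.0.0.12 and 999.1.1.1"))
# Host [REDACTED-IP-ADDRESS] and 999.1.1.1
```

Dotted version strings such as `1.2.3.4` are also valid IPv4 addresses, so a purely format-based check like this one will mask them too.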
How It Works
Detection Flow
- Input Validation: When a tool is called, input parameters are analyzed by enabled guardrails
- Pattern Matching: Content is checked against regex patterns and ML models
- Confidence Scoring: For threshold-based guards, a confidence score is calculated
- Action Execution: Based on the mode (mask/block) and threshold, the appropriate action is taken
- Output Validation: Tool responses are also validated before being returned
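The flow above can be sketched as a wrapper that runs every enabled guard over the tool input, calls the tool, and then runs the same guards over the response. `Guard`, `run_tool_with_guardrails`, and the toy email guard are hypothetical; threshold scoring and block handling are left out to keep the shape of the flow visible.

```python
import re
from typing import Callable, List

Guard = Callable[[str], str]  # takes content, returns it (possibly masked)


def run_tool_with_guardrails(tool: Callable[[str], str], tool_input: str, guards: List[Guard]) -> str:
    # Input validation: each enabled guard sees the tool's input parameters
    for guard in guards:
        tool_input = guard(tool_input)
    # The tool runs only on the already-checked input
    output = tool(tool_input)
    # Output validation: the same guards check the response before it is returned
    for guard in guards:
        output = guard(output)
    return output


# Toy guard and tool for the usage example (hypothetical)
def mask_email(text: str) -> str:
    return re.sub(r"[\w.+-]+@[\w-]+\.[\w.-]+", "[REDACTED-EMAIL]", text)


def echo_tool(query: str) -> str:
    return f"You asked: {query}. Escalate to ops@example.com"


print(run_tool_with_guardrails(echo_tool, "ping alice@example.com", [mask_email]))
# You asked: ping [REDACTED-EMAIL]. Escalate to [REDACTED-EMAIL]
```

Checking the output as well as the input matters because tools can introduce sensitive data the caller never sent, as the echo tool's escalation address shows.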
Mask Mode vs Block Mode
Mask Mode (Transform):
- Replaces sensitive data with redaction tokens
- Request continues processing with masked data
- Useful for logging and debugging while maintaining privacy
Block Mode:
- Immediately rejects the request
- Returns an error to the user
- Useful for strict compliance requirements
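A minimal sketch of the two modes for a single pattern-based guard follows. Mask mode returns transformed text, while block mode raises an exception so nothing downstream runs. `PatternGuard` and `GuardrailBlocked` are hypothetical names, not part of a Webrix API.

```python
import re
from dataclasses import dataclass


class GuardrailBlocked(Exception):
    """Raised when a guard in block mode rejects a request."""


@dataclass
class PatternGuard:
    name: str
    pattern: re.Pattern
    token: str          # redaction token used in mask mode
    mode: str = "mask"  # "mask" or "block"

    def apply(self, text: str) -> str:
        if not self.pattern.search(text):
            return text  # nothing detected: pass through unchanged
        if self.mode == "block":
            # Block mode: reject the whole request so nothing downstream runs
            raise GuardrailBlocked(f"{self.name}: sensitive data detected")
        # Mask mode: redact the matches and let processing continue
        return self.pattern.sub(self.token, text)


ssn = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
masker = PatternGuard("ssn", ssn, "[REDACTED-SSN]")
blocker = PatternGuard("ssn", ssn, "[REDACTED-SSN]", mode="block")

print(masker.apply("SSN 123-45-6789"))  # SSN [REDACTED-SSN]
try:
    blocker.apply("SSN 123-45-6789")
except GuardrailBlocked as err:
    print(f"Request rejected: {err}")  # Request rejected: ssn: sensitive data detected
```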
Threshold Configuration
For Prompt Safety guards, the threshold determines sensitivity:
- 0.0 - 0.4: Low sensitivity - Only blocks obvious attacks
- 0.5 - 0.7: Medium sensitivity - Balanced detection (recommended)
- 0.8 - 1.0: High sensitivity - May produce false positives
Best Practices
Initial Configuration
- Start Conservative: Enable with mask mode first to understand impact
- Review Logs: Monitor what gets detected before switching to block mode
- Tune Thresholds: Adjust Prompt Safety thresholds based on false positive rates
- Gradual Rollout: Enable categories one at a time in production
Performance Optimization
- Enable Only What You Need: Disable unused categories to reduce processing overhead
- Use Mask Mode: Mask mode is faster than block mode for most use cases
- Monitor Metrics: Track detection rates and processing times
Compliance
- Document Policies: Clearly document which guardrails are enabled and why
- Regular Audits: Review guardrail logs periodically
- Update Patterns: Stay current with new data protection regulations
- Test Regularly: Verify guardrails work as expected with test data
Security
- Defense in Depth: Use guardrails as one layer of security, not the only one
- Combine Protections: Use built-in guardrails alongside Active Fence or custom webhooks
- Monitor Bypasses: Watch for attempts to bypass guardrails
- Update Regularly: Keep Webrix updated to get latest detection patterns
Advanced Scenarios
Combining Multiple Guardrails
You can enable multiple guardrail providers simultaneously:
✓ Built-in Guardrails (Data Protection)
✓ Active Fence (Content Safety)
✓ Custom Webhook (Business Logic)
All enabled guardrails run in sequence. If any guardrail blocks a request, processing stops immediately.
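The sequencing rule can be sketched as a loop that passes each provider's (possibly transformed) output to the next and returns as soon as one blocks. The `Verdict` shape and the provider functions are assumptions for illustration; real providers such as Active Fence or a custom webhook would be reached through their own interfaces.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Verdict:
    action: str                   # "allow", "transform", or "block"
    content: str
    reason: Optional[str] = None


Provider = Callable[[str], Verdict]


def run_providers(content: str, providers: List[Provider]) -> Verdict:
    transformed = False
    for provider in providers:
        verdict = provider(content)
        if verdict.action == "block":
            return verdict  # stop immediately: later providers never run
        if verdict.action == "transform":
            transformed = True
        content = verdict.content  # next provider sees the transformed content
    return Verdict("transform" if transformed else "allow", content)
```

Passing the list in the order built-in, content safety, webhook mirrors the ordering described in the FAQ, where built-in guardrails run before Active Fence.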
Custom Detection Patterns
For custom detection requirements, consider using the Custom Webhook integration alongside built-in guardrails.
Troubleshooting
False Positives
Problem: Legitimate content is being blocked or masked
Solutions:
- Lower the threshold for Prompt Safety guards
- Switch from block mode to mask mode
- Review detection logs to identify patterns
- Consider disabling specific categories that aren't needed
False Negatives
Problem: Sensitive data is not being detected
Solutions:
- Increase threshold for Prompt Safety guards
- Verify the data format matches detection patterns
- Check that the appropriate guard is enabled
- Review logs to confirm guardrails are running
Performance Impact
Problem: Guardrails are slowing down requests
Solutions:
- Disable unused categories
- Profile which guards are taking the most time
- Consider using mask mode instead of block mode
- Ensure you're running the latest Webrix version
Monitoring and Observability
Logging
Guardrail events are automatically logged with:
- Detection type (injection, PII, secrets, etc.)
- Action taken (allow, block, transform)
- Timestamp and request context
View guardrail logs in:
- Admin panel → Users → Activity Logs
- External log aggregators (if configured)
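As an illustration of a log record that carries only metadata, here is a sketch of a guardrail event serialized as one JSON line. The field names `request_id` and `tool_name` are hypothetical stand-ins for "request context"; the point of the design is that the matched value is never part of the record.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class GuardrailEvent:
    detection_type: str  # e.g. "injection", "pii", "secrets"
    action: str          # "allow", "block", or "transform"
    request_id: str      # request context (hypothetical field)
    tool_name: str       # request context (hypothetical field)
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def to_log_line(event: GuardrailEvent) -> str:
    # Only metadata is serialized; the detected value is never stored on the event
    return json.dumps(asdict(event), sort_keys=True)


print(to_log_line(GuardrailEvent("pii", "transform", "req-123", "crm.search")))
```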
Metrics to Monitor
- Detection Rate: Percentage of requests triggering guardrails
- Block Rate: Percentage of blocked requests
- False Positive Rate: Legitimate requests incorrectly blocked
- Performance Impact: Average latency added by guardrails
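These metrics can be derived from per-request records. The sketch below assumes a hypothetical `RequestRecord` whose `false_positive` flag is set after manual review of blocked requests, and reports rates as percentages.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RequestRecord:
    triggered: bool          # at least one guardrail fired (mask or block)
    blocked: bool            # request was rejected
    false_positive: bool     # reviewer judged a blocked request legitimate
    guard_latency_ms: float  # time spent in guardrails for this request


def summarize(records: List[RequestRecord]) -> Dict[str, float]:
    n = len(records)
    if n == 0:
        return {"detection_rate": 0.0, "block_rate": 0.0,
                "false_positive_rate": 0.0, "avg_latency_ms": 0.0}
    triggered = sum(r.triggered for r in records)
    blocked = sum(r.blocked for r in records)
    false_pos = sum(r.blocked and r.false_positive for r in records)
    return {
        "detection_rate": 100.0 * triggered / n,
        "block_rate": 100.0 * blocked / n,
        # Share of blocked requests that were actually legitimate
        "false_positive_rate": 100.0 * false_pos / blocked if blocked else 0.0,
        "avg_latency_ms": sum(r.guard_latency_ms for r in records) / n,
    }
```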
FAQ
Do built-in guardrails require external API calls?
No, all detection runs locally on your Webrix infrastructure. No data is sent to external services.
Can I use built-in guardrails with Active Fence?
Yes, you can enable both simultaneously. They work in sequence - built-in guardrails run first, then Active Fence.
What happens to performance with all guardrails enabled?
Typical overhead is 50-200ms per request depending on content size and number of enabled guards.
Can I customize detection patterns?
Detection patterns are built into Webrix. For custom patterns, use the Custom Webhook integration.
Are guardrails applied to all requests?
Yes, when enabled, guardrails apply to all tool calls through your MCP gateway.
Can I test guardrails without blocking real users?
Yes, start with mask mode only to see what would be detected without blocking requests.
What data is included in guardrail logs?
Logs include detection type, action taken, and request metadata. Sensitive data itself is never logged.
Do guardrails work in observe-only mode?
Mask mode is the closest equivalent: it redacts detected data but never blocks requests.