Built-in Data Protection

Overview

Webrix provides built-in data protection guardrails powered by pattern detection and machine learning models. These guardrails automatically detect and protect sensitive data, prevent prompt injection attacks, and help you maintain security compliance, all without external API keys or third-party services.

Key Features

The built-in guardrails system provides comprehensive protection across multiple categories:

Prompt Safety: Detect and block prompt injection and system prompt extraction attempts
Secrets Protection: Automatically identify and mask or block API keys, passwords, and authentication tokens
PII Detection: Recognize and protect personally identifiable information like emails and phone numbers
Financial Data: Detect and secure credit card numbers and other financial information
Government IDs: Protect sensitive government-issued identifiers like Social Security Numbers
Network Information: Identify and mask IP addresses and other network infrastructure data

Getting Started

Accessing Guardrails Settings

Log in to your Webrix admin panel
Navigate to Settings → Guardrails & Data Protection
Toggle on Enable Data Protection to activate the guardrails system

All settings take effect immediately and do not require a restart of your services.

Configuration Guide

Enabling Data Protection

The master toggle at the top of the page enables or disables all guardrails. When disabled, no data protection rules are applied.

Prompt Safety

Prompt Safety guardrails use pattern matching and heuristic analysis to detect malicious attempts to compromise your AI system.

Prompt Injection Protection

Detects attempts to inject malicious instructions that could override system behavior.

Status: Disabled by default
Default Threshold: 0.7
Action: Blocks requests when confidence exceeds threshold
Use Cases:
- Prevent users from overriding system instructions
- Block attempts to extract sensitive prompts
- Protect against jailbreak attempts

Configuration:

Enable: Toggle on/off
Threshold: 0.0 - 1.0 (higher = more strict)

System Prompt Extraction

Detects attempts to extract or leak system prompts and internal instructions.

Status: Disabled by default
Default Threshold: 0.7
Action: Blocks requests when confidence exceeds threshold
Use Cases:
- Protect proprietary system instructions
- Prevent exposure of internal logic
- Block prompt leakage attempts

Configuration:

Enable: Toggle on/off
Threshold: 0.0 - 1.0 (higher = more strict)

Secrets Protection

Automatically detects API keys, passwords, authentication tokens, and other credentials using regex patterns and entropy analysis.

API Keys, Passwords and Secrets

Status: Enabled by default
Default Mode: Mask
Detection Methods:
- Pattern matching for common API key formats (AWS, GitHub, Stripe, etc.)
- High-entropy string detection
- Bearer token identification
- JWT detection

Modes:

Mask: Replaces detected secrets with [REDACTED-SECRET] tokens
Block: Rejects the entire request when secrets are detected

Use Cases:

Prevent accidental credential exposure in logs
Protect API keys from being leaked in tool outputs
Maintain compliance with security policies
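
The combination of pattern matching and entropy analysis described above can be sketched as follows. This is a minimal illustration, not the Webrix implementation: the patterns, the 20-character minimum, and the 4.0 bits-per-character cutoff are assumptions chosen for the example, and `mask_secrets` is a hypothetical helper name.

```python
import math
import re
from collections import Counter

# Illustrative patterns only; the patterns Webrix ships are not documented here.
SECRET_PATTERNS = [
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),              # AWS access key ID
    re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),           # GitHub personal access token
    re.compile(r"\bBearer\s+[A-Za-z0-9._~+/-]+=*"),   # Bearer token
    re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),  # JWT
]

# Long runs of key-like characters are candidates for the entropy check.
CANDIDATE = re.compile(r"[A-Za-z0-9+/_=-]{20,}")
REDACTION = "[REDACTED-SECRET]"


def shannon_entropy(token: str) -> float:
    """Return the Shannon entropy of a string in bits per character."""
    counts = Counter(token)
    n = len(token)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def mask_secrets(text: str, min_entropy: float = 4.0) -> str:
    """Mask known key formats first, then any long high-entropy string."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub(REDACTION, text)

    def replace_if_random(match: "re.Match[str]") -> str:
        candidate = match.group(0)
        return REDACTION if shannon_entropy(candidate) >= min_entropy else candidate

    return CANDIDATE.sub(replace_if_random, text)
```

The entropy check catches credentials that match no known format, at the cost of occasionally flagging long random-looking identifiers. That is one reason to review detections in mask mode before switching to block mode.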

Personal Identity (PII Guard)

Detects personally identifiable information using regex patterns and format validation.

Email Addresses

Status: Disabled by default
Default Mode: Mask
Detection: Standard RFC 5322 email format
Replacement: [REDACTED-EMAIL]

Use Cases:

GDPR compliance
Protect user privacy in logs
Prevent email harvesting
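
As a rough illustration of mask mode for email addresses, the sketch below uses a deliberately simplified pattern. It covers common addresses only and is an assumption made for the example; it does not implement the full RFC 5322 grammar, and `mask_emails` is a hypothetical name.

```python
import re

# Simplified address pattern: local part, "@", then a dotted domain.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def mask_emails(text: str) -> str:
    """Replace every email-like substring with the redaction token."""
    return EMAIL.sub("[REDACTED-EMAIL]", text)
```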

Phone Numbers

Status: Disabled by default
Default Mode: Mask
Detection: US, UK, and international formats (E.164)
Replacement: [REDACTED-PHONE]
Supported Formats:
- (555) 123-4567
- 555-123-4567
- 555.123.4567
- +1-555-123-4567

Use Cases:

Protect customer contact information
TCPA compliance
Privacy law adherence
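
The four supported formats listed above can be matched with a single expression. The pattern below is a sketch written for this example and covers only those US-style formats; how Webrix handles UK and other international numbers is not shown here.

```python
import re

# Optional "+1" prefix, then either "(555) " or "555-" / "555.",
# followed by "123-4567" or "123.4567".
PHONE = re.compile(
    r"(?<!\d)(?:\+1[-. ]?)?(?:\(\d{3}\)\s?|\d{3}[-.])\d{3}[-.]\d{4}(?!\d)"
)


def mask_phones(text: str) -> str:
    """Replace phone numbers in the listed formats with the redaction token."""
    return PHONE.sub("[REDACTED-PHONE]", text)
```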

Financial Information

Detects and protects financial data to maintain PCI DSS compliance.

Credit Card Numbers

Status: Disabled by default
Default Mode: Mask
Detection: Luhn algorithm validation for major card networks
Replacement: [REDACTED-CREDIT-CARD]
Supported Cards:
- Visa
- Mastercard
- American Express
- Discover

Use Cases:

PCI DSS compliance
Prevent card data exposure
Protect against data breaches
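
Luhn validation is what separates real card numbers from arbitrary digit runs such as order IDs. The sketch below shows the checksum itself plus a hypothetical `mask_cards` helper. The candidate pattern (13 to 19 digits with optional spaces or hyphens) is an assumption for the example, and it does not check network-specific prefixes.

```python
import re

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def mask_cards(text: str) -> str:
    """Mask digit runs that look like card numbers and pass the Luhn check."""
    def replace(match: "re.Match[str]") -> str:
        candidate = match.group(0)
        return "[REDACTED-CREDIT-CARD]" if luhn_valid(candidate) else candidate

    return CARD_CANDIDATE.sub(replace, text)
```

With this check, the widely used test number 4111 1111 1111 1111 is masked, while the same number with its last digit changed is left alone.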

Government IDs

Protects government-issued identification numbers.

Social Security Numbers

Status: Enabled by default
Default Mode: Mask
Detection: SSN format validation (XXX-XX-XXXX)
Replacement: [REDACTED-SSN]
Supported Formats:
- 123-45-6789
- 123456789
- 123.45.6789

Use Cases:

HIPAA compliance
Prevent identity theft
Protect sensitive personal data
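
The three supported SSN formats differ only in their separator, so one pattern with a back-reference can cover them. This is an illustrative sketch with a hypothetical `mask_ssns` helper. It checks format only, so any bare nine-digit number matches as well.

```python
import re

# Three digits, two digits, four digits, with the same separator
# ("-", ".", or none) in both positions.
SSN = re.compile(r"(?<!\d)\d{3}([-.]?)\d{2}\1\d{4}(?!\d)")


def mask_ssns(text: str) -> str:
    """Replace SSN-formatted numbers with the redaction token."""
    return SSN.sub("[REDACTED-SSN]", text)
```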

Network & Infrastructure

Detects network-related information that could pose security risks.

IP Addresses

Status: Disabled by default
Default Mode: Mask
Detection: IPv4 address format
Replacement: [REDACTED-IP-ADDRESS]

Use Cases:

Protect internal network topology
Prevent infrastructure reconnaissance
Security through obscurity
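
IPv4 detection needs more than a dotted-digits pattern, because strings like 999.1.1.1 have the right shape but are not addresses. The sketch below pairs an assumed candidate pattern with Python's standard `ipaddress` module for validation. `mask_ips` is a hypothetical name.

```python
import ipaddress
import re

# Four dot-separated groups of 1-3 digits that are not part of a longer
# dotted number such as "1.2.3.4.5".
IP_CANDIDATE = re.compile(
    r"(?<!\d)(?<!\d\.)\d{1,3}(?:\.\d{1,3}){3}(?!\d)(?!\.\d)"
)


def mask_ips(text: str) -> str:
    """Mask candidates that parse as valid IPv4 addresses."""
    def replace(match: "re.Match[str]") -> str:
        candidate = match.group(0)
        try:
            ipaddress.IPv4Address(candidate)
        except ipaddress.AddressValueError:
            return candidate  # right shape, but not a valid address
        return "[REDACTED-IP-ADDRESS]"

    return IP_CANDIDATE.sub(replace, text)
```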

How It Works

Detection Flow

Input Validation: When a tool is called, input parameters are analyzed by enabled guardrails
Pattern Matching: Content is checked against regex patterns and ML models
Confidence Scoring: For threshold-based guards, a confidence score is calculated
Action Execution: Based on the mode (mask/block) and threshold, appropriate action is taken
Output Validation: Tool responses are also validated before being returned
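
The flow above can be sketched as a wrapper around a tool call. Everything in this example is hypothetical (`Guard`, `BlockedError`, `apply_guards`, `guarded_call`) and only illustrates the order of operations: guards run on the input, the tool executes, and the same guards run on the output. Confidence scoring for threshold-based guards is left out.

```python
from dataclasses import dataclass
from typing import Callable, List


class BlockedError(Exception):
    """Raised when a guard in block mode detects sensitive content."""


@dataclass
class Guard:
    name: str
    detect: Callable[[str], bool]   # True if the content triggers the guard
    mask: Callable[[str], str]      # returns the content with matches redacted
    mode: str = "mask"              # "mask" or "block"


def apply_guards(text: str, guards: List[Guard]) -> str:
    """Run each enabled guard in order, masking or blocking as configured."""
    for guard in guards:
        if not guard.detect(text):
            continue
        if guard.mode == "block":
            raise BlockedError(guard.name)
        text = guard.mask(text)
    return text


def guarded_call(tool: Callable[[str], str], tool_input: str,
                 guards: List[Guard]) -> str:
    """Validate the input, call the tool, then validate its output."""
    safe_input = apply_guards(tool_input, guards)
    output = tool(safe_input)
    return apply_guards(output, guards)
```

In this sketch a masked request still reaches the tool with redacted content, whereas a blocked request never reaches it.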

Mask Mode vs Block Mode

Mask Mode (Transform):

Replaces sensitive data with redaction tokens
Request continues processing with masked data
Useful for logging and debugging while maintaining privacy

Block Mode:

Immediately rejects the request
Returns an error to the user
Useful for strict compliance requirements

Threshold Configuration

For Prompt Safety guards, the threshold determines sensitivity:

0.0 - 0.4: Low sensitivity - Only blocks obvious attacks
0.5 - 0.7: Medium sensitivity - Balanced detection (recommended)
0.8 - 1.0: High sensitivity - May produce false positives

Best Practices

Initial Configuration

Start Conservative: Enable with mask mode first to understand impact
Review Logs: Monitor what gets detected before switching to block mode
Tune Thresholds: Adjust Prompt Safety thresholds based on false positive rates
Gradual Rollout: Enable categories one at a time in production

Performance Optimization

Enable Only What You Need: Disable unused categories to reduce processing overhead
Use Mask Mode: Mask mode is faster than block mode for most use cases
Monitor Metrics: Track detection rates and processing times

Compliance

Document Policies: Clearly document which guardrails are enabled and why
Regular Audits: Review guardrail logs periodically
Update Patterns: Stay current with new data protection regulations
Test Regularly: Verify guardrails work as expected with test data

Security

Defense in Depth: Use guardrails as one layer of security, not the only one
Combine Protections: Use built-in guardrails alongside Active Fence or custom webhooks
Monitor Bypasses: Watch for attempts to bypass guardrails
Update Regularly: Keep Webrix updated to get latest detection patterns

Advanced Scenarios

Combining Multiple Guardrails

You can enable multiple guardrail providers simultaneously:

✓ Built-in Guardrails (Data Protection)
✓ Active Fence (Content Safety)
✓ Custom Webhook (Business Logic)

All enabled guardrails run in sequence. If any guardrail blocks a request, processing stops immediately.
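
A minimal sketch of this sequential, short-circuiting behavior follows. The `Verdict` type and `run_providers` function are hypothetical and do not reflect Webrix internals. Each provider receives the text produced by the previous one, and the first block ends the chain.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Verdict:
    blocked: bool
    text: str
    blocked_by: Optional[str] = None  # name of the provider that blocked


Provider = Callable[[str], Verdict]


def run_providers(text: str, providers: List[Tuple[str, Provider]]) -> Verdict:
    """Run providers in order and stop at the first one that blocks."""
    for name, check in providers:
        verdict = check(text)
        if verdict.blocked:
            return Verdict(True, text, blocked_by=name)
        text = verdict.text  # pass any masked text on to the next provider
    return Verdict(False, text)
```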

Custom Detection Patterns

For custom detection requirements, consider using the Custom Webhook integration alongside built-in guardrails.

Troubleshooting

False Positives

Problem: Legitimate content is being blocked or masked

Solutions:

Lower the threshold for Prompt Safety guards
Switch from block mode to mask mode
Review detection logs to identify patterns
Consider disabling specific categories that aren't needed

False Negatives

Problem: Sensitive data is not being detected

Solutions:

Increase threshold for Prompt Safety guards
Verify the data format matches detection patterns
Check that the appropriate guard is enabled
Review logs to confirm guardrails are running

Performance Impact

Problem: Guardrails are slowing down requests

Solutions:

Disable unused categories
Profile which guards are taking the most time
Consider using mask mode instead of block mode
Ensure you're running the latest Webrix version

Monitoring and Observability

Logging

Guardrail events are automatically logged with:

Detection type (injection, PII, secrets, etc.)
Action taken (allow, block, transform)
Timestamp and request context

View guardrail logs in:

Admin panel → Users → Activity Logs
External log aggregators (if configured)
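
If you forward guardrail events to an external aggregator, a structured record makes them easier to query. The sketch below shows one possible shape built from the fields listed above. The field names and the `build_guardrail_log` helper are assumptions, not the format Webrix emits. The record carries a match count rather than the matched text, so no sensitive value enters the log.

```python
import json
from datetime import datetime, timezone

ALLOWED_ACTIONS = {"allow", "block", "transform"}


def build_guardrail_log(detection_type: str, action: str,
                        request_id: str, match_count: int) -> str:
    """Serialize a guardrail event as one JSON line without sensitive data."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {action}")
    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "detection_type": detection_type,  # e.g. "injection", "pii", "secrets"
        "action": action,
        "request_id": request_id,
        "match_count": match_count,
    }
    return json.dumps(event)
```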

Metrics to Monitor

Detection Rate: Percentage of requests triggering guardrails
Block Rate: Percentage of blocked requests
False Positive Rate: Legitimate requests incorrectly blocked
Performance Impact: Average latency added by guardrails
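
The first two rates and the latency figure can be computed directly from logged events. The sketch below assumes one event per request with a hypothetical `GuardrailEvent` shape. False positive rate is omitted because it requires manually labeled requests.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class GuardrailEvent:
    action: str        # "allow", "block", or "transform"
    latency_ms: float  # time the guardrails added to this request


def summarize(events: Iterable[GuardrailEvent]) -> Dict[str, float]:
    """Compute detection rate, block rate, and average added latency."""
    events = list(events)
    total = len(events)
    if total == 0:
        return {"detection_rate": 0.0, "block_rate": 0.0, "avg_latency_ms": 0.0}
    detected = sum(1 for e in events if e.action != "allow")
    blocked = sum(1 for e in events if e.action == "block")
    return {
        "detection_rate": detected / total,
        "block_rate": blocked / total,
        "avg_latency_ms": sum(e.latency_ms for e in events) / total,
    }
```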

FAQ

Do built-in guardrails require external API calls?

No, all detection runs locally on your Webrix infrastructure. No data is sent to external services.

Can I use built-in guardrails with Active Fence?

Yes, you can enable both simultaneously. They run in sequence: built-in guardrails first, then Active Fence.

What happens to performance with all guardrails enabled?

Typical overhead is 50-200ms per request depending on content size and number of enabled guards.

Can I customize detection patterns?

Detection patterns are built into Webrix. For custom patterns, use the Custom Webhook integration.

Are guardrails applied to all requests?

Yes, when enabled, guardrails apply to all tool calls through your MCP gateway.

Can I test guardrails without blocking real users?

Yes, start with mask mode only to see what would be detected without blocking requests.

What data is included in guardrail logs?

Logs include detection type, action taken, and request metadata. Sensitive data itself is never logged.

Do guardrails work in observe-only mode?

Mask mode is the closest equivalent. It transforms detected data but does not block requests, so you can review what is detected before enabling block mode.
