Built-in Data Protection

Overview

Webrix provides built-in data protection guardrails powered by advanced pattern detection and machine learning models. These guardrails help you automatically detect and protect sensitive data, prevent prompt injection attacks, and maintain security compliance - all without requiring external API keys or third-party services.

Key Features

The built-in guardrails system provides comprehensive protection across multiple categories:

  • Prompt Safety: Detect and block prompt injection and system prompt extraction attempts
  • Secrets Protection: Automatically identify and mask or block API keys, passwords, and authentication tokens
  • PII Detection: Recognize and protect personally identifiable information like emails and phone numbers
  • Financial Data: Detect and secure credit card numbers and other financial information
  • Government IDs: Protect sensitive government-issued identifiers like Social Security Numbers
  • Network Information: Identify and mask IP addresses and other network infrastructure data

Getting Started

Accessing Guardrails Settings

  1. Log in to your Webrix admin panel
  2. Navigate to Settings → Guardrails & Data Protection
  3. Toggle on Enable Data Protection to activate the guardrails system

All settings are applied in real time and do not require restarting your services.

Configuration Guide

Enabling Data Protection

The master toggle at the top of the page enables or disables all guardrails. When disabled, no data protection rules are applied.

Prompt Safety

Prompt Safety guardrails use pattern matching and heuristic analysis to detect malicious attempts to compromise your AI system.

Prompt Injection Protection

Detects attempts to inject malicious instructions that could override system behavior.

  • Status: Disabled by default
  • Default Threshold: 0.7
  • Action: Blocks requests when confidence exceeds threshold
  • Use Cases:
    • Prevent users from overriding system instructions
    • Block attempts to extract sensitive prompts
    • Protect against jailbreak attempts

Configuration:

Enable: Toggle on/off
Threshold: 0.0 - 1.0 (higher = more strict)

System Prompt Extraction

Detects attempts to extract or leak system prompts and internal instructions.

  • Status: Disabled by default
  • Default Threshold: 0.7
  • Action: Blocks requests when confidence exceeds threshold
  • Use Cases:
    • Protect proprietary system instructions
    • Prevent exposure of internal logic
    • Block prompt leakage attempts

Configuration:

Enable: Toggle on/off
Threshold: 0.0 - 1.0 (higher = more strict)

Secrets Protection

Automatically detects API keys, passwords, authentication tokens, and other credentials using regex patterns and entropy analysis.

API Keys, Passwords and Secrets

  • Status: Enabled by default
  • Default Mode: Mask
  • Detection Methods:
    • Pattern matching for common API key formats (AWS, GitHub, Stripe, etc.)
    • High-entropy string detection
    • Bearer token identification
    • JWT detection

Modes:

  • Mask: Replaces detected secrets with [REDACTED-SECRET] tokens
  • Block: Rejects the entire request when secrets are detected

Use Cases:

  • Prevent accidental credential exposure in logs
  • Protect API keys from being leaked in tool outputs
  • Maintain compliance with security policies
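
As a rough illustration of how pattern matching and high-entropy string detection can work together in mask mode, here is a minimal Python sketch. The regexes, the `min_entropy` cutoff of 4.0 bits per character, and the function names are assumptions chosen for the example; they are not the patterns or thresholds Webrix ships.

```python
import math
import re
from collections import Counter

# Illustrative shapes only; the patterns Webrix ships are not documented here.
KNOWN_SECRET_PATTERNS = [
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),     # AWS access key ID shape
    re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),  # GitHub personal access token shape
    re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),  # JWT shape
]
# Long unbroken runs of key-like characters are candidates for the entropy check.
CANDIDATE = re.compile(r"[A-Za-z0-9+/_=-]{20,}")


def shannon_entropy(s: str) -> float:
    """Return the Shannon entropy of s in bits per character."""
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mask_secrets(text: str, min_entropy: float = 4.0) -> str:
    """Mask known key formats first, then any long, random-looking token."""
    for pattern in KNOWN_SECRET_PATTERNS:
        text = pattern.sub("[REDACTED-SECRET]", text)

    def replace_if_random(match):
        token = match.group(0)
        if shannon_entropy(token) >= min_entropy:
            return "[REDACTED-SECRET]"
        return token

    return CANDIDATE.sub(replace_if_random, text)
```

A repetitive string such as thirty `a` characters has zero entropy and passes through unchanged, while a long token drawn from many distinct characters is masked. The entropy cutoff trades false positives (hashes, long identifiers) against missed secrets.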

Personal Identity (PII Guard)

Detects personally identifiable information using regex patterns and format validation.

Email Addresses

  • Status: Disabled by default
  • Default Mode: Mask
  • Detection: Standard RFC 5322 email format
  • Replacement: [REDACTED-EMAIL]

Use Cases:

  • GDPR compliance
  • Protect user privacy in logs
  • Prevent email harvesting
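
For a sense of what email masking looks like, the sketch below uses a deliberately simplified regex. It is an assumption made for illustration: it covers common addresses but is far narrower than the full RFC 5322 grammar, and it is not the pattern Webrix uses.

```python
import re

# Simplified address shape: local part, "@", domain labels, alphabetic TLD.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mask_emails(text: str) -> str:
    """Replace every email-like substring with the redaction token."""
    return EMAIL.sub("[REDACTED-EMAIL]", text)
```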

Phone Numbers

  • Status: Disabled by default
  • Default Mode: Mask
  • Detection: US, UK, and international formats (E.164)
  • Replacement: [REDACTED-PHONE]
  • Supported Formats:
    • (555) 123-4567
    • 555-123-4567
    • 555.123.4567
    • +1-555-123-4567

Use Cases:

  • Protect customer contact information
  • TCPA compliance
  • Privacy law adherence
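
The four supported formats listed above can be matched with a single expression. The following Python sketch is an assumption about one way to do it and handles only those US-style shapes, not UK or other international numbers.

```python
import re

# Optional "+<country code>", then 3-3-4 digits separated by "-", "." or a
# space, with optional parentheses around the area code.
PHONE = re.compile(
    r"(?<!\d)(?:\+\d{1,3}[-. ]?)?(?:\(\d{3}\)\s?|\d{3}[-. ])\d{3}[-. ]\d{4}(?!\d)"
)


def mask_phones(text: str) -> str:
    """Replace every phone-like substring with the redaction token."""
    return PHONE.sub("[REDACTED-PHONE]", text)
```

The lookarounds `(?<!\d)` and `(?!\d)` keep the pattern from firing inside longer digit runs such as order numbers.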

Financial Information

Detects and protects financial data to maintain PCI DSS compliance.

Credit Card Numbers

  • Status: Disabled by default
  • Default Mode: Mask
  • Detection: Luhn algorithm validation for major card networks
  • Replacement: [REDACTED-CREDIT-CARD]
  • Supported Cards:
    • Visa
    • Mastercard
    • American Express
    • Discover

Use Cases:

  • PCI DSS compliance
  • Prevent card data exposure
  • Protect against data breaches
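
Luhn validation is what separates likely card numbers from arbitrary digit runs. The sketch below shows the checksum plus a masking helper; the candidate regex and the function names are assumptions for illustration, and the network-specific prefix checks (Visa, Mastercard, and so on) are omitted.

```python
import re

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"(?<!\d)(?:\d[ -]?){12,18}\d(?!\d)")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0


def mask_cards(text: str) -> str:
    """Mask digit runs that look like card numbers and pass the Luhn check."""
    def replace_if_valid(match):
        candidate = match.group(0)
        return "[REDACTED-CREDIT-CARD]" if luhn_valid(candidate) else candidate

    return CARD_CANDIDATE.sub(replace_if_valid, text)
```

The well-known test number `4111 1111 1111 1111` passes the checksum and is masked; changing its last digit to `2` breaks the checksum, so that string is left untouched.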

Government IDs

Protects government-issued identification numbers.

Social Security Numbers

  • Status: Enabled by default
  • Default Mode: Mask
  • Detection: SSN format validation (XXX-XX-XXXX)
  • Replacement: [REDACTED-SSN]
  • Supported Formats:
    • 123-45-6789
    • 123456789
    • 123.45.6789

Use Cases:

  • HIPAA compliance
  • Prevent identity theft
  • Protect sensitive personal data
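
The three supported SSN formats differ only in their separator, so one pattern with a backreference can cover them. The sketch below is an assumption about how such a check might look. It also rejects number ranges the Social Security Administration does not issue (area 000, 666, or 900-999, group 00, serial 0000); that extra check is added here as an example and is not confirmed Webrix behavior.

```python
import re

# Group 2 captures the separator ("-", "." or nothing); \2 requires the same
# separator between the second and third parts.
SSN = re.compile(r"(?<!\d)(\d{3})([-.]?)(\d{2})\2(\d{4})(?!\d)")


def is_plausible_ssn(area: str, group: str, serial: str) -> bool:
    """Reject area/group/serial values that are never assigned."""
    return (
        area not in ("000", "666")
        and area[0] != "9"
        and group != "00"
        and serial != "0000"
    )


def mask_ssns(text: str) -> str:
    """Mask SSN-shaped substrings that fall in an issuable range."""
    def replace_if_plausible(match):
        area, _sep, group, serial = match.groups()
        if is_plausible_ssn(area, group, serial):
            return "[REDACTED-SSN]"
        return match.group(0)

    return SSN.sub(replace_if_plausible, text)
```

The unseparated nine-digit form is the most prone to false positives, since any nine-digit identifier in a plausible range will match.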

Network & Infrastructure

Detects network-related information that could pose security risks.

IP Addresses

  • Status: Disabled by default
  • Default Mode: Mask
  • Detection: IPv4 address format
  • Replacement: [REDACTED-IP-ADDRESS]

Use Cases:

  • Protect internal network topology
  • Prevent infrastructure reconnaissance
  • Limit exposure of infrastructure details in logs and tool outputs
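
A naive `\d+\.\d+\.\d+\.\d+` pattern would also match strings such as `999.1.1.1`. The sketch below, which is an assumption about one reasonable approach and not Webrix's own pattern, restricts each octet to 0-255 and avoids matching inside longer dotted sequences such as version strings.

```python
import re

# One octet in the range 0-255, without leading zeros on multi-digit values.
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
# Four octets, not preceded by a digit or dot, and not followed by a digit
# or by ".<digit>" (a sentence-ending period is still allowed).
IPV4 = re.compile(rf"(?<![\d.]){OCTET}(?:\.{OCTET}){{3}}(?!\d|\.\d)")


def mask_ips(text: str) -> str:
    """Replace every valid-looking IPv4 address with the redaction token."""
    return IPV4.sub("[REDACTED-IP-ADDRESS]", text)
```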

How It Works

Detection Flow

  1. Input Validation: When a tool is called, input parameters are analyzed by enabled guardrails
  2. Pattern Matching: Content is checked against regex patterns and ML models
  3. Confidence Scoring: For threshold-based guards, a confidence score is calculated
  4. Action Execution: Based on the mode (mask/block) and threshold, appropriate action is taken
  5. Output Validation: Tool responses are also validated before being returned
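
The flow above can be pictured as a wrapper around each tool call: the input is checked before the tool runs, and the response is checked again before it is returned. The Python sketch below is a simplified model under stated assumptions: `guard` stands in for whatever guardrails are enabled (here a single SSN mask), and `call_tool_with_guardrails` is a hypothetical name, not a Webrix API.

```python
import re
from typing import Callable

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def guard(text: str) -> str:
    """Stand-in for the enabled guardrails: masks one SSN format."""
    return SSN.sub("[REDACTED-SSN]", text)


def call_tool_with_guardrails(tool: Callable[[str], str], tool_input: str) -> str:
    safe_input = guard(tool_input)  # steps 1-4, applied to the input parameters
    raw_output = tool(safe_input)   # the tool only ever sees guarded input
    return guard(raw_output)        # step 5, applied to the tool response


def lookup_tool(query: str) -> str:
    """Hypothetical tool whose response happens to contain sensitive data."""
    return f"result for '{query}': owner SSN 987-65-4321"


print(call_tool_with_guardrails(lookup_tool, "find 123-45-6789"))
```

Both the SSN in the request and the one in the tool's response are masked, which is why output validation matters even when inputs are clean.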

Mask Mode vs Block Mode

Mask Mode (Transform):

  • Replaces sensitive data with redaction tokens
  • Request continues processing with masked data
  • Useful for logging and debugging while maintaining privacy

Block Mode:

  • Immediately rejects the request
  • Returns an error to the user
  • Useful for strict compliance requirements
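
The difference between the two modes comes down to what happens after a detection: mask mode rewrites the content and lets the request continue, while block mode stops it with an error. A minimal Python sketch, assuming a single email detector and a hypothetical `GuardrailBlocked` exception:

```python
import re

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class GuardrailBlocked(Exception):
    """Raised when a guard in block mode detects sensitive data."""


def apply_guard(text: str, mode: str) -> str:
    if mode not in ("mask", "block"):
        raise ValueError(f"unknown mode: {mode!r}")
    if not EMAIL.search(text):
        return text  # nothing detected, request continues unchanged
    if mode == "block":
        # Block mode: reject the whole request and surface an error.
        raise GuardrailBlocked("email address detected")
    # Mask mode: transform the content and keep processing.
    return EMAIL.sub("[REDACTED-EMAIL]", text)
```

With `mode="mask"`, the text `mail bob@example.com` becomes `mail [REDACTED-EMAIL]`; with `mode="block"`, the same text raises `GuardrailBlocked`.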

Threshold Configuration

For Prompt Safety guards, the threshold determines sensitivity:

  • 0.0 - 0.4: Low sensitivity - Only blocks obvious attacks
  • 0.5 - 0.7: Medium sensitivity - Balanced detection (recommended)
  • 0.8 - 1.0: High sensitivity - May produce false positives

Best Practices

Initial Configuration

  1. Start Conservative: Enable with mask mode first to understand impact
  2. Review Logs: Monitor what gets detected before switching to block mode
  3. Tune Thresholds: Adjust Prompt Safety thresholds based on false positive rates
  4. Gradual Rollout: Enable categories one at a time in production

Performance Optimization

  • Enable Only What You Need: Disable unused categories to reduce processing overhead
  • Use Mask Mode: Mask mode is faster than block mode for most use cases
  • Monitor Metrics: Track detection rates and processing times

Compliance

  • Document Policies: Clearly document which guardrails are enabled and why
  • Regular Audits: Review guardrail logs periodically
  • Update Patterns: Stay current with new data protection regulations
  • Test Regularly: Verify guardrails work as expected with test data

Security

  • Defense in Depth: Use guardrails as one layer of security, not the only one
  • Combine Protections: Use built-in guardrails alongside Active Fence or custom webhooks
  • Monitor Bypasses: Watch for attempts to bypass guardrails
  • Update Regularly: Keep Webrix updated to get latest detection patterns

Advanced Scenarios

Combining Multiple Guardrails

You can enable multiple guardrail providers simultaneously:

✓ Built-in Guardrails (Data Protection)
✓ Active Fence (Content Safety)
✓ Custom Webhook (Business Logic)

All enabled guardrails run in sequence. If any guardrail blocks a request, processing stops immediately.
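
Sequential execution with early exit can be sketched as a simple loop. In the Python example below, each guardrail is modeled as a function returning an allowed flag and the possibly transformed text; the three example guards and all names are hypothetical stand-ins, not the actual providers' interfaces.

```python
from typing import Callable, List, Optional, Tuple

# A guardrail takes text and returns (allowed, possibly transformed text).
Guardrail = Callable[[str], Tuple[bool, str]]


def run_in_sequence(
    text: str, guardrails: List[Tuple[str, Guardrail]]
) -> Tuple[bool, str, Optional[str]]:
    """Run guardrails in order; stop at the first one that blocks."""
    for name, check in guardrails:
        allowed, text = check(text)
        if not allowed:
            return False, text, name  # later guardrails never run
    return True, text, None


def builtin_guard(text: str) -> Tuple[bool, str]:
    return True, text.replace("123-45-6789", "[REDACTED-SSN]")


def content_safety_guard(text: str) -> Tuple[bool, str]:
    return "forbidden" not in text, text


def business_logic_guard(text: str) -> Tuple[bool, str]:
    return True, text


CHAIN = [
    ("built-in", builtin_guard),
    ("content-safety", content_safety_guard),
    ("custom-webhook", business_logic_guard),
]
```

Because each guard receives the output of the previous one, a mask applied by an earlier guardrail is what later guardrails see.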

Custom Detection Patterns

For custom detection requirements, consider using the Custom Webhook integration alongside built-in guardrails.

Troubleshooting

False Positives

Problem: Legitimate content is being blocked or masked

Solutions:

  1. Lower the threshold for Prompt Safety guards
  2. Switch from block mode to mask mode
  3. Review detection logs to identify patterns
  4. Consider disabling specific categories that aren't needed

False Negatives

Problem: Sensitive data is not being detected

Solutions:

  1. Increase the threshold for Prompt Safety guards
  2. Verify the data format matches detection patterns
  3. Check that the appropriate guard is enabled
  4. Review logs to confirm guardrails are running

Performance Impact

Problem: Guardrails are slowing down requests

Solutions:

  1. Disable unused categories
  2. Profile which guards are taking the most time
  3. Consider using mask mode instead of block mode
  4. Ensure you're running the latest Webrix version

Monitoring and Observability

Logging

Guardrail events are automatically logged with:

  • Detection type (injection, PII, secrets, etc.)
  • Action taken (allow, block, transform)
  • Timestamp and request context
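
To make the shape of such an event concrete, here is a small Python sketch of a log record carrying those three pieces of information. The field names and the `build_guardrail_event` helper are assumptions for illustration; the actual Webrix log schema may differ. Note that the record carries no matched value, only metadata about the detection.

```python
from datetime import datetime, timezone
from typing import Dict

ALLOWED_ACTIONS = {"allow", "block", "transform"}


def build_guardrail_event(
    detection_type: str, action: str, request_id: str
) -> Dict[str, str]:
    """Build a log record that describes a detection without its content."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    return {
        "detection_type": detection_type,  # e.g. "injection", "pii", "secrets"
        "action": action,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "request_id": request_id,          # hypothetical request context field
    }
```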

View guardrail logs in:

  • Admin panel → Users → Activity Logs
  • External log aggregators (if configured)

Metrics to Monitor

  • Detection Rate: Percentage of requests triggering guardrails
  • Block Rate: Percentage of blocked requests
  • False Positive Rate: Legitimate requests incorrectly blocked
  • Performance Impact: Average latency added by guardrails
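
Detection rate and block rate can be derived directly from the logged actions. The Python sketch below assumes one logged action per request, using the action names from the logging section (allow, block, transform); the function name and output keys are hypothetical.

```python
from typing import Dict, List


def guardrail_metrics(actions: List[str]) -> Dict[str, float]:
    """Compute detection and block rates from one logged action per request."""
    total = len(actions)
    if total == 0:
        return {"detection_rate": 0.0, "block_rate": 0.0}
    # A request counts as a detection if it was blocked or transformed.
    detected = sum(1 for a in actions if a in ("block", "transform"))
    blocked = actions.count("block")
    return {
        "detection_rate": detected / total,
        "block_rate": blocked / total,
    }
```

False positive rate cannot be computed from actions alone; it requires reviewing a sample of blocked or masked requests and labeling which were legitimate.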

FAQ

Do built-in guardrails require external API calls?

No, all detection runs locally on your Webrix infrastructure. No data is sent to external services.

Can I use built-in guardrails with Active Fence?

Yes, you can enable both simultaneously. They work in sequence - built-in guardrails run first, then Active Fence.

What happens to performance with all guardrails enabled?

Typical overhead is 50-200 ms per request, depending on content size and the number of enabled guards.

Can I customize detection patterns?

Detection patterns are built into Webrix. For custom patterns, use the Custom Webhook integration.

Are guardrails applied to all requests?

Yes, when enabled, guardrails apply to all tool calls through your MCP gateway.

Can I test guardrails without blocking real users?

Yes, start with mask mode only to see what would be detected without blocking requests.

What data is included in guardrail logs?

Logs include detection type, action taken, and request metadata. Sensitive data itself is never logged.

Do guardrails work in observe-only mode?

There is no dedicated observe-only mode. The closest option is mask mode, which transforms detected data but does not block requests.

Additional Resources