
Prometheus

Prometheus is an open-source monitoring and alerting toolkit designed for reliability and scalability. It collects and stores metrics as time series data, with each time series identified by a metric name and key-value pairs called labels. Prometheus provides powerful query capabilities through PromQL (Prometheus Query Language), enabling real-time analysis of metrics, alerting on conditions, and operational insights.

Authentication Types

The Prometheus connector supports one authentication method:

  • API Key (Basic Auth) - Username and password authentication using HTTP Basic Authentication
    • Pros: Simple to set up, widely supported, standard HTTP authentication
    • Cons: Credentials are base64-encoded (not encrypted), requires HTTPS for security
    • Best for: Internal deployments, authenticated access to Prometheus instances
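
To illustrate why the credentials need HTTPS, the sketch below builds a Basic Auth header and then reverses it without any key. The function names and the sample credentials are hypothetical, chosen only for illustration.

```python
import base64
from typing import Tuple


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic `Authorization` header."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def decode_basic_auth(header_value: str) -> Tuple[str, str]:
    """Recover the credentials from a Basic header. No secret is required,
    which is why base64 offers no confidentiality on its own."""
    scheme, _, token = header_value.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic auth header")
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password


# Hypothetical credentials, for illustration only.
header = basic_auth_header("admin", "secret")
print(header)
print(decode_basic_auth(header))
```

Anyone who can read the request on the wire can run the same decoding step, so the header is only as private as the transport carrying it.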

General Settings

Before using the connector, you need to configure:

  • Prometheus Instance URL - The base URL of your Prometheus server (e.g., https://prometheus.example.com or http://localhost:9090)

Setting up API Key (Basic Auth)

If your Prometheus instance requires authentication, configure Basic Authentication on the server before connecting it to Webrix.

1. Configure Basic Auth on Prometheus (if not already done)

  1. Generate a bcrypt-hashed password:
    htpasswd -nBC 10 "" | tr -d ':\n'
  2. Create a web.yml configuration file:
    basic_auth_users:
      admin: <bcrypt_hashed_password>
  3. Start Prometheus with the web config file:
    prometheus --web.config.file=web.yml

2. Enable Admin API (Optional)

If you want to use administrative features (snapshots, series deletion, config reload), enable these flags when starting Prometheus:

prometheus \
--web.config.file=web.yml \
--web.enable-admin-api \
--web.enable-lifecycle
  • --web.enable-admin-api - Enables TSDB admin operations (snapshot, delete, clean)
  • --web.enable-lifecycle - Enables config reload endpoint
Tip: Admin endpoints should only be enabled in trusted environments, as they allow destructive operations and configuration changes.

3. Configure in Webrix

  1. In Webrix, go to Integrations > New > Built-in

  2. Select Prometheus and click Use

  3. Under General Settings, enter your Prometheus Instance URL

    • Example: https://prometheus.example.com
    • Example: http://localhost:9090
  4. Under Authentication Type, select API Key

  5. Enter your Username (e.g., admin)

  6. Enter your Password (the plain text password, not the bcrypt hash)

  7. Click Save Changes

4. Test the Connection

  1. After saving, click Connect to test the authentication

  2. Run a simple query, for example "Execute Instant Query" with the query: up

  3. You should see metrics data returned successfully

Common Use Cases

Querying Metrics

Use the query tools to retrieve metrics data:

  • Execute Instant Query - Get current metric values

    • Example: up - Check which targets are up
    • Example: rate(http_requests_total[5m]) - Request rate over last 5 minutes
    • Example: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes - Fraction of memory available (utilization is 1 minus this ratio)
  • Execute Range Query - Get metrics over a time period

    • Example: Query cpu_usage from 1 hour ago to now with 1-minute resolution
    • Perfect for creating graphs and dashboards
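
As a rough sketch of what these two tools correspond to on the Prometheus HTTP API, the helpers below build request URLs for /api/v1/query and /api/v1/query_range. That the Webrix tools call these endpoints is an assumption, and the helper names are hypothetical; cpu_usage is the example metric from the list above.

```python
import time
from urllib.parse import urlencode


def instant_query_url(base_url: str, query: str, timeout: str = "") -> str:
    """URL for an instant query; `timeout` is an optional duration like '2m'."""
    params = {"query": query}
    if timeout:
        params["timeout"] = timeout
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode(params)}"


def range_query_url(base_url: str, query: str, start: int, end: int, step: str) -> str:
    """URL for a range query between two Unix timestamps at a given resolution."""
    params = {"query": query, "start": start, "end": end, "step": step}
    return f"{base_url.rstrip('/')}/api/v1/query_range?{urlencode(params)}"


# Assumed local instance; replace with your own Prometheus Instance URL.
base = "http://localhost:9090"
now = int(time.time())

print(instant_query_url(base, "rate(http_requests_total[5m])"))
# Last hour at 1-minute resolution, as in the range query example above.
print(range_query_url(base, "cpu_usage", now - 3600, now, "1m"))
```

PromQL expressions contain characters such as brackets and parentheses, so the query is percent-encoded rather than pasted into the URL verbatim.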

Discovering Metrics

Explore what metrics and labels are available:

  • List All Label Names - See all available labels (job, instance, status_code, etc.)
  • Get Label Values - Find all values for a specific label (e.g., all job names)
  • Find Series by Label Matchers - Discover time series matching specific criteria
  • List Metric Metadata - Get descriptions and types for metrics
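
For orientation, the discovery tools plausibly map to the metadata endpoints of the Prometheus HTTP API. The sketch below builds those URLs; the tool-to-endpoint mapping is an assumption and the function names are hypothetical.

```python
from typing import Iterable
from urllib.parse import quote, urlencode


def label_names_url(base_url: str) -> str:
    """All label names known to the server."""
    return f"{base_url.rstrip('/')}/api/v1/labels"


def label_values_url(base_url: str, label: str) -> str:
    """All values of one label, e.g. every job name."""
    return f"{base_url.rstrip('/')}/api/v1/label/{quote(label, safe='')}/values"


def series_url(base_url: str, matchers: Iterable[str]) -> str:
    """Series matching one or more selectors, passed as repeated match[] params."""
    query = urlencode([("match[]", m) for m in matchers])
    return f"{base_url.rstrip('/')}/api/v1/series?{query}"


def metadata_url(base_url: str, metric: str = "") -> str:
    """Metric metadata (type, help text), optionally for a single metric."""
    url = f"{base_url.rstrip('/')}/api/v1/metadata"
    return f"{url}?{urlencode({'metric': metric})}" if metric else url


base = "http://localhost:9090"  # assumed local instance
print(label_values_url(base, "job"))
print(series_url(base, ['up{job="prometheus"}']))
```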

Monitoring Operations

Check the health and status of your monitoring infrastructure:

  • List Scrape Targets - See all targets being scraped and their status
  • List Active Alerts - View currently firing alerts
  • List Alerting and Recording Rules - Audit configured rules
  • List Alertmanagers - Check Alertmanager discovery status
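
When reviewing scrape targets, the interesting entries are usually the ones that are not healthy. The following sketch filters a response shaped like the Prometheus /api/v1/targets payload; whether "List Scrape Targets" returns exactly this shape is an assumption, and the sample data is invented for illustration.

```python
from typing import Any, Dict, List


def unhealthy_targets(targets_response: Dict[str, Any]) -> List[Dict[str, str]]:
    """Return a compact summary of every active target whose health is not 'up'."""
    active = targets_response.get("data", {}).get("activeTargets", [])
    return [
        {
            "job": t.get("labels", {}).get("job", ""),
            "scrapeUrl": t.get("scrapeUrl", ""),
            "health": t.get("health", "unknown"),
            "lastError": t.get("lastError", ""),
        }
        for t in active
        if t.get("health") != "up"
    ]


# Invented sample in the shape of an /api/v1/targets response.
sample = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"labels": {"job": "prometheus"}, "scrapeUrl": "http://localhost:9090/metrics",
             "health": "up", "lastError": ""},
            {"labels": {"job": "node"}, "scrapeUrl": "http://localhost:9100/metrics",
             "health": "down", "lastError": "connection refused"},
        ],
        "droppedTargets": [],
    },
}

for target in unhealthy_targets(sample):
    print(target)
```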

Administration

Manage your Prometheus instance:

  • Create TSDB Snapshot - Backup your metrics data
  • Delete Time Series - Remove unwanted metrics
  • Reload Configuration - Apply config changes without restart
  • Get TSDB Statistics - Analyze cardinality and resource usage

Troubleshooting

Authentication Failed (401 Unauthorized)

You receive 401 errors when trying to query Prometheus.

Cause: Incorrect username or password, or Basic Auth not configured on Prometheus.

Solution:

  1. Verify your username and password are correct
  2. Check that Prometheus is started with --web.config.file pointing to your auth config
  3. Test authentication manually:
    curl -u username:password 'http://prometheus-url/api/v1/query?query=up'
  4. Ensure the password in Webrix matches the plain text password (not the bcrypt hash)
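
The manual test in step 3 can also be scripted. This is a minimal sketch using only the Python standard library; the function names are hypothetical, and the request is only sent when check_auth is called.

```python
import base64
import urllib.error
import urllib.request
from urllib.parse import urlencode


def build_auth_check(base_url: str, username: str, password: str) -> urllib.request.Request:
    """Prepare a GET for the `up` query with a Basic Auth header attached."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    url = f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': 'up'})}"
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def check_auth(request: urllib.request.Request, timeout: float = 10.0) -> str:
    """Send the request and translate the outcome into a short diagnosis."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return f"OK (HTTP {response.status})"
    except urllib.error.HTTPError as err:  # must precede URLError (its parent class)
        if err.code == 401:
            return "401 Unauthorized: check the username and plain text password"
        return f"HTTP {err.code}"
    except urllib.error.URLError as err:
        return f"connection problem: {err.reason}"


# Hypothetical credentials and a local instance, for illustration only.
request = build_auth_check("http://localhost:9090", "admin", "secret")
print(request.full_url)
```

Calling check_auth(request) against your own instance separates a credential problem (401) from a connectivity problem, which are covered by different sections of this guide.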

Connection Refused or Timeout

Cannot connect to Prometheus instance.

Cause: Incorrect instance URL, Prometheus not running, or network issues.

Solution:

  1. Verify the Instance URL is correct and includes the protocol (http:// or https://)
  2. Check that Prometheus is running:
    curl http://localhost:9090/-/healthy
  3. Ensure no firewall rules are blocking access
  4. For HTTPS instances, ensure the certificate is valid
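
A missing protocol is an easy mistake to catch before attempting a connection. The sketch below validates an instance URL and derives the health endpoint used in step 2; the function names are hypothetical.

```python
from urllib.parse import urlparse


def normalize_instance_url(url: str) -> str:
    """Return the URL without a trailing slash, or raise ValueError if the
    protocol or host is missing."""
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"{url!r} must start with http:// or https://")
    if not parsed.netloc:
        raise ValueError(f"{url!r} has no host")
    return f"{parsed.scheme}://{parsed.netloc}{parsed.path.rstrip('/')}"


def health_url(instance_url: str) -> str:
    """URL of the Prometheus health endpoint for a given instance."""
    return normalize_instance_url(instance_url) + "/-/healthy"


print(health_url("http://localhost:9090/"))

try:
    normalize_instance_url("prometheus.example.com")
except ValueError as err:
    print(f"rejected: {err}")
```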

Admin API Disabled (501 Not Implemented)

Error when trying to use snapshot, delete, or config reload tools.

Cause: Admin API endpoints are not enabled on the Prometheus server.

Solution:

  1. Restart Prometheus with the appropriate flags:
    prometheus --web.enable-admin-api --web.enable-lifecycle
  2. These flags enable:
    • --web.enable-admin-api: Snapshot, delete series, clean tombstones
    • --web.enable-lifecycle: Config reload
  3. Note: Only enable these in trusted environments

Query Timeout (503 Service Unavailable)

Queries fail with timeout errors.

Cause: Query is too expensive or takes too long to execute.

Solution:

  1. Simplify your query or reduce the time range
  2. Add a timeout parameter to your query (e.g., "2m")
  3. Increase the query timeout on Prometheus server:
    prometheus --query.timeout=5m
  4. Check if high cardinality is causing performance issues using "Get TSDB Statistics"
  5. Consider adding more specific label matchers to reduce data scanned

Invalid Query (400 Bad Request)

Query fails with parsing or validation errors.

Cause: Syntax error in PromQL expression.

Solution:

  1. Use the "Format Query" tool to check query syntax
  2. Use the "Parse Query" tool to see how Prometheus interprets your query
  3. Common mistakes:
    • Missing closing brackets or parentheses
    • Invalid label matchers (use =, !=, =~, !~)
    • Invalid duration formats (use 5m, 1h, 30s)
  4. Refer to the PromQL documentation
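
Prometheus reports query errors as a JSON body with status, errorType, and error fields, which usually pinpoints the mistake. A small sketch for reading that body follows; the function name is hypothetical, the sample error text is illustrative rather than copied from a real server, and whether Webrix surfaces the raw body is an assumption.

```python
import json


def describe_error(body: str) -> str:
    """Summarize a Prometheus API response body as 'errorType: message'."""
    payload = json.loads(body)
    if payload.get("status") != "error":
        return "no error reported"
    return f"{payload.get('errorType', 'unknown')}: {payload.get('error', '')}"


# Illustrative body; the exact message wording varies by Prometheus version.
sample_body = json.dumps({
    "status": "error",
    "errorType": "bad_data",
    "error": "parse error: unclosed left bracket",
})

print(describe_error(sample_body))
```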

No Data Returned

Query succeeds but returns empty results.

Cause: No matching time series, or querying outside the retention period.

Solution:

  1. Use "Find Series by Label Matchers" to verify the series exists
  2. Check the time range - metrics may have been deleted or expired
  3. Verify label matchers are correct (case-sensitive)
  4. Use "List Scrape Targets" to ensure targets are being scraped successfully
  5. Check for relabeling issues using "Get Relabel Steps"

High Cardinality Warnings

Prometheus performance degrading or high memory usage.

Cause: Too many unique time series being created.

Solution:

  1. Use "Get TSDB Statistics" to identify high cardinality labels
  2. Avoid labels with unbounded values (user IDs, timestamps, etc.)
  3. Use relabeling to drop or aggregate high cardinality labels
  4. Consider using recording rules to pre-aggregate data
  5. Review "List Metric Metadata" to audit your metrics
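
To make step 1 concrete, the sketch below ranks entries from a response shaped like the Prometheus /api/v1/status/tsdb payload. Whether "Get TSDB Statistics" returns exactly this shape is an assumption, and the sample numbers are invented.

```python
from typing import Any, Dict, List, Tuple


def top_entries(tsdb_status: Dict[str, Any],
                key: str = "seriesCountByMetricName",
                limit: int = 5) -> List[Tuple[str, int]]:
    """Return the `limit` largest (name, value) pairs from one statistics list,
    e.g. 'seriesCountByMetricName' or 'labelValueCountByLabelName'."""
    entries = tsdb_status.get("data", {}).get(key, [])
    ranked = sorted(entries, key=lambda entry: entry["value"], reverse=True)
    return [(entry["name"], entry["value"]) for entry in ranked[:limit]]


# Invented sample in the shape of an /api/v1/status/tsdb response.
sample = {
    "status": "success",
    "data": {
        "seriesCountByMetricName": [
            {"name": "up", "value": 12},
            {"name": "http_requests_total", "value": 48000},
        ],
        "labelValueCountByLabelName": [
            {"name": "user_id", "value": 35000},
            {"name": "job", "value": 4},
        ],
    },
}

print(top_entries(sample))
print(top_entries(sample, key="labelValueCountByLabelName", limit=1))
```

A label with tens of thousands of distinct values, such as the invented user_id above, is the typical signature of the unbounded labels mentioned in step 2.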

Cannot Delete Series

Series deletion fails or doesn't free up space.

Cause: Admin API not enabled, or tombstones not cleaned.

Solution:

  1. Ensure --web.enable-admin-api flag is set
  2. After "Delete Time Series", run "Clean Tombstones" to reclaim disk space
  3. Note: Deletion initially only marks the data with tombstones; disk space is reclaimed once the tombstones are cleaned
  4. Create a snapshot before deletion for safety
  5. Verify deletion with "Find Series by Label Matchers"
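
The steps above imply an order: snapshot first, then delete, then clean tombstones, then verify. A sketch of that sequence as Prometheus HTTP API calls is shown below. It only builds the request list and sends nothing; the function name and the example metric old_metric_total are hypothetical, and the mapping from Webrix tools to these endpoints is an assumption.

```python
from typing import List, Tuple
from urllib.parse import urlencode


def deletion_plan(base_url: str, matcher: str) -> List[Tuple[str, str]]:
    """Ordered (method, url) pairs for safely deleting the series selected by
    `matcher`. The admin endpoints require --web.enable-admin-api."""
    base = base_url.rstrip("/")
    match = urlencode([("match[]", matcher)])
    return [
        ("POST", f"{base}/api/v1/admin/tsdb/snapshot"),               # back up first
        ("POST", f"{base}/api/v1/admin/tsdb/delete_series?{match}"),  # mark as deleted
        ("POST", f"{base}/api/v1/admin/tsdb/clean_tombstones"),       # reclaim disk space
        ("GET", f"{base}/api/v1/series?{match}"),                     # verify nothing remains
    ]


for method, url in deletion_plan("http://localhost:9090", "old_metric_total"):
    print(method, url)
```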

Important Notes

Security Considerations

  • Always use HTTPS in production to protect Basic Auth credentials
  • Limit access to admin endpoints (--web.enable-admin-api) to trusted users only
  • Consider network-level access controls for Prometheus
  • Regular backups using "Create TSDB Snapshot" are recommended

Performance Best Practices

  • Use specific label matchers in queries to reduce data scanned
  • Monitor cardinality with "Get TSDB Statistics"
  • Use recording rules for frequently-queried expensive aggregations
  • Set appropriate retention periods to balance storage and query performance

PromQL Tips

  • rate() and irate() for the per-second rate of increase of counter metrics
  • increase() for the total increase of a counter over a time range
  • histogram_quantile() for histogram metrics
  • Use by and without for aggregations
  • Label matchers: = (equal), != (not equal), =~ (regex match), !~ (regex not match)
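
When queries are assembled programmatically, building the selector from parts avoids unbalanced braces and invalid matcher operators. The helper below is a hypothetical sketch and not part of Webrix or Prometheus; the metric and label names are examples.

```python
from typing import Iterable, Tuple

VALID_OPERATORS = ("=", "!=", "=~", "!~")


def selector(metric: str, matchers: Iterable[Tuple[str, str, str]]) -> str:
    """Build a PromQL series selector from (label, operator, value) triples."""
    parts = []
    for label, operator, value in matchers:
        if operator not in VALID_OPERATORS:
            raise ValueError(f"invalid matcher operator: {operator!r}")
        # Escape backslashes and double quotes for a double-quoted PromQL string.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        parts.append(f'{label}{operator}"{escaped}"')
    if not parts:
        return metric
    return metric + "{" + ",".join(parts) + "}"


errors = selector("http_requests_total", [("job", "=", "api"), ("status_code", "=~", "5..")])

# Per-second rate of 5xx responses over 5 minutes, aggregated per job.
query = f"sum by (job) (rate({errors}[5m]))"
print(query)
```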

Additional Resources