Live Monitoring: Real-Time Anomaly Detection

Running DeepSentry in production: streaming logs, real-time scoring, alerts, and integration patterns.

What is Live Monitoring?

After training your models, you want to continuously monitor new logs. Live monitoring:

  1. Reads incoming log entries in real-time
  2. Encodes each message using the trained text autoencoder
  3. Scores each sequence using the anomaly detector
  4. Alerts when anomaly scores exceed a threshold
  5. Maintains rolling statistics for adaptive thresholding
LIVE MONITORING PIPELINE

  Incoming logs (streaming):
    "14:23:45 Got block"
    "14:23:46 BlockReport"
    "14:23:47 Verification"
    ...
        │
        ▼
  [READ LOG]
        │
        ▼
  [ENCODE TEXT]               compress each message to 128-D
        │
        ▼
  [MANAGE SEQUENCE WINDOW]    sliding window of 10 vectors
        │
        ▼
  [ANOMALY SCORING]           reconstruction error
        │
        ▼
  [COMPARE TO THRESHOLD]      threshold = mean + 2.5 × std
        │                     score < threshold → ✓ NORMAL
        │                     score > threshold → ⚠ ANOMALY
        ▼
  [ALERT ON ANOMALY]          message + score + time
        │
        ▼
  Alert destinations: Slack webhook, PagerDuty, Email, Syslog

  Status: Running, 10K msgs/sec, 120s latency
  Last anomaly: 5 mins ago (score: 4.8)
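
In code, the loop amounts to roughly the sketch below. This is an illustrative outline only, not DeepSentry's actual implementation: encode_message, score_sequence, and send_alert are placeholder callables standing in for the trained text autoencoder, the sequence detector, and your alerting hook. The rolling-statistics threshold (step 5) is covered separately under "Adaptive Thresholding" below.

# Illustrative monitoring loop (placeholder callables, not DeepSentry's API)
from collections import deque

SEQUENCE_LENGTH = 10  # sliding window of encoded vectors; must match training

def monitor(log_lines, encode_message, score_sequence, send_alert, threshold):
    window = deque(maxlen=SEQUENCE_LENGTH)
    for line in log_lines:                    # 1. read incoming log entries
        window.append(encode_message(line))   # 2. compress each message to a vector
        if len(window) < SEQUENCE_LENGTH:
            continue                          # wait until the window is full
        score = score_sequence(list(window))  # 3. reconstruction-error score
        if score > threshold:                 # 4. alert when the threshold is exceeded
            send_alert(line, score)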

Running Live Monitoring

Basic Execution

bash dockerrun/run_live_monitoring.sh

The live monitor starts and reads logs according to configuration. Output looks like:

[14:23:45] Normal - score: 0.12 - "Got block report from..."
[14:23:46] Normal - score: 0.09 - "Verification complete"
[14:23:47] ANOMALY - score: 3.45 - "Unexpected error" ⚠️
[14:23:48] Normal - score: 0.14 - "Retrying connection..."
        

Input Sources

Configure in live_monitoring_config.yml:


# From a file
log_file: /var/log/myapp.log
tail_mode: true   # Start from end (watch for new entries)

# From stdin (pipe logs in)
log_file: /dev/stdin

# From network socket (with socat)
log_file: /dev/tcp/localhost/9999
        

Piping Logs in Real-Time

Forward logs from a source:


# From syslog
tail -f /var/log/syslog | bash dockerrun/run_live_monitoring.sh

# From application stderr
./myapp 2>&1 | bash dockerrun/run_live_monitoring.sh

# From remote server via SSH
ssh user@remote "tail -f /var/log/app.log" | bash dockerrun/run_live_monitoring.sh
        

Adaptive Thresholding

The live monitor doesn't use a fixed threshold. Instead, it maintains running statistics:

  • Window: Last N scores (e.g., 100 most recent)
  • Mean: Average of recent scores
  • Std Dev: How much scores vary
  • Threshold: mean + 2.5 * std_dev

Why this matters: Even if "normal" anomaly scores slowly increase (due to seasonal changes or system evolution), the threshold adapts.

Example

Hour 1: Scores are [0.1, 0.2, 0.15, 0.3, ...]. Mean=0.17, Std=0.08. Threshold = 0.17 + 2.5*0.08 = 0.37

Hour 2: Scores are [0.2, 0.25, 0.22, 0.4, ...]. Mean=0.27, Std=0.09. Threshold = 0.27 + 2.5*0.09 = 0.495

The threshold moved up because the baseline scores increased. This adaptation prevents "threshold exhaustion," where a stale fixed threshold ends up flagging every score as an anomaly.
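
A minimal sketch of this rolling-statistics threshold is shown below (illustrative only; the class name and the guard for small windows are illustrative choices, not DeepSentry's exact implementation):

# Adaptive threshold: mean + multiplier * std over the last N scores
from collections import deque
import statistics

class AdaptiveThreshold:
    def __init__(self, window_size=100, multiplier=2.5):
        self.scores = deque(maxlen=window_size)   # last N anomaly scores
        self.multiplier = multiplier

    def update(self, score):
        self.scores.append(score)

    def value(self):
        if len(self.scores) < 2:
            return float("inf")                   # too little data to judge yet
        mean = statistics.mean(self.scores)
        std = statistics.pstdev(self.scores)
        return mean + self.multiplier * std

# Feeding in the first four hour-1 scores from the example gives ~0.37
t = AdaptiveThreshold()
for s in [0.1, 0.2, 0.15, 0.3]:
    t.update(s)
print(round(t.value(), 2))   # 0.37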

Output and Alerting

Console Output

By default, the live monitor outputs to console:

[14:23:45] Normal  - score: 0.21 (threshold: 2.45)
[14:23:46] ANOMALY - score: 3.67 (threshold: 2.45) ⚠️
[14:23:47] Normal  - score: 0.18 (threshold: 2.45)

File Output

Configure in live_monitoring_config.yml:


output_file: /var/log/deepsentry-alerts.log
        

Anomalies are written with full context:

TIMESTAMP=2022-01-15T14:23:47Z
SCORE=3.67
THRESHOLD=2.45
MESSAGE="Unexpected error in database connection"
CONTEXT_WINDOW="Got block report... → Verification complete → Unexpected error"
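
Because these records are plain KEY=VALUE lines, they are easy to post-process. A minimal parser sketch is shown below (an illustrative helper that assumes the format above; it is not part of DeepSentry):

# Illustrative parser for the KEY=VALUE alert records shown above
def parse_alerts(path):
    alerts, current = [], None
    with open(path) as fh:
        for raw in fh:
            line = raw.strip()
            if line.startswith("TIMESTAMP="):     # each record begins with TIMESTAMP=
                current = {}
                alerts.append(current)
            if current is not None and "=" in line:
                key, _, value = line.partition("=")
                current[key] = value.strip('"')
    return alerts

# Example: print when and how strongly each anomaly fired
for alert in parse_alerts("/var/log/deepsentry-alerts.log"):
    print(alert["TIMESTAMP"], alert["SCORE"])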
        

Integration with Monitoring Systems

Direct integration patterns:


# Send alerts to syslog
bash dockerrun/run_live_monitoring.sh | \
  grep ANOMALY | \
  logger -t deepsentry -p user.alert

# Send to Prometheus (custom exporter)
bash dockerrun/run_live_monitoring.sh | \
  ./alert_to_prometheus.py

# Send to monitoring webhook
bash dockerrun/run_live_monitoring.sh | \
  grep ANOMALY | \
  while read line; do
    curl -X POST https://alerts.example.com/webhook \
      -d "$line"
  done
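
The ./alert_to_prometheus.py script referenced above is not included in this page. A minimal sketch of what such an exporter could look like, using the prometheus_client package, is shown below; the metric names and port are illustrative:

#!/usr/bin/env python3
# Illustrative stdin-to-Prometheus bridge (requires: pip install prometheus-client)
import re
import sys

from prometheus_client import Counter, Gauge, start_http_server

ANOMALIES = Counter("deepsentry_anomalies_total", "Anomalies flagged by DeepSentry")
LAST_SCORE = Gauge("deepsentry_last_anomaly_score", "Score of the most recent anomaly")
SCORE_RE = re.compile(r"score:\s*([0-9.]+)")

if __name__ == "__main__":
    start_http_server(9102)          # expose /metrics (port is arbitrary)
    for line in sys.stdin:           # read the live monitor's console output
        if "ANOMALY" in line:
            ANOMALIES.inc()
            match = SCORE_RE.search(line)
            if match:
                LAST_SCORE.set(float(match.group(1)))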
        

Tuning Live Monitoring

Key Parameters

Parameter              Default          Impact
threshold_multiplier   2.5              How many std devs above the mean triggers an alert. Higher = fewer alerts.
window_size            100              How many recent scores to use for statistics. Larger = more stable threshold.
sequence_length        10               Must match training. How many vectors in a sequence.
batch_interval         1.0 (seconds)    How often to score new logs. Smaller = more frequent scoring.

Tuning Strategy

Start with defaults, then adjust based on your alerts:

  • Too many false positives: Increase threshold_multiplier (e.g., 3.0 or 3.5)
  • Missing real anomalies: Decrease threshold_multiplier (e.g., 2.0 or 1.5)
  • Threshold too volatile: Increase window_size (e.g., 200 or 500)
  • Slow to adapt to changes: Decrease window_size (e.g., 50)
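
Putting these knobs together, a tuned live_monitoring_config.yml might look like the following (the values are illustrative starting points, not recommendations):

# Illustrative values only - tune against your own alert volume
log_file: /var/log/myapp.log
tail_mode: true

sequence_length: 10          # must match the training configuration
window_size: 200             # larger window = more stable threshold
threshold_multiplier: 3.0    # raise to cut false positives
batch_interval: 1.0          # seconds between scoring passes

output_file: /var/log/deepsentry-alerts.log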

Production Deployment

Systemd Service

Run as a systemd service for automatic restart:


# /etc/systemd/system/deepsentry-live.service

[Unit]
Description=DeepSentry Live Anomaly Detection
After=network.target

[Service]
Type=simple
User=deepsentry
WorkingDirectory=/opt/deepsentry
ExecStart=/bin/bash dockerrun/run_live_monitoring.sh
Restart=on-failure
RestartSec=10

[Install]
WantedBy=multi-user.target
        

Enable and start:

sudo systemctl enable deepsentry-live
sudo systemctl start deepsentry-live
sudo systemctl status deepsentry-live
        

Docker Container

Run live monitoring in Docker:


docker run -d \
  --name deepsentry-live \
  -v /var/log:/var/log:ro \
  -v /data/deepsentry:/data \
  --restart unless-stopped \
  deepsentry:latest \
  python src/live/main.py
        

Kubernetes Deployment

For cloud deployments:


apiVersion: apps/v1
kind: Deployment
metadata:
  name: deepsentry-live
spec:
  replicas: 1
  selector:
    matchLabels:
      app: deepsentry-live
  template:
    metadata:
      labels:
        app: deepsentry-live
    spec:
      containers:
      - name: live
        image: deepsentry:latest
        volumeMounts:
        - name: logs
          mountPath: /var/log
          readOnly: true
        - name: models
          mountPath: /data/models
          readOnly: true
        env:
        - name: LOG_FILE
          value: /var/log/app.log
        - name: ANOMALY_MODEL
          value: /data/models/detector.h5
      volumes:
      - name: logs
        hostPath:
          path: /var/log
      - name: models
        configMap:
          name: deepsentry-models
        

Monitoring the Monitor

Keep an eye on the live monitoring process itself:


# Check if process is running
ps aux | grep deepsentry

# Monitor resource usage
watch -n 1 'docker stats deepsentry-live'

# Check for errors in logs
tail -f /var/log/deepsentry-alerts.log | grep ERROR

# Alert stats (count of anomalies recorded so far)
grep -c ANOMALY /var/log/deepsentry-alerts.log
        

Common Issues and Solutions

Problem: No alerts even for obvious anomalies

Check:

  • Models are loaded correctly (check logs for errors)
  • Sequence length matches training config
  • Log format is correct (YYMMDD HHMMSS MESSAGE)
  • Threshold isn't set too high (if it is, lower threshold_multiplier)

Problem: Too many false positive alerts

Solutions:

  • Increase threshold_multiplier in config
  • Increase window_size for more stable baseline
  • Check that training data was clean (no anomalies)
  • Verify that the monitored logs follow a distribution similar to the training logs

Problem: Memory or CPU usage is high

Optimizations:

  • Reduce sequence_length if possible
  • Use smaller models (reduce embedding_size)
  • Increase batch_interval to score less frequently
  • Use GPU acceleration if available

Next Steps

Once live monitoring is running:

  • Set up alerting to send anomalies to your on-call team
  • Create dashboards showing anomaly detection rate and latency
  • Periodically retrain models with new data to stay current
  • Investigate flagged anomalies to understand patterns