Examples and Real-World Use Cases

Working examples and how different organizations use DeepSentry.

Example 1: HDFS Cluster Monitoring (Reference Implementation)

Scenario

A large data center runs a Hadoop HDFS cluster with 100+ nodes. They want to detect failures and anomalies early.

Implementation


# Step 1: Collect 2 weeks of clean HDFS logs
# (stream the archive over ssh so it lands on the DeepSentry host)
ssh hadoop-master "tar -czf - /var/log/hdfs/" > hdfs_logs.tar.gz

# Step 2: Extract to data directory
mkdir -p /data/hdfs_logs
tar -xzf hdfs_logs.tar.gz -C /data/hdfs_logs/

# Step 3: Prepare data
python src/tx/prepare.py \
  --config dockerconfig/text_autoencoder_prepare_data.yml \
  --data-dir /data/hdfs_logs

# Step 4: Train models
bash dockerrun/run_text_autoencoder_train.sh
bash dockerrun/run_text_autoencoder_encode_dataset.sh
bash dockerrun/run_anomaly_detector_train.sh

# Step 5: Deploy live monitoring
nohup bash dockerrun/run_live_monitoring.sh > /var/log/deepsentry.log 2>&1 &
        

Results

  • Baseline: anomaly scores around 0.2 during normal operations
  • Detected failures: anomaly scores of 3.5+, caught with 95% accuracy
  • False positive rate: 2-3% under normal conditions
  • Mean detection latency: 30 seconds after an anomaly starts
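
With that separation between the ~0.2 baseline and 3.5+ failure scores, alerting can be a simple threshold check. A minimal sketch, assuming the live monitor appends one "timestamp score" pair per line to a scores file (the path and line format are assumptions for illustration, not the monitor's documented output):

# alert_on_scores.py - threshold alerting on streamed anomaly scores
import time

SCORES_FILE = "/var/log/deepsentry_scores.log"  # assumed output location
THRESHOLD = 2.5  # well above the ~0.2 baseline, below the 3.5+ failure scores

def follow(path):
    """Yield lines appended to a file, tail -f style."""
    with open(path) as f:
        f.seek(0, 2)  # start at end of file
        while True:
            line = f.readline()
            if not line:
                time.sleep(1.0)
                continue
            yield line

for line in follow(SCORES_FILE):
    timestamp, score = line.split()
    if float(score) > THRESHOLD:
        print(f"ALERT {timestamp}: anomaly score {score} exceeds {THRESHOLD}")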

Example 2: Microservices API Monitoring

Scenario

A SaaS platform has 20+ microservices writing to centralized logs. They want to detect service degradation and unusual request patterns.

Custom Setup


# Logs are in JSON format from API gateway
# {"timestamp":"2022-01-15T14:23:45Z","service":"user-api","level":"INFO","message":"Request received"}

# Custom log parser
# src/microservices_parser.py

import json
from datetime import datetime

class MicroservicesLogParser:
    def parse_message(self, line):
        data = json.loads(line)
        # fromisoformat() can't parse the trailing "Z" before Python 3.11,
        # so normalize it to an explicit UTC offset first
        iso_ts = data['timestamp'].replace('Z', '+00:00')
        timestamp = datetime.fromisoformat(iso_ts).strftime("%y%m%d %H%M%S")

        # Combine service and message for richer context
        message = f"{data['service']}: {data['message']}"
        return timestamp, message
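
# Quick sanity check (hypothetical usage, mirroring the gateway line above)
parser = MicroservicesLogParser()
ts, msg = parser.parse_message(
    '{"timestamp":"2022-01-15T14:23:45Z","service":"user-api",'
    '"level":"INFO","message":"Request received"}'
)
print(ts, msg)  # 220115 142345 user-api: Request received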

# Training configuration (tuned for API logs)
# dockerconfig/text_autoencoder_train.yml
parameters:
  embedding_size: 32              # API logs are consistent
  encoder_output_size: 64
  epochs: 5                       # Less training data needed
  batch_size: 64
  bidirectional: false            # Logs are mostly sequential

# Anomaly detection config (APIs are fast)
parameters:
  sequence_length: 5              # Shorter sequences
  lstm_hidden_size: 32
  epochs: 8
  bidirectional: true             # Need context both ways
        

Results

  • Detects service restarts and crashes within 10 seconds
  • Catches unusual API request patterns (e.g., spike in 404s)
  • Alerts on database connection failures
  • Distinguishes between normal traffic spikes and actual anomalies

Example 3: Database Server Monitoring

Scenario

A PostgreSQL database server generates detailed query and transaction logs. The team needs to catch performance degradation and errors.

Log Extraction


# PostgreSQL log lines start with an ISO-style timestamp by default
# (e.g. "2022-01-15 14:23:45.123 UTC"); reformat them into the
# expected layout: YYMMDD HHMMSS MESSAGE

awk '{
  d = $1; t = substr($2, 1, 8)          # date and HH:MM:SS fields
  gsub(/-/, "", d); gsub(/:/, "", t)    # strip the separators
  print substr(d, 3) " " t " " $0       # YYMMDD HHMMSS + original line
}' /var/log/postgresql.log > /data/postgres.log

# Or directly from postgres logs with log_line_prefix configured
# log_line_prefix = '[%t] %u@%d %p: '
        
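If you configure log_line_prefix instead, a small script can rewrite the prefixed lines into the same layout. A sketch assuming the exact prefix above, with illustrative file paths:

# pg_prefix_to_deepsentry.py - convert '[%t] %u@%d %p: ' prefixed lines
# into "YYMMDD HHMMSS MESSAGE" (paths are illustrative)
import re
from datetime import datetime

# %t renders as e.g. "2022-01-15 14:23:45 UTC"
PREFIX = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \w+\] (.*)$")

with open("/var/log/postgresql.log") as src, \
     open("/data/postgres.log", "w") as dst:
    for line in src:
        m = PREFIX.match(line.rstrip("\n"))
        if not m:
            continue  # continuation lines lack the prefix; skip them
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        dst.write(ts.strftime("%y%m%d %H%M%S") + " " + m.group(2) + "\n")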

Tuning for Databases


# Databases have very structured logs
# Tune for high accuracy

parameters:
  embedding_size: 64
  encoder_output_size: 128        # Richer representation
  epochs: 15                      # More training for precision
  batch_size: 32
  bidirectional: true             # Transaction context matters
  dropout: 0.3                    # Prevent overfitting

# Anomaly detection
parameters:
  sequence_length: 15             # Longer sequences for transaction context
  lstm_hidden_size: 128           # Larger capacity
  epochs: 20                      # More training
  threshold_multiplier: 3.0       # Be conservative with alerts
        
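The threshold_multiplier here follows the common mean-plus-k-sigma convention. A minimal sketch of that logic, assuming the parameter scales the standard deviation of scores measured on clean validation data (an assumption about the parameter's semantics):

# threshold_demo.py - deriving an alert threshold as mean + k * sigma
import statistics

clean_scores = [0.18, 0.22, 0.19, 0.25, 0.21, 0.20]  # illustrative values
threshold_multiplier = 3.0                           # conservative, as above

mean = statistics.mean(clean_scores)
std = statistics.stdev(clean_scores)
threshold = mean + threshold_multiplier * std

def is_anomalous(score):
    return score > threshold

print(f"alerting above {threshold:.3f}")
print(is_anomalous(0.23), is_anomalous(3.5))  # False True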

Detected Anomalies

  • Queries taking longer than normal
  • Unusual connection patterns
  • Transaction deadlocks
  • Index corruption or missing indexes
  • Disk space issues

Example 4: Kubernetes Cluster Monitoring

Scenario

A Kubernetes cluster runs 50+ nodes and 500+ pods. The team wants to detect pod crashes, network issues, and resource exhaustion.

Log Aggregation


# Collect logs from kubelet; strftime() with no timestamp argument stamps
# each line at collection time, so run this while the logs are streaming in
cat /var/log/kubelet.log | awk '{print strftime("%y%m%d %H%M%S") " " $0}' > k8s.log

# Or from container runtime (Docker)
journalctl -u docker --no-pager | \
  awk '{print strftime("%y%m%d %H%M%S") " " $0}' > container.log

# Combine multiple sources
cat k8s.log container.log | sort -k1,2 > combined.log
        
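Because each source file is already in chronological order, the merge can also be done lazily in Python instead of re-sorting everything. A sketch using the file names above:

# merge_logs.py - streaming merge of per-source logs by timestamp
import heapq

sources = [open(p) for p in ("k8s.log", "container.log")]
with open("combined.log", "w") as out:
    # Lines start with "YYMMDD HHMMSS", so the first 13 characters sort
    # chronologically; heapq.merge needs each input already sorted
    for line in heapq.merge(*sources, key=lambda l: l[:13]):
        out.write(line)
for f in sources:
    f.close()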

K8s-Specific Configuration


# Kubernetes logs are verbose with lots of repetition
# Tune to filter noise

# Text encoder / data preparation
parameters:
  root_dir: {ROOT_DATA_DIR}

  # Pre-filter unimportant logs
  min_word_count: 3               # Ignore very rare tokens
  max_vocab_size: 5000            # K8s has fewer unique patterns than you'd think

  # Training
  epochs: 12
  embedding_size: 48              # K8s logs are somewhat uniform

# Anomaly detection focuses on rare patterns
parameters:
  sequence_length: 8
  lstm_hidden_size: 64
  epochs: 15
  threshold_multiplier: 2.2       # K8s noise is high, be permissive
        
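For intuition, here is what min_word_count / max_vocab_size filtering amounts to, assuming the parameters behave as their names suggest (an illustration, not the pipeline's actual code):

# vocab_filter_demo.py - illustration of min_word_count / max_vocab_size
from collections import Counter

def build_vocab(messages, min_word_count=3, max_vocab_size=5000):
    counts = Counter(word for msg in messages for word in msg.split())
    # Drop rare tokens, then keep only the most frequent survivors
    frequent = {w: c for w, c in counts.items() if c >= min_word_count}
    kept = [w for w, _ in Counter(frequent).most_common(max_vocab_size)]
    return {w: i for i, w in enumerate(kept)}

vocab = build_vocab([
    "Pod default/web-1 started",
    "Pod default/web-2 started",
    "Pod default/web-3 started",
])
print(vocab)  # only "Pod" and "started" clear min_word_count=3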

Detected Issues

  • Pod CrashLoopBackOff patterns
  • Node NotReady and MemoryPressure conditions
  • Network plugin failures
  • Scheduler backoff and binding failures
  • Volume mount and storage errors

Example 5: Web Application Log Monitoring

Scenario

A Django/Flask web application generates logs from multiple processes. The team wants to detect:

  • Internal server errors (500s)
  • Authentication failures
  • Database query timeouts
  • Unusual request volumes

Application Instrumentation


# Flask example with structured logging

import logging
from datetime import datetime, timezone

class LogFormatter(logging.Formatter):
    def format(self, record):
        timestamp = datetime.now(timezone.utc).strftime("%y%m%d %H%M%S")
        message = record.getMessage()
        return f"{timestamp} {message}"

# Configure app logging (the root logger defaults to WARNING,
# so raise its level or INFO records will be dropped)
handler = logging.FileHandler('app.log')
handler.setFormatter(LogFormatter())
logger = logging.getLogger()
logger.setLevel(logging.INFO)
logger.addHandler(handler)

# Log events with consistent format
logger.info("POST /api/users - 201 Created")
logger.error("Database connection timeout: 30s")
logger.warning("Authentication failed: IP 192.168.1.100")
        

Baseline Configuration


# Web app logs are simple and consistent

parameters:
  embedding_size: 32              # Application log lines are uniform
  encoder_output_size: 48
  epochs: 8
  batch_size: 64
  
  # Anomaly detection
  sequence_length: 5              # HTTP requests are quick
  lstm_hidden_size: 32
  threshold_multiplier: 2.0       # Web patterns are distinctive
        
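To make sequence_length concrete, this is the windowing the detector effectively sees (illustrative; the pipeline builds these windows internally):

# windows_demo.py - overlapping sequences of length sequence_length
def sliding_windows(items, sequence_length=5):
    for i in range(len(items) - sequence_length + 1):
        yield items[i:i + sequence_length]

lines = [f"request {n}" for n in range(8)]
for window in sliding_windows(lines):
    print(window)  # 4 windows of 5 consecutive lines each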

Example 6: Time-Series Sensor Data (Non-Traditional)

Scenario

An industrial monitoring system produces sensor logs. While these aren't traditional text logs, the same approach works:


# Sensor logs look like:
220115 142345 SENSOR_001: temperature=42.3°C status=OK
220115 142346 SENSOR_002: pressure=101.2kPa status=OK
220115 142347 SENSOR_001: temperature=42.5°C status=WARN

# They can be processed like text
# The text encoder learns that:
# - temperatures in 40-45°C range are normal
# - status=WARN is less common than OK
# - pressure values cluster around 101kPa

# The anomaly detector learns:
# - SENSOR_001 usually sees gradual temperature changes
# - Rapid jumps (e.g., 42.3 → 48.1) are anomalous
# - Status should not jump from OK to ERROR without WARN
        
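Getting numeric readings into the pipeline is just a matter of serializing them into the expected text format. A hypothetical emitter (the helper and values are made up):

# sensor_to_log.py - serialize numeric readings as text log lines
from datetime import datetime

def emit(sensor_id, **readings):
    ts = datetime.now().strftime("%y%m%d %H%M%S")
    fields = " ".join(f"{k}={v}" for k, v in readings.items())
    return f"{ts} {sensor_id}: {fields}"

print(emit("SENSOR_001", temperature="42.3°C", status="OK"))
# e.g. 220115 142345 SENSOR_001: temperature=42.3°C status=OK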

Comparison: Configuration Across Use Cases

Use Case               embedding_size   sequence_length   epochs   threshold_multiplier
HDFS (complex)         128              20                15       2.5
Microservices (fast)   32               5                 8        2.0
Database (precise)     64               15                20       3.0
Kubernetes (noisy)     48               8                 12       2.2
Web App (simple)       32               5                 8        2.0

Pattern: Getting Started with Your System

  1. Characterize your logs: How many messages per day? How complex are the messages?
  2. Collect baseline: 2+ weeks of clean operations
  3. Start simple: Use default config, run one pipeline
  4. Evaluate: Check AUC and other metrics (see the sketch after this list)
  5. Tune: Adjust parameters based on results
  6. Iterate: Retrain as you learn more
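
For step 4, a minimal evaluation sketch, assuming you have per-window anomaly scores and ground-truth incident labels for a hold-out period (the values below are illustrative):

# evaluate_demo.py - AUC over labeled hold-out windows (illustrative data)
from sklearn.metrics import roc_auc_score

labels = [0, 0, 0, 1, 0, 1]              # 1 = window overlaps a known incident
scores = [0.2, 0.3, 0.1, 3.5, 0.4, 2.9]  # detector's per-window scores

print(f"AUC: {roc_auc_score(labels, scores):.3f}")  # 1.000: perfect ranking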