Examples and Real-World Use Cases
Example 1: HDFS Cluster Monitoring (Reference Implementation)
Scenario
A large data center runs a Hadoop HDFS cluster with 100+ nodes. They want to detect failures and anomalies early.
Implementation
# Step 1: Collect 2 weeks of clean HDFS logs
ssh hadoop-master "tar -czf hdfs_logs.tar.gz /var/log/hdfs/"
scp hadoop-master:hdfs_logs.tar.gz .
# Step 2: Extract to data directory
mkdir -p /data/hdfs_logs
tar -xzf hdfs_logs.tar.gz -C /data/hdfs_logs/
# Step 3: Prepare data
python src/tx/prepare.py \
    --config dockerconfig/text_autoencoder_prepare_data.yml \
    --data-dir /data/hdfs_logs
# Step 4: Train models
bash dockerrun/run_text_autoencoder_train.sh
bash dockerrun/run_text_autoencoder_encode_dataset.sh
bash dockerrun/run_anomaly_detector_train.sh
# Step 5: Deploy live monitoring
nohup bash dockerrun/run_live_monitoring.sh > /var/log/deepsentry.log 2>&1 &
Results
- Baseline: 0.2 anomaly score for normal operations
- Detected failures: 3.5+ anomaly scores, 95% accuracy
- False positive rate: 2-3% under normal conditions
- Mean detection latency: 30 seconds after anomaly starts
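A minimal sketch of how these scores could drive alerting. It assumes the live monitor appends one "timestamp score" pair per line to a scores file; both that format and the file path are hypothetical placeholders, so adapt them to your actual monitoring output:
# alert_on_scores.py -- hedged sketch, not part of the project.
# Assumes one "timestamp score" pair per line (hypothetical output format).
ALERT_CUTOFF = 1.0  # well above the ~0.2 baseline, well below the 3.5+ failure scores

def check_scores(path="/data/hdfs_logs/anomaly_scores.txt"):
    alerts = []
    with open(path) as fh:
        for line in fh:
            parts = line.split()
            if len(parts) != 2:
                continue  # skip lines that don't match the assumed format
            timestamp, score = parts
            if float(score) >= ALERT_CUTOFF:
                alerts.append((timestamp, float(score)))
    return alerts

if __name__ == "__main__":
    for ts, score in check_scores():
        print(f"ALERT {ts}: anomaly score {score:.2f}")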
Example 2: Microservices API Monitoring
Scenario
A SaaS platform has 20+ microservices writing to centralized logs. They want to detect service degradation and unusual request patterns.
Custom Setup
# Logs are in JSON format from API gateway
# {"timestamp":"2022-01-15T14:23:45Z","service":"user-api","level":"INFO","message":"Request received"}
# Custom log parser
# src/microservices_parser.py
import json
from datetime import datetime
class MicroservicesLogParser:
    def parse_message(self, line):
        data = json.loads(line)
        # fromisoformat() rejects a trailing "Z" on Python < 3.11, so normalize it first
        ts = datetime.fromisoformat(data['timestamp'].replace('Z', '+00:00'))
        timestamp = ts.strftime("%y%m%d %H%M%S")
        # Combine service and message for richer context
        message = f"{data['service']}: {data['message']}"
        return timestamp, message
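For a quick sanity check, the parser can be exercised on the sample gateway line shown in the comment above (sketch only):
# Sanity-check the parser on the sample gateway line.
parser = MicroservicesLogParser()
sample = '{"timestamp":"2022-01-15T14:23:45Z","service":"user-api","level":"INFO","message":"Request received"}'
timestamp, message = parser.parse_message(sample)
print(timestamp)  # 220115 142345
print(message)    # user-api: Request received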
# Training configuration (tuned for API logs)
# dockerconfig/text_autoencoder_train.yml
parameters:
  embedding_size: 32        # API logs are consistent
  encoder_output_size: 64
  epochs: 5                 # Less training data needed
  batch_size: 64
  bidirectional: false      # Logs are mostly sequential
# Anomaly detection config (APIs are fast)
parameters:
  sequence_length: 5        # Shorter sequences
  lstm_hidden_size: 32
  epochs: 8
  bidirectional: true       # Need context both ways
Results
- Detects service restarts and crashes within 10 seconds
- Catches unusual API request patterns (e.g., spike in 404s)
- Alerts on database connection failures
- Distinguishes between normal traffic spikes and actual anomalies
Example 3: Database Server Monitoring
Scenario
A PostgreSQL database server generates detailed query and transaction logs. The team needs to catch performance degradation and errors.
Log Extraction
# PostgreSQL logs go to syslog
# Prepend a timestamp in the expected YYMMDD HHMMSS format at ingestion time
awk '{print strftime("%y%m%d %H%M%S") " " $0}' /var/log/postgresql.log > /data/postgres.log
# Or directly from postgres logs with log_line_prefix configured
# log_line_prefix = '[%t] %u@%d %p: '
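If log_line_prefix is set as above, the %t timestamp can be rewritten into the expected YYMMDD HHMMSS prefix with a short Python pass. This is a hedged sketch: the regex assumes exactly the prefix shown, multi-line continuation lines are skipped, and the file paths are the ones used earlier in this example.
# convert_pg_logs.py -- sketch assuming log_line_prefix = '[%t] %u@%d %p: '
import re
from datetime import datetime

# %t renders timestamps like "2022-01-15 14:23:45 UTC"
LINE_RE = re.compile(r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \w+\] (?P<rest>.*)$")

def convert(in_path="/var/log/postgresql.log", out_path="/data/postgres.log"):
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            m = LINE_RE.match(line)
            if not m:
                continue  # continuation lines (multi-line queries) are skipped in this sketch
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
            dst.write(ts.strftime("%y%m%d %H%M%S") + " " + m.group("rest") + "\n")

if __name__ == "__main__":
    convert()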
Tuning for Databases
# Databases have very structured logs
# Tune for high accuracy
parameters:
  embedding_size: 64
  encoder_output_size: 128  # Richer representation
  epochs: 15                # More training for precision
  batch_size: 32
  bidirectional: true       # Transaction context matters
  dropout: 0.3              # Prevent overfitting
# Anomaly detection
parameters:
  sequence_length: 15       # Longer sequences for transaction context
  lstm_hidden_size: 128     # Larger capacity
  epochs: 20                # More training
  threshold_multiplier: 3.0 # Be conservative with alerts
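As a rough illustration of what threshold_multiplier means, assuming the detector derives its alert threshold as mean plus multiplier times standard deviation of the scores seen on clean data (check the detector code for the exact rule; the scores below are made-up placeholders):
import numpy as np

# Hypothetical anomaly scores observed on clean validation data.
clean_scores = np.array([0.18, 0.22, 0.20, 0.25, 0.19, 0.21])

threshold_multiplier = 3.0  # conservative: fewer, higher-confidence alerts
threshold = clean_scores.mean() + threshold_multiplier * clean_scores.std()
print(f"alert threshold = {threshold:.3f}")  # raising the multiplier raises the bar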
Detected Anomalies
- Queries taking longer than normal
- Unusual connection patterns
- Transaction deadlocks
- Index corruption or missing indexes
- Disk space issues
Example 4: Kubernetes Cluster Monitoring
Scenario
A Kubernetes cluster runs 50+ nodes and 500+ pods. The team wants to detect pod crashes, network issues, and resource exhaustion.
Log Aggregation
# Collect logs from kubelet
cat /var/log/kubelet.log | awk '{print strftime("%y%m%d %H%M%S") " " $0}' > k8s.log
# Or from container runtime (Docker)
journalctl -u docker --no-pager | \
awk '{print strftime("%y%m%d %H%M%S") " " $0}' > container.log
# Combine multiple sources
cat k8s.log container.log | sort -k1,2 > combined.log
K8s-Specific Configuration
# Kubernetes logs are verbose with lots of repetition
# Tune to filter noise
# Data preparation
parameters:
  root_dir: {ROOT_DATA_DIR}
  # Pre-filter unimportant logs
  min_word_count: 3         # Ignore very rare events
  max_vocab_size: 5000      # K8s has fewer unique patterns than you'd think

# Training
parameters:
  epochs: 12
  embedding_size: 48        # K8s logs are somewhat uniform

# Anomaly detection focuses on rare patterns
parameters:
  sequence_length: 8
  lstm_hidden_size: 64
  epochs: 15
  threshold_multiplier: 2.2 # K8s noise is high, be permissive
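To make the effect of min_word_count and max_vocab_size concrete, the pre-filtering amounts to roughly the following (an illustrative sketch, not the project's actual preprocessing code; the sample messages are invented):
from collections import Counter

def build_vocab(messages, min_word_count=3, max_vocab_size=5000):
    # Count token frequencies across all log messages.
    counts = Counter(tok for msg in messages for tok in msg.split())
    # Drop tokens rarer than min_word_count, then keep the most frequent ones.
    frequent = [(tok, n) for tok, n in counts.most_common(max_vocab_size) if n >= min_word_count]
    return {tok: idx for idx, (tok, _) in enumerate(frequent)}

vocab = build_vocab(["pod nginx-1 restarted", "pod nginx-2 restarted", "pod nginx-1 OOMKilled"])
print(vocab)  # rare tokens (seen fewer than 3 times) are excluded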
Detected Issues
- Pod CrashLoopBackOff patterns
- Node NotReady and MemoryPressure conditions
- Network plugin failures
- Scheduler backoff and binding failures
- Volume mount and storage errors
Example 5: Web Application Log Monitoring
Scenario
A Django/Flask web application generates logs from multiple processes. The team wants to detect:
- Internal server errors (500s)
- Authentication failures
- Database query timeouts
- Unusual request volumes
Application Instrumentation
# Flask example with structured logging
import logging
import json
from datetime import datetime
class LogFormatter(logging.Formatter):
    def format(self, record):
        timestamp = datetime.utcnow().strftime("%y%m%d %H%M%S")
        message = record.getMessage()
        return f"{timestamp} {message}"

# Configure app logging
handler = logging.FileHandler('app.log')
handler.setFormatter(LogFormatter())
logger = logging.getLogger()
logger.setLevel(logging.INFO)  # the root logger defaults to WARNING, which would drop info() calls
logger.addHandler(handler)
# Log events with consistent format
logger.info("POST /api/users - 201 Created")
logger.error("Database connection timeout: 30s")
logger.warning("Authentication failed: IP 192.168.1.100")
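Before pointing the pipeline at app.log, it can be worth verifying that every line matches the expected YYMMDD HHMMSS message shape. This is a small standalone check, independent of the project code:
import re

# Expected shape: six-digit date, six-digit time, then the message text.
LINE_RE = re.compile(r"^\d{6} \d{6} .+")

with open("app.log") as fh:
    bad = [i for i, line in enumerate(fh, 1) if not LINE_RE.match(line)]

print(f"{len(bad)} malformed lines" if bad else "all lines match YYMMDD HHMMSS <message>")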
Baseline Configuration
# Web app logs are simple and consistent
parameters:
  embedding_size: 32        # Applications speak uniformly
  encoder_output_size: 48
  epochs: 8
  batch_size: 64

# Anomaly detection
parameters:
  sequence_length: 5        # HTTP requests are quick
  lstm_hidden_size: 32
  threshold_multiplier: 2.0 # Web patterns are distinctive
Example 6: Time-Series Sensor Data (Non-Traditional)
Scenario
An industrial monitoring system produces sensor logs. While these are not traditional text logs, the same approach works:
# Sensor logs look like:
220115 14:23:45 SENSOR_001: temperature=42.3°C status=OK
220115 14:23:46 SENSOR_002: pressure=101.2kPa status=OK
220115 14:23:47 SENSOR_001: temperature=42.5°C status=WARN
# They can be processed like text
# The text encoder learns that:
# - temperatures in 40-45°C range are normal
# - status=WARN is less common than OK
# - pressure values cluster around 101kPa
# The anomaly detector learns:
# - SENSOR_001 usually sees gradual temperature changes
# - Rapid jumps (e.g., 42.3 → 48.1) are anomalous
# - Status should not jump from OK to ERROR without WARN
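If the readings originate as structured records rather than text, they can be rendered into the same log shape before training. The sketch below assumes each record carries a Unix timestamp, a sensor id, a dict of readings, and a status; all names are illustrative:
from datetime import datetime, timezone

def record_to_log_line(ts, sensor_id, readings, status):
    # Render a structured sensor record as a text log line the encoder can consume.
    stamp = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%y%m%d %H:%M:%S")
    fields = " ".join(f"{k}={v}" for k, v in readings.items())
    return f"{stamp} {sensor_id}: {fields} status={status}"

print(record_to_log_line(1642256625, "SENSOR_001", {"temperature": "42.3°C"}, "OK"))
# 220115 14:23:45 SENSOR_001: temperature=42.3°C status=OK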
Comparison: Configuration Across Use Cases
| Use Case | embedding_size | sequence_length | epochs | threshold_multiplier |
|---|---|---|---|---|
| HDFS (complex) | 128 | 20 | 15 | 2.5 |
| Microservices (fast) | 32 | 5 | 8 | 2.0 |
| Database (precise) | 64 | 15 | 20 | 3.0 |
| Kubernetes (noisy) | 48 | 8 | 12 | 2.2 |
| Web App (simple) | 32 | 5 | 8 | 2.0 |
Pattern: Getting Started with Your System
- Characterize your logs: How many messages per day? What's message complexity?
- Collect baseline: 2+ weeks of clean operations
- Start simple: Use default config, run one pipeline
- Evaluate: Check AUC and other metrics (see the evaluation sketch after this list)
- Tune: Adjust parameters based on results
- Iterate: Retrain as you learn more
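For the evaluation step above, ROC AUC can be computed with scikit-learn once anomaly scores exist for a set of log windows with known labels. A generic sketch with placeholder labels and scores:
from sklearn.metrics import roc_auc_score

# 1 = known anomalous window, 0 = normal window (placeholder labels and scores).
labels = [0, 0, 0, 1, 0, 1]
scores = [0.18, 0.22, 0.25, 3.60, 0.21, 4.10]

print(f"ROC AUC: {roc_auc_score(labels, scores):.3f}")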