DeepSentry: Unsupervised Anomaly Detection
Welcome to the complete DeepSentry documentation. This manual covers everything from installation to production deployment. Whether you're a DevOps engineer, data scientist, or security professional, you'll find practical guidance for deploying anomaly detection on your logs.
What You'll Learn
- How anomaly detection works without labeled training data
- The complete pipeline: from raw logs to anomaly scores
- How to configure the system for your specific logs
- How to deploy and monitor in production
- Troubleshooting common issues
- Advanced techniques for your specific use case
How to Use This Manual
New to DeepSentry? Start with Chapter 1 (Introduction) and work through sequentially. Each chapter builds on the previous one.
Setting up for the first time? Follow Chapters 2-4 for installation and understanding the pipeline.
Running on your logs? Focus on Chapters 3, 6, and 11 for data preparation, configuration, and real examples.
Deploying to production? Read Chapter 7 (Live Monitoring) and Chapter 12 (Production Best Practices).
Debugging an issue? Jump to Chapter 9 (Troubleshooting) or use the search function.
What This Project Includes
- Text Autoencoder: Compresses log messages into semantic vectors
- Anomaly Detector: Bidirectional LSTM for detecting deviations from normal patterns
- Live Monitor: Real-time scoring with adaptive thresholding
- Evaluation Tools: Metrics, ROC curves, and analysis
- Docker Deployment: Complete containerized pipeline
- Comprehensive Docs: 12-chapter manual with examples and best practices
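The Anomaly Detector and Live Monitor bullets can be made concrete with a minimal sketch of per-step error scoring under a rolling adaptive threshold. It assumes each log step is already a fixed-size vector and that the detector supplies a predicted vector for every step. `step_error`, `AdaptiveThreshold`, and the "mean plus k standard deviations" rule are illustrative choices, not DeepSentry's actual API or thresholding method.

```python
"""Hypothetical sketch: per-step error scoring with an adaptive threshold."""
from collections import deque
from statistics import mean, pstdev
from typing import Deque, Sequence


def step_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean squared error between the predicted and the actual step vector."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("vectors must be non-empty and the same length")
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


class AdaptiveThreshold:
    """Flag a score that exceeds mean + k * std of recent normal scores."""

    def __init__(self, window: int = 100, k: float = 3.0, warmup: int = 10) -> None:
        self.history: Deque[float] = deque(maxlen=window)  # recent normal scores
        self.k = k
        self.warmup = warmup  # scores to collect before flagging anything

    def update(self, score: float) -> bool:
        """Return True if the score is anomalous, then update the baseline."""
        is_anomaly = False
        if len(self.history) >= self.warmup:
            threshold = mean(self.history) + self.k * pstdev(self.history)
            is_anomaly = score > threshold
        if not is_anomaly:
            # Only normal scores feed the baseline, so a burst of
            # anomalies cannot drag the threshold upward.
            self.history.append(score)
        return is_anomaly


if __name__ == "__main__":
    monitor = AdaptiveThreshold(window=50, k=3.0, warmup=5)
    normal = [0.10, 0.12, 0.11, 0.09, 0.10, 0.11]
    print([monitor.update(s) for s in normal])  # [False, False, False, False, False, False]
    print(monitor.update(0.90))                 # True
```

Keeping flagged scores out of the rolling window stops an anomaly burst from inflating the threshold. The trade-off is that a permanent upward shift in normal behavior keeps being flagged until the window is reset.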
Chapter Overview
Chapter 1: Introduction – Why unsupervised anomaly detection? How does it work? What will you learn?
Chapter 2: Installation – Get set up with Docker, local Python, or GPU support in 5 minutes.
Chapter 3: Data and Logs – Understand log formats, prepare your data, and meet the HDFS dataset.
Chapter 4: The Pipeline – Walk through all 6 stages from raw logs to live anomalies with detailed diagrams.
Chapter 5: Model Architecture – Deep dive into LSTMs, autoencoders, and why this architecture works.
Chapter 6: Configuration – Tune every parameter; understand its impact; optimize for your logs.
Chapter 7: Live Monitoring – Deploy real-time detection; integrate with alerting; handle production scenarios.
Chapter 8: API Reference – CLI commands, Python APIs, file formats, project structure.
Chapter 9: Troubleshooting – Common errors and their solutions; debugging strategies.
Chapter 10: Advanced Topics – Transfer learning, custom formats, optimization, ensemble methods.
Chapter 11: Examples – Real-world use cases: HDFS, microservices, databases, Kubernetes, web apps.
Chapter 12: Production Best Practices – Deployment architecture, security, monitoring, incident response, cost optimization.
Key Concepts at a Glance
| Concept | Explanation | Chapter |
|---|---|---|
| Unsupervised Learning | Learning from unlabeled data; discovers patterns without explicit annotation | 1, 5 |
| Text Autoencoder | Neural network that compresses log messages to fixed-size vectors | 5 |
| LSTM | Long short-term memory network: a recurrent neural network that learns sequences and temporal patterns | 5 |
| Reconstruction Error | How poorly the model predicts or reconstructs each step; high error signals a likely anomaly | 5 |
| Pipeline | 6-stage process: prepare → text train → encode → anomaly train → eval → live | 4 |
| Configuration | YAML files that control every aspect of training and monitoring | 6 |
| Live Monitoring | Real-time scoring of incoming logs with adaptive anomaly thresholds | 7 |
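The Pipeline row reads as a chain in which each stage consumes the output of the one before it. Under that assumption, changing any stage means re-running it and everything downstream, which this small sketch captures. The stage identifiers and `stages_to_rerun` are hypothetical and do not correspond to DeepSentry's CLI.

```python
"""Hypothetical sketch of the six-stage pipeline order."""
from typing import List

# Stage order from the Key Concepts table. Each stage is assumed to
# consume the artifacts produced by the stage before it.
PIPELINE: List[str] = [
    "prepare",        # raw logs -> training data (Chapter 3)
    "text_train",     # train the text autoencoder on log messages
    "encode",         # convert each log message into a fixed-size vector
    "anomaly_train",  # train the LSTM anomaly detector on those vectors
    "eval",           # metrics and ROC curves
    "live",           # real-time scoring of incoming logs
]


def stages_to_rerun(changed: str) -> List[str]:
    """Return the changed stage followed by every downstream stage, in order."""
    if changed not in PIPELINE:
        raise ValueError(f"unknown stage: {changed!r}")
    return PIPELINE[PIPELINE.index(changed):]


if __name__ == "__main__":
    print(stages_to_rerun("encode"))  # ['encode', 'anomaly_train', 'eval', 'live']
```

For example, retraining the text autoencoder changes every vector it produces, so `stages_to_rerun("text_train")` returns every stage except `prepare`.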
System Requirements
- CPU: 2+ cores (4+ recommended for faster training)
- Memory: 4GB minimum (8GB+ recommended)
- Storage: 10GB for models and logs
- Python: 3.6+ (3.8+ recommended)
- GPU (optional): NVIDIA GPU + CUDA 11.0+ for 5-10x faster training
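Before installing, you can check a machine against the minimums above with a short script. `check_requirements` is a hypothetical helper, not part of DeepSentry. Sizes are in decimal gigabytes, the memory check relies on POSIX `os.sysconf` and reports `None` where that is unavailable (for example on Windows), and the GPU check is omitted because it needs vendor tools.

```python
"""Hypothetical pre-install check against the System Requirements minimums."""
import os
import shutil
import sys
from typing import Dict, Optional, Tuple

GB = 1000 ** 3  # decimal gigabytes


def total_memory_gb() -> Optional[float]:
    """Total physical RAM via POSIX sysconf, or None where it is unavailable."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / GB
    except (AttributeError, ValueError, OSError):
        return None


def check_requirements(
    path: str = ".",
    min_python: Tuple[int, int] = (3, 6),
    min_cores: int = 2,
    min_memory_gb: float = 4.0,
    min_disk_gb: float = 10.0,
) -> Dict[str, Optional[bool]]:
    """Map each requirement to True/False, or None if it could not be measured."""
    cores = os.cpu_count() or 0  # cpu_count() can return None
    memory = total_memory_gb()
    free_disk = shutil.disk_usage(path).free / GB
    return {
        "python": sys.version_info[:2] >= min_python,
        "cpu_cores": cores >= min_cores,
        "memory": None if memory is None else memory >= min_memory_gb,
        "disk": free_disk >= min_disk_gb,
    }


if __name__ == "__main__":
    labels = {True: "OK", False: "BELOW MINIMUM", None: "UNKNOWN"}
    for name, ok in check_requirements().items():
        print(f"{name:<10} {labels[ok]}")
```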
Quick Start
- Read Chapter 1 (15 minutes) to understand the approach
- Follow Chapter 2 (10 minutes) to install
- Skim Chapter 3 (10 minutes) to understand data
- Run Chapter 4's pipeline (1-2 hours) on sample HDFS logs
About This Manual
This documentation is modeled on The Rust Programming Language book: comprehensive, example-rich, and designed to be read sequentially while remaining useful as a reference.
The manual is organized in three tiers:
- Getting Started (Chapters 1-3): High-level concepts and setup
- Core Usage (Chapters 4-7): Running the system and understanding how it works
- Mastery (Chapters 8-12): Reference material, advanced techniques, and production deployment
Each chapter includes:
- Clear explanations with real examples
- ASCII diagrams showing how components work
- Code snippets you can copy and run
- Common pitfalls and how to avoid them
- Links to related chapters
Next Steps
Ready to dive in? Start with Chapter 1: Introduction.
Have a specific goal? Jump to the relevant chapter:
- Install DeepSentry → Chapter 2
- Prepare your logs → Chapter 3
- Run the pipeline → Chapter 4
- Understand the models → Chapter 5
- Configure for your logs → Chapter 6
- Deploy to production → Chapter 7 + Chapter 12
- Fix an issue → Chapter 9
- See real examples → Chapter 11