DeepSentry Book

DeepSentry: Unsupervised Anomaly Detection

Welcome to the complete DeepSentry documentation. This manual covers everything from installation to production deployment. Whether you're a DevOps engineer, data scientist, or security professional, you'll find practical guidance for deploying anomaly detection on your logs.

What You'll Learn

How anomaly detection works without labeled training data
The complete pipeline: from raw logs to anomaly scores
How to configure the system for your specific logs
How to deploy and monitor in production
Troubleshooting common issues
Advanced techniques for your specific use case

How to Use This Manual

New to DeepSentry? Start with Chapter 1 (Introduction) and work through sequentially. Each chapter builds on the previous one.

Setting up for the first time? Follow Chapters 2-4 to install DeepSentry, prepare your data, and understand the pipeline.

Running on your logs? Focus on Chapters 3, 6, and 11 for data preparation, configuration, and real examples.

Deploying to production? Read Chapter 7 (Live Monitoring) and Chapter 12 (Production Best Practices).

Debugging an issue? Jump to Chapter 9 (Troubleshooting) or use the search function.

What This Project Includes

Text Autoencoder: Compresses log messages into semantic vectors
Anomaly Detector: Bidirectional LSTM for detecting deviations from normal patterns
Live Monitor: Real-time scoring with adaptive thresholding
Evaluation Tools: Metrics, ROC curves, and analysis
Docker Deployment: Complete containerized pipeline
Comprehensive Docs: 12-chapter manual with examples and best practices
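
To give a feel for what "adaptive thresholding" in the Live Monitor can mean, here is a minimal sketch of one common approach: flag a score when it exceeds the rolling mean of recent scores by k standard deviations. The class name, window size, multiplier, and warm-up length are hypothetical illustrations, not DeepSentry's actual API or defaults; Chapter 7 describes the real behavior.

```python
from collections import deque
from statistics import mean, pstdev


class RollingThreshold:
    """Hypothetical adaptive threshold: mean + k * std of recent scores."""

    def __init__(self, window=100, k=3.0, warmup=10):
        self.scores = deque(maxlen=window)  # keeps only the most recent scores
        self.k = k
        self.warmup = warmup

    def is_anomaly(self, score):
        # Decide using the history seen *before* this score arrives.
        if len(self.scores) >= self.warmup:
            threshold = mean(self.scores) + self.k * pstdev(self.scores)
            flagged = score > threshold
        else:
            flagged = False  # not enough history to judge yet
        self.scores.append(score)
        return flagged
```

In this sketch every score, flagged or not, joins the history, so a sustained shift in log behavior gradually raises the threshold. Whether anomalous scores should be excluded from the window is a design choice that depends on your logs.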

Chapter Overview

Chapter 1: Introduction – Why unsupervised anomaly detection? How does it work? What will you learn?

Chapter 2: Installation – Get set up with Docker, local Python, or GPU support in a few minutes.

Chapter 3: Data and Logs – Understand log formats, prepare your data, and meet the HDFS dataset.

Chapter 4: The Pipeline – Walk through all 6 stages from raw logs to live anomalies with detailed diagrams.

Chapter 5: Model Architecture – Deep dive into LSTMs, autoencoders, and why this architecture works.

Chapter 6: Configuration – Tune every parameter; understand impact; optimize for your logs.

Chapter 7: Live Monitoring – Deploy real-time detection; integrate with alerting; handle production scenarios.

Chapter 8: API Reference – CLI commands, Python APIs, file formats, project structure.

Chapter 9: Troubleshooting – Common errors and their solutions; debugging strategies.

Chapter 10: Advanced Topics – Transfer learning, custom formats, optimization, ensemble methods.

Chapter 11: Examples – Real-world use cases: HDFS, microservices, databases, Kubernetes, web apps.

Chapter 12: Production Best Practices – Deployment architecture, security, monitoring, incident response, cost optimization.

Key Concepts at a Glance

Concept	Explanation	Chapter
Unsupervised Learning	Learning from unlabeled data; discovers patterns without explicit annotation	1, 5
Text Autoencoder	Neural network that compresses log messages to fixed-size vectors	5
LSTM	Long short-term memory: a recurrent neural network well suited to learning sequences and temporal patterns	5
Reconstruction Error	How far the model's prediction for each step is from what actually occurred; a high error suggests an anomaly	5
Pipeline	6-stage process: prepare → text train → encode → anomaly train → eval → live	4
Configuration	YAML files that control every aspect of training and monitoring	6
Live Monitoring	Real-time scoring of incoming logs with adaptive anomaly thresholds	7
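
As an illustration of the "Reconstruction Error" concept in the table above, the sketch below scores each step of a sequence by the mean squared error between the vector the model predicted and the vector actually observed. Using mean squared error is an assumption made for illustration, and the function names are hypothetical; Chapter 5 describes how DeepSentry actually computes its scores.

```python
def reconstruction_error(actual, predicted):
    """Mean squared error between two equal-length vectors."""
    if len(actual) != len(predicted):
        raise ValueError("vectors must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def score_sequence(actual_steps, predicted_steps):
    """Per-step errors for a sequence of embedded log lines.

    Each element of the two arguments is one step's vector; a larger
    value in the result means the model was more surprised by that step.
    """
    return [
        reconstruction_error(a, p)
        for a, p in zip(actual_steps, predicted_steps)
    ]
```

A step the model predicts exactly scores 0.0, and the score grows with the squared distance between prediction and observation, which is why unusually high values are treated as anomaly candidates.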

System Requirements

CPU: 2+ cores (4+ recommended for faster training)
Memory: 4GB minimum (8GB+ recommended)
Storage: 10GB for models and logs
Python: 3.6+ (3.8+ recommended)
GPU (optional): NVIDIA GPU + CUDA 11.0+ for 5-10x faster training
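
If you want to verify a machine against these requirements before installing, a small preflight script along the following lines may help. It is a sketch using only the Python standard library; the function name and its parameters are hypothetical and not part of DeepSentry. Memory and GPU checks are omitted because the standard library has no portable way to query them.

```python
import os
import shutil
import sys


def check_requirements(path=".", min_python=(3, 6), min_cores=2, min_free_gb=10):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []

    # Python version, compared as a tuple such as (3, 6)
    if sys.version_info < min_python:
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted}+ required, found {sys.version.split()[0]}")

    # os.cpu_count() can return None when the count is undetermined
    cores = os.cpu_count() or 0
    if cores < min_cores:
        problems.append(f"{min_cores}+ CPU cores required, found {cores}")

    # Free disk space on the filesystem that contains `path`
    free_gb = shutil.disk_usage(path).free / 1024 ** 3
    if free_gb < min_free_gb:
        problems.append(f"{min_free_gb} GB free storage required, found {free_gb:.1f} GB")

    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print("WARNING:", problem)
```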

Getting Started Right Now?

Read Chapter 1 (15 minutes) to understand the approach
Follow Chapter 2 (10 minutes) to install
Skim Chapter 3 (10 minutes) to understand data
Run Chapter 4's pipeline (1-2 hours) on sample HDFS logs

That's it! Allowing roughly two to three hours in total, you'll have a working anomaly detector running on the sample HDFS logs.

About This Manual

This documentation is written in the style of the Rust Programming Language book: comprehensive, example-rich, and designed to be read sequentially while remaining useful as a reference.

The manual is organized in three tiers:

Getting Started (Chapters 1-3): High-level concepts and setup
Core Usage (Chapters 4-7): Running the system and understanding how it works
Mastery (Chapters 8-12): Reference material, advanced techniques, and production deployment

Each chapter includes:

Clear explanations with real examples
ASCII diagrams showing how components work
Code snippets you can copy and run
Common pitfalls and how to avoid them
Links to related chapters

Next Steps

Ready to dive in? Start with Chapter 1: Introduction.

Have a specific goal? Jump to the relevant chapter:

Install DeepSentry → Chapter 2
Prepare your logs → Chapter 3
Run the pipeline → Chapter 4
Understand the models → Chapter 5
Configure for your logs → Chapter 6
Deploy to production → Chapter 7 + Chapter 12
Fix an issue → Chapter 9
See real examples → Chapter 11