DeepSentry: Unsupervised Anomaly Detection

Welcome to the complete DeepSentry documentation. This manual covers everything from installation to production deployment. Whether you're a DevOps engineer, data scientist, or security professional, you'll find practical guidance for deploying anomaly detection on your logs.

What You'll Learn

  • How anomaly detection works without labeled training data
  • The complete pipeline: from raw logs to anomaly scores
  • How to configure the system for your specific logs
  • How to deploy and monitor in production
  • Troubleshooting common issues
  • Advanced techniques for your specific use case

How to Use This Manual

New to DeepSentry? Start with Chapter 1 (Introduction) and work through sequentially. Each chapter builds on the previous one.

Setting up for the first time? Follow Chapters 2-4 to install DeepSentry and understand the pipeline.

Running on your logs? Focus on Chapters 3, 6, and 11 for data preparation, configuration, and real examples.

Deploying to production? Read Chapter 7 (Live Monitoring) and Chapter 12 (Production Best Practices).

Debugging an issue? Jump to Chapter 9 (Troubleshooting) or use the search function.

What This Project Includes

  • Text Autoencoder: Compresses log messages into semantic vectors
  • Anomaly Detector: Bidirectional LSTM for detecting deviations from normal patterns
  • Live Monitor: Real-time scoring with adaptive thresholding
  • Evaluation Tools: Metrics, ROC curves, and analysis
  • Docker Deployment: Complete containerized pipeline
  • Comprehensive Docs: 12-chapter manual with examples and best practices

Chapter Overview

Chapter 1: Introduction – Why unsupervised anomaly detection? How does it work? What will you learn?

Chapter 2: Installation – Get set up with Docker, local Python, or GPU support in minutes.

Chapter 3: Data and Logs – Understand log formats, prepare your data, and meet the HDFS dataset.

Chapter 4: The Pipeline – Walk through all 6 stages from raw logs to live anomalies with detailed diagrams.

Chapter 5: Model Architecture – Deep dive into LSTMs, autoencoders, and why this architecture works.

Chapter 6: Configuration – Tune every parameter, understand its impact, and optimize for your logs.

Chapter 7: Live Monitoring – Deploy real-time detection; integrate with alerting; handle production scenarios.

Chapter 8: API Reference – CLI commands, Python APIs, file formats, project structure.

Chapter 9: Troubleshooting – Common errors and their solutions; debugging strategies.

Chapter 10: Advanced Topics – Transfer learning, custom formats, optimization, ensemble methods.

Chapter 11: Examples – Real-world use cases: HDFS, microservices, databases, Kubernetes, web apps.

Chapter 12: Production Best Practices – Deployment architecture, security, monitoring, incident response, cost optimization.

Key Concepts at a Glance

Concept | Explanation | Chapter
Unsupervised Learning | Learning from unlabeled data; discovers patterns without explicit annotation | 1, 5
Text Autoencoder | Neural network that compresses log messages to fixed-size vectors | 5
LSTM | Recurrent neural network good at learning sequences and temporal patterns | 5
Reconstruction Error | How far the model's prediction is from the actual step; high error = anomaly | 5
Pipeline | 6-stage process: prepare → text train → encode → anomaly train → eval → live | 4
Configuration | YAML files that control every aspect of training and monitoring | 6
Live Monitoring | Real-time scoring of incoming logs with adaptive anomaly thresholds | 7
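
To make the "Reconstruction Error" and "Live Monitoring" rows concrete, here is a minimal sketch of error-based scoring with an adaptive threshold. It is an illustration only: the `reconstruction_error` function, the `AdaptiveThreshold` class, and the mean-plus-k-standard-deviations rule are assumptions made for this example, not DeepSentry's actual API or thresholding method (Chapters 5 and 7 describe those).

```python
import math
from collections import deque


def reconstruction_error(actual, predicted):
    """Mean squared error between an observed vector and the model's prediction."""
    if len(actual) != len(predicted):
        raise ValueError("vectors must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


class AdaptiveThreshold:
    """Hypothetical helper: flags a score above mean + k * std of recent scores."""

    def __init__(self, window=100, k=3.0, warmup=10):
        self.scores = deque(maxlen=window)  # rolling baseline of normal scores
        self.k = k
        self.warmup = warmup  # scores to collect before flagging anything

    def is_anomaly(self, score):
        flagged = False
        if len(self.scores) >= self.warmup:
            mean = sum(self.scores) / len(self.scores)
            var = sum((s - mean) ** 2 for s in self.scores) / len(self.scores)
            flagged = score > mean + self.k * math.sqrt(var)
        # Only unflagged scores update the baseline, so a single spike
        # does not raise the threshold for the events that follow it.
        if not flagged:
            self.scores.append(score)
        return flagged


# Usage: made-up error values; only the 2.50 spike is flagged.
detector = AdaptiveThreshold(window=50, k=3.0, warmup=5)
errors = [0.10, 0.12, 0.09, 0.11, 0.10, 0.11, 2.50, 0.10]
flags = [detector.is_anomaly(e) for e in errors]
```

Keeping flagged scores out of the rolling window is one possible design choice; it stops a burst of anomalies from inflating the baseline, at the cost of adapting more slowly when normal behavior genuinely shifts.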

System Requirements

  • CPU: 2+ cores (4+ recommended for faster training)
  • Memory: 4GB minimum (8GB+ recommended)
  • Storage: 10GB for models and logs
  • Python: 3.6+ (3.8+ recommended)
  • GPU (optional): NVIDIA GPU + CUDA 11.0+ for 5-10x faster training
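
As a rough pre-flight check against the list above, the following sketch uses only the Python standard library. The `check_requirements` function is a hypothetical helper written for this example and is not part of DeepSentry. It covers the Python version, CPU core count, and free disk space; memory and GPU checks are left out because the standard library has no portable way to query them.

```python
import os
import shutil
import sys


def check_requirements(path="."):
    """Return a list of warnings; an empty list means every check passed."""
    warnings = []

    # Python: 3.6+ required
    if sys.version_info < (3, 6):
        warnings.append("Python 3.6+ is required")

    # CPU: 2+ cores required (os.cpu_count() may return None)
    cores = os.cpu_count() or 0
    if cores < 2:
        warnings.append("2+ CPU cores required, found {}".format(cores))

    # Storage: 10GB for models and logs, measured on the given path
    free_gb = shutil.disk_usage(path).free / 1024 ** 3
    if free_gb < 10:
        warnings.append("10GB storage required, {:.1f}GB free".format(free_gb))

    return warnings


if __name__ == "__main__":
    for message in check_requirements():
        print("WARNING:", message)
```

Point `path` at the directory where you plan to store models and logs, since free space can differ between volumes.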

Getting Started Right Now?

  1. Read Chapter 1 (15 minutes) to understand the approach
  2. Follow Chapter 2 (10 minutes) to install
  3. Skim Chapter 3 (10 minutes) to understand data
  4. Run Chapter 4's pipeline (1-2 hours) on sample HDFS logs

That's it! Counting the reading and the pipeline run, expect a working anomaly detector in about two to three hours.

About This Manual

This documentation is written in the style of the Rust Programming Language book: comprehensive, example-rich, and designed to be read sequentially while remaining useful as a reference.

The manual is organized in three tiers:

  • Getting Started (Chapters 1-3): High-level concepts and setup
  • Core Usage (Chapters 4-7): Running the system and understanding how it works
  • Mastery (Chapters 8-12): Reference material, advanced techniques, and production deployment

Each chapter includes:

  • Clear explanations with real examples
  • ASCII diagrams showing how components work
  • Code snippets you can copy and run
  • Common pitfalls and how to avoid them
  • Links to related chapters

Next Steps

Ready to dive in? Start with Chapter 1: Introduction.

Have a specific goal? Jump to the relevant chapter: