
Architecture Overview

Alexander R Izquierdo edited this page Dec 21, 2024 · 1 revision

The SDXL Training Framework is organized into modular components, each responsible for a distinct stage of the training pipeline. This document outlines the core architecture and explains how those components interact.

Core Components

1. Core System (src/core/)

The core system provides fundamental infrastructure:

Memory Management (core/memory/)

  • memory.py - Base memory management utilities
  • tensor.py - Tensor optimization operations
  • layer_offload.py - Layer-wise model offloading
  • throughput.py - Training throughput optimization
  • optimizations.py - Memory optimization strategies
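The central idea behind layer-wise offloading can be sketched in a few lines of plain Python. The class and method names below are hypothetical and stand in for the actual logic in `core/memory/layer_offload.py`:

```python
# Minimal sketch of layer-wise offloading: keep only the active layer
# on the GPU and park the rest on the CPU. Names are illustrative;
# the real implementation lives in core/memory/layer_offload.py.

class LayerOffloader:
    def __init__(self, layer_names):
        # Every layer starts offloaded to the CPU.
        self.placement = {name: "cpu" for name in layer_names}

    def activate(self, name):
        # Evict whichever layer currently holds the GPU slot...
        for other, device in self.placement.items():
            if device == "gpu":
                self.placement[other] = "cpu"
        # ...then move the requested layer onto the GPU.
        self.placement[name] = "gpu"

offloader = LayerOffloader(["down_block", "mid_block", "up_block"])
offloader.activate("down_block")
offloader.activate("mid_block")  # down_block is offloaded again
```

Only one layer occupies GPU memory at a time, which is what lets large models train on limited hardware at the cost of transfer overhead.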

Logging & Metrics (core/logging/)

  • wandb.py - Weights & Biases integration
  • metrics.py - Training metrics collection
  • logging.py - General logging infrastructure
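To illustrate the metrics-collection idea, here is a minimal running-average collector. This is a sketch only; the actual `core/logging/metrics.py` may track metrics differently:

```python
# Illustrative metrics collector: accumulate running sums per metric
# name so averages can be reported at any point during training.
# Hypothetical API, not the one in core/logging/metrics.py.

class MetricsCollector:
    def __init__(self):
        self._totals = {}
        self._counts = {}

    def log(self, name, value):
        self._totals[name] = self._totals.get(name, 0.0) + value
        self._counts[name] = self._counts.get(name, 0) + 1

    def average(self, name):
        return self._totals[name] / self._counts[name]

metrics = MetricsCollector()
for loss in (0.9, 0.7, 0.5):
    metrics.log("train/loss", loss)
```

A collector like this is typically flushed to Weights & Biases at a fixed step interval rather than on every batch.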

Validation (core/validation/)

  • text_to_image.py - Image generation validation

2. Data Pipeline (src/data/)

Handles dataset management and preprocessing:

Preprocessing (data/preprocessing/)

  • pipeline.py - Main preprocessing pipeline
  • latents.py - Latent space handling
  • cache_manager.py - Dataset caching
  • tag_weighter.py - Text prompt weighting
  • exceptions.py - Custom preprocessing exceptions
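The caching idea behind `cache_manager.py` is to pay the cost of expensive preprocessing (such as VAE latent encoding) once per sample and reuse the result on later epochs. A minimal sketch, with illustrative names only:

```python
# Sketch of dataset caching: compute an expensive result once per key
# and serve it from memory afterwards. Hypothetical names; see
# data/preprocessing/cache_manager.py for the real implementation.

class LatentCache:
    def __init__(self):
        self._store = {}
        self.misses = 0

    def get_or_compute(self, key, compute_fn):
        if key not in self._store:
            self.misses += 1
            self._store[key] = compute_fn()
        return self._store[key]

cache = LatentCache()
encode = lambda: [0.1, 0.2, 0.3]             # stand-in for a VAE encode
first = cache.get_or_compute("img_001.png", encode)   # computed
second = cache.get_or_compute("img_001.png", encode)  # served from cache
```

A production cache would persist entries to disk and invalidate them when preprocessing settings change.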

Dataset Management

  • dataset.py - Dataset implementation
  • config.py - Dataset configuration
  • utils/paths.py - Path utilities

3. Model Architecture (src/models/)

Model implementations and adaptations:

  • sdxl.py - Main SDXL model implementation
  • base.py - Base model classes
  • adapters/lora.py - LoRA adapter implementation
  • encoders/clip.py - CLIP text encoders
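The LoRA idea implemented in `adapters/lora.py` can be illustrated in plain Python: a frozen weight matrix W is augmented with a low-rank update B·A, and only A and B receive gradients. This is a conceptual sketch, not the framework's actual code:

```python
# Plain-Python illustration of LoRA: y = W x + scale * B (A x),
# where W is frozen and only the low-rank factors A and B are trained.
# Conceptual sketch only, not the code in adapters/lora.py.

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lora_forward(W, A, B, x, scale=1.0):
    base = matvec(W, x)              # frozen pretrained path
    delta = matvec(B, matvec(A, x))  # low-rank trained path
    return [b + scale * d for b, d in zip(base, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]  # frozen 2x2 weight (identity here)
A = [[1.0, 1.0]]              # rank-1 down-projection (1x2)
B = [[2.0], [3.0]]            # rank-1 up-projection (2x1)
y = lora_forward(W, A, B, [1.0, 2.0])
```

Because the rank r is much smaller than the weight dimensions, the number of trainable parameters drops dramatically compared to full fine-tuning.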

4. Training Methods (src/training/)

Training implementations and scheduling:

  • trainer.py - Main trainer implementation
  • Methods:
    • methods/base.py - Base trainer class
    • methods/ddpm_trainer.py - DDPM implementation
    • methods/flow_matching_trainer.py - Flow Matching implementation
  • Schedulers:
    • schedulers/noise_scheduler.py - Noise scheduling
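The layering above (a base class defining the contract, with concrete methods overriding the loss) can be sketched as follows. The class names are illustrative; see `methods/base.py` and `methods/ddpm_trainer.py` for the real structure:

```python
# Sketch of the trainer-method layering: the base class defines the
# interface, concrete methods implement the loss, and a registry maps
# config names to classes. Illustrative names only.

class BaseTrainingMethod:
    name = "base"

    def compute_loss(self, pred, target):
        raise NotImplementedError

class DDPMMethod(BaseTrainingMethod):
    name = "ddpm"

    def compute_loss(self, pred, target):
        # Mean-squared error on the predicted noise.
        return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

METHODS = {cls.name: cls for cls in (DDPMMethod,)}
loss = METHODS["ddpm"]().compute_loss([0.5, 0.5], [0.0, 1.0])
```

Adding a new training method then amounts to subclassing the base class and registering it, without touching the main trainer.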

System Flow

```mermaid
graph TD
    A[Data Pipeline] --> B[Preprocessing]
    B --> C[Training Pipeline]
    C --> D[Model]
    D --> E[Memory Management]
    C --> F[Logging & Metrics]
    D --> G[Validation]
```

Key Features

Memory Management

  • Dynamic layer offloading
  • Tensor optimization
  • Throughput optimization
  • Memory usage tracking

Training Pipeline

  • Modular trainer implementation
  • Support for multiple training methods
  • Configurable noise scheduling
  • Distributed training support
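As a miniature example of configurable noise scheduling, here is the classic DDPM linear beta schedule. The framework's `schedulers/noise_scheduler.py` may expose different parameters; this sketch only shows the shape of the idea:

```python
# The classic DDPM linear beta schedule: noise variance grows linearly
# from beta_start to beta_end over the diffusion steps. Illustrative
# sketch; see schedulers/noise_scheduler.py for the actual scheduler.

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=2e-2):
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

betas = linear_beta_schedule(1000)
```

Swapping in a different schedule (cosine, sigmoid, etc.) changes only this function, which is what makes the scheduling configurable.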

Monitoring & Validation

  • Comprehensive W&B integration
  • Real-time metric tracking
  • Text-to-image validation
  • Performance profiling

Configuration

The system is configured through src/config.yaml, which controls:

  • Training parameters
  • Model architecture
  • Memory optimizations
  • Logging settings
  • Dataset configuration
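A fragment of such a configuration might look like the following. The keys shown are hypothetical and meant only to illustrate the shape; consult `src/config.yaml` for the actual schema:

```yaml
# Illustrative only - consult src/config.yaml for the real schema.
training:
  method: ddpm          # or flow_matching
  batch_size: 4
  learning_rate: 1.0e-5
memory:
  layer_offload: true
logging:
  wandb:
    enabled: true
    project: sdxl-training
```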

Design Principles

  1. Modularity: Components are designed to be independent and replaceable
  2. Extensibility: Easy to add new training methods or model architectures
  3. Memory Efficiency: Built-in memory optimization at all levels
  4. Research Focus: Comprehensive logging and validation for experiments

Future Architecture Plans

Areas planned for architectural expansion:

  1. Additional training methods
  2. Enhanced memory optimizations
  3. Extended validation metrics
  4. Improved caching strategies

Contributing

When adding new components:

  1. Follow the existing module structure
  2. Add appropriate logging hooks
  3. Implement memory optimizations
  4. Include validation methods
  5. Update configuration schemas

See Development Setup for environment setup and Research Guidelines for contribution standards.


Next: Explore Training Pipeline for detailed training process documentation.