NovelAI V3 Methods

Alexander R Izquierdo edited this page Dec 21, 2024 · 1 revision

This document outlines the key methodological improvements implemented in NovelAI V3, with particular emphasis on noise scaling for high-resolution coherence.

Maximum Noise Level (σmax) Scaling

The choice of maximum noise level (σmax) critically affects global image coherence, particularly at high resolutions. SDXL's default of σmax = 14.6 is too low to keep high-resolution images globally coherent, which shows up as artifacts such as multi-body generations.

Noise Scaling Theory

The relationship between noise levels and image resolution follows a fundamental scaling principle:

For a resolution increase by a factor k in each dimension (k² in area):

σ_new = k · σ_base      (noise level scales with canvas length)
σ_new² = k² · σ_base²   (noise variance scales with canvas area)

These are the same rule stated two ways. The scaling maintains the signal-to-noise ratio (SNR) across resolutions. The k² factor on the variance arises from the assumption that signal redundancy scales with image area.

Mathematical Basis

Given an image x₀ with resolution R:

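x_t = α_t x₀ + σ_t ε, where ε ~ N(0, I)
SNR = ||α_t x₀||² / ||σ_t ε||²

To maintain consistent SNR when scaling resolution by k per dimension, assume full redundancy: each base-resolution pixel is now represented by k² pixels carrying the same signal. Averaging those k² pixels leaves the signal unchanged but divides the variance of independent noise by k², so the effective SNR at the base scale becomes:

SNR_eff = k² · ||α_t x₀||² / ||σ_new ε||²

Requiring SNR_eff to equal the original SNR (taken at σ_base) gives:

k² / σ_new² = 1 / σ_base²

Therefore:

σ_new = k · σ_base      (noise level, per canvas length)
σ_new² = k² · σ_base²   (noise variance, per canvas area, assuming full redundancy)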
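
The redundancy argument can be checked numerically. The sketch below is an illustration only, not code from NovelAI: it assumes full redundancy (a k × k block of pixels shares one signal value and receives independent Gaussian noise) and measures how much noise survives block averaging. The function name `block_averaged_noise_std` and the trial count are hypothetical choices.

```python
import random
import statistics


def block_averaged_noise_std(sigma, k, trials=20000, seed=0):
    """Estimate the std of the mean of k*k i.i.d. N(0, sigma^2) samples.

    Under the full-redundancy assumption, this is the noise level that
    remains at the base resolution after averaging each k x k block.
    """
    rng = random.Random(seed)
    n = k * k
    means = [
        sum(rng.gauss(0.0, sigma) for _ in range(n)) / n
        for _ in range(trials)
    ]
    return statistics.pstdev(means)


base_sigma = 14.6
k = 2

# Same sigma at 2x resolution: effective noise drops to about sigma / k.
unscaled = block_averaged_noise_std(base_sigma, k)

# Sigma multiplied by k: effective noise returns to about the base sigma.
scaled = block_averaged_noise_std(k * base_sigma, k)

print(f"unscaled effective sigma: {unscaled:.2f}")
print(f"scaled effective sigma:   {scaled:.2f}")
```

With k = 2, the unscaled case should land near σ_base / 2 and the scaled case near σ_base, which is the sense in which doubling σmax restores the original SNR.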

Empirical Results

At standard SDXL resolutions:

  • σmax = 14.6 (default): shows multi-body artifacts
  • σmax = 29.0 (≈2× default): resolves global coherence issues
  • σmax ≈ 20000 (practical stand-in for ∞): enables proper mean color prediction
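
To see why σmax ≈ 20000 can stand in for infinity, it may help to convert each σmax into a terminal SNR. The sketch below assumes σ is a k-diffusion-style noise level (the ratio σ_t / α_t in the notation of the Mathematical Basis section) and that the data has roughly unit variance, so SNR = 1/σ² and the equivalent variance-preserving signal weight is 1/√(1 + σ²). The helper names are hypothetical.

```python
import math


def snr_from_sigma(sigma):
    """SNR of x0 + sigma * eps for unit-variance data (assumption)."""
    return 1.0 / (sigma * sigma)


def alpha_from_sigma(sigma):
    """Signal weight in the equivalent variance-preserving form."""
    return 1.0 / math.sqrt(1.0 + sigma * sigma)


for sigma_max in (14.6, 29.0, 20000.0):
    print(
        f"sigma_max={sigma_max:>8}: "
        f"SNR={snr_from_sigma(sigma_max):.3e}, "
        f"alpha={alpha_from_sigma(sigma_max):.3e}"
    )
```

At σmax = 14.6 a small but nonzero fraction of the signal survives at the noisiest step, whereas at σmax ≈ 20000 the surviving signal is negligible, which is what Zero Terminal SNR training requires.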

Progressive noise sequence example:

σmax = 14.6: [14.6 → 10.8 → 8.3 → 6.6 → 5.4]
σmax = 29.0: [29.0 → 17.8 → 12.4 → 9.2 → 7.2]

Implementation Rule

For practical implementation, follow this scaling rule:

  • When doubling canvas length (4× area), double σmax (4× the noise variance)
  • This represents an upper bound assuming full signal redundancy
  • The approximation improves at higher resolutions
  • Particularly effective for resolutions > 1024²
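
The rule above can be sketched as a small helper. This is a minimal illustration, not NovelAI's implementation: the function name `scaled_sigma_max` is hypothetical, and using the square root of the area ratio as the length factor for non-square canvases is an assumption that goes beyond what the rule states.

```python
import math


def scaled_sigma_max(base_sigma_max, base_resolution, target_resolution):
    """Scale sigma_max in proportion to canvas length.

    Resolutions are (width, height) tuples. The length factor is taken as
    sqrt(area ratio), so doubling both sides (4x area) doubles sigma_max.
    This is an upper bound that assumes full signal redundancy.
    """
    base_w, base_h = base_resolution
    target_w, target_h = target_resolution
    area_ratio = (target_w * target_h) / (base_w * base_h)
    length_factor = math.sqrt(area_ratio)
    return base_sigma_max * length_factor


# Doubling a 1024x1024 canvas to 2048x2048 doubles the SDXL default.
doubled = scaled_sigma_max(14.6, (1024, 1024), (2048, 2048))
print(doubled)
```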

Integration with Other Methods

The σmax scaling works in conjunction with:

  • v-prediction parameterization
  • Zero Terminal SNR training
  • Karras noise scheduling (ρ = 7.0)

Together, these methods ensure both local detail preservation and global coherence across all image resolutions.
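
For reference, a Karras-style schedule with ρ = 7.0 can be sketched as follows. The formula is the standard one from Karras et al.; the step count and the value `sigma_min = 0.0292` are assumptions chosen for illustration, and this sketch is not claimed to reproduce the example sequences listed under Empirical Results.

```python
def karras_sigmas(n_steps, sigma_min, sigma_max, rho=7.0):
    """Return n_steps noise levels descending from sigma_max to sigma_min.

    Interpolates linearly in sigma**(1/rho) space, then raises back to the
    power rho, which concentrates steps at low noise levels.
    """
    if n_steps < 2:
        raise ValueError("n_steps must be at least 2")
    min_inv = sigma_min ** (1.0 / rho)
    max_inv = sigma_max ** (1.0 / rho)
    return [
        (max_inv + (i / (n_steps - 1)) * (min_inv - max_inv)) ** rho
        for i in range(n_steps)
    ]


# sigma_min here is an assumed value, not one stated in this document.
sigmas = karras_sigmas(n_steps=28, sigma_min=0.0292, sigma_max=29.0)
print([round(s, 2) for s in sigmas[:5]])
```

Raising `sigma_max` only changes the starting point of the schedule; the spacing rule and the remaining sampler settings stay the same.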


Note: These methods represent a distinct approach from Flow Matching techniques. See Flow Matching for that alternative approach.