NovelAI V3 Methods
This document outlines the key methodological improvements implemented in NovelAI V3, with particular emphasis on noise scaling for high-resolution coherence.
The choice of maximum noise level (σmax) critically affects global image coherence, particularly at high resolutions. SDXL's default σmax = 14.6 proves insufficient for maintaining coherence in high-resolution images, leading to artifacts such as multi-body generation issues.
The relationship between noise levels and image resolution follows a fundamental scaling principle:
For a resolution increase by factor k:
σ_new = k · σ_base (standard deviation scales with canvas length)
σ²_new = k² · σ²_base (variance scales with canvas area)
These are the same rule stated two ways, and it maintains the signal-to-noise ratio (SNR) across resolutions. The quadratic relationship in the variance arises from the assumption that signal redundancy scales with image area.
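Given an image x₀ with resolution R:
x_t = α_t x₀ + σ_t ε, where ε ~ N(0, I)
SNR = ||α_t x₀||² / ||σ_t ε||²
Scaling the canvas length by k spreads the same global structure over k² times as many pixels. Because ε is independent per pixel, averaging over each k × k block leaves a fully redundant signal unchanged but divides the noise variance by k². To maintain consistent SNR for that global structure when scaling resolution:
SNR_new = ||α_t x₀||² / (||σ_new ε||² / k²) = SNR_original
Therefore:
σ_new = k · σ_base (standard deviation, scaling with canvas length)
σ²_new = k² · σ²_base (variance, scaling with canvas area, assuming full redundancy)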
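The averaging argument behind σ_new = k · σ_base can be checked numerically. The sketch below is illustrative only and uses just the Python standard library; the helper name `block_mean_noise_std` and the sample counts are assumptions, not part of the original method. It estimates the standard deviation of Gaussian noise after averaging over a k × k block, first with the base σ and then with σ multiplied by k.

```python
import random
import statistics


def block_mean_noise_std(sigma: float, k: int, n_blocks: int = 4000, seed: int = 0) -> float:
    """Empirical std of the mean of k*k i.i.d. N(0, sigma^2) samples.

    Hypothetical helper: models what a k x k block of pixels "sees" when the
    underlying signal is fully redundant across the block.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(n_blocks):
        block = [rng.gauss(0.0, sigma) for _ in range(k * k)]
        means.append(statistics.fmean(block))
    return statistics.pstdev(means)


sigma_base = 14.6
k = 2  # canvas length doubled, area quadrupled

# Same sigma on the larger canvas: effective noise drops to about sigma_base / k.
std_same_sigma = block_mean_noise_std(sigma_base, k)

# Sigma scaled by k: effective noise returns to about sigma_base.
std_scaled_sigma = block_mean_noise_std(k * sigma_base, k)

print(f"unscaled sigma, block-averaged std: {std_same_sigma:.2f}")
print(f"scaled sigma,   block-averaged std: {std_scaled_sigma:.2f}")
```

Under the full-redundancy assumption, the first value should land near σ_base / k and the second near σ_base, which is the sense in which scaling σ by k restores the original SNR.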
At standard SDXL resolutions:
- σmax = 14.6 (default): Shows multi-body artifacts
- σmax = 29.0 (≈2x the default): Resolves global coherence issues
- σmax ≈ 20000 (a practical stand-in for σ = ∞): Enables proper mean color prediction
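To make the three σmax values above more concrete, the following sketch converts each one into the α_t and σ_t coefficients of x_t = α_t x₀ + σ_t ε and the resulting SNR. It assumes unit-variance data and the variance-preserving convention α_t² + σ_t² = 1, under which α_t = 1 / √(1 + σ²) and SNR = 1 / σ²; the function names are hypothetical.

```python
import math


def vp_coefficients(sigma: float) -> tuple[float, float]:
    """Map a noise level sigma to (alpha_t, sigma_t) with alpha_t^2 + sigma_t^2 = 1.

    Assumption: variance-preserving convention, where sigma = sigma_t / alpha_t.
    """
    alpha_t = 1.0 / math.sqrt(1.0 + sigma * sigma)
    sigma_t = sigma * alpha_t
    return alpha_t, sigma_t


def snr(sigma: float) -> float:
    """Signal-to-noise ratio alpha_t^2 / sigma_t^2, which reduces to 1 / sigma^2."""
    alpha_t, sigma_t = vp_coefficients(sigma)
    return (alpha_t / sigma_t) ** 2


for sigma_max in (14.6, 29.0, 20000.0):
    alpha_t, _ = vp_coefficients(sigma_max)
    print(f"sigma_max={sigma_max:>8}: alpha_t={alpha_t:.6f}, SNR={snr(sigma_max):.3e}")
```

Under these assumptions, σmax = 14.6 still leaves roughly 7% of the signal amplitude in the noisiest training sample, whereas σmax ≈ 20000 leaves effectively none, which is consistent with treating it as an approximation of zero terminal SNR.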
Progressive noise sequence example:
σmax = 14.6: [14.6 → 10.8 → 8.3 → 6.6 → 5.4]
σmax = 29.0: [29.0 → 17.8 → 12.4 → 9.2 → 7.2]
For practical implementation, follow this scaling rule:
- When doubling canvas length (4x area): Double σmax
- This represents an upper bound assuming full signal redundancy
- The approximation improves at higher resolutions
- Particularly effective for resolutions > 1024²
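The rule of thumb above can be written as a small helper. This is a minimal sketch, not the authors' implementation: the function name, its parameters, and the idea of a reference canvas at which a given σmax is considered adequate are all assumptions introduced for illustration. The scale factor k is taken as the square root of the area ratio, so non-square canvases are handled as well.

```python
import math


def scaled_sigma_max(
    width: int,
    height: int,
    base_sigma_max: float,
    base_side: int = 1024,
) -> float:
    """Scale sigma_max linearly with canvas length (hypothetical helper).

    base_sigma_max is the value assumed adequate on a base_side x base_side
    canvas. The result is an upper bound that assumes full signal redundancy.
    """
    if width <= 0 or height <= 0 or base_side <= 0:
        raise ValueError("canvas dimensions must be positive")
    # k is the length scale factor: sqrt of the area ratio.
    k = math.sqrt((width * height) / (base_side * base_side))
    return base_sigma_max * k


# Doubling the canvas length (4x area) doubles sigma_max.
print(scaled_sigma_max(2048, 2048, base_sigma_max=29.0))
# A non-square canvas with twice the base area scales by sqrt(2).
print(scaled_sigma_max(2048, 1024, base_sigma_max=29.0))
```

Because the rule is an upper bound, a value between the base σmax and the scaled one may be sufficient in practice when the image content is not fully redundant.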
The σmax scaling works in conjunction with:
- v-prediction parameterization
- Zero Terminal SNR training
- Karras noise scheduling (ρ = 7.0)
Together, these methods ensure both local detail preservation and global coherence across all image resolutions.
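For reference, the Karras schedule with ρ = 7.0 mentioned in the list above interpolates linearly in σ^(1/ρ) between σmax and σmin. The sketch below is a plain-Python version of that formula; the step count and the σmin value of 0.0292 are assumptions chosen for illustration and are not stated in this document.

```python
def karras_sigmas(
    n_steps: int,
    sigma_min: float,
    sigma_max: float,
    rho: float = 7.0,
) -> list[float]:
    """Return n_steps noise levels from sigma_max down to sigma_min.

    Uses the Karras et al. spacing:
    sigma_i = (max^(1/rho) + i/(n-1) * (min^(1/rho) - max^(1/rho)))^rho
    """
    if n_steps < 2:
        raise ValueError("n_steps must be at least 2")
    if not 0.0 < sigma_min < sigma_max:
        raise ValueError("require 0 < sigma_min < sigma_max")
    min_inv_rho = sigma_min ** (1.0 / rho)
    max_inv_rho = sigma_max ** (1.0 / rho)
    return [
        (max_inv_rho + (i / (n_steps - 1)) * (min_inv_rho - max_inv_rho)) ** rho
        for i in range(n_steps)
    ]


# Assumed values: 28 steps and sigma_min = 0.0292 are illustrative only.
for sigma_max in (14.6, 29.0):
    sigmas = karras_sigmas(n_steps=28, sigma_min=0.0292, sigma_max=sigma_max)
    print(sigma_max, [round(s, 2) for s in sigmas[:5]])
```

Raising σmax changes only the start of the schedule; with ρ = 7.0 the steps remain concentrated at low noise levels, so the extra high-noise steps address global structure without removing the steps that refine local detail.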
Note: These methods represent a distinct approach from Flow Matching techniques. See Flow Matching for that alternative approach.