Prompt Interpolation
Original wiki text from AUTOMATIC webui wiki
Prompt interpolation makes it possible to interpolate between the embeddings of multiple prompts over time. The syntax is a list of prompts followed by a list of step numbers. Each number indicates the specific step at which the corresponding prompt should be used:
[prompt1:prompt2:prompt3:...:when1,when2,when3,...]
`prompt1`, `prompt2`, `prompt3` and so on are separate prompts, and `when1`, `when2`, `when3` and so on are numbers that indicate at which steps which prompt should be used. If the sampling step is in-between `when1` and `when2`, for example, then an embedding that is a weighted average of the embeddings of `prompt1` and `prompt2` is used instead.
A smaller number for `whenN` has a bigger impact on the structure and placement of all objects in the final image, because the corresponding prompt will be sampled earlier. A greater number will have a more fine-grained impact on the whole picture, affecting mainly the specific details and features of the image.
The further away the interpolated embeddings are from each other, the more sampling steps are required to get good results.
In case there are at least 3 prompts used in the interpolation, one of the following interpolation curves can be used to interpolate between the embeddings of the prompts:
- `linear`: interpolate linearly between the embeddings corresponding to each prompt
- `catmull`: an artificial embedding is created at each step that follows a Catmull-Rom curve passing through all prompts' embeddings
- `bezier`: an artificial embedding is created at each step that follows a Bezier curve with every prompt embedding used as a control point
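As a rough illustration of the three curve types, the sketch below evaluates each one on short Python lists standing in for prompt embeddings. It uses the textbook uniform Catmull-Rom and Bernstein-form Bezier formulas with evenly spaced control points. The function names and the endpoint handling (reusing the first and last point as their own neighbours) are assumptions made for this example, not the extension's actual code.

```python
from math import comb

def lerp(p, q, t):
    """Weighted average of two vectors: (1 - t) * p + t * q."""
    return [(1 - t) * a + t * b for a, b in zip(p, q)]

def _segment(points, t):
    """Map t in [0, 1] to (segment index, local parameter) over evenly spaced points."""
    n = len(points) - 1
    x = min(max(t, 0.0), 1.0) * n
    i = min(int(x), n - 1)
    return i, x - i

def linear(points, t):
    """Piecewise linear curve through every control point."""
    i, u = _segment(points, t)
    return lerp(points[i], points[i + 1], u)

def catmull(points, t):
    """Uniform Catmull-Rom spline through every control point."""
    i, u = _segment(points, t)
    # Assumption: at the ends, the missing neighbour is replaced by the end point itself.
    p0 = points[max(i - 1, 0)]
    p1 = points[i]
    p2 = points[i + 1]
    p3 = points[min(i + 2, len(points) - 1)]
    return [
        0.5 * (
            2 * b
            + (c - a) * u
            + (2 * a - 5 * b + 4 * c - d) * u ** 2
            + (3 * b - 3 * c + d - a) * u ** 3
        )
        for a, b, c, d in zip(p0, p1, p2, p3)
    ]

def bezier(points, t):
    """Bezier curve using every point as a control point (Bernstein form)."""
    n = len(points) - 1
    weights = [comb(n, k) * (1 - t) ** (n - k) * t ** k for k in range(n + 1)]
    return [sum(w * p[j] for w, p in zip(weights, points)) for j in range(len(points[0]))]

# Three toy 2-dimensional "embeddings".
points = [[0.0, 0.0], [1.0, 2.0], [2.0, 0.0]]
for curve in (linear, catmull, bezier):
    print(curve.__name__, curve(points, 0.5))
```

With these toy points, the linear and Catmull-Rom curves pass through the middle control point at `t = 0.5`, while the Bezier curve is only pulled towards it.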
By default, the interpolation curve is linear. If you want to use another function, write its name as an additional argument to the interpolation expression.
Here is an example with prompts `p1`, `p2`, `p3` at steps `0`, `1`, `9` using `catmull` as the curve type:
[p1 : p2 : p3 : -1, 0, 8 : catmull]
Please note that step numbers are off by one (step `n` is written as `n-1`) to stay coherent with the existing prompt features for AUTOMATIC webui.
It is also possible to use step values between 0 and 1 to indicate a ratio of the total number of sampling steps. For example, if the total number of steps is 20, `[a : b : .25, .75]` will start interpolating at step 5 (inclusive) and stop at step 15 (exclusive).
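The ratio-to-step conversion can be sketched as follows. `resolve_step` is a hypothetical helper, and it assumes that only values strictly between 0 and 1 are treated as ratios (since `0` and `1` are also valid step numbers). Rounding and the interaction with the off-by-one convention are not covered here.

```python
def resolve_step(value, total_steps):
    """Return the step that a written value refers to.

    Assumption: values strictly between 0 and 1 are ratios of the total
    number of steps; anything else is already a step number.
    """
    if 0 < value < 1:
        return value * total_steps
    return value

print(resolve_step(0.25, 20), resolve_step(0.75, 20))  # 5.0 15.0
```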
Using multiple interpolations in the same prompt is possible. There are two ways to achieve this:
You can put multiple interpolations side by side:
a supersonic [ rocket : starship : 2, 7 ] leaving behind [ fire : gas : 1, 5 ]
This will create an "interpolatable embedding" that can represent all possible prompts. In this case, the possible prompts are:
- `a supersonic rocket leaving behind fire`
- `a supersonic rocket leaving behind gas`
- `a supersonic starship leaving behind fire`
- `a supersonic starship leaving behind gas`
The interpolatable embedding is then sampled once for each step of the sampling process to generate the actual embeddings that will be used to produce the output image. In this case, if sampler has 10 steps (=> steps 0 to 9):
You can use as many side-by-side interpolations as you want. Note that as of right now, `n` concurrent interpolations will generate an exponential number of possible prompts (at least `O(2^n)` in time and space complexity). This means that using more than 8 or 9 concurrent interpolations will probably hang the sampling process before it even has a chance to start (and so stopping it with the "Interrupt" button will not work). On my i5-12600KF, it took ~20 seconds with 10 concurrent interpolations.
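To see where the exponential growth comes from, here is a small sketch that enumerates every possible prompt for a set of side-by-side interpolations. The `possible_prompts` helper and its list-based input format are made up for this illustration; the extension parses the prompt syntax itself and may enumerate the combinations differently.

```python
from itertools import product

def possible_prompts(parts):
    """Enumerate every prompt that side-by-side interpolations can produce.

    `parts` mixes literal strings with lists of alternatives, one list per
    interpolation (hypothetical input format for this sketch).
    """
    choices = [part if isinstance(part, list) else [part] for part in parts]
    return [" ".join(combo) for combo in product(*choices)]

parts = ["a supersonic", ["rocket", "starship"], "leaving behind", ["fire", "gas"]]
prompts = possible_prompts(parts)
for prompt in prompts:
    print(prompt)

# Each additional two-prompt interpolation doubles the number of combinations.
print(len(possible_prompts([["a", "b"]] * 10)))  # 1024
```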
You can also nest multiple interpolations:
a supersonic [ rocket : [ starship : massive structure : 0.1, 0.6 ] : 0.1, 0.6 ]
One way to understand this construction is as a linear interpolation where one of the control points is fixed (`rocket`) and the other is moving (`[ starship : massive structure : 0.1, 0.6 ]`).
This feature can also be understood as a generalization of a bezier curve, where each control point is arbitrarily chosen. If the points are chosen carefully, then nested interpolations can effectively be made to create a bezier curve.
For example, these prompts are equivalent (except for differences caused by precision errors):
[ a : b : c : , , : bezier]
[ [ a : b : , ]
: [ b : c : , ]
: , ]
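The equivalence can be checked numerically on scalars standing in for a single embedding coordinate. This is only a sketch of the underlying math (nested linear interpolation versus the quadratic Bezier formula), with hypothetical function names; it does not call the extension.

```python
def lerp(p, q, t):
    """Linear interpolation between p and q."""
    return (1 - t) * p + t * q

def nested(a, b, c, t):
    """Interpolate between two interpolations that share the same t."""
    return lerp(lerp(a, b, t), lerp(b, c, t), t)

def quadratic_bezier(a, b, c, t):
    """Quadratic Bezier curve with control points a, b, c."""
    return (1 - t) ** 2 * a + 2 * (1 - t) * t * b + t ** 2 * c

a, b, c = 0.3, -1.2, 2.5  # arbitrary values for the check
for i in range(11):
    t = i / 10
    # Both forms agree up to floating-point precision.
    assert abs(nested(a, b, c, t) - quadratic_bezier(a, b, c, t)) < 1e-12
```

Expanding the nested form by hand gives the same three terms as the Bezier formula, which is why the only differences are precision errors.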
- `[to:when]` - adds `to` to the prompt after a fixed number of steps (`when`)
- `[from::when]` - removes `from` from the prompt after a fixed number of steps (`when`)
- `[from:to:when]` - starts with `from` and switches to `to` after a fixed number of steps (`when`)
- `[a:b:from,to]` - interpolate from `a` to `b` starting at step `from` and stopping at step `to`
- `[a:b:,when]` - same as `[a:b:-1,when]`; interpolate from `a` to `b` from the beginning of the sampling process to `when`
- `[a:b:when,]` - same as `[a:b:when,total]` where `total` is the total number of steps of the sampling process
- `[a:b:,]` - same as `[a:b:-1,total]`; interpolate during the whole sampling process from `a` to `b`
- `[a:b:c:,,]` - conceptually equivalent to `[a:b:c:-1,(total + 1)/2 - 1,total]`; interpolate during the whole sampling process from `a` to `b` to `c`, where `b` appears at the middle sampling step
- `[a:b:c:d:from,,,to]` - same as `[a:b:c:d:from,when_b,when_c,to]` where `from`, `when_b`, `when_c` and `to` are evenly spaced across the sampling schedule
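The defaults for omitted step values can be sketched like this. `fill_steps` is a hypothetical helper that takes the list of step values with `None` for each omitted entry. Spacing omitted values evenly between the nearest given neighbours is an assumption that generalizes the `[a:b:c:d:from,,,to]` case; it is not taken from the extension's code.

```python
def fill_steps(whens, total_steps):
    """Fill in omitted step values (None) following the rules listed above.

    An omitted first value becomes -1, an omitted last value becomes the
    total number of steps, and omitted values in between are spaced evenly
    between their nearest known neighbours (assumption for the general case).
    """
    steps = list(whens)
    if steps[0] is None:
        steps[0] = -1
    if steps[-1] is None:
        steps[-1] = total_steps
    known = [i for i, s in enumerate(steps) if s is not None]
    for lo, hi in zip(known, known[1:]):
        for i in range(lo + 1, hi):
            steps[i] = steps[lo] + (i - lo) * (steps[hi] - steps[lo]) / (hi - lo)
    return steps

print(fill_steps([None, None, None], 20))  # [-1, 9.5, 20]
print(fill_steps([3, None, None, 12], 20))  # [3, 6.0, 9.0, 12]
```

For `[a:b:c:,,]` with 20 steps, the middle value is `9.5`, which matches `(total + 1)/2 - 1`.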
Prompt: a [fantasy:cyberpunk:3,7] landscape
- At the start, the model will be drawing a fantasy landscape.
- After step 3 (so at step 4), the prompt still only includes `fantasy`.
- Between step 4 (exclusive) and step 8 (exclusive), the embeddings corresponding to `fantasy` and `cyberpunk` are interpolated linearly:
  - At step 5, it will be `.75*embeddings('a fantasy landscape') + .25*embeddings('a cyberpunk landscape')`.
  - At step 6, it will be `.50*embeddings('a fantasy landscape') + .50*embeddings('a cyberpunk landscape')`.
  - At step 7, it will be `.25*embeddings('a fantasy landscape') + .75*embeddings('a cyberpunk landscape')`.
- After step 7 (so at step 8), it will switch to drawing a cyberpunk landscape, continuing from where it stopped at step 7.
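The weights in this walkthrough can be reproduced with a short sketch. `blend_weight` is a hypothetical function that assumes the linear curve and the off-by-one convention described above; it is not the extension's implementation.

```python
def blend_weight(step, start, stop):
    """Weight of the second prompt at `step` for `[a:b:start,stop]` (linear curve)."""
    # Written step numbers are off by one: the change happens after them.
    first, last = start + 1, stop + 1
    t = (step - first) / (last - first)
    return min(max(t, 0.0), 1.0)

for step in range(10):
    w = blend_weight(step, 3, 7)
    print(f"step {step}: {1 - w:.2f} * fantasy + {w:.2f} * cyberpunk")
```

Steps 0 to 4 print a weight of `0.00` for `cyberpunk`, steps 5 to 7 print `0.25`, `0.50` and `0.75`, and steps 8 and 9 print `1.00`.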
Prompt:
masterpiece, best quality,
[exploding huge bright fire ball in the night sky
:a magical landscape filled with fairies and dragons
:a mysterious and menacing medieval landscape
:,0,18
:catmull]
(sampler has 20 steps)
- at start (after step -1 (since it is omitted), so step 0): `masterpiece, best quality, exploding huge bright fire ball in the night sky`
- after step 0 (so at step 1): `masterpiece, best quality, a magical landscape filled with fairies and dragons`
- between step 2 and step 18 (both included), sample the catmull curve passing through all prompt embeddings from `t=2/19` to `t=18/19`
- after step 18 (so at step 19): `masterpiece, best quality, a mysterious and menacing medieval landscape`