C21: Mastering Image-to-Image Prompting and CFG Scale in ComfyUI

Introduction

In our previous article, “Image-to-Image Transformation in ComfyUI”, we explored the fundamentals of building an image-to-image workflow. We discussed how latent space diffusion enables creative transformations while preserving essential elements of the input image. If you’re new to ComfyUI or image-to-image workflows, we recommend reading that article first to understand the basics.

In this follow-up guide, we’ll dive deeper into three factors that shape the quality and style of your generated images: prompt crafting, Classifier-Free Guidance (CFG) Scale, and denoise strength. By tuning these parameters together, you can strike a good balance between artistic creativity and adherence to the original image.

What Is CFG Scale and Why Does It Matter?

Classifier-Free Guidance (CFG) Scale is a key parameter in generative AI workflows that determines how strongly the model adheres to the provided prompt. A higher CFG scale gives the prompt more influence at every sampling step, while a lower CFG scale loosens that constraint, so the output follows the prompt less closely and varies more from run to run.

CFG scale is especially important in image-to-image workflows where you want to retain specific details from the input image while introducing stylistic changes.
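
To make the role of the scale concrete, here is a minimal sketch of the guidance step as it is usually described: at each sampling step the model makes one prediction with the prompt and one without it (or with the negative prompt), and the scale sets how far the result is pushed toward the prompted one. The function name apply_cfg and the plain-list inputs are our own illustration, not ComfyUI's internal code.

```python
def apply_cfg(uncond, cond, cfg_scale):
    """Blend two noise predictions using classifier-free guidance.

    uncond: prediction without the prompt (or with the negative prompt)
    cond:   prediction with the positive prompt
    Both are flat lists of floats here purely for illustration.
    """
    if len(uncond) != len(cond):
        raise ValueError("predictions must have the same length")
    # Start from the unconditioned prediction and move toward
    # (or, for scales above 1, past) the prompted prediction.
    return [u + cfg_scale * (c - u) for u, c in zip(uncond, cond)]


if __name__ == "__main__":
    uncond = [0.0, 1.0]
    cond = [1.0, 3.0]
    for scale in (0.0, 1.0, 8.0):
        print(scale, apply_cfg(uncond, cond, scale))
```

A scale of 1.0 simply returns the prompted prediction, and anything higher exaggerates the difference between the two, which is why large values push the image so hard toward the prompt.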

Key Parameters in Image-to-Image Prompting

1. Crafting Effective Prompts

When creating prompts for image-to-image workflows, it’s essential to include both content and style elements. A well-crafted prompt ensures that the model generates images that align with your creative vision.

Example Prompts:

Content Prompt:
Majestic mountain peaks, lush green valleys, golden sunlight streaming through clouds, atmospheric mist, serene and tranquil.
Style Prompt:
Watercolor painting, soft brush strokes, vibrant yet natural color palette, artistic impression.
Negative Prompt:
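Buildings, vehicles, people, artificial objects.

Negative prompts are crucial for eliminating unwanted elements that could detract from the intended composition. List the unwanted items directly: the negative prompt already tells the model what to steer away from, so prefixing each item with “no” is unnecessary.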
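
As a rough sketch of how these three prompts map onto a workflow, the snippet below joins the content and style text into one positive prompt and builds two CLIPTextEncode nodes in ComfyUI's API (JSON) workflow format. The helper name build_prompt_nodes, the node ids "6" and "7", and the reference ["4", 1] (the CLIP output of a checkpoint loader we assume sits at id "4") are all illustrative assumptions; adjust them to match your own graph.

```python
def build_prompt_nodes(content, style, unwanted, clip_ref):
    """Return positive and negative CLIPTextEncode nodes as plain dicts.

    content, style: prompt fragments joined into the positive prompt
    unwanted:       list of things to steer away from (no "no" prefix)
    clip_ref:       [node_id, output_index] of the CLIP model (assumed)
    """
    parts = [p.strip().rstrip(".") for p in (content, style) if p.strip()]
    positive_text = ", ".join(parts)
    negative_text = ", ".join(unwanted)
    return {
        # Node ids are arbitrary strings; "6" and "7" are placeholders.
        "6": {"class_type": "CLIPTextEncode",
              "inputs": {"text": positive_text, "clip": clip_ref}},
        "7": {"class_type": "CLIPTextEncode",
              "inputs": {"text": negative_text, "clip": clip_ref}},
    }


if __name__ == "__main__":
    nodes = build_prompt_nodes(
        "Majestic mountain peaks, lush green valleys, atmospheric mist.",
        "Watercolor painting, soft brush strokes.",
        ["buildings", "vehicles", "people", "artificial objects"],
        ["4", 1],
    )
    for node_id, node in nodes.items():
        print(node_id, node["inputs"]["text"])
```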

2. Denoise Strength

The denoise strength parameter controls how much of the original image is overridden by the generative process.

High Denoise (e.g., 1.0): Completely transforms the input image based on the prompt.
Medium Denoise (e.g., 0.5): Balances the input image with stylistic modifications.
Low Denoise (e.g., 0.0–0.3): Retains most of the original image, applying minimal artistic changes.
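
One way to picture these ranges is to ask how much noise the input image receives before sampling begins. The sketch below uses a toy linear schedule running from 1.0 (pure noise) to 0.0 (clean image) and assumes the sampler stretches the schedule to steps / denoise entries and keeps only the tail, which is our understanding of how ComfyUI's KSampler handles denoise. Treat both the schedule and that behavior as assumptions; real schedules are not linear.

```python
def starting_noise_fraction(steps, denoise):
    """Estimate the noise level at which sampling starts (toy model).

    Returns 1.0 when the input is fully replaced by noise and 0.0 when
    it is left untouched. Uses a linear schedule for illustration only.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    if not 0.0 <= denoise <= 1.0:
        raise ValueError("denoise must be between 0.0 and 1.0")
    if denoise == 0.0:
        return 0.0  # nothing to sample; the image passes through
    total = int(steps / denoise)                      # stretched schedule
    schedule = [1.0 - i / total for i in range(total + 1)]
    kept = schedule[-(steps + 1):]                    # keep only the tail
    return kept[0]


if __name__ == "__main__":
    for d in (1.0, 0.5, 0.3, 0.0):
        print(d, round(starting_noise_fraction(20, d), 3))
```

Under this model, lowering denoise does not shorten the run; it starts the same number of steps from a less noisy version of the input, which is why more of the original survives.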

3. Scheduler and Sampler Settings

ComfyUI offers a variety of samplers and schedulers to fine-tune the image generation process:

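DPM++ 2M Sampler (dpmpp_2m in ComfyUI's sampler list): A multistep solver that typically gives clean, detailed results at moderate step counts.
Karras Scheduler (karras): A noise schedule that takes large steps while the image is still very noisy and progressively smaller steps as the noise level drops, which tends to hold up well at lower step counts.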
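
For readers curious about what the Karras schedule looks like numerically, here is a small sketch of the spacing formula from Karras et al. (2022). The default values for sigma_min, sigma_max, and rho are assumptions chosen to resemble typical Stable Diffusion settings, and the function is a standalone illustration rather than ComfyUI's own implementation (which, as far as we know, also appends a final zero).

```python
def karras_sigmas(n, sigma_min=0.03, sigma_max=14.6, rho=7.0):
    """Return n noise levels from sigma_max down to sigma_min.

    Interpolates linearly in sigma**(1/rho) space, so consecutive
    values sit closer together as the noise level gets smaller.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    max_inv = sigma_max ** (1.0 / rho)
    min_inv = sigma_min ** (1.0 / rho)
    return [
        (max_inv + (i / (n - 1)) * (min_inv - max_inv)) ** rho
        for i in range(n)
    ]


if __name__ == "__main__":
    for sigma in karras_sigmas(8):
        print(round(sigma, 4))
```

Printing the values shows large jumps at the noisy end and much finer spacing near the clean end, so most of the step budget goes to refining detail.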

Workflow Optimization

Step 1: Adjusting CFG Scale

Increasing the CFG scale amplifies the influence of the prompt, allowing for more stylized outputs. For example:

CFG Scale of 7–8: Produces balanced results that blend the prompt and the original image.
CFG Scale of 10–12: Strong prompt adherence, but may lead to oversaturated colors, harsh contrast, or hallucinated details.

When using high CFG scales, it’s essential to pair them with appropriate denoise values to avoid excessive deviations from the input image.

Step 2: Refining Denoise Settings

To achieve the desired balance between the original image and artistic style:

Start with a denoise value of 0.5 for moderate diffusion.
Gradually lower the denoise value in small increments (e.g., 0.45, 0.44) to refine details while preserving the original image.
Avoid values near 0.0, as they will produce results almost identical to the input image.
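
If you prefer to plan the values up front rather than adjust by hand, a small helper like the one below can list the denoise settings to try. The name denoise_ladder and its defaults are our own; the step size is a placeholder you can shrink (for example to 0.01) once you are close to the look you want.

```python
def denoise_ladder(start=0.5, stop=0.3, step=0.05):
    """List denoise values from start down to stop in equal steps."""
    if step <= 0:
        raise ValueError("step must be positive")
    if not 0.0 <= stop <= start <= 1.0:
        raise ValueError("expected 0.0 <= stop <= start <= 1.0")
    count = int(round((start - stop) / step)) + 1
    # Round to two decimals to hide floating-point drift.
    return [round(start - i * step, 2) for i in range(count)]


if __name__ == "__main__":
    for value in denoise_ladder():
        print(value)
```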

Step 3: Using Karras Scheduler for Efficiency

The Karras Scheduler allows for fewer steps while maintaining high-quality results. For instance:

30 Steps: Produces detailed, refined outputs.
8 Steps: Generates results faster, but with rougher, less predictable compositions.
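
Putting the three steps together, the settings discussed above all live on a single KSampler node. The sketch below builds that node as a Python dict in ComfyUI's API (JSON) workflow format. The input names follow the KSampler node as we know it, but the helper name ksampler_node and the linked node ids ("4" for the checkpoint loader, "6" and "7" for the prompt encoders, "10" for the VAE-encoded input image) are hypothetical and need to match your own workflow.

```python
import json


def ksampler_node(cfg=8.0, denoise=0.5, steps=30, seed=0):
    """Build a KSampler node dict for an image-to-image workflow."""
    if not 0.0 <= denoise <= 1.0:
        raise ValueError("denoise must be between 0.0 and 1.0")
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return {
        "class_type": "KSampler",
        "inputs": {
            "seed": seed,
            "steps": steps,
            "cfg": cfg,
            "sampler_name": "dpmpp_2m",
            "scheduler": "karras",
            "denoise": denoise,
            # Links are [source_node_id, output_index]; ids are assumed.
            "model": ["4", 0],
            "positive": ["6", 0],
            "negative": ["7", 0],
            "latent_image": ["10", 0],
        },
    }


if __name__ == "__main__":
    # A balanced starting point: moderate CFG, mid denoise, 30 steps.
    print(json.dumps(ksampler_node(cfg=8.0, denoise=0.5, steps=30), indent=2))
```

Changing only cfg, denoise, or steps between runs while keeping the seed fixed makes it easier to see what each parameter contributes.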

Example Results

Comparing Denoise Values

Using an image of a mountain landscape, the following results were observed:

Denoise 1.0: Fully stylized watercolor painting with no resemblance to the original image.

Denoise 0.5: Balanced artistic rendering with recognizable peaks.

Denoise 0.3: Almost identical to the original image, with subtle artistic enhancements.

Denoise 0.0: No diffusion is applied, so the output is indistinguishable from the input.

CFG Scale Impact

With a CFG Scale of 10 and a denoise of 0.4, the final image retained the artistic style while closely matching the original photo’s details. However, pushing both settings to their extremes (e.g., CFG 12 with denoise 1.0) let the prompt overpower the input entirely, producing unrealistic results.

Tips for Fine-Tuning

Experiment with CFG Scale:
- Start with a moderate value (e.g., 8) and adjust based on your desired output.
- Avoid excessively high values unless the prompt requires strong influence.
Use Negative Prompts:
- Eliminate unwanted elements like buildings, vehicles, or artificial objects.
Optimize Steps with Karras Scheduler:
- Use fewer steps (e.g., 8) for faster results.
- Increase steps (e.g., 30) for detailed outputs.
Iterate Denoise Values:
- Refine denoise settings in small increments (e.g., 0.5 → 0.45 → 0.44) for the best balance.

Conclusion

Prompting and CFG scale are powerful tools in image-to-image workflows, enabling users to control the artistic style and fidelity of generated images. By carefully adjusting parameters like denoise strength, CFG scale, and sampler settings, users can create stunning transformations that respect the original image while incorporating creative enhancements.

If you’re looking to dive deeper into image-to-image workflows, revisit our previous article and experiment with the techniques outlined here to refine your results further.