Learning to Stylize by Learning to Destylize

A Scalable Paradigm for Supervised Style Transfer

Ye Wang¹, Zili Yi², Yibo Zhang¹,³, Peng Zheng¹,³, Xuping Xie¹, Jiang Lin², Yijun Li⁴, Yilin Wang*†⁴, and Rui Ma*¹,⁵

¹ Jilin University

² Nanjing University

³ Shanghai Innovation Institute

⁴ Adobe

⁵ Engineering Research Center of Knowledge-Driven Human-Machine Intelligence, MOE, China

* Corresponding authors

† Project lead

Main Paper | Hugging Face Demo | GitHub | Hugging Face Dataset

Overview

Figure 1. Diverse stylization results of our method across various artistic styles (from the paper).

This project introduces a scalable paradigm for supervised style transfer by reversing the stylization process.

Instead of directly learning to stylize, we propose to learn to destylize, i.e., removing stylistic elements from artistic images to recover their natural counterparts. This enables the construction of authentic, pixel-aligned training pairs at scale.

Key insight

Authentic supervision from real artworks is critical for high-fidelity style transfer.

Motivation

Style transfer aims to render an image in a target artistic style while preserving its semantic content. However, it remains fundamentally ill-posed due to the absence of definitive ground-truth stylization pairs.

1) Existing methods often rely on handcrafted losses, which are simple to deploy but usually provide only weak, indirect supervision for style faithfulness.

2) Feature-statistics-based transfer can capture global appearance trends as a lightweight prior, yet it frequently produces inaccurate style representations and local artifacts.

3) Data-centric pipelines with pseudo stylized pairs offer larger training scale, but they still suffer from content leakage and unreliable pseudo-supervision signals.

Even recent data-centric methods are still bounded by the quality ceiling of pseudo supervision generated by existing models.

Key idea: learn to stylize by learning to destylize

Figure 2. Stylization-based vs. destylization-based data generation pipelines.

Instead of synthesizing stylized images, we start from real artworks and remove style to recover natural content.

1) Input: destylized image (content).

2) Target: original artistic image (style).

3) This formulation yields authentic supervision, improved generalization, and elimination of pseudo-supervision artifacts.
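The pairing above can be sketched in a few lines. This is a minimal illustration, not the authors' code: `destylize`, `make_training_pair`, and `TrainingPair` are hypothetical names, and images are modeled as nested lists of pixel values so the sketch has no dependencies. The key point is that the target is always the real artwork, never a generated stylization.

```python
from dataclasses import dataclass
from typing import Callable, List

Image = List[List[int]]  # toy grayscale image: rows of pixel values


@dataclass(frozen=True)
class TrainingPair:
    content: Image  # model input: the destylized image
    target: Image   # supervision target: the original, real artwork


def make_training_pair(artwork: Image, destylize: Callable[[Image], Image]) -> TrainingPair:
    """Build a supervised pair from a real artwork (hypothetical helper)."""
    content = destylize(artwork)
    # The pair must be pixel-aligned: same height and same width per row.
    if len(content) != len(artwork) or any(
        len(a) != len(b) for a, b in zip(content, artwork)
    ):
        raise ValueError("destylized image is not pixel-aligned with the artwork")
    return TrainingPair(content=content, target=artwork)
```

For example, with a toy `destylize` that simply clamps bright pixels, `make_training_pair([[10, 200], [255, 0]], lambda img: [[min(p, 128) for p in row] for row in img])` keeps the artwork as the target and uses the clamped image as the input.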

Method: DeStylePipe

Figure 3. Overview of the DeStylePipe framework.

We propose DeStylePipe, a progressive multi-stage destylization framework.

Stage 1: Global general destylization

Universal instruction-based destylization
Handles simple styles efficiently

Stage 2: Category-wise instruction adaptation

Style-aware prompt rewriting
Improved performance on complex styles

Stage 3: Specialized model adaptation

Fine-tuning for difficult styles (e.g., clay, origami)

At each stage, outputs are filtered for quality control.
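The progressive structure can be read as a cascade: each artwork is first handled by the cheapest stage and is escalated to the next stage only when the quality filter rejects the output. The sketch below assumes this escalation reading; the stage callables, the `passes_filter` predicate, and the category-to-instruction table are hypothetical placeholders, not the paper's actual models or prompts.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A stage maps (artwork, style_category) to a destylized output of any type.
Stage = Callable[[object, str], object]

# Hypothetical category-wise instructions for Stage 2 (illustrative wording only).
CATEGORY_INSTRUCTIONS: Dict[str, str] = {
    "oil painting": "Remove brushstrokes and canvas texture; render as a natural photo.",
    "sketch": "Convert line work into a realistic photo with natural colors and shading.",
}
GENERIC_INSTRUCTION = "Remove the artistic style and render the scene as a natural photo."


def instruction_for(category: str) -> str:
    """Style-aware prompt rewriting; unknown categories fall back to the generic prompt."""
    return CATEGORY_INSTRUCTIONS.get(category, GENERIC_INSTRUCTION)


def run_cascade(
    artwork: object,
    category: str,
    stages: List[Tuple[str, Stage]],
    passes_filter: Callable[[object, object], bool],
) -> Optional[Tuple[str, object]]:
    """Try stages in order (global, category-wise, specialized).

    Returns (stage_name, output) for the first output accepted by the filter,
    or None if every stage is rejected and the sample is discarded.
    """
    for name, stage in stages:
        output = stage(artwork, category)
        if passes_filter(artwork, output):  # quality control after every stage
            return name, output
    return None
```

A design consequence of this reading is cost control: simple styles exit after Stage 1, and only the hard cases (for example clay or origami) ever reach the fine-tuned Stage 3 model.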

DestyleCoT-Filter

Figure 4. DestyleCoT-Filter with structured reasoning for evaluation.

We introduce DestyleCoT-Filter, a Chain-of-Thought-based filtering mechanism.

Content preservation

Region-level object identification
Object consistency checking

Style removal

Style attribute decomposition
Residual style analysis

Filtering rule

Content score ≥ 4
Style removal score ≥ 4

This ensures high-quality supervision data.
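The filtering rule reduces to a predicate over the two scores produced by the structured reasoning. In the sketch below, only the two thresholds (≥ 4) come from the rule above, which is read here as requiring both scores to pass. The JSON output format and the `content_score` / `style_removal_score` field names are hypothetical assumptions about how the evaluator reports its final answer.

```python
import json
from dataclasses import dataclass

CONTENT_THRESHOLD = 4        # Content score ≥ 4
STYLE_REMOVAL_THRESHOLD = 4  # Style removal score ≥ 4


@dataclass(frozen=True)
class FilterVerdict:
    content_score: int        # from object identification + consistency checking
    style_removal_score: int  # from attribute decomposition + residual style analysis


def parse_verdict(raw: str) -> FilterVerdict:
    """Parse the evaluator's final answer (hypothetical JSON format)."""
    data = json.loads(raw)
    return FilterVerdict(int(data["content_score"]), int(data["style_removal_score"]))


def keep_sample(verdict: FilterVerdict) -> bool:
    """Keep a destylized sample only if both criteria meet their thresholds."""
    return (
        verdict.content_score >= CONTENT_THRESHOLD
        and verdict.style_removal_score >= STYLE_REMOVAL_THRESHOLD
    )
```

Separating parsing from the decision keeps the thresholds in one place, so the reasoning prompt can change without touching the acceptance rule.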

Dataset: DeStyle-350K

Figure 5. Representative samples of DeStyle-350K.

We construct DeStyle-350K, a large-scale dataset:

350K triplets
500+ styles
100+ content categories

Each triplet:

destylized image (content)
reference image
style image (target)

Unlike synthetic alternatives, this dataset provides authentic supervision.
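A single triplet can be represented as a small record. The schema below is hypothetical: the field names, the JSON Lines manifest format, the file paths, and the assumption that the reference image acts as the style reference are illustrative choices, not the released dataset's documented layout.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class DeStyleTriplet:
    content_path: str    # destylized image (content)
    reference_path: str  # reference image (assumed here to act as the style reference)
    target_path: str     # original style image (supervision target)
    style: str           # style label


def read_manifest(lines: Iterable[str]) -> Iterator[DeStyleTriplet]:
    """Yield triplets from a hypothetical JSON Lines manifest, skipping blank lines."""
    for line in lines:
        if not line.strip():
            continue
        record = json.loads(line)
        yield DeStyleTriplet(
            content_path=record["content"],
            reference_path=record["reference"],
            target_path=record["target"],
            style=record["style"],
        )
```

Reading the manifest lazily as a generator avoids holding all 350K records in memory at once.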

Benchmark: BCS-Bench

Figure 6. Content and style distribution of BCS-Bench.

We propose BCS-Bench, featuring:

81 styles × 60 contents
7 semantic categories
1024 resolution
4,860 pairs

It ensures balanced evaluation across content and style diversity.
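The benchmark size follows from pairing every style with every content image: 81 × 60 = 4,860 pairs, and each style is evaluated on exactly the same 60 contents. A minimal sketch of this full cross product, using placeholder identifiers (`style_XX` and `content_XX` are hypothetical names):

```python
from itertools import product
from typing import List, Tuple


def build_grid(styles: List[str], contents: List[str]) -> List[Tuple[str, str]]:
    """Pair every style with every content, so no style or content is over-represented."""
    return list(product(styles, contents))


styles = [f"style_{i:02d}" for i in range(81)]
contents = [f"content_{j:02d}" for j in range(60)]
pairs = build_grid(styles, contents)
print(len(pairs))  # 4860
```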

Destylization results

Figure 7. High-quality destylization with preserved fine details.

Our method:

removes diverse artistic styles
preserves fine-grained structures
maintains pixel-level alignment

Comparison with style transfer methods

Figure 9. Comparison with state-of-the-art style transfer methods.

Our method:

avoids content leakage
preserves identity and structure
achieves stronger style fidelity

Comparison with image editing models

Figure 10. Comparison with image editing models.

Compared with existing image editing models, our method shows:

better content preservation
less semantic leakage
more accurate style transfer

Quantitative results

Table 1. Evaluation of destylization quality.

Table 2. Comparison between OmniStyle150K and our DeStyle-350K.

Table 3. Comparison of existing style transfer benchmarks and our proposed BCS-Bench.

Table 4. Quantitative comparison of style transfer methods across multiple metrics.

Table 5. Quantitative comparison of image editing methods across multiple metrics.

Table 6. Additional comparison under different settings.

Robustness

Figure 12. Performance in challenging scenarios.

Our method maintains:

structural consistency in complex scenes
robustness to occlusion and fine details

Ablation: multi-stage refinement

Figure 8. Progressive improvement across destylization stages.

Stage 1: incomplete removal
Stage 2: improved results
Stage 3: fully natural outputs for complex styles

Conclusion

We introduce a new paradigm for supervised style transfer:

Learn stylization via destylization
Build authentic large-scale supervision
Achieve superior stylization quality

This work demonstrates that destylization is a scalable and reliable route to supervised style transfer.

More results

Result 1

Result 2

Result 3

Result 4

Result 5

Result 6

Result 7

Result 8

Result 9

Result 10