
Learning to Stylize by Learning to Destylize

A Scalable Paradigm for Supervised Style Transfer

Ye Wang1, Zili Yi2, Yibo Zhang1,3, Peng Zheng1,3, Xuping Xie1, Jiang Lin2, Yijun Li4, Yilin Wang*†4, and Rui Ma*1,5

1 Jilin University

2 Nanjing University

3 Shanghai Innovation Institute

4 Adobe

5 Engineering Research Center of Knowledge-Driven Human-Machine Intelligence, MOE, China

* Corresponding authors

† Project lead

Main Paper · Supplementary Material · Hugging Face Demo · GitHub · Hugging Face Dataset

Overview

Figure 1. Diverse stylization results of our method across various artistic styles (from the paper).

This project introduces a scalable paradigm for supervised style transfer by reversing the stylization process.

Instead of directly learning to stylize, we propose to learn to destylize, i.e., removing stylistic elements from artistic images to recover their natural counterparts. This enables the construction of authentic, pixel-aligned training pairs at scale.

Key insight

Authentic supervision from real artworks is critical for high-fidelity style transfer.

Motivation

Style transfer aims to render an image in a target artistic style while preserving its semantic content. However, it remains fundamentally ill-posed due to the absence of definitive ground-truth stylization pairs.

1) Loss-driven methods rely on handcrafted objectives that are simple to deploy but provide only weak, indirect supervision for style faithfulness.

2) Feature-statistics-based transfer captures global appearance trends as a lightweight prior, yet it often yields inaccurate style representations and local artifacts.

3) Data-centric pipelines built on pseudo stylized pairs scale up training, but they still suffer from content leakage and unreliable pseudo-supervision signals.

Even recent data-centric methods remain bounded by the quality ceiling of the pseudo supervision generated by existing models.

Key idea: learn to stylize via destylization

Figure 2. Stylization-based vs. destylization-based data generation pipelines.

Instead of synthesizing stylized images, we start from real artworks and remove style to recover natural content.

1) Input: the destylized image (content).

2) Target: the original artistic image (style).

3) Result: authentic supervision, improved generalization, and no pseudo-supervision artifacts.
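To make the reversal concrete, the following is a minimal sketch of how a single pixel-aligned training pair could be assembled. The `destylize` callable stands in for any destylization model and is a hypothetical interface, not the released API.

```python
# A minimal sketch of destylization-based pair construction. The `destylize`
# interface is hypothetical; any model mapping artwork -> natural image fits.
from dataclasses import dataclass
from typing import Callable

from PIL import Image


@dataclass
class TrainingPair:
    content: Image.Image  # input: destylized (natural) image
    target: Image.Image   # supervision target: the original artwork


def build_pair(artwork: Image.Image,
               destylize: Callable[[Image.Image], Image.Image]) -> TrainingPair:
    """Reverse the usual pipeline: remove style from a real artwork so that
    (destylized input, original artwork) is pixel-aligned by construction."""
    natural = destylize(artwork)
    return TrainingPair(content=natural, target=artwork)
```

Because the target is a real artwork rather than a synthesized stylization, the supervision signal never inherits the artifacts of a pseudo-labeling model.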

Method: DeStylePipe

Figure 3. Overview of the DeStylePipe framework.

We propose DeStylePipe, a progressive multi-stage destylization framework.

Stage 1: Global general destylization

Stage 2: Category-wise instruction adaptation

Stage 3: Specialized model adaptation

At each stage, outputs are filtered for quality control.
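As a rough illustration, here is one plausible reading of this progressive flow in code. The `stages` and `passes_filter` names are placeholders for the three destylization stages and the per-stage quality filter, not the actual implementation.

```python
# A schematic sketch of DeStylePipe's progressive flow under one plausible
# reading: each stage refines the previous output, and samples that fail
# quality filtering at any stage are dropped. All names are placeholders.
from typing import Callable, Iterable, List, Tuple

Image = object  # stand-in type for an image


def run_pipeline(artworks: Iterable[Image],
                 stages: List[Callable[[Image], Image]],
                 passes_filter: Callable[[Image, Image], bool]
                 ) -> List[Tuple[Image, Image]]:
    """Return (destylized input, artwork target) pairs that survive filtering."""
    pairs = []
    for art in artworks:
        x, ok = art, True
        for stage in stages:  # Stage 1 -> 2 -> 3, progressively specialized
            x = stage(x)
            if not passes_filter(art, x):  # judge against the original artwork
                ok = False
                break
        if ok:
            pairs.append((x, art))
    return pairs
```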

DestyleCoT-Filter

Figure 4. DestyleCoT-Filter with structured reasoning for evaluation.

We introduce DestyleCoT-Filter, a Chain-of-Thought-based filtering mechanism that reasons over each destylized output along two axes before applying a filtering rule:

1) Content preservation: whether the semantic content of the original artwork remains intact.

2) Style removal: whether the stylistic elements have been fully stripped away.

Outputs that fail the filtering rule are discarded, which ensures high-quality supervision data.
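A minimal sketch of what such a filtering rule could look like, assuming a judge that returns structured scores; `CoTVerdict`, `keep_sample`, and the thresholds are illustrative assumptions, not the paper's exact criteria.

```python
# A minimal sketch of a CoT-based filtering rule. The score fields and
# thresholds are illustrative assumptions, not the paper's exact criteria.
from dataclasses import dataclass


@dataclass
class CoTVerdict:
    reasoning: str               # the judge's chain-of-thought explanation
    content_preservation: float  # in [0, 1]: is the semantic content intact?
    style_removal: float         # in [0, 1]: are stylistic elements gone?


def keep_sample(verdict: CoTVerdict,
                content_thresh: float = 0.8,
                style_thresh: float = 0.8) -> bool:
    """Accept a destylized output only if it both preserves content and
    removes style beyond the chosen thresholds."""
    return (verdict.content_preservation >= content_thresh
            and verdict.style_removal >= style_thresh)
```

Gating on both axes at once is what keeps the two failure modes apart: a high content score alone would admit outputs that still look stylized, while a high style-removal score alone would admit outputs that lost the content.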

Dataset: DeStyle-350K

Figure 5. Representative samples of DeStyle-350K.

We construct DeStyle-350K, a large-scale dataset of destylization triplets. Compared to synthetic alternatives, it provides authentic supervision drawn from real artworks.

Benchmark: BCS-Bench

Figure 6. Content and style distribution of BCS-Bench.

We propose BCS-Bench, a benchmark designed to ensure balanced evaluation across content and style diversity.
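To illustrate what balanced coverage could mean in practice, here is a hypothetical sketch of stratified sampling over content and style categories; the record layout and per-cell quota are assumptions, not the benchmark's actual construction procedure.

```python
# A hypothetical sketch of balanced benchmark construction: draw an equal
# quota of images from every (content category, style category) cell. The
# record layout and quota are illustrative, not from the paper.
import random
from collections import defaultdict
from typing import Iterable, List, Tuple


def balanced_sample(records: Iterable[Tuple[str, str, str]],
                    per_cell: int = 5, seed: int = 0) -> List[str]:
    """records: (content_category, style_category, image_path) tuples."""
    rng = random.Random(seed)
    cells = defaultdict(list)
    for content, style, path in records:
        cells[(content, style)].append(path)
    bench = []
    for _, paths in sorted(cells.items()):
        rng.shuffle(paths)
        bench.extend(paths[:per_cell])  # equal quota per cell -> balance
    return bench
```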

Destylization results

Figure 7. High-quality destylization with preserved fine details.

Our method achieves high-quality destylization while preserving fine details (Figure 7).

Comparison with style transfer methods

Figure 9. Comparison with state-of-the-art style transfer methods.

As shown in Figure 9, our method compares favorably with state-of-the-art style transfer methods in both style fidelity and content preservation.

Comparison with image editing models

Figure 10. Comparison with image editing models.

Compared to existing image editing models, our approach yields more faithful stylization (Figure 10).

Quantitative results

Table 1. Evaluation of destylization quality.
Table 2. Comparison between OmniStyle150K and our DeStyle-350K.
Table 3. Comparison of existing style transfer benchmarks and our proposed BCS-Bench.
Table 4. Quantitative comparison of style transfer methods across multiple metrics.
Table 5. Quantitative comparison of image editing methods across multiple metrics.
Table 6. Additional comparison under different settings.

Robustness

Figure 12. Performance in challenging scenarios.

Our method maintains robust performance in challenging scenarios (Figure 12).

Ablation: multi-stage refinement

Figure 8. Progressive improvement across destylization stages.

Conclusion

We introduce a new paradigm for supervised style transfer: learning to stylize by learning to destylize.

This work demonstrates that destylization is a scalable and reliable solution for style transfer.

More results

Results 1–10. Additional qualitative results.