OmniStyle2

Scalable and High Quality Artistic Style Transfer Data Generation via Destylization

1 Jilin University   2 Nanjing University   3 Shanghai Innovation Institute
4 Adobe   5 Engineering Research Center of Knowledge-Driven Human-Machine Intelligence, MOE, China
*Corresponding authors   Project lead

Comparison of stylization results between OmniStyle2 and representative closed- and open-source image editing models across diverse styles, including 3D origami art, watercolor, poster design, pill-and-candy mosaic, and artistic painting. Please zoom in for details and view it in color.

Abstract

OmniStyle2 introduces a novel approach to artistic style transfer by reframing it as a data problem. Our key insight is destylization, reversing style transfer by removing stylistic elements from artworks to recover natural, style-free counterparts. This yields DST-100K, a large-scale dataset that provides authentic supervision signals by aligning real artistic styles with their underlying content. To build DST-100K, we develop (1) DST, a text-guided destylization model that reconstructs style-free content, and (2) DST-Filter, a multi-stage evaluation model that employs Chain-of-Thought reasoning to automatically discard low-quality pairs while ensuring content fidelity and style accuracy. Leveraging DST-100K, we train OmniStyle2, a simple feed-forward model based on FLUX.1-dev. Despite its simplicity, OmniStyle2 consistently surpasses state-of-the-art methods across both qualitative and quantitative benchmarks. Our results demonstrate that scalable data generation via destylization provides a reliable supervision paradigm, overcoming the fundamental challenge posed by the lack of ground-truth data in artistic style transfer.

Why Destylization?

Destylization vs. Stylization

(a) Stylization-based data generation pipeline. (b) Destylization-based data generation pipeline (ours). Our method provides authentic supervision with high-quality, style-faithful data, whereas stylization-based pipelines rely on pseudo-supervision that is often artifact-prone and unfaithful to the target style.
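The difference between the two pipelines can be illustrated with a minimal Python sketch. It is not the authors' code: images are represented by file paths, and `TrainingPair`, `stylization_pair`, `destylization_pair`, and the `stylize`/`destylize` callables are hypothetical names introduced only for illustration. It shows only which side of each pair is synthesized.

```python
from dataclasses import dataclass
from typing import Callable

Image = str  # assumption: an image is represented by its file path


@dataclass(frozen=True)
class TrainingPair:
    content: Image                 # input to the style transfer model
    target: Image                  # image the model is trained to produce
    target_is_real_artwork: bool   # True only when the target is a real artwork


def stylization_pair(photo: Image, style: Image,
                     stylize: Callable[[Image, Image], Image]) -> TrainingPair:
    # (a) The target is synthesized by an existing style transfer model,
    # so it acts as pseudo-supervision and inherits that model's artifacts.
    return TrainingPair(content=photo, target=stylize(photo, style),
                        target_is_real_artwork=False)


def destylization_pair(artwork: Image,
                       destylize: Callable[[Image], Image]) -> TrainingPair:
    # (b) The content image is synthesized by removing style from the artwork;
    # the target is the real artwork itself.
    return TrainingPair(content=destylize(artwork), target=artwork,
                        target_is_real_artwork=True)
```

In this sketch the only synthetic image in case (b) is the content input, which is why the supervision signal on the output side stays authentic.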

Destylization

DST: Text-Guided Destylization

(a) Destylization dataset construction: we use high-resolution images from HQ-50K and FFHQ as content images, covering six categories: humans, animals, plants, objects, scenes, and architecture. These images are stylized by four models, and captions are generated using InternVL2.5-7B. This yields <stylized image, content image, caption> triplets. (b) The architecture of the DST model.
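The triplet construction in (a) might be organized roughly as follows. This is a hedged sketch rather than the released pipeline: `DestylizationTriplet`, `build_triplets`, `stylizers`, and `captioner` are hypothetical names, the callables stand in for the four stylization models and InternVL2.5-7B, and the sketch assumes that every content image is passed through every stylizer and that the caption describes the content image. The source does not confirm either assumption.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class DestylizationTriplet:
    stylized: str  # path of the stylized image (input to the destylization model)
    content: str   # path of the original photo (reconstruction target)
    caption: str   # text description used as guidance


def build_triplets(content_images: Iterable[str],
                   stylizers: List[Callable[[str], str]],
                   captioner: Callable[[str], str]) -> List[DestylizationTriplet]:
    triplets = []
    for path in content_images:
        caption = captioner(path)  # assumption: one caption per content image
        for stylize in stylizers:  # assumption: each stylizer is applied to each image
            triplets.append(DestylizationTriplet(stylize(path), path, caption))
    return triplets
```

A destylization model trained on such triplets would take `stylized` and `caption` as input and be supervised with `content`.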

Image Pool

(a) Style image collection and (b) text-guided destylization pipeline.

DST-Filter

Multi-Stage Evaluation Pipeline

The pipeline of DST-Filter. DST-Filter assesses each <style, destylized> pair from two aspects: content preservation and style discrepancy, using GPT-4o with region-level and attribute-level Chain-of-Thought reasoning.
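The two-aspect decision can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the actual DST-Filter: the `judge` callable is a hypothetical stand-in for the GPT-4o Chain-of-Thought evaluation (no real API call is made), and the boolean `Verdict` fields and the rule that both checks must pass are assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, str]  # (style image path, destylized image path)


@dataclass(frozen=True)
class Verdict:
    content_preserved: bool  # destylized image keeps the artwork's content
    style_discrepant: bool   # destylized image no longer carries the artwork's style


def keep(verdict: Verdict) -> bool:
    # Assumption: a pair survives only if it passes both aspects.
    return verdict.content_preserved and verdict.style_discrepant


def filter_pairs(pairs: Iterable[Pair],
                 judge: Callable[[str, str], Verdict]) -> List[Pair]:
    # `judge` is a placeholder for the multimodal evaluator.
    return [pair for pair in pairs if keep(judge(*pair))]
```

Requiring both conditions reflects the goal stated in the abstract of discarding low-quality pairs while ensuring content fidelity and style accuracy.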

Dataset Overview

DST-100K Dataset Statistics

100K image triplets
669 artists
117 art movements
65 digital styles
1K resolution
Overview of the DST-100K dataset.

Quantitative Results

Quantitative comparison of different style transfer methods

Quantitative comparison with image editing methods

Qualitative Comparison

Qualitative comparison with different style transfer methods

Qualitative comparison with different image editing models

More Results

Diverse Style Transfer Results

Our method produces stylized results across a broad range of style categories, including 2D styles such as flat design, PS1 game style, cartoon, line art, illustration, and classic artworks, as well as 3D styles such as origami art, 3D voxel art, and 3D low-poly rendering.