Neta Art XL

Name: Neta Art XL - v1.0
Author: R_CHET

CHECKPOINT

Reposted

⚠️ Attention: Please use CFG = 11 as default for testing.

For full release note: https://nieta-art.feishu.cn/wiki/PpwqwVDzjiNE5kkUhRtcEsn6nmh

I. Overview

Introducing Neta Art XL V1.0, the easiest-to-use SDXL anime model so far.
Keywords: best character coverage, vivid storytelling, diverse styles, stable anatomy.
Main motivations:
- Better stability and anatomy for character-driven visual storytelling:
  - An ordered prompt guide that makes prompts easier for the model to follow;
  - A good balance between broader knowledge and stability.
- Maintain a high aesthetic ceiling across versatile anime art styles, while keeping baseline outputs appealing for general users.
- Fewer LoRAs needed for characters / styles / artists, so static model acceleration techniques can be used more effectively.

Character coverage: refer to both the Animagine XL 3.1 (A3.1) character lists and the release note.

Prompting Guide

To avoid ambiguity in text prompts and to leave room for very complicated scenes such as multi-character ones, we found that enforcing an order in prompts leads to better instruction-following behavior (an idea learned from NAI3 / Animagine3 / AIDXL). Specifically, we use the following order in Neta Art XL:

Tag Order: subject (1boy / 1girl) -> character (a girl named frieren from sousou no frieren series) -> Artist trigger (by xxx) -> race (elf) -> composition (cowboy shot) -> style (impasto style) -> theme (fantasy theme) -> main environment (in the forest, at day) -> background (gradient background) -> action (sitting on ground) -> expression (is expressionless) -> main characteristics (white hair) -> body characteristics (twintails, green eyes, parted lip) -> clothing (wearing a white dress) -> clothing accessories (frills) -> other items (a cat) -> secondary environment (grass, sunshine) -> aesthetics (beautiful color, detailed, aesthetic) -> quality ((best quality:1.3))
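
The tag order above can be sketched as a small prompt builder. This is only an illustration: the slot names and the build_prompt helper are hypothetical labels invented here, and joining slots with commas is an assumption based on how the negative prompt is written.

```python
# Hypothetical slot names mirroring the tag order described above.
TAG_ORDER = [
    "subject", "character", "artist", "race", "composition", "style",
    "theme", "main_environment", "background", "action", "expression",
    "main_characteristics", "body_characteristics", "clothing",
    "clothing_accessories", "other_items", "secondary_environment",
    "aesthetics", "quality",
]


def build_prompt(**slots):
    """Join the provided slots in the recommended order, skipping empty ones."""
    unknown = set(slots) - set(TAG_ORDER)
    if unknown:
        raise ValueError(f"unknown slots: {sorted(unknown)}")
    parts = [
        slots[name].strip()
        for name in TAG_ORDER
        if slots.get(name, "").strip()
    ]
    # Comma-separated output is an assumption, not a documented requirement.
    return ", ".join(parts)


# Slots may be passed in any order; the builder re-orders them.
prompt = build_prompt(
    quality="(best quality:1.3)",
    subject="1girl",
    character="a girl named frieren from sousou no frieren series",
    race="elf",
    composition="cowboy shot",
    clothing="wearing a white dress",
)
print(prompt)
```

Keeping the order in one list means a prompt written out of order is still emitted in the recommended sequence, and unused slots simply disappear.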

Negative prompts: (worst quality:1.3), low quality, lowres, messy, abstract, ugly, disfigured, bad anatomy, draft, deformed hands, fused fingers, signature, text, multi views

Sampler: Euler a with the normal schedule as default; 28+ steps recommended.

An additional merit of Neta Art XL is that it supports a very wide range of CFG values (5 - 20, compared to 7 - 9 for previous models). We found empirically that a higher CFG leads to more detail and higher contrast; in general, CFG 9 - 14 (important!) gives the best results.
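
As a rough sketch, the recommended settings can be collected in one place and checked before generating. The GenerationSettings class and check_settings function are hypothetical helpers, not part of any UI or library; only the numbers and the negative prompt come from this guide.

```python
from dataclasses import dataclass

# Negative prompt copied from the prompting guide above.
NEGATIVE_PROMPT = (
    "(worst quality:1.3), low quality, lowres, messy, abstract, ugly, "
    "disfigured, bad anatomy, draft, deformed hands, fused fingers, "
    "signature, text, multi views"
)


@dataclass
class GenerationSettings:
    """Hypothetical container for the defaults recommended in this guide."""
    cfg_scale: float = 11.0   # suggested default for testing
    steps: int = 28           # 28+ steps recommended
    sampler: str = "Euler a"
    negative_prompt: str = NEGATIVE_PROMPT


def check_settings(settings):
    """Return a list of warnings for values outside the ranges stated above."""
    warnings = []
    if not 5 <= settings.cfg_scale <= 20:
        warnings.append("CFG outside the supported 5-20 range")
    elif not 9 <= settings.cfg_scale <= 14:
        warnings.append("CFG outside the recommended 9-14 range")
    if settings.steps < 28:
        warnings.append("fewer than 28 steps")
    return warnings


print(check_settings(GenerationSettings()))               # defaults pass
print(check_settings(GenerationSettings(cfg_scale=7.0)))  # typical older default
```

The values would still need to be copied into whichever front end or pipeline is used; the check only flags settings that fall outside the ranges given in this section.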

II. Highlight: Style Versatility

We carefully selected 13 style keys that have good orthogonality and are commonly used in many scenarios, as justified by usage data from Nieta AI (30M+ generations).

Having orthogonal styles means each style is effectively different from the others, allowing you to easily combine and create new styles without interference.

Neta Art XL also includes a long list of artist styles, activated through a "by xxx" clause.

Please refer to https://civitai.com/models/124189/anime-illust-diffusion-xl for a complete list of supported artists.

III. Expression, Posing, and Camera Angles

Compared to other models, Neta excels at maintaining stability, prompt-following ability, and anatomical accuracy, even with challenging poses or camera angles that cause degradation in other models. We compared our results against the second-best candidate models to highlight Neta's advantages in these areas.

IV. Multi-Character Scenes

Neta Art XL demonstrates good stability in multi-character scenes.

V. Text & Typography

Neta Art XL can render poster-like text with a good success rate.

VI. Training

Data annotation combining multiple sources (original prompts, CogVLM captions, WaifuTagger tags).
Post-processing techniques such as semantic deduplication and hierarchical tag organization:
- Semantic Deduplication: Removes redundant tags by detecting when a higher-level tag (e.g. very long hair) semantically covers a lower-level one (e.g. long hair).
- Tag Layering Algorithm: Tags are organized into hierarchical layers based on their priority and related semantics (e.g. by wlop influences the styling of the whole picture, while frills influences only a small fraction of it). More dominant tags are placed in higher layers to prioritize their influence during training.
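
The two post-processing steps above might look roughly like the sketch below. The COVERS and LAYER tables are hypothetical, hand-written stand-ins; the actual rules, tag lists, and layer assignments used in training are not given here. Treating "higher layers" as "earlier in the caption" is also an assumption.

```python
# Hypothetical coverage relation: key tag semantically covers the tags in its set.
COVERS = {
    "very long hair": {"long hair"},
}

# Hypothetical layer numbers: lower number = more dominant tag.
LAYER = {
    "by wlop": 0,   # affects the styling of the whole picture
    "frills": 2,    # affects only a small part of the picture
}
DEFAULT_LAYER = 1


def semantic_dedup(tags):
    """Drop tags that are covered by a higher-level tag present in the list."""
    covered = set()
    for tag in tags:
        covered |= COVERS.get(tag, set())
    return [tag for tag in tags if tag not in covered]


def layer_tags(tags):
    """Order tags by layer; sorted() is stable, so ties keep their input order."""
    return sorted(tags, key=lambda tag: LAYER.get(tag, DEFAULT_LAYER))


raw = ["frills", "long hair", "very long hair", "white hair", "by wlop"]
print(layer_tags(semantic_dedup(raw)))
```

A real pipeline would need a much larger coverage relation, possibly derived from tag hierarchies rather than written by hand.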

Dataset management tool from https://github.com/Eugeoter/waifuset

Using high-quality regularization data from AIDXL: regularization datasets with "best" and "amazing" quality ratings from AIDXL. These datasets are manually selected and come with detailed annotations and natural-language descriptions.
Finetuning on more knowledgeable base models such as AAM, blended with the character knowledge of AnimagineXL 3.1.

Challenges Faced:

Imbalance in learning different styles
Poor generalization for some styles to diverse scenes
Lack of details/texture in generations
Trigger word overlap with base model knowledge

Solutions Explored:

Data reweighting to balance style learning, supplemented with more diverse data per style.
Tuning sampling hyperparameters such as minimum gamma and rectified flow. Rectified Flow is a training parameter that increases the sampling frequency of the middle timesteps but reduces the weight given to learning small noise levels at the low timesteps. This technique improves the model's ability to reproduce styles but requires a knowledge-rich base model.
Randomizing or dropping out trigger words during training.
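
Two of the ideas above can be sketched in a few lines. Both functions are illustrative assumptions rather than the training code actually used: the logit-normal sampler is one common way to favor middle timesteps in rectified-flow style training, and the dropout probability is a made-up value.

```python
import math
import random


def sample_timestep(rng, num_train_timesteps=1000, mean=0.0, std=1.0):
    """Logit-normal sampling: the sigmoid of a Gaussian draw puts most of
    the probability mass on middle timesteps and less on the two ends."""
    u = 1.0 / (1.0 + math.exp(-rng.gauss(mean, std)))
    return min(int(u * num_train_timesteps), num_train_timesteps - 1)


def drop_trigger_words(tags, triggers, rng, drop_prob=0.1):
    """Drop each trigger tag independently with probability drop_prob.
    Non-trigger tags are always kept. drop_prob=0.1 is a hypothetical value."""
    return [
        tag for tag in tags
        if tag not in triggers or rng.random() >= drop_prob
    ]


rng = random.Random(0)
timesteps = [sample_timestep(rng) for _ in range(10_000)]
middle = sum(250 <= t < 750 for t in timesteps) / len(timesteps)
print(f"share of samples in the middle half: {middle:.2f}")

caption = ["1girl", "by wlop", "impasto style", "white hair"]
print(drop_trigger_words(caption, {"by wlop", "impasto style"}, rng))
```

Uniform sampling would put about half of the draws in the middle half of the range; the logit-normal draw puts noticeably more there, which matches the description of sampling middle timesteps more often.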

VII. Evaluation

Neta Art XL excels over other models in

See https://nieta-art.feishu.cn/wiki/PpwqwVDzjiNE5kkUhRtcEsn6nmh for the full evaluation.

VIII. License

Developed with ❤️ by: Neta.art Lab - https://civitai.com/user/nieta_art
In collaboration with:
- Euge: https://civitai.com/user/Euge_
- 汤人烂: https://space.bilibili.com/8594480
- Chenkin: https://civitai.com/user/Chenkin
- Bo Dai: https://daibo.info/
Thanks to:
Model type: Diffusion-based text-to-image generative model
License: Fair AI Public License 1.0-SD, because we merged 0.05 of the CLIP and 0.15 of the UNet input layers from Animagine 3.1.

IX. Conclusion and Future Work

Shortcomings:

Some characters are underfitted.
Styles are not activated well with long prompts.
Certain styles appear grayish at low CFG and with short prompts. This is partly explained in https://civitai.com/articles/4969.

Future Work:

Prepare larger training sets and more knowledge-based data to improve character, style, and detail handling.
We welcome others to join the discussion, provide suggestions, and contribute to advancing the model.

Neta Art XL 2.0 is on the way. Stay tuned, and try our product for FREE: http://neta.art/

Discord: https://discord.gg/AtRtbe9W8w

Twitter: https://twitter.com/netaart_ai

Civitai: https://civitai.com/user/nieta_art