submitted 1 week ago* (last edited 1 week ago) by Even_Adder@lemmy.dbzer0.com to c/stable_diffusion@lemmy.dbzer0.com

Quoted from Reddit:

Hello r/StableDiffusion --

A sincere thanks for the overwhelming engagement and insightful discussion following yesterday's announcement of the Open Model Initiative. If you missed it, check it out here.

We know there are a lot of questions, and some healthy skepticism about the task ahead. We'll share more details as plans are formalized -- we're taking things step by step, seeing who's committed to participating over the long haul, and charting the course forward.

That all said -- with as much community and financial/compute support as is being offered, I have no doubt that we have the fuel needed to get where we all want this to take us. We just need to align and coordinate the work to execute on that vision.

We also wanted to officially announce and welcome some folks to the initiative, who will support the effort with their expertise in model finetuning, datasets, and model training:

  • AstraliteHeart, founder of PurpleSmartAI and creator of the very popular PonyXL models
  • Some of the best model finetuners, including Robbert "Zavy" van Keppel and Zovya
  • Simo Ryu, u/cloneofsimo, a well-known contributor to Open Source AI 
  • Austin, u/AutoMeta, Founder of Alignment Lab AI
  • Vladmandic & SD.Next
  • And over 100 other community volunteers, ML researchers, and creators who have submitted their request to support the project

Due to voiced community concern, we’ve discussed with LAION and agreed to remove them from formal participation with the initiative at their request. Based on conversations occurring within the community we’re confident that we’ll be able to effectively curate the datasets needed to support our work. 


Frequently Asked Questions (FAQs) for the Open Model Initiative

We’ve compiled a FAQ to address some of the questions that have come up over the past 24 hours.

How will the initiative ensure the models are competitive with proprietary ones?

We are committed to developing models that are not only open but also competitive in capability and performance. This includes leveraging cutting-edge technology, pooling resources and expertise from leading organizations, and incorporating continuous community feedback to improve the models.

The community is passionate. Many AI researchers who believe in the mission have reached out in the last 24 hours, willing and eager to make this a reality. In the past year, open-source innovation has driven the majority of interesting capabilities in this space.

We’ve got this.

What does ethical really mean? 

We recognize that there’s a healthy sense of skepticism any time words like “Safety,” “Ethics,” or “Responsibility” are used in relation to AI.

With respect to the model that the OMI will aim to train, the intent is to provide a capable base model that is not pre-trained with the following capabilities:

  • Recognition of unconsented artist names, in such a way that their body of work is singularly referenceable in prompts
  • Generating the likeness of unconsented individuals
  • The production of AI Generated Child Sexual Abuse Material (CSAM).

There may be those in the community who chafe at the above restrictions being imposed on the model. It is our stance that these are capabilities that don’t belong in a base foundation model designed to serve everyone.

The model will be designed and optimized for fine-tuning, and individuals can make personal values decisions (as well as take the responsibility) for any training built into that foundation. We will also explore tooling that helps creators reference styles without the use of artist names.

Okay, but what exactly do the next 3 months look like? What are the steps to get from today to a usable/testable model?

We have 100+ volunteers we need to coordinate and organize into productive participants in the effort. While this will be a community effort, it will need some organizational hierarchy to operate effectively. With our core group growing, we will decide on a governance structure and engage the various partners who have offered support for access to compute and infrastructure.

We’ll make some decisions on architecture (Comfy is inclined to leverage a better-designed SD3), and then begin curating datasets with community assistance.

What is the anticipated cost of developing these models, and how will the initiative manage funding? 

The cost of model development can vary, but it mostly boils down to participants' time and compute/infrastructure. Each of the initial initiative members has a business model that supports actively pursuing open research, and in addition the OMI has already received verbal support from multiple compute providers. We will formalize those offers into agreements once we better define the compute needs of the project.

This gives us confidence we can achieve what is needed with the supplemental support of the community volunteers who have offered to support data preparation, research, and development. 

Will the initiative create limitations on the models' abilities, especially concerning NSFW content? 

It is not our intent to make the model incapable of NSFW material. “Safety,” as we’ve defined it above, does not mean restricting NSFW outputs. Our approach is to provide a model that is capable of understanding and generating a broad range of content.

We plan to curate datasets that avoid any depictions/representations of children, as a general rule, in order to avoid the potential for AIG CSAM/CSEM.

What license will the model and model weights have?

TBD, but we’ve mostly narrowed it down to either an MIT or Apache 2.0 license.

What measures are in place to ensure transparency in the initiative’s operations?

We plan to regularly update the community on our progress, challenges, and changes through the official Discord channel. As we evolve, we’ll evaluate other communication channels.

Looking Forward

We don’t want to inundate this subreddit so we’ll make sure to only update here when there are milestone updates. In the meantime, you can join our Discord for more regular updates.

If you're interested in being a part of a working group or advisory circle, or a corporate partner looking to support open model development, please complete this form and include a bit about your experience with open-source and AI. 

Thank you for your support and enthusiasm!

Sincerely, 

The Open Model Initiative Team

submitted 1 week ago* (last edited 1 week ago) by Even_Adder@lemmy.dbzer0.com to c/stable_diffusion@lemmy.dbzer0.com

Quoted from Reddit:

Today, we’re excited to announce the launch of the Open Model Initiative, a new community-driven effort to promote the development and adoption of openly licensed AI models for image, video and audio generation.

We believe open source is the best way forward to ensure that AI benefits everyone. By teaming up, we can deliver high-quality, competitive models with open licenses that push AI creativity forward, are free to use, and meet the needs of the community.

Ensuring access to free, competitive open source models for all.

With this announcement, we are formally exploring all available avenues to ensure that the open-source community continues to make forward progress. By bringing together deep expertise in model training, inference, and community curation, we aim to develop open-source models of equal or greater quality to proprietary models and workflows, but free of restrictive licensing terms that limit the use of these models.

Without open tools, we risk having these powerful generative technologies concentrated in the hands of a small group of large corporations and their leaders.

From the beginning, we have believed that the right way to build these AI models is with open licenses. Open licenses allow creatives and businesses to build on each other's work, facilitate research, and create new products and services without restrictive licensing constraints.

Unfortunately, recent image and video models have been released under restrictive, non-commercial license agreements, which limit the ownership of novel intellectual property and offer compromised capabilities that are unresponsive to community needs. 

Given the complexity and costs associated with building and researching the development of new models, collaboration and unity are essential to ensuring access to competitive AI tools that remain open and accessible.

We are at a point where collaboration and unity are crucial to achieving the shared goals in the open source ecosystem. We aspire to build a community that supports the positive growth and accessibility of open source tools.

For the community, by the community

Together with the community, the Open Model Initiative aims to bring together developers, researchers, and organizations to collaborate on advancing open and permissively licensed AI model technologies.

The following organizations serve as the initial members:

  • Invoke, a Generative AI platform for Professional Studios
  • ComfyOrg, the team building ComfyUI
  • Civitai, the Generative AI hub for creators
  • LAION, one of the largest open source data networks for model training

To get started, we will focus on several key activities: 

• Establishing a governance framework and working groups to coordinate collaborative community development.
• Facilitating a survey to document feedback on what the open-source community wants to see in future model research and training.
• Creating shared standards to improve future model interoperability and metadata practices so that open-source tools work together across the ecosystem.
• Supporting model development that meets the following criteria:

  • True open source: Permissively licensed using an approved Open Source Initiative license, and developed with open and transparent principles
  • Capable: A competitive model built to provide the creative flexibility and extensibility needed by creatives
  • Ethical: Addressing major, substantiated complaints about unconsented references to artists and other individuals in the base model while recognizing training activities as fair use.

We also plan to host community events and roundtables to support the development of open source tools, and will share more in the coming weeks.

Join Us

We invite any developers, researchers, organizations, and enthusiasts to join us. 

If you’re interested in hearing updates, feel free to join our Discord channel.

If you're interested in being a part of a working group or advisory circle, or a corporate partner looking to support open model development, please complete this form and include a bit about your experience with open-source and AI. 

Sincerely,

Kent Keirsey
CEO & Founder, Invoke

comfyanonymous
Founder, Comfy Org

Justin Maier
CEO & Founder, Civitai

Christoph Schuhmann
Lead & Founder, LAION

Decartunizer (lemmy.dbzer0.com)
submitted 1 week ago* (last edited 1 week ago) by vegeta@lemmy.dbzer0.com to c/stable_diffusion@lemmy.dbzer0.com

Highlights for 2024-06-23

Following the zero-day SD3 release, here's a refresh 10 days later with 10+ improvements, including full prompt attention, support for compressed weights, and additional text-encoder quantization modes.

But there's more than SD3:

  • support for quantized T5 text encoder FP16/FP8/FP4/INT8 in all models that use T5: SD3, PixArt-Σ, etc. (a sketch follows this list)
  • support for PixArt-Sigma in small/medium/large variants
  • support for HunyuanDiT 1.1
  • additional NNCF weights compression support: SD3, PixArt, ControlNet, Lora
  • integration of MS Florence VLM/VQA Base and Large models
  • (finally) new release of Torch-DirectML
  • additional efficiencies for users with low VRAM GPUs
  • over 20 overall fixes
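
Regarding the quantized T5 support above: a rough equivalent can be reproduced outside SD.Next with diffusers/transformers. This is a minimal sketch under assumptions (the load_in_8bit path requires bitsandbytes; the model ID and prompt are placeholders), not SD.Next's internal implementation:

```python
# Sketch: run SD3 with an 8-bit quantized T5-XXL text encoder to cut VRAM use.
# Assumes diffusers, transformers, and bitsandbytes are installed.
import torch
from transformers import T5EncoderModel
from diffusers import StableDiffusion3Pipeline

model_id = "stabilityai/stable-diffusion-3-medium-diffusers"

# Load the T5-XXL encoder (the largest of SD3's three text encoders) in 8-bit.
text_encoder_3 = T5EncoderModel.from_pretrained(
    model_id, subfolder="text_encoder_3", load_in_8bit=True, device_map="auto"
)

pipe = StableDiffusion3Pipeline.from_pretrained(
    model_id, text_encoder_3=text_encoder_3, torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()  # further helps on low-VRAM GPUs

image = pipe("pixel art of a cat", num_inference_steps=28).images[0]
image.save("out.png")
```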
submitted 2 weeks ago* (last edited 2 weeks ago) by Even_Adder@lemmy.dbzer0.com to c/stable_diffusion@lemmy.dbzer0.com

Abstract

Recently, 3D assets created via reconstruction and generation have matched the quality of manually crafted assets, highlighting their potential for replacement. However, this potential is largely unrealized because these assets always need to be converted to meshes for 3D industry applications, and the meshes produced by current mesh extraction methods are significantly inferior to Artist-Created Meshes (AMs), i.e., meshes created by human artists. Specifically, current mesh extraction methods rely on dense faces and ignore geometric features, leading to inefficiencies, complicated post-processing, and lower representation quality.

To address these issues, we introduce MeshAnything, a model that treats mesh extraction as a generation problem, producing AMs aligned with specified shapes. By converting 3D assets in any 3D representation into AMs, MeshAnything can be integrated with various 3D asset production methods, thereby enhancing their application across the 3D industry.

The architecture of MeshAnything comprises a VQ-VAE and a shape-conditioned decoder-only transformer. We first learn a mesh vocabulary using the VQ-VAE, then train the shape-conditioned decoder-only transformer on this vocabulary for shape-conditioned autoregressive mesh generation. Our extensive experiments show that our method generates AMs with hundreds of times fewer faces, significantly improving storage, rendering, and simulation efficiencies, while achieving precision comparable to previous methods.
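
To make the two-stage pipeline concrete, here is a hypothetical PyTorch sketch of the second stage: a decoder-only transformer over a learned mesh-token vocabulary, conditioned on a shape feature injected as a prefix token. The names, dimensions, and prefix-token conditioning scheme are illustrative assumptions; the authors' actual tokenization and conditioning are in the code linked below.

```python
# Hypothetical sketch (not the authors' code): autoregressive mesh-token
# generation conditioned on a shape embedding, over a VQ-VAE vocabulary.
import torch
import torch.nn as nn

class ShapeConditionedMeshDecoder(nn.Module):
    def __init__(self, vocab_size=1024, d_model=512, n_heads=8,
                 n_layers=12, max_len=4096, shape_dim=512):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size + 1, d_model)  # +1 for <bos>
        self.pos_emb = nn.Embedding(max_len, d_model)
        self.shape_proj = nn.Linear(shape_dim, d_model)  # shape condition
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens, shape_feat):
        # tokens: (B, T) ids from the VQ-VAE mesh vocabulary
        B, T = tokens.shape
        pos = torch.arange(T, device=tokens.device)
        x = self.token_emb(tokens) + self.pos_emb(pos)
        # prepend the projected shape feature as a conditioning prefix token
        x = torch.cat([self.shape_proj(shape_feat).unsqueeze(1), x], dim=1)
        causal = torch.triu(  # causal mask: each step sees only the past
            torch.full((T + 1, T + 1), float("-inf"), device=x.device), 1)
        h = self.blocks(x, mask=causal)
        return self.head(h[:, 1:])  # next-token logits for the mesh tokens

    @torch.no_grad()
    def generate(self, shape_feat, bos_id, steps):
        # Greedy decoding; the VQ-VAE decoder maps the ids back to faces.
        tok = torch.full((shape_feat.size(0), 1), bos_id, dtype=torch.long,
                         device=shape_feat.device)
        for _ in range(steps):
            logits = self(tok, shape_feat)
            tok = torch.cat([tok, logits[:, -1].argmax(-1, keepdim=True)], 1)
        return tok[:, 1:]
```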

Paper: https://arxiv.org/abs/2406.10163

Code: https://github.com/buaacyw/MeshAnything

Project Page: https://buaacyw.github.io/mesh-anything/


Hello,

I’m trying to run the pixel art generator (https://perchance.org/ai-pixel-art-generator) locally on my machine so that I can run it programmatically from Python. From what I’ve gathered (mainly from this post: https://lemmy.world/post/5926365), the model behind the generator is SD 1.5. However, I’ve tried running it locally (downloaded from https://huggingface.co/runwayml/stable-diffusion-v1-5/tree/main, tried both emaonly and pruned) and I can’t seem to replicate the results.

It would be helpful to know the exact configuration used to prompt SD 1.5, to get some more help on how to set up an API, or to be pointed to a GitHub repo with the code, which I haven’t been able to find and which might not exist. I’ve tried to read all the documentation I could find but wasn’t able to make use of any of the provided resources (like this one: https://perchance.org/diy-perchance-api, or this one: https://perchance.org/text-to-image-plugin). If anyone could help with any of the above, I would be eternally grateful <3.

I will also list everything I’ve pieced together so far on how the generator works, in case someone else finds it useful. The images generated with this configuration are similar in style to what comes out of the generator, but they are fundamentally different in quality: the ones from the generator are much better at depicting the prompt.

What I’ve gathered so far (my rough diffusers reproduction follows the list):

  • Model used: SD 1.5
  • Width x Height: 512x512
  • Sampling method (not a clue what this should be): DPM++ SDE
  • Prompt: , best pixel art, neo-geo graphical style, retro nostalgic masterpiece, 128px, 16-bit pixel art, 2D pixel art style, adventure game pixel art, inspired by the art style of hyper light drifter, masterful dithering, superb composition, beautiful palette, exquisite pixel detail
  • Negative Prompt: glitched, deep fried, jpeg artifacts, out of focus, gradient, soft focus, low quality, poorly drawn, blur, grainy fabric texture, text, bad art, boring colors, blurry platformer screenshot
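
For reference, here is roughly how I’ve been trying to reproduce it in diffusers. The subject placement before the leading comma, the step count, and the guidance scale are guesses on my part, and mapping “DPM++ SDE” to DPMSolverSDEScheduler (which needs the torchsde package) is also a guess:

```python
# Rough reproduction attempt in diffusers; settings from the list above.
import torch
from diffusers import StableDiffusionPipeline, DPMSolverSDEScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
# "DPM++ SDE" (A1111 naming) roughly maps to DPMSolverSDEScheduler
pipe.scheduler = DPMSolverSDEScheduler.from_config(pipe.scheduler.config)

subject = "a knight on a cliff at sunset"  # placeholder user input
prompt = (
    f"{subject}, best pixel art, neo-geo graphical style, retro nostalgic "
    "masterpiece, 128px, 16-bit pixel art, 2D pixel art style, adventure "
    "game pixel art, inspired by the art style of hyper light drifter, "
    "masterful dithering, superb composition, beautiful palette, "
    "exquisite pixel detail"
)
negative = (
    "glitched, deep fried, jpeg artifacts, out of focus, gradient, soft "
    "focus, low quality, poorly drawn, blur, grainy fabric texture, text, "
    "bad art, boring colors, blurry platformer screenshot"
)

image = pipe(prompt, negative_prompt=negative, width=512, height=512,
             num_inference_steps=25, guidance_scale=7.5).images[0]
image.save("pixel_art.png")
```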

Without paywall: https://archive.ph/QD9v1

submitted 2 weeks ago* (last edited 2 weeks ago) by ylai@lemmy.ml to c/stable_diffusion@lemmy.dbzer0.com

Excerpt from the relevant “ComfyUI dev” Matrix room:

matt3o
and what is it then?

comfyanonymous
"safety training"

matt3o
why does it trigger on certain keywords and it's like it's scrambling the image?

comfyanonymous
the 2B wasn't the one I had been working on so I don't really know the specifics

matt3o
I was even able to trick it by sending certain negatives

comfyanonymous
I was working on a T5-only 4B model which would ironically have been safer without breaking everything
because T5 doesn't know any image data so it was only able to generate images in the distribution of the filtered training data

comfyanonymous
but they canned my 4B and I wasn't really following the 2B that closely

[…]

comfyanonymous
yeah they did something with the weights
the model arch of the 2B was never changed at all

BVH
weights directly?
oh boy, abliteration, the worst kind

comfyanonymous
also they apparently messed up the pretraining on the 2B so it was never supposed to actually be released

[…]

comfyanonymous
yeah the 2B apparently was a bit of a failed experiment by the researchers that left
but there was a strong push by the top of the company to release the 2B instead of the 4B and 8B

Additional excerpt (after the Reddit post) from Stable Diffusion Discord “#sd3”:

comfy
Yes I resigned over 2 weeks ago and Friday was my last day at stability
