[The news has been filled with stories about the dramatic, fast-moving evolution of AI-based media. For example, Interesting Engineering’s report on the latest version of the AI image-generating software Midjourney is titled “Believe it or not: This image was generated by an AI program using three text prompts. This marks the end of photography as we know it. It’s synthography from here on.” (See more coverage and examples from PetaPixel and Ars Technica). And CNN reports on “5 jaw-dropping things GPT-4 can do that ChatGPT couldn’t.” But the two first-person posts below by tech journalists point to the dangers we’ll face as these presence-evoking technologies become available to members of the wider public to set up and use on their own computers without guardrails. –Matthew]
[From Beebom; see the original story for more pictures]
I Created My Own AI Image Generator and Now I’m Worried
By Akshay Gangwar
March 17, 2023
Remember when Prisma was the ultimate “AI” image editing app out there? Yeah, we’ve certainly come a long way since then. With the rise of prompt-based AI image generators such as DALL-E and Midjourney, creating art (and deepfakes) is now within reach of pretty much everyone out there.
But there are limitations, aren’t there? After the initial novelty of asking Midjourney to imagine varied prompts and seeing what it throws out, it all gets rather boring. Or at least it did for me.
Narcissistic Energy?
Look, I am an introvert, which means I don’t really like going out much. But you know what I do like? Having pictures of myself in places I would probably never go to; heck, places I can’t go to as well.
Naturally, I wanted to ask AI tools to create images of me in different situations and places. However, I also didn’t want to upload pictures of myself to random websites in the hope that the results might be good; and that’s when I read about Dreambooth.
Let the Games Begin…
Turns out, really smart people have brought things like Stable Diffusion to the masses. What’s more, others have collaborated with them and made it possible for literally anyone with some patience to create their own Stable Diffusion models and run them, completely online.
So even though I have an M1 MacBook Air, which is by no means intended to serve as a training machine for a deep-learning image generation model, I can run a Google Colab notebook and do all of that on Google’s servers, for free!
All I really needed, then, were a couple of pictures of myself, and that’s it.
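For context, Stable Diffusion 1.x models work on 512×512 images, so those pictures need to end up as square crops. Here’s a rough sketch of what that preprocessing looks like in Python with Pillow; the folder names are placeholders, and notebooks like fast-dreambooth may handle the cropping for you:

```python
# Hypothetical preprocessing sketch: center-crop a folder of photos to the
# 512x512 squares that Stable Diffusion 1.x models expect.
# Folder and file names are placeholders.
from pathlib import Path
from PIL import Image

SRC = Path("my_photos")        # a handful of clear photos of your face
DST = Path("training_images")  # what you upload to the Colab notebook
DST.mkdir(exist_ok=True)

for photo in SRC.glob("*.jpg"):
    img = Image.open(photo).convert("RGB")
    side = min(img.size)                      # largest centered square
    left = (img.width - side) // 2
    top = (img.height - side) // 2
    img = img.crop((left, top, left + side, top + side))
    img = img.resize((512, 512), Image.LANCZOS)
    img.save(DST / photo.name, quality=95)
```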
Training my AI Image Generator
Training your own image generator is not difficult at all. There are a number of guides available online if you need help, and it’s all very straightforward: open the Colab notebook, upload your pictures, and start training the model. All of it happens quite quickly, too.
Okay, let’s be fair: the text encoder training happens quite quickly, within 5 minutes. Training the UNet with the parameters set to default takes a fair while longer, close to 15-20 minutes. But considering that we are actually training an AI model to recognise and be able to draw my face, 20 minutes doesn’t sound like too much time.
While training, there are a bunch of ways to customise just how much you want to train your model, and from reading about a lot of people’s experiences online, I understood that there’s no real “one-size-fits-all” strategy here. For basic use-cases, though, the default values seemed to work just fine for most people, so I stuck with those as well. Partly because I couldn’t really understand what most of the settings meant, and partly because I couldn’t be bothered to train multiple models with different parameters to see what produced the best outputs.
I was, after all, simply looking for a fun AI image generator that could make some half-decent images of me.
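For the curious, here is roughly what a single training step boils down to under the hood. This is a heavily simplified sketch built on Hugging Face’s diffusers library, not the actual notebook code; the base model, the “akshay” token, the learning rate, and the step count are all illustrative assumptions, and real DreamBooth training adds things like prior preservation on top:

```python
# Heavily simplified sketch of the DreamBooth training loop, using Hugging
# Face diffusers components. Illustrative only: the fast-dreambooth notebook
# adds prior-preservation images, learning-rate schedules, and more.
import numpy as np
import torch
import torch.nn.functional as F
from PIL import Image
from diffusers import AutoencoderKL, DDPMScheduler, UNet2DConditionModel
from transformers import CLIPTextModel, CLIPTokenizer

base = "runwayml/stable-diffusion-v1-5"  # assumed base model
vae = AutoencoderKL.from_pretrained(base, subfolder="vae")
unet = UNet2DConditionModel.from_pretrained(base, subfolder="unet")
text_encoder = CLIPTextModel.from_pretrained(base, subfolder="text_encoder")
tokenizer = CLIPTokenizer.from_pretrained(base, subfolder="tokenizer")
noise_sched = DDPMScheduler.from_pretrained(base, subfolder="scheduler")

# Freeze everything except the UNet. (The quick "text encoder training"
# phase fine-tunes text_encoder the same way; it is frozen here for brevity.)
vae.requires_grad_(False)
text_encoder.requires_grad_(False)
optimizer = torch.optim.AdamW(unet.parameters(), lr=5e-6)

# One 512x512 training photo, scaled to [-1, 1]; the filename is a placeholder.
img = Image.open("training_images/photo_01.jpg").convert("RGB")
pixels = torch.from_numpy(np.array(img)).float() / 127.5 - 1.0
pixels = pixels.permute(2, 0, 1).unsqueeze(0)  # (1, 3, 512, 512)

# The token ("akshay") the model will learn to associate with the face.
ids = tokenizer("a photo of akshay", padding="max_length",
                max_length=tokenizer.model_max_length,
                return_tensors="pt").input_ids

unet.train()
for step in range(800):  # illustrative step count
    with torch.no_grad():
        # Compress the image into the latent space the UNet works in.
        latents = vae.encode(pixels).latent_dist.sample() * 0.18215
        prompt_emb = text_encoder(ids)[0]
    noise = torch.randn_like(latents)
    t = torch.randint(0, noise_sched.config.num_train_timesteps, (1,))
    noisy = noise_sched.add_noise(latents, noise, t)
    # The UNet learns to predict the noise that was added, conditioned on
    # the prompt; that is the whole trick.
    pred = unet(noisy, t, encoder_hidden_states=prompt_emb).sample
    loss = F.mse_loss(pred, noise)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

unet.save_pretrained("my-dreambooth-model/unet")
```

Seen this way, the 15-20 minutes of UNet training is just this loop running a few hundred to a few thousand times on Google’s GPU.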
Exceeds Expectations
I am not an AI expert by any stretch of the imagination. However, I do understand that training a Stable Diffusion model on a Google Colab notebook with 8 JPEGs of myself cropped to 512×512 pixels should not really result in anything extraordinary.
How very wrong I was.
In my first attempt at using the model I trained, I started with a simple prompt that said “akshay”. The following is the image that was generated.
Not great, is it? But it’s also not that bad, right?
But then I started playing with some of the parameters available in the UI. There are multiple sampling methods, sampling steps, the CFG Scale, scripts, and a lot more. Time to go a little crazy experimenting with different prompts and setups for the model.
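If you’d rather script it than click through a UI, the same knobs are exposed in Python through the diffusers library. A minimal generation sketch, where the model path, prompt, and parameter values are placeholder assumptions:

```python
# Minimal generation sketch with diffusers; the model path, prompt, and
# parameter values are illustrative assumptions.
import torch
from diffusers import EulerAncestralDiscreteScheduler, StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "my-dreambooth-model",     # the fine-tuned model the notebook produced
    torch_dtype=torch.float16,
).to("cuda")

# Swapping the scheduler corresponds to the "sampling method" dropdown.
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(
    pipe.scheduler.config)

image = pipe(
    "portrait photo of akshay on a mountain summit, golden hour",
    negative_prompt="blurry, deformed, low quality",
    num_inference_steps=30,   # "sampling steps"
    guidance_scale=7.5,       # "CFG Scale": how strictly to follow the prompt
).images[0]
image.save("akshay_summit.png")
```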
Clearly, these images aren’t perfect, and anyone who has seen me can probably tell that they are not “my” images. However, they are close enough, and I didn’t even train the model with any particular care.
If I were to follow the countless guides on Reddit and elsewhere on the internet that talk about the ways you can improve training and get better results from Dreambooth and Stable Diffusion, these images might have turned out even more realistic (and arguably, scarier).
This AI Image Generator is Scarily Good
See, I’m all for improvements in AI technology. As a tech journalist, I have followed the ever-changing and improving field of consumer-facing AI over the last couple of years, and for the most part, I’m deeply impressed and optimistic.
However, seeing something like Dreambooth in action makes me worry about the unethical ways in which AI and ML-based tools, now readily available to basically anyone with access to a computer and the internet, can be used.
There’s no question that there are a lot of bad actors in the world. While innocent use-cases of such easily-accessible technology definitely exist, if there’s one thing I have learnt in my years of reporting on tech, it’s that putting a product into the hands of millions of people will undoubtedly result in a lot of undesired outcomes. At best, something unexpected; at worst, something outright disgusting.
The ability to create deepfake images of pretty much anyone, as long as you can source 5 to 10 pictures of their face, is incredibly dangerous in the wrong hands. Think misinformation, misrepresentation, and even revenge porn: deepfakes can be used in all of these problematic ways.
Safeguards? What Safeguards?
And it’s not just Dreambooth. In themselves, and used well, Dreambooth and Stable Diffusion are incredible tools that let us experience what AI can do. But from what I have seen so far, there are no real safeguards around this technology. Sure, it won’t let you generate outright nudity in images, at least by default. However, there are plenty of extensions that let you bypass that filter and create pretty much anything you can imagine, based on anyone’s identity.
Even without such extensions, you can easily get tools like this to create a wide range of potentially disturbing and disreputable imagery of people.
What’s more, with a decently powerful PC, one can train one’s own AI models, without any safeguards whatsoever, on whatever training data one wants, which means the resulting model can create images that are damning and harmful beyond imagination.
Deepfakes are nothing new. In fact, there is already a vast trove of deepfake videos and media online. Until recently, though, creating them was limited to the relatively small (although still sizeable) number of people at the intersection of “capable hardware” and “technical know-how”.
Now, with access to free (limited-use) GPU compute units on Google Colab and the availability of tools like fast-dreambooth that let you train and use AI models on Google’s servers, that number of people will go up exponentially. It probably already has — that’s scary to me, and it should be to you as well.
What Can We Do?
That’s the question we should be asking ourselves at this point. Tools like DALL-E, Midjourney, and yes, Dreambooth and Stable Diffusion, are certainly impressive when used with common human decency. AI is improving by leaps and bounds — you can probably tell that by looking at the explosion of AI-related news in the past couple of months.
This is, then, a crucial point at which we need to figure out how to ensure AI is used ethically. How we go about doing that is a question I’m not sure I have the answer to. But I do know that, having used the fast-dreambooth AI image generator and seen its capabilities, I am scared of how good it is, without even trying too hard.
—
[From Shelly Palmer’s blog]
Run “ChatGPT” on Your Computer
By Shelly Palmer
March 20, 2023
Here’s something you probably won’t do today: install a large language model (LLM) chat application on your PC. That’s okay, I did it for you – in about 10 minutes.
The results (which are awesome, BTW) are nothing compared to the unintended consequences.
I didn’t install a version of OpenAI’s ChatGPT; I installed two large language models: Meta’s leaked LLaMA and Alpaca, Stanford’s fine-tuned variant of it. Then, I installed an app called Dalai that I found on GitHub. (Dalai LLaMA. Get it?)
Here’s the crazy part: it’s good. Really good. In fact, this small experiment demonstrates that manageable versions of conversational AI apps can run on very modest hardware. I’m getting respectable performance on my $15,000 gaming PC, but also on my M1 MacBook Pro. (Yes, I installed it on two machines).
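Under the hood, Dalai is essentially a front end for Georgi Gerganov’s llama.cpp, which runs 4-bit quantized LLaMA-family weights on an ordinary CPU. To give a feel for how little code is involved, here is a rough equivalent using the llama-cpp-python bindings; the model filename is a placeholder, since the weights are not freely redistributable and you have to supply them yourself:

```python
# Rough equivalent of what Dalai does, via the llama-cpp-python bindings.
# The weights file below is an assumption: LLaMA/Alpaca weights are not
# freely redistributable, so you must provide your own quantized model.
from llama_cpp import Llama

llm = Llama(model_path="./models/alpaca-7b-q4.bin", n_ctx=512)
out = llm(
    "Below is an instruction. Write a response.\n\n"
    "Instruction: Explain what a large language model is.\nResponse:",
    max_tokens=128,
    temperature=0.7,
)
print(out["choices"][0]["text"])
```

A 7B-parameter model quantized to 4 bits fits in a few gigabytes of RAM, which is why a MacBook handles it.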
It gets crazier. There’s no one to tell me what I can or cannot train this model on, and there’s no one imposing content restrictions, like limiting a model to outdated training data.
I can’t predict the future any better than you can, but think back to the early 1980s when personal computers began transforming our society. In less than 40 years, we’ve networked the planet, weaponized information technology, and completely changed the way we communicate and interact with each other. In this case, when I say “we,” I mean each and every one of us has contributed to the transformation. Importantly, during this time, the role of information gatekeeper was also dramatically transformed.
The current version of conversational AI has only been available to the general public since November 2022. Until this morning, I would have been comfortable saying that OpenAI, Meta, Google, and other big tech companies were going to control access to the leading edge of this technology for the foreseeable future.
It’s 9:00 a.m. ET on March 20, 2023, and I am now pretty sure that inspired individuals with a laptop and a Starbucks card (for access to WiFi and coffee) are going to create LLM-based applications as quickly as they can type. Take a moment to think about this.