Image Generation in 2023 vs 2024 + Astronaut Monkey
Dr. Ryan Ries here. Welcome back from re:Invent everyone. I hope you enjoyed reading through all the announcements and getting excited about how to apply these new features to your projects.
I know I have!
These big AWS announcements have been followed by the "12 Days of OpenAI" and their own series of announcements.
All this holiday joy at once!
So, this week, I thought it was fitting to chat about the progression of image generation over the past year.
Quick sidebar before we jump in. Tomorrow I’m hosting a re:Invent recap webinar for anyone who wasn’t able to make re:Invent. Or, if you were there and just want to hear my thoughts, along with those of Mission’s CTO, on the announcements, that’s cool too!
Now, back to our regularly scheduled programming.
Our Astronaut Monkey Friend
I had a grand vision for this week’s Mission Matrix: use the new Amazon Nova models, OpenAI’s models, Titan image gen, Stability.ai, and Gemini to see how far image and video generation have come.
It would make for a great comparison against last year, when I used the Amazon Bedrock playground and Stable Diffusion 1.0 to make this astronaut monkey on the moon.
(If you’ve seen me present live, you probably recognize this image!)
Too Prickly for AI
Once again, Stability.ai’s Stable Image Ultra 1.0 created a solid image of our astronaut monkey and a cactus. The cactus is a test item because last year, the image generators could only create a cactus with a lot of prompting effort.
It doesn’t have the same moon look as last year because I’m pretty sure the cactus is really throwing the model off.
My prompt for this was “Create a photorealistic image of an astronaut monkey riding a bicycle on the moon. In the background, you see a cactus on the right-hand side. Make sure you can see inside the helmet that it is a monkey.”
Giving Titan the old college try
Since I was in the Amazon Bedrock playground, I wanted to test out the Titan Image Gen 2 model and immediately ran into errors telling me that my monkey prompt violated the AWS Responsible AI Policy.
Ummm… Didn’t know a space monkey would create such an issue!
I tried to do some research, but unfortunately, I still couldn’t figure out what exactly was violating the policy.
I scratched my head on this one for a while, but then I decided to try out some text-to-text models to rewrite my prompt in order for it not to violate the AWS policies.
Whenever I need to ask a model about documents that are fairly new, I usually end up using Google’s Gemini since it will scrape the web for an answer.
I tried out a pretty simple prompt: "Edit this prompt so it doesn’t violate the Amazon Responsible AI Policy: ‘Create a photorealistic image of an astronaut monkey riding a bike on the moon. To the right-hand side is a cactus. Make sure that the visor is clear so you can see the monkey’s face.’"
I was expecting to get a new prompt to test out in Titan, but instead, it just produced a really poor image using Imagen 3.
I don’t think this guy is surviving in the cold of the moon, but that cactus is actually looking pretty sharp!
Next up: Nova Pro
So, now I had to decide where to go next.
I still had the Bedrock Playground up, so I decided to try out the Amazon Nova Pro model. I had asked it earlier to create an image, and that failed, since Nova Pro only returns text output for now.
On the bright side, it did do a good job rewriting my prompt: "Create an imaginative and whimsical illustration of a cartoon astronaut monkey riding a bike on a lunar landscape. To the right-hand side, include a friendly cartoon cactus. Ensure the visor is styled in a way that allows a playful and expressive view of the monkey's face."
Nova even explained its edit: "This version maintains a creative and fun theme while avoiding any potentially sensitive or unrealistic expectations regarding photorealism and clarity of features."
That explanation also pointed to why Titan failed: I had asked for a photorealistic image.
With my new prompt, the circles of inference were spinning, and I got three images to choose from.
While these are all fairly similar, the moon in the background and the monkey’s face are different.
I did modify my original prompt by taking out “photorealistic” and removing the cactus. Now I’d say we have a pretty solid monkey riding a bike on the moon. I would still recommend that the monkey wear some gloves, though!
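If you’d rather script this prompt-rewriting step than click through the Playground, here’s a minimal sketch using the Bedrock Converse API. The model ID, region, and instruction wording are my assumptions; adjust them to whatever model access your account has.

```python
def build_rewrite_request(image_prompt: str) -> list:
    """Build a Converse-API message asking a model to rewrite an image prompt."""
    instruction = (
        "Rewrite this image-generation prompt so it avoids content-policy "
        f"issues, and return only the rewritten prompt: {image_prompt}"
    )
    return [{"role": "user", "content": [{"text": instruction}]}]


def rewrite_prompt(image_prompt: str) -> str:
    """Ask Amazon Nova Pro (via Bedrock) to rewrite an image prompt."""
    import boto3  # requires AWS credentials and Bedrock model access

    client = boto3.client("bedrock-runtime", region_name="us-east-1")
    response = client.converse(
        modelId="amazon.nova-pro-v1:0",  # assumed model ID; check your region
        messages=build_rewrite_request(image_prompt),
    )
    # Converse responses nest the text under output -> message -> content
    return response["output"]["message"]["content"][0]["text"]
```

The returned text can then be pasted straight back into the image model of your choice.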
Astronaut Monkey on OpenAI
I had started trying to test video creation with Amazon, Luma, and Sora, but there was no way to get access to those models yet.
For example, Sora was way over capacity each time I tried to use it.
So, we shall prevail another time.
In the meantime, here is my attempt with OpenAI to create our Astronaut Monkey.
(What in the world are those hands?!?!)
For next time
Alright, I didn’t quite accomplish everything I wanted to get done in this week’s Mission Matrix, but I learned a lot from these tests, and I hope you did too.
Next week, maybe (key word, MAYBE), we will be able to test some of the video capabilities of the models.
But this requires writing some code and not just messing around with easy-to-use UIs.
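For a taste of what that code looks like, here’s a minimal boto3 sketch of a text-to-image call against Titan Image Generator v2. The model ID, region, and config values are my assumptions (video models use a separate async API), so treat this as a starting point rather than a recipe.

```python
import base64
import json


def build_titan_request(prompt: str, num_images: int = 1, size: int = 1024) -> dict:
    """Build the request body for Amazon Titan Image Generator v2."""
    return {
        "taskType": "TEXT_IMAGE",
        "textToImageParams": {"text": prompt},
        "imageGenerationConfig": {
            "numberOfImages": num_images,
            "height": size,
            "width": size,
            "cfgScale": 8.0,
        },
    }


def generate_image(prompt: str, out_path: str = "monkey.png") -> str:
    """Invoke Titan via Bedrock and save the first returned image to disk."""
    import boto3  # requires AWS credentials and Bedrock model access

    client = boto3.client("bedrock-runtime", region_name="us-east-1")
    response = client.invoke_model(
        modelId="amazon.titan-image-generator-v2:0",  # assumed; check your region
        body=json.dumps(build_titan_request(prompt)),
    )
    payload = json.loads(response["body"].read())
    # Titan returns images as base64-encoded strings
    with open(out_path, "wb") as f:
        f.write(base64.b64decode(payload["images"][0]))
    return out_path
```

And yes, a prompt that trips the Responsible AI Policy fails here just as it does in the Playground, so the prompt-rewriting trick still applies.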
In the spirit of the holidays…
Let’s do one last test that failed last year: My attempt to create Rudolph the Red-Nosed Reindeer.
Here’s the before (my result from last year):
After:
Here’s this year’s Stability.ai image.
Gemini’s result:
And last but not least, Titan!
Until next time,
Ryan Ries
Now, time for this week's AI-generated image and the prompt I used to create it.
I know this entire newsletter was AI-generated images, but just for fun, I decided to combine Rudolph with our Astronaut Monkey friend!
"Generate an image of Rudolph, the red-nosed reindeer next to a monkey in an astronaut suit biking on the moon. You should see a cactus in the image."