All things Generative AI – a Deep Dive with Dr. Costa Colbert
Costa was a professor of neuroscience and one of the first scientists in the world to create IP around generative AI that works at scale in production. And while he’d rather be coaching our AI teams or building stuff, we managed to get him out to teach us about all things generative AI, in English!
Generative AI is generating not just lines of text and vivid imagery but also a lot of hype. News outlets and social media have been taken by a storm of posts with facts, fiction, and speculation about this seemingly new oracle that weaves magic, knowing everything about everyone.
Generative AI refers to a type of artificial intelligence that can generate new data similar to existing data. This can include generating new images, text, music, or other types of media. One common approach to generative AI is using a type of neural network called a generative model, which is trained on a dataset of existing examples and can then generate new examples that are similar to those in the training dataset.
With that being said, how does it work? What are the use cases? What are the facts, and how do they stack up against the fiction being spread about the technology? Does it deserve the hype it’s receiving? We’re skewing towards a yes, because ChatGPT wrote the paragraph above from a prompt. Here’s a detailed look into the conversation and what you can take away from it.
What does a model type like GPT do? How does it work?
Costa: A model like GPT (Generative Pretrained Transformer) takes a prompt of words and predicts the words that should come afterward. These models generate text or images depending on whether you’re dealing with LLMs (large language models) or image-centric models such as Stable Diffusion and DALL-E. When you interact with ChatGPT, you feel like someone is typing back, but it is basically looking at the probability of the next set of words and filling that in. It’s an incredible template-matching machine.
So, when you say, “Write me something in this language, on this topic,” it does a very good job of putting that together. You could even ask it to explain it, and it will give you more text explaining it. But it’s still template matching. It’s not reasoning. And it is difficult to tell it apart from people writing largely because of the very large vocabulary and data it has been fed.
Doing this with pattern matching and probability applied seems like something that doesn’t require a whole lot of reasoning. And it’s hard to tell that the system is not reasoning, because it’s simply putting together words and sentences with a very high probability of showing up next to each other. It is extremely complicated pattern matching.
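The "predict the most probable next word" idea Costa describes can be sketched at toy scale. The snippet below is a minimal illustration only, assuming a tiny made-up corpus: it counts which word follows which, then generates text by sampling the next word in proportion to those counts. Real models like GPT use transformer networks over billions of tokens, not word-level bigram counts, but the core loop of "look at what came before, sample a probable continuation" is the same shape.

```python
import random
from collections import defaultdict

# Hypothetical toy corpus (assumption for illustration, not real training data).
corpus = (
    "the cat sat on the mat . the dog sat on the rug . "
    "the cat chased the dog ."
).split()

# Count how often each word follows each other word.
counts = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def next_word(prev):
    """Sample the next word in proportion to how often it followed `prev`."""
    candidates = counts[prev]
    words = list(candidates)
    weights = [candidates[w] for w in words]
    return random.choices(words, weights=weights)[0]

def generate(start, n=8):
    """Generate n more words, each conditioned only on the previous word."""
    out = [start]
    for _ in range(n):
        out.append(next_word(out[-1]))
    return " ".join(out)

print(generate("the"))
```

The output reads vaguely like the corpus without ever "meaning" anything, which is the point: plausibility comes from word statistics, not from reasoning.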
What do large language models (LLMs) do well, and what have they been built for?
Costa: Let’s say you want to use a language model to write a newspaper story. You put in a few prompts, like a tweet or a brief, and ask it to write. You can go further and specify a style for it, for example, “Write like Winston Churchill.” But let’s say it got some facts wrong. So you provide feedback, say, “No, this is not right,” and fix that part. What you have here is a role change from what we’ve known: we become editors rather than writers, with the machine taking on the role of writer.
You can’t assume that the output it gives you is entirely correct. As long as you can edit, it’s a natural fit. But if you have a website that ingests tweets, generates paragraphs with some context, and puts together storylines, and you want this running 24/7, generating summaries and news from all the reading, live, with no intervention, then this will be a disaster.
How does this apply to image-centric generations? Where does that succeed, and where does it fail?
Costa: If we extend this to pictures, you can imagine use cases for an ad campaign. You want a nice background and a visual to present an idea. If you can generate a whole bunch of images and, from the choices given, send one to an artist to make it production-ready, then that might work.
The problem you run into is when the art director says, “Yes, let’s use this exact image,” but the baby in the image has 3 fingers and an alien face, and on closer look, you see that a lot of what has been generated is completely inaccurate and distorted. The colors are brilliant, and the feeling of hope or joy or sorrow is captured well, but you are left to deal with an alien baby with 3 fingers and distorted eyes, and you realize this is not production-ready by a mile.
Does the system really understand what we’re saying and doing here?
Costa: These systems appear like an oracle that knows everything about everything. That’s because they’ve been fed an insane amount of data, sometimes 400M data points for a single use case. There’s no humanly possible way to vet all the information feeding into the system. There’s likely a lot less information and fewer facts in there than if you were to do a Google search; the answer might be on page 10, but it will be there. Google isn’t smushing all this data together to write a piece that reasons about something. It’s simply directing people who’re looking for answers to many different points of view written by people who are reasoning. You get to pick and choose whose reasoning and story make the most sense to you.
With generative AI, it’s simply putting together what seem like very well-written pieces, and they are well-written in many cases, but there is no reasoning. There’s just the math of the words that most likely fit together. You could go here for school essays and some early writing. You couldn’t go here consistently for pieces that require good writing on a particular subject, at least not yet.
Where does this fall short? Where is the risk?
Costa: On one hand, it is a question of accuracy: how much of what is being said is accurate, and how much of it you need someone to redo or edit. The other part is that the more you keep doing it, the more the writing looks the same. The structure is the same, the style is the same, and it all starts to look exactly the same, with both imagery and text. It feels like it’s all merging. And that’s fine as long as it fits your intent and this is for play. Not otherwise.
Ashwini: For example, at MSD, we deal with applying this kind of generative AI for retail and other industries at scale, in production. And accuracy is everything. Your accuracy scores are the difference between a business making and losing money, time, and resources. You cannot generate content that looks generic and the same. There’s SEO involved, and when all businesses start using the same systems, you’re talking about all writing merging, not standing out, and not being usable or contributing to a conversion rate or a cost-saving outcome in any meaningful way. You do not have the luxury of creating generic assets. So you could use this for generic assets that are good enough, where reduced repetitive work and time saved are a huge help; for a variety of options that let you explore; or for generating accurate and precise content. What you see out there today is not for that last category. For anything that has an accurate answer, this is not the approach.
What to know before embracing this for your business?
Ashwini: There are two themes emerging. One is, “I want to get the job done, I don’t want to do this, I need the machine to get it done, and it’s fine if it’s good enough.” The other theme is one around play: what you want is to explore, discover, and create, and accuracy there is of no importance. The 3-fingered alien baby is simply part of the aesthetic. Use cases that fit these molds work well. The downside is not crazy, and that is the biggest takeaway here. The risk is not high. The reward outweighs the risk of using this.
That is not the case when you’re using this where risk and liability are high and you cannot afford to be wrong. In some cases, depending on the scale of the activity, if you put together a large enough group of people who can meaningfully sample and edit, you’ll still be fine, but if the scale of output is too big, then that sampling and editing are no longer meaningful.
The inception and nascent stage of generative AI is teaching us this: we are going to see a tectonic shift in roles, from writers to editors, from data collectors to storytellers. The ideal way forward is to embrace and adapt to this shift, or what we at MSD call becoming AI-Native.