Google on Wednesday launched its next-generation multimodal artificial intelligence (AI) model, Gemini, calling it the “most capable, flexible, and general AI model” the company has ever built. It also released a six-minute demonstration video to the public and to media outlets.
However, within 24 hours of the launch, Google faced backlash over the authenticity of the Gemini demonstration video, with critics saying that it had not been recorded in real time.
The six-minute video shows off Gemini’s capabilities as a large language model (LLM), including what appear to be real-time verbal conversations between a human user and the AI-powered chatbot. It showcases the AI’s ability to generate game ideas, identify visual cues and physical objects and tell them apart, understand hand gestures, and offer interpretations of a rubber duck held by a user.
The YouTube description of the video, titled “Hands-on with Gemini: Interacting with multimodal AI,” includes a short line indicating that what viewers are about to see is not entirely real.
“For the purposes of this demo, latency has been reduced and Gemini outputs have been shortened for brevity,” Google says. In other words, Google acknowledged in the description that the model actually took much longer to respond than the video shows and that the footage was edited to speed up its responses.
However, Bloomberg’s Parmy Olson reported that the demo was not what it seemed: it did not involve real-time voice interaction between the human user and the AI, something Google did not disclose. Instead, the demo was produced using still frames from raw footage and scripted text prompts, to which Gemini responded, rather than by having Gemini react to or predict changes in its environment in real time.
“That’s quite different from what Google seemed to be suggesting: that a person could have a smooth voice conversation with Gemini as it watched and responded in real-time to the world around it,” Olson writes.
Oriol Vinyals, co-lead of Gemini at Google and VP of Research and Deep Learning Lead at Google DeepMind, defended the video in a post on X (formerly Twitter), saying that its aim was to “inspire” rather than mislead.
“Really happy to see the interest around our “Hands-on with Gemini” video. In our developer blog yesterday, we broke down how Gemini was used to create it,” wrote Vinyals.
“We gave Gemini sequences of different modalities — image and text in this case — and had it respond by predicting what might come next. Devs can try similar things when access to Pro opens on 12/13. The knitting demo used Ultra,” Vinyals added.
“All the user prompts and outputs in the video are real, shortened for brevity. The video illustrates what the multimodal user experiences built with Gemini could look like. We made it to inspire developers.”
— Oriol Vinyals (@OriolVinyalsML) December 7, 2023
Still, Google’s explanation of the edits to the Gemini demo video has been met with scepticism, with users expressing disappointment and criticising what they see as the video’s lack of transparency and misleading nature.
This is not the first time Google has faced criticism over the authenticity of its demo videos. Previously, the tech giant grappled with doubts about the validity of its Duplex demo, which showed an AI mimicking a human voice to book appointments at hair salons and restaurants and which also raised privacy concerns.