The New Bard and Crazy AI Images, Videos, and Translations
Just when I thought I'd figured out what to say about the fascinating developments in translation dubbing, a breakthrough in AI image processing swooped in and upended my plans. As I geared up to dissect that development, a new release of Bard took me by surprise. With AI innovations arriving this relentlessly, I decided to pack all of this and more into one comprehensive post.
The New Bard Revealed
Less than 24 hours before writing this, I became aware of Bard extensions, a recent development reminiscent of ChatGPT plugins. Their use case is quite specific, however: they work exclusively with popular Google applications like YouTube, Gmail, Google Docs, Google Drive, and more.
To truly appreciate these extensions, let's look at a practical scenario where they shine. A while ago, I took a photo of some Roman remains during an outing. Bard, with its image analysis capabilities, worked out that the central figure in the image was Mithras, a well-known deity in Roman mythology. The cherry on top? It recommended a YouTube video about Mithras without any prompt from me to do so.
Bard's ability to provide both image analysis and relevant online content in a single combined process - without requiring separate searches - makes it extraordinarily useful.
Consider another case, where I used a photo I had shot during my travels. The ask was simple: "What's the flight duration and typical cost from London to the location shown?" Bard not only identified the location (Fisherman's Bastion) but also found specific flights for me, tracked the cost, and provided additional information about the destination. Quite a handy feature for the avid traveler, I'd say.
Bard's Limitations
Although I've covered the impressive features of Bard, it's worth touching on an issue: its tendency to hallucinate. When asked to provide feedback from comments on my YouTube channel, Bard fabricated responses that didn't exist in my comment section. While Bard can sometimes find genuine comments, one must question its utility since its responses cannot be relied upon.
Bard's inconsistencies extended to other tasks as well. Although it was able to fetch my Replit bill from my Gmail, it initially failed to compare my monthly Replit bill with my ElevenLabs bill. Only after persistent prompting did it retrieve both bills and perform the comparison.
Bard also had trouble with estimates. When asked to identify the character in an image of Darth Vader, it did so accurately. However, when asked to estimate the gross revenue of the Star Wars franchise, its numbers were inaccurate. Even after adjusting for inflation, the results were less than stellar. Unfortunately, Bard's self-assessment of its answer, run by clicking the "Double check response" button, did little for its credibility. It was often overconfident about the accuracy of its response, even when the response was incorrect.
All said and done, let's not dismiss Bard entirely. The new extensions are a significant step forward.
Transforming the World of Translation Dubbing - HeyGen Avatar 2.0
Now to HeyGen Avatar 2.0, a technology I demoed in my last video, and one that truly took people by surprise. Its command of languages is astounding, and it has the potential to transform fields like movie dubbing.
Consider this translation of Oppenheimer's famous quote into Spanish:
"Era obvio quel mundo nosmo solonos pocosia treviero narceon algunas personas joradon la Majoría establishio record de la fraser de losto Sagrados induas el Bagavad guitar uno kevite. AORA me combertido and la muerte el destructor de mundos supongo quetodos pensamos de una forma."
Or how about the network scene translated into German:
"dumosaging dasish and menspin fadamal mine lab is Vietfall Alzheimer's to yet so fort alfst on handelst bitter state aleph on oyensteullen alfred so fort alfstein some fenced again as earthan on Dinen and copferau striking on shrine ishbin zovutant on verde as nishlanga hin naaman."
HeyGen Avatar 2.0 even managed to paraphrase Andrej Karpathy in Hindi.
One constraint remains, however: this can only be used for translation, not for creating fresh content. As a responsible AI enthusiast, I believe we need to tread carefully with deepfake technology, as it could be used to manipulate political outcomes or distort historical facts. For those keen to make their own predictions about AI, Metaculus is an excellent platform to turn to.
Nevertheless, in terms of AI progress, the strides being made by GPT models such as GPT-3.5, including their ability to play chess, are a testament to the leaps AI is taking toward reasoning capabilities.
Revolutionary AI Image Generation: Fusion Art and Runway Gen-2
Stepping away from reasoning, I've been fascinated by the latest developments in AI art creation. I've generated some fantastic images and videos using Fusion Art, Runway Gen-2, and even Adobe Express. Fusion Art provides 25 free credits, while Runway Gen-2 offers a free sign-up and 45 seconds of video generation.
Fusion Art's strong suit is turning text into striking visuals: it takes an image of the text and converts it into art based on a prompt. By varying the 'prompt' to change the theme and the 'illusion strength' to control how prominently the original image shows through, you can create a wide range of eye-catching images or animations.
The possibilities opened up by AI are seemingly boundless and ever-growing, as is evident from the OpenAI Red Teaming Network, where experts in any of the 26 listed fields can get paid for their services.
Have your say
Do the latest developments in AI bowl you over? Need some advice on, or a look ahead at, your favorite AI theme? Feel free to drop a comment below, and stay tuned for more head-turning discoveries from the realm of AI.
Written by
Hanii
I'm a teacher. I have written the code that you've used. I speak, code, write, empower, promote, braid, learn, and listen - usually not in that order.