
The launch of Gemini marks a major step forward in Google's push into artificial intelligence. Gemini is a highly advanced language model intended to transform the field, and one of its most impressive features is its ability to analyze text, images, and video simultaneously. But can Google's top-tier AI, Gemini, outperform its primary rival, the generative AI chatbot ChatGPT?

Let's find out!

Google's Gemini vs OpenAI's ChatGPT

What Is Gemini AI?

Gemini is a new and powerful AI model from Google that goes beyond understanding text: it can also make sense of images, videos, and audio. It works like a smart assistant that handles tricky tasks in math, physics, and even coding in different programming languages. In other words, Gemini is not just about words; it is built to understand and work with pictures, video, and sound as well.

According to Demis Hassabis, CEO and co-founder of Google DeepMind, Gemini is the result of extensive collaboration across Google teams, including Google Research. He explains that Gemini was purposefully built to be multimodal from the ground up, allowing it to smoothly understand, operate across, and combine different kinds of information, including text, code, audio, images, and video.
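To make the idea of multimodal input concrete, here is a minimal sketch of sending a text prompt together with an image to a Gemini model through Google's generative AI Python SDK. The model name, image file, and API-key handling are illustrative assumptions; check the current SDK documentation for the exact details.

```python
# Minimal sketch: one request that mixes a text instruction with an image.
# Assumes the google-generativeai and Pillow packages are installed; the
# API key and file name below are placeholders.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")             # placeholder key
model = genai.GenerativeModel("gemini-pro-vision")  # multimodal Gemini variant

image = Image.open("circuit_diagram.png")           # any local image

response = model.generate_content(
    ["Explain what this diagram shows and list its main components.", image]
)
print(response.text)
```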

Different Versions of Gemini AI 

Google has introduced three versions of the Gemini large language model (LLM), collectively referred to as “Gemini 1.0,” representing the software’s initial release. These include:

  • Gemini Ultra: The most powerful of the three models, but also the slowest.
  • Gemini Pro: Recognized for its scalability and versatility, it serves as the all-purpose model and is currently employed in Bard.
  • Gemini Nano: Known for efficiency, though less powerful, it is particularly suitable for on-device tasks.

At present, Google has made Gemini Pro available in Bard, upgrading the chatbot from PaLM 2, the model that previously powered it.

Additionally, Gemini Nano has been released on the Pixel 8 Pro, the first phone designed to run Nano on-device, enabling new features such as Summarize in the Recorder app and Smart Reply for WhatsApp.

It’s worth noting that Gemini Ultra is still in development and has not been released yet. Google is actively refining this high-powered model, with plans to unveil it for a new version of Bard, termed Bard Advanced, expected to launch in 2024.

Gemini vs ChatGPT: Which Is Better?

We have selected the best versions of both Gemini and ChatGPT. 

Here’s a comparison between Gemini Ultra, the flagship version of Google’s Gemini, and GPT-4V, the most advanced iteration of OpenAI’s ChatGPT, based on various benchmarks:

1. Massive Multitask Language Understanding (MMLU)

Gemini Ultra:

  • Achieved an impressive 90.0% in Massive Multitask Language Understanding (MMLU).
  • Demonstrates a remarkable ability to comprehend 57 subjects, including STEM, humanities, and more.
  • Highlights comprehensive understanding across various topics, showcasing its versatility and proficiency.

GPT-4V:

  • Reports an 86.4% score with 5-shot prompting on the same benchmark (the few-shot prompt format is sketched after this list).
  • Signifies a substantial ability to understand and process information across a broad spectrum of subjects.
  • While slightly lower than Gemini Ultra, it demonstrates commendable language understanding capabilities in the MMLU context.
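For readers unfamiliar with the "5-shot" wording above, the sketch below shows how an MMLU-style few-shot prompt is commonly assembled: a handful of worked multiple-choice examples followed by the test question, with the model expected to continue after "Answer:". The helper and example data are illustrative assumptions, not the official evaluation harness.

```python
# Minimal sketch of a few-shot multiple-choice prompt in the MMLU style.
def build_mmlu_prompt(examples, question, choices):
    """examples: list of (question, [four options], answer_letter) tuples."""
    parts = ["The following are multiple choice questions (with answers).\n"]
    for q, opts, ans in examples:          # the "shots": worked examples
        parts.append(q)
        for letter, opt in zip("ABCD", opts):
            parts.append(f"{letter}. {opt}")
        parts.append(f"Answer: {ans}\n")
    parts.append(question)                 # the actual test question
    for letter, opt in zip("ABCD", choices):
        parts.append(f"{letter}. {opt}")
    parts.append("Answer:")                # the model continues with a letter
    return "\n".join(parts)

# One illustrative worked example (a real 5-shot prompt would include five):
examples = [("What is 2 + 2?", ["3", "4", "5", "6"], "B")]
print(build_mmlu_prompt(examples, "What is 3 + 3?", ["5", "6", "7", "8"]))
```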

2. Reasoning Ability (Big-Bench Hard)

Gemini Ultra:

  • Scores an impressive 83.6% in the Big-Bench Hard benchmark.
  • Demonstrates proficiency in diverse, multi-step reasoning tasks.
  • Highlights its capability to navigate complex and intricate scenarios, showcasing advanced reasoning abilities.

GPT-4V:

  • Shows comparable performance with an 83.1% 3-shot capability in a similar context.
  • Demonstrates commendable reasoning skills, particularly in tasks requiring multiple steps.
  • Despite a slightly lower score than Gemini Ultra, it showcases robust reasoning abilities in diverse scenarios.

3. Reading Comprehension (DROP)

Gemini Ultra:

  • Excels with an impressive 82.4 F1 Score in the DROP reading comprehension benchmark.
  • Demonstrates a high level of proficiency in comprehending and answering questions based on textual information.

GPT-4V:

  • Achieves a commendable 80.9 F1 score with 3-shot prompting on the same benchmark (a sketch of token-level F1 scoring follows this list).
  • Demonstrates strong reading comprehension skills, trailing Gemini Ultra only slightly.
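Both DROP results above are F1 scores, so here is a minimal sketch of the token-overlap F1 that reading-comprehension benchmarks commonly use. The official DROP scorer adds extra normalization (case, punctuation, number handling), so treat this as an approximation.

```python
# Minimal sketch of token-level F1 between a predicted and a gold answer.
from collections import Counter

def token_f1(prediction: str, gold: str) -> float:
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    common = Counter(pred_tokens) & Counter(gold_tokens)   # shared tokens
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(token_f1("approximately 42 percent", "42 percent"))  # 0.8
```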

4. Commonsense Reasoning (HellaSwag)

Gemini Ultra:

  • Impresses with a notable 87.8% 10-shot capability in the HellaSwag benchmark.
  • Showcases adept commonsense reasoning abilities in diverse scenarios.

GPT-4V:

  • Demonstrates an even higher 95.3% 10-shot capability in the same HellaSwag benchmark.
  • Exhibits exceptional commonsense reasoning skills, surpassing Gemini Ultra in this particular context.

5. Mathematical Proficiency (GSM8K)

Gemini Ultra:

  • Excels at grade-school math word problems, with an impressive 94.4% maj1@32 score on the GSM8K benchmark (the majority-voting idea behind maj@N is sketched at the end of this section).
  • Demonstrates a high level of proficiency in handling fundamental mathematical operations.

GPT-4V:

  • Maintains a strong 92.0% score with 5-shot prompting on the same grade-school math problems in the GSM8K benchmark.
  • Displays commendable mathematical proficiency, particularly in solving elementary math challenges.
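The "maj1@32" figure for Gemini Ultra refers to a majority-voting setup: the model samples many solutions and the most common final answer is kept. Here is a minimal sketch of that idea; `sample_solution` is a hypothetical stand-in for a call to whichever model is being evaluated.

```python
# Minimal sketch of majority voting over sampled solutions (the idea behind maj@N).
import re
from collections import Counter

def majority_answer(problem, sample_solution, n_samples=32):
    answers = []
    for _ in range(n_samples):
        solution = sample_solution(problem)                  # one sampled solution
        numbers = re.findall(r"-?\d+(?:\.\d+)?", solution)   # pull out numbers
        if numbers:
            answers.append(numbers[-1])                      # keep the final number
    ranked = Counter(answers).most_common(1)
    return ranked[0][0] if ranked else ""
```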

6. Challenging Math Problems (MATH)

Gemini Ultra:

  • Tackles complex math problems with a 53.2% 4-shot capability in the MATH benchmark.
  • Showcases versatility and adeptness in handling intricate mathematical challenges.

GPT-4V:

  • Maintains a competitive 52.9% 4-shot capability in a similar context as per the MATH benchmark.
  • Demonstrates proficiency in addressing challenging mathematical problems, aligning closely with Gemini Ultra’s performance.

7. Code Generation (HumanEval)

Gemini Ultra:

  • Efficiently generates Python code, demonstrating a commendable 74.4% 0-shot score on the HumanEval benchmark.
  • Showcases a high level of proficiency in code generation tasks, particularly in Python.

GPT-4V:

  • Performs well in code generation, with a respectable 67.0% 0-shot score on the same HumanEval benchmark (a sketch of this 0-shot evaluation setup follows this list).
  • Demonstrates competence in generating code, contributing to its versatility in handling various programming tasks.
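As a rough illustration of what a 0-shot HumanEval-style check involves, the sketch below gives a model only a function signature and docstring, then runs the completed function against hidden tests. `generate_code` is a hypothetical stand-in for either model's API, and real harnesses execute candidates in a sandbox rather than with a bare `exec`.

```python
# Minimal sketch of HumanEval-style evaluation: complete a function from its
# signature and docstring, then check it with unit tests.
def passes_tests(prompt, test_code, generate_code):
    completion = generate_code(prompt)       # model writes the function body
    program = prompt + completion + "\n" + test_code
    try:
        exec(program, {})                    # unsafe for untrusted code; demo only
        return True                          # asserts raise on failure
    except Exception:
        return False

prompt = 'def add(a, b):\n    """Return the sum of a and b."""\n'
tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0"

def fake_model(_prompt):                     # trivial stand-in "model" for the demo
    return "    return a + b\n"

print(passes_tests(prompt, tests, fake_model))  # True
```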

8. Natural Language to Code (Natural2Code)

Gemini Ultra:

  • Showcases high proficiency in generating Python code, achieving an impressive 74.9% 0-shot capability in the Natural2Code benchmark.
  • Demonstrates advanced capabilities in translating natural language instructions into Python code.

GPT-4V:

  • Maintains strong performance in natural-language-to-code tasks, with a commendable 73.9% 0-shot score on the same benchmark.
  • Displays competence in the challenging task of converting natural language queries into executable Python code.
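To pull the figures quoted above into one place, here is a quick tally of each benchmark and which model leads; the numbers are taken directly from this comparison.

```python
# Benchmark scores quoted in this article: (Gemini Ultra, GPT-4V).
scores = {
    "MMLU": (90.0, 86.4),
    "Big-Bench Hard": (83.6, 83.1),
    "DROP": (82.4, 80.9),
    "HellaSwag": (87.8, 95.3),
    "GSM8K": (94.4, 92.0),
    "MATH": (53.2, 52.9),
    "HumanEval": (74.4, 67.0),
    "Natural2Code": (74.9, 73.9),
}

for name, (gemini, gpt4) in scores.items():
    leader = "Gemini Ultra" if gemini > gpt4 else "GPT-4V"
    print(f"{name:<15} Gemini Ultra {gemini:5.1f} | GPT-4V {gpt4:5.1f} -> {leader}")
```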

Conclusion

The bottom line: on the benchmarks compared here, Google's Gemini Ultra comes out ahead of GPT-4 in seven of the eight tasks, with commonsense reasoning (HellaSwag) the one area where it falls clearly behind. With AI becoming a major force across industries, Google's move with Gemini is a game-changer, opening up exciting new possibilities.

Sanjay Mehan | SunArc Technologies