The Business & Technology Network
Helping Business Interpret and Use Technology

Meta AI’s Llama 3.1 405B surprisingly beats GPT-4o

Date posted: July 23, 2024

Leaked benchmarks for Meta AI’s Llama 3.1 405B suggest that this open-source LLM has a lot of potential.

Leaked: Meta AI Llama 3.1 405B benchmarks

Meta introduced Llama 3 in April 2024 as a new generation of cutting-edge, open-source large language models. The initial release included Llama 3 8B and Llama 3 70B, both of which set new performance benchmarks for LLMs of their size. Within just three months, however, several other models surpassed those results, a sign of how quickly the field of artificial intelligence is advancing.

Update: The Llama 3.1 405B model is available for use at https://www.llama2.ai/. Note that the model’s knowledge extends only to April 2023. While it is a robust and sophisticated AI resource, its answers reflect data collected up to that point, which may limit its usefulness for questions about more recent developments or events.

Meta had announced that the most ambitious model in the Llama 3 series would have over 400 billion parameters, a massive leap in scale, and that this model was still in training. Today, in a dramatic turn of events, early benchmark data for the forthcoming Llama 3.1 models (the 8B, the 70B, and the colossal 405B) leaked on the LocalLLaMA subreddit. The preliminary results suggest that Llama 3.1 405B could surpass the current industry leader, OpenAI’s GPT-4o, on several key AI benchmarks.

Should the Llama 3.1 405B model indeed surpass GPT-4o, it would represent the first instance of an open-source model eclipsing a leading closed-source LLM.

| Benchmark | GPT-4o | Meta Llama-3.1-405B | Meta Llama-3.1-70B | Meta Llama-3-70B | Meta Llama-3.1-8B | Meta Llama-3-8B |
|---|---|---|---|---|---|---|
| boolq | 0.905 | 0.921 | 0.909 | 0.892 | 0.871 | 0.82 |
| gsm8k | 0.942 | 0.968 | 0.948 | 0.833 | 0.844 | 0.572 |
| hellaswag | 0.891 | 0.92 | 0.908 | 0.874 | 0.768 | 0.462 |
| human_eval | 0.921 | 0.854 | 0.793 | 0.39 | 0.683 | 0.341 |
| mmlu_humanities | 0.802 | 0.818 | 0.795 | 0.706 | 0.619 | 0.56 |
| mmlu_other | 0.872 | 0.875 | 0.852 | 0.825 | 0.74 | 0.709 |
| mmlu_social_sciences | 0.913 | 0.898 | 0.878 | 0.872 | 0.761 | 0.741 |
| mmlu_stem | 0.696 | 0.831 | 0.771 | 0.696 | 0.595 | 0.561 |
| openbookqa | 0.882 | 0.908 | 0.936 | 0.928 | 0.852 | 0.802 |
| piqa | 0.844 | 0.874 | 0.862 | 0.894 | 0.801 | 0.764 |
| social_iqa | 0.79 | 0.797 | 0.813 | 0.789 | 0.734 | 0.667 |
| truthfulqa_mc1 | 0.825 | 0.8 | 0.769 | 0.52 | 0.606 | 0.327 |
| winogrande | 0.822 | 0.867 | 0.845 | 0.776 | 0.65 | 0.56 |

As the table shows, the leaked benchmarks put Meta’s Llama 3.1 405B ahead of OpenAI’s GPT-4o on 10 of the 13 tests: BoolQ, GSM8K, HellaSwag, MMLU-humanities, MMLU-other, MMLU-STEM, OpenBookQA, PIQA, Social IQa, and WinoGrande. Its largest lead is on MMLU-STEM (0.831 vs. 0.696), while some margins, such as MMLU-other (0.875 vs. 0.872), are slim. However, it trails GPT-4o on HumanEval (0.854 vs. 0.921), MMLU-social sciences (0.898 vs. 0.913), and TruthfulQA MC1 (0.8 vs. 0.825), indicating areas where further refinement is needed.
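The head-to-head comparison is simple arithmetic on the leaked numbers. As a minimal sketch, the snippet below tallies where the base Llama 3.1 405B model scores above or below GPT-4o. It assumes the leaked figures are accurate and directly comparable; the `SCORES` dictionary and `compare` function are illustrative names, not part of any Meta or OpenAI tooling.

```python
# Leaked scores copied from the table (base models, not instruction-tuned).
# Only the two columns needed for the head-to-head comparison are included.
SCORES = {
    # benchmark: (GPT-4o, Llama-3.1-405B)
    "boolq": (0.905, 0.921),
    "gsm8k": (0.942, 0.968),
    "hellaswag": (0.891, 0.920),
    "human_eval": (0.921, 0.854),
    "mmlu_humanities": (0.802, 0.818),
    "mmlu_other": (0.872, 0.875),
    "mmlu_social_sciences": (0.913, 0.898),
    "mmlu_stem": (0.696, 0.831),
    "openbookqa": (0.882, 0.908),
    "piqa": (0.844, 0.874),
    "social_iqa": (0.790, 0.797),
    "truthfulqa_mc1": (0.825, 0.800),
    "winogrande": (0.822, 0.867),
}


def compare(scores):
    """Split benchmarks into Llama wins and losses, keeping the margin for each."""
    wins, losses = {}, {}
    for name, (gpt4o, llama) in scores.items():
        # Positive margin means Llama 3.1 405B scored higher than GPT-4o.
        margin = round(llama - gpt4o, 3)
        if margin > 0:
            wins[name] = margin
        elif margin < 0:
            losses[name] = margin
        # An exact tie (margin == 0) is counted in neither bucket.
    return wins, losses


if __name__ == "__main__":
    wins, losses = compare(SCORES)
    print(f"Llama 3.1 405B ahead on {len(wins)} of {len(SCORES)} benchmarks")
    # List the deficits from largest to smallest.
    for name, margin in sorted(losses.items(), key=lambda kv: kv[1]):
        print(f"  behind on {name}: {margin:+.3f}")
```

Run as-is, this reports 10 wins out of 13 and lists human_eval (-0.067), truthfulqa_mc1 (-0.025), and mmlu_social_sciences (-0.015) as the three deficits. Keeping the signed margin, rather than a bare win/loss flag, makes it easy to see that several "wins" are within a few thousandths of a point.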

Note that these benchmarks measure the base Llama 3.1 models. Instruction tuning can substantially improve a model’s capabilities, so the forthcoming Instruct versions of Llama 3.1 are expected to score even higher across various benchmarks.

Stressing the importance of open-source initiatives

While GPT-5 may challenge Llama 3.1’s emerging lead, Llama 3.1’s strong showing against GPT-4o underscores the growing influence and capability of open-source AI initiatives.

“We are embracing the open source ethos of releasing early and often to enable the community to get access to these models while they are still in development. The text-based models we are releasing today are the first in the Llama 3 collection of models. Our goal in the near future is to make Llama 3 multilingual and multimodal, have longer context, and continue to improve overall performance across core LLM capabilities such as reasoning and coding,” stated Meta in a blog post when launching Llama 3.

Open-source AI matters. By making its advanced models publicly available, Meta not only democratizes the technology but also taps into the collective intelligence and diverse perspectives of the global developer community. Closed-source models, by contrast, are typically accessible only to a select group of users and researchers, which limits the potential for widespread innovation and improvement.

Featured image credit: Penfer/Unsplash