"Gemini 3 Pro’s Hallucination Rate in the evaluation is 88%, the same as Gemini 2.5 Pro and Gemini 2.5 Flash. This suggests Gemini 3 Pro made substantial gains in knowledge but not as material gains in its tendency to hallucinate."
Was the model rushed, or did they place all their chips on scaling go brrrr?
Gemini 3 Pro just took the #1 spot in our new AA-Omniscience Index — but it is a nuanced story
AA-Omniscience is our new knowledge and hallucination eval. Gemini 3 Pro’s leadership is driven by its high Accuracy (percentage correct); the model scored a massive 14 points higher than the next most accurate model, Grok 4. Gemini 3 Pro’s Hallucination Rate in the evaluation is 88%, the same as Gemini 2.5 Pro and Gemini 2.5 Flash. This suggests Gemini 3 Pro made substantial gains in knowledge but not as material gains in its tendency to hallucinate.
We measure Hallucination Rate as how often the model answers incorrectly when it should have refused: the proportion of wrong answers out of all non-correct responses. In AA-Omniscience, we found little correlation between Accuracy and Hallucination Rate.
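To make the two metrics concrete, here is a minimal Python sketch. It assumes every response is graded as exactly one of correct, incorrect, or refused; that three-way split, the function names, and the counts are illustrative assumptions, not the official AA-Omniscience scoring code or real results.

```python
def accuracy(correct: int, incorrect: int, refused: int) -> float:
    """Share of all questions answered correctly."""
    total = correct + incorrect + refused
    if total == 0:
        raise ValueError("no responses to score")
    return correct / total


def hallucination_rate(incorrect: int, refused: int) -> float:
    """Share of non-correct responses that were wrong answers rather than refusals."""
    non_correct = incorrect + refused
    if non_correct == 0:
        return 0.0  # assumption: a model with no non-correct responses scores 0
    return incorrect / non_correct


# Hypothetical counts for two made-up models on 1,000 questions each.
model_a = {"correct": 200, "incorrect": 704, "refused": 96}
model_b = {"correct": 500, "incorrect": 440, "refused": 60}

for name, counts in (("A", model_a), ("B", model_b)):
    acc = accuracy(**counts)
    hall = hallucination_rate(counts["incorrect"], counts["refused"])
    print(f"model {name}: accuracy={acc:.0%}, hallucination rate={hall:.0%}")
```

In this made-up example, model B knows far more than model A, yet both guess instead of refusing equally often when they do not know the answer, which is how Accuracy can rise sharply while Hallucination Rate stays flat.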
Additionally, we found there is a high correlation between the size of open weights models and Accuracy (but not Hallucination Rate). As such, Gemini 3 Pro’s very high Accuracy suggests it is a very large model.
See below for further details regarding AA-Omniscience 👇



