Google Gemini score on Humanity’s Last Exam by June 30? - AI Odds Analysis
Outcomes: 50%+, 45%+, 60%+, 55%+, 40%+ (each tradable as Yes or No)
Columns: Market Price, AI Fair Value, Value Edge
AI Insights:
Updated 03.16 14:26
Fair Value Reasoning:
With only 14 days left until the March 31 settlement, the market price of the 50%+ option (40c) appears overly pessimistic; it is anchored to the Gemini 3.1 Preview score of 44.7%. However, the Deep Think model has already scored 48.4%, which gives the 45%+ option a large safety cushion (FV 95c), and leaks suggest the Gemini 3.1 Pro GA version has reached 51.4%. The official leaderboard has not yet updated, but given Google's tendency toward end-of-quarter releases, the probability of an officially submitted score above 50% is significantly higher than the market implies. A bullish rating is therefore maintained for the 50%+ option (FV 65c).
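The gap between the 40c market price and the 65c fair value can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes a Yes share pays 100c on a Yes resolution and 0c otherwise, with fees ignored. The page does not define how it computes its own edge figure.

```python
def edge_cents(market_price: int, fair_value: int) -> int:
    """Difference between the estimated fair value and the market price, in cents."""
    return fair_value - market_price


def expected_return_on_stake(market_price: int, fair_value: int) -> float:
    """Expected profit per cent staked, if the fair value is read as the true
    probability (in percent) and a Yes share pays 100c (assumption)."""
    return edge_cents(market_price, fair_value) / market_price


# Figures quoted in the reasoning above for the 50%+ option.
price_50 = 40  # market price, cents
fv_50 = 65     # AI fair value, cents

print(edge_cents(price_50, fv_50))                # edge in cents
print(expected_return_on_stake(price_50, fv_50))  # expected return per cent staked
```

Under these assumptions the 50%+ Yes share carries a 25c edge, an expected return of 62.5% on the stake if the 65c estimate is right.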
Rule Risk
There is a severe conflict between the title and the rules. The title suggests a deadline of 'June 30', but the rules explicitly state the resolution cutoff is 'March 31, 2026'. This discrepancy removes three months from the expected window, creating a massive trap for traders relying on the title for timeline estimation. The rule text must be prioritized.
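The size of the lost window can be checked with a short date calculation. This is a sketch under one assumption: the title gives no year for 'June 30', so 2026 is assumed here to match the rule text.

```python
from datetime import date

# Deadline implied by the title; the year 2026 is an assumption.
title_deadline = date(2026, 6, 30)
# Resolution cutoff stated in the rules.
rule_deadline = date(2026, 3, 31)

# Days a trader would wrongly count on if relying on the title alone.
lost_days = (title_deadline - rule_deadline).days
print(lost_days)
```

On that assumption the title overstates the window by 91 days.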
Hedging
GOOGL
Gemini 3's score on HLE, a benchmark designed to test AGI limits, is a key proxy for Google's core AI competitiveness. An exceptionally high score (e.g., >60%) would be viewed as a strong counterstrike against OpenAI, significantly boosting investor confidence in Google's technical moat and likely causing a medium-impact movement in the stock price.
Divergence
Significant divergence exists. The prediction market (40c) leans towards Google failing to submit a >50% model by March 31, influenced heavily by the underwhelming Preview performance. However, tech community leaks and prior analysis (Fair Value ~65c-85c) suggest the Gemini 3.1 GA version is capable of 51.4%. The market price reflects a pessimistic 'what you see is what you get' view, ignoring the potential upside of an end-of-quarter software release.