GT News

    Science

    Inside the Secret Meeting Where Mathematicians Struggled to Outsmart AI

    By Lucky, June 7, 2025

    At Secret Math Meeting, Researchers Struggle to Outsmart AI

    The world’s leading mathematicians were stunned by how adept artificial intelligence is at doing their jobs

    By Lyndie Chiou, edited by Clara Moskowitz

    On a weekend in mid-May, a clandestine mathematical conclave convened. Thirty of the world’s most renowned mathematicians traveled to Berkeley, Calif., with some coming from as far away as the U.K. The group’s members faced off in a showdown with a “reasoning” chatbot that was tasked with solving problems they had devised to test its mathematical mettle. After throwing professor-level questions at the bot for two days, the researchers were stunned to discover it was capable of answering some of the world’s hardest solvable problems. “I have colleagues who literally said these models are approaching mathematical genius,” says Ken Ono, a mathematician at the University of Virginia and a leader and judge at the meeting.

    The chatbot in question is powered by o4-mini, a so-called reasoning large language model (LLM). It was trained by OpenAI to be capable of making highly intricate deductions. Google’s equivalent, Gemini 2.5 Flash, has similar abilities. Like the LLMs that powered earlier versions of ChatGPT, o4-mini learns to predict the next word in a sequence. Compared with those earlier LLMs, however, o4-mini and its equivalents are lighter-weight, more nimble models that train on specialized datasets with stronger reinforcement from humans. The approach leads to a chatbot capable of diving much deeper into complex problems in math than traditional LLMs.

    To track the progress of o4-mini, OpenAI previously tasked Epoch AI, a nonprofit that benchmarks LLMs, to come up with 300 math questions whose solutions had not yet been published. Even traditional LLMs can correctly answer many complicated math questions. Yet when Epoch AI asked several such models these questions, which were dissimilar to those they had been trained on, the most successful were able to solve less than 2 percent, showing these LLMs lacked the ability to reason. But o4-mini would prove to be very different.

    In September 2024 Epoch AI hired Elliot Glazer, who had recently finished his math Ph.D., to join the new collaboration behind the benchmark, dubbed FrontierMath. The project collected novel questions across varying tiers of difficulty, with the first three tiers covering undergraduate-, graduate- and research-level challenges. By April 2025, Glazer found that o4-mini could solve around 20 percent of the questions. He then moved on to a fourth tier: a set of questions that would be challenging even for an academic mathematician. Only a small group of people in the world would be capable of developing such questions, let alone answering them. The mathematicians who participated had to sign a nondisclosure agreement requiring them to communicate solely via the messaging app Signal. Other forms of contact, such as traditional e-mail, could potentially be scanned by an LLM and inadvertently train it, thereby contaminating the dataset.

    Each problem that o4-mini couldn’t solve would garner the mathematician who came up with it a $7,500 reward. The group made slow, steady progress in finding questions. But Glazer wanted to speed things up, so Epoch AI hosted the in-person meeting on Saturday, May 17, and Sunday, May 18. There, the participants would finalize the last batch of challenge questions. The 30 attendees were split into groups of six. For two days, the academics competed against themselves to devise problems that they could solve but would trip up the AI reasoning bot.

    By the end of that Saturday night, Ono was frustrated with the bot, whose unexpected mathematical prowess was foiling the group’s progress. “I came up with a problem which experts in my field would recognize as an open question in number theory—a good Ph.D.-level problem,” he says. He asked o4-mini to solve the question. Over the next 10 minutes, Ono watched in stunned silence as the bot unfurled a solution in real time, showing its reasoning process along the way. The bot spent the first two minutes finding and mastering the related literature in the field. Then it wrote on the screen that it wanted to try solving a simpler “toy” version of the question first in order to learn. A few minutes later, it wrote that it was finally prepared to solve the more difficult problem. Five minutes after that, o4-mini presented a correct but sassy solution. “It was starting to get really cheeky,” says Ono, who is also a freelance mathematical consultant for Epoch AI. “And at the end, it says, ‘No citation necessary because the mystery number was computed by me!’”

    Defeated, Ono jumped onto Signal early that Sunday morning and alerted the rest of the participants. “I was not prepared to be contending with an LLM like this,” he says. “I’ve never seen that kind of reasoning before in models. That’s what a scientist does. That’s frightening.”

    Although the group did eventually succeed in finding 10 questions that stymied the bot, the researchers were astonished by how far AI had progressed in the span of one year. Ono likened it to working with a “strong collaborator.” Yang Hui He, a mathematician at the London Institute for Mathematical Sciences and an early pioneer of using AI in math, says, “This is what a very, very good graduate student would be doing—in fact, more.”

    The bot was also much faster than a professional mathematician, taking mere minutes to do what it would take such a human expert weeks or months to complete.

    While sparring with o4-mini was thrilling, its progress was also alarming. Ono and He express concern that o4-mini’s results might be trusted too much. “There’s proof by induction, proof by contradiction, and then proof by intimidation,” He says. “If you say something with enough authority, people just get scared. I think o4-mini has mastered proof by intimidation; it says everything with so much confidence.”

    By the end of the meeting, the group started to consider what the future might look like for mathematicians. Discussions turned to the inevitable “tier five”: questions that even the best mathematicians couldn’t solve. If AI reaches that level, the role of mathematicians would undergo a sharp change. For instance, mathematicians may shift to simply posing questions and interacting with reasoning bots to help them discover new mathematical truths, much the same as a professor does with graduate students. As such, Ono predicts that nurturing creativity in higher education will be key to keeping mathematics going for future generations.

    “I’ve been telling my colleagues that it’s a grave mistake to say that generalized artificial intelligence will never come, [that] it’s just a computer,” Ono says. “I don’t want to add to the hysteria, but in some ways these large language models are already outperforming most of our best graduate students in the world.”
