Technology News & Trends
    Biz & IT

    New secret math benchmark stumps AI models and PhDs alike

    November 20, 2024 (updated November 22, 2024)

    Epoch AI allowed Fields Medal winners Terence Tao and Timothy Gowers to review portions of the benchmark. “These are extremely challenging,” Tao said in feedback provided to Epoch. “I think that in the near term basically the only way to solve them, short of having a real domain expert in the area, is by a combination of a semi-expert like a graduate student in a related field, maybe paired with some combination of a modern AI and lots of other algebra packages.”

    A chart showing AI models’ limited success on the FrontierMath problems, taken from Epoch AI’s research paper. Credit: Epoch AI

    To make correct answers easy to verify during testing, every FrontierMath problem must have an answer that can be checked automatically through computation, either as an exact integer or as an exact mathematical object. The designers made the problems “guessproof” by requiring large numerical answers or complex mathematical solutions, so that a random guess has less than a 1 percent chance of being correct.
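
    The article does not describe Epoch AI’s grading code. The following is a minimal sketch of how automatic checking and “guessproofing” could work, under three stated assumptions. First, answers are stored as Python ints or exact fractions. Second, answers are compared exactly, with no floating-point tolerance. Third, the chance of a random guess is estimated with a simple uniform-guess model. That model is a simplification, not Epoch AI’s stated method. The function names `check_answer`, `random_guess_probability` and `is_guessproof` are hypothetical.

```python
from fractions import Fraction


def check_answer(submitted, reference) -> bool:
    """Exact answer check (hypothetical grader, not Epoch AI's code).

    Assumption: answers are ints or Fractions. Floats are rejected so that
    an approximate value can never be accepted as correct.
    """
    if isinstance(submitted, float) or isinstance(reference, float):
        raise TypeError("answers must be exact (int or Fraction), not float")
    # Fraction normalizes values, so 2/6 and 1/3 compare equal.
    return Fraction(submitted) == Fraction(reference)


def random_guess_probability(answer_space_size: int) -> float:
    """Chance that one uniform random guess hits the single correct answer.

    Simplifying assumption: every candidate answer is equally likely.
    """
    return 1 / answer_space_size


def is_guessproof(answer_space_size: int, threshold: float = 0.01) -> bool:
    """True if a uniform random guess succeeds less than `threshold` of the time."""
    return random_guess_probability(answer_space_size) < threshold


if __name__ == "__main__":
    print(check_answer(367707, 367707))             # exact integer match
    print(check_answer(Fraction(1, 3), Fraction(2, 6)))
    print(is_guessproof(9_000_000))                 # any seven-digit integer
    print(is_guessproof(50))                        # only 50 possible answers
```

    Consider an answer that is known only to be some seven-digit integer. There are 9,000,000 such integers, so under the uniform model a single guess succeeds with probability 1/9,000,000, far below the 1 percent bar. An answer drawn from just 50 options would fail the bar, because 1/50 is 2 percent.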

    Mathematician Evan Chen explained on his blog how he thinks FrontierMath differs from traditional math competitions like the International Mathematical Olympiad (IMO). Problems in that competition typically require creative insight while avoiding complex implementation and specialized knowledge, he says. But for FrontierMath, “they keep the first requirement, but outright invert the second and third requirement,” Chen wrote.

    While IMO problems avoid specialized knowledge and complex calculations, FrontierMath embraces them. “Because an AI system has vastly greater computational power, it’s actually possible to design problems with easily verifiable solutions using the same idea that IOI or Project Euler does—basically, ‘write a proof’ is replaced by ‘implement an algorithm in code,’” Chen explained, referring to the International Olympiad in Informatics and the Project Euler collection of computational math problems.
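
    Chen describes replacing “write a proof” with “implement an algorithm in code.” To illustrate that idea, take a hypothetical Project Euler-style problem, “find the sum of all primes below N.” This is a toy example, not a FrontierMath problem. The solver writes a program instead of a proof, and the grader only has to compare a single integer. The sketch below uses the sieve of Eratosthenes, and the function name `sum_primes_below` is chosen purely for illustration.

```python
def sum_primes_below(n: int) -> int:
    """Sum of all primes p with p < n, computed with the sieve of Eratosthenes.

    Hypothetical example problem: the answer is one exact integer,
    which is trivial for a grader to verify.
    """
    if n <= 2:
        return 0  # no primes below 2
    # is_prime[i] == 1 means i has not (yet) been marked composite.
    is_prime = bytearray([1]) * n
    is_prime[0] = is_prime[1] = 0
    p = 2
    while p * p < n:
        if is_prime[p]:
            # Mark every multiple of p from p*p upward as composite;
            # smaller multiples were already marked by smaller primes.
            is_prime[p * p :: p] = bytearray(len(range(p * p, n, p)))
        p += 1
    return sum(i for i, flag in enumerate(is_prime) if flag)


if __name__ == "__main__":
    print(sum_primes_below(10))   # 2 + 3 + 5 + 7
    print(sum_primes_below(100))
```

    For example, `sum_primes_below(10)` returns 17 and `sum_primes_below(100)` returns 1060. A grader then needs only to compare that one integer against the stored reference answer. No human has to read a proof.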

    Epoch AI plans to evaluate AI models against the benchmark regularly while expanding its problem set. The organization says it will release additional sample problems in the coming months to help the research community test their systems.

    © 2025 FARALOGIC.