Day 3: Grok vs. the Mathletes: Tackling AIME Problems Live

Welcome back to our 30-day Grok adventure! After exploring its sci-fi roots and neural guts, it’s time to see Grok 3 in action—specifically, its knack for math that’s tough enough to stump most humans. Grok scored an impressive 52 on the AIME’24, a high school math showdown that’s no walk in the park. Today, we’re putting its Think Mode to the test with a live problem-solving session. Ready to watch an AI flex its brain? Let’s dive in!

What’s AIME, Anyway?

The American Invitational Mathematics Examination (AIME) is a 15-question, 3-hour gauntlet for top math students: think algebra, geometry, and number theory on steroids. Each correct answer earns 1 point, so a raw score tops out at 15. Grok's 52 is a benchmark figure reported as a percentage of AIME'24 problems solved, which means it cracked roughly half of a set of problems that'd make most of us sweat. For context, that nudged past GPT-4o's 48, proving it's a contender. But numbers are one thing; let's see the process.

The Challenge: A Sample Problem

I’ve picked a classic AIME-style problem to test Grok’s chops:

Problem: “Find the number of positive integers n from 1 to 100 such that n²+1 is divisible by n+1.”
Simple enough on the surface, but it’s got a sneaky twist. Let’s fire up Grok’s Think Mode—where it breaks things down step-by-step—and solve it live.

Grok’s Think Mode in Action

Here’s how Grok tackles it (paraphrased from my chat with it):

Step 1: Understand the Condition
Grok starts: “We need n²+1 to be divisible by n+1. That means when we divide n²+1 by n+1, the remainder is 0. Let’s set up the division.”
Step 2: Polynomial Division
“Divide n²+1 by n+1. Long division gives n²+1=(n+1)(n−1)+2, so the remainder is 2. For divisibility, that remainder needs to be 0, and 2 isn’t 0. Hmm, let’s rethink this.”
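Worth pausing on: the identity in Step 2 is correct, and that remainder of 2 turns out to be the whole story. A minimal Python sketch (the helper name division_table is just for illustration) checks the identity with the built-in divmod for every n from 1 to 100:

```python
# Check the long-division identity n^2 + 1 = (n + 1)(n - 1) + 2 for n = 1..100.
# Python's built-in divmod(a, b) returns (a // b, a % b).
def division_table(limit: int) -> dict[int, tuple[int, int]]:
    """Hypothetical helper: map each n in 1..limit to (quotient, remainder)
    of (n^2 + 1) divided by (n + 1)."""
    return {n: divmod(n * n + 1, n + 1) for n in range(1, limit + 1)}

table = division_table(100)

for n, (q, r) in table.items():
    # The algebraic identity holds for every n.
    assert n * n + 1 == (n + 1) * (n - 1) + 2
    if n >= 2:
        # Here n + 1 > 2, so 2 is a genuine remainder: quotient n - 1, remainder 2.
        assert (q, r) == (n - 1, 2)

print(table[1], table[2], table[100])  # (1, 0) (1, 2) (99, 2)
```

For n=1 the divisor is 2 itself, so the leftover 2 divides evenly; for every n ≥ 2 the remainder really is 2.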
Step 3: Rework with Modulo
“Okay, let’s try modular arithmetic: n²+1≡0 (mod n+1). Rewriting n² as n⋅n isn’t helping. Instead, write the condition as n²+1=k(n+1) for some integer k and rearrange: n²−kn+1−k=0. That’s a quadratic in n, but solving it for every n from 1 to 100 is messy. Let’s test small values.”
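Step 3’s modular idea does work if you push it one line further. Since n+1 divides n−(−1), n is congruent to −1 modulo n+1, which pins down n²+1 without any quadratic:

```latex
n \equiv -1 \pmod{n+1}
\;\Longrightarrow\; n^2 \equiv (-1)^2 = 1 \pmod{n+1}
\;\Longrightarrow\; n^2 + 1 \equiv 2 \pmod{n+1}

(n+1) \mid (n^2 + 1) \iff (n+1) \mid 2
```

This is the same remainder of 2 that the long division in Step 2 produced, reached from a different direction.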
Step 4: Test and Hypothesize
- n=1: 1²+1=2, 1+1=2, 2÷2=1 (works).
- n=2: 2²+1=5, 2+1=3, 5÷3≈1.67 (nope).
- n=3: 3²+1=10, 3+1=4, 10÷4=2.5 (nope).
  “Only n=1 so far. Let’s jump to the top of the range.”
- n=100: 100²+1=10001, 100+1=101, 10001÷101≈99.02, since 101×99=9999 leaves a remainder of 2 (nope).
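Decimal approximations can hide a small remainder, so exact integer division is the safer way to run spot checks. A hypothetical spot_check helper, sketched with Python’s built-in divmod, redoes the four checks:

```python
# Re-run the spot checks with exact integer arithmetic instead of decimals:
# divmod returns the whole-number quotient and the remainder together.
def spot_check(n: int) -> tuple[int, int, bool]:
    """Hypothetical helper: return (quotient, remainder, divisible?)
    for (n^2 + 1) divided by (n + 1)."""
    q, r = divmod(n * n + 1, n + 1)
    return q, r, r == 0

for n in (1, 2, 3, 100):
    q, r, ok = spot_check(n)
    print(f"n={n}: {n * n + 1} = {n + 1}*{q} + {r} -> {'works' if ok else 'nope'}")
```

The last line it prints is `n=100: 10001 = 101*99 + 2 -> nope`, which makes the leftover 2 impossible to miss.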
Step 5: Generalize and Count
“So only n=1 works. What’s the trick? Go back to n²+1=(n+1)(n−1)+2. Since n+1 always divides (n+1)(n−1), it divides n²+1 exactly when it divides the leftover 2. The positive divisors of 2 are 1 and 2; n+1=1 would force n=0, which isn’t positive, so n+1=2 and n=1. Every larger n, including n=100, leaves a remainder of 2. Final count: 1 solution, n=1.”

Answer: 1.
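A brute-force count is an easy way to confirm the answer. This sketch (find_solutions is a hypothetical helper name) tests every n from 1 to 100 and compares the result with the closed form n+1 ∣ 2, which follows from n²+1=(n+1)(n−1)+2:

```python
# Brute-force check: every n in [lo, hi] for which n^2 + 1 is divisible by n + 1.
def find_solutions(lo: int = 1, hi: int = 100) -> list[int]:
    """Hypothetical helper: list all n in [lo, hi] with (n + 1) dividing (n^2 + 1)."""
    return [n for n in range(lo, hi + 1) if (n * n + 1) % (n + 1) == 0]

# Closed form: n + 1 must be a positive divisor of 2, i.e. 1 or 2,
# giving n = 0 or n = 1; keep only the values inside the range 1..100.
closed_form = [d - 1 for d in (1, 2) if 1 <= d - 1 <= 100]

solutions = find_solutions()
assert solutions == closed_form
print(len(solutions), solutions)  # 1 [1]
```

Because the closed form does not depend on the upper limit, widening the range (say, to 1..1000) still yields only n=1.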

Breaking Down the Process

Grok’s Think Mode shines here: it’s not just spitting out an answer. It stumbles (treating that remainder of 2 as a dead end when it was actually the key), pivots to testing, and then circles back to the division to land on the answer through trial and reasoning. It’s not perfect, and humans might spot the algebraic tweak faster, but it mirrors a mathlete’s scratch paper: messy, iterative, brilliant. That 52 on AIME’24? Earned through grit like this, honed by training on xAI’s Colossus supercomputer.

Your Turn to Stump Grok

Grok 3’s math game is strong, but can you outsmart it? Drop your own AIME-style problem in the comments—algebra, geometry, whatever you’ve got. I’ll feed it to Grok and share the results tomorrow. Let’s see if we can trip up this digital mathlete!

What’s Next?

Grok’s not just a number-cruncher—it’s a multimodal marvel. Tomorrow, we’ll peek at its image skills. For now, marvel at an AI that wrestles math like a pro—and maybe even learns from its missteps.
