Surprising details about the Grok-3 release:
* Their current cluster already has 200k H100s/H200s. They initially asked datacenter providers how long it would take to build them a 100k cluster, and the quoted timelines were 12-24 months, which was too slow. Elon said they'd definitely lose if they went that route.
So they found an abandoned factory in Memphis, an empty shell, and built custom electrical / cooling systems, using portable generators, Tesla packs to smooth out the power spikes (caused by their use of synchronous gradient updates), etc.
It took them ~122 days to build the whole thing end to end with 100k H100s, and another ~90 days to add 100k more. No one has ever done something like this.
* Elon announced they're building a new ~1.2 GW cluster of GB200s/GB300s - this is an order of magnitude larger than any other datacenter in the world, and their current datacenter is already the largest single cluster in the world.
* Igor said that during the Grok-3 run, AI engineers used to physically go to the cluster and unplug a node to make sure the run was robust to such perturbations. This is one of the things Elon does really well: reducing barriers between designers/engineers, engineers/datacenter technicians, etc.
* Grok-3 is the first model to pass a score of 1400 on the arena :O
The scary, obvious thing here is that, given the team's culture, Elon's ability to attract capital and talent, and the rate of progress, I don't think anyone will be able to compete with them.
* They said they'll open-source Grok-2 as soon as Grok-3 is stable, in a few months. They plan to keep that strategy going forward: open-sourcing the previous generation while still managing to stay competitive. They also hide the chain of thought, the same way OpenAI did.
Original video here: https://x.com/elonmusk/status/1891700271438233931