Those are some cherry-picked numbers. NVL72 can do 360 fp16 pflops with sparsity, and it scales pretty well: performance roughly doubles at fp8 and doubles again at fp4. Point being, a different benchmark may tell a vastly different story.
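Just to make the precision scaling concrete, here's the back-of-envelope version, taking the 360 pflops (with sparsity) figure at face value and assuming a clean 2x step per halving of precision (real kernels rarely hit the ideal ratio):

```python
# Rough NVL72 peak throughput per precision.
# Assumption: sparsity-inclusive fp16 spec of 360 PFLOPS, ideal 2x per precision step.
FP16_PFLOPS = 360

fp8_pflops = FP16_PFLOPS * 2   # 720 PFLOPS
fp4_pflops = fp8_pflops * 2    # 1440 PFLOPS

print(f"fp16: {FP16_PFLOPS}, fp8: {fp8_pflops}, fp4: {fp4_pflops}")
```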
Getting 16 racks to work together is quite a feat, and the engineering in the interconnects sounds like it's carrying the show. How far can it scale? Will a 32-rack deployment double the performance? Nvidia is no slouch in interconnects either. I honestly don't know how a 16-rack system scales, but NVL72 fits in a single rack. What would 16 of those do?
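For a sense of scale, here's the naive arithmetic for 16 single-rack NVL72 systems, assuming perfect linear scaling (which real interconnect overhead would shave down):

```python
# Hypothetical aggregate for 16 NVL72 racks.
# Assumptions: 360 PFLOPS fp16 (with sparsity) per rack, ideal linear scaling.
RACKS = 16
NVL72_FP16_PFLOPS = 360

aggregate = RACKS * NVL72_FP16_PFLOPS  # 5760 PFLOPS
print(f"{RACKS} racks ~ {aggregate} PFLOPS fp16, before any interconnect loss")
```

The real question is how much of that ideal number survives the cross-rack fabric, which is exactly where the interconnect engineering matters.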
Lastly, didn't I read on Tom's somewhere that the 910C was manufactured by TSMC, and without their knowledge? How many of these chips can be sourced going forward? Catching up by deploying 4x as many chips only works if you can get 4x as many chips.