The Cerebras Wafer-Scale Engine (WSE) was the largest chip ever built as of August 2019. It is made from a single silicon wafer manufactured by TSMC, measures 46,225 mm², and contains 1.2 trillion transistors, 400,000 cores, and 18 gigabytes of on-chip memory (SRAM). The chip's size is intended to work around the limitations of AI systems built from multiple graphics processing units (GPUs). Such systems lose time to communication bottlenecks between chips, whereas the WSE's 400,000 cores share one piece of silicon and should be able to communicate quickly, reducing AI training time. Speaking to Fortune, CEO and co-founder Andrew Feldman said of the WSE:
"Every time there has been a shift in the computing workload, the underlying machine has had to change."
In August 2019, Cerebras began running the first customer workloads on the WSE. The company did not release any performance specifications, saying only that it expected the chip to draw 14 to 15 kilowatts.
Cerebras received $200 million in funding from Benchmark, Foundation Capital, Eclipse, Coatue, VY Capital, Altimeter, and angel investors Fred Weber, Ilya Sutskever, Sam Altman, Andy Bechtolsheim, Greg Brockman, Adam D'Angelo, Mark Leslie, Nick McKeown, David "Dadi" Perlmutter, Saiyed Atiq Raza, Jeff Rothschild, Pradeep Sindhu, and Lip-Bu Tan.
Before Cerebras's chip, manufacturers were reluctant to build chips this large. Silicon wafers often contain imperfections, and the dies affected by them are either sold as lower-grade chips or discarded. Cerebras seeks to avoid this loss by building in redundant circuits that can be used in place of defective ones.
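The effect of die size on yield, and of redundancy on a very large die, can be illustrated with the standard Poisson defect model. The following is a minimal sketch, not Cerebras's or TSMC's actual yield model: the defect density and the number of spare cores are hypothetical values chosen for illustration, and the code assumes that each defect disables exactly one core and that any spare core can replace any failed one.

```python
import math

def poisson_yield(area_mm2, defects_per_mm2):
    """Chance that a die has zero defects under a Poisson defect model."""
    return math.exp(-area_mm2 * defects_per_mm2)

def yield_with_spares(area_mm2, defects_per_mm2, spare_cores):
    """Chance that the defect count does not exceed the number of spare cores.

    Simplifying assumptions: each defect disables exactly one core, and any
    spare core can stand in for any failed one.
    """
    mean_defects = area_mm2 * defects_per_mm2
    term = math.exp(-mean_defects)   # P(0 defects)
    total = term
    for k in range(1, spare_cores + 1):
        term *= mean_defects / k     # P(k defects) derived from P(k - 1 defects)
        total += term
    return min(total, 1.0)

DEFECT_DENSITY = 0.001  # defects per mm^2; hypothetical, not a TSMC figure
SPARE_CORES = 4_000     # hypothetical: 1% of 400,000 cores held in reserve

small_die = poisson_yield(100.0, DEFECT_DENSITY)
wafer_no_spares = poisson_yield(46_225.0, DEFECT_DENSITY)
wafer_with_spares = yield_with_spares(46_225.0, DEFECT_DENSITY, SPARE_CORES)

print(f"100 mm^2 die, no redundancy:    {small_die:.1%}")
print(f"46,225 mm^2 die, no redundancy: {wafer_no_spares:.2e}")
print(f"46,225 mm^2 die, with spares:   {wafer_with_spares:.1%}")
```

Under these assumptions a small die yields around 90%, a defect-free wafer-scale die is essentially never produced, and a modest pool of spare cores brings the wafer-scale yield back to nearly 100%.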
Cerebras uses TSMC's 16 nm process node and a 300 mm wafer, out of which it cuts the largest possible square. From that, it makes 400,000 sparse linear algebra (SLA) cores, which are designed for AI deep learning workloads. The 18 GB of SRAM provides an aggregate memory bandwidth of 9 petabytes per second.
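The headline figures can be broken down into per-core numbers with some simple arithmetic. The sketch below uses only values quoted in this article and assumes decimal units (1 GB = 10^9 bytes, 1 PB = 10^15 bytes) and an even split across cores, neither of which the source confirms. It also compares the stated die area with the largest square that fits inside a 300 mm circle.

```python
import math

DIE_AREA_MM2 = 46_225
WAFER_DIAMETER_MM = 300
CORES = 400_000
TILES = 84
SRAM_BYTES = 18 * 10**9            # 18 GB, assuming decimal gigabytes
BANDWIDTH_BYTES_PER_S = 9 * 10**15  # 9 PB/s, assuming decimal petabytes

# Side length of a square die with the stated area.
side_mm = math.sqrt(DIE_AREA_MM2)

# Largest square inscribed in a circle: its diagonal equals the diameter.
inscribed_side_mm = WAFER_DIAMETER_MM / math.sqrt(2)
inscribed_area_mm2 = WAFER_DIAMETER_MM**2 / 2

# Even split of memory and bandwidth across cores (an assumption).
sram_per_core = SRAM_BYTES / CORES
bandwidth_per_core = BANDWIDTH_BYTES_PER_S / CORES
cores_per_tile = CORES / TILES

print(f"Die side:            {side_mm:.1f} mm")
print(f"Inscribed square:    {inscribed_side_mm:.1f} mm side, {inscribed_area_mm2:,.0f} mm^2")
print(f"SRAM per core:       {sram_per_core / 1e3:.0f} KB")
print(f"Bandwidth per core:  {bandwidth_per_core / 1e9:.1f} GB/s")
print(f"Cores per tile:      {cores_per_tile:.0f}")
```

The stated area corresponds to a 215 mm square, slightly more than the 45,000 mm² of a square strictly inscribed in a 300 mm circle. One possible reading is that the die's corners are trimmed to fit the wafer; that is an inference from the arithmetic, not something the source states.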
Cerebras, with TSMC, developed a technique for laying thousands of links across the scribe lines, the strips that normally separate individual dies on a wafer. The result is a chip that behaves not like 84 separate processing tiles but like a single array of 400,000 cores. The redundancy built into this design means Cerebras might reach 100% yield in its manufacturing process.
A chip this large also raised issues of thermal expansion, connectivity, and cooling. Cerebras developed a new connector to deliver power through the chip's PCB rather than across it. To cool the chip and prevent thermal throttling or thermal expansion that could crack it, the company developed a water-cooling solution that pumps water onto a copper cold plate with a micro-fin array to pull heat off the chip. The heated water is then air-cooled in a radiator.
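A rough sense of the cooling load can be had from the heat balance P = m · c · ΔT. The sketch below is a back-of-the-envelope estimate, not a description of Cerebras's cooling design: it assumes all 15 kW leaves the chip as heat in the water, and the 10 K coolant temperature rise is a hypothetical value chosen for illustration.

```python
CHIP_POWER_W = 15_000.0       # upper end of the 14-15 kW quoted above
DIE_AREA_MM2 = 46_225.0       # die area quoted above
WATER_SPECIFIC_HEAT = 4186.0  # J/(kg*K), standard value for liquid water
COOLANT_TEMP_RISE_K = 10.0    # hypothetical inlet-to-outlet temperature rise

# Average heat flux the cold plate has to remove.
power_density_w_per_mm2 = CHIP_POWER_W / DIE_AREA_MM2

# Mass flow needed so the water carries away all the heat: P = m * c * dT.
flow_kg_per_s = CHIP_POWER_W / (WATER_SPECIFIC_HEAT * COOLANT_TEMP_RISE_K)

# One kilogram of water is roughly one liter.
flow_l_per_min = flow_kg_per_s * 60.0

print(f"Power density: {power_density_w_per_mm2:.2f} W/mm^2")
print(f"Water flow:    {flow_kg_per_s:.2f} kg/s (about {flow_l_per_min:.0f} L/min)")
```

With these assumed values the flow comes to a few tenths of a kilogram of water per second; a smaller allowed temperature rise would require proportionally more flow.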