Tensor processing unit

Tensor processing units (TPUs) are application-specific integrated circuits (ASICs) developed specifically for machine learning. Compared to graphics processing units, they are designed explicitly for a higher volume of reduced-precision computation (e.g. as little as 8-bit precision[1]) with more input/output operations per second per watt, and lack hardware for rasterisation/texture mapping.[2] The chip has been specifically designed for Google's TensorFlow framework; however, Google still uses CPUs and GPUs for other types of machine learning.[3] AI accelerator designs from other vendors are also appearing, aimed at the embedded and robotics markets.
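The reduced-precision idea can be illustrated with a small sketch (an assumption for illustration, not Google's implementation): 32-bit floating-point values are mapped onto 8-bit integers with a shared scale factor, trading a small quantization error for much cheaper arithmetic.

```python
import numpy as np

def quantize_int8(x):
    """Map a float32 array onto int8 using a single per-tensor scale."""
    scale = np.max(np.abs(x)) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximation of the original float values."""
    return q.astype(np.float32) * scale

x = np.array([0.5, -1.2, 3.1], dtype=np.float32)
q, s = quantize_int8(x)
x_approx = dequantize(q, s)  # close to x, within quantization error
```

Inference workloads often tolerate this error well, which is why an accelerator can commit to narrow integer arithmetic.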

Google has stated that its proprietary tensor processing units were used in the AlphaGo versus Lee Sedol series of man-machine Go games.[2] Google has also used TPUs for Google Street View text processing, and was able to find all the text in the Street View database in less than five days. In Google Photos, an individual TPU can process over 100 million photos a day. It is also used in RankBrain which Google uses to provide search results.[4] The tensor processing unit was announced in 2016 at Google I/O, although the company stated that the TPU had been used inside their datacenter for over a year prior.[3][2]

According to Google Distinguished Hardware Engineer Norm Jouppi, the chip is small enough to fit in a hard drive slot within a data center rack.[3]

Architecture

The TPU is an 8-bit matrix multiply engine, driven with CISC instructions by the host processor across a PCIe 3.0 bus. It is manufactured on a 28 nm process with a die size ≤ 662 mm². It runs at a clock speed of 700 MHz and has a thermal design power of 28–40 W. It has 28 MiB of on-chip memory and 4 MiB of 32-bit accumulators taking the results of a 256×256 array of 8-bit multipliers. Instructions transfer data to or from the host, perform matrix multiplies or convolutions, and apply activation functions.[5]
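The arithmetic scheme described above can be sketched numerically (a simplified model, not the TPU's actual datapath): 8-bit operands are multiplied, and the products are summed into 32-bit accumulators, which is wide enough that a 256-term dot product of 8-bit values cannot overflow.

```python
import numpy as np

# Two 256x256 matrices of 8-bit integers, matching the array dimensions
# stated above (values are arbitrary test data).
rng = np.random.default_rng(0)
A = rng.integers(-128, 128, size=(256, 256), dtype=np.int8)
B = rng.integers(-128, 128, size=(256, 256), dtype=np.int8)

# Widen to int32 before multiplying: each 8-bit x 8-bit product needs at
# most 16 bits, so summing 256 of them fits comfortably in a 32-bit
# accumulator (max magnitude ~ 256 * 128 * 128 ≈ 2^22).
C = A.astype(np.int32) @ B.astype(np.int32)
```

Keeping the accumulators wider than the multipliers is what lets the hardware use cheap 8-bit multiply units without losing precision in the summation.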

See also

References