The graphics processing unit (GPU) is a processor designed to handle 2D and 3D graphics and video more efficiently than a general-purpose CPU. Originally designed for the gaming industry, GPUs are now frequently used as accelerators for machine learning (ML) and artificial intelligence (AI), while still serving as graphics processors in automotive screens and infotainment systems, desktop computers, supercomputers, and mobile phones. The GPU’s use in AI/ML inferencing systems may eventually be limited by its power consumption, and it may someday be eclipsed by more purpose-built inferencing chips. GPUs are programmed with data-parallel programming platforms such as CUDA (from Nvidia). A GPU may be a standalone integrated circuit or an IP block integrated into an SoC or ASIC. GPUs are also deployed in clusters for ML/AI acceleration.

Compared with a CPU, a GPU offers higher memory bandwidth and far more arithmetic capability. GPUs have large numbers of multiply-accumulate (MAC) units and high-speed memory interfaces. They can perform the necessary computations much faster than a general-purpose CPU by running them in parallel across many more arithmetic logic units (ALUs). The downside is that GPUs tend to rely on floating-point arithmetic, which offers more precision than many AI algorithms need, particularly for inference.
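
To make the MAC and precision points concrete, here is a minimal Python sketch (illustrative only, not how any particular GPU or ML framework works). It writes a matrix-vector product as a series of multiply-accumulate steps, then repeats the same computation with 8-bit integers. The function names, the sample values, and the symmetric "largest magnitude maps to 127" quantization scheme are all assumptions chosen for this example.

```python
def mac_dot(weights, inputs):
    """Dot product written as a chain of multiply-accumulate (MAC) steps."""
    acc = 0
    for w, x in zip(weights, inputs):
        acc += w * x  # one MAC: multiply, then add into the accumulator
    return acc


def matvec(matrix, vector):
    """Matrix-vector product. Each row's result is independent of the others,
    which is what lets parallel hardware compute many rows at the same time.
    (Here the rows are simply processed one after another.)"""
    return [mac_dot(row, vector) for row in matrix]


def quantize(values, scale):
    """Map floats to signed 8-bit integers (assumed symmetric scheme)."""
    return [max(-128, min(127, round(v / scale))) for v in values]


# Hypothetical sample data, chosen only for illustration.
weights = [[0.5, -1.0, 0.25], [1.5, 0.7, -0.5]]
inputs = [0.9, 2.0, -1.1]

# Floating-point reference result.
float_result = matvec(weights, inputs)

# Assumed scales: the largest magnitude in each tensor maps to 127.
w_scale = max(abs(w) for row in weights for w in row) / 127
x_scale = max(abs(x) for x in inputs) / 127

q_weights = [quantize(row, w_scale) for row in weights]
q_inputs = quantize(inputs, x_scale)

# Integer-only MACs, followed by a single rescale back to real units.
int_result = [acc * w_scale * x_scale for acc in matvec(q_weights, q_inputs)]

for f, q in zip(float_result, int_result):
    print(f"float: {f:.4f}   int8: {q:.4f}   abs error: {abs(f - q):.4f}")
```

For these sample values the integer path stays within a few hundredths of the floating-point result, which suggests why reduced-precision arithmetic can be adequate for some inference workloads. How much error is acceptable depends on the model and the task.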