# GPU

* [Graphics Processing Unit](https://en.wikipedia.org/wiki/Graphics_processing_unit)

* <https://www.omnisci.com/technical-glossary/cpu-vs-gpu>

* GPUs offer many cores, but narrow instruction set and lower clock speed

* VS video card: a video card generates the feed of output images and has a GPU at its core. Discrete video cards also usually have dedicated RAM (unlike integrated GPUs, which share RAM with the rest of the system).

* Major brands: NVIDIA (GeForce), AMD (Radeon), Intel.

### CUDA

* CUDA - NVIDIA's platform and API for accessing their GPUs' instruction sets. CUDA-capable GPUs also support open standards such as OpenMP and OpenCL.

* Natively supports C, C++, Fortran. Third-party wrappers exist for Python, Ruby and many other languages.

* Parallelize work by writing a function (a *kernel*) in which each thread processes only its part of the job, selected by its thread id.

* Threads are grouped into blocks; blocks form a grid.

* Lower-level [Driver API](https://docs.nvidia.com/cuda/cuda-driver-api/index.html) and higher-level [Runtime API.](https://docs.nvidia.com/cuda/cuda-runtime-api/index.html)

* <https://developer.nvidia.com/how-to-cuda-c-cpp>
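The thread-id idea above can be sketched in CUDA C++ (a minimal vector-add using the Runtime API; assumes nvcc and an NVIDIA GPU, and uses `cudaMallocManaged` unified memory for brevity):

```cuda
#include <cstdio>

// Kernel: each thread handles one array element, chosen by its global thread id.
__global__ void add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global index within the grid
    if (i < n)                                      // guard: the last block may have extra threads
        c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    cudaMallocManaged(&a, bytes);  // unified memory, visible to both CPU and GPU
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    int threadsPerBlock = 256;                                  // threads grouped into blocks
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;   // blocks form the grid
    add<<<blocks, threadsPerBlock>>>(a, b, c, n);
    cudaDeviceSynchronize();  // wait for the GPU to finish

    printf("c[0] = %f\n", c[0]);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

Compile with `nvcc add.cu -o add`. Note how the launch configuration `<<<blocks, threadsPerBlock>>>` directly mirrors the blocks-form-a-grid hierarchy.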

### DL

Things like convolution or even matrix multiplication can easily be parallelized to run much faster on a GPU.
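For instance, matrix multiplication parallelizes naturally by assigning one output element per thread. A naive sketch (no shared-memory tiling, which real libraries such as cuBLAS use for far higher throughput; assumes square row-major `n x n` matrices):

```cuda
// Naive matrix multiplication: each thread computes one element of C = A * B.
__global__ void matmul(const float *A, const float *B, float *C, int n) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n && col < n) {
        float sum = 0.0f;
        for (int k = 0; k < n; ++k)            // dot product of row of A and column of B
            sum += A[row * n + k] * B[k * n + col];
        C[row * n + col] = sum;
    }
}

// Launched with a 2-D grid so thread coordinates map onto matrix coordinates:
//   dim3 block(16, 16);
//   dim3 grid((n + 15) / 16, (n + 15) / 16);
//   matmul<<<grid, block>>>(A, B, C, n);
```

All n² output elements are computed independently, which is exactly the kind of massive data parallelism GPUs are built for.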
