Programming Massively Parallel Processors
The CUDA and GPU-architecture book — memory hierarchy, kernels, parallel patterns.
Picked this up after spending too long staring at GPU profilers without understanding what the hardware was actually doing. Reading the 4th edition.