Ever since Nvidia revealed its Drive PX 2 platform, there's been speculation about what kind of SoC might power the deep learning solution. Nvidia has pivoted away from the smartphone, tablet, and handheld market, and its next-generation SoC, Parker, is designed instead for self-driving cars and deep learning applications.
Parker is built on two Denver 2.0 CPU cores and a cluster of four Cortex-A57 ARM CPU cores. The integrated GeForce GPU has 256 cores, possibly in a 256:16:16 configuration (cores, texture mapping units, and ROPs). Nvidia hasn't disclosed the clock speed, but claims the chip delivers up to 1.5TFLOPS of processing power in FP16 workloads (the older Maxwell-based Tegra X1 was rated at 1TFLOPS of FP16 performance).
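Although the clock speed is unknown, the 1.5TFLOPS claim lets us estimate it. The sketch below rests on two assumptions Nvidia hasn't confirmed for Parker: each GPU core retires one fused multiply-add (2 FLOPs) per clock, and FP16 runs at twice the FP32 rate by packing two half-precision values per operation. The function names are our own.

```python
# Rough estimate of the GPU clock implied by a peak FP16 throughput claim.
# Assumptions (not confirmed by Nvidia for Parker): each core retires one
# fused multiply-add (2 FLOPs) per clock, and FP16 runs at twice the FP32
# rate by packing two half-precision values into each operation.

FLOPS_PER_FMA = 2
FP16_PACKING = 2


def peak_fp16_flops(cores: int, clock_hz: float) -> float:
    """Peak FP16 FLOPS for a GPU under the assumptions above."""
    return cores * FLOPS_PER_FMA * FP16_PACKING * clock_hz


def implied_clock_hz(cores: int, fp16_flops: float) -> float:
    """Invert peak_fp16_flops to find the clock needed to reach a claim."""
    return fp16_flops / (cores * FLOPS_PER_FMA * FP16_PACKING)


if __name__ == "__main__":
    parker_clock = implied_clock_hz(256, 1.5e12)
    print(f"Parker implied clock: {parker_clock / 1e9:.2f} GHz")
    # Sanity check: a 256-core Tegra X1 at an assumed ~1 GHz clock.
    print(f"Tegra X1 at 1 GHz: {peak_fp16_flops(256, 1.0e9) / 1e12:.3f} TFLOPS")
```

Under these assumptions, Parker's GPU would need to run at roughly 1.46GHz to hit 1.5TFLOPS, while the same formula puts a 256-core Tegra X1 at about 1GHz near its 1TFLOPS rating.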
The CPU arrangement is unusual, but Nvidia hasn't revealed how Denver 2.0 differs from the original, and the high-level overview the company provided doesn't give us much to go on. The first Denver core was a very wide, in-order superscalar design that could execute up to seven instructions per clock cycle, with large L1 caches and the ability to translate ARM code into its own native format for execution. Denver 1.0 also included an ARM-compatible hardware decoder that let the chip schedule and execute native ARM code directly; it's not clear whether Denver 2.0 retains this capability. AnandTech has more details on Project Denver if you're interested in the core. What we do know is that Denver 2.0 is still a seven-way superscalar architecture with a 2MB L2 cache, and that it connects to the Cortex-A57 cluster through a proprietary interconnect fabric.
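To make the two execution paths concrete, here is a toy model of a translating core: ARM code runs through the hardware decoder until a block has executed often enough, after which a translated copy is cached and reused. This is an illustration only. The class name, the `HOT_THRESHOLD` value, and the "translation" step are hypothetical placeholders, not Nvidia's actual heuristics.

```python
from collections import Counter

# Toy model of a translating CPU in the spirit of Project Denver: cold ARM
# code runs through a hardware decoder, and blocks that run often enough are
# translated once and then served from a translation cache.
# HOT_THRESHOLD and _translate() are hypothetical placeholders; Nvidia has
# not published the heuristics Denver actually uses.

HOT_THRESHOLD = 3


class ToyTranslatingCore:
    def __init__(self, hot_threshold: int = HOT_THRESHOLD) -> None:
        self.hot_threshold = hot_threshold
        self.exec_counts: Counter[str] = Counter()  # runs per code block
        self.translation_cache: dict[str, str] = {}  # block -> native code
        self.log: list[tuple[str, str]] = []  # (block, path taken)

    def _translate(self, block: str) -> str:
        # Stand-in for optimizing ARM code into the core's native format.
        return f"native({block})"

    def run_block(self, block: str) -> str:
        """Execute a block and return which path handled it."""
        if block in self.translation_cache:
            path = "translated"
        else:
            path = "hw_decode"
            self.exec_counts[block] += 1
            # Once a block proves "hot", translate it for future runs.
            if self.exec_counts[block] >= self.hot_threshold:
                self.translation_cache[block] = self._translate(block)
        self.log.append((block, path))
        return path


if __name__ == "__main__":
    core = ToyTranslatingCore()
    print([core.run_block("loop_body") for _ in range(5)])
```

With a threshold of three, a loop body takes the hardware-decode path on its first three runs and the translated path afterward, while code that runs only once never pays the translation cost.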
Nvidia's unusual core combination could reflect several goals. First, pairing the Denver cores with a quad-core Cortex-A57 cluster could allow the company to assign workloads wherever they run best; some multi-threaded workloads might run better across four Cortex-A57 cores than on a pair of Project Denver cores. Second, the Cortex-A57 cores may be there to handle programs that aren't a good fit for the unusual Project Denver design. Third, the Project Denver CPUs could be reserved for specialized application processing, possibly in concert with the GPU, while the bog-standard ARM cores handle the operating system and other routine tasks.
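Why might four smaller cores beat two bigger ones? Amdahl's law offers a simple way to frame the first possibility. The sketch below compares the two clusters for a job with a given parallel fraction. The 1.5x per-core speed advantage assigned to Denver is a hypothetical number chosen for illustration, not a measured result for Denver 2.0 or Cortex-A57.

```python
# Amdahl's-law comparison of two CPU clusters.
# DENVER_PER_CORE_SPEED is a hypothetical illustration, not a measured
# figure for Denver 2.0 versus Cortex-A57.

DENVER_PER_CORE_SPEED = 1.5  # hypothetical: one Denver core = 1.5x one A57
A57_PER_CORE_SPEED = 1.0


def run_time(parallel_fraction: float, cores: int, per_core_speed: float) -> float:
    """Normalized run time of a job that takes 1.0 on a single A57 core."""
    serial = 1.0 - parallel_fraction
    return (serial + parallel_fraction / cores) / per_core_speed


def faster_cluster(parallel_fraction: float) -> str:
    """Return which cluster finishes the job first under the model above."""
    denver = run_time(parallel_fraction, 2, DENVER_PER_CORE_SPEED)
    a57 = run_time(parallel_fraction, 4, A57_PER_CORE_SPEED)
    return "2x Denver" if denver < a57 else "4x Cortex-A57"


if __name__ == "__main__":
    for p in (0.0, 0.5, 0.8, 0.95):
        print(f"parallel fraction {p:.2f}: {faster_cluster(p)}")
```

With these assumed numbers, the two Denver cores win for mostly serial code, and the four Cortex-A57 cores pull ahead once more than 80 percent of the work can run in parallel. A different per-core ratio would move that crossover point, which is exactly why a scheduler with both options available could be useful.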
Any self-driving car needs camera support, and Parker is built to decode and encode 4K streams at 60 FPS. The entire SoC is backed by a 128-bit LPDDR4 interface, which is fairly wide; most SoCs today still rely on a 64-bit interface. At equal memory clocks, this implies Parker's memory bandwidth could be up to 2x higher than that of the Maxwell-based Tegra X1, which used a conventional 64-bit LPDDR4 interface.
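The "up to 2x" figure follows from the fact that peak bandwidth scales with bus width times transfer rate. The sketch below assumes both chips run LPDDR4 at 3200 MT/s (Parker's memory speed isn't given here) and, for scale, estimates the raw traffic of one uncompressed 4K60 camera stream stored as 8-bit NV12 (1.5 bytes per pixel), which is also an assumption about the pixel format.

```python
# Peak DRAM bandwidth from bus width and transfer rate, plus the raw
# write traffic of one uncompressed 4K60 video stream.
# Assumptions: both chips run LPDDR4 at 3200 MT/s, and frames are stored
# as 8-bit NV12 (1.5 bytes per pixel).


def peak_bandwidth_gbs(bus_width_bits: int, transfers_per_sec: float) -> float:
    """Theoretical peak bandwidth in GB/s (10^9 bytes per second)."""
    return bus_width_bits / 8 * transfers_per_sec / 1e9


def stream_bandwidth_gbs(width: int, height: int, fps: int,
                         bytes_per_pixel: float = 1.5) -> float:
    """Bytes per second needed to write one raw video stream, in GB/s."""
    return width * height * bytes_per_pixel * fps / 1e9


if __name__ == "__main__":
    x1 = peak_bandwidth_gbs(64, 3200e6)
    parker = peak_bandwidth_gbs(128, 3200e6)
    print(f"64-bit LPDDR4-3200:   {x1:.1f} GB/s")
    print(f"128-bit LPDDR4-3200:  {parker:.1f} GB/s")
    print(f"One 4K60 NV12 stream: {stream_bandwidth_gbs(3840, 2160, 60):.2f} GB/s")
```

At the same 3200 MT/s, the wider bus doubles peak bandwidth from 25.6GB/s to 51.2GB/s, while a single raw 4K60 stream needs roughly 0.75GB/s just to be written to memory once, before any processing.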
Since Nvidia announced Parker, fans of the Nvidia Shield have wondered whether the company would bring the chip to a new generation of handheld or TV consoles. So far, Nvidia has been mum on this point.