Linux4Tegra with CUDA support on Nvidia Shield TV

The Nvidia Shield TV box offers 256 CUDA cores delivering over 1 TFLOPS of performance for around 200 €. That is great value for mobile deep learning experiments, if only a standard Linux would run on it. Fortunately, some people over at XDA-Developers have figured out a way to do just that. Their instructions, however, assume a lot of prior knowledge, which is why I have tried to compile a recipe here with everything in one place.

For some of the cross-compiling you will need either a native Ubuntu Trusty x64 machine or a VM. As I don’t have an x64 machine with Ubuntu Trusty, I will go the VM route. I hate VirtualBox and its laggy, awkward UI. Fortunately, I have recently discovered Vagrant, which automates and hides many of those things.

0. Installing and connecting to a Trusty x64 VM is now as easy as:
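A minimal sketch of this step, assuming Vagrant and VirtualBox are already installed on the host. The box name ubuntu/trusty64 is my assumption for a Trusty x64 image; the text above does not name one.

```shell
# Box name is an assumption: the public 64-bit Ubuntu 14.04 (Trusty) box.
BOX="ubuntu/trusty64"

if command -v vagrant >/dev/null 2>&1; then
  vagrant init "$BOX"   # writes a Vagrantfile into the current directory
  vagrant up            # downloads the box on first use and boots the VM
  vagrant ssh           # opens a shell inside the VM
else
  echo "vagrant not found; install Vagrant and VirtualBox first" >&2
fi
```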

Unfortunately the default memory of the VM is too small for compiling the kernel, so edit the Vagrantfile to add more RAM and CPUs and the ability to capture SD card readers:
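One way such a Vagrantfile could look, written out from the shell. The memory size, CPU count and the USB vendor/product IDs are placeholders of mine, not values from the original setup; replace the IDs with those of your own card reader. The USB 2.0 controller setting needs the VirtualBox Extension Pack.

```shell
# All four values are illustrative assumptions; adjust them to your host.
VM_MEMORY_MB=4096
VM_CPUS=4
READER_VENDOR_ID="0x1234"    # hypothetical USB vendor ID of the card reader
READER_PRODUCT_ID="0x5678"   # hypothetical USB product ID of the card reader

# Overwrites any Vagrantfile in the current directory.
cat > Vagrantfile <<EOF
Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/trusty64"
  config.vm.provider "virtualbox" do |vb|
    vb.memory = ${VM_MEMORY_MB}
    vb.cpus = ${VM_CPUS}
    # Enable the USB controllers so devices can be passed through.
    vb.customize ["modifyvm", :id, "--usb", "on"]
    vb.customize ["modifyvm", :id, "--usbehci", "on"]
    # Hand the SD card reader to the VM when it is plugged in.
    vb.customize ["usbfilter", "add", "0", "--target", :id,
                  "--name", "sdcardreader",
                  "--vendorid", "${READER_VENDOR_ID}",
                  "--productid", "${READER_PRODUCT_ID}"]
  end
end
EOF
```

Note that the customize lines run on every boot of the VM, so the same filter may be added more than once over time; that appears to be harmless but can be cleaned up in the VirtualBox settings.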

Further hints for designing the USB filters can be obtained by:

Unmount the SD card reader from the host system and reload Vagrant to restart with the new config:

1. Now we can prepare a microSD card with the operating system. First identify your SD card reader, format the SD card with ext4 and mount it:
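A cautious sketch of the format-and-mount step. The function name prepare_sdcard and the default mount point /mnt/sdcard are hypothetical, and the device path must be looked up first, for example by comparing the output of lsblk before and after plugging in the reader. Formatting destroys everything on the device, so the sketch refuses anything that is not a block device.

```shell
# Format a block device with ext4 and mount it. DESTROYS ALL DATA on it.
# usage: prepare_sdcard <device> [mountpoint]
prepare_sdcard() {
  device=$1                      # e.g. /dev/sdX1, check with lsblk first
  mountpoint=${2:-/mnt/sdcard}   # hypothetical default mount point
  if [ ! -b "$device" ]; then
    echo "not a block device: $device" >&2
    return 1
  fi
  sudo mkfs.ext4 "$device" &&
    sudo mkdir -p "$mountpoint" &&
    sudo mount "$device" "$mountpoint"
}
```

Inside the VM this would be called as prepare_sdcard /dev/sdX1, with sdX1 replaced by the partition that lsblk shows for the card.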

Download the root file system and some additional Tegra drivers from Nvidia:

Finally, to fix the wifi firmware, download https://drive.google.com/open?id=0Bz5kaPQJx_AgTjZBeGUycTBfa0E and extract it to ~ on the host. Then, in the VM:

Sync the prepared rootfs to the SD card and unmount the disk:

You can safely remove the microSD card now and plug it into the back of your Nvidia Shield TV console.

2. Unlock the bootloader

First install the ADB tools for OS X

or Linux

Enable debugging on the Shield TV

1. Go to Settings
2. Go across to About in Device
3. Go down to Build and click on it 10 times until it says you are in development mode

Enable ADB over USB

1. Make sure you have performed the above steps under "Enable debugging"
2. Go to Settings
3. Go across to Developer options
4. Go down to Debugging
5. Toggle USB debugging to On

Now boot into fastboot

– Perform software shutdown on SHIELD by holding Power button for 10 seconds
– Connect USB OTG cable to SHIELD
– Press and hold the power button for 3 seconds
– The HDMI TV should always be connected to SHIELD

You should now be able to see your Shield from your computer by typing:

If not, unplug the device, stop the adb server, add the Nvidia vendor ID to adb_usb.ini and restart the server:
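A sketch of that sequence, assuming the file lives at the usual location ~/.android/adb_usb.ini and using Nvidia's USB vendor ID 0x0955. The adb calls are skipped when adb is not installed, and the ID is only appended if it is not already in the file.

```shell
NVIDIA_VENDOR_ID="0x0955"
ADB_USB_INI="$HOME/.android/adb_usb.ini"

# Stop the server so it re-reads the vendor list on the next start.
if command -v adb >/dev/null 2>&1; then
  adb kill-server
fi

# Append the vendor ID once; repeated runs leave the file unchanged.
mkdir -p "$(dirname "$ADB_USB_INI")"
touch "$ADB_USB_INI"
if ! grep -qx "$NVIDIA_VENDOR_ID" "$ADB_USB_INI"; then
  echo "$NVIDIA_VENDOR_ID" >> "$ADB_USB_INI"
fi

if command -v adb >/dev/null 2>&1; then
  adb start-server
fi
```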

Plug the device back in and try adb devices again.

Now get some information about the bootloader:

If your bootloader is locked, it must be unlocked first with:
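The two bootloader steps can be sketched with the standard fastboot commands below. The wrapper function names and the FASTBOOT variable are my own additions for illustration; they only make it easy to point the sketch at a different fastboot binary.

```shell
# Path or name of the fastboot binary; override if it is not on the PATH.
FASTBOOT=${FASTBOOT:-fastboot}

# Print all bootloader variables; the lock state is part of this output.
bootloader_info() {
  "$FASTBOOT" getvar all
}

# Request the unlock. This resets the device, and the request still has
# to be confirmed on the TV screen.
bootloader_unlock() {
  "$FASTBOOT" oem unlock
}
```

With the Shield in fastboot mode, run bootloader_info first and call bootloader_unlock only if the output reports a locked bootloader.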

Select 'Confirm' to unlock the bootloader, which may take up to 2 hours on the Pro model.

Now fastboot getvar all should read:

Now you have to set up Android TV again and activate debug mode again 🙁
Perform a downgrade to firmware 1.3, and root the Shield while you are at it.
Register for an Nvidia developer account, obtain and unzip the files, and bring the device into fastboot mode:

3. To build a boot.img, download the patched kernel source from Google Drive into your host machine's Vagrant directory. It should be auto-mounted to /vagrant in the VM. Then extract the sources:

compile the kernel:

and finally make a bootable boot.img:

4. Put the SD card in the Shield TV's SD card slot, plug the OTG cable between the Shield TV and the PC, and go into fastboot mode.

To boot Linux4Tegra once, run from the VM:

The Nvidia logo will pop up and go away while Ubuntu starts, which may take a good 2 minutes. Log in with ubuntu/ubuntu. If the Ubuntu desktop is too large for your TV screen, you may want to disable overscan in your TV’s settings.

Once that works, you may want to write boot.img into the recovery spot

or even replace Android altogether by:
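The three ways of starting boot.img described above map onto standard fastboot subcommands. The sketch below wraps them in one hypothetical helper, shield_boot; the partition names recovery and boot are assumptions based on the usual Android layout, so check them against the output of fastboot getvar all before flashing.

```shell
# Path or name of the fastboot binary; override if it is not on the PATH.
FASTBOOT=${FASTBOOT:-fastboot}

# usage: shield_boot once|recovery|replace [image]
shield_boot() {
  mode=$1
  image=${2:-boot.img}
  case "$mode" in
    once)     "$FASTBOOT" boot "$image" ;;            # temporary, nothing is written
    recovery) "$FASTBOOT" flash recovery "$image" ;;  # Linux lives in the recovery slot
    replace)  "$FASTBOOT" flash boot "$image" ;;      # overwrites the Android boot image
    *) echo "usage: shield_boot once|recovery|replace [image]" >&2
       return 2 ;;
  esac
}
```

For example, shield_boot once boots Linux a single time without changing the device, which is a sensible first test before either flash variant.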

and then flash the original boot image into the recovery spot. SSH into your Shield

the password is 'ubuntu'.

Then first prevent the Nvidia driver for Tegra X1 being overwritten by apt-get upgrade by:

Add yourself as a new user and copy your public key, install some frequently used packages:

Now let’s install the CUDA support. Download https://developer.nvidia.com/embedded/jetson-development-pack and apt-get sources and scp them to your Shield.

Finally add the CUDA stuff to the path:
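A sketch of the path setup, assuming the toolkit ended up under /usr/local/cuda (often a symlink to a versioned directory). Depending on whether the userland is 32-bit or 64-bit, the library directory may be lib rather than lib64, so check which one exists on your system.

```shell
# Install prefix is an assumption; adjust if your toolkit lives elsewhere.
CUDA_HOME=${CUDA_HOME:-/usr/local/cuda}

# Put nvcc and friends on the PATH.
export PATH="$CUDA_HOME/bin:$PATH"

# Let the dynamic linker find the CUDA libraries (may be lib instead of lib64).
export LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

Adding these lines to ~/.bashrc makes the setting permanent for new shells.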

Now you should be able to call the CUDA compiler:

Next, tune the Shield to achieve the full performance:
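The exact tuning used for the Shield is not reproduced here. As one generic piece of such tuning, the sketch below switches every CPU core to the performance governor through the standard Linux cpufreq sysfs interface. The function name set_cpu_governor is hypothetical, and GPU or memory clock settings, which are Tegra-specific, are not covered.

```shell
# usage: set_cpu_governor <governor> [cpu_sysfs_root]
set_cpu_governor() {
  governor=$1
  cpu_root=${2:-/sys/devices/system/cpu}
  for f in "$cpu_root"/cpu[0-9]*/cpufreq/scaling_governor; do
    [ -e "$f" ] || continue          # no cpufreq support for this core
    if [ -w "$f" ]; then
      echo "$governor" > "$f"
    else
      # Not writable as the current user: go through sudo.
      echo "$governor" | sudo tee "$f" > /dev/null
    fi
  done
}
```

Calling set_cpu_governor performance applies the setting until the next reboot.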

Now you can try out some of the examples in

Single Board Computer Benchmarks

Over the last few years, I have evaluated many single board ARM computers for mobile robotics and home automation applications. Here are some rough benchmark results comparing across different SBC generations that may help avoid some bad buys:

Raspberry Pi

CPU : ARMv6-compatible processor rev 7 (v6l)
L2 Cache :
OS : Linux 4.0.9+
C compiler : gcc version 4.6.3 (Debian 4.6.3-14+rpi1)
libc : libc-2.13.so
MEMORY INDEX : 2.528
INTEGER INDEX : 3.150
FLOATING-POINT INDEX: 2.073

Banana Pi


CPU : Dual
L2 Cache :
OS : Linux 3.4.90
C compiler : gcc version 4.6.3 (Debian 4.6.3-14+rpi1)
libc : libc-2.13.so
MEMORY INDEX : 3.448
INTEGER INDEX : 4.516
FLOATING-POINT INDEX: 3.794

Raspberry Pi 2


CPU : 4 CPU ARMv7 Processor rev 5 (v7l)
L2 Cache :
OS : Linux 3.18.5-v7+
C compiler : gcc version 4.6.3 (Debian 4.6.3-14+rpi1)
libc : libc-2.13.so
MEMORY INDEX : 4.256
INTEGER INDEX : 5.640
FLOATING-POINT INDEX: 4.786

Raspberry Pi 3


MEMORY INDEX : 7.105
INTEGER INDEX : 8.976
FLOATING-POINT INDEX: 7.601

Radxa Rock with antiquated kernel on NAND:


CPU : 4 CPU
L2 Cache :
OS : Linux 3.0.36+
C compiler : gcc version 4.8.2 (Ubuntu/Linaro 4.8.2-19ubuntu1)
libc : libc-2.19.so
MEMORY INDEX : 9.142
INTEGER INDEX : 9.994
FLOATING-POINT INDEX: 9.965

Radxa Rock Pro with experimental 3.18 kernel on SD card:


CPU : 4 CPU ARMv7 Processor rev 0 (v7l)
L2 Cache :
OS : Linux 3.18.0-rc5+
C compiler : gcc version 4.8.2 (Ubuntu/Linaro 4.8.2-19ubuntu1)
libc : libc-2.19.so
MEMORY INDEX : 3.463
INTEGER INDEX : 3.741
FLOATING-POINT INDEX: 3.665

Hardkernel Odroid XU4 on SD card


CPU : 8 CPU ARMv7 Processor rev 3 (v7l)
L2 Cache :
OS : Linux 3.10.82
C compiler : gcc version 4.8.4 (Ubuntu/Linaro 4.8.4-2ubuntu1~14.04)
libc : libc-2.19.so
MEMORY INDEX : 15.504
INTEGER INDEX : 15.309
FLOATING-POINT INDEX: 14.164

And my current 13″ MacBookPro 🙂


CPU : 2,8 GHz Intel Core i7
L2 Cache :
OS : Darwin 14.5.0
C compiler : Apple LLVM version 6.1.0 (clang-602.0.53) (based on LLVM 3.6.0svn)
libc :
MEMORY INDEX : 47.880
INTEGER INDEX : 40.440
FLOATING-POINT INDEX: 83.190

You can benchmark your own SBCs with nbench by:
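A sketch of how the benchmark could be fetched and run. The download URL and version number are assumptions on my part (the commonly mirrored nbench-byte 2.2.3 tarball); verify them before use. The function name run_nbench is hypothetical, and it needs wget, tar, make and a C compiler.

```shell
# Version and URL are assumptions; check that the tarball still exists there.
NBENCH_VERSION="2.2.3"
NBENCH_URL="http://www.tux.org/~mayer/linux/nbench-byte-${NBENCH_VERSION}.tar.gz"

# Download, build and run nbench in a subshell so the caller's
# working directory stays unchanged.
run_nbench() {
  (
    wget "$NBENCH_URL" &&
      tar xzf "nbench-byte-${NBENCH_VERSION}.tar.gz" &&
      cd "nbench-byte-${NBENCH_VERSION}" &&
      make &&
      ./nbench
  )
}
```

Running run_nbench takes several minutes on a small board and ends with the MEMORY, INTEGER and FLOATING-POINT index lines quoted above.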

If you have some interesting results of your own, please don’t forget to drop a comment.

Free up space on your Linux machine

Here are some useful commands to free up space on your Linux machine. To display the used space of your filesystem:
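For example, with df in human-readable mode. The sketch limits the output to the root filesystem; drop the trailing / to list every mounted filesystem.

```shell
# Size, used and available space of the filesystem that holds /.
df -h /
```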

To display the folders and how much space they take:
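One possible form of this, wrapped in a small hypothetical helper dir_sizes that defaults to the current directory:

```shell
# usage: dir_sizes [directory]
# Prints the size of the directory and each direct subdirectory,
# smallest first, in human-readable units.
dir_sizes() {
  du -h -d 1 "${1:-.}" 2>/dev/null | sort -h
}
```

For instance, dir_sizes /var shows which subdirectory of /var takes the most space; run it with sudo-readable paths or as root if some directories are not accessible.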

Find the 20 largest offenders on your system:
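A sketch using du and sort. The helper name largest is hypothetical; it lists files and directories together, stays on one filesystem, and reports sizes in du's default block units.

```shell
# usage: largest [directory] [count]
# Lists the biggest files and directories below the given directory.
largest() {
  du -ax "${1:-/}" 2>/dev/null | sort -rn | head -n "${2:-20}"
}
```

Called without arguments it scans the whole root filesystem, which can take a while; as a normal user it silently skips directories it cannot read.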

Or use the interactive, text-mode tool ncdu:

If you are using a Debian-based system, you can list the installed packages with size:
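A sketch with dpkg-query, printing the 20 largest installed packages. The Installed-Size field is an estimate in KiB, and limiting the list to 20 entries is my own choice.

```shell
if command -v dpkg-query >/dev/null 2>&1; then
  # One line per package: installed size in KiB, then the package name.
  dpkg-query -W -f='${Installed-Size}\t${Package}\n' | sort -n | tail -n 20
else
  echo "dpkg-query not found; this only works on Debian-based systems" >&2
fi
```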

remove unwanted packages:

and uninstall their unused dependencies:

and finally clear the package cache:
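The three clean-up steps can be chained with standard apt-get subcommands. The wrapper name purge_packages is hypothetical, and the package names are whatever you picked from the size listing.

```shell
# usage: purge_packages pkg1 [pkg2 ...]
purge_packages() {
  [ "$#" -gt 0 ] || { echo "usage: purge_packages pkg..." >&2; return 2; }
  sudo apt-get remove "$@" &&     # remove the named packages
    sudo apt-get autoremove &&    # drop dependencies nothing else needs
    sudo apt-get clean            # empty the downloaded package cache
}
```

apt-get asks for confirmation before removing anything, so there is still a chance to abort if the list looks wrong.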

Access TimeCapsule from Radxa Rock

Eventually you will want more storage on your Radxa Rock than the integrated 8 GB NAND or the microSD card. Here is how to tap the vast storage of a TimeCapsule.
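The original setup steps are not reproduced here, so the following is only a sketch under assumptions: the Time Capsule is reached over SMB/CIFS, the cifs-utils package is installed, and the login is kept in a credentials file with username= and password= lines. The helper name, host name, share name and paths are all hypothetical. The helper only prints a candidate /etc/fstab line, so it can be inspected before it is added.

```shell
# usage: timecapsule_fstab_line <host> <share> <mountpoint> <credentials_file>
# Prints an fstab entry that mounts the share for the current user.
timecapsule_fstab_line() {
  printf '//%s/%s %s cifs credentials=%s,uid=%s 0 0\n' \
    "$1" "$2" "$3" "$4" "$(id -u)"
}
```

For example, timecapsule_fstab_line timecapsule.local Data /mnt/timecapsule /root/.tc-credentials prints one line that can be appended to /etc/fstab with sudo tee -a /etc/fstab once the mount point exists. Some Time Capsule firmware may need extra mount options, such as an older protocol version or security mode.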

Then test the fstab entry:

If it mounts correctly, you are set. Otherwise, try mounting with verbose information:

Do not reboot until you have verified that everything works correctly, or you will be unable to boot (which is quite hard to recover from if you boot from NAND).

If you need German umlaut support, you can set the iocharset to utf8:

However, this will fail on the most recent Debian-based Rock images with:

mount error(79): Can not access a needed shared library
Refer to the mount.cifs(8) manual page (e.g. man mount.cifs)

because the nls_utf8.ko module is missing.