AI (Artificial Intelligence) is a hot topic these days. But it was not always the case. For example, back in 1995, when I worked on my postgraduate diploma in AI, I often had to answer questions such as "what is that?" and "why would you do that?". And I loved to answer along the lines of "you do AI when you lack natural intelligence!". A few sci-fi movies later, plus a spectacular increase in cheap compute power and data availability, and everyone does AI nowadays. For the non-experts, let me make a quick clarification: the AI everyone brags about is what I call weak AI, as opposed to strong AI, which I consider to be the true AI. Strong AI is way out of our reach, and you can sleep without fear: Skynet will not emerge tomorrow. And if anyone claims otherwise, look for the benefit they draw from seeding fear or false ideas in the public's mind (besides ignorance, of course).

Machine Learning

That being said, weak AI is very useful and does work well … assuming it has been programmed the right way. In most cases, ML (Machine Learning) is the technique used to leverage the excellent pattern-matching capabilities of ANNs (Artificial Neural Networks). But the detection quality of such an ANN depends upon its training, and that's the hard part of ML. The process consists of feeding in a large amount of well-chosen data and grading the output of the ANN. The more control you apply and the better the data quality is, the better the ANN will perform in the end. In other words, it is really garbage in, garbage out! This means, for example, that an ANN excelling at matching handwriting patterns will fail to identify anything else. And this is also true for other weak AI techniques such as Expert Systems. But when ANNs are well trained (we cannot really say "programmed" in the usual sense), they can be very handy!
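The training loop sketched above (feed in data, grade the output, adjust) can be illustrated with a toy example. The snippet below is purely illustrative and has nothing to do with the NCS: it trains a single-layer perceptron with NumPy on a linearly separable toy problem, then retrains it on deliberately corrupted labels to show the garbage-in, garbage-out effect.

```python
import numpy as np

def train_perceptron(X, y, epochs=50, lr=0.1):
    """Classic perceptron rule: for each sample, nudge the weights
    whenever the current prediction disagrees with the label."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = 1.0 if xi @ w + b > 0 else 0.0
            w += lr * (yi - pred) * xi
            b += lr * (yi - pred)
    return w, b

def accuracy(w, b, X, y):
    pred = (X @ w + b > 0).astype(float)
    return float((pred == y).mean())

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y_true = (X[:, 0] + X[:, 1] > 0).astype(float)  # well-chosen labels

# Good data in: the perceptron learns this separable problem well
w, b = train_perceptron(X, y_true)
print('clean-label accuracy:', accuracy(w, b, X, y_true))

# Garbage in: flip 40% of the labels before training
y_noisy = y_true.copy()
flip = rng.random(200) < 0.4
y_noisy[flip] = 1.0 - y_noisy[flip]
w2, b2 = train_perceptron(X, y_noisy)
print('noisy-label accuracy:', accuracy(w2, b2, X, y_true))
```

With clean labels the perceptron typically reaches near-perfect accuracy on this problem; trained on the corrupted labels, the same model usually scores noticeably worse against the true labels, which is the whole point of careful data curation.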

Do Weak AI

There are many good ML frameworks available out there: TensorFlow and Caffe are among the best known and best supported. You may also have heard that GPUs (or FPGAs, really any form of custom hardware) can be used to speed up some ML algorithms. Although that's true, when you want to use ANNs (for computer vision (CV), for example), such HW systems are not easy for a maker to put together. On the other hand, there are many affordable and quite potent embedded platforms that you may want to use in a project. In my case, I've picked a Raspberry Pi 3 Model B. But by itself, the Pi is not capable enough to run an ANN for CV. This is where an accelerator such as the Movidius NCS (Neural Compute Stick) comes in handy!

Plan A Couple of Days for Setup

The Intel NCS comes as a large aluminum USB stick (72.5 x 27 x 14 mm) that connects to a USB 2.0 or, better, USB 3.0 port. If you plan to use several of them (yes, you can!), you will need a USB hub or similar. The SDK, and interacting with the stick, requires Linux (Ubuntu 16.04+). I tried to build the SDK under Windows 10 (in the embedded Linux subsystem, Linux 3.4.0+ #1 PREEMPT x86_64 GNU/Linux), but failed to compile the source code in a reasonable time (multiple days on a 36-CPU system…). You can use a VM as well as a native system. This includes the Raspberry Pi (3B) running Raspbian 9 (Stretch). The installation/build will require some time (including the build of OpenCV) and ~5 GB of storage space. So, you may have to image your SD card onto a larger one and expand the root partition first. I had to do so, going from 16 GB to 128 GB (but 32 GB should be OK). In addition, you will need a camera (USB; the Pi camera does not work as-is with the SDK samples). But that's really all you will need (along with stable and fast Internet access).


HW Accelerator

At the heart of the NCS, there is an Intel Myriad 2 Vision Processing Unit (VPU) running at 600 MHz. It can achieve ~150 GFLOPS while consuming less than 1 W of power! Quite impressive, and welcome for an embedded design. To achieve such performance per watt, the Myriad architecture is a patchwork of dedicated and optimized IP blocks, among which is an impressive array of 12 x 128-bit VLIW vector units, as well as a couple of RISC cores, one of them running the code that you offload to the stick (under the control of an RTOS). The best way of wrapping your head around this piece of HW is to think of it as a dedicated HW accelerator. Once you have done the hard work of training your ANN, you use the SDK tools (mvNCCheck, mvNCCompile, mvNCProfile) to verify, compile and profile it. Once that's done, use the API (callable from C/C++ or Python 3) to upload and run the ANN on the NCS! And you can add as many NCSs as your application needs (and as the platform has USB ports available) to scale the performance. I've tested the examples shipped with the SDK, and although they worked well, the models were suboptimal for my planned use. This was expected, and really just shows that the biggest effort resides in the training! But having such HW available for a very reasonable price (<$100, available from Intel, Google, etc.) is a blessing for makers and anyone interested in looking into this AI hype by themselves!
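Scaling across several sticks boils down to distributing inferences over per-device graph handles. The sketch below is my own illustration, not SDK code: `dispatch_inferences` is a hypothetical helper, and it only assumes handles exposing the NCSDK v1 style `LoadTensor()`/`GetResult()` pair, one graph allocated per enumerated NCS device.

```python
from itertools import cycle

def dispatch_inferences(graphs, tensors):
    """Spread a batch of input tensors over several graph handles,
    round-robin. Any object exposing LoadTensor()/GetResult() works,
    e.g. one mvnc graph allocated per enumerated NCS device."""
    results = []
    ring = cycle(graphs)
    for tensor in tensors:
        graph = next(ring)
        graph.LoadTensor(tensor, None)  # queue the inference on that stick
        output, _ = graph.GetResult()   # wait for the result
        results.append(output)
    return results
```

Note that this naive version waits for each result before submitting the next tensor; to actually overlap work across sticks you would first queue one tensor on every device, then collect the results.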


In the meantime, I wish you and your loved ones a Happy New Year 2018!

Example of profiling output:
mvNCProfile tiny-yolo-v1.prototxt -s 12
mvNCProfile v02.00, Copyright @ Movidius Ltd 2016
**** WARNING: using empty weights ****
/usr/local/bin/ncsdk/Controllers/ UserWarning: You are using a large type. Consider reducing your data sizes for best performance
USB: Transferring Data...
Time to Execute : 172.14 ms
USB: Myriad Execution Finished
Time to Execute : 152.55 ms
USB: Myriad Execution Finished
USB: Myriad Connection Closing.
USB: Myriad Connection Closed.
Network Summary
Detailed Per Layer Profile
                        Bandwidth    time
#    Name     MFLOPs     (MB/s)      (ms)
0    data        0.0    37777.4     0.003
1    scale1    173.4     1174.7     8.800
2    pool1       3.2      791.1     7.743
3    scale2    462.4      883.1    15.620
4    pool2       1.6      928.4     3.299
5    scale3    462.4      658.2    10.525
6    pool3       0.8      956.5     1.601
7    scale4    462.4      411.3     8.722
8    pool4       0.4      948.2     0.807
9    scale5    462.4      198.7    11.506
10   pool5       0.2      917.5     0.417
11   scale6    462.4      354.3     8.782
12   pool6       0.1      875.4     0.219
13   scale7    462.4      791.3    11.918
14   scale8    231.2      577.9     9.278
15   fc9        36.9     2144.0    16.415
Total inference time               115.65
Generating Profile Report 'output_report.html'...

Extract of the Python code:

def main():
    # Set logging level and initialize/open the first NCS we find
    mvnc.SetGlobalOption(mvnc.GlobalOption.LOG_LEVEL, 0)
    devices = mvnc.EnumerateDevices()
    if len(devices) == 0:
        print('No devices found')
        return 1

    device = mvnc.Device(devices[0])
    device.OpenDevice()

    # Load graph from disk and allocate graph via API
    with open(tiny_yolo_graph_file, mode='rb') as f:
        graph_from_disk = f.read()
    graph = device.AllocateGraph(graph_from_disk)

    # Read image from file, resize it to network width and height,
    # save a copy in display_image for display, then convert to float32,
    # normalize (divide by 255), and finally convert to float16 to pass
    # to LoadTensor as input for an inference
    input_image = cv2.imread(input_image_file)
    input_image = cv2.resize(input_image,
                             (NETWORK_IMAGE_WIDTH, NETWORK_IMAGE_HEIGHT),
                             interpolation=cv2.INTER_LINEAR)
    display_image = input_image
    input_image = input_image.astype(np.float32)
    input_image = np.divide(input_image, 255.0)

    # Load tensor and get result. This executes the inference on the NCS
    graph.LoadTensor(input_image.astype(np.float16), 'user object')
    output, userobj = graph.GetResult()

    # Filter out all the objects/boxes that don't meet thresholds
    filtered_objs = filter_objects(output.astype(np.float32),
                                   input_image.shape[1],
                                   input_image.shape[0])  # fc27 instead of fc12 for yolo_small
    print('Displaying image with objects detected in GUI')
    print('Click in the GUI window and hit any key to exit')

    # Display the filtered objects/boxes in a GUI window
    display_objects_in_gui(display_image, filtered_objs)

    # Clean up
    graph.DeallocateGraph()
    device.CloseDevice()