In the previous article in this series co-produced with Mouser Electronics, we saw four of the many use cases in which programmable logic plays a central role. Now, this article will consider IoT Edge AI, a subject that is currently very popular.
However, it can be viewed from two opposing perspectives: there is a tremendous difference between big, powerful, hot hardware and small, optimized, cold hardware.
Moreover, machine learning algorithms are slow and power hungry when existing hardware is used. What hardware doesn’t achieve must obviously be done with software, thus creating a bottleneck for both latency and power consumption.
Finding the right board on which to develop affordable, high-performance and low-power applications can be frustrating. Many such products are offered on many websites, always promising levels of performance, integration, and maturity that are almost never delivered in real applications, except at the prototype stage.
The MAX78000 chip family developed by Maxim Integrated defies these expectations. Besides its other hardware characteristics, this dual-core chip offers CNN-ready specialized hardware that can be operated at different levels of power consumption.
Today, convolutional neural networks, or CNNs, execute many AI functions. CNNs have proven themselves to be accurate in interesting tasks related to image recognition. On standard hardware, their computation demands acceleration by FPGAs, GPUs or ASICs.
More specific circuitry, together with a faster access to memory, can be achieved on IoT devices – these perform well if propelled by the correct software code.
All of these constraints may seem hard to cope with, but at least one such platform exists: on the MAX78000, with its hardware optimized to run CNN and other AI-powered applications, ‘execute’ is all you need for your IoT devices powered with AI hardware and software.
MAX78000: Architecture
The MAX78000 is an ultra-low power microcontroller with a dedicated Convolutional Neural Network (CNN) accelerator. This architecture enables the development of very power efficient AI applications in energy constrained environments.
The MAX78000 provides many options, including oscillators, clock sources, and operation modes, that allow low power applications to be developed at the edge.
The device hosts two different cores: one Arm Cortex-M4 with FPU running at 100 MHz, and one RISC-V unit that can execute application and control code as well as drive the CNN accelerator. The weight RAM for direct storage of the CNN weights is 442 kB, while the flash memory is 512 kB.
MAX78000 Evaluation kit
The MAX78000 Evaluation kit provides a platform for leveraging the MAX78000 hardware and peripherals to build new generations of artificial intelligence devices located at the Edge.
Onboard hardware includes a digital microphone, a gyroscope/accelerometer, parallel camera module support and a 3.5in touch-enabled color TFT display.
A power accumulator for tracking device power consumption and latency times drives a secondary display. A 0.1in pin header connects uncommitted GPIO and analog inputs, a USB Micro-B connector powers the unit, and a USB-to-SPI bridge provides rapid access to onboard memory.
More information about the kit and a video of unpacking procedures can be found here.
MAX78000FTHR Application Platform
The MAX78000 product line also includes a feather board. The MAX78000FTHR Application Platform allows engineers to evaluate ultra-low-power, artificial intelligence (AI) solutions using the MAX78000 Arm Cortex processor with an integrated convolutional neural network accelerator.
The board includes the MAX20303 PMIC for battery and power management. Integrated peripherals on the board include a CMOS VGA image sensor, a digital microphone, a low-power stereo audio CODEC, a 1MB QSPI SRAM, a micro SD card connector, an RGB indicator LED, and a user pushbutton.
The Maxim Integrated MAX78000FTHR Application Platform provides a power-optimized, flexible platform for implementing proofs of concept and for early software development.
Never trust a benchmark you didn’t run yourself
Some information about benchmarking in low-power devices and the importance of power consumption when using convolutional neural networks (CNN) is offered here as a useful introduction to the case studies that will be discussed later in this article.
When developing an application with MAX78000, the user can switch between different operation modes, working at different clock speeds, and schedule the tasks accordingly to save power. The Power Monitor can be used in System Power mode to measure power consumption and wakeup or bootup time.
Even easy tasks such as a triple-nested matrix-multiplication loop can be power-thirsty.
The current actually required depends on many factors. What number types are being processed: floating point or integers? Is there dedicated hardware?
Never trust somebody else’s benchmark – it’s always best to run your own. The dedicated power monitor installed on the MAX78000 Evaluation kit helps in performing exact benchmarking in order to make the right choice.
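As a rough illustration of the workload mentioned above, the following C sketch implements the triple-nested matrix multiplication twice, once with 32-bit integers and once with single-precision floats. The function names and the flat row-major layout are assumptions made for this example; they are not taken from Maxim's SDK. Running both variants under the Power Monitor is one way to see how the number type affects current draw on a given core.

```c
#include <stddef.h>
#include <stdint.h>

/* Integer variant: c = a * b for n x n matrices stored row-major.
 * Names and layout are illustrative assumptions, not SDK code. */
void matmul_i32(const int32_t *a, const int32_t *b, int32_t *c, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            int32_t acc = 0;
            for (size_t k = 0; k < n; k++) {
                acc += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = acc;
        }
    }
}

/* Floating-point variant of the same loop nest. On a core with an FPU
 * this runs in hardware; without one it falls back to software routines. */
void matmul_f32(const float *a, const float *b, float *c, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            float acc = 0.0f;
            for (size_t k = 0; k < n; k++) {
                acc += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = acc;
        }
    }
}
```

Both functions perform n³ multiply-accumulate operations, so any difference in measured energy between them comes from how the hardware handles each number type rather than from the amount of work.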
CNN power consumption
The core of an AI application is the inference, performed by the CNN accelerator. Depending on the application, the inference can happen continuously on incoming data or periodically at certain time intervals.
In our first case study example, the CNN inference starts once the input data is ready. The inference starts in FIFO mode in our second example (face recognition solution).
The CNN’s power consumption is measured in three phases:
- Loading weights (kernels): Occurs once, in order to load weights into the CNN memory in active mode
- Loading input data: Every time there is a new inference. In FIFO mode, this can overlap with inference
- Inference: Operates on input data and generates the classification result.
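The three phases above can be combined into a simple energy budget. The sketch below is only a model: the struct, the function name and the microjoule unit are assumptions made for illustration, and real per-phase figures would have to be measured with the Power Monitor. It assumes the weights are loaded once and stay in CNN memory, while input loading and inference are paid for on every inference.

```c
/* Per-phase energy figures in microjoules. These are placeholders to be
 * filled with measured values; nothing here comes from a datasheet. */
typedef struct {
    double weights_uj;    /* loading weights (kernels): paid once */
    double input_uj;      /* loading input data: paid per inference */
    double inference_uj;  /* running the CNN: paid per inference */
} cnn_energy_t;

/* Total energy for a given number of inferences, with the one-time
 * weight load amortized over all of them. */
double cnn_total_energy_uj(const cnn_energy_t *e, unsigned long inferences)
{
    return e->weights_uj
         + (double)inferences * (e->input_uj + e->inference_uj);
}
```

In FIFO mode the input load overlaps with inference in time, which shortens latency; under this model the two energy terms are still added, which is a simplification.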
More technical information, particularly regarding implementation of the MAX78000 software with PyTorch on Docker, can be found on this dedicated Git page.
Let’s now look at three basic case studies: voice recognition (keyword spotting), face recognition (FaceID), and motion recognition.
Case Study 1: Keyword Spotting
Keyword spotting is a basic function that is often required in AI apps. The function here consists of a speech recognition algorithm using the MAX78000 Evaluation kit. The voice input is captured from the on-board microphone, recognized with an associated confidence level and eventually used as input to a program.
The voice input is sampled at 16 kHz and monitored against a threshold in 128-sample windows. Once the silence threshold is breached, the application captures one second’s worth of samples, runs CNN inference on the samples, and displays the classification result on the TFT display.
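The windowed threshold check described above can be sketched in C as follows. The sample rate and window size come from the text; the loudness metric (mean absolute amplitude), the function names and the 16-bit sample format are assumptions made for this example, and the actual demo may measure silence differently.

```c
#include <stddef.h>
#include <stdint.h>

#define SAMPLE_RATE_HZ  16000u          /* from the text: 16 kHz sampling */
#define WINDOW_SAMPLES  128u            /* from the text: 128-sample windows */
#define CAPTURE_SAMPLES SAMPLE_RATE_HZ  /* one second of audio for the CNN */

/* Mean absolute amplitude of a window (assumed loudness metric). */
uint32_t window_level(const int16_t *w, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        int32_t s = w[i];
        sum += (uint32_t)(s < 0 ? -s : s);
    }
    return n ? (uint32_t)(sum / n) : 0u;
}

/* Returns the index of the first sample of the first complete window whose
 * level exceeds the threshold, or -1 if every complete window is silent. */
long find_onset(const int16_t *samples, size_t count, uint32_t threshold)
{
    for (size_t start = 0; start + WINDOW_SAMPLES <= count;
         start += WINDOW_SAMPLES) {
        if (window_level(samples + start, WINDOW_SAMPLES) > threshold) {
            return (long)start;
        }
    }
    return -1;
}
```

Once `find_onset` reports a window above the threshold, an application along these lines would collect `CAPTURE_SAMPLES` samples from that point and hand them to the CNN.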
The machine learning model is built with Maxim’s development flow on PyTorch, trained with a subset of Google’s speech command dataset with 20 keywords, and deployed on the MAX78000EVKIT.
What energy efficiency level can we achieve with this application? The Arm core handles the entire task in this benchmarking example. Several modes of operation were tested. The Power Monitor in System Power Mode is used to measure the total energy in ten second intervals.
The power in ACTIVE mode is about 20mW, regardless of silence or detection as the inference energy is a fraction of total power consumption. With some attention, the power goes down to 8.3mW (6.58mW during silence).
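As a worked example using only the figures quoted above, the energy drawn over one ten-second measurement interval and the relative saving can be estimated as follows. The subscripts "active" and "optimized" are labels chosen here for the two quoted power levels.

```latex
\begin{align*}
E_{\text{active}}    &\approx 20\,\text{mW} \times 10\,\text{s} = 200\,\text{mJ} \\
E_{\text{optimized}} &\approx 8.3\,\text{mW} \times 10\,\text{s} = 83\,\text{mJ} \\
\text{saving}        &\approx 1 - \frac{8.3}{20} \approx 58.5\%
\end{align*}
```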
Afshin Niktash, software engineer at Maxim Integrated, tests Keyword Spotting with the help of a snake game.
Case Study 2: Face recognition
This demo application demonstrates the identification of subjects using facial recognition on the MAX78000. The FaceID CNN model generates embedding from live images captured from the EV kit camera.
This algorithm recognizes a face, or more than one face in the same image. The system uses voice-activated commands captured by the activation microphone and recognized by the keyword spotting algorithm discussed above.
The distance between the generated embedding and the embeddings of the known subjects is calculated, and if it passes a particular threshold, the best candidate is selected. The inference is executed on a 160×120 face box of each image.
To enhance the performance of identification, the inference is executed three times per frame, as well as each time the face box moves even slightly on the picture.
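The distance-and-threshold step can be sketched in C as shown below. This is a minimal illustration under stated assumptions rather than the demo's actual code: it assumes 8-bit embeddings, a squared Euclidean distance, and a convention in which a smaller distance means a closer match, so a candidate is accepted only when its distance is at or below the limit. The function names and the flat database layout are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Squared Euclidean distance between two embeddings of length len.
 * Safe from overflow for len up to about 66,000 elements. */
uint32_t embedding_dist2(const int8_t *a, const int8_t *b, size_t len)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < len; i++) {
        int32_t d = (int32_t)a[i] - (int32_t)b[i];
        acc += (uint32_t)(d * d);
    }
    return acc;
}

/* db holds `subjects` embeddings of `len` elements each, back to back.
 * Returns the index of the closest subject, or -1 when even the closest
 * one is farther away than max_dist2 (assumed acceptance rule). */
int best_candidate(const int8_t *probe, const int8_t *db,
                   size_t subjects, size_t len, uint32_t max_dist2)
{
    int best = -1;
    uint32_t best_d = UINT32_MAX;
    for (size_t s = 0; s < subjects; s++) {
        uint32_t d = embedding_dist2(probe, db + s * len, len);
        if (d < best_d) {
            best_d = d;
            best = (int)s;
        }
    }
    return (best >= 0 && best_d <= max_dist2) ? best : -1;
}
```

If the real application compares a similarity score instead of a distance, the comparison direction in `best_candidate` would be reversed.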
In this case study, the entire task is performed by the Arm core. The core goes to SLEEP mode during inference and remains in STANDBY mode between frames for about half a second. Table 8 summarizes the execution time and power of the main functions of this FaceID. Table 9 displays the time and power consumption during each operation mode.
A video starring Gokhan Bektas, software engineer at Micros, gives a simple demonstration of how to identify faces with keyword spotting using the MAX78000 Evaluation Kit. A full description of the overall process can be found on the company’s website.
Case Study 3: Motion on camera
The camera included in the MAX78000 Evaluation kit checks a certain area. The visual wake word model, or VWW, is capable of detecting the presence of a person in the image in 20 milliseconds or less.
The application made use of the COCO dataset in both training and testing processes. After quantization, the model has an 85% accuracy and an 80-millisecond latency. The application is useful for light control, surveillance and many other tasks.
The RISC-V core is used to detect a person in the frame and to wake up the Arm core. The typical wake-up configuration code is simple:
Example of Arm wakeup configuration:
Wakeup from LPM on GPIO:
MXC_GCR->pm |= MXC_F_GCR_PM_GPIO_WE; // enable wakeup from GPIOs
MXC_PWRSEQ->lpwken2 |= (1 << 7); // Pb2: GPIO2.7 is selected for wakeup
Wakeup from LPM on RISC-V interrupt:
MXC_PWRSEQ->lppwen |= MXC_F_PWRSEQ_LPPWEN_CPU1; // wakeup on RISC-V interrupt
Hanlin Sun, Test Engineer from AiZip, explains the whole process in his video.
Conclusions
Professional boards that host AI-specific hardware and a very rich set of power management options are difficult to find. Testing and benchmarking a board can be almost impossible if the provider doesn’t offer the right tools for the job – tools you can run yourself.
Today’s world demands applications that run on hardware activated by real-life events such as voice, intrusion, face and others. Writing application software for the wrong hardware can be a real conundrum even for skilled coders.
The MAX78000 platform has allowed the development of real-life IoT applications that leverage the chip family’s onboard CNN-specific hardware, testing capabilities, and rich toolset, making the MAX78000 the reference choice to begin with.