In the first article in this series co-produced with Mouser Electronics, we explored the range of FPGA devices produced by Xilinx and discussed the benefits of adopting such a system for developers, engineers, and end-users alike. Now, let’s dig a little deeper and discover what makes an FPGA tick.
The programmable logic found in FPGAs is an excellent solution for implementing parallel processing structures. Although Programmable Logic (PL) is ideal for implementing functions such as finite impulse response filters, image processing pipelines, and motor control algorithms, sometimes serial processing is necessary.
Situations in which this is the case include implementing communication protocols and graphical user interfaces, as well as the control, configuration, and status reporting of IP blocks. Serial processing is also essential if we want to work with advanced open-source frameworks and languages such as TensorFlow, OpenCV and Python.
There are several options open to us if we want to implement embedded processors in programmable logic devices. Taking a broad view, we can divide these into two distinct groups:
- Heterogeneous System-on-Chip: These devices combine programmable logic with a processing system that is hardened in the device’s silicon. This approach consequently offers outstanding performance but only limited configuration flexibility, because the processing system is fixed.
- Soft-Core Embedded Processors: Programmable logic resources such as flip-flops (FFs), look-up tables (LUTs) and block RAMs (BRAMs) are used to implement soft-core processors. Consequently, the processors offer more configuration possibilities, but their performance is usually lower.
As we will discover, both solutions – heterogeneous SoC and soft-core embedded – offer a variety of use cases across several exciting applications.
It is also possible to implement additional soft-core processors in the programmable logic of heterogeneous SoCs. This is not unusual and can be used to create a big.LITTLE-style system in which time-critical tasks are off-loaded to the soft-core processor.
Embedded Processors in Xilinx Devices
The Zynq-7000 SoC and Zynq MPSoC product families in the Xilinx range both offer embedded processors. Devices from these lines offer genuinely heterogeneous processing systems on the same silicon. In architectural terms, the processor system initially boots in the manner of a traditional processor before subsequently configuring the programmable logic.
The Zynq-7000 SoC, the first of these families to be introduced, combines programmable logic with dual- or single-core 32-bit Arm Cortex-A9 processors.
The processing system also provides controllers for both volatile and non-volatile memory, as well as a number of peripheral interfaces such as Ethernet, UART and CAN.
The Cortex-A9 cores also include a floating-point unit and a NEON engine (also known as the Media Processing Engine, or MPE) in order to support high-performance applications. Thanks to the NEON engine, large data sets can be processed in parallel by applying a single instruction to multiple data (SIMD).
Image and audio processing benefit from this in particular, as do similar applications in which data sets need to be processed using simple instructions (e.g. multiply and add) repetitively, with little control code. In such applications, performance can be noticeably improved by leveraging the SIMD unit.
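To make the SIMD idea concrete, the following is a purely conceptual Python sketch, not NEON code. It counts how many "instructions" a multiply-and-add over 16 values needs when one operation handles a single value versus a group of lanes. The lane count of four (four 32-bit values in a 128-bit register) and the function names are assumptions made for illustration.

```python
# Conceptual model only: real NEON code would rely on compiler intrinsics or
# auto-vectorisation. LANES = 4 assumes four 32-bit values per 128-bit register.
LANES = 4


def mac_scalar(a, b, acc):
    """Multiply-accumulate one element per modelled instruction."""
    out, ops = [], 0
    for x, y, z in zip(a, b, acc):
        out.append(x * y + z)
        ops += 1
    return out, ops


def mac_simd(a, b, acc, lanes=LANES):
    """Multiply-accumulate `lanes` elements per modelled instruction."""
    out, ops = [], 0
    for i in range(0, len(a), lanes):
        # One modelled vector operation covers a whole group of lanes.
        group = zip(a[i:i + lanes], b[i:i + lanes], acc[i:i + lanes])
        out.extend(x * y + z for x, y, z in group)
        ops += 1
    return out, ops


a = list(range(16))
b = [2] * 16
acc = [1] * 16

scalar_result, scalar_ops = mac_scalar(a, b, acc)
simd_result, simd_ops = mac_simd(a, b, acc)
print(f"scalar: {scalar_ops} operations, SIMD: {simd_ops} operations")
```

Both functions produce the same results; only the number of modelled operations differs, which is where the benefit of SIMD comes from on data-heavy workloads with little control code.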
Advanced eXtensible Interfaces (AXI) are used to effect data transfer between the processing system and the programmable logic. This allows either the processor system or the programmable logic to initiate the transaction. Data can thus be easily transferred to and from the processor system’s DDR memory.
Because it combines the processing system and programmable logic in this way, the Zynq-7000 series is an exceptional choice for applications such as image processing, robotics, and augmented reality that require both serial and parallel processing.
Both sides of this pairing can be adapted to improve connectivity and make use of the support offered to a broad range of frameworks and applications. Central elements or algorithms can be accelerated using programmable logic, while the processing system benefits from Embedded Linux solutions.
Combining PS (processing system) and PL provides for a more responsive and deterministic solution. The table that follows provides a simple illustration based on implementing AES encryption.
| Operating System | Processor System Clocks | PS Clocks with Programmable Logic | Reduction in Processing Time |
| --- | --- | --- | --- |
| Baremetal | 28574 | 7104 | 75% |
| FreeRTOS | 28368 | 7104 | 75% |
| Linux | 36662 | 15644 | 54.8% |
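As a quick check on how the last column relates to the clock counts, the short sketch below recomputes the reduction and the equivalent speed-up from the two clock columns. The function name is illustrative. Note that, computed from the counts shown, the Linux row comes out at roughly 57% rather than the 54.8% listed, so that entry is best treated with some caution.

```python
# Clock counts copied from the AES table: (PS only, PS with programmable logic).
cycles = {
    "Baremetal": (28574, 7104),
    "FreeRTOS": (28368, 7104),
    "Linux": (36662, 15644),
}


def reduction_percent(ps_only, ps_with_pl):
    """Percentage of clocks saved by off-loading work to the PL."""
    return 100.0 * (ps_only - ps_with_pl) / ps_only


for name, (ps_only, ps_with_pl) in cycles.items():
    saved = reduction_percent(ps_only, ps_with_pl)
    speed_up = ps_only / ps_with_pl
    print(f"{name}: {saved:.1f}% fewer clocks, {speed_up:.2f}x speed-up")
```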
Processing capabilities underwent a major increase as the Zynq-7000 SoC evolved to become the next-generation Zynq MPSoC, and the latest logic fabric was added. Heterogeneous processors were included for the first time, giving developers the opportunity to deal with multiple challenges within the same device.
The Zynq MPSoC processing system incorporates:
- Application Processing Unit – quad or dual 64-bit Arm Cortex-A53 processors
- Real Time Processing Unit – dual lockstep 32-bit Arm Cortex-R5 processors
- Platform Management Unit – Triple Modular Redundant 32-bit MicroBlaze processor, implemented in silicon
- Graphics Processor Unit – Arm Mali-400 MP GPU
In addition to these four processing groups, which are available to the developer for programming, the MPSoC processing system offers a configuration security processor that allows engineers to implement safety and security processing and security event responses.
Having such a broad array of processing solutions allows single-chip solutions to be created for many applications. In the automotive field, for example, complex algorithms and user interfaces can be implemented using the APU and GPU, while real-time control and vehicle control interfacing can utilise the RPU, which is designed and certified for ISO 26262 or IEC 61508 applications.
AXI interfaces are again used for communication between the PS and PL, although this time 128-bit interfaces replace the 32-bit ones, significantly increasing the throughput between PS and PL.
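The effect of the wider interface can be estimated with some simple arithmetic. The sketch below computes the theoretical peak throughput of a single data channel, assuming one data beat per clock cycle. The 250 MHz clock is a hypothetical value chosen for illustration, and real-world throughput will be lower once protocol overhead and memory latency are taken into account.

```python
ASSUMED_CLOCK_MHZ = 250  # hypothetical interface clock, not a device specification


def peak_bandwidth_mb_s(width_bits, clock_mhz):
    """Theoretical peak in MB/s: bytes per beat times million beats per second."""
    return width_bits / 8 * clock_mhz


for width in (32, 128):
    peak = peak_bandwidth_mb_s(width, ASSUMED_CLOCK_MHZ)
    print(f"{width}-bit interface at {ASSUMED_CLOCK_MHZ} MHz: {peak:.0f} MB/s peak")
```

At the same clock frequency, the 128-bit interface moves four times as much data per cycle as the 32-bit one.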
High-performance vision-based machine learning applications, such as those often used in automotive or other edge-based solutions, can be implemented as a result of this high bandwidth capability. The Zynq-7000 SoC and Zynq MPSoC classes of device consequently offer the highest-performance processor systems of the options discussed here.
To see this at work, consider the image processing application example in Figure 1. Image data is transferred between the processor system and the programmable logic to implement the desired algorithm.
Soft-Core Processors – spoilt for choice!
A very wide choice of soft-core processors exists in the Xilinx ecosystem. The flip-flops (FFs), look-up tables (LUTs), and RAMs of Xilinx FPGAs can be used to implement any processor described in RTL.
The most popular choices include:
- MicroBlaze – a 32-bit processor available in a range of configurations, from a simple microcontroller to a version with full MMU (Memory Management Unit) support that is capable of running embedded Linux;
- Arm Cortex-M1 – a 32-bit, FPGA-optimised counterpart to the popular Cortex-M0, with a small logic footprint and great code density courtesy of the Thumb instruction set;
- Arm Cortex-M3 – another 32-bit core, this time an FPGA implementation of the Cortex-M3 processor. It supports embedded operating systems and offers good code density thanks to Thumb/Thumb-2 instruction set support; note that it provides an optional Memory Protection Unit (MPU) rather than a full MMU. A popular choice for Internet of Things applications;
- RISC-V – an open-source instruction set with 32-, 64- and 128-bit variants. RISC-V compliant implementations are available from a number of IP vendors for use in Xilinx FPGAs. Highly customizable, like MicroBlaze, RISC-V can also run embedded operating systems including Linux.
Although a soft-core processor inevitably provides less performance than a hard silicon instantiation, the greater configuration possibilities and adaptability of the soft-core option mean that a much more highly customized solution can be implemented.
A soft-core processor can also be portable, covering the needs of several devices or even vendors (depending upon the precise selection of processor).
As with hard silicon processors, AXI is often the interface of choice for connecting peripherals to soft-core processors. In this context, “peripherals” includes DDR memory interfaces, UARTs, and popular serial interfaces such as I2C and SPI. Figure 2 provides an illustration: a MicroBlaze processor configures and controls a high-speed image processing pipeline.
So, how does an engineer make the choice between the implementation of a hard or soft-core processor?
Performance is always a big factor, but an engineer might also consider application-specific needs, flexibility, security, resource availability, portability, and licensing.
Each of these factors will have a different weight for every individual application, but these are the factors designers and developers should think about when determining the best choice for their particular situation.
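One simple way to apply these weightings is a weighted decision matrix. The sketch below is illustrative only: every weight and score in it is a hypothetical placeholder (1 = poor, 5 = good) and should be replaced with values that reflect your own application.

```python
# All weights and scores are hypothetical placeholders, not measured data.
WEIGHTS = {
    "performance": 5,
    "application needs": 4,
    "flexibility": 2,
    "security": 4,
    "resource availability": 2,
    "portability": 1,
    "licensing": 1,
}

SCORES = {
    "hard-core": {
        "performance": 5, "application needs": 4, "flexibility": 2,
        "security": 5, "resource availability": 4, "portability": 1,
        "licensing": 4,
    },
    "soft-core": {
        "performance": 2, "application needs": 4, "flexibility": 5,
        "security": 2, "resource availability": 2, "portability": 5,
        "licensing": 3,
    },
}


def weighted_score(scores, weights):
    """Weighted average of the scores for the factors listed in `weights`."""
    total_weight = sum(weights.values())
    return sum(weights[f] * scores[f] for f in weights) / total_weight


ranking = sorted(
    SCORES,
    key=lambda option: weighted_score(SCORES[option], WEIGHTS),
    reverse=True,
)
for option in ranking:
    print(f"{option}: {weighted_score(SCORES[option], WEIGHTS):.2f}")
```

Changing the weights, for example raising portability and lowering performance, can reverse the ranking, which is exactly why the same question has different answers for different applications.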
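To compare processor capabilities usefully, inherent processing power first has to be compared. A benchmark called Dhrystone MIPS (DMIPS), where MIPS stands for Millions of Instructions Per Second, is used to make this comparison; dividing the score by the clock frequency gives DMIPS/MHz. Lining up the hard and soft cores being considered in a table, it becomes clear that the hard processors generally deliver more performance per MHz, and they also typically run at higher clock speeds than cores built from programmable logic.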
| Processor | DMIPS/MHz | Comment |
| --- | --- | --- |
| Cortex-A53 | 2.3 | Quad or dual processors |
| Cortex-A9 | 2.3 | Dual or single |
| Cortex-R5 | 1.67 | Dual or lockstep |
| MicroBlaze | 1.04 – 1.31 | |
| Cortex-M1 | 0.8 | |
| Cortex-M3 | 1.25 | |
| RISC-V | 1.7 | Depends on implementation |
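Because the table is normalised per MHz, an absolute figure only emerges once a clock frequency is applied. The sketch below multiplies each DMIPS/MHz value by an assumed clock to estimate single-core DMIPS. The clock frequencies are hypothetical values chosen purely for illustration; achievable frequencies depend on the device, speed grade, and design.

```python
# DMIPS/MHz values taken from the table; MicroBlaze uses the top of its range.
DMIPS_PER_MHZ = {
    "Cortex-A53": 2.3,
    "Cortex-A9": 2.3,
    "Cortex-R5": 1.67,
    "MicroBlaze": 1.31,
    "Cortex-M1": 0.8,
    "Cortex-M3": 1.25,
    "RISC-V": 1.7,
}

# Hypothetical clock frequencies in MHz, for illustration only.
ASSUMED_CLOCK_MHZ = {
    "Cortex-A53": 1200,
    "Cortex-A9": 667,
    "Cortex-R5": 500,
    "MicroBlaze": 200,
    "Cortex-M1": 100,
    "Cortex-M3": 100,
    "RISC-V": 100,
}


def estimated_dmips(core):
    """Single-core estimate: DMIPS/MHz multiplied by the assumed clock."""
    return DMIPS_PER_MHZ[core] * ASSUMED_CLOCK_MHZ[core]


for core in DMIPS_PER_MHZ:
    clock = ASSUMED_CLOCK_MHZ[core]
    print(f"{core}: ~{estimated_dmips(core):.0f} DMIPS at {clock} MHz")
```

The estimate covers one core only, so multi-core devices are not scaled up here. It also shows why a similar DMIPS/MHz figure does not imply similar performance once clock speed is included.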
The needs of the application are equally important; for example, if the processor core only needs to configure IP within the processing system or implement serial communication protocols, then a soft-core-based processor may well be the preferred choice.
On the other hand, for high-performance algorithms that demand powerful processing capabilities, hard-core processors undeniably have a performance advantage.
Another major factor affecting choice may be security. This is particularly important for edge applications. Hard processing solutions like the Zynq MPSoC incorporate security measures such as a Configuration Security Unit, Secure Boot and Arm TrustZone. Soft-core processors often require that separate security protections are added to the programmable logic.
Configuration is one of the biggest points of difference between hard- and soft-core processors: the processor dominates in a hard-core system, booting first and configuring the programmable logic to the desired specifications.
This allows the implementation of several power-saving modes, for example, powering down a processor core, peripherals, or even the entirety of the programmable logic.
By contrast, an FPGA must first be configured to instantiate a soft-core processor. Once that has successfully occurred, the processor can begin operation. The ability of the soft-core processor to implement power down/power saving schemes is therefore limited, although a lot can still be achieved using lower clock frequencies.
The big.LITTLE Approach
Nonetheless, applications that simultaneously use both hard and soft processors in the same solution are seen with increasing frequency. The image below illustrates this approach.
A big.LITTLE approach like this focuses the high-performance application processor on the high-level application and delegates real-time applications such as sensor interfacing and motor control to the soft-core processor in the programmable logic.
Splitting the work in this way offers a more responsive solution than an application processor in isolation.
Creating the architecture for a big.LITTLE interface correctly also allows updates to the main application as needed, but avoids changing the code in execution within the little processor. Sensor changes and updates can also be addressed easily by updating the code running in the soft-core processor.
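The partitioning described in this section can be sketched as a simple rule: tasks with real-time requirements go to the soft-core processor in the programmable logic, and everything else stays on the application processor. The task list, class, and function names below are hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    real_time: bool  # True if the task has real-time requirements


def assign(task):
    """Return the processor a task would run on under this simple rule."""
    if task.real_time:
        return "soft-core processor (PL)"
    return "application processor (PS)"


# Hypothetical task list for illustration.
tasks = [
    Task("user interface", real_time=False),
    Task("network stack", real_time=False),
    Task("sensor interfacing", real_time=True),
    Task("motor control", real_time=True),
]

plan = {task.name: assign(task) for task in tasks}
for name, target in plan.items():
    print(f"{name} -> {target}")
```

Keeping the two groups separate is what allows the main application to be updated without touching the code running on the little processor.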
Software Development
The Vivado Design Suite is used in the development of both hard- and soft-core processor solutions, either to configure the hard-core processor or to implement the soft-core one. Once configuration is complete, the Xilinx Unified Software Platform, Vitis, can make use of the resulting design description.
Vitis already supports application software development for the processors in Zynq-7000 SoC and Zynq MPSoC devices, and for MicroBlaze. Development aimed at third-party processors such as Arm Cortex-M1, Cortex-M3, and RISC-V will make use of toolchains provided by the processor core’s creator – Arm Keil, for example.
JTAG or Serial Wire interfaces allow users to debug these solutions at the software development stage. Breakpoints can be set, and registers and memory locations can be watched, with the result that users can easily identify the root cause of any problem.
Conclusion
Thanks to the range of embedded processors that can be incorporated into programmable logic devices, there is a suitable processor available for each and every use case.
Making a start with these solutions is simple: embedded processor solutions, whether hard- or soft-core, can be created easily – no need to write even a single line of HDL! Vivado’s IP Integrator allows developers to focus on their application.
Join us for the next instalment in this series as we explore the design tools Xilinx has created to make integrating programmable logic a stress-free process. Or, if you want more technical information about the hardware you can use for your project, visit the Mouser website.