In the third article in this series co-produced with Mouser Electronics, we saw how user-friendly toolchains simplify the use of FPGA implementation tools for design verification and programming file generation. Now, let’s see how FPGA internal and external interfacing can be implemented in programmable logic.
One of the most exciting things about FPGAs, beyond their parallel nature and the capacity for heterogeneous systems they offer, is the interfacing capability they possess, which can be described as ‘any-to-any’.
In practical terms, this means that, with the right PHY, programmable logic can provide users with interfacing to numerous industry standards, as well as legacy and bespoke interfaces. In many cases, thanks to the flexibility of the various IO cells available in Xilinx programmable logic, an external PHY is not needed at all.
Nonetheless, external interfacing is not the only thing programmable logic designers need to consider. It’s also important to consider the most efficient methods of moving data received from the IO cells around within the programmable logic.
The Xilinx UltraScale and UltraScale+ device families (Kintex and Virtex UltraScale, plus Kintex, Virtex and Zynq UltraScale+) offer several different IO structures.
Xilinx UltraScale and UltraScale+
The UltraScale and UltraScale+ devices provide three different IO classes (not including gigabit transceivers).
- High Performance (HP) class is optimised for high-speed interfaces such as memory and chip-to-chip links. The maximum operating voltage for a supported IO standard is 1v8.
- High Range (HR) class supports interfaces that operate at up to 3v3.
- High Density (HD) class supports interfaces that operate at up to 3v3 at data rates of up to 250 Mbps.
UltraScale devices offer a mix of HP and HR IO banks, while UltraScale+ devices have a mix of HP and HD IO banks.
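As a rough illustration of how these limits narrow the choice of IO bank, the sketch below encodes only the figures quoted above. The function name and table layout are hypothetical, and real devices impose further constraints (IO standard, bank assignment, speed grade) that this sketch ignores.

```python
# Limits as quoted in the text above; None means no rate limit is quoted there.
IO_CLASSES = {
    "HP": {"max_volts": 1.8, "max_mbps": None},
    "HR": {"max_volts": 3.3, "max_mbps": None},
    "HD": {"max_volts": 3.3, "max_mbps": 250},
}

# Which IO classes each family mixes, as described in the text.
FAMILY_BANKS = {
    "UltraScale": ("HP", "HR"),
    "UltraScale+": ("HP", "HD"),
}


def candidate_io_classes(family, volts, rate_mbps):
    """Return the IO classes in `family` that meet the voltage and rate limits."""
    matches = []
    for name in FAMILY_BANKS[family]:
        limits = IO_CLASSES[name]
        if volts > limits["max_volts"]:
            continue  # interface voltage is too high for this class
        if limits["max_mbps"] is not None and rate_mbps > limits["max_mbps"]:
            continue  # interface is too fast for this class
        matches.append(name)
    return matches


print(candidate_io_classes("UltraScale", 3.3, 100))   # ['HR']
print(candidate_io_classes("UltraScale+", 3.3, 100))  # ['HD']
print(candidate_io_classes("UltraScale+", 1.8, 800))  # ['HP']
```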
Each IO class supports many IO standards, from basic single-ended CMOS signals to LVDS and HSTL, which has a significant impact on integration at a system level. This breadth of support diminishes the need for external PHYs, which in turn reduces component count, PCB area, and power.
The benefits of programmable logic IO
The benefits of programmable logic IO go beyond the obvious, though. These IO structures have numerous features that simplify implementation at the system level far beyond board area and power.
The flexible IO structures these devices offer allow control of drive strength and slew rate, which helps optimise signal integrity and EMI emissions of signals on the board. Another benefit is the on-chip termination that Digitally Controlled Impedance (DCI) makes possible.
To support alignment between high-speed signal traces, the IO cells offer precise timing adjustments using the IDELAY and ODELAY structures, while the ISERDES and OSERDES structures support conversion between serial and parallel data. This feature is especially useful when aligning data patterns on high-speed interfaces, as shown below in figure one.
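To make the serial-to-parallel conversion and pattern alignment more concrete, here is a minimal behavioural sketch in Python. It is not a model of the Xilinx primitives themselves: the function names, the 8-bit width, the LSB-first bit order and the training word are all assumptions chosen for illustration.

```python
def serialize(words, width=8):
    """OSERDES-like step: flatten parallel words into a bit stream, LSB first."""
    bits = []
    for word in words:
        for i in range(width):
            bits.append((word >> i) & 1)
    return bits


def deserialize(bits, width=8, offset=0):
    """ISERDES-like step: regroup a bit stream into words, starting at `offset`."""
    words = []
    for start in range(offset, len(bits) - width + 1, width):
        word = 0
        for i in range(width):
            word |= bits[start + i] << i
        words.append(word)
    return words


def find_alignment(bits, training_word, width=8):
    """Slip the word boundary one bit at a time until the training word lines up."""
    for offset in range(width):
        words = deserialize(bits, width, offset)
        if words and words[0] == training_word:
            return offset
    return None


TRAINING = 0xB5  # hypothetical training pattern
# Three stray bits in front of the data stand in for skew between traces.
stream = [0, 1, 1] + serialize([TRAINING, 0x12, 0x34])

offset = find_alignment(stream, TRAINING)
print(offset)                                                # 3
print([hex(w) for w in deserialize(stream, offset=offset)])  # ['0xb5', '0x12', '0x34']
```

In hardware, the same idea is typically applied while the link partner sends a known pattern repeatedly, so that a chance match at the wrong offset can be ruled out.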
HR, HP, and HD IO classes undoubtedly provide significant interfacing capabilities and permit a considerable amount of data movement on and off-chip, but the highest interfacing capacities are offered by gigabit transceivers.
Such transceivers make it possible to transfer data at an ultra-fast rate. In turn, this allows the programmable logic to operate with some of the fastest serial interface standards around, including PCIe, SATA, 100G Ethernet, SDI, JESD204A/B, USB 3.0 and DisplayPort.
Xilinx GTx transceivers
Xilinx gives its transceivers the denomination GTx, where x identifies the specific transceiver type (the mix of GTx types depends on the device family). UltraScale and UltraScale+ devices offer data rates between 6 Gbps (GTR) and 58 Gbps (GTM). Across the UltraScale and UltraScale+ families, this gives an excellent range of ultrafast IO with significant peak bandwidth. The maximum line rates per transceiver type are:
- Virtex UltraScale+ GTY/GTM 32.75/58.0 Gbps
- Kintex UltraScale+ GTH/GTY 16.3/32.75 Gbps
- Zynq UltraScale+ GTR/GTH/GTY 6.0/16.3/32.75 Gbps
- Virtex UltraScale GTH/GTY 16.3/30.5 Gbps
- Kintex UltraScale GTH 16.3 Gbps
Being able to take in the data volumes provided by the HD, HP, HR, and GTx options means that data movement inside the device needs to be just as efficient.
Internal Data Movement
A central feature of programmable logic devices is the ability to move data – between IP blocks certainly, but also between programmable logic and processing systems.
The Xilinx environment uses the Advanced eXtensible Interface (AXI) as the primary protocol for data movement. This is a subset of the Arm AMBA specification that was developed with the primary goal of supporting implementation in programmable logic.
AXI offers three different interfacing standards to provide scalability for different use cases:
- AXI Full / Memory Map – This higher-performance memory-mapped interface supports independent read and write channels. Both channels permit bursts, optimising throughput. AXI Full is often used in programmable logic designs to implement direct memory transfers between the programmable logic and, e.g., an external DDR memory.
- AXI Lite – This is a stripped-down version of AXI Full, which provides a memory-mapped interface that can be used for configuration and control of IP blocks, but does not support burst accesses.
- AXI Stream – Offers a unidirectional stream of data from a producer to a consumer. This stream is point-to-point and contains no addressing information.
AXI Full and AXI Lite both offer independent read and write channels, although the complexity naturally varies according to the interface selected. The read channel has two sub-channels: read address/control and read data/response.
The write channel has three sub-channels: write address, write data and write response.
AXI Stream interfaces are often used to transfer information from a single producer to a consumer, typically via IP blocks as part of a processing chain, such as image processing or signal processing, where the signal is received and processed by each IP block before being passed on to another.
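The point-to-point nature of AXI Stream can be sketched with a cycle-by-cycle model of its handshake, in which a beat of data moves only on cycles where the producer asserts TVALID and the consumer asserts TREADY. The function name and the repeating readiness pattern below are hypothetical; the sketch leaves out the other stream signals, such as TLAST.

```python
def run_stream(samples, ready_pattern):
    """Transfer `samples` one beat at a time; return (received, cycles_used).

    `ready_pattern` is a repeating list of booleans describing when the
    consumer asserts TREADY. The producer in this sketch always has data,
    so TVALID stays high until the last sample has been accepted.
    """
    if not any(ready_pattern):
        raise ValueError("consumer never asserts TREADY")

    received = []
    index = 0
    cycle = 0
    while index < len(samples):
        tvalid = True
        tready = ready_pattern[cycle % len(ready_pattern)]
        if tvalid and tready:
            # Handshake completes: the beat moves from producer to consumer.
            received.append(samples[index])
            index += 1
        cycle += 1
    return received, cycle


# A consumer that is ready on every other cycle halves the throughput.
data, cycles = run_stream([10, 20, 30, 40], [True, False])
print(data)    # [10, 20, 30, 40]
print(cycles)  # 7
```

Because no addresses are involved, the only thing each block in a processing chain needs to agree on with its neighbour is this handshake and the data width.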
Increased bandwidth is one of the great benefits here, resulting from the wide data buses available when AXI interfaces are implemented in programmable logic. Data bus widths can vary between 32 and 256 bits, which at a clock of 400 MHz gives peak data rates between 12.8 Gbps and 102.4 Gbps. Figure Two shows this arrangement of AXI Full, AXI Lite, and AXI Stream.
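The arithmetic behind these figures is simply bus width multiplied by clock frequency. The short sketch below works it through for a 400 MHz clock; the function name is hypothetical, and the result is a peak figure that assumes one transfer on every clock cycle.

```python
def axi_peak_gbps(width_bits, clock_mhz):
    """Peak data rate in Gbps, assuming one transfer per clock cycle."""
    return width_bits * clock_mhz / 1000.0


for width in (32, 64, 128, 256):
    print(f"{width:3d} bits at 400 MHz: {axi_peak_gbps(width, 400):.1f} Gbps")
```

A 32-bit bus gives 12.8 Gbps, and a 256-bit bus gives 102.4 Gbps, or roughly 100 Gbps.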
Arm TrustZone software
Arm TrustZone can be used to lock down AXI interfaces, improving design security when working with Zynq UltraScale+ MPSoC devices. This capability is increasingly important both in the Cloud and at the Edge.
TrustZone separates software into orthogonal secure and non-secure worlds, which makes it more difficult for lower-security, higher-risk applications to access registers and peripherals defined as secure.
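On the bus, each AXI transaction carries protection bits (AxPROT), in which bit 1 marks the access as non-secure. The sketch below shows the kind of check a secure peripheral could apply. The function name and the policy are illustrative assumptions, not a description of how any particular device enforces it.

```python
# AxPROT bit meanings from the AXI protocol.
AXPROT_PRIVILEGED = 0b001
AXPROT_NONSECURE = 0b010
AXPROT_INSTRUCTION = 0b100


def access_allowed(axprot, peripheral_is_secure):
    """Reject non-secure transactions that target a secure peripheral."""
    is_nonsecure = bool(axprot & AXPROT_NONSECURE)
    return not (peripheral_is_secure and is_nonsecure)


print(access_allowed(0b000, peripheral_is_secure=True))   # True: secure access
print(access_allowed(0b010, peripheral_is_secure=True))   # False: non-secure access blocked
print(access_allowed(0b010, peripheral_is_secure=False))  # True: peripheral is not protected
```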
System-level cache coherency
With heterogeneous SoCs like the Zynq UltraScale+ MPSoC, system-level cache coherency becomes increasingly important. Additional sideband signals provided by AXI enable both IO cache coherence and full cache coherence through the ACE, ACP and HP(C) ports available on the Zynq MPSoC processing system.
As implemented within programmable logic, AXI is a point-to-point protocol between a producer (master) and a consumer (slave). Developers can connect a single producer to several consumers by using a smart interconnect.
This is particularly useful when a single AXI Lite interface is used to configure several IP blocks. Smart interconnects can also provide clock domain crossing if necessary.
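At its simplest, the interconnect decodes each address and forwards the access to the consumer that owns that address window. The Python sketch below models this for AXI Lite style register accesses. The class names, block names and base addresses are all hypothetical, and features such as arbitration and clock domain crossing are left out.

```python
class RegisterBlock:
    """Stand-in for an IP block's AXI Lite register file (32-bit registers)."""

    def __init__(self, name):
        self.name = name
        self.regs = {}

    def write(self, offset, value):
        self.regs[offset] = value & 0xFFFFFFFF

    def read(self, offset):
        return self.regs.get(offset, 0)


class Interconnect:
    """Routes accesses from one producer to several consumers by address."""

    def __init__(self):
        self.windows = []  # list of (base, size, block)

    def attach(self, base, size, block):
        self.windows.append((base, size, block))

    def _decode(self, address):
        for base, size, block in self.windows:
            if base <= address < base + size:
                return block, address - base
        # A real interconnect would signal a decode error response instead.
        raise ValueError(f"no consumer mapped at 0x{address:08X}")

    def write(self, address, value):
        block, offset = self._decode(address)
        block.write(offset, value)

    def read(self, address):
        block, offset = self._decode(address)
        return block.read(offset)


bus = Interconnect()
scaler = RegisterBlock("scaler")  # hypothetical IP blocks
filt = RegisterBlock("filter")
bus.attach(0x4000_0000, 0x1_0000, scaler)  # hypothetical address map
bus.attach(0x4001_0000, 0x1_0000, filt)

bus.write(0x4000_0004, 1920)
bus.write(0x4001_0010, 0xABCD)
print(scaler.read(0x04))  # 1920
print(filt.read(0x10))    # 43981
```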
The use of standard interfaces based on AXI enables both Xilinx and third-party partners to conceive and create a much larger ecosystem of IP cores. The development of custom IP using either RTL or HLS also benefits from standard interfaces.
Conclusion
Whether for traditional logic-level interfaces or high-speed gigabit serial links, the innate flexibility of programmable logic IO structures allows designers to implement a wide variety of commonly used interfaces directly within the device.
Other system-level benefits include improved integration, lower power, and tighter control of parameters such as signal integrity and EMI thanks to the capacity to terminate on-chip and control drive strengths and slew rates.
Internally, AXI allows programmable logic to move and process data efficiently, while making it simple to integrate custom-developed and third-party IP.
This combination of external and internal interfacing capabilities allows programmable logic to offer solutions with ground-breaking performance.