Embedded FPGA (eFPGA) refers to the embedding of one or more FPGAs in the form of IP in chips such as ASIC, ASSP or SoC.
In other words, an eFPGA is a digitally reconfigurable fabric consisting of programmable logic in programmable interconnects, typically represented as a rectangular array, with data inputs and outputs located around the edges. eFPGAs typically have hundreds or thousands of inputs and outputs that can be connected to buses, data paths, control paths, GPIOs, PHYs, or any desired device.
All eFPGAs use the look-up table (LUT) as the basic building block. A LUT’s N inputs select one entry of a small table, and the table contents can be programmed so that the output is any desired Boolean function of the N inputs. Some eFPGA LUTs have four inputs, some have six. Some LUTs have two outputs. LUTs usually have flip-flops on their outputs, which can be used to store results. These LUT-register combinations usually come in groups of four, along with carry arithmetic and shifters to implement adders efficiently.
The LUT receives all inputs from the programmable interconnect network and feeds back all its outputs to the programmable interconnect network.
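To make the LUT idea concrete, here is a minimal Python sketch of an N-input LUT as a programmable truth table. The `Lut` class and its method names are illustrative assumptions, not part of any vendor's tools; real LUTs are configured by the bitstream that the vendor software generates.

```python
from itertools import product

class Lut:
    """Hypothetical model of an N-input LUT: a 2**N-entry truth table."""

    def __init__(self, n_inputs, func):
        self.n_inputs = n_inputs
        # "Programming" the LUT fills the table with one bit per input combination.
        self.table = [int(bool(func(*bits)))
                      for bits in product((0, 1), repeat=n_inputs)]

    def read(self, *bits):
        # The inputs form the table address, first input as the most significant bit.
        index = 0
        for b in bits:
            index = (index << 1) | (b & 1)
        return self.table[index]

# A 4-input LUT programmed as (a AND b) XOR (c OR d)
lut4 = Lut(4, lambda a, b, c, d: (a & b) ^ (c | d))
# The same structure reprogrammed as a 4-input NAND
nand4 = Lut(4, lambda a, b, c, d: not (a and b and c and d))
```

The point of the sketch is that the hardware never changes: only the 16 stored bits differ between the two functions.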
In addition to LUTs, eFPGAs may also contain MACs (multiplier/accumulator blocks). These also connect to the programmable interconnect network and provide more efficient digital signal processing (DSP) and artificial intelligence (AI) capabilities. For memory, eFPGAs offer RAM blocks, usually dual-port. Like LUTs and MACs, the RAMs connect to the programmable interconnect network.
The eFPGA has an outer ring of input and output pins that connect the eFPGA to the rest of the SoC, and these pins also connect to a programmable interconnect network.
Software tools are used to synthesize Verilog or VHDL code to program the eFPGA logic and interconnects to implement any desired function.
eFPGAs are convenient new logic blocks that enhance the value of SoCs in many ways, including:
Extensive, fast control logic using hundreds of LUTs;
Reconfigurable network protocols;
Reconfigurable algorithms for vision or artificial intelligence;
Reconfigurable DSP for aerospace applications;
Reconfigurable accelerators for MCUs and SoCs.
There are many more uses beyond these.
Today, there are a few eFPGA vendors, mainly Achronix, Flex Logix, Menta, and QuickLogic, plus a few smaller vendors. With these options, customers need to decide which one best meets their needs. So how do you choose? While commercial factors need to be considered, this article focuses on technical factors.
Step 1: Process Compatibility
Typically, even in the early stages of IP evaluation, a company has already chosen its foundry and process node. eFPGAs are available or in development at TSMC, GlobalFoundries and SMIC on process nodes including 65nm, 40nm, 28nm, 22nm, 16nm, 14nm and 7nm.
However, not all vendors have eFPGAs for all foundries and process nodes, at least not yet. Check each vendor’s website to see which ones are compatible with your process. You should also check whether the eFPGA in question has been validated in silicon, with a report available under NDA.
Don’t forget to check metal stack compatibility. Your choice of critical IP such as SerDes or your application may require you to use a specific metal stack, but not all eFPGA IP is compatible with all metal stacks.
Step 2: Array Size and Function
Not all eFPGA vendors can make very small-scale eFPGAs, and at the same time, not all vendors can make very large-scale eFPGAs. Also, the nature of the MAC and RAM they support may vary.
Depending on whether you need hundreds of LUTs or hundreds of thousands of LUTs, and on your MAC and RAM needs, this may filter out some vendors.
Step 3: Benchmarking with RTL
eFPGA vendors provide you with evaluation software so you can determine the silicon area and performance that each eFPGA achieves for your RTL. You need the eFPGA to operate in the same temperature and voltage range as the rest of the SoC, so make sure what you need is supported.
When benchmarking, it’s important to compare apples to apples. For example, you should compare each eFPGA at the same process corner (slow/slow, typ/typ or fast/fast), at the same voltage and at the same temperature. You should expect software tools from eFPGA vendors that allow you to examine performance at different combinations of process corner and voltage.
Make sure your RTL is written for an eFPGA. If the RTL is taken from a hard-wired ASIC design, there will often be 20-30 levels of logic between flip-flops. If you put it into an eFPGA without optimizing it, it will run very slowly. In eFPGAs, LUT outputs have flip-flops, and you can use them to add more pipeline stages to the RTL for higher performance in the eFPGA.
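The effect of pipelining can be estimated with simple arithmetic. The sketch below assumes a made-up delay per logic level and a made-up register overhead; the function name and the default numbers are purely illustrative and should be replaced with figures from the vendor's timing reports.

```python
import math

def fmax_mhz(logic_levels, stages=1, level_delay_ns=0.5, reg_overhead_ns=0.2):
    """Rough clock-rate estimate for a path split evenly into pipeline stages.

    level_delay_ns (LUT plus local routing) and reg_overhead_ns (setup plus
    clock-to-out) are hypothetical placeholders, not vendor data.
    """
    levels_per_stage = math.ceil(logic_levels / stages)
    period_ns = levels_per_stage * level_delay_ns + reg_overhead_ns
    return 1000.0 / period_ns

unpipelined = fmax_mhz(24)            # 24 logic levels between flip-flops
pipelined = fmax_mhz(24, stages=4)    # same logic, 6 levels per stage
```

Under these assumed delays, splitting 24 levels into four stages raises the estimated clock rate several times over, at the cost of three extra cycles of latency.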
When it comes to RTL, make sure you’re testing what’s important to you.
Take a 16-bit adder as an example. What you care about is how fast it runs, but if you’re not careful, what you see might surprise you. Now imagine a big eFPGA. If the adder is placed in one corner of the array, with the inputs and outputs close together, the measured performance will be much higher than if the adder is placed in the middle of the array. This is because, when you measure performance from the array input to the array output, the distance from the data inputs to the adder and from the adder to the outputs is longer when the adder is in the middle of the array. In fact, the adder is the same and runs just as fast in both locations. The problem is that your test doesn’t isolate the performance of the adder: it also includes the time the signals need to travel to and from the adder.
The image below shows an example of routing to and from a LUT: the LUT speed does not change, but the delays entering and leaving the LUT through the interconnect do.
To deal with this effect, especially since you might be comparing eFPGAs of two different sizes, place registers on the inputs and outputs. This ensures that the performance you care about is measurable, independent of the size of the array and of placement constraints.
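A toy timing model shows why the registers matter. The functions and the per-tile routing delay below are hypothetical, and the model assumes the registers are placed right next to the adder, so the register-to-register path contains only the adder itself.

```python
def pin_to_pin_ns(adder_ns, tiles_in, tiles_out, per_tile_ns=0.1):
    """Array input to array output: routing in, the adder, routing out.

    per_tile_ns is an assumed routing delay per tile crossed.
    """
    return tiles_in * per_tile_ns + adder_ns + tiles_out * per_tile_ns

def reg_to_reg_ns(adder_ns):
    """With registers adjacent to the adder, only the adder delay is measured."""
    return adder_ns

ADDER_NS = 2.0                               # hypothetical 16-bit adder delay
corner = pin_to_pin_ns(ADDER_NS, 2, 2)       # adder placed near the I/O pins
middle = pin_to_pin_ns(ADDER_NS, 50, 50)     # adder placed in the array centre
isolated = reg_to_reg_ns(ADDER_NS)           # same in both placements
```

In this model the pin-to-pin number changes with placement while the registered measurement does not, which is the property you want when comparing arrays of different sizes.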
If you need the DSP or AI capabilities of a MAC, understand how each eFPGA differs in multiplier size and pipelining. If the RTL specifies an MxN multiplier, the synthesis software will ensure that the eFPGA implements it, but it may span two or more multipliers to do so. This matters if you really need MxN. However, if you are trying to compare multiplication performance apples to apples, write the RTL to use a multiplier size that suits all the eFPGAs you’re evaluating.
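As a rough guide to how an MxN multiply spreads across hardware multipliers, the sketch below counts tiles under a simple assumption: the operands are unsigned and each is cut into chunks no wider than the native multiplier. Real synthesis tools may decompose differently, and the native sizes used here are examples, not any vendor's specification.

```python
import math

def native_multipliers_needed(m_bits, n_bits, native_a, native_b):
    """Count native_a x native_b multipliers for an m_bits x n_bits product,
    assuming plain tiling of unsigned operands (a simplification)."""
    return math.ceil(m_bits / native_a) * math.ceil(n_bits / native_b)

# A 16x16 multiply fits one hypothetical 22x22 multiplier...
fits_one = native_multipliers_needed(16, 16, 22, 22)
# ...while a 32x32 multiply spans several, plus adders to combine partial products.
spans_many = native_multipliers_needed(32, 32, 22, 22)
```

When a multiply spans more than one block, its speed reflects the combining logic as well as the multiplier, which is why a common operand size gives a fairer comparison.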
Figure: N-Tap FIR filter architecture
Some eFPGAs pipeline the MACs directly from one to the next, which is much faster than going through the programmable interconnect. Implementing an N-Tap FIR filter will show the difference between an eFPGA with MAC-to-MAC pipelining and an eFPGA without it. Above is an example of an N-Tap FIR filter implemented using pipelined DSP MACs.
Step 4: Benchmark the Area Your RTL Needs
As with performance, be very careful when benchmarking the relative area your RTL needs on different eFPGAs. Some eFPGA vendors let you easily generate dozens of different array sizes, but others may offer only two sizes for your benchmarks.
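For reference, the arithmetic an N-tap FIR filter performs can be sketched in a few lines of Python. This is a behavioural model only, with illustrative names; it says nothing about how a particular eFPGA maps the taps onto its MAC blocks.

```python
def fir_filter(samples, coeffs):
    """Direct-form N-tap FIR: y[n] = sum over k of coeffs[k] * x[n - k]."""
    delay_line = [0] * len(coeffs)      # one register per tap
    out = []
    for x in samples:
        # Shift the new sample into the delay line.
        delay_line = [x] + delay_line[:-1]
        acc = 0
        for c, d in zip(coeffs, delay_line):
            acc += c * d                # one multiply-accumulate per tap
        out.append(acc)
    return out

# Feeding an impulse returns the coefficients themselves.
impulse_response = fir_filter([1, 0, 0, 0], [1, 2, 3])
```

Each pass through the inner loop corresponds to one MAC, and the running `acc` is the value that a MAC-to-MAC connection would hand from one block to the next.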
The first step is to look at the LUT count (or MAC count). However, different eFPGA vendors may have different LUT sizes, and your RTL might not fill up a LUT. For example, if you have two flip-flops feeding a NAND gate whose output goes into another flip-flop, a LUT of any size will implement just that one NAND gate.
Some eFPGAs have two flip-flops at the output, which allows the N-input LUT to be split into two smaller LUTs that share the same physical N-input LUT and have some subset of their inputs in common. This feature improves area utilization.
Even if you’re benchmarking N-input-LUT eFPGAs from two vendors, your design uses half of the LUTs in both, and both have the same area, you can’t conclude that they’re equally good. What you need to determine is what LUT utilization each eFPGA can actually achieve. Typical eFPGA utilization is 60-70%, but some eFPGAs can reach 90%. The only way to find out is to make the RTL almost fill the eFPGA’s LUTs.
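The impact of achievable utilization on array size is easy to quantify. The sketch below is plain arithmetic on the utilization figures quoted above; the 10,000-LUT design size is a made-up example and the function name is illustrative.

```python
import math

def array_luts_required(design_luts, achievable_utilization):
    """LUTs the array must contain so the design fits at the given utilization."""
    if not 0 < achievable_utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return math.ceil(design_luts / achievable_utilization)

DESIGN_LUTS = 10_000                                 # hypothetical design size
at_65 = array_luts_required(DESIGN_LUTS, 0.65)       # typical utilization
at_90 = array_luts_required(DESIGN_LUTS, 0.90)       # high utilization
```

Two arrays with the same area per LUT can therefore differ substantially in the silicon your design really needs.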
Another way to get a sense of utilization is to look at a visualization of the placement. In the example below, the LUTs are clearly grouped very closely together (the shaded blocks are the LUTs used in the design), which is a good visual indicator of high utilization.
However, even here you have to be careful. If, in the design above, the inputs and outputs were evenly distributed around the sides of the eFPGA array, the LUTs would be spread out more evenly, because the place-and-route software minimizes the critical paths.
So when using this visual inspection, try to group the inputs and outputs into one corner of the eFPGA so that the place-and-route software can pack the LUTs together to minimize the critical path.
Step 5: Benchmark Input and Output Capacity
Some eFPGA-based applications require a large number of inputs and outputs. For example, the bus of a network chip can be 512 bits wide (sometimes even thousands of bits wide). You need to look at the input and output counts available per thousand (K) LUTs to see if they are in the range you need.
eFPGAs are exciting new tools that allow SoC architects to make their chips more flexible and reconfigurable.
Using the guidelines above, you’ll be able to find the eFPGA that best fits your application’s specific needs more quickly. If you choose the right solution, you will be able to realize the full potential of eFPGA.