Key word: equalization
Without designing the equalization function directly into the FPGA I/O fabric, connecting any device to a DDR3 SDRAM DIMM would be complex and costly, requiring a large number of external components, including delay lines and associated controls.
What is equalization, and why is it so important?
To preserve signal integrity at higher frequencies, the JEDEC committee defined a fly-by termination scheme for the clock and command/address bus signals. The fly-by topology reduces simultaneous switching noise (SSN) by deliberately introducing flight-time skew between the clock and the data/strobes at each DRAM as the clock and address/command signals travel across the DIMM, as shown in Figure 1.
The flight-time skew can be as high as 0.8 tCK. When the skew is spread this widely, the controller cannot know in which of two clock cycles the data will be returned. The equalization function (called leveling in JEDEC terminology) therefore lets the controller compensate for this skew by adjusting the timing of each byte lane. Current FPGAs provide many features for interfacing with double data rate SDRAM in a variety of applications, but interfacing with DDR3 SDRAM requires a more robust equalization scheme.
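The two-cycle ambiguity can be illustrated with a minimal Python sketch. The latency value, the eight-lane layout, and the function names are hypothetical and chosen only for illustration; the only figure taken from the text is the 0.8 tCK maximum skew.

```python
import math


def capture_cycle(base_latency_tck, skew_tck):
    """Index of the clock cycle in which a byte lane's data arrives.

    Both arguments are expressed in units of tCK (assumed model).
    """
    return math.floor(base_latency_tck + skew_tck)


def cycles_spanned(base_latency_tck, skews_tck):
    """Sorted list of distinct clock cycles in which the lanes arrive."""
    return sorted({capture_cycle(base_latency_tck, s) for s in skews_tck})


# Hypothetical DIMM: eight byte lanes, skew growing linearly to 0.8 tCK.
skews = [i * 0.8 / 7 for i in range(8)]
print(cycles_spanned(5.5, skews))  # [5, 6]
```

With this assumed latency, the early lanes land in cycle 5 and the late lanes in cycle 6, which is the situation the equalization function has to resolve.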
FPGA I/O structure
High-performance FPGAs like Altera’s Stratix III family offer I/O speeds up to 400 MHz (800 Mbps), with high flexibility to support existing and emerging external memory standards such as DDR3.
Figure 1: DDR3 SDRAM DIMM. Flight-time skew reduces SSN, and the controller must equalize data that can arrive spread across up to two clock cycles.
During a read operation, the memory controller must compensate for the delays that the fly-by memory topology introduces into the read cycle. Equalization can be thought of as a delay in the data path that is larger than the delay of the I/O itself. Each DQS requires an independent phase shift of the resynchronization clock position, compensated for process, voltage, and temperature (PVT). Figure 2 shows two DQS groups returned from a DIMM in response to the same read command.
Figure 2: 1T, falling edge and equalization registers in the I/O cell.
Initially, each DQS is phase-shifted by 90 degrees and used to capture the DQ data associated with its group. A free-running resynchronization clock (with the same frequency and phase as DQS) then transfers the data from the capture domain into the equalization circuit, shown by the pink and orange lines in Figure 2. At this stage, each DQS group has its own independent resynchronization clock.
The DQ data is then sent to the 1T register. In the example in Figure 2, the 1T register is needed to delay the DQ data bits of the DQS group in the upper channel, whereas the lower channel does not require it. This step begins to align the upper and lower channels. Whether a given channel needs the 1T register is determined automatically as part of the calibration scheme in the free physical layer (PHY) IP core.
Both DQS groups are then transferred to the falling-edge registers. These optional registers can also be switched in or out, as needed, by the auto-calibration process at startup. The final step aligns the upper and lower channels to the same resynchronization clock, forming a source-synchronous interface that delivers fully aligned, or equalized, single data rate (SDR) data to the FPGA fabric.
Write equalization is similar to read equalization but works in the opposite direction: the DQS groups are launched at different times so that each coincides with the clock arriving at its device on the DIMM, and each must meet the tDQSS requirement of +/- 0.25 tCK. The controller adjusts the relationship between DQS and CK by creating a feedback loop in which it writes data to the DRAM and reads it back, stepping through successive phases until it finds the end of the write window. For better setup and hold margins, data should be launched at the midpoint of the passing window.
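The 1T decision described above can be sketched in Python as follows. This is only an illustrative model, not the calibration algorithm of the PHY IP core: it assumes the arrival cycle of each DQS group is already known, and the group names and the function name are hypothetical.

```python
def select_1t_registers(arrival_cycles):
    """Decide, per DQS group, whether the optional 1T register is needed.

    arrival_cycles maps a group name to the clock cycle in which its data
    arrives. Groups that arrive one cycle early are delayed by 1T so that
    every group lines up with the latest one.
    """
    latest = max(arrival_cycles.values())
    earliest = min(arrival_cycles.values())
    if latest - earliest > 1:
        # A single 1T register can only absorb one clock cycle of skew.
        raise ValueError("groups are more than one clock cycle apart")
    return {group: cycle < latest for group, cycle in arrival_cycles.items()}


# Mirrors the example in the text: the upper channel arrives a cycle early.
print(select_1t_registers({"upper": 5, "lower": 6}))
# {'upper': True, 'lower': False}
```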
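The search for the write window can be sketched as a sweep over phase steps. The following Python sketch assumes the write/read-back test has already produced one pass/fail result per phase step; the function names and the sample sweep are hypothetical, and the sketch does not handle a window that wraps around the end of the sweep.

```python
def find_write_window(passes):
    """Return (first, last) indices of the longest run of passing phase
    steps, or None if no step passes."""
    best = None
    start = None
    # A trailing False closes any run that reaches the end of the sweep.
    for i, ok in enumerate(list(passes) + [False]):
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            run = (start, i - 1)
            if best is None or run[1] - run[0] > best[1] - best[0]:
                best = run
            start = None
    return best


def midpoint_phase(passes):
    """Phase step at the middle of the passing window."""
    window = find_write_window(passes)
    if window is None:
        raise ValueError("no passing phase step found")
    return (window[0] + window[1]) // 2


# Hypothetical sweep: each entry is the write/read-back result for one step.
sweep = [False, False, True, True, True, True, True, False]
print(find_write_window(sweep))  # (2, 6)
print(midpoint_phase(sweep))     # 4
```

Choosing the middle of the window rather than its first passing step leaves roughly equal margin on both sides.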
Other FPGA I/O Capability Innovations
High-performance Stratix III FPGAs also offer many other innovative I/O features that enable simple and robust connections to a variety of memory interfaces, including dynamic on-chip termination (OCT), variable I/O delay, and half data rate (HDR) capability.
Parallel and serial OCT provide proper line termination and impedance matching for read and write buses, so no external resistors are required around the FPGA, which reduces external component cost, saves board area, and reduces wiring complexity. In addition, it greatly reduces power consumption because the parallel termination can be effectively bypassed during write operations.
Variable delay for DQ deskew
Variable input and output delays are used to compensate for trace-length mismatch and electrical skew. Fine input and output delay resolution (for example, 50 ps steps) allows finer deskew within each DQS group (independent of the equalization function), correcting skew caused by board trace-length mismatch or by variation in the I/O buffers of the FPGA and memory devices, as shown in Table 1. Ultimately, this increases the capture margin of each DQS group.
Table 1: Resolution and absolute values (pending characterization).
To incorporate the DDR3 automatic deskew algorithm into the startup calibration process, the delay cells must be controllable from the FPGA fabric at run time. Output delays can also be used to deliberately reduce the number of I/Os that switch at the same time by inserting a small skew between output channels.
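The idea of deskewing DQ pins with fixed delay steps can be sketched in Python. This is an illustrative model only: it assumes the arrival time of each pin has been measured, uses the 50 ps step mentioned in the text as an example value, and the pin names, arrival times, and function names are hypothetical.

```python
STEP_PS = 50  # example step size from the text; real values are device-specific


def deskew_taps(arrival_ps):
    """Number of delay steps to add to each pin so that all pins line up
    (to within one step) with the latest-arriving pin."""
    latest = max(arrival_ps.values())
    return {pin: round((latest - t) / STEP_PS) for pin, t in arrival_ps.items()}


def residual_skew_ps(arrival_ps, taps):
    """Spread between earliest and latest pin after the taps are applied."""
    aligned = [t + taps[pin] * STEP_PS for pin, t in arrival_ps.items()]
    return max(aligned) - min(aligned)


# Hypothetical measured arrival times, in picoseconds.
arrivals = {"DQ0": 0, "DQ1": 120, "DQ2": 60, "DQ3": 200}
taps = deskew_taps(arrivals)
print(taps)                              # {'DQ0': 4, 'DQ1': 2, 'DQ2': 3, 'DQ3': 0}
print(residual_skew_ps(arrivals, taps))  # 20
```

In this example the spread between pins drops from 200 ps to 20 ps, which is the kind of gain in capture margin the text describes.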
The DQS signal is used as an input strobe, which must be shifted to an optimal position to capture a read transaction. The phase-shift circuit can phase-shift the input DQS signal by 0°, 22.5°, 30°, 36°, 45°, 60°, 67.5°, 72°, 90°, 108°, 120°, 135°, 144° or 180°, depending on the frequency mode of the DLL. The phase-shifted DQS signal is then used to clock the various input registers of the I/O cell.
A delay-locked loop (DLL) holds the phase at a fixed position over the entire PVT range. The phase comparator in the DLL keeps the phase difference between its two inputs at zero by uniformly adjusting the selected delay elements in the DLL (10 to 16 of them). The control signals that update the delay elements in the DLL are also sent to the delay elements in the DQS input path. For example, a 90° phase shift can be achieved by using all 16 delay elements in the DLL and the 4th delay tap in the DQS phase-shift input path: 360° / 16 × 4 = 90°.
Alternatively, choosing 10 delay elements in the DLL gives 36° per element, so the first tap in the DQS phase-shift input path produces a 36° phase shift and the 4th tap produces 144°: 360° / 10 × 4 = 144°.
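The relationship between the DLL chain length and the resulting DQS phase shift can be written as a short Python sketch. It assumes that the delay elements divide one clock period evenly, so each tap contributes 360° divided by the number of elements; the function name is hypothetical.

```python
def dqs_phase_shift(dll_elements, tap):
    """Phase shift in degrees at a given tap, assuming the DLL locks
    dll_elements equal delays to one full clock period (360 degrees)."""
    if dll_elements <= 0:
        raise ValueError("dll_elements must be positive")
    if not 0 <= tap <= dll_elements:
        raise ValueError("tap must lie between 0 and dll_elements")
    return 360.0 / dll_elements * tap


print(dqs_phase_shift(16, 4))  # 90.0
print(dqs_phase_shift(10, 1))  # 36.0
print(dqs_phase_shift(10, 4))  # 144.0
```

Under this assumed model, a 16-element chain steps in 22.5° increments and a 10-element chain in 36° increments, which matches several of the phase values listed earlier.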
The DLL uses a frequency reference to dynamically generate control signals for the delay chain in each DQS pin, which lets it compensate for PVT variation. Stratix III FPGAs contain four DLLs, located at the corners of the device, so that each DLL can serve two sides and multiple DDR3 SDRAM memory interfaces can be supported on all sides of the device.
Crossing high-speed data rate domains and simplifying the design
The DDR capture registers and HDR registers support safe transfer of data from the double data rate domain (data on both edges of the clock) to the SDR domain (data on the rising edge of a clock of the same frequency, with twice the data width) and then to the HDR domain (data on the rising edge of the clock, with the data width doubled again and the clock frequency halved relative to the SDR domain), which makes internal design timing easier to meet.
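The width and rate changes between the three domains can be illustrated with a small Python model. It is purely a data-flow sketch, not a description of the actual register hardware: each list element stands for the data captured on one clock edge, and the function names are hypothetical.

```python
def ddr_to_sdr(edge_samples):
    """Pair rising- and falling-edge samples into one double-width word
    per clock cycle (same clock frequency, twice the width)."""
    if len(edge_samples) % 2:
        raise ValueError("need one rising and one falling sample per cycle")
    return [(edge_samples[i], edge_samples[i + 1])
            for i in range(0, len(edge_samples), 2)]


def sdr_to_hdr(sdr_words):
    """Join two consecutive SDR words into one word per half-rate clock
    cycle (half the clock frequency, twice the width again)."""
    if len(sdr_words) % 2:
        raise ValueError("need two SDR words per half-rate cycle")
    return [sdr_words[i] + sdr_words[i + 1]
            for i in range(0, len(sdr_words), 2)]


ddr = [0, 1, 2, 3, 4, 5, 6, 7]  # samples on alternating clock edges
sdr = ddr_to_sdr(ddr)
hdr = sdr_to_hdr(sdr)
print(sdr)  # [(0, 1), (2, 3), (4, 5), (6, 7)]
print(hdr)  # [(0, 1, 2, 3), (4, 5, 6, 7)]
```

The total amount of data is unchanged at each step; only the number of words per clock and the clock rate change, which is why the slower HDR domain relaxes timing inside the FPGA fabric.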
Die, Package and Digital Signal Integrity Improvements
The FPGA die and package must be designed to provide good signal integrity for high-performance memory interfaces (for example, an 8:1:1 ratio of user I/O to ground and power, and an optimal signal return path, as shown in Figure 3). In addition, the FPGA should provide dynamic OCT, variable slew rates to control the rise and fall times of the signals, and programmable drive strength to meet the requirements of the standard in use (for example, SSTL-15 Class II).
Figure 3: Eight user I/Os connected to each power and ground.
Summary
High-performance Stratix III FPGAs can complement high-performance DDR3 SDRAM DIMMs by providing high memory bandwidth, improved timing margins, and flexibility in system design. Since DDR3 will soon surpass DDR2 in practical use, high-end FPGAs that offer lower cost, higher performance, higher density, and superior signal integrity must provide JEDEC-compliant read and write equalization in order to connect to high-performance DDR3 SDRAM DIMMs. The close integration of FPGAs and DDR3 SDRAM will be able to meet the requirements of current and next-generation communication, networking, and digital signal processing systems.