Some PCIe ECNs

SR-IOV defines mechanisms by which a system’s endpoints and its CPU cooperate to share the endpoints’ resources.

1, FLR: capability used in IOV
Function Level Reset Support (FLR)
The integrated GbE controller supports FLR capability. FLR capability can be used in
conjunction with Intel® Virtualization Technology. FLR allows an operating system in a
Virtual Machine to have complete control over a device, including its initialization,
without interfering with the rest of the platform. The device provides a software
interface that enables the operating system to reset the entire device as if a PCI reset
was asserted.
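As a rough illustration of that software interface, here is a minimal Python sketch of the FLR sequence against a simulated function. The register offsets and bit positions (FLR capability in bit 28 of Device Capabilities, Initiate FLR in bit 15 of Device Control, both inside the PCI Express capability structure) and the 100 ms settle time are my understanding of the PCIe specification and should be checked against it. `FakeFunction` and `function_level_reset` are hypothetical names, not a real driver API, and a real driver would also wait for pending transactions to drain before issuing the reset.

```python
import time

# Offsets inside the PCI Express capability structure (assumed, check the spec).
PCIE_CAP_DEVCAP = 0x04            # Device Capabilities register (32-bit)
PCIE_CAP_DEVCTL = 0x08            # Device Control register (16-bit)
DEVCAP_FLR = 1 << 28              # Function Level Reset Capability
DEVCTL_INITIATE_FLR = 1 << 15     # Initiate Function Level Reset


class FakeFunction:
    """Hypothetical stand-in for one function's PCIe capability registers."""

    def __init__(self, flr_capable=True):
        self.regs = {
            PCIE_CAP_DEVCAP: DEVCAP_FLR if flr_capable else 0,
            PCIE_CAP_DEVCTL: 0x0000,
        }
        self.resets = 0

    def read(self, offset):
        return self.regs[offset]

    def write(self, offset, value):
        if (offset == PCIE_CAP_DEVCTL and value & DEVCTL_INITIATE_FLR
                and self.regs[PCIE_CAP_DEVCAP] & DEVCAP_FLR):
            self.resets += 1
            value &= ~DEVCTL_INITIATE_FLR   # the bit always reads back as 0
        self.regs[offset] = value


def function_level_reset(fn, settle_s=0.1):
    """Reset a single function; return False if it does not advertise FLR."""
    if not fn.read(PCIE_CAP_DEVCAP) & DEVCAP_FLR:
        return False
    fn.write(PCIE_CAP_DEVCTL, fn.read(PCIE_CAP_DEVCTL) | DEVCTL_INITIATE_FLR)
    time.sleep(settle_s)   # give the function time to finish its reset
    return True
```

Only the targeted function is reset; the link and any other functions are untouched, which is why a guest operating system can use FLR without disturbing the rest of the platform.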

2, ARI: Alternative Routing-ID Interpretation, capability used in IOV
ARI extends the capabilities of a PCIe endpoint by increasing the number of available device functions from eight to as many as 256. It does so by reinterpreting the 5-bit Device Number field of the routing ID as additional Function Number bits, which yields an 8-bit function number. A system that needs ARI requires every component in the PCIe chain (CPU, PCIe switch, endpoint) to support it.

Consequently, the PCIe switch between the CPU and the endpoint needs to be able to decode and route packets accordingly. Without ARI, a virtualized system cannot take advantage of the additional functions enabled in the PCIe endpoint. In a virtualized system, 16 functions are typically available, with some endpoints implementing as many as 256.
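To make the decoding concrete, the following Python sketch splits a 16-bit routing ID both ways. It assumes the usual layout (8-bit bus, 5-bit device, 3-bit function) and that ARI folds the device bits into an 8-bit function number with the device implied to be 0. `decode_routing_id` is an illustrative name, not part of any library.

```python
def decode_routing_id(routing_id: int, ari: bool) -> dict:
    """Split a 16-bit PCIe routing ID into bus, device and function fields."""
    if not 0 <= routing_id <= 0xFFFF:
        raise ValueError("routing ID must fit in 16 bits")
    bus = (routing_id >> 8) & 0xFF
    if ari:
        # ARI: the low 8 bits form one function number (0..255).
        return {"bus": bus, "device": 0, "function": routing_id & 0xFF}
    # Traditional: 5-bit device number followed by a 3-bit function number.
    return {
        "bus": bus,
        "device": (routing_id >> 3) & 0x1F,
        "function": routing_id & 0x07,
    }
```

The same ID, 0x03A5, reads as bus 3, device 20, function 5 in the traditional interpretation and as bus 3, function 165 under ARI, which is why every component that routes the packet has to agree on the interpretation.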

 

For better power management:

3, Latency Tolerance Reporting (LTR)

The first ECNs added to the specification focused on overall power management and on reducing active power consumption in a system. When implementing an overall power management strategy, designers typically shut down components when they are not in use. For example, a tablet that uses WiFi for connectivity consumes more power when the WiFi radio is on and connected to the network. However, if the tablet doesn’t need to transmit or receive data for a period of time, the WiFi radio can be turned off to save battery life. The key to implementing this power-saving strategy is knowing how long the radio takes to wake back up. Without a mechanism for knowing how long to wait, the software has to guess how much latency is acceptable for the device, and guessing incorrectly can result in performance issues or hardware failures. Consequently, platform power management is often too conservative or not implemented at all, resulting in devices that use more power than necessary.

Power management has to be done at the system level. This requires a mechanism to tune power management to the actual device requirements and to balance dynamic power usage versus performance. The solution is to have each device in the system report its latency requirements to the host. A device that uses PCIe for connectivity (a PCIe endpoint) can use the Latency Tolerance Reporting (LTR) mechanism that has been incorporated into the PCIe specification. LTR enables a device to report its required service latency to the host in a PCIe message. The platform (the tablet in our example) uses the LTR values to implement an overall power management strategy that extends the battery life of the tablet while giving optimal performance.
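As a sketch of what such a report carries, the Python below encodes and decodes one 16-bit LTR latency field. The layout used here (a 10-bit value in bits 9:0, a 3-bit scale in bits 12:10 that multiplies the value by 32^scale nanoseconds, and a requirement flag in bit 15) is my reading of the LTR format and should be verified against the specification. The function names are illustrative only.

```python
# Scale encodings 0..5 multiply the value by 1, 32, 1024, ... nanoseconds.
LTR_SCALE_NS = [32 ** s for s in range(6)]


def decode_ltr_field(field: int):
    """Return the tolerated latency in ns, or None if no requirement is set."""
    requirement = (field >> 15) & 0x1
    scale = (field >> 10) & 0x7
    value = field & 0x3FF
    if not requirement:
        return None
    if scale >= len(LTR_SCALE_NS):
        raise ValueError("reserved latency scale encoding")
    return value * LTR_SCALE_NS[scale]


def encode_ltr_field(latency_ns: int) -> int:
    """Encode a latency, rounding down so the device never over-reports."""
    if latency_ns < 0:
        raise ValueError("latency must be non-negative")
    for scale, unit in enumerate(LTR_SCALE_NS):
        value = latency_ns // unit
        if value <= 0x3FF:          # fits in the 10-bit value field
            return (1 << 15) | (scale << 10) | value
    raise ValueError("latency too large to encode")
```

For example, a device that can tolerate 3000 ns would be encoded with the 32 ns scale and a value of 93, which decodes back to 2976 ns. Rounding down is a deliberate choice in this sketch: reporting less tolerance than the device really has costs some power, while reporting more could starve the device.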

4, Optimized Buffer Flush/Fill (OBFF)

Another ECN added to the PCIe specification to improve overall power management is Optimized Buffer Flush/Fill, or OBFF. As a system operates, no device knows the power state of the other resources in the system. Without coordination, each device goes in and out of its low-power states as needed to execute its assigned tasks. This “asynchronous” behavior prevents optimal power management of the CPU, the host memory sub-system, and other devices, because the intermittent traffic keeps the system permanently awake and unable to optimize power management across the system.

Figure 1: Asynchronous behavior prevents optimal power management

As part of an implementation of a system-level power strategy, the idle time and low-power states of the devices must be optimized to enable them to stay in their low-power states longer. Basically, a host can provide information to all devices by broadcasting a message about the system power state. The devices can utilize this information to group a load of requests, wait until the system wakes up, and burst out all of the requests at the same time. By doing this, the device is a good citizen and does not wake up a sleeping CPU and/or the system memory sub-system. Waiting creates extended periods of system inactivity which saves overall system power (as shown in Figure 2). In other words, the host utilizes the OBFF ECN to give devices a “hint” so they can optimize their behavior, which improves power management at the system level.

Figure 2: Coordinated idle time extends system inactivity, reducing power consumption
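The effect of the hint can be sketched with a toy model in Python. Everything here is hypothetical: time is in arbitrary units, the host’s active windows are given up front, and a request outside a window either wakes the system (no OBFF) or is held until the next window (OBFF). It is meant only to illustrate the batching idea, not the actual OBFF signaling.

```python
def simulate(request_times, active_windows, use_obff):
    """Return (extra_wakeups, worst_deferral) for a list of device requests.

    active_windows is a list of (start, end) intervals during which the
    host is already awake. Requests inside a window cost no extra wakeup.
    """
    extra_wakeups = 0
    worst_deferral = 0
    for t in request_times:
        if any(start <= t < end for start, end in active_windows):
            continue                      # system is already awake
        if not use_obff:
            extra_wakeups += 1            # request drags the system awake
            continue
        upcoming = [start for start, _ in active_windows if start > t]
        if not upcoming:
            extra_wakeups += 1            # nothing to wait for, must wake
            continue
        worst_deferral = max(worst_deferral, min(upcoming) - t)
    return extra_wakeups, worst_deferral
```

With windows at 0-10, 100-110 and 200-210 and requests at 5, 30, 60, 105 and 150, the uncoordinated device causes three extra wakeups, while the coordinated one causes none at the cost of holding one request for up to 70 time units. That deferral has to fit within the device’s buffering, which is where its reported latency tolerance comes in.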

5, L1 sub-states

PCI-SIG and the member body continue to make changes to improve the ability to implement power management strategies across a system. However, what about the power that is consumed while your tablet or Ultrabook is in the suspend state? Pulling your tablet or laptop out of your bag during a long flight, only to find that it consumed all of the battery power while it was in standby mode, is one of a business traveler’s nightmares. This experience is a lesson in how non-optimized systems consume a surprising amount of power while in the standby state. PCIe’s L1 low-power state is just not enough, as the idle power consumed by PCIe-based devices does not meet the emerging thin and light form factor requirements, which require 8 to 10 hours of use time and a seemingly infinite amount of standby time. Of course, this has to be done with minimal added costs while maintaining backwards compatibility.

As shown in Figure 3, a PCIe link is a serial link that directly connects two components, such as a host and a device. Ignoring the state of the host or the device for this discussion, the PCIe link is defined to save power when the controlling link state machine (LTSSM) is in the L1 state. However, the PCIe interface has both analog and digital circuits, and the L1 state doesn’t turn off all the analog circuits in the PHY. The Receiver Electrical Idle detector and the transmit common-mode voltage driver continue drawing power. The result is that each lane of the link can consume 10 to 25 mW while in standby, quietly draining the device’s battery.

Figure 3: L1 sub-states ECN reduces the power consumed by the link
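To put the per-lane figure in perspective, here is a small Python calculation. It assumes the idle draw simply scales with lane count and uses the 10 to 25 mW per-lane range quoted above; the function names and the 12-hour standby period in the usage note are illustrative assumptions.

```python
def link_idle_power_mw(lanes, per_lane_mw=(10.0, 25.0)):
    """Return the (low, high) idle power of a link in legacy L1, in mW."""
    if lanes < 1:
        raise ValueError("a link has at least one lane")
    low, high = per_lane_mw
    return lanes * low, lanes * high


def standby_energy_mwh(power_mw, hours):
    """Energy drawn from the battery over a standby period, in mWh."""
    return power_mw * hours
```

Under these assumptions an x4 link idles at 40 to 100 mW, and at the high end it would draw 1200 mWh over 12 hours of standby from the link alone.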

Designers using the current low-power states of the PCIe specification can use the L1 state to reduce power consumption. The traditional L1 state allows the reference clock to be disabled on entry to L1, under the control of a configuration bit written by software. However, the PCIe link still consumes too much power due to leakage, the transmit common-mode voltage circuit, and the Receiver Electrical Idle detector circuitry. The result for the end user is drained batteries, and for the product, failure to meet government regulations. To avoid these issues, the PCIe link must reduce its link idle power to approximately 10% of the active power, or in the range of tens of microwatts.

The PCI-SIG community has just approved an enhancement to the L1 state called L1 sub-states. The L1 sub-states ECN adds two “pseudo sub-states,” called L1.1 and L1.2, to the LTSSM, which can be used to turn off additional analog circuits in the PHY. L1.1 allows the common-mode voltage to be maintained, while L1.2 allows all high-speed circuits to be turned off. Using L1.2 also requires the PCIe interface to support the LTR ECN. The logical view of the LTSSM with the new L1 sub-states is shown in Figure 4.

Figure 4: Relationship of logical L1.1 and L1.2 modes to L1 state specification
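The dependency between L1.2 and LTR can be sketched as a simple selection policy in Python. The exit latencies below are made-up placeholders, not values from the specification, and `deepest_allowed` is a hypothetical helper: it picks the deepest state whose exit latency fits within the device’s reported tolerance, and it never picks L1.2 without an LTR report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkState:
    name: str
    exit_latency_us: float   # hypothetical numbers, for illustration only
    needs_ltr: bool


# Ordered from shallowest to deepest.
STATES = [
    LinkState("L1", 5.0, False),
    LinkState("L1.1", 20.0, False),
    LinkState("L1.2", 70.0, True),
]


def deepest_allowed(tolerance_us, ltr_supported, states=STATES):
    """Pick the deepest state compatible with the reported latency tolerance.

    tolerance_us is None when the device has not reported a requirement.
    Falls back to plain L1 if nothing deeper qualifies.
    """
    chosen = states[0]
    for state in states:
        if state.needs_ltr and (not ltr_supported or tolerance_us is None):
            continue                      # L1.2 is off the table without LTR
        if tolerance_us is not None and state.exit_latency_us > tolerance_us:
            continue                      # waking up would take too long
        chosen = state
    return chosen.name
```

With a 100 µs tolerance the sketch selects L1.2 when LTR is supported and L1.1 when it is not; with a 30 µs tolerance it stops at L1.1 either way.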

Designers need to be aware of a few challenges that implementing the new L1.1 and L1.2 lower-power sub-states may present. For example, L1 sub-states may require additional pins if the reference clock generator is off-chip, and the ECN redefines the CLKREQ# signal as bidirectional to allow handshaking with the system reference clock controller. Not all form factors support CLKREQ# (which is only defined in the mini-CEM card specification); form factors that do not define CLKREQ# will need to use an in-band mechanism when one becomes available. The L1 sub-states solution is out-of-band, since it doesn’t use the differential signals of the PCIe link, and discussions are under way on an in-band solution that uses the existing differential signals. Implementing L1 sub-states also requires some silicon modifications to gate the power of the PCIe analog circuits and logic while retaining the port state. Of course, any modifications to support L1 sub-states must still support the default legacy L1 operation; the new features are enabled by system firmware or the driver once the link capabilities have been discovered.

Table 1 shows the low-power solutions available with the existing L1 state compared to using L1 sub-states. The power savings are expected to scale linearly for multi-lane links. Implementing the L1 sub-states feature reduces power consumption at the cost of increased L1 exit latency. Implementing L1 sub-states is key to reducing power consumption for mobile designs using PCI Express.

Table 1: Comparison of proposed solutions