Coherent Accelerator Processor Interface
== Implementation ==
=== CAPI ===
CAPI is implemented as a functional unit inside the CPU, called the Coherent Accelerator Processor Proxy (CAPP), with a corresponding unit on the accelerator called the Power Service Layer (PSL). The CAPP and PSL units act like a cache directory so that the attached device and the CPU can share the same coherent memory space, and the accelerator becomes an Accelerator Function Unit (AFU), a peer to other functional units integrated in the CPU.<ref>[https://www-304.ibm.com/webapp/set2/sas/f/capi/CAPI_POWER8.pdf Coherent Accelerator Processor Interface (CAPI) for POWER8 Systems - White Paper]</ref><ref name="RAWkeynote">[http://raw.necst.it/2016/RAW-keynote-Hofstee-final.pdf Reconfigurable Accelerators for Big Data and Cloud - RAW 2016]</ref> Since the CPU and AFU share the same memory space, low latency and high speeds can be achieved, because the CPU does not have to perform memory translations or shuffle data between its main memory and the accelerator's memory space. An application can use the accelerator without device-specific drivers, as everything is enabled by a general CAPI kernel extension in the host operating system. The CPU and PSL can read from and write to each other's memories and registers directly, as demanded by the application.

==== CAPI ====
CAPI is layered on top of [[PCI Express#PCI Express 3.0|PCIe Gen 3]], using 16 PCIe lanes, and is an additional functionality of the PCIe slots on CAPI-enabled systems. Such machines usually have designated CAPI-enabled PCIe slots. Since there is only one CAPP per POWER8 processor, the number of possible CAPI units is determined by the number of POWER8 processors, regardless of how many PCIe slots there are. In certain POWER8 systems, IBM uses dual-chip modules, thus doubling the CAPI capacity per processor socket.
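As a rough check on what a 16-lane PCIe link can carry, the raw per-direction bandwidth follows from the lane count, the per-lane transfer rate (8 GT/s for PCIe 3.0, 16 GT/s for PCIe 4.0) and the 128b/130b line encoding both generations use. The function below is an illustrative sketch of that arithmetic only; the name <code>pcie_bandwidth_gb_s</code> is hypothetical, and packet and protocol overhead are ignored, so real throughput is somewhat lower.

```python
# Sketch: raw per-direction PCIe link bandwidth from generation and lane count.
# Assumes 128b/130b encoding (used by PCIe 3.0 and 4.0) and ignores
# packet/protocol overhead, so real-world throughput will be lower.

TRANSFER_RATE_GT_S = {3: 8.0, 4: 16.0}  # per-lane rate in gigatransfers per second
ENCODING_EFFICIENCY = 128 / 130         # 128b/130b line code


def pcie_bandwidth_gb_s(generation: int, lanes: int) -> float:
    """Return the approximate raw bandwidth in GB/s, in one direction."""
    gbit_per_lane = TRANSFER_RATE_GT_S[generation] * ENCODING_EFFICIENCY
    return gbit_per_lane * lanes / 8  # convert gigabits to gigabytes


if __name__ == "__main__":
    for gen in (3, 4):
        print(f"PCIe {gen}.0 x16: {pcie_bandwidth_gb_s(gen, 16):.2f} GB/s per direction")
```

Run as a script, this prints about 15.75 GB/s for PCIe 3.0 x16 and 31.51 GB/s for PCIe 4.0 x16, i.e. roughly 16 GB/s and 32 GB/s, with the doubling coming entirely from the per-lane transfer rate.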
Traditional transactions between a PCIe device and a CPU can take around 20,000 operations, whereas a CAPI-attached device uses only around 500, significantly reducing latency and effectively increasing bandwidth because of the reduced operations overhead.<ref name="RAWkeynote" /> The total bandwidth of a CAPI port is determined by the underlying PCIe 3.0 x16 technology, peaking at approximately 16 GB/s in each direction.<ref name="nextplatform-capi">[http://www.nextplatform.com/2016/10/17/opening-server-bus-coherent-acceleration/ Opening Up The Server Bus For Coherent Acceleration]</ref>

==== CAPI 2 ====
CAPI 2 is an incremental evolution of the technology, introduced with the IBM POWER9 processor.<ref name="nextplatform-capi"/> It runs on top of PCIe Gen 4, which effectively doubles the bandwidth to 32 GB/s. It also introduces new features such as support for DMA and atomic operations initiated by the accelerator.

=== OpenCAPI ===
The technology behind OpenCAPI is governed by the ''OpenCAPI Consortium'', founded in October 2016 by [[AMD]], [[Google]], [[IBM]], [[Mellanox]] and [[Micron Technology|Micron]] together with partners [[Nvidia]], [[Hewlett Packard Enterprise]], [[Dell EMC]] and [[Xilinx]].<ref>[http://opencapi.org/2016/10/tech-leaders-unite-to-enable-new-cloud-datacenter-server-designs-for-big-data-machine-learning-analytics-and-other-emerging-workloads/ Tech Leaders Unite to Enable New Cloud Datacenter Server Designs for Big Data, Machine Learning, Analytics, and other Emerging Workloads]</ref>

==== OpenCAPI 3 ====
OpenCAPI, formerly ''New CAPI'' or ''CAPI 3.0'', is not layered on top of PCIe and therefore does not use PCIe slots.
In IBM's [[POWER9]] CPU, it uses the ''Bluelink 25G'' I/O facility, which it shares with [[NVLink|NVLink 2.0]], peaking at 50 GB/s.<ref>[http://www.nextplatform.com/2016/08/24/big-blue-aims-sky-power9/ Big Blue Aims For The Sky With Power9]</ref> OpenCAPI does not need the PSL unit (required for CAPI 1 and 2) in the accelerator, as it is not layered on top of PCIe but uses its own transaction protocol.<ref>[https://www.hpcwire.com/2016/10/14/opencapi-takes-on-pcie/ OpenCAPI Takes on PCIe, Vows 10X Improvement]</ref>

==== OpenCAPI 4 ====
OpenCAPI 4 is planned for a future chip following the general availability of POWER9.<ref name="power9_webinar">{{cite web|last1=Stuecheli|first1=Jeff|title=Webinar POWER9|url=https://www.youtube.com/watch?v=eBvscMLVLEU#t=48m49s|publisher=AIX Virtual User Group|language=en|format=Video recording / slides|date=26 January 2017}} - [https://www.ibm.com/developerworks/community/wikis/form/anonymous/api/wiki/61ad9cf2-c6a3-4d2c-b779-61ff0266d32a/page/1cb956e8-4160-4bea-a956-e51490c2b920/attachment/56cea2a9-a574-4fbb-8b2c-675432367250/media/POWER9-VUG.pdf Slides] <sub>(PDF)</sub> - [https://www.ibm.com/developerworks/community/wikis/home?lang=en#/wiki/Power%20Systems/page/AIX%20Virtual%20User%20Group%20-%20USA AIX VUG page] has links to slides and video</ref>

==== OMI ====
OpenCAPI Memory Interface (OMI) is a [[Serial communication#Serial versus parallel|serial-attached]] [[Random-access memory|RAM]] technology based on OpenCAPI, providing a [[Latency (engineering)|low-latency]], [[Bandwidth (computing)|high-bandwidth]] connection for main memory. OMI uses a controller chip on the memory modules, which makes the interface agnostic to the memory technology used on the modules, be it [[DDR4 SDRAM|DDR4]], [[DDR5 SDRAM|DDR5]], [[High Bandwidth Memory|HBM]] or storage-class [[non-volatile random-access memory|non-volatile RAM]]. An OMI-based CPU can therefore change RAM type by changing the memory modules.
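The decoupling described above can be illustrated with a deliberately simplified software analogy. This is a toy model, not an OMI API: the classes <code>ModuleBuffer</code>, <code>DictBackedBuffer</code> and <code>HostMemoryController</code> are hypothetical, and in hardware the translation happens in the module's controller chip rather than in code. The point it shows is that the host side talks to one generic interface and stays unchanged whichever media the module carries.

```python
# Toy analogy (hypothetical names, not a real OMI interface): the host speaks
# one generic read/write protocol, and each module's on-board buffer hides
# which memory technology sits behind it.
from abc import ABC, abstractmethod


class ModuleBuffer(ABC):
    """Hypothetical on-module controller that translates generic requests to its media."""

    media: str

    @abstractmethod
    def read(self, addr: int) -> int: ...

    @abstractmethod
    def write(self, addr: int, value: int) -> None: ...


class DictBackedBuffer(ModuleBuffer):
    """Stand-in for any media type; a real buffer would drive DDR4, DDR5, HBM, etc."""

    def __init__(self, media: str) -> None:
        self.media = media
        self._cells: dict[int, int] = {}  # simulated storage, zero-initialised on read

    def read(self, addr: int) -> int:
        return self._cells.get(addr, 0)

    def write(self, addr: int, value: int) -> None:
        self._cells[addr] = value


class HostMemoryController:
    """Hypothetical host side: identical code regardless of the module's media."""

    def __init__(self, module: ModuleBuffer) -> None:
        self.module = module

    def store(self, addr: int, value: int) -> None:
        self.module.write(addr, value)

    def load(self, addr: int) -> int:
        return self.module.read(addr)


if __name__ == "__main__":
    for media in ("DDR4", "DDR5", "HBM", "NVRAM"):
        host = HostMemoryController(DictBackedBuffer(media))  # "swap the module"
        host.store(0x40, 42)
        print(media, host.load(0x40))
```

Replacing <code>DictBackedBuffer("DDR4")</code> with <code>DictBackedBuffer("HBM")</code> leaves <code>HostMemoryController</code> untouched, which mirrors the claim that an OMI-based CPU can change RAM type by changing only the modules.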
A serial connection uses less die area for the interface on the CPU, potentially allowing more memory interfaces than with conventional DDR memory. OMI is implemented in IBM's [[Power10]] CPU, which has 8 OMI memory controllers on-chip, allowing for 4 TB of RAM and 410 GB/s of memory bandwidth per processor. The memory modules, called DDIMMs (Differential Dynamic Memory Modules), include an OMI controller and memory buffer, and can address individual memory chips for fault tolerance and redundancy. [[Microchip Technology]] manufactures the OMI controller on the DDIMMs; its SMC 1000 OpenCAPI memory controller is described as "the next progression in the market adopting serial attached memory."<ref>{{citation |author=Patrick Kennedy |url=https://www.servethehome.com/microchip-smc-1000-for-the-serial-attached-memory-future/ |date=August 5, 2019 |title=Microchip SMC 1000 For The Serial Attached Memory Future |publisher=Servethehome}}</ref>
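The per-processor Power10 figures can be broken down per controller with simple division. The sketch below assumes capacity and bandwidth are spread evenly across the 8 on-chip OMI controllers; that is an illustrative simplification, not a documented per-channel limit, and the variable names are hypothetical.

```python
# Back-of-the-envelope split of the Power10 per-processor figures across its
# on-chip OMI controllers. Assumes an even split; actual configurations may differ.
OMI_CONTROLLERS = 8
TOTAL_BANDWIDTH_GB_S = 410  # memory bandwidth per processor
MAX_CAPACITY_TB = 4         # maximum RAM per processor

per_controller_bw = TOTAL_BANDWIDTH_GB_S / OMI_CONTROLLERS  # GB/s per controller
per_controller_cap = MAX_CAPACITY_TB / OMI_CONTROLLERS      # TB per controller

print(f"{per_controller_bw:.2f} GB/s and {per_controller_cap:.1f} TB per OMI controller")
```

Under that even-split assumption, each controller accounts for about 51.25 GB/s and 0.5 TB.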