In an interesting development for the GPU industry, PCIe-attached memory is set to change how we think about GPU memory capacity and performance. Panmnesia, a startup backed by South Korea's KAIST, has developed IP built on Compute Express Link (CXL), an open interconnect standard that allows GPUs to utilize external memory resources via the PCIe interface.
Traditionally, GPUs like the RTX 4060 are limited by their onboard VRAM, which can bottleneck performance in memory-intensive tasks such as AI training, data analytics, and high-resolution gaming. CXL leverages the high-speed PCIe connection to attach external memory modules directly to the GPU.
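To make the VRAM ceiling concrete, here is a minimal CUDA sketch; the sizes are arbitrary, only standard CUDA runtime calls are used, and it simply shows that a plain cudaMalloc cannot exceed the card's onboard memory:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    size_t free_bytes = 0, total_bytes = 0;

    // Ask the driver how much onboard VRAM is free and how much exists.
    cudaMemGetInfo(&free_bytes, &total_bytes);
    printf("VRAM: %.1f GiB free of %.1f GiB total\n",
           free_bytes / double(1ull << 30),
           total_bytes / double(1ull << 30));

    // Request more than the card physically carries: an ordinary
    // cudaMalloc can only draw on onboard VRAM, so this fails.
    void *ptr = nullptr;
    cudaError_t err = cudaMalloc(&ptr, total_bytes + (size_t{1} << 30));
    printf("oversized cudaMalloc: %s\n", cudaGetErrorString(err));
    if (err == cudaSuccess) cudaFree(ptr);
    return 0;
}
```

CXL-attached memory is aimed at exactly this wall: capacity beyond what the board carries, reachable without rewriting the allocation model.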
This method provides a low-latency memory expansion path. According to reports, the technology achieves round-trip latency in the double-digit nanosecond range, a substantial reduction compared with standard SSD-based solutions.
Moreover, this technology isn't limited to traditional RAM. SSDs can also be used to expand GPU memory, offering a versatile and scalable option. This allows for hybrid memory systems that combine the speed of DRAM with the capacity of SSDs, improving both performance and efficiency.
While CXL operates on a PCIe link, integrating this technology with GPUs isn’t straightforward. GPUs lack the necessary CXL logic fabric and subsystems to support DRAM or SSD endpoints. Therefore, simply adding a CXL controller is not feasible.
GPU cache and memory subsystems only recognize memory expansion through Unified Virtual Memory (UVM). However, Panmnesia's tests revealed that UVM performed worst of all the approaches benchmarked across GPU kernels, owing to the overhead of host-runtime intervention on every page fault and to data transfers made at page granularity.
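For context, the page-fault path being criticized looks roughly like the CUDA sketch below: cudaMallocManaged hands out one pointer valid on both host and device, and pages migrate on demand when the other side faults on them. The buffer size and trivial kernel are illustrative, not from Panmnesia's benchmarks:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Touch every element so each page of managed memory gets faulted in.
__global__ void touch(float *data, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

int main() {
    const size_t n = size_t{1} << 28;  // 1 GiB of floats
    float *data = nullptr;

    // UVM-style expansion: a single managed allocation visible to both
    // host and device; physical pages live wherever they were last used.
    cudaMallocManaged(&data, n * sizeof(float));
    for (size_t i = 0; i < n; ++i) data[i] = 0.0f;  // pages now on the host

    // The kernel's first access to each page faults; the host runtime
    // must intervene and migrate data at page granularity -- the
    // overhead Panmnesia's tests flagged as UVM's main cost.
    const size_t threads = 256;
    touch<<<(unsigned)((n + threads - 1) / threads),
            (unsigned)threads>>>(data, n);
    cudaDeviceSynchronize();

    printf("data[0] = %f\n", data[0]);
    cudaFree(data);
    return 0;
}
```

Panmnesia's controller replaces this software round trip with direct hardware load/store access, described next.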
To address the issue, Panmnesia developed a series of hardware layers that support all key CXL protocols, consolidated into a unified controller. This CXL 3.1-compliant root complex includes multiple root ports for attaching external memory over PCIe and a host bridge with a host-managed device memory (HDM) decoder. The decoder connects to the GPU's system bus and manages its system memory address ranges, giving the GPU direct access to the expanded capacity via ordinary load/store instructions and sidestepping UVM's page-fault overhead.
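Panmnesia hasn't published its design at this level of detail, but the HDM decoder's role can be sketched conceptually: it maps windows of the GPU's physical address space onto CXL root ports, so an ordinary load/store into such a window reaches external DRAM or an SSD with no page fault and no host-runtime round trip. Every base, size, and port number below is invented for illustration (host-side C++, buildable with nvcc like the earlier sketches):

```cuda
#include <cstdint>
#include <cstdio>

// Hypothetical model of an HDM (host-managed device memory) decoder:
// each window of the GPU's physical address space belongs either to
// onboard VRAM or to one CXL root port. The real decode happens in
// hardware on the GPU's system bus; this is only a conceptual sketch.
struct HdmWindow {
    uint64_t base;
    uint64_t size;
    int      root_port;  // -1 means local VRAM, no CXL decode
};

static const HdmWindow kWindows[] = {
    {0x000000000000ull,  16ull << 30, -1},  // onboard VRAM
    {0x000400000000ull,  64ull << 30,  0},  // root port 0: CXL DRAM
    {0x001400000000ull, 512ull << 30,  1},  // root port 1: CXL SSD
};

// Route a physical address to the root port that owns it.
int decode(uint64_t addr) {
    for (const HdmWindow &w : kWindows)
        if (addr >= w.base && addr - w.base < w.size)
            return w.root_port;
    return -1;  // unmapped: treat as local
}

int main() {
    const uint64_t probes[] = {
        0x000020000000ull,  // 512 MiB -> onboard VRAM
        0x000500000000ull,  // 20 GiB  -> CXL DRAM
        0x002000000000ull,  // 128 GiB -> CXL SSD
    };
    for (uint64_t a : probes)
        printf("addr 0x%012llx -> port %d\n",
               (unsigned long long)a, decode(a));
    return 0;
}
```

Because the routing is decided purely by address range in hardware, the GPU's memory requests never drop into a software fault handler, which is the UVM overhead the controller is designed to avoid.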
The implications of this technology are far-reaching. For AI and machine learning, the ability to add more memory means handling larger datasets more efficiently, shortening training times, and improving model accuracy. In gaming, developers can push the boundaries of graphical fidelity and complexity without being constrained by VRAM limitations.
For data centers and cloud computing environments, Panmnesia’s CXL technology provides a cost-effective way to upgrade existing infrastructure. By attaching additional memory through PCIe, data centers can enhance their computational power without requiring extensive hardware overhauls.
Despite its potential, Panmnesia faces a big challenge in gaining industrywide adoption. The best graphics cards from AMD and Nvidia don't support CXL, and they may never support it. There's also a strong possibility that industry players will develop their own PCIe-attached memory technologies for GPUs. Nonetheless, Panmnesia's innovation represents a step forward in addressing GPU memory bottlenecks, with the potential to significantly impact high-performance computing and gaming.