Memory, including DRAM (Dynamic Random Access Memory) and NAND flash, has always been a crucial part of the semiconductor industry, and the expansion of the memory market has repeatedly given the broader industry a new engine for growth. In recent years especially, with the continued development and popularization of generative artificial intelligence (AI) technologies such as ChatGPT, the memory market has grown again, particularly around newer memory technologies like HBM (High Bandwidth Memory) and QLC SSDs.
It is well known that computing power is a critical component of the AI era, but many people overlook the important role storage plays across the AI lifecycle. This is especially true as generative AI becomes more widespread, driving exponential growth in unstructured data such as images, video, and audio and naturally creating new storage demand. According to IDC research, the volume of data generated globally is expected to reach 394 ZB by 2028, with AIGC (Artificial Intelligence Generated Content) output being particularly prominent: by then, AI-generated images and video are projected to grow 167-fold.
Against this backdrop, storage has gradually become a bottleneck for AI development. In-memory computing has attracted widespread attention in recent years, but commercialization has remained a challenge. Recently, Samsung Electronics and SK Hynix began collaborating to standardize LPDDR6-PIM memory products. The partnership aims to accelerate the standardization of low-power memory designed specifically for AI, which is expected to drive the commercialization of in-memory computing.
1. Storage-Compute Integration Accelerates AI Computation
It is widely recognized that algorithms, data, and computing power (chips) are the three core elements of artificial intelligence, with chips forming the hardware foundation on which AI is ultimately realized. As research into large models such as ChatGPT and GPT-4 deepens, model structures have become increasingly complex, and data and computation requirements have grown dramatically. Meanwhile, with Moore's Law gradually losing steam, progress in chip manufacturing technology has slowed. The widening gap between algorithmic demands and hardware capability has become a major challenge in the AI field, and efficiently running complex algorithms over massive amounts of data is a pressing issue.
The reason for this is that chips, as the foundation of AI, are facing a serious “Von Neumann architecture bottleneck.” In Von Neumann architecture, computation and memory are separate; the computing unit reads data from memory and stores it back after processing. Especially with the explosion of performance-demanding scenarios like AI, the shortcomings of the traditional Von Neumann architecture, such as power walls, performance walls, and memory walls, are becoming more apparent. At the same time, as device sizes approach physical limits, the path of improving chip performance through process advancements is gradually being hindered, creating a “process wall” issue for chip development.
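To make the bottleneck concrete, a rough back-of-envelope estimate (in Python) shows why data movement, rather than arithmetic, dominates for memory-bound AI workloads. All numbers below are assumed round figures for illustration, not measurements of any particular chip.

```python
# Illustrative "memory wall" estimate for a memory-bound AI workload.
# All figures are assumptions chosen for round numbers, not vendor specs.

compute_ops_per_s = 100e12   # assumed peak compute: 100 TOPS
mem_bandwidth = 1e12         # assumed memory bandwidth: 1 TB/s

weight_bytes = 10e9          # assume 10 GB of weights streamed per pass
ops = 2 * weight_bytes       # ~2 ops (multiply + add) per int8 weight

t_compute = ops / compute_ops_per_s       # time if limited by arithmetic
t_memory = weight_bytes / mem_bandwidth   # time just to move the weights

print(f"compute-bound time: {t_compute * 1e3:.2f} ms")  # ~0.20 ms
print(f"memory-bound time:  {t_memory * 1e3:.2f} ms")   # ~10.00 ms
# Moving the data takes roughly 50x longer than the arithmetic, so the
# compute units sit idle waiting on DRAM -- the essence of the memory wall.
```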
To address these issues, both domestic and international academic and industrial communities have carried out numerous studies from multiple perspectives, including architecture, processes, and integration, exploring new generation chip technologies for the post-Moore era. For example, dataflow architecture chips achieve streaming computation, offering throughput far higher than that of Von Neumann architecture when processing AI-related large-scale data; reconfigurable chip technology allows hardware circuit structures to be defined by software, realizing high flexibility and energy-efficient computation; wafer-level chips expand chip area through advanced process technology to increase computing power; 3D chips use 3D integration packaging technology to stack multiple chips vertically to achieve high bandwidth and high computing power; and storage-compute integrated chips, through collaborative innovation across devices, architectures, circuits, and processes, achieve the integration of memory and computation, breaking through the Von Neumann architecture bottleneck at its core.
Thanks to their architectural characteristics, storage-compute integrated chips can significantly reduce data-movement overhead, breaking through the "memory wall" and "power wall." At the same time, their large-scale parallel computing capability allows them to achieve performance comparable to advanced-node designs even on relatively mature process nodes, easing the pressure of process miniaturization. Furthermore, storage-compute integration combines readily with other technologies such as reconfigurable chips, wafer-level chips, and 3D integration. Storage-compute integrated chips are therefore regarded as one of the most important chip technology directions of the post-Moore era.
There are three mainstream technological paths for storage-compute integration: near-memory computing (PNM, processing near memory), in-memory processing (PIM, processing in memory), and in-memory computation (CIM, computing in memory).
Near-memory computing has the advantage of reducing data movement and improving cache efficiency, making it suitable for applications that require large-scale parallel processing and memory bandwidth optimization. In-memory processing excels in data-intensive applications and energy efficiency optimization, making it suitable for applications requiring rapid data processing and reduced energy consumption. In-memory computation stands out for high parallelism in specific domains and customized hardware optimization, making it suitable for applications that require highly specialized and customized solutions.
2. Major Players’ In-Memory Computing Layouts
The concept of storage-compute integration dates back to 1969, when Kautz et al. at the Stanford Research Institute first combined storage with logic and proposed the "logic-in-memory" approach. Researchers have since explored chip circuit structures, computing architectures, and system applications, but due to the complexity of circuit design and manufacturing difficulties, most of this work effectively achieved only "near-memory computing," where data must still be read out of memory and then computed nearby. Today, the most typical industry solution shortens the distance between memory and processors through 3D packaging and high-bandwidth memory, thereby increasing data bandwidth. Near-memory computing is relatively mature and has already entered volume production: leading semiconductor companies such as AMD, Intel, Samsung, and SK Hynix have all released near-memory computing chips based on high-bandwidth memory (HBM) and 2.5D/3D packaging. For instance, Samsung's HBM3 Icebolt adopts a near-memory computing architecture, reaching a per-pin data rate of 6.4 Gbps and a bandwidth of up to 819 GB/s with a 12-high stack of 10nm-class DRAM dies. In essence, however, near-memory computing is still a Von Neumann architecture in which storage and computation remain separate.
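As a quick sanity check, the 819 GB/s figure quoted above follows directly from the per-pin data rate and the 1024-bit interface that HBM stacks conventionally use (the interface width is our assumption here, since it is not stated in the text):

```python
# Bandwidth check for the HBM3 figure quoted above.
# Assumes the conventional 1024-bit HBM stack interface.
pin_rate_gbps = 6.4          # Gb/s per pin (from the text)
interface_width_bits = 1024  # assumed HBM interface width

bandwidth_gb_per_s = pin_rate_gbps * interface_width_bits / 8
print(bandwidth_gb_per_s)    # 819.2 -> matches the ~819 GB/s quoted
```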
In recent years, driven by big data applications, data volumes have grown exponentially. Researchers began considering giving memory some computational capabilities to reduce data movement, lower energy consumption in computer systems, and achieve integration of memory and computation. As a result, the “in-memory computation” architecture, which integrates storage and computation, has become a research hotspot in the industry. Starting in 2021, in-memory computation products have gradually been introduced, with international giants like Samsung, SK Hynix, TSMC, and companies like Mythic beginning trial production of in-memory computation chips.
In December 2021, Alibaba’s Damo Academy Computing Technology Laboratory successfully developed the world’s first DRAM-based 3D bonding stacked in-memory AI chip, claiming a performance improvement of over 10 times in specific AI scenarios and an energy efficiency ratio improvement of up to 300 times.
In 2021, Samsung showcased an in-memory computing chip based on HBM2-PIM technology, offering up to 1.2 TFLOPS of embedded computing power and enabling the memory chip to take on work normally handled by CPUs, GPUs, ASICs, or FPGAs. In 2022, Samsung went further and modified an AMD Instinct MI100 compute card with HBM-PIM chips to build a large-scale computing system. When training the T5 language model, the system delivered a 2.5x performance increase while cutting power consumption to 1/2.67 of the original. To validate MoE (Mixture of Experts) models, Samsung also built an HBM-PIM cluster from 96 MI100 GPUs equipped with HBM-PIM. On MoE models, the HBM-PIM GPUs showed a 100% performance improvement and a 300% increase in energy efficiency compared with standard HBM.
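Taken together, the two reported T5 figures imply a sizeable reduction in energy per training task; the sketch below simply multiplies them out and should be read as an illustration derived from the quoted numbers, not as a Samsung-published result.

```python
# Rough energy-per-task estimate combining the two reported T5 figures
# (2.5x faster, power reduced to 1/2.67). Illustrative only.
speedup = 2.5            # reported performance gain
power_ratio = 1 / 2.67   # reported power relative to the baseline

energy_ratio = power_ratio / speedup   # energy = power x time
print(f"energy per task: {energy_ratio:.2f}x of baseline "
      f"(~{1 / energy_ratio:.1f}x improvement)")  # ~0.15x, i.e. ~6.7x
```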
At Hot Chips 2023, Samsung Electronics presented its latest research results on HBM-PIM (High Bandwidth Memory with Processing-In-Memory) and LPDDR (Low Power Double Data Rate DRAM)-PIM. LPDDR-PIM combines mobile DRAM with PIM, enabling data to be processed and computed directly on mobile devices. Because it targets mobile devices, its bandwidth (102.4 GB/s) is lower than that of HBM-PIM, but it achieves a 72% reduction in power consumption. Samsung is investing heavily in PIM technology, aiming to surpass SK Hynix in AI applications.
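The quoted 102.4 GB/s is consistent with, for example, a 128-bit LPDDR interface running at 6.4 Gb/s per pin; this particular configuration is an assumption used only to reproduce the figure, since Samsung's exact setup is not given here.

```python
# One configuration consistent with the quoted LPDDR-PIM bandwidth.
# Both values below are assumptions, not confirmed Samsung specifications.
pin_rate_gbps = 6.4    # assumed Gb/s per pin
bus_width_bits = 128   # assumed LPDDR interface width

print(pin_rate_gbps * bus_width_bits / 8)  # 102.4 GB/s
```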
Another storage giant, SK Hynix, is also making strides in this field. In 2022, SK Hynix announced its first PIM-based product, a GDDR6-AiM sample. GDDR6-AiM adds computational functionality to GDDR6 memory running at a 16 Gbps data rate; in specific environments, systems combining GDDR6-AiM with CPUs or GPUs can accelerate computation by up to 16 times compared with traditional DRAM. Despite the higher performance, GDDR6-AiM operates at 1.25 V, below the 1.35 V of standard GDDR6. Moreover, PIM reduces data transfers between the memory and CPUs/GPUs, lowering their power draw and allowing GDDR6-AiM to cut power consumption by roughly 80%.
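The lower supply voltage by itself explains only a small part of the reported 80% saving: dynamic CMOS power scales roughly with the square of the voltage, so most of the benefit has to come from avoiding off-chip data movement. The breakdown below is a rough illustration based on that standard scaling rule, not SK Hynix data.

```python
# Rough split of the reported ~80% power saving (illustrative only).
# Dynamic CMOS power scales roughly with V^2 at a fixed frequency.
v_new, v_old = 1.25, 1.35
voltage_factor = (v_new / v_old) ** 2   # ~0.86

print(f"power from the voltage drop alone: {voltage_factor:.2f}x "
      f"(~{(1 - voltage_factor) * 100:.0f}% saving)")  # ~14%
# The remaining savings toward ~80% come mainly from performing
# computation inside the DRAM and cutting chip-to-chip transfers.
```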
Additionally, TSMC has demonstrated exploratory results on SRAM, ReRAM, PCM, STT-MRAM, and other devices for in-memory computing. US processor company Mythic launched the M1076 processor, which uses an analog in-memory computing approach with Flash as the storage medium and achieves 25 TOPS of computing power at 3 W in a 40nm process. In 2022, China's Witmem Technology launched the first mass-produced in-memory computing SoC, the WTM2101, which uses an analog storage-compute model with Flash as the medium; built on a mature 40nm process, it delivers 50 GOPS of computing power at ultra-low power (an operating current on the order of 5 μA) and has been commercialized in smart wearable devices, making it the first in-memory computing chip in the world to reach mass production and commercialization at the million-unit level. In 2023, Houmo Intelligence (Houmo.AI) launched the Hongtu H30 chip, which adopts a digital storage-compute model with SRAM as the medium and achieves 256 TOPS of computing power at 35 W. The commercialization of in-memory computing is thus showing initial results, and more and more in-memory computing products are being introduced.
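Peak figures quoted at different precisions and process nodes are not directly comparable, but normalizing to TOPS per watt gives a rough feel for the energy-efficiency claims; the snippet below simply divides the numbers quoted above.

```python
# Rough TOPS/W from the figures quoted above (different precisions and
# process nodes, so treat this only as an order-of-magnitude comparison).
chips = {
    "Mythic M1076 (analog, Flash, 40nm)": (25, 3),    # 25 TOPS @ 3 W
    "Houmo Hongtu H30 (digital, SRAM)":   (256, 35),  # 256 TOPS @ 35 W
}
for name, (tops, watts) in chips.items():
    print(f"{name}: {tops / watts:.1f} TOPS/W")  # ~8.3 and ~7.3 TOPS/W
```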
3. Standardization of PIM Technology is Needed to Promote Development
Although many companies have already invested in PIM technology, commercialization has stalled. A key reason is that each company develops products to its own standard, producing differences in concepts and specifications and making it difficult for the industry to converge on a common standard.
Samsung Electronics and SK Hynix are collaborating to promote the standardization of LPDDR6-PIM memory, a partnership intended to accelerate the standardization of low-power memory designed specifically for artificial intelligence. The two companies have formed an alliance to ensure that the next generation of memory aligns with this trend and are working with JEDEC (the Joint Electron Device Engineering Council) on standardization, discussing the detailed specifications of the standard.
First, standardization can enhance compatibility and interoperability. By standardizing, PIM devices produced by different manufacturers can work seamlessly in the same system, reducing system failures or performance degradation due to compatibility issues. This helps facilitate the widespread adoption and popularity of the technology.
Second, standardization helps reduce costs. It can lower R&D costs and time since different device manufacturers can share and utilize existing standards, avoiding redundant development. Additionally, standardization can promote economies of scale, reducing production costs, making PIM technology more accessible and affordable.
It is not yet clear when PIM chips will be deployed at scale, but that day is worth anticipating. Technology never stops evolving, and market demand keeps changing; when the conditions mature, storage-compute integrated chips may finally get their moment to shine. With standardization now imminent, those conditions are close to being met.