ChatGPT, a chatbot developed by OpenAI, has gained global attention due to its impressive performance, with giants such as Microsoft and Google investing heavily in similar technologies. Baidu and other Chinese internet companies are also developing such technologies and plan to release them soon.
GPT, short for Generative Pre-trained Transformer, is a natural language processing model that is pre-trained and then iteratively improved. Large language models such as GPT can produce high-quality output, making them useful for search, chatbots, image generation and editing, and other applications.
Microsoft has incorporated ChatGPT technology into its search engine, Bing, bringing conversational AI to search that accurately understands the intent behind users’ questions and provides answers. Beyond text, large pre-trained models can also generate images, and in recent months AI-generated artwork that is hard to distinguish from human work has appeared repeatedly.
As the number of ChatGPT users grows rapidly, demand for computing power and the related chips will rise with it. The technology behind ChatGPT relies heavily on AI processing capability, which in turn depends on networking, storage, and computing infrastructure.
GPUs are the mainstay of AI computing
NVIDIA, which supplies GPU chips to OpenAI, is benefiting the most from ChatGPT. Before an AI model can generate content through its algorithms, it must be trained on large amounts of data, a process carried out on GPU clusters. The trained model is then deployed on GPU clusters for inference, handling tasks such as image generation and chat.
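As a rough illustration of these two phases, the sketch below uses PyTorch with a toy model and random data (placeholders, not OpenAI’s actual models or pipeline) to show how the same GPU hardware serves both training and inference:

```python
# Hedged sketch: a toy model and fake data stand in for a real large language model.
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

# A tiny stand-in for a large language model.
model = nn.Sequential(
    nn.Embedding(1000, 64), nn.Flatten(), nn.Linear(64 * 16, 1000)
).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

# Training phase: runs on GPU clusters and requires gradient updates.
tokens = torch.randint(0, 1000, (8, 16), device=device)  # fake token batch
labels = torch.randint(0, 1000, (8,), device=device)
loss = loss_fn(model(tokens), labels)
loss.backward()
optimizer.step()
optimizer.zero_grad()

# Inference phase: the trained model is deployed and only runs forward passes.
model.eval()
with torch.no_grad():
    prediction = model(tokens).argmax(dim=-1)
print(prediction.shape)  # one predicted token id per input sequence
```

Training is the more expensive phase because every step involves a backward pass and parameter updates across the whole model, while deployed inference repeats only the forward pass for each user request.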
“AI algorithms are constantly improving and iterating, and GPUs strike a balance between flexibility and computing power,” said Li Feng, Dean of Moore Threads College. Compared with specialized AI chips, GPUs are more flexible while still guaranteeing AI computing performance, which makes them popular among developers. He added that most of the growth in computing power worldwide now comes from GPUs.
Currently, the GPT-3.5 model behind ChatGPT has more than 175 billion parameters, and training it is expensive. According to a Guosen Securities report, training GPT-3 costs about $1.4 million, while training larger language models costs between $2 million and $12 million.
Based on ChatGPT’s average of 13 million unique visitors in January this year, serving them would require more than 30,000 NVIDIA A100 GPUs, with an initial investment of roughly $800 million. In addition, Citigroup estimates that ChatGPT could drive $3 billion to $11 billion of NVIDIA-related product sales within 12 months.
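As a back-of-the-envelope check of the figures cited above (using only the two reported numbers; the implied per-GPU figure is not an official NVIDIA price):

```python
# Sanity-check the reported estimate: ~30,000 A100s for an ~$800 million outlay.
num_gpus = 30_000
initial_investment = 800_000_000  # USD, as reported

implied_cost_per_gpu = initial_investment / num_gpus
print(f"Implied cost per A100 (including servers and infrastructure): ${implied_cost_per_gpu:,.0f}")
# -> roughly $26,700 per GPU, i.e. the estimate prices in far more than the bare chip.
```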
This means that applications like ChatGPT will drive demand for NVIDIA’s GPU chips. NVIDIA’s GPUs dominate the market for training large AI models, and the company’s stock has risen 55% this year.
The head of the AI department at a large IT company told Jiemian News that NVIDIA is the undisputed leader on the training side, while other players are vying for the inference side, which involves lighter workloads but is more sensitive to power consumption and latency. Li Feng said that Moore Threads has been testing internally and that an AIGC platform fully deployed on Moore Threads GPUs will be available soon, covering a series of content generation services such as image generation and natural language generation.
In addition to GPUs, other chips that contribute to computing power include CPUs, FPGAs, and ASICs. Different combinations of these chips can meet the computing needs of different AI models.
Specialized AI chips, such as ASICs (Application-Specific Integrated Circuits), are also expected to claim a share of AI computing power in the future. Google previously released its self-developed TPU (Tensor Processing Unit), a chip designed specifically for machine learning, and has iterated on it multiple times. According to data provided by Google, the TPU’s computing efficiency is more than ten times that of earlier GPUs. Google has deployed TPUs on its own cloud platform, and its upcoming conversational AI service, Bard, will also run on TPUs.
Niche chip HBM steps into the spotlight
A computing system also needs storage, networking, and other components that keep pace with the computation. Whether the processor is a CPU, a GPU, or another specialized chip, computation is inevitably interrupted by memory access, communication, and other operations, which forces industry players to come up with matching solutions.
In the ChatGPT craze, a niche storage chip has gained recognition as demand for AI computing rises. According to the Korean Economic Daily, HBM chips from Samsung and SK Hynix have won additional orders thanks to the ChatGPT-fueled surge in GPU demand, unexpectedly becoming a sought-after type of memory chip.
HBM (High Bandwidth Memory) is a type of memory chip built for high bandwidth. Compared with ordinary DRAM, HBM provides much higher data transfer speeds. Because of this, HBM is mainly used in high-performance computing scenarios such as supercomputers, AI accelerators, and high-performance servers.
Working alongside CPUs and GPUs, HBM improves machine learning and computing performance. The rapid rise of ChatGPT has already benefited GPU makers such as NVIDIA: ChatGPT used more than 10,000 NVIDIA A100 GPUs to learn from a massive corpus of documents. HBM is packaged on the accelerator card itself, and the NVIDIA A100 can be equipped with up to 80GB of HBM2e memory.
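For readers who want to see this memory from the software side, the hedged sketch below uses PyTorch to report a GPU’s memory capacity and to estimate its effective memory bandwidth by timing a large on-device copy. It is a generic microbenchmark, not an official measurement method, and the numbers it prints depend on the specific card and driver:

```python
# Rough sketch: inspect GPU memory size and estimate device-memory bandwidth.
import time
import torch

assert torch.cuda.is_available(), "requires an NVIDIA GPU"
props = torch.cuda.get_device_properties(0)
print(f"{props.name}: {props.total_memory / 1e9:.1f} GB of device memory")

n_bytes = 2 * 1024**3  # copy 2 GiB within device memory
src = torch.empty(n_bytes, dtype=torch.uint8, device="cuda")
dst = torch.empty_like(src)

dst.copy_(src)                       # warm-up copy
torch.cuda.synchronize()
start = time.time()
dst.copy_(src)                       # device-to-device copy: one read + one write
torch.cuda.synchronize()
elapsed = time.time() - start

# Each copied byte is read once and written once, so total traffic is 2 * n_bytes.
print(f"~{2 * n_bytes / elapsed / 1e9:.0f} GB/s effective memory bandwidth")
```

On an HBM-equipped accelerator the result is typically an order of magnitude higher than what CPU-attached DRAM delivers, which is precisely why the chip matters for AI workloads.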
“HBM has always been difficult to sell because its price is three times that of DRAM, but AI is the killer application for HBM,” said Dan Nystedt, Vice President of TriOrient Investments. Its high cost has kept HBM from being adopted at scale, but AI applications are expected to expand its market considerably.
The HBM demand driven by ChatGPT has already caught the attention of upstream manufacturers. SK Hynix said it developed its fourth-generation HBM product (HBM3) and supplied it to NVIDIA last year. Samsung Semiconductor told Jiemian News that AI-based interactive learning and inference require high-performance processors paired with high-performance memory, which will have a positive effect on memory demand.
Samsung Semiconductor has also announced progress on HBM-PIM (Processing-in-Memory), which builds AI processing capability directly into the memory chip for AI applications, and it plans to work with customers to build a PIM platform ecosystem.
According to market research firm Omdia, total HBM revenue is predicted to reach $2.5 billion by 2025, and this figure is expected to rise significantly as demand for AI computing grows.
In the long term, HBM combined with new data transfer protocols such as CXL (Compute Express Link) will continue to raise AI computing performance and has won support from industry giants. According to consulting firm TrendForce (Jibang Consulting), CXL will become more widespread as future CPUs integrate CXL functionality, and more joint designs pairing HBM with CXL are expected to appear in future AI servers.
Distributed computing calls for DPUs
ChatGPT’s parameter count is on the order of hundreds of billions, which makes it impossible to train the model, or even run inference, effectively on a single machine, so distributed computing is required. In distributed computing, the bandwidth between machines and efficient computing chips become crucial, because the data interconnect is often the bottleneck. At the data center level, the industry expects the DPU, positioned as the data center’s “third chip,” to solve such problems.
“ChatGPT and other language generation models have parameter counts of up to hundreds of billions, which makes single-machine training and inference nearly impossible, so they must rely heavily on distributed computing,” the person in charge of DPU development at Yunmai Xilinx told Jiemian News. In distributed computing, the DPU handles data processing and preprocessing and distributes tasks to CPUs, GPUs, and FPGAs for computation.
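As a minimal sketch of the kind of distributed training described here, the code below uses PyTorch’s DistributedDataParallel with a toy model and random data (no vendor-specific DPU software is involved, and real large language models additionally need model and pipeline parallelism):

```python
# Minimal data-parallel training sketch with PyTorch DDP.
# Launch with: torchrun --nproc_per_node=<num_gpus> ddp_sketch.py
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")            # inter-GPU communication layer
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = nn.Linear(1024, 1024).cuda(local_rank)     # toy stand-in for a transformer
    model = DDP(model, device_ids=[local_rank])        # gradients are synced across ranks
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(10):
        x = torch.randn(32, 1024, device=f"cuda:{local_rank}")  # each rank trains on its own data shard
        loss = model(x).pow(2).mean()
        loss.backward()                                # gradient all-reduce happens here
        optimizer.step()
        optimizer.zero_grad()

    if dist.get_rank() == 0:
        print("done")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

The gradient all-reduce triggered inside loss.backward() is exactly the kind of inter-machine traffic that high-bandwidth interconnects, and the DPUs discussed here, are meant to handle efficiently.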
DPU stands for Data Processing Unit; these chips process the massive amounts of data that flow back and forth between servers in cloud data centers. Cloud vendors turn network cards into DPUs to relieve CPUs of this work and let them focus on more critical tasks, much as a company’s front desk relieves other employees of routine work.
In addition to GPUs, HBM, and DPUs, the industry also expects heterogeneous chip integration, enabled by chiplet technology, to support the growth in computing power. By breaking different chips’ capabilities into modular dies and using new design, interconnect, and packaging technologies, chiplets built with different technologies and process nodes, and even at different fabs, can be combined into a single packaged product. From the perspective of the semiconductor supply chain, IP licensing, wafer foundry, and packaging and testing vendors provide critical technical support and can be regarded as important infrastructure for improving AI computing power. Upstream and downstream vendors such as Imagination, Arm, TSMC, and UMC are expected to benefit from this trend.