Beyond GPU Hype: The Critical Role of CPUs in AI Success

Discover why top-notch CPUs are indispensable for AI platforms, complementing GPUs for unmatched performance and efficiency.

The AIGC boom is not just about the "hundred-model war" and the wave of new AI applications springing up everywhere.

Even more, the explosive growth in demand for computing power and communication has thrust the underlying specialized acceleration chips, and the AI acceleration servers built around them, back into the public spotlight.

According to statistics, the worldwide AI server market reached $21.1 billion in 2023, and IDC forecasts that:

  • It is expected to reach $31.79 billion by 2025, a CAGR of 22.7% from 2023 to 2025.
  • The training and inference of large AIGC models require massive amounts of high-performance computing power, so demand for AI servers will keep rising.

AI acceleration servers also differ from ordinary servers: they generally adopt heterogeneous architectures and pack in as many GPUs as can be allocated, which is one reason GPUs are so hard to come by right now.

But here is the thing: even in the era of large models, when GPUs and various AI acceleration chips shine ever more brightly, the CPU remains essential to AI infrastructure. At the very least, a high-end AI acceleration server needs two CPUs for every eight GPUs.

Not only that: because AI acceleration servers are heterogeneous, a variety of architectures beyond the CPU + GPU combination are on the market, such as:

  • CPU + FPGA
  • CPU + TPU
  • CPU + ASIC
  • CPU + Multiple Acceleration Cards

It is not difficult to see that however the architecture of an AI acceleration server changes, the one constant is the CPU, and usually a high-end one at that.

So why is this the case?

CPUs in AI-accelerated servers

First of all, the CPU is to an AI acceleration server what the brain is to the human body.

It is responsible for the computing and control of the entire server and is the core component that directly affects overall server performance.

The CPU processes instructions from the operating system and coordinates the work of various hardware components, including memory management, data flow control, and I/O operations.

Even in AI servers, where GPUs or other accelerators perform most of the compute-intensive work, the CPU remains indispensable: it keeps the entire system running stably, ensures efficient communication and collaboration among components, and ultimately keeps tasks moving smoothly.

Second, CPUs also offer flexibility and generalization.

CPUs are generally designed as general-purpose processors, capable of performing various types of computing tasks.

While GPUs are more efficient at parallel processing, CPUs are more flexible at handling serial tasks, executing complex logic, and running general-purpose applications.

A truly complete AI application platform needs to handle a range of closely related yet distinct tasks, including data preprocessing, model training, inference, and post-processing, and many of these stages rely on, or even specifically require, the general-purpose processing power of the CPU.

Not only that, the CPU is also the linchpin of system startup and maintenance.

This is because the server startup process, system monitoring, troubleshooting, and maintenance operations all require the CPU to perform; without the CPU, these critical system-level tasks would not be possible.

The CPU also enjoys software-compatibility advantages accumulated over many years.

Most software and applications on the market are designed for CPUs, including operating systems, database management systems, and development tools, and AI-accelerated servers need to run these to support the development and deployment of AI applications.

As noted above, AI acceleration servers are heterogeneous, and the CPU serves as the control node that manages the computational tasks of GPUs and other accelerators, enabling efficient resource allocation and task scheduling.

Finally, there is the issue of cost.

While GPUs are very efficient for AI computation, CPUs remain a cost-effective option, especially for tasks that are not well suited to GPUs or specialized accelerators; combining CPUs with these chips strikes a better balance between performance and cost.

It’s easy to see why the CPU is the one component that cannot be left out of an AI acceleration server.

The next question is which CPUs the mainstream server vendors are using.

Let’s take Inspur, the No. 1 AI server vendor in China, as an example. According to recent reports, its NE5260G7 servers have been adapted to the fifth-generation Intel® Xeon® Scalable processors recently released by the veteran chip giant Intel.

The reason for adopting the latest high-end CPU can be summed up as "a high-end rig needs both a high-end GPU and a high-end CPU": AI servers likewise need high-end hardware across the board if they are to make performance breakthroughs.

Specifically, compared with its predecessor, the fifth-generation Intel® Xeon® Scalable processor excels at AI workloads, delivering a 21% overall performance increase, and in AI inference tasks the gain reaches 42%.

In addition, memory bandwidth increased by 16%, and the fifth-generation Xeon® Scalable processors deliver up to 21% higher overall performance for general computing tasks and up to 36% higher performance per watt across a range of real-world customer workloads.

It is precisely because the "core" is so powerful that these servers achieved an average performance improvement of 21%.

However, AI is not just about pure model or large-model acceleration, so the CPU advantages described above are only part of the picture; in each segmented application scenario, the CPU has an even greater role to play.

AI is more than just large language models

Even in AI servers equipped with GPUs or dedicated accelerators, the CPU's role goes far beyond orchestrating or serving the accelerators.

Rather, it plays diverse roles across the entire lifecycle of an AI system, from data acquisition and preprocessing through training, inference, and application.

Let’s start with the most critical part, the AI model itself, and in particular model inference.

Whether it’s the large language models that dominate today's headlines, traditional deep learning models, or the AI for Science applications born from the convergence of scientific computing and artificial intelligence, CPUs, especially Intel® Xeon® Scalable processors with built-in AI acceleration, have a solid track record in inference applications.
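To make this concrete, below is a minimal sketch of what CPU-side inference can look like in practice, assuming PyTorch plus Intel® Extension for PyTorch (ipex) to enable the bfloat16 fast paths; the ResNet-50 model and input shape are placeholders standing in for any workload.

```python
# Minimal sketch: bfloat16 inference on a Xeon CPU with Intel Extension
# for PyTorch (ipex). The model and input shape are illustrative only.
import torch
import intel_extension_for_pytorch as ipex
import torchvision.models as models

model = models.resnet50(weights=None).eval()        # stand-in for any model
model = ipex.optimize(model, dtype=torch.bfloat16)  # enable bf16 fast paths

x = torch.randn(1, 3, 224, 224)
with torch.no_grad(), torch.autocast("cpu", dtype=torch.bfloat16):
    y = model(x)
print(y.shape)
```

On recent Xeon® generations, the bfloat16 path lets the heavy matrix multiplications run on the built-in AI acceleration units without any change to the model code itself.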

For example, in the protein-structure-prediction boom set off by AlphaFold2, the third- and fourth-generation Xeon® Scalable processors, through continuous optimization of end-to-end throughput, can deliver a more cost-effective acceleration solution than GPUs, directly lowering the entry barrier for AI for Science.

Take OCR as another example: as the built-in AI acceleration technology of the Xeon® Scalable processors has evolved, OCR has been given a new "soul", with higher accuracy and further reduced response latency.

Not to mention general-purpose large models such as ChatGLM, as well as the industry-specific large-model applications delivered by software and solution providers such as Weining and Huizhou, all of which provide strong practical evidence of Xeon®'s strength in large-model inference, its cost advantage over accelerator chips, and the fact that it is easier to obtain, deploy, optimize, and use.

Take a look at our Most “In” AI section to refresh your knowledge.

On top of that, the whole AI pipeline involves heavy data processing.

AI applications in real business scenarios often rely on a knowledge base containing large amounts of data behind them.

These data are stored by compressing a massive text corpus into dense vectors, and the information most relevant to a query is found quickly through efficient similarity search; this is what is known as a vector database.

In this regard, the Intel® AVX-512 instruction set and Intel® AMX acceleration technology, optimized for vector and matrix computation, help meet the challenges of highly concurrent, real-time computation over massive, multi-dimensional vector data.
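To see what that workload actually is, here is a minimal sketch of the core kernel of a vector database: brute-force top-k similarity search over normalized embeddings. NumPy is used purely for illustration, with arbitrary corpus size and dimensionality; on Xeon® CPUs the matrix product typically lands in AVX-512-optimized BLAS code.

```python
# Minimal sketch: brute-force similarity search over dense vectors.
# Corpus size and dimensionality below are arbitrary placeholders.
import numpy as np

rng = np.random.default_rng(0)
corpus = rng.standard_normal((100_000, 768)).astype(np.float32)  # embeddings
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)          # normalize

def top_k(query: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k most similar corpus vectors."""
    q = query / np.linalg.norm(query)
    scores = corpus @ q                      # one big matrix-vector product
    idx = np.argpartition(scores, -k)[-k:]   # k best, unordered
    return idx[np.argsort(-scores[idx])]     # sort those k by score

print(top_k(rng.standard_normal(768).astype(np.float32)))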

Industry-renowned vector database developers such as Tencent Cloud and Transwarp have chosen the fifth-generation Intel® Xeon® Scalable processor as the underlying platform for hosting and acceleration.

Tencent Cloud VectorDB, in collaboration with Intel, improved the vector retrieval efficiency of its vector database by about 2.3x over the baseline after hardware and software optimization on the fifth-generation Xeon® platform, and improved performance by a further 5.8x or so in a test scenario using the Intel® AMX-accelerated INT8 data format.
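The INT8 data format mentioned above trades a little precision for much denser arithmetic of the kind AMX accelerates. The sketch below shows the general idea, symmetric INT8 quantization of embeddings with integer dot products; it illustrates the technique only and is not Tencent Cloud's actual implementation.

```python
# Simplified sketch of symmetric INT8 quantization for similarity search.
# Illustrates the general technique only, not Tencent Cloud's implementation.
import numpy as np

def quantize_int8(x: np.ndarray):
    """Map float vectors to int8 with one shared scale."""
    scale = np.abs(x).max() / 127.0
    return np.clip(np.round(x / scale), -127, 127).astype(np.int8), scale

corpus = np.random.default_rng(1).standard_normal((10_000, 768)).astype(np.float32)
q_corpus, s_corpus = quantize_int8(corpus)

query = np.random.default_rng(2).standard_normal(768).astype(np.float32)
q_query, s_query = quantize_int8(query)

# Integer dot products; rescale once at the end to recover float scores.
scores = (q_corpus.astype(np.int32) @ q_query.astype(np.int32)) * (s_corpus * s_query)
print(scores[:5])
```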

Based on the fifth-generation Xeon® Scalable processor, Transwarp has launched its Hippo distributed vector database solution, which achieves a generational performance improvement of about 2x and can effectively meet the storage and computation needs of massive, high-dimensional vectors in the era of large models.

The data-related aspects of the AI pipeline go beyond vector databases serving as external knowledge bases for large models. They also include data preprocessing before model training, data scheduling during training, continuous optimization and maintenance after a model goes live, and the detection and handling of abnormal data.

As we all know, data is one of the three pillars of AI, its blood and raw material; without quality data, even the most advanced algorithms and models are castles in the air. Raw data, however, is often messy and must go through a series of steps, such as cleaning, conversion, and feature engineering, before it can finally be used by AI systems.

These data processing tasks involve massive logical operations as well as equally massive, if not larger, volumes of memory operations such as access and transfer. They demand very high processing speed and low latency, and are therefore usually handled by the CPU, which sits closest to system memory and is best at general-purpose computing.
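As a concrete (and deliberately tiny) illustration of such CPU-bound work, the pandas sketch below runs a few typical cleaning and feature-engineering steps; the column names and rules are invented for the example.

```python
# Minimal sketch of CPU-side data cleaning and feature engineering.
# Column names and cleaning rules are hypothetical examples.
import pandas as pd

df = pd.DataFrame({
    "user_id": [1, 2, 2, 3, None],
    "amount":  [10.0, -5.0, -5.0, 300.0, 42.0],
    "country": ["CN", "US", "US", None, "DE"],
})

df = df.drop_duplicates()                      # cleaning: remove duplicates
df = df.dropna(subset=["user_id"])             # cleaning: require a key
df["amount"] = df["amount"].clip(lower=0)      # conversion: fix bad values
df["country"] = df["country"].fillna("UNK")    # imputation
df = pd.get_dummies(df, columns=["country"])   # feature engineering: one-hot

print(df.head())
```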

The fifth-generation Intel® Xeon® Scalable processors take these needs into account, with several built-in accelerators to support data processing, such as:

  • DSA (Data Streaming Accelerator): optimizes data copy and transformation operations to improve networking and storage performance.
  • IAA (In-Memory Analytics Accelerator): boosts analytics performance while offloading tasks from CPU cores, accelerating workloads such as database query throughput.
  • QAT (QuickAssist Technology): significantly accelerates data compression and symmetric/asymmetric encryption and decryption, improving CPU efficiency and overall system performance.
  • DLB (Dynamic Load Balancer): helps prevent performance bottlenecks and enables low-latency control-plane workloads.

These accelerators are flexibly configured or supported across different SKUs of the fifth-generation Xeon® Scalable processors and can be enabled on demand through Intel® On Demand, allowing them to match the needs of different workloads.

Last but not least, there is the protection of data privacy and of model and application security. After all, no AI scenario can come at the expense of security, and some application scenarios, such as finance and healthcare, are particularly sensitive to it.

For these industries, it is critical to be able to use CPU-based, hardware-level Trusted Execution Environment (TEE) technology to protect sensitive data and code from attack.

For example, Ping An Technology has used Intel® Software Guard Extensions (Intel® SGX) to build a federated learning solution.

Ping An Technology uses Intel® SGX's "enclave" memory regions to perform model training securely on local data without sharing the raw data. SGX also supports secure multi-party computation protocols, such as homomorphic encryption and secure aggregation, for better privacy protection in federated learning.
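To illustrate the idea behind secure aggregation, the toy sketch below uses the classic pairwise additive-masking construction: each client's update is hidden behind random masks that cancel out in the sum, so the server learns only the aggregate. This is a didactic simplification, not Ping An's SGX-based implementation.

```python
# Toy sketch of secure aggregation via pairwise additive masking.
# Didactic only: real systems derive masks from key agreement and
# handle dropouts; this is not Ping An's SGX implementation.
import numpy as np

rng = np.random.default_rng(7)
updates = [rng.standard_normal(4) for _ in range(3)]   # 3 clients' gradients

n = len(updates)
# Each pair (i, j), i < j, agrees on a random mask; i adds it, j subtracts it.
masks = {(i, j): rng.standard_normal(4)
         for i in range(n) for j in range(i + 1, n)}

masked = []
for i, u in enumerate(updates):
    m = u.copy()
    for (a, b), mask in masks.items():
        if a == i:
            m += mask
        elif b == i:
            m -= mask
    masked.append(m)   # what the server sees: individually meaningless

# Masks cancel in the sum, so the server recovers the exact aggregate.
print(np.allclose(sum(masked), sum(updates)))   # True
```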

Aliyun, for its part, introduced a BigDL-LLM privacy-preserving solution based on the latest fifth-generation Intel® Xeon® Scalable processors.

With the Intel® Trust Domain Extensions (Intel® TDX) technology built into the new processors, it achieves better protection for distributed nodes and AI pipelines. This enables customers to use more data in AI applications without sacrificing privacy, effectively unlocking the value of that data, building more efficient privacy-preserving machine learning solutions, and helping large models reach wider application.

It is worth noting that TEE-based federated learning, or privacy-preserving machine learning, is a strong foundation for AI to open up and share multi-organization data in large-scale practice in the future.

Through this technology, different organizations can share data and run joint analyses while data security and privacy are guaranteed, providing richer and more comprehensive data support for the continuous development and evolution of AI.

To accelerate the whole AI pipeline, the CPU cannot be the weak link

So, let’s widen our view from pure model acceleration to more comprehensive, multi-dimensional, pipelined AI platform applications. It is easy to foresee that as such platform-level applications mature and move into real-world deployment, our expectations of individual AI acceleration servers and of AI infrastructure at large will keep expanding and rising.

Focusing only on the AI model itself and on the performance of GPUs and dedicated accelerators will increasingly look like single-point thinking.

In the future, we must pay more attention to how the many hardware and software pieces of an AI platform are matched and work together. Within that platform, the CPU, serving as controller, accelerator, and assistant all at once, is crucial for shoring up the platform's weak spots and raising its overall quality.

This may be why high-end CPUs, represented by the fifth-generation Intel® Xeon® Scalable processor, still win a place in the AI server and infrastructure market amid today's technology wave.

After all, the role of a high-end CPU is not only to directly accelerate AI inference; it also lifts the overall performance of the whole AI platform or system, provides a more stable and secure operating environment, and expands the boundaries of AI. Only when every link is taken care of can the vision of AI Everywhere move further toward reality.

Or, in short: if AI is to truly move into more practical scenarios, how could it do without a more powerful, more reliable, more comprehensive, and more versatile CPU?

