As artificial intelligence moves into real-world deployment, AI chip startups are gradually bringing their designs to market as products. This article introduces the better-known AI chip startups outside China and compares them with their Chinese counterparts.

Cloud chips
Cloud AI chips perform AI-related computation on the server side. The current generation of AI is built on deep neural networks. A deep neural network is first trained on a large amount of data; once training is complete, the model performs inference on new input data.

In cloud data centers, both training and inference need AI chips for acceleration. Training a neural network involves an enormous amount of computation on large datasets. In fact, it was the GPU's ability to accelerate neural networks that made deep networks popular; without hardware-accelerated training, the current wave of AI would probably not have taken off. For companies that hold large amounts of data, training compute is effectively part of their productive capacity: how quickly they can train new models and iterate on existing ones directly determines whether AI can be deployed to emerging areas and whether model performance can be improved fast enough to meet user needs. This is why Internet giants such as Google have developed their own TPUs to speed up training. At the same time, in areas where training requires massive compute, such as natural language processing and speech recognition, training compute has become an important barrier to entry.

Beyond training, inference in the cloud also requires a great deal of compute. As neural network models are deployed in more and more services (speech recognition, image segmentation and recognition, recommendation systems, and so on) and more and more users access those services, inference also needs dedicated acceleration.

Currently, GPUs are the mainstream accelerators for cloud training and inference. However, the growth of GPU compute cannot keep up with enterprise users' growing demand for neural network compute, and the high power consumption of GPUs is a significant challenge for cloud data centers. Cloud data centers have therefore long been waiting for a new generation of accelerator chips.

Among AI chip startups outside China, a considerable proportion focus on cloud applications. There, the capital market recognizes AI chips for cloud servers as an important market, and investors are willing to commit large sums to help strong teams build such chips.

Habana is a well-known AI chip startup focused on cloud servers, with chips covering both training and inference. Its Goya series targets the inference market and is reported to reach a ResNet-50 throughput of 15,453 images per second, while its Gaudi chip targets the training market. Gaudi's highlight is efficient support for RDMA, and it is reported to train ResNet-50 at 1,650 images per second.

This also shows the difference between training chips and inference chips. A training chip must be designed for scalability, that is, whether it can achieve a nearly linear speedup when deployed at large scale. So besides the compute units, the network communication part (including its software interface) is also critical. An inference chip is more straightforward: it mainly needs to handle computation and memory access well.
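To make the scalability point concrete, here is a minimal Python sketch of a toy scaling model. It does not describe Gaudi or any real system. It assumes the batch is split evenly across chips, gradients are combined with a ring all-reduce whose cost is only its bandwidth term, and communication is not overlapped with compute. The function names and all numbers are illustrative assumptions: 1 second of single-chip compute per step, about 100 MB of gradients (roughly what ResNet-50's approximately 25.6 million fp32 parameters occupy), and two hypothetical link speeds.

```python
def allreduce_seconds(n_chips, grad_bytes, link_bytes_per_s):
    """Bandwidth term of a ring all-reduce: each chip sends and receives
    about 2 * (n - 1) / n times the gradient size."""
    if n_chips == 1:
        return 0.0  # nothing to exchange with a single chip
    return 2 * (n_chips - 1) / n_chips * grad_bytes / link_bytes_per_s


def speedup(n_chips, compute_s, grad_bytes, link_bytes_per_s):
    """Speedup over one chip for a fixed total batch per step."""
    comm_s = allreduce_seconds(n_chips, grad_bytes, link_bytes_per_s)
    step_s = compute_s / n_chips + comm_s
    return compute_s / step_s


GRAD_BYTES = 100e6   # assumed: ~25.6M fp32 parameters
COMPUTE_S = 1.0      # assumed: single-chip compute time per step
FAST_LINK = 12.5e9   # assumed: 100 Gb/s link, in bytes per second
SLOW_LINK = 1.25e9   # assumed: 10 Gb/s link, in bytes per second

for n in (1, 2, 4, 8):
    fast = speedup(n, COMPUTE_S, GRAD_BYTES, FAST_LINK)
    slow = speedup(n, COMPUTE_S, GRAD_BYTES, SLOW_LINK)
    print(f"{n} chips: {fast:.2f}x on the fast link, {slow:.2f}x on the slow link")
```

In this toy model the faster link keeps the 8-chip speedup much closer to linear than the slower one, which is why the interconnect matters as much as the arithmetic units on a training chip.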

Graphcore is another established AI chip company whose main products also cover cloud training and inference. Unlike Habana, which uses separate chips with specialized architectures for training and for inference, Graphcore uses the same IPU for both, following a many-core design. Each IPU has more than a thousand cores and up to 300 MB of on-chip memory for executing computational graphs, so the only difference between inference and training is the graph being executed. With that much on-chip memory, even training, which accesses memory very frequently, is well served. From the inference perspective, however, so much on-chip memory looks like overkill, and putting that much memory on the die hurts chip yield and significantly increases cost. Another challenge is writing a compiler that targets so many cores. Graphcore recently began deployment in Microsoft's Azure cloud services. According to reports, using Graphcore IPUs to train BERT, the most popular natural language processing model, saves 20% in power consumption while achieving leading performance, and, relative to GPUs, can achieve 3 times the throughput.

Groq was founded by former members of Google's TPU team. Based on the information available so far, Groq mainly targets the cloud inference market. At a launch event a few days ago, Groq unveiled its chip architecture, which is said to deliver 1 POP/s (one peta-operation per second) of compute. According to Groq, the architecture is called Tensor Stream. Its hardware is relatively simple, with all non-essential control logic removed, and all control is handled by the software compiler. The chip area saved this way is given to compute units, so a higher compute density per unit area can be achieved.

In fact, handing everything over to the compiler was attempted well over a decade ago, in what is called the Very Long Instruction Word (VLIW) architecture. Intel's Itanium series chips used a VLIW-style architecture but did not succeed, because a VLIW compiler for general-purpose computing is too complicated. In fields such as DSP, by contrast, computation is relatively regular, and VLIW has found many applications. The computation graphs of today's neural network models are indeed much more regular than general-purpose computation, so handing all scheduling to the compiler may well make sense. Interestingly, though, most current AI inference chips already have only a simple on-chip control unit. In other words, the compiler-controlled approach that Groq proposes with Tensor Stream is already routine practice, so we still have to wait and see in which specific respects Groq can surpass other inference chips.
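The idea of letting the compiler decide everything ahead of time can be sketched with a tiny static scheduler in Python. This is a generic greedy list scheduler written for illustration, not Groq's compiler or any real VLIW toolchain. It assumes every operation takes one cycle, any functional unit can run any operation, and the dependency graph is fully known before execution. The names `schedule`, `GRAPH`, and the operation labels are hypothetical.

```python
def schedule(deps, width):
    """Pack operations into per-cycle bundles of at most `width` operations.

    deps maps each operation name to the set of operations it depends on.
    An operation may be issued only after all of its dependencies have
    been issued in an earlier bundle.
    """
    remaining = {op: set(d) for op, d in deps.items()}
    done = set()
    bundles = []
    while remaining:
        # Operations whose inputs are all available, in a fixed order.
        ready = sorted(op for op, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError("dependency cycle or unknown dependency")
        bundle = ready[:width]
        bundles.append(bundle)
        done.update(bundle)
        for op in bundle:
            del remaining[op]
    return bundles


# Hypothetical graph for y = relu(w1 * x1 + w2 * x2).
GRAPH = {
    "mul1": set(),
    "mul2": set(),
    "add": {"mul1", "mul2"},
    "relu": {"add"},
}

for width in (1, 2):
    print(width, "unit(s):", schedule(GRAPH, width))
```

Because the graph is regular and known in advance, the whole issue order is fixed at compile time and the hardware needs no dynamic scheduling logic. With two units the two multiplications share a cycle, so the graph finishes in three cycles instead of four.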

Unlike the cloud chips discussed above, Mythic uses an architecture based on in-memory computing. Mythic is a startup that spun out of the University of Michigan, and its technology builds on the in-memory computing results of a University of Michigan research group. An important problem in the cloud today is data movement: when a neural network model is large, model data must be moved back and forth constantly between off-chip DRAM and the compute chip, and in many cases this transfer becomes the bottleneck that limits performance. Mythic's approach is to perform the computation directly inside flash memory, which avoids the bottleneck caused by data movement. For the computation itself, Mythic uses analog computing: the input data is converted into voltages, the weights correspond to variable resistors in the flash memory, and after the computation an analog-to-digital converter turns the result into a digital signal for further processing. The advantage is that the energy efficiency of analog computation is much higher than that of digital computation; the drawback is that high-precision computation is hard to achieve, and the conversion between analog and digital may become the bottleneck for overall efficiency.
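The analog scheme described above can be illustrated with a small numerical sketch in Python. It is an idealized model, not Mythic's design: inputs are treated as voltages, weights as conductances, each output is the summed current on one line, and an ideal signed ADC rounds that sum to a limited number of bits. Signed weights are used directly for simplicity, although a physical cell has positive conductance. The names `adc`, `analog_matvec`, `WEIGHTS`, and `INPUTS`, and all values, are hypothetical.

```python
def adc(value, full_scale, bits):
    """Ideal signed ADC: round to the nearest step and clip to the code range."""
    step = full_scale / 2 ** (bits - 1)
    code = round(value / step)
    code = max(-(2 ** (bits - 1)), min(2 ** (bits - 1) - 1, code))
    return code * step


def analog_matvec(weights, inputs, adc_bits, full_scale):
    """Matrix-vector product where each row's sum is read through the ADC."""
    outputs = []
    for row in weights:
        # Ohm's law gives each cell's current; the currents add on the wire.
        current = sum(g * v for g, v in zip(row, inputs))
        outputs.append(adc(current, full_scale, adc_bits))
    return outputs


WEIGHTS = [[0.5, -0.25], [0.1, 0.3]]  # assumed conductances
INPUTS = [1.0, 0.5]                   # assumed input voltages

exact = [sum(g * v for g, v in zip(row, INPUTS)) for row in WEIGHTS]
print("exact:", exact)
for bits in (3, 8):
    print(bits, "bit ADC:", analog_matvec(WEIGHTS, INPUTS, bits, 1.0))
```

The multiply-accumulate itself costs no data movement in this picture, but every result passes through the ADC, so the converter's resolution bounds the accuracy and its cost is paid on every output.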

Terminal chips

Unlike cloud computing, terminal (edge-device) computing emphasizes ultra-low power consumption and energy efficiency.

Syntiant is a chip company focused on ultra-low-power neural network acceleration. Its main target market at present is always-on wake-up and monitoring in smart speakers. The average power consumption of Syntiant's NDP100 chip is claimed to be 150 µW, so it can fit into a wide range of IoT devices and greatly extend their battery life.
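A quick back-of-the-envelope calculation in Python shows what a 150 µW average draw means for battery life. The battery is an assumption for illustration only: a small 3 V coin cell with a nominal capacity of 225 mAh. The estimate ignores self-discharge, voltage droop, and every other component in the device, and the function name `battery_life_hours` is hypothetical.

```python
def battery_life_hours(capacity_mah, voltage_v, load_uw):
    """Hours a battery lasts at a constant load, ignoring all losses."""
    energy_wh = capacity_mah / 1000 * voltage_v  # mAh * V -> Wh
    load_w = load_uw * 1e-6                      # microwatts -> watts
    return energy_wh / load_w


hours = battery_life_hours(capacity_mah=225, voltage_v=3.0, load_uw=150)
print(f"{hours:.0f} hours, about {hours / 24:.0f} days")
```

Under these assumptions the chip alone could run for roughly 4,500 hours, or about half a year, on a single coin cell, which is what makes always-on listening plausible in battery-powered devices.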

Greenwaves is a French startup that also works on terminal AI chips. Its chip is a high-performance MCU based on the RISC-V architecture, with added support for AI computing (through instruction-set extensions and a dedicated AI accelerator). Greenwaves' architecture derives from PULP (Parallel Ultra Low Power), an open-source project of ETH Zurich and the University of Bologna that builds on RISC-V and applies extensive optimizations for low-power parallel computing; Greenwaves is commercializing it. Greenwaves' first-generation chip, GAP8, has been officially launched, and the company received investment from Huami earlier this year.

Comparison with Chinese AI chip startups

We can see that among the well-known AI chip startups outside China, there are more cloud chip companies than terminal chip companies. Among Chinese AI chip startups, by contrast, many target the terminal market, few have a presence in cloud chips (only Enflame, Cambricon, and a handful of others), and even fewer build chips that accelerate cloud training.

As for the reasons, we believe one is a difference in capital preferences. Outside China, and especially in Silicon Valley, semiconductor investing has decades of history, so investors are willing to place big bets on companies with strong teams and high technical barriers to entry. Cloud chips, and cloud training chips in particular, are exactly this kind of venture. A cloud training chip must use the latest semiconductor process and advanced packaging technology, and it also needs a strong supporting software stack, so a startup often needs tens or even hundreds of millions of dollars in financing to complete the product. Once such a company gains a foothold, however, it is hard for other competitors to enter the market. A cloud chip company of this kind can hardly succeed without capital support. In China, projects that require massive investment, such as cloud chips, are often undertaken by large companies rather than startups; for example, Huawei and Alibaba have achieved world-leading results in cloud chips.

Terminal chips are the opposite case. Most AI chip startups we see in China build terminal chips. Such chips have a lower design threshold than cloud chips, and given the global distribution of the semiconductor industry, China is also closer to the design and production of terminal products such as mobile phones and smart speakers. This explains why there are so many terminal AI chip startups in China.

Looking ahead, we believe that China's terminal and cloud AI chips will both see considerable development and stand at the forefront of world technology.
