BLOG

Evangelos Eleftheriou | CTO at AXELERA AI Technology is progressing at an incredible pace and no technology is moving faster than Artificial Intelligence (AI). Indeed, we are on the cusp of an AI revolution which is already reshaping our lives. One can use AI technologies to automate or augment humans, with applications including autonomous driving, advances in sensory perception and the acceleration of scientific discovery using machine learning. In the past five years, AI has become synonymous with Deep Learning (DL), another area seeing fast and dramatic progress. We are at a point where Deep Neural Networks (DNNs) for image and speech recognition can provide accuracy on par or even better than that achieved by the human brain.Most of the fundamental algorithmic developments around DL go back decades. However, the recent success has stemmed from the availability of large amounts of data and immense computing power for training neural networks. From around 2010, the exponential increase of single-precision floating point operations offered by Graphic Processing Units (GPUs) ran in parallel to the explosion of neural network sizes and computational requirements. Specifically, the amount of compute used in the largest AI training has doubled every 3.5 months during the last decade. At the same time, the size of state-of-the-art models increased from 26M weights for ResNet-50 to 1.5B for GPT-2. This phenomenal increase in model size is reflected directly in the cost of training such complex models. For example, the cost of training the bidirectional transformer network BERT, for Natural Language Processing applications, is estimated at $61,000, whereas training XLNet, which outperformed BERT, costs about nine times as much. However, a major concern is not only the cost associated with the substantial energy consumption needed to train complex networks but also the significant environmental impact incurred in the form of CO2 emissions.As the world looks to reduce carbon emissions, there is an even greater need for higher performance with lower power consumption. This is true not only for AI applications in the data center, but also at the Edge, which is where we expect the next revolution to take place. AI at the Edge refers to processing of data where it is collected, as opposed to requiring data to be moved to separate processing centers. There is a wealth of applications at the edge: AI for mobile devices, including authentication, speech recognition, and mixed/augmented reality, AI for embedded processing for IoT devices, including smart cities and homes or embedded processing for prosthetics, wearables, and personalized healthcare, as well as AI for real-time video analytics for autonomous navigation and control. However, these embedded applications are all energy and memory constrained, meaning energy efficiency matters even more so at the Edge. The end of Moore’s and Dennard’s laws are compounding these challenges. Thus, there are compelling motivations to explore novel computing architectures with inspiration from the most efficient computer on the planet, the human brain. Traditional Computing Systems: Current State of PlayTraditional digital computing systems, based on the von Neumann architecture, consist of separate processing and memory units. Therefore, performing computations typically results in a significant amount of data being moved back and forth between the physically separated memory and processing units. This data movement costs latency and energy and creates an inherent performance bottleneck. The latency associated with the growing disparity between the speed of memory and processing units, commonly known as the memory wall, is one example of a crucial performance bottleneck for a variety of AI workloads. Similarly, the energy cost associated with shuttling data represents another key challenge for computing systems that are severely power limited due to cooling constraints as well as for the plethora of battery-operated mobile devices. In general, the energy cost of multiplying two numbers is orders of magnitude lower than that of accessing numbers from memory. Therefore, it is clear to AI developers that there is a need to explore novel computing architectures that provide better collocation of processing and memory subsystems. One suggested concept in this area is near-memory computing, which aims to reduce the physical distance and time needed to access memory. This approach heavily leverages recent advances made in die stacking and new technologies such as the high memory cube (HMC) and high bandwidth memory (HBM). In-Memory Computing: A Radical New ApproachIn-memory computing is a radically different approach to data processing, in which certain computational tasks are performed in place in the memory itself (Sebastian 2020). This is achieved by organizing the memory as a crossbar array and by exploiting the physical attributes of the memory devices. The peripheral circuitry and the control logic play a key role in creating what we call an in-memory computing (IMC) unit or computational memory unit (CMU). In addition to overcoming the latency and energy issues associated with data movement, in-memory computing has the potential to significantly improve the computational time complexity associated with certain computational tasks. This is primarily a result of the massive parallelism created by a dense array of millions of memory devices simultaneously performing computations.For instance, crossbar arrays of such memory devices can be used to store a matrix and perform matrix-vector multiplications (MVMs) at constant O(1) time complexity without intermediate movement of data. The efficient matrix-vector multiplication via in-memory computing is very attractive for training and inference of deep neural networks, particularly for inference applications at the Edge where high energy efficiency is critical. In fact, matrix-vector multiplications constitute 70-90% of all deep learning operations. Thus, applications requiring numerous AI components such as computer vision, natural language processing, reasoning and autonomous driving can explore this new technology in new and innovative ways. Novel dedicated hardware with massive on-chip memory, where part of it is enhanced with in-memory computation capabilities could lead to very efficient training and inference engines of ultra-large neural networks comprising of potentially billions of synaptic weights.The core technology of IMC is memory. In general, there are two classes of memory devices. The conventional one, in which information is stored in the presence or absence of charge, includes dynamic random-access memory (DRAM), static random-access memory (SRAM) and Flash memory. There is also an emerging class of memory devices, in which information is stored in terms of the atomic arrangements within nanoscale volumes of materials, as opposed to charge on a capacitor. Generally speaking, one atomic configuration corresponds to one logic state, and the other corresponds to another logic state. These differences in atomic configuration manifest as a change in resistance, and thus these devices are collectively called resistive memory devices or memristors. Traditional and emerging memory technologies can perform a range of in-memory logic and arithmetic operations. In addition, SRAM, Flash and all memristive memories can also be used for MVM operations.The most important characteristics of a memory device are its read and write times, that is how fast a device can store and retrieve information. Equally important characteristics are the cycling endurance, which refers to the number of times a memory device can be switched from one state to the other, the energy required to store information in a memory cell as well as the size of the memory cell. Table 1 -compares the traditional DRAM, SRAM and NOR Flash with the most popular emerging resistive-memory technologies, such as spin-transfer torque RAM (STT-RAM), phase-change memory (PCM) and resistive RAM (ReRAM).Table 1 – Comparing different memory technologies. Sources:(B. Li 2019), (Marinella 2013)

Fabrizio Del Maffeo | CEO at AXELERA AI Professor Luca Benini is one of the foremost authorities on computer architecture, embedded systems, digital integrated circuits, and machine learning hardware. We’re honored to count him as one of our scientific advisors. Prof. Benini kindly agreed to answer a few questions for our followers on his research and the future of artificial intelligence. For our readers who are unfamiliar with your work, can you give us a brief summary of your career?I am the chair of Digital Circuits and Systems at ETHZ, and I am a full professor at the Università di Bologna. I received a PhD from Stanford University, and I have been a visiting professor at Stanford University, IMEC, EPFL. I also served as chief architect at STMicroelectronics France.My research interests are in energy-efficient parallel computing systems, smart sensing micro-systems and machine learning hardware. I’ve published more than 1.000 peer-reviewed papers and five books.I am a Fellow of the IEEE, of the ACM and a member of the Academia Europaea. I’m the recipient of the 2016 IEEE CAS Mac Van Valkenburg Award, the 2019 IEEE TCAD Donald O. Pederson Best Paper Award, and the ACM/IEEE A. Richard Newton Award 2020. Which research subjects are you exploring?I am extremely interested in energy-efficient hardware for machine learning and data-intensive computing. More specifically, I am passionate about exploring the trade-off between efficiency and flexibility. While everybody is aware of the fact that you can enormously boost efficiency with super-specialization, a super-specialized architecture will be narrow and short-lived, so we need flexibility. Artificial Intelligence requires a new computing paradigm and new data-driven architectures with high parallelisation. Can you share with us what you think the most promising directions are and what kind of new applications they can unleash?I believe that the most impactful innovations are those that improve efficiency without over-specialization. For instance, using low bit-width representations reduces energy, but you need to have “transprecision,” i.e., the capability to dynamically adjust numerical precision. Otherwise, you won’t be accurate enough on many inference/training tasks, and then your scope of application may narrow down too much.Another high-impact direction is related to minimising switching activity across the board. For instance, systolic arrays are very scalable (local communication patterns) but have huge switching activity related to local register storage. In-memory computing cores can do better than systolic arrays, but they are not a panacea. In general, we need to design architectures where we reduce the cost related to moving data in time and space. Can you share more with us about the tradeoffs and benefits of analog computing versus digital computing and where they can work together?Analog computing is a niche, but a very important one. Ultimately, we can implement multiply-accumulate arrays very efficiently with analog computation, possibly beating digital logic, but it’s a tough fight. You need to do everything right (from interface and core computation circuits to precision selection to size).The critical point is to design the analog computing arrays in a way that can be easily ported to different technology targets without complete manual redesign. I view an analog computing core as a large-scale “special function unit” that needs to be efficiently interfaced with a digital architecture. So, it’s a “digital on top” design, with some key analog cores, that can win.Our sector has a prevailing opinion that Moore’s Law is dead. Do you agree, and how can we increase computing density?The “traditional” Moore’s Law is dead, but scaling is fully alive and kicking through a number of different technologies — 2.5D, 3D die stacking, monolithic 3D, heterogeneous 3D, new electron devices, optical devices, quantum devices and more. This used to be called “More-than-Moore,” but I think it’s now really the cornerstone of scaling compute density – the ultimate goal. You are a very important contributor to the RISC-V community with your PULP platform, widely used in research and commercial applications. Why and when did you start the project, and how do you see it evolving in the next ten years?I started PULP because I was convinced that the traditional closed-source computing IP market, and even more proprietary ISAs, were stifling innovation in many ways. I wanted to create a new innovation ecosystem where research could be more impactful and startups could more easily be created and succeed. I think I was right. Now the avalanche is in motion. I am sure that the open hardware and open ISA revolution will continue in the next ten years and change the business ecosystem, starting from more fragmented markets (e.g., IoT, Industrial) and then percolating to more consolidated markets (mobile, cloud). Can Europe play a leading role in the worldwide RISC-V community?The EU can play a leading role. All the leading EU companies in the semiconductor business are actively exploring RISC-V, not just startups and academia. Of course, adoption will come in waves, but I think that some of the markets where the EU has strong leadership (automotive, IoT) are ripe for RISC-V solutions — as opposed to markets where the USA and Asia lead, such as mobile phones and servers which are much more consolidated. There is huge potential for the European industry in leveraging RISC-V. What is the position of European universities and research centres versus American and Chinese in computing technologies – is there a gap, and how can the public sector help?There is a gap, but it’s not quality; it’s in quantity. The number of researchers in computer architecture, VLSI, analog and digital circuits and systems in the EU is small in relation to USA and Asia. Unfortunately, these “demographic factors” take time to change. So really, the challenge is on academics to increase the throughput. Industry can play a role, too – for instance, leading companies can help found “innovation hubs” across Europe to increase our research footprint.Companies can also help make Europe more attractive for jobs. Now that smart remote working is mainstream, people are not forced to move elsewhere. Good students in — for example — Italian or Spanish universities interested in semiconductors can find great jobs without moving. I am not saying that moving is bad, but if there are choices that do not imply moving away, more people will be attracted to these semiconductor companies and roles. Is the European Chips Act powerful enough to change the trajectory of Europe within the global semiconductor ecosystem?It helps, but it’s not enough. There is no way to pump enough public money to make an EU behemoth at the scale of TSMC. But, if this money is well spent, it can “change the derivative” and create the conditions for much faster growth. Over the last decade, European semiconductor companies didn’t bring any cutting-edge computing technology to market. Is this changing, and do you think European startups can play a role in this change?I think that some large EU companies are, by nature, “competitive followers,” so disruptive innovation is not their preferred approach, even though of course there are exceptions. The movement will come from startups, if they can attract the growth and funding of the larger companies. The emergence of a few European unicorns, as opposed to many small startups that just survive, will help Europe strengthen its position in the semiconductor market.

Interview with Torsten Hoefler, Axelera AI’s Scientific Advisor

Transformers in Computer Vision

An Interview with Marian Verhelst, Axelera AI’s Scientific Advisor

What’s Next for Data Processing? A Closer Look at In-Memory Computing

Ten questions with Axelera AI’s Scientific Advisor Luca Benini

Filter by Type

BLOG

Interview with Torsten Hoefler, Axelera AI’s Scientific Advisor

Transformers in Computer Vision

An Interview with Marian Verhelst, Axelera AI’s Scientific Advisor

What’s Next for Data Processing? A Closer Look at In-Memory Computing

Ten questions with Axelera AI’s Scientific Advisor Luca Benini

Filter by Type

Sign up

Log in, or create an Axelera AI account

Login to the community

Log in, or create an Axelera AI account

Scanning file for viruses.

This file cannot be downloaded