Learning from Nature Again: Neuromorphic Computing and Deep Learning
Jeongkyu Shin
Founder / Researcher / CEO
Jun 27, 2024
Culture
This article originally appeared in Engineers and Scientists for Change, May 2022.
The field of artificial neural networks has been drawing serious attention for nearly a decade now. In that short time, advances in deep learning have let the field solve countless problems at an astonishing pace, and it is now considered the most promising approach to achieving artificial intelligence.
At the cutting edge, news about hyperscale deep learning models and their implementations has been garnering attention. In April 2022, news about NVIDIA's new H100 GPU flooded the headlines. AMD's high-performance computing GPUs such as the MI series, along with Intel's new Ponte Vecchio GPU, are expected to dramatically accelerate AI workloads (and blockchain mining), creating a new battleground for hyperscale AI.
Amidst the buzz around hyperscale AI, there was one piece of news that didn't receive much public interest: Intel's announcement of the Loihi 2 chip in October last year[1]. This news comes with a fascinating history and technical background. While AI training and service acceleration chips are proliferating, let's explore the science behind this intriguing tech news.
Can we code intelligence with deep learning?
Since 2013, when deep learning took off thanks to GPU-accelerated matrix computation, the field has been exploring the possibilities opened up by sheer scale of computation. Starting with the AlphaGo shock in 2016, deep learning gradually expanded beyond research into practical applications. The transformer architecture[2], proposed in 2017 and widely adopted from 2018, introduced the concepts of attention and self-attention, greatly improving how deep learning models build their own internal memory structures. Transformers have since been used in a wide range of deep learning models, excelling especially in the language and image processing domains, where data is abundant, and have enabled deep learning models to solve problems that previously seemed intractable.
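To give a feel for what self-attention computes, below is a minimal NumPy sketch of scaled dot-product attention, the core operation of the transformer; the sequence length, dimensions, and random weights are purely illustrative and do not correspond to any real model.

```python
# A minimal sketch of scaled dot-product self-attention, the core operation of
# the transformer architecture [2]. Sizes and random weights are illustrative.
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model) token embeddings; w_*: learned projection matrices."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v          # queries, keys, values
    scores = q @ k.T / np.sqrt(k.shape[-1])      # how strongly tokens attend to each other
    scores -= scores.max(axis=-1, keepdims=True) # numerical stability for softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                           # each output mixes values by attention

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8                          # hypothetical toy sizes
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)    # (4, 8)
```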
This seemingly omnipotent model has led the trend of scaling up deep learning models since 2018. The size of a deep learning model is determined by the number of its parameters, which are the connection information between the perceptrons that make up the model, corresponding to the synaptic connections of actual neurons. More connections allow the deep learning model to differentiate and judge more complex inputs. As deep learning models become more complex and massive, the number of parameters grows exponentially. Until 2019, the number of parameters in deep learning models increased roughly 3 to 5 times each year, but since 2019, it has been increasing more than tenfold annually. The massive deep learning models that have emerged in the past 2-3 years are sometimes referred to as "hyperscale AI". Well-known hyperscale deep learning models in the language processing domain include OpenAI's GPT-3 and Google's LaMDA. For these huge models, such as GPT-3, the system cost for training the model (the cost of a single training run on the cloud without purchasing equipment) is estimated to be at least around 5 billion KRW (approximately 4 million USD)[3].
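As a rough illustration of how parameter counts arise from connections, a fully connected layer with n inputs and m outputs has n × m weights plus m biases; the layer sizes in the sketch below are hypothetical and chosen only to show how widening a network multiplies the count.

```python
# Counting parameters (connection weights plus biases) of fully connected layers.
# The layer sizes are hypothetical, not those of any real model.
def dense_params(n_in, n_out):
    return n_in * n_out + n_out    # weight matrix plus one bias per output unit

narrow = sum(dense_params(a, b) for a, b in [(784, 256), (256, 256), (256, 10)])
wide   = sum(dense_params(a, b) for a, b in [(784, 4096), (4096, 4096), (4096, 10)])
print(narrow)   # ~270 thousand parameters
print(wide)     # ~20 million parameters: widening every layer multiplies the count
```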
Hyperscale models are solving problems that were previously difficult or impossible to solve. They discover new gravitational lenses[4] and unwarp the distortions caused by lenses[5] to unravel the mysteries of the universe. They predict protein folding structures in much shorter times and at lower costs than previous methods[6] and discover new drugs[7]. They even solve problems like StarCraft II strategic simulations[8], which require understanding flows over long periods of time.
As these models solve various problems, naturally, some questions arise. Is this approach of pouring in massive amounts of resources to create deep learning models sustainable? And can we code "intelligence" using this method?
To answer these two questions, let's quickly understand deep neural networks and today's topic, neuromorphic computing.
Deep Neural Networks: Origins and Differences
Deep learning is actually shorthand. The full term is deep neural network (DNN), or, more elaborately, an artificial neural network with deep layers. Artificial neural network theory has its roots in mathematically imitating the electrical properties of neurons: it began by modeling those properties together with plasticity[e1], the strengthening or weakening of connections between neurons as they process information, and then simplifying the result. An artificial neural network is a mathematical model consisting of perceptrons[9], an extreme simplification of how a neuron activates based on its connections to other neurons; an activation function, which reduces a neuron's firing process to a function of its input signal with the time dependency removed; and weights, the parameters representing the strength of the connections between neurons.
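To make this concrete, here is a minimal sketch of a single perceptron-style unit: a weighted sum of inputs passed through an activation function, with no time dependence. The sigmoid activation and the particular numbers are only an example.

```python
# A single perceptron-style unit: weighted sum of inputs plus a bias, passed
# through an activation function. Weights and the sigmoid choice are illustrative.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))     # a bounded activation, loosely analogous
                                        # to a neuron's limited firing rate

def perceptron(x, w, b):
    return sigmoid(np.dot(w, x) + b)    # output is computed instantaneously

x = np.array([0.5, -1.2, 3.0])          # input signals
w = np.array([0.8,  0.1, -0.4])         # connection strengths ("synaptic weights")
b = 0.2                                 # bias, playing the role of a threshold
print(perceptron(x, w, b))
```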
Although artificial neural network theory has its roots in the characteristics of real neural networks, there is a fundamental difference: the presence or absence of dynamics, the behavior of the system over time. In real neural networks, outcomes are determined by the dynamics between neurons. Neurons respond to external stimulation with their own dynamic characteristics, and they exhibit plasticity, physically strengthening or weakening accordingly. For example, neurons that are repeatedly used together when making a certain decision become active at similar times in response to an input signal, and the axons connecting neurons that are active at "temporally" similar times can be observed to grow physically thicker. In contrast, typical artificial neural networks simulate plasticity with backpropagation instead of dynamics. Backpropagation is a way of simplifying the calculation that strengthens the weights of the connections between perceptrons that contributed to a correct decision. In an artificial neural network, input information is processed instantaneously through the weights between perceptrons; because the path from input to output is not computed as a function of time, there is no dynamic element.
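As a contrast with time-dependent plasticity, here is a toy backpropagation-style update for a single sigmoid unit with a squared-error loss: the weight change depends only on the error of one instantaneous forward pass, with no notion of time. All values are illustrative.

```python
# One backpropagation-style weight update for a single sigmoid unit with a
# squared-error loss. The input, weights, and learning rate are illustrative.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])          # input signal
w = np.array([0.8,  0.1, -0.4])         # connection weights
b, target, lr = 0.2, 1.0, 0.1           # bias, desired output, learning rate

y = sigmoid(np.dot(w, x) + b)           # instantaneous forward pass
delta = (y - target) * y * (1.0 - y)    # chain rule: dLoss/d(pre-activation)
w -= lr * delta * x                     # strengthen or weaken each connection
b -= lr * delta                         # based solely on this one error signal
print(w, b)
```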
There are various other differences besides dynamics. They are usually the result of introducing assumptions that are impossible in biological neural networks in order to overcome the limitations artificial neural network theory faced in the 1990s. One example is the use of ReLU[e2] as an activation function. Real neurons have firing thresholds and output limits; infinite activation values are physically impossible. Early mathematical models therefore used activation functions that captured these thresholds and saturation limits well. However, as artificial neural networks grew deeper, researchers found that training stopped progressing.[e3] The ReLU activation function, although physically implausible, can produce unboundedly large activation values mathematically[e4]. Introducing ReLU into deep artificial neural networks made training possible again, and the difference from biological neural networks grew.
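The effect behind the note on vanishing gradients[e3] is easy to see numerically: backpropagation multiplies one derivative per layer, and the sigmoid derivative is at most 0.25, so the product shrinks toward zero with depth, while the ReLU derivative stays at 1 on its active side. The following sketch uses an arbitrary depth and input value purely for illustration.

```python
# Why deep networks with saturating activations stop learning (vanishing gradients),
# and why ReLU avoids this on its active side. Depth and input value are arbitrary.
import numpy as np

def sigmoid_grad(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)              # at most 0.25, smaller when the unit saturates

def relu_grad(z):
    return 1.0 if z > 0 else 0.0      # exactly 1 whenever the unit is active

depth, z = 30, 2.0                    # 30 layers, each moderately activated
print(np.prod([sigmoid_grad(z)] * depth))  # ~4e-30: the gradient effectively vanishes
print(np.prod([relu_grad(z)] * depth))     # 1.0: the gradient survives intact
```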
Artificial neural networks that do not need to consider dynamics can be transformed into a sequence of matrix operations, enabling incredibly fast computation. However, they have become very different from the neural networks seen in biology. So, are deep learning models and the neurological processes occurring in our brains now on completely different foundations?
Learning from Nature Again: Dynamics of Neural Networks
Individual neurons communicate signals in various ways. Some are electrical signals, and some are chemical signals. The electrical signal characteristics within single neurons were interpreted and formulated very early[10] and became the theoretical basis for perceptrons. The problem was that the formula was too complex to calculate dynamics without simplification. Later, various mathematical models were proposed to approximate the electrical responses over time with reduced computational burden, and various single neuron simulators based on these models have been released. A representative simulator is NEURON[11].
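To give a sense of what such single-neuron dynamics look like, here is a sketch of a leaky integrate-and-fire neuron, one of the simplest reduced models of a spiking cell and far simpler than the conductance-based models that simulators like NEURON solve; all constants are illustrative, not fit to any real cell.

```python
# A leaky integrate-and-fire (LIF) neuron: a heavily reduced stand-in for the
# detailed conductance models used in simulators such as NEURON.
# All constants are illustrative, not fit to any real cell.
dt, t_max = 0.1, 100.0                       # time step and duration (ms)
tau_m, v_rest, v_reset, v_th = 10.0, -65.0, -70.0, -50.0   # membrane constants (ms, mV)
r_m, i_ext = 10.0, 2.0                       # membrane resistance (MOhm), input current (nA)

v, spike_times = v_rest, []
for step in range(int(t_max / dt)):
    # dV/dt = (-(V - V_rest) + R * I) / tau_m: the membrane potential evolves in time
    v += dt * (-(v - v_rest) + r_m * i_ext) / tau_m
    if v >= v_th:                            # threshold crossing: the neuron spikes
        spike_times.append(step * dt)
        v = v_reset                          # and its potential resets
print(len(spike_times), "spikes; first at", spike_times[0] if spike_times else None, "ms")
```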
As mentioned earlier, simulating dynamics requires an enormous amount of computation. Meanwhile, we have entered an era of abundant computational power. What would happen if we used that overflowing computational power to connect many of these single-neuron simulations?
There are two approaches, one algorithmic and one hardware-based, to overcoming the enormous computational cost and building dynamics-based artificial neural networks. The algorithmic approach is the spiking neural network (SNN), which introduces the spike-based plasticity found in real neurons to build a dynamics-based artificial neural network. The hardware approach is neuromorphic computing, which has been pursued in earnest since 2012: implementing an artificial neural network by building physical devices that correspond to neurons. General-purpose computers are still too slow for the enormous amount of computation involved in simulating dynamics, so the idea is to create dedicated devices that either realize the mathematical properties of neurons at the circuit level or hardcode the computations. Recently the two terms have begun to merge: implementing an SNN at the device level is now simply called neuromorphic computing. Both approaches attempt to simulate the dynamic characteristics that traditional artificial neural network theory left out, in the hope of discovering new phenomena or new possibilities for deep learning.
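As one concrete example of the spike-based plasticity mentioned above, here is a sketch of pair-based spike-timing-dependent plasticity (STDP): a synapse is strengthened when the presynaptic spike precedes the postsynaptic spike and weakened otherwise. The amplitudes and time constants are illustrative.

```python
# Pair-based spike-timing-dependent plasticity (STDP), a spike-based plasticity
# rule commonly used in SNNs. Amplitudes and time constants are illustrative.
import numpy as np

a_plus, a_minus = 0.01, 0.012      # potentiation / depression amplitudes
tau_plus, tau_minus = 20.0, 20.0   # decay time constants (ms)

def stdp_dw(t_pre, t_post):
    """Synaptic weight change for a single pre/post spike pair (times in ms)."""
    dt = t_post - t_pre
    if dt > 0:   # presynaptic spike came first: strengthen the connection
        return a_plus * np.exp(-dt / tau_plus)
    else:        # postsynaptic spike came first: weaken the connection
        return -a_minus * np.exp(dt / tau_minus)

print(stdp_dw(10.0, 15.0))   # pre leads post by 5 ms -> positive (potentiation)
print(stdp_dw(15.0, 10.0))   # pre lags post by 5 ms  -> negative (depression)
```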
One of the companies making strides in the field of neuromorphic computing is Intel. In the fall of 2017, Intel unveiled Loihi[e5], a research neuromorphic chip containing approximately 130,000 neurons and 130 million synapses. They ported existing DNN-based algorithms onto the Loihi chip and ran various comparative tests[12], which, interestingly, showed that an SNN can obtain results similar to a DNN's.
Intel then connected multiple Loihi chips to create massive SNN systems. Nahuku implemented 4.1 billion synapses, and the Pohoiki Springs neuromorphic supercomputer[13] implemented approximately 101 million neurons and 100 billion synapses based on 768 Loihi chips. In this process, Intel developed a software stack to implement SNN on Loihi. As a result, last fall, along with Loihi 2, they released the Lava open-source software framework[14] for developing neuromorphic applications.
It was expected that DNNs and SNNs would show similar results. From a physics perspective, the inference an artificial neural network performs is ultimately a matter of defining an ultra-high-dimensional, discontinuous state space from information and projecting new information onto that space. Both DNNs and SNNs can define such ultra-high-dimensional, discontinuous state spaces. Through evolution, biology physically acquired the ability to adapt to information; humanity, through biomimetics, invented artificial neural network theory and developed deep learning.
Finding Answers, as Always
So far, we have seen that networks imitating neurons at the level of dynamics can obtain results similar to those we expect from deep learning. A question then arises: if the results are similar, is there any need for SNNs and neuromorphic computing? The examples introduced here are only a tiny fraction of the ongoing attempts. Research continues into how SNNs and neuromorphic computing produce results that differ from existing approaches. There are results showing that SNNs perform better, especially in robotics and sensing, and studies suggesting that reflecting dynamic characteristics would be more powerful for inferring causality. There are even attempts to simulate the chemical signaling that occurs at synapses[15]. The reason is that, beyond the connection structure of a neural network, the individual components that make it up may contain elements, still unknown to us, that evoke the emergence of intelligence. Still, this may not be a sufficient answer to why SNNs are worth using.
Let's ask the two questions posed at the beginning of the article again. Is this approach of pouring in massive amounts of resources to create deep learning models sustainable? And can we code "intelligence" using this method? Could neuromorphic computing be the answer? It could be, or it might not be.
The reason both DNNs and SNNs achieve high performance is ultimately that some information-optimization principle we do not yet understand underlies both implementations. If we come to understand it, we may be able to implement AI in a different way. That could be one path to answering the first question: "Is this approach of pouring in massive amounts of resources to create deep learning models sustainable?" Neuromorphic computing and SNNs let us examine the problem from a new perspective.
And it could also be the answer to the second question. We always carry a question in our hearts: 'Who are we?' When we approach this fundamental philosophical question physically, neuromorphic computing and SNNs offer the most readily understandable path, because they explain intelligence in terms of a system we already know, even if we do not yet understand its underlying framework.
Various fields, neuromorphic computing among them, are taking on the challenge of answering these two questions. Quantum computing is another. Next time, let's read an article together about quantum computing and deep learning.
References
- [1] https://www.anandtech.com/show/16960/intel-loihi-2-intel-4nm-4
- [2] https://arxiv.org/abs/1706.03762
- [3] https://lambdalabs.com/blog/demystifying-gpt-3
- [4] https://iopscience.iop.org/article/10.3847/1538-4357/abd62b
- [5] https://academic.oup.com/mnras/article-abstract/504/2/1825/6219095
- [6] https://www.nature.com/articles/s41586-021-03819-2
- [7] https://www.frontiersin.org/articles/10.3389/frai.2020.00065/full
- [8] https://www.deepmind.com/blog/alphastar-mastering-the-real-time-strategy-game-starcraft-ii
- [9] https://doi.apa.org/doi/10.1037/h0042519
- [10] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1392413
- [11] https://neuron.yale.edu/neuron
- [12] https://ieeexplore.ieee.org/document/8259423
- [13] https://arxiv.org/abs/2004.12691
- [14] https://www.intel.com/content/www/us/en/newsroom/news/intel-unveils-neuromorphic-loihi-2-lava-software.html
- [15] https://www.ibm.com/blogs/research/2016/12/the-brains-architecture-efficiency-on-a-chip/
Endnotes
- [e1] Plasticity is the ability to adapt and change one's characteristics in response to changes in the external environment or stimuli.
- [e2] ReLU stands for Rectified Linear Unit. It is an activation function equal to y = x for inputs greater than 0 (and 0 otherwise), so the output can keep increasing as the input increases.
- [e3] This is known as the Vanishing Gradient problem.
- [e4] Neurons cannot produce outputs beyond the physical limits of the cell, no matter how much larger the input they receive. It's like not being able to pass an unlimited current through a wire. ReLU is a function where the output increases linearly and indefinitely in proportion to the input.
- [e5] Intel is using various Hawaiian place names as codenames for their neuromorphic chips and systems.