In an article published in the journal Nature Electronics, researchers introduced a groundbreaking computational event-driven vision sensor capable of directly converting dynamic motion into programmable, sparse, and informative spiking signals. By overcoming the limitations of conventional frame-based image sensors, this innovation enabled the formation of an in-sensor spiking neural network (SNN) for motion recognition.
Background
Traditional image sensors operate on frame-based systems, capturing absolute light intensity at a fixed frame rate, which produces large volumes of redundant data with limited informative content. In response, event-driven vision sensors emerged, inspired by biological retinas, capturing and transmitting only the relevant changes in a scene. However, these sensors still incur latency and power overhead because data must be transferred between the sensor and separate processing units.
Existing event-driven cameras, such as the dynamic vision sensor (DVS), typically monitor brightness changes in an analog format and transfer the data to a separate neuromorphic processor or spiking neural network for motion recognition. This separation introduces time delays and increased energy consumption, counteracting the benefits of event-driven sensing.
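To make the sensing principle concrete, below is a minimal sketch of how a DVS-style pixel emits ON/OFF events once the log brightness changes by more than a contrast threshold; the threshold value and log-intensity model are simplifying assumptions, and the key point is that recognition still has to happen in a separate processor downstream.

```python
import numpy as np

def dvs_events(intensity, threshold=0.15):
    """Emit (+1/-1) events whenever the log intensity drifts more than
    `threshold` away from the last event level (simplified DVS pixel model)."""
    log_i = np.log(np.asarray(intensity, dtype=float))
    reference, events = log_i[0], []
    for t, value in enumerate(log_i):
        while value - reference > threshold:      # brightness rose enough -> ON event
            reference += threshold
            events.append((t, +1))
        while reference - value > threshold:      # brightness fell enough -> OFF event
            reference -= threshold
            events.append((t, -1))
    return events

# Constant brightness produces no events; only the changes are reported.
print(dvs_events([1.0, 1.0, 1.5, 2.0, 1.2, 1.0]))
```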
The present study addressed these challenges by introducing computational event-driven vision sensors capable of directly converting dynamic motion into programmable, sparse, and informative spiking signals.
Event-driven spike generation
The pixel cell design for event-driven spike generation featured two parallel branches with opposite and symmetric photodiode configurations. In response to a change in light intensity, the two branches generated transient photocurrents of opposite sign with different response times, producing positive or negative spike signals. Unlike traditional single-photodetector pixels that continuously capture static information, this design eliminated redundant static visual information and represented motion efficiently through sparse spiking signals.
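The differential principle can be illustrated with a minimal numerical sketch (this is not the device model from the paper; the first-order response and the time constants are assumed): two branches with opposite-sign responsivity and different response times see the same light step, so their photocurrents cancel at steady state and only changes in intensity produce transient spikes.

```python
import numpy as np

def branch_current(light, t, responsivity, tau):
    """Photocurrent of one branch: follows the light level with a
    first-order (exponential) lag set by the time constant tau."""
    i = np.zeros_like(light)
    dt = t[1] - t[0]
    for k in range(1, len(t)):
        i[k] = i[k - 1] + (responsivity * light[k] - i[k - 1]) * dt / tau
    return i

# Hypothetical stimulus: light intensity steps up, then back down, within 1 ms
t = np.linspace(0, 1e-3, 2000)
light = np.where((t > 0.2e-3) & (t < 0.6e-3), 1.0, 0.0)

# Two branches with opposite polarity and different (assumed) response times
i_fast = branch_current(light, t, responsivity=+1.0, tau=5e-6)
i_slow = branch_current(light, t, responsivity=-1.0, tau=50e-6)

pixel_output = i_fast + i_slow   # cancels at steady state, spikes only on changes
print("peak positive spike:", pixel_output.max())
print("peak negative spike:", pixel_output.min())
```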
Non-volatile and programmable WSe2 photodiode
The researchers presented a novel WSe2 photodiode with non-volatile and programmable features for in-sensor computing. Traditional approaches required a constantly applied gate voltage, leading to additional energy consumption. In contrast, the floating split-gate WSe2 photodiode designed here could locally store synaptic weights even without a gate voltage, reducing power consumption. The device exhibited various conduction regimes and excellent rectification behavior. The photodiode was further configured for event-driven spike generation based on changes in light intensity, offering efficient in-sensor computing with controllable photoresponse times and programmable event-driven signals directly at the sensory terminals.
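As a rough illustration of the non-volatile, programmable behavior (a toy model, not the device physics; the update step and weight bounds are invented), the stored responsivity acts as a synaptic weight that is modified only by programming pulses and is read out afterwards with zero gate bias:

```python
class ProgrammablePhotodiode:
    """Toy model of a floating split-gate photodiode: the stored responsivity
    behaves as a non-volatile synaptic weight that persists without gate bias."""

    def __init__(self, responsivity=0.0):
        self.responsivity = responsivity           # stored weight, sign included

    def program(self, gate_pulses):
        """Gate pulses shift the stored responsivity (assumed linear update)."""
        self.responsivity += 0.1 * gate_pulses
        self.responsivity = max(-1.0, min(1.0, self.responsivity))

    def photocurrent(self, light_power):
        """Read-out requires no gate voltage once the weight is programmed."""
        return self.responsivity * light_power

diode = ProgrammablePhotodiode()
diode.program(+5)                                  # potentiate the stored weight
print(diode.photocurrent(1e-6))                    # read with zero gate bias
```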
Programmable event-driven pixel based on the WSe2 photodiode
The WSe2 photodiode, designed with a floating split-gate configuration, represented a significant departure from conventional event-driven cameras employing silicon photodiodes with fixed photoresponsivity. Unlike chemically doped silicon, the WSe2 photodiode enabled non-volatile modulation of carrier type and density in the WSe2 channel, introducing programmable photoresponsivity. This innovation facilitated precise control over spiking signal amplitudes in pixel cells, enabling both event-driven sensing and synaptic functions.
Adjusting the photoresponsivity through electrical pulses applied to the gate terminal increased the amplitude of the spiking signals (A) monotonically, demonstrating a linear relationship between A and the photoresponsivity (R). This programmable feature allowed different synaptic weights in an SNN to be emulated. The study thus integrated event-based sensing with in-sensor computation: the total output current changed in real time under external light excitation, showcasing the event-driven characteristics.
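In the simplest reading of this result (a sketch under the assumption that the spike amplitude scales linearly with both the programmed responsivity and the light-intensity change), programming R sets the weight applied to every incoming change in light:

```python
# Assumed linear pixel response: spike amplitude A = R * dP, where R is the
# programmed responsivity (the synaptic weight) and dP the light-intensity change.
def spike_amplitude(responsivity, delta_power):
    return responsivity * delta_power

# Programming different responsivities emulates different synaptic weights.
for R in (0.2, 0.5, 1.0):                 # hypothetical responsivity values
    print(R, spike_amplitude(R, delta_power=1e-6))
```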
The WSe2 photodiode's unique attributes enabled the SNN to generate output spikes only in response to changes in light intensity, distinguishing it from artificial neural networks (ANNs) that consider absolute light intensity.
In-sensor spiking neural network
The in-sensor SNN employed a WSe2 photodiode array in a crossbar configuration for motion recognition. Each pixel extended into n subpixels, corresponding to n motion classes. The array's event-driven spiking signals, generated only in response to dynamic changes, efficiently distinguished motion from static information, and the SNN processed dynamic motion with a substantially reduced data volume compared with conventional frame-based sensors. The photodiode array could be programmed to emulate synaptic weights, enabling it to execute motion recognition tasks accurately. The system's scalability was demonstrated with a 3 × 3 pixel array, showcasing effective motion-direction recognition with a temporal resolution of 5 μs.
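A minimal sketch of the crossbar read-out (the weights, spike amplitudes, and array size here are made up for illustration): each pixel fans out to n subpixels whose programmed responsivities act as synaptic weights, and summing the weighted spike currents along each crossbar column scores one motion class directly in the sensor.

```python
import numpy as np

n_pixels, n_classes = 9, 3          # e.g. a 3 x 3 pixel array and 3 motion classes

# Programmed subpixel responsivities act as the synaptic weight matrix
# (hypothetical values; in the device each weight is stored non-volatilely).
rng = np.random.default_rng(0)
weights = rng.uniform(-1, 1, size=(n_pixels, n_classes))

# Event-driven spike amplitudes from the pixel array for one motion stimulus
# (hypothetical; positive/negative spikes encode the sign of the intensity change).
spikes = np.array([+1, 0, -1, +1, 0, -1, +1, 0, -1], dtype=float)

# Summing the weighted photocurrents along each column performs the
# multiply-accumulate of a single-layer SNN at the sensory terminal.
class_currents = spikes @ weights
print("class currents:", class_currents, "-> predicted class", int(np.argmax(class_currents)))
```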
Methods
Device Fabrication: The WSe2 devices were fabricated on a SiO2/p-Si substrate using electron-beam lithography. Cr/Au electrodes were deposited, and Al2O3/HfO2/Al2O3 dielectric layers were sequentially added. Two-dimensional WSe2 flakes were mechanically exfoliated and transferred to the substrate, followed by the addition of Au source-drain electrodes.
Device Characterization: Electrical measurements were conducted using a semiconductor analyzer. Optoelectronic measurements were performed with a 520 nm laser, and light intensity was calibrated using an optical power meter. Light pulse measurements utilized a pulse generator, oscilloscope, and a trans-impedance amplifier.
In-Sensor Spiking Neural Network Simulations: Simulations employed PyTorch v.1.7.1 to create a single-layer SNN with 128 × 128 input neurons and three output neurons. The dataset included 300 instances of three gestures, and the weights were trained using a gradient-descent strategy. An ANN model with the same number of input and output neurons was constructed and then converted to the SNN.
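The layer sizes below follow those reported in the Methods, but everything else (the stand-in data, training loop, and rate-based conversion at inference) is a simplified sketch rather than the authors' pipeline:

```python
import torch
import torch.nn as nn

# Single-layer network matching the reported sizes: 128*128 inputs, 3 outputs.
model = nn.Linear(128 * 128, 3, bias=False)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

# Hypothetical stand-in data: 300 samples of accumulated event frames, 3 gesture labels.
x = torch.rand(300, 128 * 128)
y = torch.randint(0, 3, (300,))

for epoch in range(10):                       # plain gradient-descent training of the ANN
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

@torch.no_grad()
def snn_inference(spike_train):               # spike_train: (timesteps, 128*128)
    """Reuse the trained weights and accumulate weighted input spikes over time;
    the output neuron with the largest accumulated potential wins."""
    membrane = torch.zeros(3)
    for spikes in spike_train:
        membrane += model.weight @ spikes     # weighted accumulation of events
    return int(membrane.argmax())

print(snn_inference((torch.rand(20, 128 * 128) < 0.05).float()))
```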
Conclusion
In conclusion, the study introduced a groundbreaking computational event-driven vision sensor characterized by its ability to generate adjustable current spikes exclusively in response to changes in light intensity, achieving a temporal resolution of 5 μs. Leveraging WSe2 photodiodes, the researchers demonstrated non-volatile, programmable photoresponsivity with a linear dependence on light intensity. This distinctive feature allowed synaptic weights within a neural network to be emulated, contributing to the sensor's versatility and adaptability.
Furthermore, integrating output neurons enabled the creation of an in-sensor SNN that achieved 92% accuracy in motion recognition tasks. Executing motion recognition directly within event-driven sensory terminals represented a significant advancement, offering the potential for real-time edge-computing vision chips. This innovation holds promise for various applications and points toward the future development of efficient, high-performance vision systems with edge-computing capabilities.
Journal reference:
- Zhou, Y., Fu, J., Chen, Z., Zhuge, F., Wang, Y., Yan, J., Ma, S., Xu, L., Yuan, H., Chan, M., Miao, X., He, Y., & Chai, Y. (2023). Computational event-driven vision sensors for in-sensor spiking neural networks. Nature Electronics, 1–9. https://doi.org/10.1038/s41928-023-01055-2, https://www.nature.com/articles/s41928-023-01055-2