Session 16: Circuit and Device Interaction - Resistive Device Designs for von-Neumann Computing and Beyond
Tuesday, December 6, 9:00 a.m.
Imperial Ballroom B
Co-Chairs: Elisa Vianello, CEA-LETI
Meng-Fan Chang, National Tsing Hua University
16.1 Hyperdimensional Computing with 3D VRRAM In-Memory Kernels: Device-Architecture Co-Design for Energy-Efficient, Error-Resilient Language Recognition, H. Li, T. Wu, A. Rahimi*, K.-S. Li**, M. Rusch*, C.-H. Lin**, J.-L. Hsu**, M. Sabry, S. Burc Eryilmaz, J. Sohn, W.-C. Chiu**, M.-C. Chen**, T.-T. Wu**, J.-M. Shieh**, W.-K. Yeh**, J. Rabaey*, S. Mitra and H.-S. P. Wong, Stanford University, *University of California, Berkeley, **National Nano Device Laboratories
The ability to learn from few examples, known as one-shot learning, is a hallmark of human cognition. Hyperdimensional (HD) computing is a brain-inspired computational framework capable of one-shot learning, using random binary vectors with high dimensionality. Device-architecture co-design of HD cognitive computing systems using 3D VRRAM/CMOS is presented for language recognition. The central operations of HD computing, multiplication, addition, and permutation (MAP), are experimentally demonstrated on 4-layer 3D VRRAM/FinFET as non-volatile in-memory MAP kernels. Extensive cycle-to-cycle (up to 1E12 cycles) and wafer-level device-to-device (256 RRAMs) experiments are performed to validate reproducibility and robustness. At the 28-nm node, the 3D in-memory architecture reduces total energy consumption by 52.2% with 412× smaller area compared with an LP digital design (using registers as memory), owing to the energy-efficient VRRAM MAP kernels and dense connectivity. Meanwhile, trained with 21 sample texts, the system achieves 90.4% accuracy in recognizing 21 European languages on 21,000 test sentences. Hard-error analysis shows the HD architecture is remarkably resilient to RRAM endurance failures, making the use of various types of RRAMs/CBRAMs (1k ~ 10M endurance) feasible.
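The MAP operations the abstract refers to are easy to sketch in software. Below is a minimal, illustrative Python model of HD text encoding for language recognition, assuming dense binary hypervectors, XOR binding, cyclic-shift permutation, and majority-vote bundling; this is a common textbook formulation of HD computing, not the paper's VRRAM implementation, and the dimensionality and trigram scheme are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 10000  # hypervector dimensionality (illustrative)

def random_hv():
    # Random dense binary hypervector drawn from the item memory
    return rng.integers(0, 2, D, dtype=np.uint8)

def bind(a, b):
    # "Multiplication": component-wise XOR for binary vectors
    return a ^ b

def permute(a, n=1):
    # "Permutation": cyclic shift encodes sequence position
    return np.roll(a, n)

def bundle(hvs):
    # "Addition": component-wise majority vote over a set of hypervectors
    return (np.sum(hvs, axis=0) > len(hvs) / 2).astype(np.uint8)

def encode_trigram(h1, h2, h3):
    # Encode a letter trigram by binding position-permuted letter vectors
    return bind(permute(h1, 2), bind(permute(h2, 1), h3))

# Item memory for letters; a text profile is the bundle of its trigrams.
letters = {c: random_hv() for c in "abcdefghijklmnopqrstuvwxyz "}
text = "the quick brown fox"
trigrams = [encode_trigram(letters[text[i]], letters[text[i+1]], letters[text[i+2]])
            for i in range(len(text) - 2)]
profile = bundle(trigrams)
```

At inference time, a query text's profile is compared against the stored language profiles by Hamming distance, and the nearest one wins; XOR binding is its own inverse, which is what makes in-memory implementation with simple bitwise kernels attractive.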
16.2 Binary Neural Network with 16 Mb RRAM Macro Chip for Classification and Online Training, S. Yu, Z. Li, P.-Y. Chen, H. Wu*, B. Gao*, D. Wang*, W. Wu* and H. Qian*, Arizona State University, *Tsinghua University
On-chip implementation of large-scale neural networks with emerging synaptic devices is attractive but challenging, primarily due to the immature analog properties of today's resistive memory technologies. This work aims to realize a large-scale neural network using today's available binary RRAM devices for image recognition. We propose a methodology to binarize the neural network parameters, reducing the precision of weights and neurons to 1-bit for classification and online training.
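The binarization idea can be illustrated with a toy model: weights and activations are reduced to ±1, and the ±1 dot product maps onto an XNOR-plus-popcount operation that a binary RRAM array can evaluate. This is a generic binary-neural-network formulation for illustration, not the authors' macro design:

```python
import numpy as np

def binarize(x):
    # Deterministic sign binarization: real values -> {-1, +1}
    return np.where(np.asarray(x) >= 0, 1, -1).astype(np.int8)

def bnn_layer(activations, weights):
    # 1-bit layer: binarize inputs and weights, matrix-vector product, sign
    a = binarize(activations)
    w = binarize(weights)
    return binarize(w @ a)

def xnor_popcount_dot(a_bits, w_bits):
    # With vectors stored as {0, 1} in binary RRAM cells, the +/-1 dot
    # product equals 2 * (number of XNOR matches) - length.
    n = len(a_bits)
    matches = int(np.sum(a_bits == w_bits))
    return 2 * matches - n
```

The XNOR-popcount identity is the reason 1-bit precision maps well onto binary resistive cells: a read of a row of cells against an input pattern directly accumulates the match count.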
16.3 A ReRAM-based Single-NVM Nonvolatile Flip-Flop with Reduced Stress-Time and Write-Power against Wide Distribution in Write-Time by Using Self-Write-Termination Scheme for Nonvolatile Processors in IoT Era, C.-P. Lo, W.-H. Chen, Z. Wang**, A. Lee*, K.-H. Hsu, F. Su**, Y.-C. King, C. J. Lin, Y. Liu**, H. Yang**, P. Khalili*, K.-L. Wang* and M.-F. Chang, National Tsing Hua University, *University of California, **Tsinghua University
Recent nonvolatile flip-flops (nvFFs) enable the parallel movement of data locally between flip-flops (FFs) and nonvolatile memory (NVM) devices for faster system power-off/on operations. The wide distribution and long duration of NVM-write times in previous two-NVM-based nvFFs result in excessive store energy (ES) and over-write-induced reliability degradation during NVM-write operations. This work proposes an nvFF using a single NVM device (1R) with self-write-termination (SWT), capable of reducing ES by more than 27× and avoiding over-write operations. In fabricated 65nm ReRAM nvProcessor test chips, the proposed SWT1R nvFFs achieved off/on operations with a 99% reduction in ES and a 2.7ns SWT latency (TSWT). For the first time, an nvFF with a single NVM device is presented.
16.4 50×20 Crossbar Switch Block (CSB) with Two-Varistors (a-Si/SiN/a-Si) Selected Complementary Atom Switch for Highly Dense Reconfigurable Logic, N. Banno, M. Tada, K. Okamoto, N. Iguchi, T. Sakamoto, H. Hada, H. Ochi*, H. Onodera**, M. Hashimoto*** and T. Sugibayashi, NEC Corp., *Ritsumeikan University, **Kyoto University, ***Osaka University
A 50×20 crossbar switch block (CSB) with a two-varistor-selected complementary atom switch (2V-1CAS) is newly developed for nonvolatile FPGAs. The 2V-1CAS realizes multiple fan-outs without select transistors. The improved a-Si/SiN/a-Si varistor with a novel triple-layered SiN shows a superior nonlinearity of 1.1×10^5 at Jmax = 1.63 MA/cm^2. The developed CSB is also applicable as memory for LUTs.
16.5 Zero Static-Power 4T SRAM with Self-Inhibit Resistive Switching Load by Pure CMOS Logic Process, C. F. Liao, Y.-D. Chih*, J. Chang*, Y. C. King and C. J. Lin, National Tsing Hua University, *Taiwan Semiconductor Manufacturing Company
A fully logic-compatible 4T2R nonvolatile static random access memory (nv-SRAM) is successfully demonstrated in a pure 40nm CMOS logic process. This nonvolatile SRAM consists of two STI RRAMs embedded inside the 4T SRAM with minimal area penalty and full logic compatibility. Data is accessed through the SRAM cell and stored by switching one of the load RRAMs via a unique self-inhibit feature. With these embedded STI RRAM storage nodes, data can be held in power-off mode with zero static power.
16.6 Experimental Demonstration of Short and Long Term Synaptic Plasticity Using OxRAM Multi k-bit Arrays for Reliable Detection in Highly Noisy Input Data, T. Werner, E. Vianello, O. Bichler*, A. Grossi, E. Nowak, J.-F. Nodin, B. Yvert**, B. De Salvo, L. Perniola, CEA LETI, *CEA LIST, **INSERM
In this paper, we propose a new circuit architecture and a reading/programming strategy to emulate both short- and long-term plasticity (STP, LTP) rules using non-volatile OxRAM cells. For the first time, we show how the intrinsic OxRAM device switching probability at ultra-low power can be exploited to implement STP as well as LTP learning rules. Moreover, we demonstrate the computational power that STP can provide for reliable signal detection in highly noisy input data. A fully connected neural network incorporating STP and LTP learning rules is used to demonstrate two applications: (i) visual pattern extraction and (ii) decoding of neural signals. High accuracy is obtained even in the presence of significant background noise in the input data.
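The role of intrinsic switching probability in STP can be illustrated with a toy stochastic synapse: each presynaptic spike SETs the cell with some probability, and the state relaxes back, so only sustained (correlated) activity keeps the synapse potentiated. The class below, its parameter values, and its relaxation mechanism are invented for illustration and do not reflect the paper's OxRAM measurements:

```python
import random

class StochasticBinarySynapse:
    """Toy model of probabilistic-switching STP: a binary synapse that SETs
    with probability p_set per presynaptic spike, and relaxes back toward
    the OFF state with probability p_relax per timestep. Parameter values
    are illustrative, not measured device data."""

    def __init__(self, p_set=0.1, p_relax=0.05, seed=0):
        self.p_set = p_set
        self.p_relax = p_relax
        self.state = 0  # 0 = high resistance (OFF), 1 = low resistance (ON)
        self.rng = random.Random(seed)

    def pre_spike(self):
        # Each spike is a low-energy programming attempt that succeeds
        # only with probability p_set.
        if self.rng.random() < self.p_set:
            self.state = 1

    def step(self):
        # Volatile relaxation models the "short-term" part of STP.
        if self.state == 1 and self.rng.random() < self.p_relax:
            self.state = 0
```

In this picture, a burst of correlated spikes drives the synapse ON with high probability, while isolated noise spikes rarely do, which is the intuition behind using STP for signal detection in noisy inputs.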
16.7 Device Nonideality Effects on Image Reconstruction using Memristor Arrays, W. Ma, F. Cai, C. Du, Y. Jeong, M. Zidan and W. Lu, University of Michigan
We analyze the effects of device variability during experimental image reconstruction using memristor crossbar arrays. The effects of device variability during online and offline training were carefully studied, along with device failures, including stuck-at-0 (SA0) and stuck-at-1 (SA1) faults. SA1 failures were found to significantly affect image reconstruction results, and a practical approach was developed to mitigate their effects in memristor crossbars.
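A small simulation shows why SA1 faults are the damaging case: a cell stuck in the high-conductance state injects a large spurious current into its output column, whereas a stuck-off (SA0) cell only removes a small contribution. The conductance values and array size below are illustrative, not from the paper:

```python
import numpy as np

def program_crossbar(weights, sa0_mask, sa1_mask, g_off=0.0, g_on=1.0):
    # Map weights to conductances in [g_off, g_on]; stuck-at faults
    # override programming: SA0 cells are stuck at g_off (always OFF),
    # SA1 cells at g_on (always ON).
    g = np.clip(weights, g_off, g_on)
    g = np.where(sa0_mask, g_off, g)
    g = np.where(sa1_mask, g_on, g)
    return g

def crossbar_vmm(g, v):
    # Analog vector-matrix multiply: output currents I = G @ V
    return g @ v

rng = np.random.default_rng(1)
w = rng.uniform(0.0, 0.2, size=(8, 8))    # small weights, e.g. sparse image data
v = np.ones(8)                            # normalized read voltages
no_fault = np.zeros_like(w, dtype=bool)
sa1 = no_fault.copy(); sa1[0, 0] = True   # one cell stuck ON
sa0 = no_fault.copy(); sa0[0, 1] = True   # one cell stuck OFF
ideal = crossbar_vmm(w, v)
err_sa1 = np.abs(crossbar_vmm(program_crossbar(w, no_fault, sa1), v) - ideal)
err_sa0 = np.abs(crossbar_vmm(program_crossbar(w, sa0, no_fault), v) - ideal)
```

When the stored weights are mostly small, the single SA1 cell perturbs its output by nearly the full conductance range, while the SA0 cell's error is bounded by the small weight it erased, which matches the abstract's finding that SA1 dominates reconstruction error.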
16.8 Demonstration of Hybrid CMOS/RRAM Neural Networks with Spike time/rate-dependent Plasticity, V. Milo, G. Pedretti, R. Carboni, A. Calderoni*, N. Ramaswamy*, S. Ambrogio and D. Ielmini, Politecnico di Milano and IU.NET, *Micron Technology
Neural networks with resistive-switching memory (RRAM) synapses can mimic learning and recognition in the human brain, thus overcoming the major limitations of von Neumann computing architectures. While most researchers aim at supervised learning of a predetermined set of patterns, unsupervised learning of patterns might be attractive for brain-inspired robot/drone navigation. Here we demonstrate neural networks with CMOS/RRAM synapses capable of unsupervised learning by spike-time-dependent plasticity (STDP) and spike-rate-dependent plasticity (SRDP). First, STDP learning in an RRAM synaptic network is demonstrated. Then we present a 4-transistor/1-resistor synapse capable of SRDP, finally demonstrating SRDP learning, update, and recognition of patterns at the network level.
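The pair-based STDP rule referenced above has a standard textbook form, sketched below; the amplitudes and time constant are illustrative defaults, not the paper's measured synaptic characteristics:

```python
import math

def stdp_dw(delta_t, a_plus=0.1, a_minus=0.12, tau=20.0):
    """Pair-based STDP weight update (textbook form, not the 4T1R circuit).
    delta_t = t_post - t_pre in ms: pre-before-post (delta_t > 0)
    potentiates, post-before-pre (delta_t < 0) depresses, with
    exponentially decaying magnitude over time constant tau."""
    if delta_t > 0:
        return a_plus * math.exp(-delta_t / tau)
    if delta_t < 0:
        return -a_minus * math.exp(delta_t / tau)
    return 0.0
```

In RRAM implementations, the sign and magnitude of this update are typically realized by overlapping pre- and post-synaptic pulses across the device, so that the net voltage and its timing select SET (potentiation) or RESET (depression).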