Data Multiplexed and Hardware Reused Architecture for Deep Neural Network Accelerator
G. Raut, A. Biasizzo, A. Prasad Shah, N. Gupta, G. Papa, S. K. Vishvakarma
Neurocomputing, 2021
Despite many decades of research on high-performance Deep Neural Network (DNN) accelerators, their massive computational demand still requires resource-efficient, optimized, and parallel architectures for computational acceleration. Contemporary hardware implementations of DNNs suffer from excessive area requirements due to resource-intensive elements such as multipliers and non-linear Activation Functions (AFs). This paper proposes a DNN architecture that reuses the hardware-costly AF by multiplexing data through a shift register. On-chip, quantization-based memory addressing with an optimized technique is used to access input features, weights, and biases. In this way, the external memory bandwidth requirement is reduced and dynamically adjusted for DNNs. Further, high-throughput and resource-efficient memory elements for the sigmoid activation function are derived using the Taylor series, and the order of the expansion has been tuned for better test accuracy. The performance is validated and compared with previous work on the MNIST dataset. In addition, the digital design of the AF is synthesized at the 45 nm technology node, and its physical parameters are compared with previous work. The proposed hardware-reused architecture is verified for a 16:16:10:4 neural network using 8-bit dynamic fixed-point arithmetic and implemented on a Xilinx Zynq xc7z010clg400 SoC with a 100 MHz clock. Compared to other state-of-the-art implementations, the implemented architecture uses 25% fewer hardware resources and consumes 12% less power without performance loss. Lower resource usage and power consumption are especially important for the growing field of edge computing.
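The abstract mentions a sigmoid approximated by a Taylor series whose order is tuned, combined with 8-bit fixed-point arithmetic. The following is a minimal software sketch of that general idea, not the authors' hardware design. The function names, the expansion point (zero), the split of 6 fractional bits within the 8-bit word, and the evaluation range are all assumptions made for illustration.

```python
import math

# Maclaurin coefficients of sigmoid(x), indexed by power of x:
# sigmoid(x) = 1/2 + x/4 - x^3/48 + x^5/480 - 17*x^7/80640 + ...
SIGMOID_TAYLOR = {0: 1 / 2, 1: 1 / 4, 3: -1 / 48, 5: 1 / 480, 7: -17 / 80640}


def sigmoid_taylor(x, order):
    """Truncated Maclaurin series of the sigmoid, clamped to [0, 1]."""
    y = sum(c * x**k for k, c in SIGMOID_TAYLOR.items() if k <= order)
    return min(1.0, max(0.0, y))


def quantize(value, frac_bits=6, total_bits=8):
    """Round to signed fixed point and saturate (bit split is an assumption)."""
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    q = max(lo, min(hi, round(value * scale)))
    return q / scale


def max_error(order, frac_bits=6, x_range=2.0, steps=64):
    """Worst-case error of the quantized approximation over [-x_range, x_range]."""
    worst = 0.0
    for i in range(steps + 1):
        x = -x_range + 2 * x_range * i / steps
        exact = 1.0 / (1.0 + math.exp(-x))
        approx = quantize(sigmoid_taylor(quantize(x, frac_bits), order), frac_bits)
        worst = max(worst, abs(approx - exact))
    return worst


if __name__ == "__main__":
    # Compare candidate expansion orders; the range and bit split are illustrative.
    for order in (1, 3, 5, 7):
        print(order, max_error(order))
```

Running the script prints the worst-case error for each candidate order over the chosen range, which is one simple way to explore the trade-off between expansion order and accuracy before committing to a hardware implementation.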