Sparse matrix decompression (neural network parameters) in hardware
In a computer system, one of the largest consumers of energy is the movement of data between main memory and the compute unit (CPU, GPU, FPGA). This is especially pronounced for neural networks, which are known for consuming huge amounts of memory to store their numerous parameters. Data in main memory (DRAM) is typically stored in uncompressed form so that the compute units can work with it directly. In this bachelor's/master's thesis the student will investigate compact representations of sparse matrices (the weights of pruned neural networks) in main memory, together with the possibility of decompressing them in hardware. Figure 1 shows a hypothetical system for accelerating neural network computation in which the data in DRAM is stored in compressed form, which also reduces the data traffic between main memory and the processing cores. The student will start by studying various ways of storing sparse matrices and will then implement a decompressor unit in a hardware description language of their choice (Verilog, VHDL, Chisel3) that decompresses the tensors on demand for the AI compute core, as shown in Figure 1. Master's students are additionally expected to integrate the hardware blocks into a system similar to the one in Figure 1 (without the AI core) and to test it on an actual board with DRAM.
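As a concrete starting point, one widely used compact format the student would study is compressed sparse row (CSR). The Python sketch below is a minimal software reference model, not part of the assignment itself: the example matrix, the function names, and the choice of CSR are all assumptions made for illustration. It shows the encoding that would live in DRAM and the decoding that the hardware decompressor would perform on the fly:

```python
import numpy as np

def csr_encode(dense):
    """Encode a 2-D matrix into CSR arrays: values, column indices, row pointers."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        nz = np.nonzero(row)[0]          # positions of non-zero weights
        values.extend(row[nz])
        col_idx.extend(nz)
        row_ptr.append(len(values))      # running count marks each row's end
    return np.array(values), np.array(col_idx), np.array(row_ptr)

def csr_decode(values, col_idx, row_ptr, n_cols):
    """Reference model of the decompressor: rebuild the dense matrix row by row."""
    dense = np.zeros((len(row_ptr) - 1, n_cols), dtype=values.dtype)
    for r in range(len(row_ptr) - 1):
        for i in range(row_ptr[r], row_ptr[r + 1]):
            dense[r, col_idx[i]] = values[i]
    return dense

# A pruned (sparse) weight matrix: most entries are zero.
w = np.array([[0.0, 1.5, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.0],
              [2.0, 0.0, 0.0, -3.0]])
vals, cols, ptrs = csr_encode(w)
assert np.array_equal(csr_decode(vals, cols, ptrs, w.shape[1]), w)
```

A hardware decompressor would typically implement the inner decode loop as a streaming circuit: it walks the row-pointer and column-index arrays and emits (row, column, value) triples to the compute core, rather than materialising the dense matrix.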
Resources:
Using convolutional neural networks for magnetic tape detection in images and deployment on the Xilinx DPU
In this bachelor's/master's thesis the student will get acquainted with advanced convolutional neural networks and deploy them on an FPGA using Xilinx tools, specifically on the Xilinx ZCU104 prototyping board. The object to be detected is the magnetic tape that is laid out in an industrial environment and guides a mobile robot through the factory (see Figure 2). Hardware acceleration will be provided by the Xilinx DPU (Deep Learning Processor Unit), a soft accelerator aimed at accelerating various types of neural networks on Zynq platforms, which integrate ARM processor cores and an FPGA on a single chip. The student's task is to select a suitable neural network architecture, train it on data that they also label themselves, and use the Xilinx tools Vivado, Vitis, and the Vitis-AI compiler stack, which consumes TensorFlow and PyTorch graphs, to configure the DPU hardware and generate the accompanying software on which the convolutional neural network will be tested.
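The training side is ordinary Keras/TensorFlow. The sketch below is a hypothetical minimal baseline, not the prescribed architecture: the input resolution, layer widths, and the framing of the task as per-image tape presence are all assumptions, and the student would likely choose something more advanced. It sticks to standard layers of the kind the DPU accelerates:

```python
from tensorflow.keras import layers, models

IMG_SHAPE = (224, 224, 3)  # assumed input resolution

def build_tape_detector():
    """Small CNN that predicts whether magnetic tape is visible in an image.
    Only standard operations (Conv2D, pooling, ReLU) are used, since these
    are the kind of layers the DPU accelerates."""
    return models.Sequential([
        layers.Input(shape=IMG_SHAPE),
        layers.Conv2D(16, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.GlobalAveragePooling2D(),
        layers.Dense(1, activation="sigmoid"),  # tape present / absent
    ])

model = build_tape_detector()
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=20)  # student-labelled data
```

After training, the model is quantized and compiled with the Vitis-AI toolchain so it can run on the DPU; the exact flow depends on the Vitis-AI version.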
Resources:
Automatic hardware generation for highly quantized neural networks in Chisel3
The parameters of a neural network are typically stored as 32-bit floating-point numbers (IEEE 754). Newer approaches have shown that such high precision is not needed to achieve high accuracy: with special training methods (quantization-aware training) the parameters can be stored as low-precision integers, in the extreme case with just a 1-bit representation. Such low precision makes it possible to implement smaller neural networks as combinational circuits on an FPGA, where the network's output is computed in a single clock cycle. An implementation of this kind is tailored, or hardcoded, to the problem, but achieves very high throughput: one inference per cycle. At the Computer Systems Department (E7) at JSI we are developing a tool that takes a description of such a highly quantized neural network (QKeras) and generates its hardware description; essentially, it generates a Verilog file. The tool is written in Chisel, a hardware construction library/language for writing digital hardware generators (programs that generate Verilog/VHDL). As part of the bachelor's/master's thesis the student will learn about quantized neural networks and how they are trained, use the tool to generate hardware for such a network and integrate it into a system with a processor, and may also take part in improving and extending the tool, for example to support new computing paradigms and additional types of neural network layers. The exact thesis topic depends on the student's interests.
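For illustration, below is a minimal QKeras sketch of a quantization-aware model with 4-bit weights and activations. The layer sizes and bit widths are arbitrary assumptions, and this is not necessarily the exact QKeras subset our generator supports:

```python
import tensorflow as tf
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from qkeras import QDense, QActivation, quantized_bits, quantized_relu

# A tiny quantization-aware MLP: 4-bit weights and 4-bit activations.
x_in = Input(shape=(16,))
x = QDense(8,
           kernel_quantizer=quantized_bits(4, 0, alpha=1),
           bias_quantizer=quantized_bits(4, 0, alpha=1))(x_in)
x = QActivation(quantized_relu(4))(x)
x = QDense(2,
           kernel_quantizer=quantized_bits(4, 0, alpha=1),
           bias_quantizer=quantized_bits(4, 0, alpha=1))(x)  # logits out
model = Model(x_in, x)
model.compile(optimizer="adam",
              loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True))
# Quantization-aware training then proceeds like ordinary Keras training:
# model.fit(x_train, y_train, epochs=10)
```

Because every weight is drawn from a small set of integer values, each neuron can be flattened into lookup tables and adders, which is what makes the single-cycle combinational implementation feasible.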
Resources:
CONTACT