We present a case study of a power-efficient, real-time object detection system. We targeted the YOLOv4 convolutional neural network and made slight modifications to its architecture to make it compatible with the target hardware accelerator platform. The results demonstrate the benefits of FPGAs over CPUs and GPUs for time- and power-sensitive applications. The detection accuracy of the quantized model dropped by 5% to 14%, depending on the input image size of the YOLOv4 network. In terms of data throughput, however, the FPGA-based implementation of YOLOv4 outperforms the CPU implementation by a factor of 284. The CPU lacks the hardware structures needed to parallelize the computation, so it processes deep neural networks serially and more slowly than FPGA and GPU solutions. Compared with the FPGA implementation, the GPU implementation achieved 3 times higher processing throughput while consuming 5.4 times the power.
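The throughput and power figures above can be combined into a single throughput-per-watt comparison. The following is a minimal sketch of that arithmetic, not part of the reported measurements: it assumes that "3 times higher" and "5.4 times the power" are plain multiplicative ratios relative to the FPGA, and the variable and function names are illustrative only.

```python
# Ratios quoted in the abstract. Reading them as plain multiplicative
# factors is an assumption of this sketch.
GPU_THROUGHPUT_VS_FPGA = 3.0    # GPU throughput relative to the FPGA
GPU_POWER_VS_FPGA = 5.4         # GPU power draw relative to the FPGA
FPGA_THROUGHPUT_VS_CPU = 284.0  # FPGA throughput relative to the CPU


def efficiency_ratio(throughput_ratio: float, power_ratio: float) -> float:
    """Return relative throughput per watt, given relative throughput and power."""
    return throughput_ratio / power_ratio


# GPU throughput per watt relative to the FPGA (below 1 means less efficient).
gpu_eff_vs_fpga = efficiency_ratio(GPU_THROUGHPUT_VS_FPGA, GPU_POWER_VS_FPGA)

# The same comparison seen from the FPGA side.
fpga_eff_vs_gpu = GPU_POWER_VS_FPGA / GPU_THROUGHPUT_VS_FPGA

# Implied GPU-versus-CPU throughput, obtained by chaining the two ratios.
gpu_throughput_vs_cpu = FPGA_THROUGHPUT_VS_CPU * GPU_THROUGHPUT_VS_FPGA

print(f"GPU throughput per watt vs FPGA: {gpu_eff_vs_fpga:.2f}x")
print(f"FPGA throughput per watt vs GPU: {fpga_eff_vs_gpu:.2f}x")
print(f"Implied GPU throughput vs CPU:   {gpu_throughput_vs_cpu:.0f}x")
```

Under these assumptions, the FPGA delivers roughly 1.8 times the throughput per watt of the GPU, even though the GPU is faster in absolute terms.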