Autonomy is one of the central issues for future robots expected to operate in continuously changing environments. Reinforcement learning is one of the main approaches to learning in contemporary robotics. With the rise of neural networks in recent studies, the idea of combining neural networks with the classic Q-learning algorithm for learning policies was introduced in the form of the deep Q-learning algorithm. While supervised and unsupervised learning have become widespread within the community, deep Q-learning still remains a black box with respect to parameter tuning as well as neural network architecture and training. In this paper we explore and compare training performance using different parameters and different neural network architectures on a simple use case of pendulum balancing.
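As background for the comparison described above, the temporal-difference update that deep Q-learning approximates with a neural network can be sketched as follows. This is a minimal illustration using a linear function approximator (the simplest stand-in for a network) on a hypothetical two-state task; the feature encoding and toy MDP are assumptions for demonstration, not the paper's pendulum environment.

```python
import numpy as np

def q_values(w, phi):
    """Q(s, a) for all actions: one weight row per action, features phi(s)."""
    return w @ phi

def td_update(w, phi_s, a, r, phi_s2, alpha=0.1, gamma=0.9, done=False):
    """One semi-gradient Q-learning step: w_a += alpha * delta * phi(s),
    where delta = r + gamma * max_a' Q(s', a') - Q(s, a)."""
    target = r if done else r + gamma * np.max(q_values(w, phi_s2))
    delta = target - q_values(w, phi_s)[a]
    w[a] += alpha * delta * phi_s
    return w

# Toy deterministic chain (illustrative): taking action 1 in state 0
# moves to terminal state 1 with reward 1; one-hot state features.
phi = {0: np.array([1.0, 0.0]), 1: np.array([0.0, 1.0])}
w = np.zeros((2, 2))  # 2 actions x 2 features
for _ in range(50):
    w = td_update(w, phi[0], a=1, r=1.0, phi_s2=phi[1], done=True)
```

Deep Q-learning replaces the linear map `w @ phi` with a neural network and backpropagates the same TD error, which is precisely where the architecture and hyperparameter choices studied in this paper enter.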