Gaussian processes (GP) regression is a powerful probabilistic tool for modeling nonlinear dynamical systems. The downside of the method is its cubic computational complexity with respect to the training data that can be partially reduced using pseudo-inputs. The dynamics can be represented with an autoregressive model, which simplifies the training to that of the static case. When simulating an autoregressive model, the uncertainty is propagated through a nonlinear function and the simulation cannot be evaluated in closed-form. This paper combines the variational methods of GP approximations with a nonlinear autoregressive model with exogenous inputs (NARX) to form variational GP (VGP-NARX) models. We show how VGP-NARX models, on average, better approximate a full GP-NARX model than more commonly used GP-NARX (FITC) model on 10 chaotic time-series. The modeling capabilities of VGP-NARX models are compared with the existing approaches on two benchmarks for modeling nonlinear dynamical systems. The advantage of general-purpose computing on graphics processing units (GPGPU) for Monte Carlo simulation on large validation data sets is addressed.