An advantage of ArrayFire is that it is very easy to install, and internally it uses clBLAS/cuBLAS, clFFT/cuFFT, MKL, OpenMP and other low-level accelerated libraries, transparently to the programmer.
If you want to try GPU computing the easy way, ArrayFire is a good option. If anybody is interested in using plain CUDA or OpenCL directly, I can upload some demos, although the difference in usability is enormous.