We explore a graphics processing unit (GPU) implementation of a Krylov-accelerated algebraic multigrid (AMG) algorithm with flexible preconditioning. Using two benchmarks from an industrial computational fluid dynamics (CFD) application, we demonstrate that acceleration with multiple GPUs speeds up the solution phase by a factor of up to 13. To achieve good performance for the whole AMG algorithm, we propose replacing the double-pairwise aggregation in the setup phase with a simpler aggregation scheme that skips the calculation of temporary grids and operators. The version with the revised setup reduces the total computing time on multiple GPUs by a further 30% compared to the GPU implementation with the double-pairwise aggregation. We observe that the GPU implementation of the entire Krylov-accelerated AMG runs up to four times faster than the fastest central processing unit (CPU) implementation.