We present ideas and first results on the GPU acceleration of a non-linear solver embedded in the biomedical application code CARP. The linear system solvers were ported to the GPU in earlier work, so we concentrate on how to extend the GPU acceleration to larger portions of the code. The finite element assembly of the stiffness and mass matrices takes at least 50% of the CPU time; we therefore investigate this process for the bidomain equations, with a focus on later use in non-linear and/or time-dependent problems. The CUDA code for matrix calculation and assembly is up to 90 times faster than a single CPU core. The routines have been integrated into CARP's main code and are already used to assemble the FE matrices of the bidomain model. Further performance studies are still required for the bidomain-mechanics model.