As far as I understand it, the main bottleneck on mobile GPUs these days is memory bandwidth. They’re designed to work on one tile at a time, using a small but very fast on-chip cache to hold the data for just the current tile. Then once the tile is done, they write it out to the slower memory shared with the rest of the system. That write-out to (or read-back from) shared memory tends to be the slowest part.
So, if the workload you have in mind can be divided into “small” chunks where all the working data you need fits in that on-chip cache, and you only need to pull in or write out a “small” amount of data to shared RAM at only “a few” well-defined phases of the calculation, you “could” see a substantial win from moving it to the GPU.
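One way to make that "could" more concrete before writing any GPU code is a roofline-style back-of-envelope check: compare the workload's arithmetic intensity (operations per byte moved to or from shared RAM) against the GPU's compute-to-bandwidth ratio. Here's a minimal sketch; the hardware numbers are made-up placeholders, not specs for any real device, so substitute figures for your actual target.

```python
def is_bandwidth_bound(flops, bytes_moved, peak_gflops, peak_gbps):
    """Roofline-style estimate: if the workload's FLOPs-per-byte is
    below the GPU's compute/bandwidth ratio (the "ridge point"),
    memory traffic, not math, will be the limiting factor."""
    intensity = flops / bytes_moved        # FLOPs per byte of shared-RAM traffic
    ridge_point = peak_gflops / peak_gbps  # FLOPs per byte needed to saturate compute
    return intensity < ridge_point

# Hypothetical mobile GPU: 500 GFLOP/s compute, 25 GB/s memory bandwidth.
# Workload A: heavy math on little data -> compute-bound, a promising GPU candidate.
print(is_bandwidth_bound(flops=1e9, bytes_moved=1e6,
                         peak_gflops=500, peak_gbps=25))   # False

# Workload B: light math on lots of data -> bandwidth-bound, likely a poor fit.
print(is_bandwidth_bound(flops=1e7, bytes_moved=1e8,
                         peak_gflops=500, peak_gbps=25))   # True
```

This is only a screening test, of course: it ignores launch overhead, synchronization, and whatever else is contending for the bus, which is exactly why you still have to measure.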
But all those words in scare quotes depend sensitively on the particulars of your use case: what kinds of data, what workloads, what target hardware, how much and how frequently you transfer, etc., as well as how busy the CPU, GPU, and RAM connection are with everything else your game is doing. So we can’t boil this down to a simple rule of thumb. The only way to know the answer to any performance question for sure is to build a test and profile it, measuring the real results you get.
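For completeness, the "build a test and profile it" step can start as small as a timing harness like the sketch below. This is a CPU-side microbenchmark pattern, not a GPU profiler (for that you'd reach for your vendor's tooling); the two implementations being compared are arbitrary stand-ins for your candidate code paths.

```python
import time

def best_time(fn, *args, repeats=100):
    """Run fn(*args) `repeats` times and return the fastest wall-clock
    time in seconds; taking the minimum filters out scheduler noise."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best

# Compare two candidate implementations on identical input.
data = list(range(10_000))
t_a = best_time(sum, data)                           # built-in sum
t_b = best_time(lambda d: sum(x for x in d), data)   # generator-expression version
print(f"builtin: {t_a:.6f}s  genexpr: {t_b:.6f}s")
```

Whatever harness you use, the key points carry over to GPU measurement: time the whole round trip including transfers, repeat enough to beat noise, and test on the actual hardware you're shipping to.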