K4200 copy bottleneck with 4x SDI inputs

This is the system:
HP Z640
2x Xeon 12-core @ 2.4 GHz
64 GB RAM
Quadro K4200
DeckLink HD Extreme 3
DeckLink Duo

I usually use the Duo for an extra fill-and-key output or a logo-loop output.

For this project the HD Extreme outputs fill and key at 1080i50, and the Duo is used for 4x 1080i50 inputs.
We want to scale the inputs down and combine them with CG graphics, sliding them in and out as overlays.

With 2 inputs started, everything is fine. With 3 it starts lagging, and with 4 everything breaks: we get a 12-second delay and the screen jitters.

Everything else stays low, at around 40 to 50% load, except the GPU's COPY engine.
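For a sense of scale, here's a rough estimate of how much uncompressed video this setup pushes across the bus per second. It's a sketch under my own assumptions, not a measurement: each input is uploaded as 8-bit 4:2:2 (UYVY, 2 bytes per pixel), the fill-and-key output is read back as one 8-bit BGRA frame (4 bytes per pixel, alpha as key), and 1080i50 counts as 25 full frames per second. The actual pixel formats and copy paths depend on the playout software and drivers.

```python
# Rough estimate of uncompressed video bandwidth for N x 1080i50 inputs
# plus one fill+key output. ASSUMPTIONS (not measured on this system):
#   - inputs uploaded as 8-bit 4:2:2 UYVY (2 bytes per pixel)
#   - fill+key read back as a single 8-bit BGRA frame (4 bytes per pixel)
#   - 1080i50 = 50 fields/s = 25 full frames/s

WIDTH, HEIGHT = 1920, 1080
FRAMES_PER_SECOND = 25
BYTES_PER_PIXEL_UYVY = 2
BYTES_PER_PIXEL_BGRA = 4


def stream_mb_per_s(bytes_per_pixel: int) -> float:
    """Bandwidth of one uncompressed 1080i50 stream in MB/s (1 MB = 10**6 bytes)."""
    return WIDTH * HEIGHT * bytes_per_pixel * FRAMES_PER_SECOND / 1e6


def total_mb_per_s(num_inputs: int) -> float:
    """Uploads for all inputs plus one BGRA readback for the fill+key output."""
    uploads = num_inputs * stream_mb_per_s(BYTES_PER_PIXEL_UYVY)
    readback = stream_mb_per_s(BYTES_PER_PIXEL_BGRA)
    return uploads + readback


if __name__ == "__main__":
    # Print the estimate for 1 to 4 inputs, matching the 2/3/4-input test above.
    for n in range(1, 5):
        print(f"{n} input(s): {total_mb_per_s(n):.2f} MB/s")
```

On these assumptions each input adds about 104 MB/s, and 4 inputs plus the output come to roughly 622 MB/s. Comparing that against the card's PCIe link and the COPY graph might help show whether raw volume alone explains the load.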

Is simply a better video card, such as the P2000, the solution?
Has anybody done something similar?

I'm thinking about swapping in an RTX 2060 taken from one of the edit workstations.