That sounds to me like you get a blurred picture as input and want to apply a mask to it. Maybe you could post a picture of how it should look in the end.
I use a web presenter for that, so it connects over USB rather than Thunderbolt or PCIe.
I guess the delay is mostly due to the hardware. In my case it seems to be very small, on the order of 100~200 ms, but I haven't measured it.
It takes a bit longer to initialise, though.
It also works with an NDI virtual camera, so you don't have to waste an input and route a channel to it, but then the delay will be significantly longer.