The code for the screen consumer is currently pretty basic and naive, and it isn't written with performance in mind.
When the pixels are transferred from the compositor to the window for display, they are copied through the CPU, which is likely the source of the cost you are seeing.
It's done this way simply because it is good enough at typical resolutions. Avoiding that copy requires digging into OpenGL to work out the texture sharing needed to hand the frame to the window without copying it.
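For a rough sense of why a CPU copy is fine at typical resolutions but gets expensive as resolution goes up, here is a back-of-the-envelope sketch of the memory traffic from one full-frame copy per frame. It assumes 8-bit RGBA frames (4 bytes per pixel) and exactly one copy per frame. The real path may move each frame more than once, so treat these numbers as lower bounds. The function name is hypothetical and not part of the screen consumer code.

```python
# Assumption: 8-bit RGBA frames, i.e. 4 bytes per pixel.
BYTES_PER_PIXEL = 4


def copy_bandwidth_mb_per_s(width: int, height: int, fps: int) -> float:
    """Hypothetical helper: MB/s (10^6 bytes) moved by one full-frame copy per frame."""
    frame_bytes = width * height * BYTES_PER_PIXEL
    return frame_bytes * fps / 1e6


if __name__ == "__main__":
    resolutions = {"1080p": (1920, 1080), "1440p": (2560, 1440), "4K": (3840, 2160)}
    for name, (w, h) in resolutions.items():
        print(f"{name} @ 60 fps: {copy_bandwidth_mb_per_s(w, h, 60):.0f} MB/s")
```

At 60 fps this works out to about 498 MB/s at 1080p and about 1991 MB/s at 4K. The cost scales with pixel count, so 4K costs four times as much as 1080p.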