GPU acceleration for screen consumer

Hello everyone!
New user, so please be nice :slight_smile:

First off, I have to say a big thank you to the creators of this awesome software.

To introduce myself:
I come from the event multimedia and production field. We have certain software, hardware and ways of doing things that are not necessarily compatible with broadcast, as I have come to understand things.
I do believe, however, that the underlying principles are pretty much the same. We both use genlock to sync our gear, and we both push massive amounts of pixels per second in the most consistent way possible.

Long story short, I am trying to play out 4K, or even better 8K, video via the screen consumer.
After all the reading I did, I get conflicting information on whether the GPU is used and what for.
Some said that if VLC can play it smoothly, so can Caspar, and that if ffmpeg/ffplay can play it, it will be OK. However, this does not seem to be the case. Let me explain:

I have the latest server & client working together just fine.
Playing Full HD is just fine – after all, this is child's play in 2020; I mean, phones play and record 8K …
With Full HD the load on CPU & GPU barely registers.

At 4K and above, things start to change dramatically.
For example, VLC and the Windows Movies & TV app play an 8K file like this:


Caspar, however, struggles … badly. Take note of the Copy operation in the GPU tab: it is not even on the main memory engine (Copy) but on the secondary one (Copy 1).
Here is what I mean:


It seems CasparCG, even though it uses ffmpeg under the hood, decodes on the CPU and then does a copy operation to the GPU. Furthermore, judging from the load, it seems to do this a number of times.
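
For anyone wanting to reproduce this: raw CPU decode speed can be checked with a plain ffmpeg null-output run, independent of Caspar's pipeline (input.mp4 is just a placeholder for the test file):

```
ffmpeg -i input.mp4 -f null -
```

If the reported speed stays above 1x, software decode alone keeps up in real time, and the bottleneck is further down the pipeline.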

With a 4K file the situation is roughly 90% identical.

So far I have tested every possible codec & format combination exported from Media Encoder.
Tested on 5 different kinds of machines, ranging from an AMD Threadripper with an RTX 8000 down to a simple i7-7700K with a GTX 1080.
In all tests Caspar seems to struggle badly with high resolutions and the screen consumer.
Blackmagic output was no better either; it crashed a lot and was laggy and choppy.

Just as a note, the machines I tested on are in regular use in configurations such as 4× 8K outputs with multiple composited layers of 16K video and multiple SDI inputs, all at the same time.
The software doing this is the likes of Resolume, Ventuz and Dataton Watchout.

Blackmagic devices tested: 8K Pro SDI, Duo 2, Quad 2.

Anyone willing to shed some light on this?
I am really trying to understand how Caspar works and what kind of files and hardware it is expecting.

Best regards!

Your assumptions are pretty much correct. FFmpeg is used on the CPU to decode the files (and in some cases deinterlace); they are composited on the GPU, and the result of that is transferred back so it can be pushed to SDI or the screen consumer, or in some cases be composited again with something else. 1080i50 is the most common format, so beyond that no optimization will have been done. The screen consumer is not very performant at all before version 2.3 (which is downloaded from the GitHub releases section, not the website – an unfortunate inconvenience).
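
To put rough numbers on why those round trips hurt (assuming Caspar's internal 8-bit BGRA frames, i.e. 4 bytes per pixel): an 8K frame is 7680 × 4320 × 4 ≈ 133 MB, so at 50 fps a single GPU↔CPU transfer is already about 6.6 GB/s. With an upload after decode, a readback after compositing, and another upload in the screen consumer, you are moving on the order of 20 GB/s across the bus – more than a PCIe 3.0 x16 link (~16 GB/s theoretical) can sustain. The same three copies at 1080i50 total around 1 GB/s, which is why this has never been a problem at broadcast resolutions.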

Thank you for the input!
I am using the latest from GitHub – 2.3.0 LTS.
I was really hoping to add CasparCG to my toolkit, but it seems to be way behind in tech for now.
Maybe I will find a use for it at some point.

Best!

Here’s an example of how CasparCG can be useful for events production: Again ABN-WTT2019

But in general, it’s not good at outputting high-resolution single compositions. Multiple channels of low-resolution (well… still HD) output is where it shines, and nothing at the price point can beat it.

Seems we are colleagues in the field of events. So you understand me when I say I have not played something as small as 1080p in years …
Last month I built a permanent LED installation at 4800×1600 px; now I am building one at 8192×3048 pixels, and looking at the trends they are getting bigger and bigger – not necessarily in size but in pixel density.

Regarding the price point you are probably right, but I don’t mind paying; it just seems a bit weird to optimize only for FHD, which is considered small by today’s standards.

The reason I started playing around with Caspar is that I want another tool in my toolbox. Otherwise we have a home-written media server that can play huge things in the HAP format – multi-layer 16K video across multiple outputs.

So for permanent installations I wanted something that could better play H.264/H.265 files, as GPUs nowadays are insanely fast. We may get back to the code and cut it down a bit so we can use it for permanent installs and digital-signage type things.
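
As a side note, hardware decode throughput is easy to sanity-check with plain ffmpeg, assuming an NVIDIA card and an FFmpeg build with CUDA support (again, input.mp4 is a placeholder):

```
ffmpeg -hwaccel cuda -i input.mp4 -f null -
```

Comparing the reported speed against a software-decode run of the same file shows how much headroom the GPU decoder has for H.264/H.265 at these resolutions.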

Alternatively, if there are devs here and/or someone is up for a bounty, I am open to offers for a version that can do custom resolutions and is optimized for the screen consumer. As far as I have searched the Internet and the forum, I am not the only one looking for something like this.

There are very few broadcasters running at more than FHD, and since CasparCG is built by (and mostly for) broadcast, it’s not that strange, I guess. I actually do more broadcast than events work, btw, but I can definitely see where you’re coming from.

There’s an experimental project bringing the CasparCG philosophy into 2020 tech, but it’s aimed more at higher bit depth and better protocols than at high resolution: GitHub - Streampunk/phaneron: Clustered, accelerated and cloud-fit video server, pre-assembled and in kit form. Something to watch in the distant future, I guess.

True that, though the interesting thing is that our on-air stations are Full HD, while most of the cable ones are 4K and some of them are preparing for 8K nowadays :smiley:

@aster

Fantastic original write-up of what you observed.

We are currently embarking on a project to have Caspar power our studio’s video wall. The resolution that we have the screen consumer set to is 11520 wide by 2160 high.

We have also observed that once Caspar server 2.4 pushes beyond standard resolutions (using the custom resolution feature), GPU usage goes through the roof (in particular on copy), even when the screen consumer is outputting black.

The machine we are using is running an RTX 4090 and an AMD Threadripper, and while this combination gives acceptable results, I’m quite surprised at how high the GPU utilization is (above 75%).
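
That utilization makes some sense once you count the copies, though (rough numbers, assuming 8-bit BGRA at 50 fps): 11520 × 2160 × 4 ≈ 100 MB per frame, so each GPU↔CPU transfer is about 5 GB/s, and the screen consumer’s extra round trip multiplies that – even for black frames, since the copies move the full framebuffer regardless of content.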

All we want to do is output a full-resolution still image as the studio background, and then in one 4K portion of the screen show an HTML5 graphics package and one DeckLink input.

With all this said, I completely see @mint’s point that Caspar was not necessarily designed for this use case, and most certainly not optimized for it.

I respond to this post with the hope that one day further optimizations may make their way into Caspar to support very large resolutions. It is such a versatile and fantastic tool, and we love using it!

A little heretical question: do your cameras shoot this wall close enough that you really see that resolution? Usually a video wall in the studio is somewhere in the background and not fully framed in the picture. Maybe half of that resolution would be enough.

@didikunz great point; however, we’ve tried some testing with lower resolutions and it is far too noticeable to go live with.

Generally we are aiming to run our video walls at full resolution across the board to get the best-quality output possible, which is a change from the past, where we would upscale our outputs.

Appreciate the response though

The code for the screen consumer is currently pretty basic/naive and isn't written to be very performant.
When transferring the pixels from the compositor to the window for display, they get copied via the CPU, which is likely the source of the cost you are seeing.

This is done simply because it is good enough at typical resolutions, and avoiding that copy requires some diving into OpenGL to figure out the texture sharing needed to make it work without a copy.
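
To make that concrete, here is a minimal sketch of the two paths (illustrative only, not CasparCG's actual code – it assumes a current OpenGL 3.0+ context, e.g. via GLEW, and that the compositor rendered into srcTex):

```cpp
// Illustrative sketch, not CasparCG source. Assumes a current
// OpenGL 3.0+ context and a compositor output texture srcTex.
#include <GL/glew.h>
#include <cstdint>
#include <vector>

// Copy path (roughly what a naive screen consumer does): read the
// composited frame back to system memory, then upload it again for
// display. Two full-frame transfers across the bus, every frame.
void present_with_copy(GLuint srcTex, GLuint dstTex, int w, int h)
{
    std::vector<std::uint8_t> cpuBuf(static_cast<std::size_t>(w) * h * 4);

    glBindTexture(GL_TEXTURE_2D, srcTex);
    glGetTexImage(GL_TEXTURE_2D, 0, GL_BGRA, GL_UNSIGNED_BYTE,
                  cpuBuf.data());                        // GPU -> CPU readback

    glBindTexture(GL_TEXTURE_2D, dstTex);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, w, h,
                    GL_BGRA, GL_UNSIGNED_BYTE,
                    cpuBuf.data());                      // CPU -> GPU re-upload
}

// Zero-copy path: if the compositor's texture is valid in the window's
// context (shared contexts / texture sharing), the frame can be blitted
// straight into the window's framebuffer without leaving GPU memory.
void present_zero_copy(GLuint srcTex, GLuint readFbo, int w, int h)
{
    glBindFramebuffer(GL_READ_FRAMEBUFFER, readFbo);
    glFramebufferTexture2D(GL_READ_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                           GL_TEXTURE_2D, srcTex, 0);
    glBindFramebuffer(GL_DRAW_FRAMEBUFFER, 0);           // window framebuffer
    glBlitFramebuffer(0, 0, w, h, 0, 0, w, h,
                      GL_COLOR_BUFFER_BIT, GL_NEAREST);  // stays on the GPU
}
```

The first function crosses the bus twice per frame; the second never leaves GPU memory, but it requires the compositor's texture to be usable from the window's context, which is exactly the sharing work mentioned above.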
