We have a use case for Caspar powering our large LED video walls with content. We are looking for someone who can help us potentially optimise Caspar to better support this use case.
We’ve identified through our testing that GPU utilisation becomes extremely high when the desktop consumer is set to very large, custom resolutions. Related post here: GPU acceleration for screen consumer - #11 by Julusian
I’m looking to have discussions with anyone interested in embarking on such an endeavour. Of course, this is paid work. There is no predefined budget, as we do not have enough information to set an appropriate budget without a conversation first.
I’m happy to share further information on the requirements via the forum, however, any discussion regarding financials will need to be done privately. Please send me a message or chat via the forum to exchange details.
If successful, this is not only of huge benefit to our use case, but also CasparCG at large!
The screen consumer is not a good choice for stuff like that. It would be better to use some Decklink cards for this. What resolution will the wall have?
The maximum resolution of our video walls will be 4x 4k outputs running at 50fps, 50hz. We stitch the canvas together using Nvidia Mosaic.
Do you think a Decklink card is capable of achieving this? Based on the advice we are working with, the preference is to feed the video wall driver via DisplayPort directly out of our Quadro A6000 graphics cards.
Definitely would love to learn more about the Decklink workflow and how you think this would be better
The Decklink consumer in Caspar does not load the system as heavily as the Screen consumer does. So that would lighten the burden on the system.
What needs to be considered is the way to stitch everything together. But the Decklink consumer in newer versions of Caspar allows you to define sub areas of a bigger channel, which can be fed individually out to different Decklink channels. So I guess that would be at least worth a try.
To elaborate on this a little (this was my work), newer decklinks (duo2, quad2, 8k pro, maybe more) allow for doing hardware syncing of the card outputs when driven in the right way.
So by adding additional ports to the decklink consumer we can utilise the ability of the hardware to do this frame synchronisation and avoid tearing in videowall scenarios like you are doing.
This was done for a client who is using it to drive some 3x 1080 videowalls. I haven’t heard of any tearing issues, so I believe it works fine.
You mention that you are doing “4x 4k outputs running at 50fps”, which could pretty easily be translated to using an 8k pro.
Saying that, I am a little worried about the potential cpu/memory cost of generating those sdi outputs, just because I haven’t tried it and it involves some cpu work to split the channel into the 4 regions. I’m pretty sure it won’t be as much work as the screen consumer though.
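To make the "split the channel into the 4 regions" step concrete, here is a minimal Python sketch of the kind of CPU work involved. It is not CasparCG's actual code: the `Region`, `split_canvas` and `crop_region` names are hypothetical, and the 2x2 arrangement of four 3840x2160 tiles on a 7680x4320 channel is an assumption, since the thread does not say how the four outputs are laid out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A rectangular sub-area of the channel, in pixels (hypothetical type)."""
    x: int
    y: int
    width: int
    height: int


def split_canvas(canvas_w: int, canvas_h: int, cols: int, rows: int) -> list[Region]:
    """Divide the canvas into equal tiles, returned in row-major order."""
    if canvas_w % cols or canvas_h % rows:
        raise ValueError("canvas must divide evenly into tiles")
    tile_w, tile_h = canvas_w // cols, canvas_h // rows
    return [
        Region(col * tile_w, row * tile_h, tile_w, tile_h)
        for row in range(rows)
        for col in range(cols)
    ]


def crop_region(frame: bytes, canvas_w: int, region: Region,
                bytes_per_pixel: int = 4) -> bytes:
    """Copy one region out of a tightly packed frame, one row at a time."""
    stride = canvas_w * bytes_per_pixel          # bytes per full canvas row
    row_len = region.width * bytes_per_pixel     # bytes per region row
    out = bytearray()
    for row in range(region.y, region.y + region.height):
        start = row * stride + region.x * bytes_per_pixel
        out += frame[start:start + row_len]
    return bytes(out)


# Assumed layout: four 4K tiles arranged 2x2 on an 8K channel.
regions = split_canvas(7680, 4320, cols=2, rows=2)
for index, region in enumerate(regions):
    print(index, region)
```

Each output needs its own per-row copy like the one in `crop_region`, so the cost grows with the total pixel count rather than with the number of regions.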
As for the screen consumer, it is a pretty suboptimal implementation currently; simply no one has been bothered enough to optimise it. The main issue is likely that after a frame is composited it gets downloaded to the cpu, then the screen consumer has to reupload that to the gpu to display it. It might be doing a copy on the cpu too, I don’t remember.
Ideally this should not leave the gpu, which is likely going to be a bit fiddly, as the opengl contexts need to be set up for sharing correctly.
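A rough back-of-the-envelope sketch of what that download and reupload costs at this wall size. It assumes 8-bit BGRA frames (4 bytes per pixel) and counts only the two bus transfers, not the possible extra cpu copy; the function names are hypothetical and the figures are illustrative, not measurements.

```python
def frame_bytes(width: int, height: int, bytes_per_pixel: int = 4) -> int:
    """Size of one tightly packed frame in bytes (assumes 8-bit BGRA by default)."""
    return width * height * bytes_per_pixel


def round_trip_bytes_per_second(width: int, height: int, fps: int,
                                outputs: int = 1, bytes_per_pixel: int = 4,
                                transfers: int = 2) -> int:
    """Bus traffic when every frame is downloaded and then reuploaded.

    transfers=2 models one GPU -> CPU download plus one CPU -> GPU upload.
    """
    per_frame = frame_bytes(width, height, bytes_per_pixel) * outputs
    return per_frame * fps * transfers


# 4x 4K outputs at 50 fps, as described earlier in the thread.
total = round_trip_bytes_per_second(3840, 2160, fps=50, outputs=4)
print(f"{total / 1e9:.2f} GB/s")  # prints "13.27 GB/s"
```

Keeping the frame on the gpu would remove both transfers, which is why sharing the opengl contexts is the attractive fix despite the fiddliness.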
Interesting, okay. It’s worth a try. I’ll see if we can obtain an 8k pro to test.
My main concern, though, is the syncing. Nvidia Mosaic is rock solid when it comes to synchronisation, hence using the screen consumer. We also have the most experience using Mosaic due to other video wall software we use. Using SDI to power the video wall also introduces other challenges, because we are geared to support DisplayPort inputs.
And, that is the purpose of my post. We are very willing to enlist the support of someone who has the ability to optimise the screen consumer. Not only does it help us, but I’m sure this is quite a significant contribution to the project overall.
In conclusion, I definitely appreciate the recommendation, but I think for our use case, the screen consumer is absolutely the best option for our workflow(s).
I have tried and/or deployed most of these use cases, though not in 8k.
I can tell you that your best workflow for success is to use a decklink card.
I do like the mosaic option and have never used it with CasparCG live.
I have used the screen consumer live. That workflow leaves a lot to be desired.
Even at normal 1080 the CPU loads were high. Back when we were doing this there was a memory leak. That gave us some grief after 8 - 10 hrs.
The other downside of the screen consumer for us was that Windows had access to the screen. Error messages and a crash could make air (we were driving a set piece).
We did have a button to smash in a still on the LED processors, but still. Crashes and errors were rare, but the margins were slim. The other issue I had was getting everything to boot the same every time. As we were on a tour, if there was a gear change or we had to replace something, the confidence that everything would map out correctly was not there.
If someone rebuilt the screen consumer I would love to revisit it, as it is an attractive workflow. But my heart could not take the stress, and when 4k cards and syncing cards became a thing we ran to them. In doing so we left a world of disappointment behind.
As screen pitch gets better and more attainable I am faced with this problem more and more. I have a lot of other tools at my disposal, and frankly when projects require large and complex screen mappings I leave CasparCG to drive my data driven content and graphics and employ other options to map screens. Caspar becomes a source on those canvases, sometimes split across outputs.
That being said, someone is going to need to drive / spearhead a change to CasparCG’s screen consumer, a change that I agree would help a lot of people operating on the edge of the tech. So don’t let my experience deter you but rather encourage you to drive the change. We don’t do this because it is easy….
If you really want to spend money on this, certain NVIDIA workstation GPUs have support for “direct display”, giving the possibility to get dedicated control of a display that is not managed by the OS. It would be a greater undertaking, but probably the “best” solution for your use case.
This is a snapshot of CPU/GPU utilization on my client’s (rather beefy) machine running 4x 2160p5000 (in 16-bit color-depth mode) on a single Decklink 8k Pro using the most recent v210 PR. It almost runs smoothly on my consumer rig from a couple of years ago with a Ryzen 9 3900X and an RTX 2070 Super.
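For a sense of scale, here is a small sketch comparing how much data has to reach the card per second in v210 versus 8-bit BGRA. The v210 row size follows the standard packing (6 pixels in 16 bytes, rows padded to a multiple of 128 bytes). Treating 8-bit BGRA as the alternative format is an assumption, the function names are hypothetical, and nothing here shows that the format is what makes the difference in practice.

```python
def bgra_frame_bytes(width: int, height: int) -> int:
    """8-bit BGRA: 4 bytes per pixel, tightly packed (assumed alternative format)."""
    return width * height * 4


def v210_row_bytes(width: int) -> int:
    """v210 packs 6 pixels into 16 bytes; rows are padded to 128 bytes (48 pixels)."""
    return ((width + 47) // 48) * 128


def v210_frame_bytes(width: int, height: int) -> int:
    """10-bit YUV 4:2:2 frame size in v210 packing."""
    return v210_row_bytes(width) * height


def stream_bytes_per_second(frame_size: int, fps: int, outputs: int) -> int:
    """Total data rate for several outputs running at the same frame rate."""
    return frame_size * fps * outputs


width, height, fps, outputs = 3840, 2160, 50, 4  # 4x 2160p5000
bgra = stream_bytes_per_second(bgra_frame_bytes(width, height), fps, outputs)
v210 = stream_bytes_per_second(v210_frame_bytes(width, height), fps, outputs)
print(f"BGRA: {bgra / 1e9:.2f} GB/s, v210: {v210 / 1e9:.2f} GB/s")
```

Under these assumptions v210 needs two thirds of the bandwidth of BGRA for the same four outputs, roughly 4.4 GB/s against 6.6 GB/s.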
I don’t think it will be that fiddly. And even if it turns out to be a bit fiddly, it won’t be complex. It’ll be fiddly in one location.
Alongside testing the upgraded screen consumer, we’ve also been testing the SDI workflow. However, when we attempt to run 4x 2160p 50fps outputs from Caspar, we get this error: DeckLink 8K Pro Mini [10|2160p5000] Failed to schedule primary video.
If we run at 25fps or 30fps, we get it running successfully.
Based on your post it seems you’re able to successfully run 4x 4k 50fps outputs - do you have any insights on how to make this work?
We are running on a PCIe 4 slot, we have a Quadro A6000 graphics card, and a Threadripper CPU. I don’t understand what could be causing a bottleneck in this situation.