4K Video outputs with Decklink 8K Pro

Yesterday I had to install a new 8K Pro and I’ve been getting some weird behaviours. I was using server 2.3.3.

Has anyone tested it extensively?

On paper the card is able to output 4 separate streams, and four 4K channels should not be an issue for this card.

So I configured the card to have SDI independent connectors. (It says “SDI 1 in or out”, etc.)

Then I set caspar to 2160p25 and I got all four outputs working. All good so far.

But when I set caspar to 2160p50 it isn’t able to output video in that channel, not even to a screen consumer. It’s as if the channel is “broken” when I add a decklink with the card configured as four independent channels/devices at that resolution. If I remove the decklink consumer, the channel works fine.

The only way I managed to get it working at 2160p50 was when I set the card to be only two devices (“SDI 1 in, SDI 2 out” and the same with 3/4). So with this config I think I have two fills and two keys at 2160p50.

But it seems odd to me that the card is behaving like this if I want four video streams at more than 4k25p…

I was expecting the card to behave like a Duo2 up to 4K60p at least.

Unfortunately I won’t be able to share any logs/config files until I’m back at the office in a couple of weeks.

Any thoughts?

Thanks!

I’m making several assumptions here - so no guarantees of any solution…

Blackmagic technical specifications on their products pages state that the card needs an 8-lane Generation 3 PCIe slot, which would deliver a transfer rate in the order of 7.87 GBytes/sec, assuming that the card has unique use of the 8 lanes. I’ll assume an Ultra-1 resolution operation - so 3840 samples per line on 2160 lines.

CasparCG uses 8-bits/sample where each sample has 4 components giving us a total data volume for 1 frame equal to 3840 * 4 * 2160 = 33177600 bytes per frame. For a frame rate of 50 per second (2160p50) this produces 1658880000 bytes/second of video data to transfer to the 8k Pro card per channel of CasparCG (approximately 1.659 GBytes/sec). There is extra data to transfer such as audio, but this is a small volume of data compared to the video. Hence for 4 channels of CasparCG output we need a data transfer to the card in excess of 1.659 * 4 = 6.636 GBytes/s which should be within the capability of an 8 lane Gen 3 connection with no other devices contending.

A generation 2 connector has approximately half the data bandwidth of generation 3 at 4.0 GBytes/s (using the table in Wikipedia about PCIe cards). Dropping the frame rate to 25 per second would reduce the data rate required from CasparCG to 3.318 GBytes/sec for 4 channels, within the bus capacity of a gen 2 slot. The same data rate is needed for 2 channels of 2160p50.
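As a rough check of the arithmetic above, here is a minimal Python sketch. It assumes the figures quoted in this post (4 bytes per pixel, 3840 x 2160, and theoretical x8 rates of 4.0 and 7.87 GBytes/s) and ignores audio and bus overhead, so treat the results as upper bounds rather than measurements. The function names are made up for illustration.

```python
# Back-of-envelope check: uncompressed 8-bit, 4-component video
# versus the theoretical PCIe x8 figures quoted in the post.

WIDTH, HEIGHT = 3840, 2160
BYTES_PER_PIXEL = 4  # 8 bits/sample, 4 components per sample

# Theoretical x8 rates in bytes/second (assumed from the post, not measured).
PCIE_X8_BYTES_PER_SEC = {"gen2": 4.0e9, "gen3": 7.87e9}


def frame_bytes(width=WIDTH, height=HEIGHT, bytes_per_pixel=BYTES_PER_PIXEL):
    """Size of one uncompressed frame in bytes."""
    return width * height * bytes_per_pixel


def video_rate(channels, fps):
    """Video bytes/second for a number of channels at a given frame rate."""
    return frame_bytes() * fps * channels


def fits(channels, fps, generation):
    """True if the video data alone is below the theoretical x8 rate."""
    return video_rate(channels, fps) < PCIE_X8_BYTES_PER_SEC[generation]


if __name__ == "__main__":
    for channels, fps in [(4, 25), (2, 50), (3, 50), (4, 50)]:
        rate = video_rate(channels, fps) / 1e9
        print(
            f"{channels} x 2160p{fps}: {rate:.3f} GBytes/s, "
            f"gen2 x8: {fits(channels, fps, 'gen2')}, "
            f"gen3 x8: {fits(channels, fps, 'gen3')}"
        )
```

Under these assumptions, 4 channels at 2160p25 and 2 channels at 2160p50 stay below the gen 2 figure while 4 channels at 2160p50 do not, which lines up with the behaviour described in the first post.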

The Blackmagic Decklink SDK documentation shows that a 4:2:2 YCbCr 10-bit operation will use 1.106 GBytes/s per channel at 2160p50, so a gen 2 slot should be able to output 3 channels at 2160p50. This can be checked by running multiple instances of the Blackmagic Media Express software, each playing a 2160p50 source file.
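The 10-bit figure can be reproduced with a short sketch. It assumes the row-size rule the DeckLink SDK documents for its 10-bit YUV ('v210') pixel format, where each group of 48 pixels occupies 128 bytes, and it reuses the 4.0 GBytes/s gen 2 x8 figure from above. The function names are illustrative only.

```python
# Data rate for 10-bit 4:2:2 YCbCr ('v210') frames, assuming the
# row-size rule: 128 bytes for every (started) group of 48 pixels.

GEN2_X8_BYTES_PER_SEC = 4.0e9  # theoretical figure quoted in the post


def v210_row_bytes(width):
    """Bytes per line for a v210 frame of the given width."""
    return ((width + 47) // 48) * 128


def v210_rate(width, height, fps, channels=1):
    """Video bytes/second for v210 frames."""
    return v210_row_bytes(width) * height * fps * channels


if __name__ == "__main__":
    for channels in (1, 2, 3, 4):
        rate = v210_rate(3840, 2160, 50, channels)
        verdict = "fits" if rate < GEN2_X8_BYTES_PER_SEC else "exceeds"
        print(f"{channels} x 2160p50 v210: {rate / 1e9:.3f} GBytes/s ({verdict} gen2 x8)")
```

With these numbers, three 10-bit channels at 2160p50 stay under the gen 2 figure and a fourth goes over it.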

So my suspicion is that your system is only delivering PCIe gen 2 8-lane operation.
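One way to test that suspicion, assuming the server runs Linux, is to read the negotiated link speed and width that the kernel exposes in sysfs for each PCI device (the `current_link_speed`, `current_link_width`, `max_link_speed` and `max_link_width` files). The sketch below is only an illustration: the device address in the usage note is hypothetical, and on Windows a different tool would be needed.

```python
# Compare the negotiated PCIe link of a device with the maximum it reports.
# Assumes a Linux host; pass the sysfs directory of the card as an argument.

import sys
from pathlib import Path


def read_link(device_dir):
    """Read link speed/width attributes from a PCI device's sysfs directory."""
    d = Path(device_dir)

    def read(name):
        return (d / name).read_text().strip()

    return {
        "current_speed": read("current_link_speed"),
        "max_speed": read("max_link_speed"),
        "current_width": int(read("current_link_width")),
        "max_width": int(read("max_link_width")),
    }


def speed_gts(text):
    """Parse strings such as '8.0 GT/s PCIe' into a float; None if unparsable."""
    try:
        return float(text.split()[0])
    except (ValueError, IndexError):
        return None


def is_downgraded(link):
    """True if the link runs slower or narrower than the device's maximum."""
    cur, top = speed_gts(link["current_speed"]), speed_gts(link["max_speed"])
    slower = cur is not None and top is not None and cur < top
    narrower = link["current_width"] < link["max_width"]
    return slower or narrower


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: script.py /sys/bus/pci/devices/<address>")
    link = read_link(sys.argv[1])
    print(link)
    print("link downgraded" if is_downgraded(link) else "link at device maximum")
```

For example, it could be run against a path like `/sys/bus/pci/devices/0000:01:00.0` (a made-up address; `lspci` lists the real one). A current speed of 5.0 GT/s against a maximum of 8.0 GT/s would point to gen 2 operation.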

Also, the GPU needs to have enough bandwidth and memory to handle compositing 4x 2160p50. This is often the explanation when someone says BMD Express works fine for an output, since that isn’t going in and out of the GPU.

Thank you for your suggestions, I’ll make sure I double check when I get back to testing.

If my memory doesn’t fail me, I installed the 8K Pro in an HP Z8 with a bazillion cores and a Quadro P5000, and I probably installed it in a PCIe x16 slot, but I didn’t check the BIOS to see if there were any modifiers active on that PCIe slot.

Also, at some point I wasn’t even trying to play 4K files, just normal 1080. I was all over the place with the testing, so I just needed to check for signal presence… So I stuck to the test file we often use.

Thank you,
Carlos.

Thank you, good reminder about need for GPU capabilities. I sense a potential topic for the FAQ part of the wiki.

Can I ask you if I understand the memory requirements in the GPU? My assumption is that each CasparCG channel, with or without an attached consumer, will use 1 ARGB frame buffer for every active layer in that channel, plus 1 ARGB frame buffer for the combined output. Is that a correct assumption?

Does an empty channel, or a channel with just one active layer used for a lower third also use an ARGB frame buffer for the base, transparent black, image?

I also assume that the PCIe connection to the GPU card has to transfer all of the input and output GPU frame buffers, hence potentially becoming a bottleneck quite quickly if multiple Ultra-1 resolution layers are activated.

Pointers to any web references about computing GPU bandwidths very welcome.

I couldn’t find it back then with the forum search but apparently someone else had the same issue:

In the last post they seem to give the reason behind this behaviour; it comes down to the server, it would seem.