NDI output crashes Server after some time

server
ndi

#1

During the last couple of weeks I’ve been experiencing crashes with Server 2.1.x, official and NRK versions, after a few hours using NDI/iVGA consumers in my channels. Here’s a part of my config file:

  <channels>
    <channel>
      <video-mode>1080i5994</video-mode>
      <channel-layout>mono</channel-layout>
      <consumers>
        <screen>
          <device>1</device>
          <windowed>true</windowed>
          <stretch>uniform</stretch>
        </screen>
        <decklink>
          <device>4</device>
          <embedded-audio>true</embedded-audio>
          <channel-layout>mono</channel-layout>
        </decklink>
        <newtek-ivga />
      </consumers>
    </channel>
  </channels>

I’m puzzled about why this happens, as nothing about it shows up in the log files, even at trace level. I’ve also been unable to reproduce the issue on demand, as it only appears over time: sometimes within a 24-hour period, sometimes even longer than that.

I suspect the NDI capabilities of CasparCG, because since I commented out the <newtek-ivga /> line in my config file, no problems have occurred so far.

Now my problem is that I have no way to stream my channel using OBS, as streaming with the ADD 1 STREAM command has never worked whenever I tried it.

Any help on this issue might come in handy, as I need a way to stream my channel.

Thanks in advance.


#2

Are you running OBS on the same machine or on a separate one?


#3

I’m running it on the same machine.


#4

The reason I ask is that I have just tried the Gstreamer NDI plugin, and I experienced a serious memory leak.
Can you check whether something on your machine is leaking memory?


#5

I’d like to know how I can do that, but I see that you suspect OBS as the source of the crash.

How can I monitor that? I’m currently on Windows 10 Pro 1809 with all updates.


#6

Press CTRL-ALT-DEL, click on “Task Manager” and then on “More details”.
Click the Memory column to sort by memory, and see if memory usage grows over time for one of the running programs.
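
If you would rather log the numbers than watch them, here is a minimal Python sketch of the same check. It assumes the standard Windows `tasklist` tool and treats its "Mem Usage" column as a rough indicator only; the executable name `casparcg.exe` and the 60-second interval are assumptions for illustration.

```python
import csv
import re
import subprocess
import time


def parse_tasklist_csv(output):
    """Parse `tasklist /FO CSV /NH` output into {(image_name, pid): memory_in_k}."""
    usage = {}
    for row in csv.reader(output.splitlines()):
        if len(row) < 5:
            continue  # blank lines and "INFO: ..." messages have fewer columns
        digits = re.sub(r"\D", "", row[4])  # "512,340 K" -> "512340"
        if digits:
            usage[(row[0], int(row[1]))] = int(digits)
    return usage


def sample_memory(image_name):
    """Ask tasklist (Windows only) for every process with this executable name."""
    result = subprocess.run(
        ["tasklist", "/FI", f"IMAGENAME eq {image_name}", "/FO", "CSV", "/NH"],
        capture_output=True, text=True, check=True,
    )
    return parse_tasklist_csv(result.stdout)


def watch(image_name="casparcg.exe", interval=60):
    """Print one line per process every `interval` seconds until interrupted."""
    while True:
        for (name, pid), mem_k in sample_memory(image_name).items():
            print(f"{time.strftime('%H:%M:%S')} {name} pid={pid} mem={mem_k} K")
        time.sleep(interval)
```

Calling `watch("obs64.exe")` in a second console would log OBS the same way (that executable name is also an assumption). A value that keeps climbing over several hours without levelling off would point towards a leak.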


#7

Perfect. I’ll be testing this weekend and come back with results.


#8

FWIW, I prefer procexp.exe from sysinternals to the generic windowz task manager: more info, better graphing, etc. There’s a lot of useful stuff in there (https://docs.microsoft.com/en-us/sysinternals/).


#9

Right now I’m using perfmon, as it is included in Windows. It also provides a log.


#10

I didn’t need to go too far to experience a couple of crashes, but I saw no memory leaks whatsoever in perfmon or taskman. Actually, just now I experienced a crash without OBS open and with no entries in the Server log.

I’ll attach a trimmed version of the Server log covering the last 2 crashes, the first with OBS running and the second without it, as well as an OBS log file. Just change the .html extension of the files to .log to look at them properly.

Now, I suspect the fault might be in the dll used by Server to provide an NDI consumer, because that’s apparently a really old version of an NDI implementation. As a precaution I turned off NDI in the config file again.

I’ll run some tests replacing the Processing.AirSend.x64.dll file that I got using these instructions to set up NDI with the one that is referenced in the OBS logs, located at %ProgramFiles%\NewTek\NewTek NDI 3.8 Tools\Runtime\Processing.NDI.Lib.x64.dll.

Anyway, I’d like to know if there’s any alternative to NDI as a way to stream from CasparCG to OBS (UDP? RTMP?).

Thanks in advance.


#11

Could you try running 2.1.6? I can see you are using an HTML producer, and there is a crash I fixed in there recently. Or perhaps try running without the HTML producer if you can’t update? I’m not sure if it will help, but it’s worth a try.

Unfortunately that is currently the newest version of NDI that can be used in any version of CasparCG. CasparCG doesn’t yet support NDI natively, so going via the old AirSend/iVGA is the only way to do it currently.


#12

I’ll try updating over the weekend, as well as update OBS and the NDI plugin for OBS. I’ll report my findings.

I was actually thinking about renaming the dll from the NDI Redist to the name Server recognizes. I know it’s a stretch, but it’s worth a shot.


#13

So, after a whole weekend of testing with different versions of Server, all I can say is that NDI/iVGA is the culprit. Things I tried during this weekend:

  • Updating NVIDIA Quadro drivers to latest version.
  • Updating Windows 10 to latest updates.
  • Turning off Game Mode and XBOX Game Bar in Windows 10.
  • Updating OBS to latest stable version.
  • Updating OBS NDI Plugin to latest stable version.
  • Running CasparCG Server 2.1.0b2 and 2.2.0 official, as well as all 2.1.x NRK stable releases.
  • Running different CasparCG Server versions with the iVGA consumer active and inactive.

This is the log of the latest run with Server 2.1.6 NRK and iVGA active (warning: file is about 5 MB long), which is the cleanest one, as it only runs an HTML producer for about 8 hours before crashing. Other tests were with and without HTML producers on top, which yield similar results: it crashes but doesn’t notify at all.

Now, as I can see that iVGA is the problem, I have no issue ditching it for the time being, as I’m running cables for all our monitoring workflow. The only thing that really concerns me is being able to send Server’s output to OBS for online streaming. I’d like to know how to achieve that without NDI.

On the other hand, do you think this should be reported to the developers for a fix? I don’t know how much of a priority NDI support is; in this case it would be a support update, or so I think.

I thank you all in advance for your help.


#14

Thanks for the thorough testing on this.

That log file is interesting, as it shows an issue with the decklink output, but nothing else that stands out.
Do you know what happened at 8am that day? I think I have seen this issue before but haven’t been able to figure out why. The decklink and caspar suddenly disagree on what the next frame pts is (by 22 minutes in your case!) and caspar races to catch up on the 22 minutes of frames. It needs a better solution.

To confirm, the same thing happened in every version of caspar you tried?
There is a new NDI implementation coming for 2.3 (not merged yet) for both in and out. Hopefully that will solve this, but until then, I’m not sure what else to suggest.

which yield similar results: it crashes but doesn’t notify at all.

Do you mean that when this crashes, it stays running but does absolutely nothing?


#15

You’re welcome. I like this kind of stuff and I actually have a lot of fun trying to triage these kinds of issues. And thank you for keeping an eye on this topic. It’s really encouraging to know that someone is chasing a bug so seemingly fringe.

I’m puzzled too, because it’s the first time I’ve had these warnings on my logs. The only time it happened was when I entered the machine through TeamViewer yesterday, but it showed no issue on the output, at least none that I could observe. This log is from 2.1.6 NRK. On previous versions of CasparCG it doesn’t happen.

I’ll keep an eye on this issue and report any results. Maybe in a new thread.

Yes, it happened in every single version I tried. The only thing that varies is how long it takes to crash. It ranges from 20 minutes to 18 hours, sometimes even further, but it’s quite inconsistent.

That will be interesting to try, whenever it gets released. I’ll keep an eye on it.

Exactly. The output freezes at the time of the crash, no sound output, the server doesn’t respond to commands through console, but it doesn’t close itself or raise any exception, so my playout client (RedCast On Time, in my case) still behaves as if the server is still active. Only when I click the Screen consumer it reacts and flags the program as not responding. Whenever this happens I have to kill the process using the task manager.


#16

An interesting thing I forgot to mention in the last crash is that Windows generated events for the crash, which brings along a couple of processes as well. I don’t know if it is actually related to these processes at all or not, but as I said, this doesn’t happen if iVGA consumer is inactive.

The file can be checked using Windows Event Viewer.


#17

I’m puzzled too, because it’s the first time I’ve had these warnings on my logs.

That logging is some extra stuff we added to try and narrow down the problem it is reporting. It hasn’t happened for us since adding it though. Normally, to see similar info you would need debug or maybe trace logging enabled. On the output you would have seen clips playing at 2-3 times speed, but as you were just running HTML it probably wouldn’t have been noticeable outside of perhaps a small amount of stuttering.

Yes, it happened in every single version I tried. The only thing that varies is how long it takes to crash. It ranges from 20 minutes to 18 hours, sometimes even further, but it’s quite inconsistent.

That is very interesting to hear. Was it more common on any version? I would really like to get to the bottom of this, but reproducing has always been a problem.

Exactly. The output freezes at the time of the crash, no sound output, the server doesn’t respond to commands through console, but it doesn’t close itself or raise any exception

This is exactly what we have been seeing. Our short-term solution has been to restart it nightly to try and avoid this, and we have a launcher which sends a PING command over AMCP on an interval to detect this state and forcefully restart it.
Something to note is that we use Decklinks, not NDI, so I don’t think this is an NDI-specific issue, but perhaps it is more likely with NDI. It gives me something else to try for reproducing.
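
For anyone who wants to try the same idea without our launcher, here is a minimal Python sketch of such a watchdog. It is not the launcher's actual code. It assumes AMCP is listening on the default port 5250, and it counts any reply as a sign of life, so it should still work on a server that does not recognise PING and answers with an error. The executable path, interval and failure threshold are placeholders.

```python
import socket
import subprocess
import time


def amcp_alive(host="127.0.0.1", port=5250, command="PING", timeout=5.0):
    """Return True if the server sends any reply to `command` within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.settimeout(timeout)
            conn.sendall(command.encode("utf-8") + b"\r\n")  # AMCP lines end with CRLF
            return len(conn.recv(1024)) > 0
    except OSError:  # refused connection, timeout, reset, ...
        return False


def watchdog(exe_path, cwd=None, interval=30.0, failures_allowed=3):
    """Start the server and restart it after several consecutive failed probes."""
    proc = subprocess.Popen([exe_path], cwd=cwd)
    failures = 0
    while True:
        time.sleep(interval)
        failures = 0 if amcp_alive() else failures + 1
        if failures >= failures_allowed:
            proc.kill()  # the hung process will not exit on its own
            proc.wait()
            proc = subprocess.Popen([exe_path], cwd=cwd)
            failures = 0
```

Something like `watchdog(r"C:\casparcg\casparcg.exe", cwd=r"C:\casparcg")` would run it (hypothetical path). Requiring several failed probes in a row gives the server time to start up and avoids restarting it over a single slow reply.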


#18

Well, I’m already trying that and will leave it running the whole night to see what’s going on. I think it will leave a gargantuan log to sift through, but it might help to shed some light on this issue.

As far as I’ve tested, this bug has no preference whatsoever regarding Server versions.

Hmmm, I didn’t know that about the launcher, I’ll check that out. Just in case, can the launcher be used to launch other tools/executables alongside CasparCG? It would be nice if it could, mostly to simplify the process of launching Server, wait a little and then launch RedCast.

If I see any issue regarding the Decklink, I’ll absolutely report it. :wink:


#19

There is a test build that might be interesting for you to debug this with: https://github.com/ianshade/server/releases/tag/v2.3.0-ndi-v1.1

If you’re already running cables, OBS will accept SDI inputs from decklink cards as well.

It can launch other processes but not with a timed delay.


#20

All right, I’ll check this out overnight and see what happens.

I was thinking of this, but I have no devices for an output loop, though I do have 2 unpopulated SDI ports on my Decklink. Maybe I could route channel 1 to channel 2 on Server, output channel 2 through the Decklink, and loop that into the remaining SDI port as an input for OBS.
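
A rough Python sketch of the routing step, as I understand it: it assumes the `route://` producer syntax, a second channel already defined in the config with its own Decklink consumer, and AMCP on the default port 5250. Layer 10 is an arbitrary choice.

```python
import socket


def route_command(source_channel, target_channel, layer=10):
    """Build the AMCP line that plays `source_channel` on a layer of `target_channel`."""
    return f"PLAY {target_channel}-{layer} route://{source_channel}"


def send_amcp(command, host="127.0.0.1", port=5250, timeout=5.0):
    """Send one AMCP line and return the first chunk of the server's reply."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(command.encode("utf-8") + b"\r\n")
        return conn.recv(4096).decode("utf-8", errors="replace")
```

`send_amcp(route_command(1, 2))` would send `PLAY 2-10 route://1`, so whatever channel 1 outputs would also go out through channel 2's Decklink.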

I’ll test if it works. Anyway, it’s nothing that a script with a sleep can’t fix. :wink:
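
Something along these lines is what I have in mind, as a minimal Python sketch. Both paths and the 20-second delay are placeholders, not real install locations.

```python
import subprocess
import time


def launch_in_order(steps, popen=subprocess.Popen, sleep=time.sleep):
    """Start each program in turn; `steps` is a list of (argv, cwd, delay_after_seconds)."""
    procs = []
    for argv, cwd, delay in steps:
        procs.append(popen(argv, cwd=cwd))
        if delay > 0:
            sleep(delay)  # give this program time to come up before starting the next
    return procs


# Hypothetical locations: adjust to wherever Server and the playout client live.
STEPS = [
    ([r"C:\casparcg\casparcg.exe"], r"C:\casparcg", 20),
    ([r"C:\RedCast\RedCast.exe"], None, 0),
]
```

Running `launch_in_order(STEPS)` starts Server, waits 20 seconds, then starts the playout client. The `popen` and `sleep` parameters are only there so the ordering can be tried out without actually launching anything.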