My solution for getting synced audio when playing videos is to embed the audio into SDI on a DeckLink Duo 2 (or similar card) and de-embed it to analog.
I think that because CasparCG really isn't a video playout engine in the first place, it prioritizes getting frames out as quickly as possible. For system audio, Caspar depends on the host system giving it CPU time, whereas video depends on GPU time. That is my layman's understanding of why you are seeing this behaviour.
I think you may also need to look at using a (currently) beta release of the server. Audio output at non-integer frame rates such as 29.97 fps has been troublesome in several releases.
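To illustrate why non-integer frame rates are awkward for audio: at 48 kHz and 29.97 fps (exactly 30000/1001), one frame corresponds to 1601.6 audio samples. A player therefore has to alternate between 1601 and 1602 samples per frame, with the pattern repeating every 5 frames (8008 samples). The sketch below computes that cadence using cumulative rounding. The function name `audio_cadence` is hypothetical, the sketch assumes 48 kHz audio, and it says nothing about how CasparCG itself distributes samples across frames.

```python
from fractions import Fraction


def audio_cadence(sample_rate: int, fps: Fraction) -> list[int]:
    """Hypothetical helper: per-frame audio sample counts over one cadence cycle.

    Frame n ends at floor((n + 1) * sample_rate / fps) samples (cumulative
    rounding), so the running total never drifts from the exact value.
    """
    samples_per_frame = Fraction(sample_rate) / fps
    # After `denominator` frames the cumulative sample count is whole again.
    cycle = samples_per_frame.denominator
    counts = []
    previous_end = 0
    for n in range(cycle):
        end = (n + 1) * samples_per_frame.numerator // samples_per_frame.denominator
        counts.append(end - previous_end)
        previous_end = end
    return counts


ntsc = Fraction(30000, 1001)  # 29.97 fps
print(audio_cadence(48000, ntsc))          # [1601, 1602, 1601, 1602, 1602]
print(sum(audio_cadence(48000, ntsc)))     # 8008 samples per 5 frames
print(audio_cadence(48000, Fraction(25)))  # [1920]
```

At integer rates such as 25 or 30 fps the cycle is a single frame with a fixed sample count, so any bookkeeping mistake in the fractional case simply cannot occur there.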
There is a recent thread about this in discussion 7408. The last two posts are the most significant.