No, those were two separate suggestions: use the embedded audio that comes out of the monitor card, OR do the delay in the audio mixer. Do whichever is easier.
If you play the input, including its embedded audio, on the 12G and then route that picture, together with the embedded audio, to the monitor card, the audio embedded in the monitor card's SDI signal should still be in sync.
Which version of CasparCG are you using?
Can you type DIAG into the CasparCG console window and post a screenshot of the diag window?
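
If typing into the console window is awkward, the same commands can also be sent over AMCP from another machine. Below is a minimal sketch in Python, assuming the server listens on the default AMCP port 5250 and is reachable at 127.0.0.1; the host and the helper names `amcp_line` and `send_amcp` are my own placeholders, so adjust them to your setup.

```python
import socket


def amcp_line(command: str) -> bytes:
    """Encode one AMCP command. AMCP commands are terminated with CRLF."""
    return (command.strip() + "\r\n").encode("utf-8")


def send_amcp(command: str, host: str = "127.0.0.1", port: int = 5250,
              timeout: float = 2.0) -> str:
    """Send a single AMCP command and return the first chunk of the reply.

    host and port are assumptions (local server, default AMCP port).
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(amcp_line(command))
        return sock.recv(4096).decode("utf-8", errors="replace")


# Usage (needs a running CasparCG server):
#   print(send_amcp("VERSION"))  # reports the server version
#   print(send_amcp("DIAG"))     # opens the diag window on the server
```

The diag window still opens on the server machine itself, so the screenshot has to be taken there.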