Seeking to a specific frame in a file that uses a long-GOP codec is a complex process. In an answer to a question posted in May 2021, hreinnbeck suggested that a file intended for frame-accurate seeking should use I-frame-only encoding.
MPEG encoders can of course produce I-frame-only coding, the IMX codecs being one example. You report that CasparCG and VLC can play your file from the start, which implies the bitstream syntax is probably OK. So, several questions:
Is your file using a long GOP, and if so, is frame 301 an I frame?
What frame rate are you operating at? This affects audio-video presentation because the number of audio samples per picture varies (a 5-frame cyclic variation in 29.97 Hz based operations; see the sketch after these questions).
Is the file muxed as a program stream or a transport stream?
Which audio codec is in use?
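To illustrate that samples-per-picture point, here is a small Python sketch of my own (not CasparCG code) showing why 29.97 Hz operation needs a repeating 5-frame pattern of audio sample counts while 25 Hz does not, assuming 48 kHz audio:

# samples_per_frame.py - audio samples that belong to each video frame
AUDIO_RATE = 48000                                # samples per second

def sample_counts(fps_num, fps_den, frames):
    # Accumulate the ideal sample position frame by frame and report
    # how many whole samples belong to each picture.
    counts, total = [], 0
    for n in range(1, frames + 1):
        ideal = AUDIO_RATE * n * fps_den // fps_num
        counts.append(ideal - total)
        total = ideal
    return counts

print(sample_counts(25, 1, 5))        # 25 Hz:    [1920, 1920, 1920, 1920, 1920]
print(sample_counts(30000, 1001, 5))  # 29.97 Hz: [1601, 1602, 1601, 1602, 1602] = 8008 samples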
You can obtain information about the streams using the ffprobe.exe command-line tool. A copy of this tool is available in your CasparCG server folder. A summary of the file's properties is reported by a command of the form below.
<path_to_CasparCG_Folder>\ffprobe <path_to_video_file>\myMpegFile.mpg -hide_banner
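If you need more detail than that summary, the same tool can list per-stream properties and the picture type of every frame, which would answer the long-GOP question directly. For example (the -show_frames form decodes frame headers and can be slow on long files, so limiting it with -read_intervals is sensible; count through the pict_type values to find frame 301):

<path_to_CasparCG_Folder>\ffprobe <path_to_video_file>\myMpegFile.mpg -hide_banner -show_streams

<path_to_CasparCG_Folder>\ffprobe <path_to_video_file>\myMpegFile.mpg -hide_banner -select_streams v:0 -show_frames -show_entries frame=pict_type -of csv -read_intervals %+20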
If the probe reports that the audio is encoded with the MPEG-1 Layer 2 codec (also called mp2), the error you see when attempting a seek is likely to be an issue with the mechanism used to find the access point into the audio stream.
MPEG uses the concept of a presentation unit for its operations. In the video stream the presentation unit is 1 picture, but the audio presentation unit varies with the codec. In MPEG-1 Layer 1 the presentation unit is 384 samples (8 ms at 48 kHz), whereas MPEG-1 Layer 2 uses a 1152-sample presentation unit (24 ms at 48 kHz).
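As a quick check of those figures (my own sketch; the frame sizes are fixed by the MPEG-1 audio standard, while the durations depend on the sampling rate):

# audio_presentation_unit.py - duration of one MPEG-1 audio presentation unit
SAMPLES_PER_FRAME = {"Layer I": 384, "Layer II": 1152}

for layer, samples in SAMPLES_PER_FRAME.items():
    for rate in (32000, 44100, 48000):
        print(f"{layer} at {rate} Hz: {1000 * samples / rate:.2f} ms")
# Layer I at 48000 Hz -> 8.00 ms, Layer II at 48000 Hz -> 24.00 ms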
The data-reduced information, with its header descriptor, passes to the packetisation processes along with the clock synchronisation reference (PCR or SCR) and the relative output timing for the presentation unit (PTS). Program stream multiplexing uses a single-layer packetising operation, while transport streams use two-layer packetisation. DVD tends to use single-layer (program stream) multiplexing.
The data-reduced information is encapsulated into a Packetised Elementary Stream (PES) that includes the PTS timestamp. The PES packet may be split into multiple fixed-size packets to meet program stream multiplexing requirements (for example, to map a PES packet into a physical sector of a DVD disc).
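For reference, the PTS in the PES header is a 33-bit count of a 90 kHz clock, packed into 5 bytes with marker bits. A small Python sketch of that packing (an illustration only, not a complete PES parser):

# pes_pts.py - pack and unpack the 33-bit PTS field of a PES header
def pack_pts(pts, prefix=0b0010):
    # 4-bit prefix, PTS[32..30], marker, PTS[29..15], marker, PTS[14..0], marker
    b0 = (prefix << 4) | (((pts >> 30) & 0x7) << 1) | 1
    b12 = (((pts >> 15) & 0x7FFF) << 1) | 1
    b34 = ((pts & 0x7FFF) << 1) | 1
    return bytes([b0, b12 >> 8, b12 & 0xFF, b34 >> 8, b34 & 0xFF])

def unpack_pts(data):
    return (((data[0] >> 1) & 0x7) << 30) | \
           (((data[1] << 8 | data[2]) >> 1) << 15) | \
           ((data[3] << 8 | data[4]) >> 1)

pts = 90000 * 10                   # 10 seconds on the 90 kHz clock
assert unpack_pts(pack_pts(pts)) == pts
print(pack_pts(pts).hex())         # 2100377741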
In transport streams the large audio PES packet is mapped into 188-byte transport packets (4 header bytes, 184 payload bytes). The rule is that a PES packet must be mapped into an integer number of transport packets, with padding inserted as required to fill the last packet. Depending on the audio data rate in use, this wrapping can produce very poor efficiency, because large amounts of padding get inserted relative to the user payload.
To improve bitstream efficiency some multiplexers combine multiple audio presentation units into one PES packet, resulting in a much lower percentage of stuffing. Many broadcast services I have examined in detail carry 5 audio presentation units per PES, which is 120 ms of audio content.
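A rough sketch of the effect (my own arithmetic, assuming 48 kHz MP2 audio at 256 kbit/s and a nominal 14-byte PES header; real multiplexers differ in detail):

# ts_audio_efficiency.py - stuffing overhead for MP2 audio in 188-byte TS packets
import math

PES_HEADER = 14                         # nominal PES header with PTS (assumption)
TS_PAYLOAD = 184                        # 188-byte packet minus 4 header bytes
FRAME_BYTES = 144 * 256000 // 48000     # one Layer 2 frame at 256 kbit/s, 48 kHz = 768 bytes

def efficiency(frames_per_pes):
    pes_bytes = PES_HEADER + frames_per_pes * FRAME_BYTES
    ts_packets = math.ceil(pes_bytes / TS_PAYLOAD)
    return pes_bytes / (ts_packets * 188)

for n in (1, 5):
    print(f"{n} audio frame(s) per PES: {efficiency(n):.1%} of the TS bytes carry PES data")
# 1 frame per PES  -> about 83% useful, the rest is header bytes and stuffing
# 5 frames per PES -> about 98% useful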
When decoding starts, the decoder has to regenerate the system clock and decide which stream it will start first (video or audio). Once that stream is outputting sample data, the decoder looks for a random access point into the second stream, typically via the Payload Unit Start Indicator in the transport packet header, which signals that the packet contains the start of a PES packet.
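That indicator is a single bit in the 4-byte transport packet header. A minimal Python sketch of how a demuxer could spot those access points (an illustration only; a real demuxer also handles the adaptation field, PES start codes and so on, and the input file name is hypothetical):

# find_pusi.py - report transport packets whose payload starts a new PES packet
def pusi_packets(ts_bytes):
    for offset in range(0, len(ts_bytes) - 187, 188):
        pkt = ts_bytes[offset:offset + 188]
        if pkt[0] != 0x47:                      # sync byte check
            continue
        pusi = bool(pkt[1] & 0x40)              # payload_unit_start_indicator
        pid = ((pkt[1] & 0x1F) << 8) | pkt[2]   # 13-bit packet identifier
        if pusi:
            yield offset // 188, pid

with open("capture.ts", "rb") as f:             # hypothetical input file
    for index, pid in pusi_packets(f.read()):
        print(f"packet {index}: PES starts here on PID {pid}")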
In general, consumer decoders in digital TVs or set-top boxes tend to output the audio first, showing a picture later when an I frame arrives. Professional decoders tend to start the picture stream first (as the system clock reference is normally carried in the same transport packets as the video), then output the synchronised audio once video decoding has started. Note that neither decoder is seeking a specific frame number; both are simply doing random access into the stream.
I would not be surprised if ffmpeg issues an error message when it does not see the payload start indicators for both streams within a small time window.