GPT-4o accepts an array of hundreds of images (enough to make up a whole video) as well as other input like an audio transcript. See results of audio vs visual vs Audio + visual video summaries.
Use GPT-4o to produce a multi-model summary…
GPT-4o accepts an array of hundreds of images (enough to make up a whole video) as well as other input like an audio transcript. See results of audio vs visual vs Audio + visual video summaries.