A new machine-learning model learns to pinpoint exactly where a particular sound occurs in a video clip, without the need for human intervention. The model could have applications in areas such as journalism, film production, education, and training.
AI learns how vision and sound are connected, without human intervention
