Video annotation is the labeling of video sequence data across both the temporal and spatial dimensions. Annotators mark target objects, actions, scene changes, and how events develop over time, for example the process of "a person walking" or the trajectory of "a car turning". The resulting labels are used to train models for video analysis, action recognition, and video retrieval.
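To make the combination of time and space concrete, the sketch below models two common kinds of video label: a spatial track (one object's bounding boxes keyed by frame) and a temporal segment (an action spanning a frame range). The class and field names (`Box`, `Track`, `Segment`, `box_at`) are hypothetical and do not follow any particular tool's export format. Filling in frames between annotated keyframes by linear interpolation is an assumption here, chosen because it is a simple way to recover a trajectory from sparse labels.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Box:
    """Axis-aligned bounding box in pixel coordinates (hypothetical schema)."""
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


@dataclass
class Track:
    """Spatial labels over time: one object's boxes keyed by frame index."""
    track_id: int
    label: str
    keyframes: dict[int, Box] = field(default_factory=dict)

    def box_at(self, frame: int) -> Box | None:
        """Return the box at `frame`, linearly interpolating between keyframes (assumed convention)."""
        if frame in self.keyframes:
            return self.keyframes[frame]
        before = [f for f in self.keyframes if f < frame]
        after = [f for f in self.keyframes if f > frame]
        if not before or not after:
            return None  # frame lies outside the annotated span
        f0, f1 = max(before), min(after)
        t = (frame - f0) / (f1 - f0)
        b0, b1 = self.keyframes[f0], self.keyframes[f1]
        start = (b0.x, b0.y, b0.w, b0.h)
        end = (b1.x, b1.y, b1.w, b1.h)
        return Box(*(a + t * (b - a) for a, b in zip(start, end)))


@dataclass
class Segment:
    """Temporal label: an action or event spanning [start_frame, end_frame]."""
    label: str
    start_frame: int
    end_frame: int
    track_id: int | None = None  # optionally links the event to a tracked object


# Usage: a car annotated at frames 0 and 10, plus the "car turning" event.
car = Track(1, "car", {0: Box(10, 20, 50, 30), 10: Box(60, 20, 50, 30)})
turning = Segment("car turning", 0, 10, track_id=car.track_id)
print(car.box_at(5))
```

Storing only keyframes keeps the labeling effort low, while `box_at` reconstructs a per-frame trajectory on demand; the `Segment` carries the temporal "when" and links back to the spatial "where" through `track_id`.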