Scene text annotation is an annotation method for marking text in natural scene images (e.g., street views, billboards). For each piece of text it records the position, the content, and the association with the surrounding context. The resulting labels are used to train scene text recognition models so that they can extract text from complex backgrounds.
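To make the three kinds of information concrete, here is a minimal sketch of what one annotation record might look like. The schema is a hypothetical illustration, not a standard dataset format. Position is stored as a quadrilateral polygon in pixel coordinates, content as a transcription string, and context association as an assumed `group_id` that links text instances belonging together (for example, words on the same sign). The class names, field names, and the image path are all made up for this example.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field


@dataclass
class TextInstance:
    # Position: quadrilateral corners (x, y) in pixels (hypothetical convention).
    polygon: list[tuple[float, float]]
    # Content: the transcribed text inside the polygon.
    transcription: str
    # Context association (hypothetical field): instances sharing an ID
    # belong together, e.g. words printed on the same billboard.
    group_id: int | None = None

    def bbox(self) -> tuple[float, float, float, float]:
        """Axis-aligned box (x_min, y_min, x_max, y_max) enclosing the polygon."""
        xs = [x for x, _ in self.polygon]
        ys = [y for _, y in self.polygon]
        return (min(xs), min(ys), max(xs), max(ys))


@dataclass
class SceneTextAnnotation:
    # Image the annotations refer to.
    image_path: str
    instances: list[TextInstance] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict recurses into nested dataclasses; tuples serialize as JSON arrays.
        return json.dumps(asdict(self))


# Usage: two words on one storefront sign, linked by the same group_id.
ann = SceneTextAnnotation(
    image_path="street_001.jpg",
    instances=[
        TextInstance([(10, 20), (110, 18), (112, 48), (12, 50)], "OPEN", group_id=0),
        TextInstance([(120, 18), (200, 20), (198, 50), (118, 48)], "24H", group_id=0),
    ],
)
print(ann.instances[0].bbox())  # (10, 18, 112, 50)
```

A polygon rather than an axis-aligned box is used here because scene text is often rotated or seen in perspective; the `bbox` helper shows how a coarser box can still be derived when a model expects one.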