T-Rex Label

Cross-Modal Agent

Cross-modal agents integrate text, image, and video inputs to perform tasks such as visual question answering and multimodal content generation. Combining evidence across modalities can improve accuracy in complex scenarios where any single input type is ambiguous or incomplete.
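
As a rough illustration of what "integrating" several input types can mean, the sketch below uses late fusion: each modality is scored separately and the scores are summed before an answer is picked. This is only one possible design, and it is not taken from any particular product or library. The names `Observation`, `CrossModalAgent`, and the three toy encoders are hypothetical, and the hard-coded scores stand in for real model outputs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


# Hypothetical types for illustration only; not from a real library.
@dataclass
class Observation:
    modality: str  # "text", "image", or "video"
    payload: str   # stand-in for raw data (a sentence, a file path, ...)


# An encoder maps one payload to a score per candidate answer.
Encoder = Callable[[str], Dict[str, float]]


class CrossModalAgent:
    """Late-fusion sketch: sum per-modality scores, return the top answer."""

    def __init__(self, encoders: Dict[str, Encoder]):
        self.encoders = encoders

    def answer(self, observations: List[Observation]) -> str:
        totals: Dict[str, float] = {}
        for obs in observations:
            encoder = self.encoders.get(obs.modality)
            if encoder is None:
                continue  # skip modalities this agent has no encoder for
            for label, score in encoder(obs.payload).items():
                totals[label] = totals.get(label, 0.0) + score
        if not totals:
            raise ValueError("no supported modality in the input")
        return max(totals, key=lambda label: totals[label])


# Toy encoders with made-up scores (assumptions, standing in for real models).
def text_encoder(payload: str) -> Dict[str, float]:
    if "whiskers" in payload:
        return {"cat": 0.6, "dog": 0.4}
    return {"cat": 0.5, "dog": 0.5}


def image_encoder(payload: str) -> Dict[str, float]:
    return {"cat": 0.3, "dog": 0.7}


def video_encoder(payload: str) -> Dict[str, float]:
    return {"cat": 0.2, "dog": 0.8}


if __name__ == "__main__":
    agent = CrossModalAgent(
        {"text": text_encoder, "image": image_encoder, "video": video_encoder}
    )
    observations = [
        Observation("text", "it has whiskers"),
        Observation("image", "photo.jpg"),
        Observation("video", "clip.mp4"),
    ]
    print(agent.answer(observations))
```

With text alone, the toy agent leans toward one answer; once the image and video scores are added, the combined total can favor a different one. That is the sense in which extra modalities can change, and ideally improve, the result.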