T-Rex Label

DINO-XSeek

DINO-XSeek is a referring object detection model built on a multimodal large language model, designed to precisely locate objects according to natural language descriptions input by users.

It can process complex instructions involving attributes, positions, interactions, and reasoning, seamlessly fusing language and visual information. With wide applicability in fields like smart homes, augmented reality, and robotics, DINO-XSeek effectively enhances the intelligence of human-machine interactions.