Introduced in the paper "Roboflow 100-VL: A Multi-Domain Object Detection Benchmark for Vision-Language Models", RF100-VL is a large-scale collection of 100 multi-modal datasets with diverse concepts ...
Abstract: Enlarging input images is a straightforward and effective approach to promote small object detection. However, simple image enlargement is significantly expensive on both computations and ...
Abstract: Cross-modality can integrate complementary information from different modalities to improve the reliability and robustness of object detection effectively. However, compared to processing ...