PyTorch Encoder/Decoder

VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction (CVPR 2026)

VLM-3R is a unified Vision-Language Model (VLM) framework integrating 3D reconstructive instruction tuning for deep spatial understanding from monocular video. The rapid advancement of Large ...

CNX Software

AOMedia AV2 video codec draft specification release, and a quick try at the reference implementation

After 5 years of work and over 2700 commits against the reference software, the Alliance for Open Media (AOMedia) has recently released the AV2 specification. This next-generation open video codec ...

IEEE

Multi-Modal Sleep Stage Classification With Two-Stream Encoder-Decoder

Abstract: Sleep staging serves as a fundamental assessment for sleep quality measurement and sleep disorder diagnosis. Although current deep learning approaches have successfully integrated multimodal ...

IEEE

Accurate and Efficient Event-Based Semantic Segmentation Using Adaptive Spiking Encoder–Decoder Network

Abstract: Spiking neural networks (SNNs), known for their low-power, event-driven computation, and intrinsic temporal dynamics, are emerging as promising solutions for processing dynamic, asynchronous ...

GitHub

MapAnything: Universal Feed-Forward Metric

MapAnything is an open-source research framework for universal metric 3D reconstruction. At its core is a simple, end-to-end trained transformer model that directly regresses the factored metric 3D ...