A Flash-Attention-optimized CNN–Transformer framework for real-time video restoration, recognised with a Best Paper Award.
FACTNet is a hybrid CNN–Transformer architecture for real-time video restoration. It pairs the local feature extraction of convolutional networks with the long-range modelling of Transformers, and its Flash-Attention-optimized attention keeps inference fast enough for real-time use.
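To make the efficiency idea concrete, here is a minimal sketch of the block-wise "online softmax" computation that Flash-Attention-style kernels rely on. It is not FACTNet's implementation: the function names, the `block_size` parameter, and the pure-Python list representation are all assumptions chosen for illustration. It only shows that attention can be computed one key/value block at a time, without ever holding the full score row, while matching the ordinary result.

```python
import math


def naive_attention(q, k, v):
    """Reference softmax(q k^T / sqrt(d)) v, building each full score row."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) / total
                    for c in range(len(v[0]))])
    return out


def tiled_attention(q, k, v, block_size=2):
    """Same result, but keys/values are consumed in blocks (hypothetical helper).

    For each query we keep a running max, a running softmax denominator and
    an unnormalised output accumulator, rescaling them whenever a new block
    raises the max. Only one block of scores exists at any time.
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    dv = len(v[0])
    out = []
    for qi in q:
        running_max = -math.inf
        running_sum = 0.0
        acc = [0.0] * dv
        for start in range(0, len(k), block_size):
            k_blk = k[start:start + block_size]
            v_blk = v[start:start + block_size]
            scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k_blk]
            new_max = max(running_max, max(scores))
            # Rescale earlier partial results to the new reference max.
            correction = math.exp(running_max - new_max)
            weights = [math.exp(s - new_max) for s in scores]
            running_sum = running_sum * correction + sum(weights)
            acc = [a * correction + sum(w * vj[c] for w, vj in zip(weights, v_blk))
                   for c, a in enumerate(acc)]
            running_max = new_max
        out.append([a / running_sum for a in acc])
    return out


# Small deterministic inputs: 3 queries, 5 keys/values, feature size 4.
q = [[0.1 * (i + j) for j in range(4)] for i in range(3)]
k = [[0.2 * (i - j) for j in range(4)] for i in range(5)]
v = [[float(i * 4 + j) for j in range(4)] for i in range(5)]

reference = naive_attention(q, k, v)
tiled = tiled_attention(q, k, v, block_size=2)
max_diff = max(abs(a - b) for ra, rb in zip(reference, tiled) for a, b in zip(ra, rb))
```

In a real GPU kernel the same bookkeeping is applied to tiles of queries as well, and the whole loop is fused so that intermediate scores stay in fast on-chip memory; that fusion, rather than any change to the mathematics, is where the speed-up comes from.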
Real-time restoration unlocks practical deployment in streaming, surveillance, and low-quality video enhancement where latency budgets are tight. FACTNet shows that attention-heavy restoration can be made efficient without sacrificing quality.