Vision Transformers (ViT) Explained + Fine-tuning in Python
ViT Paper Read Paragraph by Paragraph [Paper Close Reading]
Image Classification Using Vision Transformer | ViTs
EfficientViT Street Scene Segmentation Demo
Semantic Segmentation - Segformer trained on BDD100k drivable area
[CVPR 2023] EfficientViT: Memory Efficient Vision Transformer With Cascaded Group Attention
A-ViT: Adaptive Tokens for Efficient Vision Transformer | CVPR 2022
Transformers for Structural Extraction
[CVPR 2023] Neural Fourier Filter Bank
Resolution-robust Large Mask Inpainting with Fourier Convolutions
Learned Queries for Efficient Local Attention | CVPR 2022
Lecture 20 - Efficient Transformers | MIT 6.S965
Semantic Segmentation on Cityscapes demo Video (Using CCNet)
Helm.ai Urban Driving Scene Semantic Segmentation
EfficientML.ai Lecture 14 - Vision Transformer (MIT 6.5940, Fall 2023)
Semantic Segmentation | Cityscape Dataset | Overlaid Results | Autonomous Driving
CVPR2023: Slide-Transformer: Hierarchical Vision Transformer with Local Self-Attention
Semantic Segmentation | Cityscape Dataset | CNN's output | Separate Channels | Autonomous Driving
VoxFormer: Sparse Voxel Transformer for Camera-based 3D Semantic Scene Completion | CVPR2023
BANMo: Building Animatable 3D Neural Models From Many Casual Videos | CVPR 2022