CVAT Review
CVAT is the go-to open-source choice for computer vision teams who want control and flexibility, though you'll need technical resources to self-host or accept the limitations of the free cloud tier.
What is CVAT?
CVAT (Computer Vision Annotation Tool) is the most widely adopted open-source annotation platform for computer vision. Originally developed by Intel and now backed by the OpenCV Foundation, it has over 14,000 GitHub stars and a 4.8/5 rating on G2. The platform supports images, video, and 3D point clouds with annotation types including bounding boxes, polygons, polylines, points, skeletons, and cuboids.
What sets CVAT apart is the combination of open-source flexibility with professional cloud options. You can self-host for free with full functionality, or use the managed cloud service, whose paid plans start at $23/month. The platform exports to 19 formats (YOLO, COCO, PASCAL VOC, KITTI, and more), which avoids lock-in to any specific ML framework. AI-assisted annotation, through SAM 2 integration and auto-annotation, can speed up labeling significantly. The tradeoff is a steeper learning curve than simpler tools, and performance can suffer with very large files. For computer vision teams who want control over their annotation infrastructure without paying enterprise prices, CVAT is the default choice.
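Format choice matters because the same bounding box is encoded differently in each export. The sketch below is not CVAT code and does not show how CVAT converts internally. It is a hypothetical illustration (the `Box`, `from_coco`, `to_voc`, `to_coco`, and `to_yolo` names are made up for this example) of how one box looks in three of the formats CVAT exports: COCO stores `[x_min, y_min, width, height]` in pixels, PASCAL VOC stores corner coordinates, and YOLO stores a class id plus center and size normalized by image dimensions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical internal form: absolute pixel corners."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def from_coco(bbox):
    """COCO bbox: [x_min, y_min, width, height] in pixels."""
    x, y, w, h = bbox
    return Box(x, y, x + w, y + h)


def to_coco(box):
    """Back to COCO's [x_min, y_min, width, height]."""
    return [box.x_min, box.y_min, box.x_max - box.x_min, box.y_max - box.y_min]


def to_voc(box):
    """PASCAL VOC style corners: [xmin, ymin, xmax, ymax]."""
    return [box.x_min, box.y_min, box.x_max, box.y_max]


def to_yolo(box, class_id, img_w, img_h):
    """YOLO label line: class x_center y_center width height, all normalized to [0, 1]."""
    xc = (box.x_min + box.x_max) / 2 / img_w
    yc = (box.y_min + box.y_max) / 2 / img_h
    w = (box.x_max - box.x_min) / img_w
    h = (box.y_max - box.y_min) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"


if __name__ == "__main__":
    # A 200x100 box at (100, 50) on a hypothetical 640x480 image.
    box = from_coco([100, 50, 200, 100])
    print(to_voc(box))               # [100, 50, 300, 150]
    print(to_yolo(box, 0, 640, 480))  # 0 0.312500 0.208333 0.312500 0.208333
```

Because YOLO coordinates are normalized, converting back to pixels requires the original image size. This is one reason exporting directly in the format your training framework expects is simpler than converting by hand.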
Key Features
- ✓ Open-source with 14K+ GitHub stars and active community
- ✓ Auto-annotation with AI models (claimed up to 10x faster labeling)
- ✓ Segment Anything Model (SAM 2 and SAM 3) integration for segmentation
- ✓ 19 export formats: PASCAL VOC, YOLO, COCO, KITTI, and more
- ✓ 3D point cloud and cuboid annotation
- ✓ Cloud storage integration: AWS S3, Google Cloud Storage, Azure Blob Storage
- ✓ Three deployment options: cloud, self-hosted, or enterprise on-premise
- ✓ Backed by the OpenCV Foundation
Pros & Cons
Pros
- + Truly open-source: self-host for free with full functionality
- + 4.8/5 rating on G2, the highest-rated open-source annotation tool
- + Extensive export format support (19 formats) avoids vendor lock-in
- + 3D/LiDAR annotation included, unlike many competitors
- + Active community and continuous development (14K+ GitHub stars)
- + Cloud plans more affordable than enterprise competitors ($23/mo starting)
Cons
- − Performance degrades with very large video files or thousands of images
- − Steeper learning curve: the interface is complex for beginners
- − Self-hosted deployment requires technical expertise to maintain
- − Free cloud tier is very limited: 1 project, 3 tasks, 1 GB storage
- − No in-app notifications when annotators complete tasks
- − Computer vision only: no text, audio, or document annotation
Pricing
Pricing model: Open-source, freemium
Who Is CVAT Best For?
CVAT is ideal for computer vision teams who value open-source flexibility and want to avoid vendor lock-in. The self-hosted option is a strong fit for organizations whose security requirements prevent storing training data in the cloud, and for teams with the engineering resources to take full control. The cloud plans work well for smaller teams who want managed infrastructure without enterprise pricing. CVAT is less suited to teams needing multimodal annotation (text, audio, documents); Label Studio or Labelbox are better fits there. It is also not ideal for non-technical teams without engineering support to handle a self-hosted deployment, or for organizations needing extensive managed services and dedicated support.
Frequently Asked Questions
Is CVAT free?
What data types does CVAT support?
How does CVAT compare to Label Studio?
Is CVAT good for large datasets?
What export formats does CVAT support?
Does CVAT have AI-assisted labeling?
Who maintains CVAT?
Alternatives to CVAT
- Enterprise teams needing high-volume, multi-language labeling with managed workforce
- Computer vision teams wanting fast AI-assisted annotation with training and deployment built in
- Frontier AI labs and enterprises needing LLM training data, RLHF, or autonomous vehicle annotation at scale
- Teams needing AI-assisted annotation for images, video, or medical imaging with compliance requirements
- Teams needing multimodal annotation with a strong free tier and path to enterprise scale
- Small teams wanting AI-assisted annotation with transparent pricing and no minimum commitment
- Enterprise teams building production AI pipelines who need annotation, model training, and deployment in one platform
- Teams needing multimodal annotation flexibility who can invest time in template configuration
- Computer vision teams needing specialized support for medical imaging, LiDAR, or 3D data with built-in AI models
- Autonomous vehicle and robotics teams needing LiDAR annotation with integrated multi-sensor calibration
- Enterprise teams deploying production LLMs who need human-in-the-loop evaluation and hallucination detection
- AWS-native ML teams who want a managed labeling service integrated with SageMaker training pipelines
- Computer vision teams who want AI-assisted annotation combined with optional managed workforce services
- Enterprise teams needing multimodal annotation with strong compliance, custom workflows, and optional managed labeling services
- Enterprise teams building physical AI (robotics, autonomous vehicles) or medical AI who need multimodal annotation with 3D/LiDAR and DICOM support
- Large enterprises with dedicated AI teams who want to replace manual labeling with programmatic weak supervision for text and structured data