Image Annotation Tool Comparison: Best Options for ML, QA, and Docs
The term "image annotation tool" covers two very different things, and picking the wrong category wastes real time.
ML annotation tools (Labelbox, CVAT, LabelImg, Roboflow, V7, Supervisely) produce structured labels — bounding boxes, segmentation masks, keypoints, polygons — that train computer vision models. They export COCO JSON, YOLO TXT, Pascal VOC XML, and similar formats that model training pipelines expect.
General annotation tools (Skitch, Markup Hero, Annotely) are for humans to communicate visually — drawing arrows, adding text callouts, highlighting UI bugs in screenshots. They export PNG or PDF. Feeding their output into a training pipeline would break things.
Pick the wrong one and you're either dealing with a feature-heavy platform that can't just draw a red arrow on a screenshot, or a screenshot tool that can't produce the bounding box JSON your PyTorch pipeline needs.
This guide covers both categories, with specific recommendations for each use case.
ML Image Annotation Tools
These tools produce machine-readable labels for training computer vision models. If your workflow includes YOLO, detectron2, Hugging Face datasets, or any model that needs structured ground truth data, you need one of these.
Labelbox
License: Proprietary (SaaS) | Pricing: Free tier (5,000 data rows), paid from ~$800/mo (enterprise custom)
Labelbox is the enterprise standard for large ML annotation projects. It handles bounding boxes, polygons, semantic segmentation, keypoints, and video frame annotation. The standout feature is its consensus and review workflow — multiple annotators label the same image, Labelbox calculates inter-annotator agreement, and you can route low-confidence annotations for additional review. This matters a lot when you're paying external annotation contractors and need to audit quality systematically.
Best for: Teams running annotation at scale with external contractors, compliance requirements, or complex multi-class taxonomies.
Exports: COCO JSON, NDJSON (Labelbox native), CSV. Integrations with AWS, GCS, and Azure for data storage.
Weakness: The free tier is restrictive for anything beyond prototyping. Small teams with modest budgets hit the ceiling quickly.
CVAT (Computer Vision Annotation Tool)
License: MIT | Pricing: Free (self-hosted); cloud version has a free tier with paid plans above it
CVAT is the open-source option most serious teams reach for. Intel originally developed it; it's now maintained by CVAT.ai as an open-core project. The self-hosted version is MIT-licensed with no seat limits or data row caps — you pay only for the compute running it. Docker deployment takes under 30 minutes.
Annotation types: bounding boxes, polygons, polylines, points, ellipses, cuboids (for 3D data), and tracks for video annotation. It has a semi-automatic annotation feature using pre-trained models — you run a detector, review the results, and correct mistakes instead of labeling from scratch. On dense datasets this can cut annotation time by 40-60%.
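The review step in that semi-automatic workflow boils down to routing: high-confidence pre-labels are accepted as-is, low-confidence ones go to a human. A minimal sketch of that routing logic — the detection format, function name, and threshold here are illustrative, not CVAT's actual API:

```python
# Sketch of the review-routing step in a semi-automatic workflow: a detector
# pre-labels images, and only low-confidence boxes are queued for a human.
# The detection tuple format and threshold are illustrative assumptions.

def route_detections(detections, accept_threshold=0.85):
    """Split pre-labels into auto-accepted boxes and ones needing review.

    Each detection is (class_name, confidence, (x, y, w, h)).
    """
    accepted, needs_review = [], []
    for det in detections:
        _, confidence, _ = det
        (accepted if confidence >= accept_threshold else needs_review).append(det)
    return accepted, needs_review

detections = [
    ("car", 0.97, (10, 20, 120, 80)),     # clear detection: keep as-is
    ("car", 0.55, (300, 40, 90, 60)),     # uncertain: send to an annotator
    ("person", 0.91, (150, 10, 40, 110)),
]
accepted, needs_review = route_detections(detections)
```

The threshold is the lever: raise it and annotators see more boxes but catch more detector mistakes; lower it and you save time at the cost of noisier labels.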
Best for: Teams that need self-hosted annotation with no per-seat cost, or anyone who wants full control over their data pipeline.
Exports: COCO JSON, PASCAL VOC, YOLO TXT, TFRecord, CVAT XML, MOT CSV, and more.
Weakness: The UI is functional, not polished. Onboarding new annotators requires training.
LabelImg
License: MIT | Pricing: Free (open-source, local desktop app)
LabelImg is a lightweight Python desktop app for bounding box annotation. It has been around since 2015, handles PASCAL VOC XML and YOLO TXT natively, and runs locally with zero server infrastructure. Install it with pip and launch it from the terminal:

```shell
pip install labelImg==1.8.6
labelImg
```
It does one thing — rectangular bounding boxes — and does it well. No segmentation, no polygon support, no video annotation. For simple object detection datasets where you need bounding boxes and want zero infrastructure overhead, it is still the fastest path.
Best for: Individual researchers and developers annotating local datasets for object detection tasks.
Weakness: No collaboration, no review workflow, limited to bounding boxes. Development is slow — the main repo has not seen significant feature updates since 2021.
Roboflow
License: Proprietary (SaaS) | Pricing: Free tier (1,000 images, 3 workspaces), paid from $249/mo
Roboflow occupies a useful position: it handles annotation, but also dataset versioning, augmentation pipelines, model training integrations, and one-click export to 30+ formats. For teams building and iterating on computer vision models — not just annotating — the end-to-end workflow is genuinely useful. You annotate, generate augmented versions (flip, crop, brightness), export to YOLO, and train, all without leaving the platform.
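One detail augmentation pipelines handle for you: when an image is transformed, its annotations must transform with it. A horizontal flip, for example, mirrors each box's x-center. A pure-Python sketch in normalized YOLO xywh coordinates (this illustrates the coordinate math, not Roboflow's internals):

```python
# When an augmentation flips an image horizontally, box annotations must flip
# too. Boxes here are (class_id, x_center, y_center, width, height) with all
# coordinates normalized to 0..1, as in YOLO TXT labels.

def hflip_box(box):
    cls, xc, yc, w, h = box
    return (cls, 1.0 - xc, yc, w, h)   # only the x-center mirrors

box = (0, 0.25, 0.40, 0.10, 0.20)      # class 0, on the left side of the image
flipped = hflip_box(box)               # now on the right side
```

Crops and rotations need more involved transforms (clipping boxes against the crop window, recomputing extents), which is exactly why doing augmentation inside the platform instead of by hand is attractive.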
The annotation UI supports bounding boxes, polygons, and instance segmentation. Smart polygon mode uses an embedded SAM-style model to suggest polygon boundaries from a rough selection.
Best for: Computer vision teams who want annotation + dataset management + augmentation in one place, especially those training YOLO models.
Exports: YOLOv5/v8, COCO JSON, Pascal VOC, TFRecord, CreateML, and ~25 more.
Weakness: The free tier's 1,000-image cap runs out fast. Pricing jumps steeply.
V7 (formerly Darwin)
License: Proprietary (SaaS) | Pricing: Free tier (2,000 items), paid from $300/mo
V7 competes with Labelbox at the enterprise tier but with a stronger focus on automated annotation. Its standout feature is Auto-Annotate — you give it a few labeled examples, it labels the rest of the dataset using a fine-tuned model, and you review the results. On datasets with repetitive content (a product catalog with similar backgrounds, a medical imaging dataset with consistent anatomy), this can reduce manual annotation effort by 70%+.
It also has a proper version control model for datasets, which Labelbox and Roboflow handle less elegantly.
Best for: Teams with large datasets and at least a few hundred existing labels to bootstrap the auto-annotation model.
Exports: COCO JSON, YOLO, Darwin JSON (native), Pascal VOC, CSV.
Weakness: Pricing is enterprise-oriented. The auto-annotation quality depends heavily on how representative your seed labels are.
Supervisely
License: Proprietary (SaaS, community edition available) | Pricing: Community edition free (limited), paid from $660/mo
Supervisely is the most feature-complete option on this list. Beyond standard annotation types, it supports 3D point cloud annotation (LiDAR data), video object tracking, DICOM medical imaging, and satellite imagery. It also has a marketplace of pre-built neural networks you can run directly inside the platform.
For most teams, this is too much. The UI is complex, and the paid pricing is high. But if your use case involves medical imaging, autonomous driving datasets, or satellite imagery, Supervisely has capabilities the others lack.
Best for: Teams with specialized annotation needs — 3D point clouds, medical imaging (DICOM), or satellite data.
Exports: COCO JSON, YOLO, Pascal VOC, Cityscapes, and Supervisely native format.
ML Annotation Tools Comparison Table
| Tool | License | Annotation Types | Export Formats | Pricing |
|---|---|---|---|---|
| Labelbox | Proprietary | BBox, polygon, segmentation, keypoints, video | COCO JSON, NDJSON, CSV | Free (5K rows); enterprise from ~$800/mo |
| CVAT | MIT (self-hosted) | BBox, polygon, polylines, points, cuboids, video tracks | COCO, PASCAL VOC, YOLO, TFRecord, CVAT XML | Free (self-hosted); cloud free tier available |
| LabelImg | MIT | Bounding boxes only | PASCAL VOC XML, YOLO TXT | Free |
| Roboflow | Proprietary | BBox, polygon, instance segmentation | YOLO, COCO, Pascal VOC, TFRecord, 30+ more | Free (1K images); paid from $249/mo |
| V7 (Darwin) | Proprietary | BBox, polygon, segmentation, keypoints, video | COCO, YOLO, Darwin JSON, Pascal VOC, CSV | Free (2K items); paid from $300/mo |
| Supervisely | Proprietary (community ed.) | BBox, polygon, 3D point cloud, video, DICOM | COCO, YOLO, Pascal VOC, Cityscapes | Community free (limited); paid from $660/mo |
Preparing Images Before ML Annotation
Training datasets have strict requirements. Standard ResNet models expect 224×224 inputs; YOLO variants typically train at 640×640. If your source images are 4000×3000 photos from a DSLR and your annotation tool resizes them for display but exports labels at original resolution, the exported coordinates can end up mismatched with the images your training pipeline actually loads.
Resize training images to your target input resolution before annotating. This avoids coordinate scaling bugs and makes annotation faster — annotators work on smaller images with less panning.
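If you do end up annotating at original resolution, scale the exported pixel coordinates to the training resolution explicitly rather than trusting the tool's display scaling. A sketch for axis-aligned boxes (the function name and tuple layout are illustrative):

```python
# Map a pixel-space (x, y, w, h) box from the annotated resolution to the
# training resolution. Applies to pixel-coordinate formats like COCO or
# Pascal VOC; YOLO's normalized coordinates survive a resize unchanged.

def scale_box(box, src_size, dst_size):
    x, y, w, h = box
    sx = dst_size[0] / src_size[0]     # horizontal scale factor
    sy = dst_size[1] / src_size[1]     # vertical scale factor
    return (x * sx, y * sy, w * sx, h * sy)

# A box annotated on a 4000x3000 DSLR photo, mapped to a 640x640 training input:
scaled = scale_box((1000, 600, 400, 300), (4000, 3000), (640, 640))
```

Note that a non-uniform resize like this one (4:3 source to square target) stretches the boxes along with the pixels, which is fine as long as the same resize is applied to the images themselves.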
If your source images are in formats your annotation tool doesn't accept well (HEIC from iPhones, TIFF from scientific instruments), convert them to JPEG or PNG first. CVAT and Roboflow handle JPEG and PNG most reliably; exotic formats can cause silent rendering issues that corrupt exported coordinates.
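A minimal Pillow sketch of that normalization step: re-encode anything Pillow can read as lossless PNG before upload. HEIC needs an extra plugin such as pillow-heif, which is not shown here; the function below is an illustrative helper, not part of any tool's API.

```python
# Re-encode an image as PNG (lossless) so the annotation tool renders it
# predictably. Works for TIFF and other Pillow-readable formats.
import io
from PIL import Image

def to_png_bytes(image_bytes):
    """Re-encode any Pillow-readable image as PNG, preserving pixel data."""
    img = Image.open(io.BytesIO(image_bytes))
    out = io.BytesIO()
    img.convert("RGB").save(out, format="PNG")
    return out.getvalue()

# Round-trip a small in-memory TIFF to demonstrate:
buf = io.BytesIO()
Image.new("RGB", (8, 8), (200, 30, 30)).save(buf, format="TIFF")
png_bytes = to_png_bytes(buf.getvalue())
```

PNG is the safe target here because it is lossless; converting scientific TIFFs to JPEG would discard detail before anyone has annotated it.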
General Annotation Tools (Screenshots, Documentation, Bug Reports)
These tools are for humans to talk to other humans through images — adding callouts to screenshots, marking up UI bugs, annotating design mockups. They are not for producing ML training data.
Skitch
License: Proprietary (freeware) | Pricing: Free | Platform: macOS, iOS
Skitch is Evernote's screenshot markup tool. It has been free for years and is genuinely fast for its primary use case: take a screenshot on macOS, add arrows and text callouts, export. The annotation layer does not produce structured data — it burns arrows and text directly into the image.
Best for: Quick one-off screenshot annotations on macOS. Nothing faster for adding a red arrow and a label.
Weakness: Mac-only, no browser version, no collaboration, no annotation history.
Markup Hero
License: Proprietary (SaaS) | Pricing: Free tier (limited history), paid from $4/mo
Markup Hero runs in the browser and keeps an annotation history. You paste a URL, upload an image, or paste from clipboard — it loads the image, you draw on it, and it generates a shareable link. The persistent history and shareable links make it useful for bug reports in project management tools.
Best for: Teams doing async bug reporting or design feedback who need shareable annotated screenshots with revision history.
Weakness: The free tier limits how many annotations you can save. Not useful for large-scale annotation workflows.
Annotely
License: Proprietary (SaaS) | Pricing: Free tier, paid from $8/mo
Annotely is similar to Markup Hero but with a stronger focus on embedded annotations for documentation. It generates an embeddable widget showing your annotated image with numbered callouts that expand on hover — useful for product documentation and tutorials where you want interactive callouts rather than a flat annotated PNG.
Best for: Product documentation teams who want interactive annotated images on a website or knowledge base.
Weakness: Overkill for simple bug reports.
Screenshot and General Annotation Tools Comparison Table
| Tool | License | Platform | Collaboration | Export | Pricing |
|---|---|---|---|---|---|
| Skitch | Proprietary (freeware) | macOS, iOS | No | PNG, PDF | Free |
| Markup Hero | Proprietary | Web (all platforms) | Shareable links | PNG, PDF | Free (limited history); paid from $4/mo |
| Annotely | Proprietary | Web (all platforms) | Team workspaces | PNG, embeddable widget | Free tier; paid from $8/mo |
Which Tool Should You Use?
Training a computer vision model?
- Self-hosted, no budget: CVAT (MIT). Run it on a $20/mo VPS.
- Small dataset, fast iteration, YOLO: Roboflow free tier.
- Large team, external contractors, compliance: Labelbox.
- Auto-annotation to reduce manual work: V7.
- 3D, medical, or satellite data: Supervisely.
Simple object detection, working solo? LabelImg. Install it with pip, annotate your images, export YOLO TXT. No servers, no accounts, no pricing conversations.
Annotating screenshots for a bug report? Skitch (macOS) or Markup Hero (cross-platform). If you need the annotation to live in documentation with interactive callouts, use Annotely.
Annotating design mockups for developer handoff? Figma has annotation plugins (Figma is proprietary, from $12/mo per editor) that are better integrated than any standalone tool for this use case.
Preparing Annotated Images for Use
After annotation, images sometimes need post-processing before they are useful downstream. A few common cases:
Documentation screenshots annotated in Skitch or Markup Hero often come out at 2× retina resolution — 2880×1800 from a MacBook Pro screenshot. For a blog post or Confluence page, that is a 3MB PNG that makes the page slow to load. Resize it to half the display dimensions and compress it before embedding.
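The halving step is a two-liner with Pillow. This sketch stands in a blank image for a real capture; LANCZOS resampling keeps text in screenshots readable after downscaling:

```python
# Halve a 2x retina screenshot back to its display dimensions before
# embedding it in docs. LANCZOS is the highest-quality downscaling filter.
from PIL import Image

def halve_retina(img):
    w, h = img.size
    return img.resize((w // 2, h // 2), Image.LANCZOS)

screenshot = Image.new("RGB", (2880, 1800), "white")  # stand-in for a real capture
halved = halve_retina(screenshot)
```

Follow the resize with PNG optimization or a JPEG/WebP re-encode, depending on where the screenshot is going.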
For format decisions on annotated assets, the same rules apply as for any web image. If you want to understand the tradeoffs before choosing JPEG, PNG, or WebP for your annotated screenshots, the best image format for web guide covers the current state with test data.
Image resolution matters for both annotation quality and downstream use. If you are working with images of unknown origin, understanding image resolution before deciding on a resize target will save you from upscaling artifacts or over-compression.
FAQ
What is the difference between image annotation and image labeling? The terms are used interchangeably. "Annotation" is more common in computer vision research contexts; "labeling" appears more in business/product discussions. Both refer to adding metadata (labels, bounding boxes, segmentation masks) to images to produce training data.
Can I use CVAT for free commercially? Yes. CVAT's self-hosted version is MIT-licensed, which permits commercial use without restriction. The CVAT.ai cloud offering has separate commercial terms — the self-hosted code does not.
What annotation format should I use for YOLO training?
YOLO TXT format: one .txt file per image, one object per line, each line a class id followed by normalized center-x, center-y, width, and height. All major annotation tools export it: CVAT, LabelImg, Roboflow, V7. If you are using Ultralytics YOLOv8, Roboflow's direct YOLOv8 export is the most reliable — it handles the data.yaml class configuration file correctly.
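Here is what building one YOLO label line from a pixel-space box looks like; the helper name is illustrative, but the output format is the standard YOLO TXT layout:

```python
# One line of a YOLO TXT label file, built from a pixel-space box.
# Format: "<class_id> <x_center> <y_center> <width> <height>", with all
# coordinates normalized to 0..1 by the image dimensions.

def to_yolo_line(class_id, box, img_w, img_h):
    x, y, w, h = box                   # pixel-space top-left x, y, width, height
    xc = (x + w / 2) / img_w           # normalized box center
    yc = (y + h / 2) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w / img_w:.6f} {h / img_h:.6f}"

line = to_yolo_line(0, (100, 200, 50, 80), 640, 640)
```

The center-point convention is the usual migration pitfall: COCO and Pascal VOC store top-left corners in pixels, so a naive copy of coordinates between formats silently shifts every box.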
How many images do I need to annotate before training a model? There is no universal answer, but a practical starting point for object detection: 200-500 annotated instances per class, with at least 3 instances per image. With a pre-trained backbone (transfer learning from COCO), you can get a usable detector at 500 instances per class. Starting from scratch requires 5,000+.
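Checking your dataset against that rule of thumb is straightforward: count annotation instances per category in a COCO JSON export. A sketch with a tiny inline dataset (real exports would be read with json.load from a file):

```python
# Count annotation instances per class in a COCO-format export, to check
# against the 200-500 instances-per-class starting point.
import json
from collections import Counter

coco = json.loads("""{
  "categories": [{"id": 1, "name": "car"}, {"id": 2, "name": "person"}],
  "annotations": [
    {"id": 1, "image_id": 1, "category_id": 1},
    {"id": 2, "image_id": 1, "category_id": 1},
    {"id": 3, "image_id": 2, "category_id": 2}
  ]
}""")

names = {c["id"]: c["name"] for c in coco["categories"]}
counts = Counter(names[a["category_id"]] for a in coco["annotations"])
```

Classes far below the threshold are where to direct the next round of annotation effort.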
What is the best free image annotation tool? For ML annotation: CVAT (self-hosted, MIT). For screenshot markup: Skitch (macOS) or Markup Hero (web).
Do image annotation tools work with video? CVAT, Labelbox, V7, and Supervisely all support video annotation with frame interpolation — you annotate keyframes and the tool interpolates object positions between them. LabelImg and Roboflow's free tier do not support video.
Can I export annotations from one tool and import them into another? Yes, via common formats. COCO JSON is the most portable — all major tools export it and most import it. If you need to migrate from Roboflow to CVAT, export COCO JSON from Roboflow and import into CVAT. Expect some minor field mapping discrepancies that require a conversion script.
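The most common field-mapping discrepancy is category ids: the two tools agree on class names but number them differently, so every annotation's category_id must be remapped by name. A sketch of that conversion (real COCO records carry more fields than shown; the extras pass through untouched):

```python
# Remap COCO annotation category_ids from a source tool's taxonomy to a
# destination tool's taxonomy, matching categories by name.

def remap_category_ids(annotations, src_categories, dst_categories):
    src_names = {c["id"]: c["name"] for c in src_categories}
    dst_ids = {c["name"]: c["id"] for c in dst_categories}
    return [
        {**a, "category_id": dst_ids[src_names[a["category_id"]]]}
        for a in annotations
    ]

src = [{"id": 1, "name": "car"}, {"id": 2, "name": "person"}]
dst = [{"id": 7, "name": "person"}, {"id": 9, "name": "car"}]
anns = [{"id": 1, "category_id": 1}, {"id": 2, "category_id": 2}]
remapped = remap_category_ids(anns, src, dst)
```

A KeyError here is useful information: it means the destination taxonomy is missing a class that exists in the source export.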
What is inter-annotator agreement and why does it matter? When multiple annotators label the same image, they often disagree — one annotator draws a tighter bounding box, another includes more background. Inter-annotator agreement (measured with Cohen's Kappa or Intersection over Union) tells you whether your annotation guidelines are clear and whether annotator disagreements are introducing noise into your training data. Labelbox and V7 surface this metric automatically. For high-stakes applications (medical imaging, autonomous driving), monitor it.
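For bounding boxes, the IoU half of that measurement is simple to compute yourself. A sketch comparing two annotators' boxes on the same object (boxes as pixel-space x, y, width, height):

```python
# Intersection over Union between two annotators' boxes on the same object.
# Values near 1.0 mean tight agreement; a looser box drags the score down.

def iou(a, b):
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))   # overlap width
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))   # overlap height
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

# One annotator draws a tighter box than the other:
agreement = iou((100, 100, 50, 50), (95, 95, 60, 60))
```

Averaged over a shared batch of images, this gives a quick agreement score even in tools that do not surface the metric natively.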