
Computer vision models learn from the labels you give them. The shape of that label — not just the class — determines how well the model learns to distinguish an object from its background, its neighbours, and its variations across lighting, growth stage, and season. Bounding boxes are fast to draw. Semantic masks are pixel-perfect. Polygon annotation sits between the two, and for most agricultural and industrial computer vision applications it is the right choice.

This guide explains what polygon annotation is, how it compares to other annotation types, and when it should be your first call.

What is polygon annotation?

Polygon annotation is a technique for labelling objects in images by drawing a closed multi-point outline that follows the actual contours of the object. Each polygon is defined by a set of vertices connected by straight-line segments. The resulting shape closely matches the visible boundary of the object being labelled — it is neither a forced rectangle nor a full pixel-by-pixel mask, but a flexible closed contour fitted to the object's real outline.

A polygon can have as few as 3 vertices or as many as the task demands: 150 or more for a complex leaf with a serrated margin. It can be convex or concave. It can follow the silhouette of a leaf, a stem, a crop row, a diseased plant, or any object whose shape matters to what the model needs to learn.

Core goal: Polygon annotation is about contour fidelity — the labelled region should include what the model needs to learn and exclude what it does not. Every extraneous pixel included, and every important pixel excluded, changes what the model trains on.

How polygon annotation compares to other types

Four annotation types are used routinely in computer vision projects. Each has its place.

Bounding box
A rectangle drawn around the object. Fast to draw and easy to validate. The standard for detection tasks and large-volume annotation at scale. The limitation: for non-rectangular objects, a bounding box includes significant background — soil, neighbouring plants, sky — that the model must learn to ignore. In dense imagery, adjacent bounding boxes overlap heavily, which creates ambiguity.
Polygon
A closed multi-point outline that follows the object's actual contour. More precise than a bounding box without the cost of per-pixel labelling. The right choice for irregular shapes, overlapping instances, and segmentation tasks. Polygons are a standard training input for instance segmentation architectures such as Mask R-CNN and YOLO-seg, either read directly or rasterised into masks, and the masks produced by models such as SAM convert readily back to polygons. Requires annotators who understand the domain: edge-case decisions require judgment, not just instruction-following.
Semantic mask
Every pixel in the image is labelled with a class. The most precise annotation type, and the most expensive to produce: it requires specialised tools and significantly more annotator time per image. Appropriate when pixel-level boundary accuracy is required, such as in medical imaging or high-resolution materials inspection. For most agricultural tasks, polygon annotation achieves comparable model accuracy at a fraction of the cost.
Keypoint
Named points placed at specific anatomical or structural locations on the object — joints, landmarks, corners. Used for pose estimation, skeletal tracking, and geometric measurement tasks. Not a substitute for polygon annotation: keypoints describe structure, not boundary.

When to use polygon annotation

The decision comes down to three questions: what shape is the object, what does the model need to learn from its boundary, and what precision does the downstream inference task require?

Objects are not rectangular. Potato leaves, tulip stems, plant canopies, and most naturally occurring objects are irregular in shape. A bounding box around a potato plant would include soil, adjacent plants, and sky. The model then has to work out, from many such examples, which pixels inside the box actually belong to the plant. A polygon removes that ambiguity by giving the model a clean boundary to learn from.
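A quick way to see how much background a box carries is to compare the polygon's area (shoelace formula) with the area of its tight bounding box. The sketch below does this for a hypothetical plus-shaped canopy outline; the coordinates are invented for illustration and the function names are our own.

```python
def polygon_area(vertices):
    """Shoelace formula: area of a simple (non-self-intersecting) polygon."""
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def background_fraction(vertices):
    """Share of the polygon's tight bounding box that falls outside it."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    box_area = (max(xs) - min(xs)) * (max(ys) - min(ys))
    return 1.0 - polygon_area(vertices) / box_area


# Hypothetical plus-shaped canopy outline in pixel coordinates
plant = [(40, 0), (60, 0), (60, 40), (100, 40), (100, 60), (60, 60),
         (60, 100), (40, 100), (40, 60), (0, 60), (0, 40), (40, 40)]

print(f"{background_fraction(plant):.0%} of the box is background")
# prints: 64% of the box is background
```

For this outline the plant fills only 36% of its box, so nearly two thirds of a box-labelled "plant" example would be soil or neighbours.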

Precise boundaries matter for inference. If your model needs to estimate disease coverage — what percentage of a leaf surface shows discolouration — you need a polygon that follows the leaf margin precisely. A bounding box conflates sick leaf with surrounding healthy tissue. The model learns the wrong thing.
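The coverage figure is a ratio of areas. As a sketch, assuming lesions are annotated as their own polygons $L_k$ inside a leaf polygon $P_{\text{leaf}}$ (an assumption; your spec may label only whole leaves), each area follows from the shoelace formula over the vertices $(x_i, y_i)$:

$$
A(P) = \frac{1}{2}\left|\sum_{i=0}^{n-1}\left(x_i\,y_{i+1} - x_{i+1}\,y_i\right)\right|, \qquad (x_n, y_n) \equiv (x_0, y_0)
$$

$$
\text{coverage} = \frac{\sum_k A(L_k)}{A(P_{\text{leaf}})}
$$

With hypothetical numbers of 300 px² of lesion on a 2,000 px² leaf whose bounding box is 5,000 px², the polygon gives $300 / 2000 = 15\%$, while using the box as the denominator gives $300 / 5000 = 6\%$: the background inside the box dilutes the measurement.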

You are training a segmentation model. Instance segmentation architectures are trained on polygon-defined instances, or on binary masks rasterised from them. A bounding box carries no information about where the object's boundary lies inside it, so a standard segmentation model trained on boxes alone cannot be expected to output accurate masks. If your model needs to output masks, your training data needs polygon (or mask) annotation.
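Turning a polygon into the binary mask that a segmentation loss consumes is a rasterisation step. The sketch below uses even-odd ray casting at pixel centres in pure Python so it runs anywhere; production pipelines typically use a library routine such as OpenCV's `cv2.fillPoly` or the pycocotools mask utilities instead. The triangular leaf is a hypothetical example.

```python
def point_in_polygon(px, py, vertices):
    """Even-odd ray casting: True if point (px, py) lies inside the polygon."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Only edges that straddle the horizontal line y = py can be crossed
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:  # crossing lies to the right of the point
                inside = not inside
    return inside


def rasterise(vertices, width, height):
    """Binary mask as a list of rows: 1 where the pixel centre is inside."""
    return [
        [1 if point_in_polygon(x + 0.5, y + 0.5, vertices) else 0
         for x in range(width)]
        for y in range(height)
    ]


# Hypothetical triangular leaf in an 8 x 6 pixel image
leaf = [(1, 1), (7, 1), (4, 5)]
for row in rasterise(leaf, 8, 6):
    print("".join("#" if v else "." for v in row))
```

Sampling at pixel centres is one convention among several; whichever you choose, apply it consistently so mask boundaries do not shift by half a pixel between datasets.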

Multiple objects overlap. In dense canopy imagery, where potato plants grow so close that their leaves overlap, individual polygon instances allow the model to distinguish between adjacent objects. Overlapping bounding boxes create regions claimed by two instances (and sometimes two classes) at once, which confuses training. Overlapping polygons each define a single instance boundary, which is what segmentation models are designed to train on.

Object density varies across the frame. In wide-angle field shots, some image areas are dense and some sparse. In sparse areas, a box around a sprawling plant encloses proportionally more background; in dense areas, neighbouring boxes overlap. Polygons adapt to each instance individually.

Polygon annotation in agricultural computer vision

Agriculture is where polygon annotation is most clearly justified. Plant shapes, growth stages, and disease presentations make rectangular annotation genuinely inadequate for training accurate models.

Three scenarios our annotators encounter regularly illustrate why:

Disease annotation. A single infected potato plant in a row of otherwise healthy ones. The diseased plant may show rolled, stiff leaves from PLRV (potato leafroll virus), or mottled, crinkled leaves with irregular margins from PVY (potato virus Y). The polygon needs to follow each visible leaf. A bounding box would include portions of the healthy plants on either side, training the model to associate those healthy tissues with the disease class.

Growth stage annotation. A plant at peak canopy density versus one at early emergence. The canopy shapes are completely different — broad and spreading at BBCH 40–69, sparse and upright at BBCH 10–19. Polygon annotation captures this variation. Bounding boxes make both look like the same rectangular region of the image, stripped of shape information the model could use.

Multi-class annotation. Annotating leaf, stem, and flower as separate classes on the same plant. Polygons allow each structure to be labelled as its own instance without ambiguity, even when structures overlap. Bounding boxes cannot represent this without severe overlap between class regions.
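In COCO instance format, each structure becomes its own annotation with a category ID and a flat polygon list, so overlapping leaf, stem, and flower instances coexist without conflict. A minimal sketch; the category IDs, file name, and coordinates are hypothetical, and `area` here is the geometric polygon area (tools differ on whether they store polygon area or rasterised mask area).

```python
import json


def coco_annotation(ann_id, image_id, category_id, polygon):
    """Build one COCO instance annotation from a list of (x, y) vertices."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    n = len(polygon)
    # Shoelace formula for the polygon's area
    area = abs(sum(polygon[i][0] * polygon[(i + 1) % n][1]
                   - polygon[(i + 1) % n][0] * polygon[i][1]
                   for i in range(n))) / 2.0
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,
        # COCO stores each polygon as a flat [x1, y1, x2, y2, ...] list
        "segmentation": [[c for point in polygon for c in point]],
        "area": area,
        # COCO bbox convention: [x_min, y_min, width, height]
        "bbox": [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)],
        "iscrowd": 0,
    }


dataset = {
    "images": [{"id": 1, "file_name": "plant_0001.jpg",
                "width": 640, "height": 480}],
    "categories": [
        {"id": 1, "name": "leaf", "supercategory": "plant"},
        {"id": 2, "name": "stem", "supercategory": "plant"},
        {"id": 3, "name": "flower", "supercategory": "plant"},
    ],
    "annotations": [
        # The stem polygon overlaps the leaf polygon; each stays its own instance
        coco_annotation(1, 1, 1, [(100, 200), (180, 150), (220, 210), (150, 260)]),
        coco_annotation(2, 1, 2, [(170, 220), (180, 220), (185, 400), (175, 400)]),
        coco_annotation(3, 1, 3, [(200, 120), (230, 110), (240, 140), (210, 150)]),
    ],
}
print(json.dumps(dataset["annotations"][1], indent=2))
```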

How we do it: At H2L Robotics India, every diseased plant identified by the POTECTOR300 is annotated with a closed polygon following its visible leaf boundary. The POTECTOR300's detection model is trained exclusively on polygon-labelled instances — which is why it can distinguish a diseased plant from healthy neighbours in dense, overlapping Dutch potato canopies at 3 km/h.

What makes polygon annotation high quality?

Not all polygon annotation is equal. The difference between a training dataset that produces an accurate model and one that doesn't often comes down to how rigorously annotation quality is defined and enforced.

Consistent vertex density. Too few vertices and the polygon becomes an approximation that misses critical contours — particularly on irregular leaf margins with disease lesions. Too many vertices make annotation slow to produce and harder to validate without adding useful information. For most agricultural leaf annotation, 15–35 vertices is the right range. The SOP should define this and annotators should be calibrated to it.
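An SOP range only helps if it is checked. The sketch below flags polygons outside the 15–35 vertex range given above and summarises vertex counts per annotator, which makes calibration drift visible. The annotation dict keys (`id`, `annotator`, `polygon`) and the `ngon` test helper are hypothetical; adapt them to your export.

```python
import math
from collections import defaultdict
from statistics import mean

# SOP range for leaf polygons, taken from the 15-35 guideline in this section
MIN_VERTICES, MAX_VERTICES = 15, 35


def check_vertex_density(annotations):
    """Flag polygons outside the SOP range and summarise counts per annotator.

    Returns (flagged, summary): flagged is a list of (id, vertex_count);
    summary maps annotator -> (min, mean, max) vertex count.
    """
    flagged = []
    counts = defaultdict(list)
    for ann in annotations:
        n = len(ann["polygon"])
        counts[ann["annotator"]].append(n)
        if not MIN_VERTICES <= n <= MAX_VERTICES:
            flagged.append((ann["id"], n))
    summary = {who: (min(c), mean(c), max(c)) for who, c in counts.items()}
    return flagged, summary


def ngon(n, r=50.0):
    """Hypothetical test polygon: n vertices evenly spaced on a circle."""
    return [(r * math.cos(2 * math.pi * k / n), r * math.sin(2 * math.pi * k / n))
            for k in range(n)]


batch = [
    {"id": 1, "annotator": "A", "polygon": ngon(22)},
    {"id": 2, "annotator": "A", "polygon": ngon(30)},
    {"id": 3, "annotator": "B", "polygon": ngon(8)},
    {"id": 4, "annotator": "B", "polygon": ngon(120)},
]
flagged, summary = check_vertex_density(batch)
print(flagged)  # [(3, 8), (4, 120)]
print(summary)
```

Annotator B's spread (8 to 120 vertices) is the kind of calibration gap the SOP is meant to close.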

Clean boundary decisions. When annotating a diseased leaf, the polygon follows the visible leaf margin — not the shadow cast by the leaf, not the region of discolouration alone. The class label captures what is sick; the polygon captures what is leaf. These are different decisions and they should be defined explicitly in the annotation spec.

Consistent occlusion handling. When part of a plant is hidden behind another plant, annotators need a single consistent rule: annotate only the visible region, or annotate the estimated full outline. Either is defensible. Inconsistency is not. The rule should be stated in the SOP and applied uniformly across the dataset.

No gaps, minimal overlaps between adjacent instances. In dense scenes with multiple labelled objects, polygon edges between adjacent instances should meet cleanly. Gaps leave unlabelled pixels that the model must infer; heavy overlaps introduce conflicting class signals at the boundary. Both degrade segmentation accuracy.
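Both failure modes can be flagged automatically once instances are rasterised to binary masks (for example with OpenCV's `cv2.fillPoly`). The sketch below counts pixels claimed by two or more instances (overlap) and unlabelled pixels sandwiched between two different instances within a few pixels (a narrow seam). The `max_gap` of 3 pixels and the `column_band` test masks are illustrative assumptions; a narrow seam can also be genuine soil between plants, so treat flags as review prompts rather than errors.

```python
def boundary_report(masks, max_gap=3):
    """Count overlap pixels and narrow-gap pixels between instances.

    masks maps an instance ID to a binary mask (list of rows of 0/1), all
    the same size. A pixel is an overlap if two or more instances claim it.
    It is a gap if no instance claims it but two different instances lie
    within max_gap pixels on opposite sides (left/right or above/below).
    """
    ids = list(masks)
    first = masks[ids[0]]
    height, width = len(first), len(first[0])

    def owners(x, y):
        return {i for i in ids if masks[i][y][x]}

    def nearest(x, y, dx, dy):
        """Owners of the first labelled pixel within max_gap steps."""
        for step in range(1, max_gap + 1):
            nx, ny = x + dx * step, y + dy * step
            if not (0 <= nx < width and 0 <= ny < height):
                break
            found = owners(nx, ny)
            if found:
                return found
        return set()

    overlap = gap = 0
    for y in range(height):
        for x in range(width):
            here = owners(x, y)
            if len(here) > 1:
                overlap += 1
            elif not here:
                for dx, dy in ((1, 0), (0, 1)):
                    before = nearest(x, y, -dx, -dy)
                    after = nearest(x, y, dx, dy)
                    # Two different instances close on either side: a seam
                    if before and after and len(before | after) > 1:
                        gap += 1
                        break
    return overlap, gap


def column_band(x0, x1, width=12, height=4):
    """Hypothetical test mask with columns x0..x1 (inclusive) labelled."""
    return [[1 if x0 <= x <= x1 else 0 for x in range(width)]
            for _ in range(height)]


print(boundary_report({1: column_band(0, 4), 2: column_band(6, 11)}))  # (0, 4)
print(boundary_report({1: column_band(0, 6), 2: column_band(5, 11)}))  # (8, 0)
```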

Senior Annotator review on every image. Polygon annotation errors are harder to catch with automated geometry checks than bounding box errors are. Inter-annotator agreement metrics help, but they do not replace a domain expert reviewing each image for boundary accuracy before the batch is committed to the training set.
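One common agreement metric is the intersection over union (IoU) of two annotators' masks for the same object. A minimal sketch over binary masks rasterised by any means; the `rect_mask` helper is hypothetical and stands in for two annotations where the second annotator stopped two rows short of the leaf tip.

```python
def mask_iou(mask_a, mask_b):
    """Intersection over union of two same-sized binary masks (lists of rows)."""
    inter = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    # Two empty masks agree perfectly: neither annotator labelled anything
    return inter / union if union else 1.0


def rect_mask(x0, y0, x1, y1, width=12, height=12):
    """Hypothetical test mask: pixels with x0 <= x < x1 and y0 <= y < y1."""
    return [[1 if (x0 <= x < x1 and y0 <= y < y1) else 0 for x in range(width)]
            for y in range(height)]


annotator_1 = rect_mask(0, 0, 10, 10)  # full leaf
annotator_2 = rect_mask(0, 0, 10, 8)   # stopped two rows short
print(mask_iou(annotator_1, annotator_2))  # 0.8
```

A low score shows that two annotators disagree, not which one followed the spec, which is why the review step still matters.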

Choosing a polygon annotation service

When evaluating annotation vendors for polygon work specifically, ask three things beyond the usual quality and pricing questions.

First, do the annotators understand the domain they are labelling? Polygon annotation in agricultural imagery requires judgment — knowing what a healthy leaf margin looks like in order to correctly follow a diseased one. Generic crowdwork platforms cannot provide this. Annotators trained on diverse unrelated datasets will follow instructions but cannot exercise domain-specific judgment at the boundary decision level.

Second, what is the QA process for polygon-specific errors? Common polygon errors include vertex snapping to the wrong structure, polygon self-intersections, inconsistent vertex density across annotators, and gap-and-overlap issues at instance boundaries. Ask whether the vendor's QA process checks for these specifically, or whether they only review class-level errors.
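Of these, self-intersections are the easiest to catch automatically. The sketch below tests every pair of non-adjacent edges with the standard orientation (cross-product) segment-intersection test. It is O(n²) in the vertex count, which is fine at typical leaf-polygon sizes; errors such as vertex snapping to the wrong structure, by contrast, still need a human reviewer. Function names are our own.

```python
def _orientation(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1 left turn, -1 right, 0 collinear."""
    val = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (val > 0) - (val < 0)


def _on_segment(p, q, r):
    """For collinear p, q, r: True if q lies within the bounding box of segment pr."""
    return (min(p[0], r[0]) <= q[0] <= max(p[0], r[0])
            and min(p[1], r[1]) <= q[1] <= max(p[1], r[1]))


def segments_intersect(p1, p2, p3, p4):
    """True if segment p1p2 touches or crosses segment p3p4."""
    o1 = _orientation(p1, p2, p3)
    o2 = _orientation(p1, p2, p4)
    o3 = _orientation(p3, p4, p1)
    o4 = _orientation(p3, p4, p2)
    if o1 != o2 and o3 != o4:
        return True
    # Collinear special cases: an endpoint lies on the other segment
    if o1 == 0 and _on_segment(p1, p3, p2):
        return True
    if o2 == 0 and _on_segment(p1, p4, p2):
        return True
    if o3 == 0 and _on_segment(p3, p1, p4):
        return True
    if o4 == 0 and _on_segment(p3, p2, p4):
        return True
    return False


def self_intersections(vertices):
    """Return index pairs of non-adjacent polygon edges that intersect."""
    n = len(vertices)
    edges = [(vertices[i], vertices[(i + 1) % n]) for i in range(n)]
    hits = []
    for i in range(n):
        for j in range(i + 1, n):
            # Skip edge pairs that share a vertex: consecutive, or last and first
            if j == i + 1 or (i == 0 and j == n - 1):
                continue
            if segments_intersect(*edges[i], *edges[j]):
                hits.append((i, j))
    return hits


print(self_intersections([(0, 0), (4, 4), (4, 0), (0, 4)]))  # [(0, 2)] bow-tie
print(self_intersections([(0, 0), (4, 0), (4, 4), (0, 4)]))  # [] square
```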

Third, can they demonstrate consistent annotation across your specific object class? Ask for a small paid pilot on a set of representative images from your dataset before committing to volume. Review the output for vertex density consistency, boundary decision consistency, and inter-annotator agreement. The pilot will tell you more than any SLA or quality claim.

What we offer: H2L Robotics India provides polygon annotation for agricultural computer vision projects — specifically for teams working with field imagery of crops, disease detection, or growth stage classification. All annotation is done in Label Studio with COCO JSON export. Every image is reviewed by a Senior Annotator trained on agricultural imagery before delivery.

If you are building a model for agricultural robotics, disease detection, or crop monitoring and need polygon-labelled training data you can rely on, get in touch via our contact page.

Further reading

  1. Lin, T.Y. et al. Microsoft COCO: Common Objects in Context. ECCV 2014. Defines the COCO annotation format and instance segmentation benchmark using polygon annotations. arxiv.org/abs/1405.0312
  2. He, K. et al. Mask R-CNN. ICCV 2017. The foundational instance segmentation architecture, trained on masks derived from polygon annotations. arxiv.org/abs/1703.06870
  3. Kirillov, A. et al. Segment Anything. Meta AI Research, ICCV 2023. Introduces the promptable SAM model and the SA-1B mask dataset. arxiv.org/abs/2304.02643