Computer-use agents create new privacy risks: training data collected from real websites inevitably contains sensitive information, and cloud-hosted inference exposes user screenshots. Detecting personally identifiable information (PII) in web screenshots is critical for privacy-preserving deployment, but no public benchmark exists for this task. We introduce WebPII, a fine-grained synthetic benchmark of 44,865 annotated e-commerce UI images designed with three key properties: an extended PII taxonomy that includes transaction-level identifiers enabling reidentification, anticipatory detection for partially filled forms where users are actively entering data, and scalable generation through VLM-based UI reproduction. Experiments validate that these design choices improve layout-invariant detection across diverse interfaces and generalization to held-out page types. We train WebRedact to demonstrate practical utility, more than doubling text-extraction baseline accuracy (0.753 vs 0.357 mAP@50) at real-time CPU latency (20ms). We release the dataset and model to support privacy-preserving computer-use research.
The dataset comprises 44,865 images spanning 10 e-commerce websites and 19 page types.
Distribution of images across form-fill variants, which support anticipatory detection training.
Annotation density: median of 19 boxes per image (mean 22.1), ranging from 0 to 145.
Most annotations target rendered text where PII appears on confirmation and review pages.
Sample images from the WebPII dataset, shown across form states and annotation modes.
Visual detection substantially outperforms text-based approaches on held-out Amazon layouts.
| Method | mAP@50 | Latency | Real-time? |
|---|---|---|---|
| OCR + Presidio | 0.183 | 1,300ms | No |
| LayoutLMv3 + GPT-4o-mini | 0.357 | 2,900ms | No |
| WebRedact (ours) | 0.753 | 20ms | Yes (30 FPS) |
| WebRedact-large (ours) | 0.842 | 312ms | Near real-time (3 FPS) |
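
For readers unfamiliar with the mAP@50 column above: under the usual convention, a predicted box counts as correct when its intersection-over-union (IoU) with a ground-truth box is at least 0.5. The page does not spell out the exact evaluation protocol, so the following is only a minimal sketch of that matching step under the standard convention. The function names `iou` and `match_at_iou` and the `(x1, y1, x2, y2)` box format are illustrative assumptions, not part of the WebPII release, and the sketch stops short of computing average precision.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2). Box format is an assumption."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_at_iou(preds, gts, thr=0.5):
    """Greedy matching for one class on one image.

    preds: list of (box, score); gts: list of boxes.
    Returns (true_positives, false_positives, false_negatives).
    """
    unmatched = list(range(len(gts)))
    tp = fp = 0
    # Higher-confidence predictions get first pick of ground-truth boxes.
    for box, _score in sorted(preds, key=lambda p: p[1], reverse=True):
        best = max(unmatched, key=lambda i: iou(box, gts[i]), default=None)
        if best is not None and iou(box, gts[best]) >= thr:
            unmatched.remove(best)  # each ground-truth box matches at most once
            tp += 1
        else:
            fp += 1
    return tp, fp, len(unmatched)


# Hypothetical boxes, for illustration only.
gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [((1, 0, 10, 10), 0.9), ((50, 50, 60, 60), 0.8)]
print(match_at_iou(preds, gts))
```

A full mAP@50 computation would sweep the confidence threshold to build a precision-recall curve per class and then average the resulting APs across classes.
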
WebRedact achieves 2.1× higher mAP@50 (0.753 vs 0.357) at 145× lower latency (20ms vs 2,900ms) than the best text-based method (LayoutLMv3 + GPT-4o-mini), enabling real-time privacy protection at 30 FPS. WebRedact-large further improves accuracy to 0.842 mAP@50 with near-real-time performance at 3 FPS.
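
The headline ratios can be reproduced directly from the table. The sketch below uses only the reported mAP@50 and latency values. The helper names `compare` and `max_fps` are illustrative, and `max_fps` gives only the latency-bound ceiling, ignoring any capture or rendering overhead (an assumption).

```python
# (mAP@50, latency in ms) copied from the results table.
results = {
    "OCR + Presidio": (0.183, 1300),
    "LayoutLMv3 + GPT-4o-mini": (0.357, 2900),
    "WebRedact": (0.753, 20),
    "WebRedact-large": (0.842, 312),
}


def compare(model, baseline):
    """Return (accuracy ratio, speedup) of model relative to baseline."""
    m_map, m_ms = results[model]
    b_map, b_ms = results[baseline]
    return m_map / b_map, b_ms / m_ms


def max_fps(latency_ms):
    """Upper bound on frames per second if inference is the only cost."""
    return 1000.0 / latency_ms


acc_gain, speedup = compare("WebRedact", "LayoutLMv3 + GPT-4o-mini")
print(f"{acc_gain:.1f}x mAP@50, {speedup:.0f}x faster")  # 2.1x mAP@50, 145x faster

# A 30 FPS stream leaves about 33.3ms per frame; 20ms fits within that budget.
print(max_fps(results["WebRedact"][1]) >= 30)  # True
```

By the same calculation, 312ms per frame for WebRedact-large corresponds to roughly 3.2 frames per second, consistent with the 3 FPS figure in the table.
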
@inproceedings{anonymous2026webpii,
title={WebPII: Benchmarking Visual PII Detection for Computer-Use Agents},
author={Anonymous Authors},
booktitle={International Conference on Machine Learning},
year={2026}
}