ATTICA: A Dataset for Arabic Text-Based Traffic Panels Detection

Boujemaa, Kaoutar Sefrioui; Akallouch, Mohammed; Berrada, Ismail; Fardousse, Khalid; Bouhoute, Afaf · 2021 · DOAJ

DOI: 10.1109/ACCESS.2021.3092821

Archive: archived · Pipeline: cataloged, verified

Full text: available from the source via the DOI above; this catalog links to the paper but does not host it.

Summary

This paper introduces ATTICA, a new open-source dataset that addresses the scarcity of high-quality data for detecting traffic panels and recognizing Arabic text in driving assistance systems. Existing research focuses heavily on Latin scripts and standard traffic signs, while Arabic text-based guide panels remain underrepresented because such panels lack global standardization and cursive scripts are hard to process. The authors aim to fill this gap with a resource for multi-task detection and recognition, targeting Advanced Driving Assistance Systems (ADAS) and road-safety applications.

The ATTICA dataset comprises 1,215 images collected from diverse roadways across North Africa and the Gulf region, covering various weather conditions, times of day, and road types. It is divided into two sub-datasets: ATTICA_Sign, which contains 4,607 bounding-box annotations across five object categories (traffic panels, traffic signs, other signs, km-points, and add-panels), and ATTICA_Text, which contains 7,293 Arabic text boxes annotated at both line and word level. The annotations were made manually with the Labelme tool, and the authors discuss the challenges the data presents, including low resolution, complex backgrounds, and class imbalance. Notably, the dataset includes distinctive categories such as "unreadable lines", intended to support detection of panels that need maintenance as well as semantic analysis.

To validate the dataset's utility, the authors ran experiments with state-of-the-art deep learning models. For object detection they evaluated four architectures: Faster R-CNN, SSD, R-FCN, and RetinaNet. For Arabic text-line detection they used the Connectionist Text Proposal Network (CTPN) and the Efficient and Accurate Scene Text (EAST) detector. The data were split into training and test sets by stratified sampling, and the models were implemented in TensorFlow with pre-trained backbones.
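The summary says only that the data were split by stratified sampling; it gives neither the stratification key nor the split ratio. The following is a minimal Python sketch under two assumptions of ours, not the authors' procedure: each image is assigned to the stratum of the rarest class it contains (a common heuristic for multi-label detection data), and 20% of each stratum goes to the test set. The function name `stratified_split` and the snake_case label strings are hypothetical.

```python
import random
from collections import Counter, defaultdict

NO_LABEL = "<none>"  # stratum for images with no annotations


def stratified_split(images, test_fraction=0.2, seed=0):
    """Split {image_id: set of class labels} into (train_ids, test_ids).

    Hypothetical sketch: each image is stratified by the rarest class it
    contains; test_fraction=0.2 is an illustrative default, not the value
    used by the ATTICA authors.
    """
    # Number of images containing each class (an image counts once per class).
    class_counts = Counter(c for classes in images.values() for c in set(classes))

    # Group images by stratum; frequency ties are broken alphabetically.
    strata = defaultdict(list)
    for image_id in sorted(images):
        classes = images[image_id]
        if classes:
            stratum = min(classes, key=lambda c: (class_counts[c], c))
        else:
            stratum = NO_LABEL
        strata[stratum].append(image_id)

    # Shuffle each stratum with a seeded RNG and move a fixed share to test.
    rng = random.Random(seed)
    train, test = [], []
    for stratum in sorted(strata):
        members = strata[stratum]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test
```

Stratifying by the rarest class keeps scarce categories from ending up entirely in one split, which matters for data with the class imbalance the authors describe.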
The authors report promising results for the evaluated detection and text-line models, which they take as confirmation that the dataset can train robust detectors for challenging real-world conditions. The work's significance lies in providing the first internet-sourced dataset covering multiple Arab countries, which enables the development of standardized detectors for Arabic-speaking regions. By addressing both the lack of Arabic-script data and the neglect of guide panels in earlier studies, ATTICA supports improved ADAS capabilities such as automatic visual inspection and driver assistance. Its diverse visual conditions and dedicated text annotations support more resilient computer vision models, contributing to road safety and automated maintenance systems in Arabic-speaking contexts.
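The summary states that annotations were made with Labelme. If the released files keep Labelme's JSON format (an assumption; the distribution format is not described here), box counts such as the 4,607 ATTICA_Sign annotations could be recomputed with a short script. In Labelme's format, each entry of `shapes` carries a `label`, a list of `points`, and a `shape_type`. The sketch converts rectangles and polygons to axis-aligned boxes and tallies them per label; the function names and example label strings are hypothetical.

```python
import json
from collections import Counter
from typing import Iterable


def shapes_to_boxes(annotation: dict) -> list[tuple[str, float, float, float, float]]:
    """Convert Labelme shapes to (label, xmin, ymin, xmax, ymax) boxes.

    Rectangles store two opposite corners in arbitrary order and polygons store
    their vertices, so taking min/max over the points covers both cases.
    """
    boxes = []
    for shape in annotation.get("shapes", []):
        # Older Labelme files may omit shape_type; Labelme treats that as a polygon.
        if shape.get("shape_type", "polygon") not in ("rectangle", "polygon"):
            continue  # ignore points, lines, circles, etc.
        xs = [float(p[0]) for p in shape["points"]]
        ys = [float(p[1]) for p in shape["points"]]
        boxes.append((shape["label"], min(xs), min(ys), max(xs), max(ys)))
    return boxes


def count_labels(json_texts: Iterable[str]) -> Counter:
    """Count boxes per label across Labelme JSON documents given as strings."""
    counts = Counter()
    for text in json_texts:
        counts.update(box[0] for box in shapes_to_boxes(json.loads(text)))
    return counts
```

Working on JSON strings rather than file paths keeps the sketch independent of how the dataset is laid out on disk; reading each file with `open(...).read()` before calling `count_labels` is enough to apply it to a directory of annotations.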

Provenance

This is the full processing record for this entry. Every stage of the paper's journey through the pipeline is logged: what ran, with which tool and model, how many attempts it took, and when it last completed.

Stage	Outcome	Tool	Model	Prompt	Attempts	Completed
discover	success	DOAJ	—	—	1	2026-06-19
archive	success	unpaywall	—	—	1	2026-06-25
extract	success	cached	—	—	2	2026-06-26
clean	success	clean	—	—	1	2026-06-19
chunk	success	chunk	—	—	1	2026-06-19
embed	success	embed	Qwen/Qwen3-Embedding-8B	—	1	2026-06-19
promote	success	—	—	—	1	2026-06-19
summarize	success	llm	qwen3.6-27b-prismaquant	summ-v5	1	2026-06-26
tag	success	vector_similarity	—	—	6	2026-06-19
verify	success	—	—	—	1	2026-06-26

Summary generated by qwen3.6-27b-prismaquant on 2026-06-26; verification: verified.

Topics

Ranked by relevance to this paper.

sign visibility legibility