Automated Video Analysis: Analyzing Large Quantities of Transportation Research Data

Cobb, Lincoln · 2015 · ROSA P / United States. Federal Highway Administration

Summary

This paper addresses the critical challenge of analyzing the massive dataset generated by the Strategic Highway Research Program 2 (SHRP 2) Naturalistic Driving Study (NDS). The NDS collected approximately 2 petabytes of data, primarily video, from nearly 3,000 volunteers over two years. While this data offers unprecedented insights into driver behavior and road design, its sheer volume creates a significant bottleneck: manual feature extraction would require nearly 600 technicians working for a full year. To overcome this, the Federal Highway Administration (FHWA), through its Exploratory Advanced Research Program, funded a project at the National Robotics Engineering Center of Carnegie Mellon University (CMU) to develop automated tools for efficient data analysis.

The research team implemented a machine-learning-based approach to automate feature extraction from the video data. Learning algorithms trained on large datasets identify important features and classify targets with high accuracy. A key innovation is the use of contextual cues to interpret ambiguous video data. Rather than relying on traditional graphical models, which can yield inaccurate predictions, the team developed a novel inference procedure that builds a sequence of context-dependent predictions (e.g., inferring the presence of wheels near car corners). This approach reduces the need for extensive expert guidance and manages computational effort more effectively.

To handle the scale and complexity of the data, the researchers combined several technical strategies. They developed labeling tools that support fast human analysis and used anomaly detection to prioritize data access, focusing labelers' efforts on significant events. Semi-supervised learning was used to build predictive models while minimizing the amount of labeled data required.
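The idea of a sequence of context-dependent predictions can be illustrated with a minimal two-stage sketch. It assumes a toy one-dimensional row of image positions, hand-set weights in place of learned ones, and hypothetical function names (`predict_corners`, `predict_wheels`). It is not the CMU team's procedure, only an instance of the general pattern in which a later prediction uses earlier predictions as context: stage 1 scores car corners from local evidence, and stage 2 scores wheels from local evidence plus the strongest nearby corner prediction.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Logistic function mapping a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_corners(corner_evidence: Sequence[float]) -> List[float]:
    """Stage 1 (hypothetical): car-corner probability from local evidence only."""
    return [sigmoid(4.0 * (e - 0.5)) for e in corner_evidence]


def predict_wheels(
    wheel_evidence: Sequence[float],
    corner_probs: Sequence[float],
    radius: int = 1,
    w_local: float = 4.0,
    w_ctx: float = 3.0,
    bias: float = -3.5,
) -> List[float]:
    """Stage 2 (hypothetical): combine local wheel evidence with stage-1 context.

    The context feature is the strongest corner prediction within `radius`
    positions. The weights are hand-set placeholders for learned ones.
    """
    n = len(wheel_evidence)
    probs = []
    for i, e in enumerate(wheel_evidence):
        window = corner_probs[max(0, i - radius): min(n, i + radius + 1)]
        context = max(window)
        probs.append(sigmoid(w_local * e + w_ctx * context + bias))
    return probs


# Toy row of positions: the wheel candidates at 0 and 5 are equally ambiguous
# locally, but only position 0 sits next to a likely car corner (position 1).
corner_evidence = [0.0, 0.95, 0.0, 0.0, 0.0, 0.0]
wheel_evidence = [0.5, 0.0, 0.0, 0.0, 0.0, 0.5]

corners = predict_corners(corner_evidence)
wheels = predict_wheels(wheel_evidence, corners)
print(f"wheel near corner: {wheels[0]:.2f}, isolated wheel: {wheels[5]:.2f}")
# prints: wheel near corner: 0.75, isolated wheel: 0.24
```

Two wheel candidates with identical local evidence end up with different probabilities because only one has a likely car corner beside it; this is the sense in which context can resolve an ambiguous detection.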
To address the computational demands of petabyte-scale datasets, the team also explored two scaling approaches: parallelization, which distributes work across many processor cores on a network, and "anytime predictors," which produce a sequence of results that improve in accuracy as more processing time is allocated.

The primary outcome of this project is a software framework that supports efficient application of machine-learning-based feature extraction to large transportation video datasets. The framework includes target detectors for various analysis tasks, capable of identifying both moving targets, such as cars and pedestrians, and static targets, such as traffic signs.

The significance of this work lies in its potential to dramatically reduce the time and cost of analyzing safety-related video data. By making the SHRP 2 NDS data more accessible, the project expands the pool of researchers able to use these rich datasets, advancing the understanding of transportation safety and providing a scalable model for processing future large-volume data.
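The summary does not say how the project's "anytime predictors" were built. One common construction, assumed here purely for illustration, is an additive (boosting-style) model whose partial sums are themselves usable predictions: each extra stage refines the score, and the caller stops whenever its processing budget runs out. In this minimal sketch, `anytime_predict`, `predict_with_budget`, and the toy stages are hypothetical, and the budget counts stages rather than wall-clock time.

```python
from itertools import islice
from typing import Callable, Iterator, Optional, Sequence

# A stage maps an input feature value to an additive correction of the score.
Stage = Callable[[float], float]


def anytime_predict(x: float, stages: Sequence[Stage]) -> Iterator[float]:
    """Yield the partial sum of an additive model after each stage.

    Every yielded value is a complete, usable prediction; later values
    include more stages and are intended to be more accurate.
    """
    score = 0.0
    for stage in stages:
        score += stage(x)
        yield score


def predict_with_budget(x: float, stages: Sequence[Stage], budget: int) -> Optional[float]:
    """Return the latest prediction available after at most `budget` stages.

    Returns None if the budget allows no stages at all.
    """
    latest = None
    # islice stops the generator early, so unused stages are never evaluated.
    for estimate in islice(anytime_predict(x, stages), max(budget, 0)):
        latest = estimate
    return latest


# Hypothetical stages: a coarse first guess followed by smaller refinements.
stages = [lambda x: 0.5 * x, lambda x: 0.25 * x, lambda x: 0.05 * x]

print(list(anytime_predict(1.0, stages)))
print(predict_with_budget(1.0, stages, budget=2))
```

A production system would more likely check a clock (for example, `time.monotonic()`) between stages; counting stages keeps the sketch deterministic.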

Key finding

The project developed a machine-learning-based software framework that automates feature extraction from large transportation video datasets, significantly reducing the time and cost required for analysis.

Methodology

modeling

Provenance

The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed. Discovered via bulk_ingest_rosap on 2026-05-23 (7 acquisition events logged).

Stage	Outcome	Tool	Model	Prompt	Attempts	Completed
discover	success	rosap	—	—	2	2026-05-23
archive	success	—	—	—	1	2026-05-23
extract	success	cached	—	—	3	2026-06-10
clean	success	—	—	—	1	2026-06-01
chunk	success	—	—	—	1	2026-06-01
embed	success	—	—	—	1	2026-06-02
enrich	success	—	—	—	1	2026-05-23
promote	success	—	—	—	1	2026-05-23
summarize	success	llm	qwen3.6-27b-prismaquant	summ-v5	4	2026-06-10
tag	success	vector_similarity	—	—	19	2026-06-11
verify	success	—	—	—	2	2026-06-10

Summary generated by qwen3.6-27b-prismaquant on 2026-06-10; verification: verified.

Topics

Ranked by relevance to this paper.

naturalistic crash near crash

Information type

What kind of knowledge this paper contributes, grouped by family — independent of topic (what it is about) and method (how it was studied).

Methodological Resource: dataset resource, tool software