Knowledge Discovery in Massive Transportation Datasets : Merging Information from Disparate Sources to Enhance Traffic Safety : [fact sheet]

NHTSA · 2018 · ROSA P / United States. Federal Highway Administration

Summary

This fact sheet outlines research initiatives under the Federal Highway Administration’s (FHWA) Exploratory Advanced Research (EAR) Program, aimed at enhancing traffic safety through the integration of massive, disparate transportation datasets. Despite engineering and policy advances, traffic fatalities in 2017 remained at levels not seen since 2007. The research addresses the challenge of extracting safety-related insights from vast, expanding datasets covering driver behavior, vehicle performance, traffic patterns, weather, and infrastructure. The goal is to develop new techniques for processing structured, semistructured, and unstructured data to identify safety issues that traditional datasets alone might miss.

Two primary projects are highlighted. The Palo Alto Research Center (PARC) is developing automated machine learning methods to replace slower manual processes for extracting, cleaning, and restructuring data. PARC is using video, radar, and still photography collected at Chicago intersections to refine these tools for use with other data-rich traffic resources.

Concurrently, CUBRC, a systems integration research organization, is developing the Transportation Research Informatics Platform (TRIP). TRIP is a layered infrastructure designed to ingest, store, analyze, and display information. The platform runs on a Linux foundation and integrates open-source tools like Apache and SQL to handle data warehousing and querying. It also links to popular analytics packages and visualization tools to give researchers flexibility.

TRIP is designed to make massive amounts of transportation data accessible for knowledge discovery. It integrates legacy data with novel sources, including the Highway Safety Information System (HSIS), the Strategic Highway Research Program 2 (SHRP2) Naturalistic Driving Study (NDS), and the SHRP2 Roadway Information Database (RID). Additional data sources include Clarus roadway-weather data, Nexrad weather information from the Iowa Environmental Mesonet, and video logs capturing roadway features. The platform allows users to set search parameters such as weather, accident type, and collision severity to identify driver patterns. CUBRC is currently testing TRIP using sample datasets from the Seattle, Washington, region.

The significance of this work lies in its potential to leverage decades of legacy data alongside non-traditional data sources to extend understanding beyond conventional safety countermeasures. By merging traffic-related information from disparate sources, researchers can detect safety issues that are not visible through traditional analysis. The EAR Program supports these higher-risk, longer-term research efforts to achieve transformative improvements in transportation systems. The resulting tools and platforms aim to provide rapid, versatile methods for visualizing streaming and historical data, ultimately supporting more effective traffic safety research and policy development.
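The idea of merging disparate sources and then searching by weather, accident type, and collision severity can be illustrated with a minimal sketch. This is not TRIP's implementation or interface, which the fact sheet does not describe at this level. The record types, field names (`segment`, `condition`, and so on), the join key of date plus roadway segment, and the sample values are all hypothetical and invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Crash:
    """Hypothetical crash record, standing in for a legacy safety dataset."""
    crash_id: str
    date: str        # ISO date string
    segment: str     # hypothetical roadway segment identifier
    crash_type: str
    severity: str


@dataclass(frozen=True)
class WeatherObs:
    """Hypothetical roadway-weather observation from a separate source."""
    date: str
    segment: str
    condition: str


def merge_crash_weather(crashes, weather):
    """Attach a weather condition to each crash, joined on (date, segment)."""
    by_key = {(w.date, w.segment): w.condition for w in weather}
    return [
        {
            "crash_id": c.crash_id,
            "crash_type": c.crash_type,
            "severity": c.severity,
            # None when no weather observation matches this crash
            "weather": by_key.get((c.date, c.segment)),
        }
        for c in crashes
    ]


def search(records, weather=None, crash_type=None, severity=None):
    """Return records matching every search parameter that was supplied."""
    wanted = {"weather": weather, "crash_type": crash_type, "severity": severity}
    return [
        r for r in records
        if all(v is None or r[k] == v for k, v in wanted.items())
    ]


# Invented sample data, for illustration only.
crashes = [
    Crash("c1", "2015-01-12", "seg-1", "rear-end", "injury"),
    Crash("c2", "2015-01-12", "seg-2", "rear-end", "injury"),
    Crash("c3", "2015-02-03", "seg-1", "sideswipe", "property-damage"),
]
weather = [
    WeatherObs("2015-01-12", "seg-1", "rain"),
    WeatherObs("2015-02-03", "seg-1", "clear"),
]

merged = merge_crash_weather(crashes, weather)
hits = search(merged, weather="rain", crash_type="rear-end")
```

Here `hits` contains only crash `c1`. Crash `c2` has the same type and severity but no matching weather observation, so the weather filter excludes it. A real platform would need a far more careful join, for example matching by time window and distance instead of exact keys.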

Key finding

The development of automated machine learning methods and the Transportation Research Informatics Platform enables the integration and analysis of massive, disparate transportation datasets to identify traffic safety patterns that are not apparent in traditional data sources.

Methodology

other

Provenance

The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed. Discovered via bulk_ingest_rosap on 2026-05-23 (8 acquisition events logged).

Stage      Outcome  Tool               Model                    Prompt   Attempts  Completed
discover   success  rosap              -                        -        2         2026-05-23
archive    success  -                  -                        -        1         2026-05-23
extract    success  cached             -                        -        4         2026-06-10
clean      success  -                  -                        -        1         2026-06-01
chunk      success  -                  -                        -        1         2026-06-01
embed      success  -                  -                        -        1         2026-06-02
enrich     success  -                  -                        -        1         2026-05-23
promote    success  -                  -                        -        1         2026-05-23
summarize  success  llm                qwen3.6-27b-prismaquant  summ-v5  6         2026-06-10
tag        success  vector_similarity  -                        -        19        2026-06-11
verify     success  -                  -                        -        2         2026-06-10

Summary generated by qwen3.6-27b-prismaquant on 2026-06-10; verification: verified.
