Efficient Smartphone Sensor Analysis for Behavioral Profiling in Transportation Research: A Case Study of Driver and Passenger Classification

Lozano, Wilson; Neal, Tempestt · 2024 · ROSA P / National Institute for Congestion Reduction

archive: archived · pipeline: cataloged · verified

Summary

This study addresses the challenge of classifying drivers versus passengers using smartphone sensor data while balancing classification accuracy against device resource constraints such as battery life and computational load. Prior research often relied on strict assumptions about phone placement or required continuous monitoring that drained resources. This work aims to determine whether short, naturalistic data collection windows can effectively distinguish user roles without intrusive constraints, thereby supporting efficient multimodal transportation data collection.

The researchers conducted 145 staged trips around the University of South Florida campus using six volunteers and a diverse set of Android smartphones. Data was collected naturally via the Sensor Logger app, sampling accelerometer, gyroscope, gravity, magnetometer, and orientation sensors at up to 100 Hz. Raw data was partitioned into 21 consecutive time windows ranging from 3 to 90 seconds, from which 133 statistical features were extracted.

The experimental design evaluated three classifiers (Decision Tree, Random Forest, and Support Vector Machine) across various preprocessing techniques, including feature standardization, variance-based feature selection, and dimensionality reduction methods (PCA, Kernel PCA, and random projection). A total of 504 experiments were performed using 10-fold cross-validation to assess accuracy, precision, and recall.

The results demonstrated that shorter data collection windows significantly improved classification performance, with accuracy increasing as window size decreased. Feature standardization proved critical for longer data periods, where non-standardized features caused notable accuracy degradation, though its impact diminished in shorter windows. Feature selection consistently outperformed dimensionality reduction techniques, particularly when paired with Decision Tree and Random Forest classifiers. The Random Forest classifier achieved the highest overall performance, with accuracies ranging from 0.892 to 0.97 for standardized features. Conversely, Support Vector Machines showed lower accuracy with feature selection on non-standardized data.

These findings imply that efficient behavioral profiling in transportation research is feasible using brief, naturalistic sensor data windows. By leveraging short time frames and feature selection, developers can minimize battery consumption and computational overhead while maintaining high classification accuracy. This approach supports the deployment of intelligent models in real-world applications, such as traffic congestion reduction and safety monitoring, by enabling passive, resource-efficient data collection from open-source platforms.
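
The windowing and feature-extraction step described in the summary can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the helper names `split_windows` and `window_features`, the four statistics computed, the choice to drop a trailing partial window, and the synthetic signal are all hypothetical. The 133 features used in the study are not enumerated in this summary.

```python
import math
import statistics


def split_windows(samples, rate_hz, window_s):
    """Split samples into consecutive, non-overlapping windows of window_s seconds.

    A trailing partial window is dropped (an assumption made for this sketch).
    """
    size = int(rate_hz * window_s)
    if size <= 0:
        raise ValueError("window must contain at least one sample")
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]


def window_features(window):
    """Return a few summary statistics for one window of one sensor axis."""
    return {
        "mean": statistics.fmean(window),
        "std": statistics.pstdev(window),
        "min": min(window),
        "max": max(window),
    }


# Synthetic stand-in for one accelerometer axis: 10 s sampled at 100 Hz.
rate_hz = 100
signal = [math.sin(2 * math.pi * 1.5 * t / rate_hz) for t in range(10 * rate_hz)]

# Use the shortest window length mentioned in the summary (3 seconds).
windows = split_windows(signal, rate_hz, window_s=3)
features = [window_features(w) for w in windows]
```

Each entry in `features` would become one row of the feature matrix passed to a classifier. A real pipeline would repeat this per sensor and per axis.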

Key finding

Shorter data collection windows significantly improved driver versus passenger classification accuracy while reducing the computational necessity for feature standardization.
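
To make the standardization step concrete, here is a minimal sketch that assumes "feature standardization" means per-feature z-scoring (subtract the mean, divide by the standard deviation). This summary does not confirm the exact scaling the authors used. The function name `standardize_columns` and the example values are hypothetical.

```python
import statistics


def standardize_columns(rows):
    """Z-score each feature column: subtract its mean, divide by its std.

    Constant columns (std == 0) are mapped to 0.0 to avoid division by zero.
    """
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) for c in columns]
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Hypothetical feature matrix: rows are windows, columns are features on
# very different scales (values invented for illustration only).
raw = [
    [0.10, 420.0],
    [0.30, 380.0],
    [0.20, 400.0],
]
scaled = standardize_columns(raw)
```

After scaling, every column has zero mean and unit variance, so no single feature dominates a distance-based classifier such as an SVM. Under cross-validation, the means and standard deviations would normally be computed on the training folds only and then applied to the held-out fold.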

Methodology

Field study

Sample size: 6 volunteers (145 staged trips)

Provenance

The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged — what ran, with which tool and model, how many attempts it took, and when it last completed. Discovered via bulk_ingest_rosap on 2026-05-23 (6 acquisition events logged).

Stage	Outcome	Tool	Model	Prompt	Attempts	Completed
discover	success	rosap	—	—	2	2026-05-23
archive	success	—	—	—	1	2026-05-23
extract	success	cached	—	—	2	2026-06-10
clean	success	—	—	—	1	2026-06-01
chunk	success	—	—	—	1	2026-06-01
embed	success	—	—	—	1	2026-06-02
enrich	success	—	—	—	1	2026-05-23
promote	success	—	—	—	1	2026-05-23
summarize	success	llm	qwen3.6-27b-prismaquant	summ-v5	3	2026-06-10
tag	success	vector_similarity	—	—	19	2026-06-11
verify	success	—	—	—	2	2026-06-10

Summary generated by qwen3.6-27b-prismaquant on 2026-06-10; verification: verified.

Information type

What kind of knowledge this paper contributes, grouped by family — independent of topic (what it is about) and method (how it was studied).

Empirical Findings: observational prevalence
Methodological Resource: validation psychometrics, dataset resource