An improved deep learning architecture for multi-object tracking systems

Urdiales, Jesús; Martín, David; Armingol, José María · 2023 · Crossref

DOI: 10.3233/ica-230702

Archive: archived · Pipeline: cataloged, verified


Summary

This paper addresses robust 3D multi-object tracking (MOT) in crowded urban environments, a critical requirement for autonomous driving. The central difficulty in MOT is accurate data association, that is, matching new detections to existing tracks, which becomes especially hard when objects are close together or occluded. Traditional systems use Kalman filters for state estimation and algorithms such as the Hungarian method for association, but these approaches often suffer from identity switches and unstable trajectories. The authors propose an improved deep learning architecture, termed SMSBoxNet, that strengthens the association stage of a Kalman filter-based tracker, with the aim of reducing identity switches and improving trajectory accuracy.

The system follows the tracking-by-detection paradigm and uses three neural networks to solve the association problem. First, a convolutional LSTM (convLSTM) network extracts spatiotemporal features from the sequence of previous detections for each track, so the system can draw on historical context rather than relying only on the most recent detection. Second, a Siamese network built on EfficientNet computes the visual similarity between the convLSTM output and each new detection. Third, a recurrent LSTM network called BoxNet processes 3D and 2D bounding box information to extract positional features. The outputs of the Siamese network and BoxNet are concatenated and passed through fully connected layers that produce a match probability score. These scores form a cost matrix that is solved with the Hungarian algorithm. To save computation, a fault-detection step based on the Kalman filter's innovation vector and the Mahalanobis distance discards unlikely associations before they reach the deep learning stage. The system was validated on the Argoverse dataset, which provides both 2D image and 3D positional data.
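
The gating and assignment stages described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: each track is a hypothetical `(z_pred, S, track_id)` tuple holding its Kalman-predicted 2D position and 2x2 innovation covariance, `match_prob` is a placeholder for the learned SMSBoxNet match score, the gate uses the 95% chi-square threshold for 2 degrees of freedom (5.991, an assumed value), and the assignment is solved by exhaustive search. Exhaustive search finds the same minimum-cost assignment as the Hungarian algorithm but only scales to tiny examples.

```python
import itertools
import math

# Assumed gate: 95% chi-square quantile for 2 degrees of freedom (2D position).
GATE_THRESHOLD = 5.991
BLOCKED = 1e6  # cost given to track-detection pairs rejected by the gate


def mahalanobis_sq(z, z_pred, S):
    """Squared Mahalanobis distance of the innovation y = z - z_pred under 2x2 S."""
    y0, y1 = z[0] - z_pred[0], z[1] - z_pred[1]
    (a, b), (c, d) = S
    det = a * d - b * c
    # Explicit inverse of a 2x2 matrix: [[d, -b], [-c, a]] / det
    inv = ((d / det, -b / det), (-c / det, a / det))
    return (y0 * (inv[0][0] * y0 + inv[0][1] * y1)
            + y1 * (inv[1][0] * y0 + inv[1][1] * y1))


def build_cost_matrix(tracks, detections, match_prob):
    """Cost = 1 - p(match) for pairs inside the gate, BLOCKED otherwise."""
    cost = []
    for z_pred, S, track_id in tracks:
        row = []
        for det in detections:
            if mahalanobis_sq(det, z_pred, S) > GATE_THRESHOLD:
                row.append(BLOCKED)  # match_prob is never called for gated-out pairs
            else:
                row.append(1.0 - match_prob(track_id, det))
        cost.append(row)
    return cost


def solve_assignment(cost):
    """Exact minimum-cost assignment by exhaustive search (requires rows <= cols).

    Same optimum as the Hungarian algorithm, but exponential in size.
    """
    n_rows, n_cols = len(cost), len(cost[0])
    best, best_total = None, math.inf
    for cols in itertools.permutations(range(n_cols), n_rows):
        total = sum(cost[r][c] for r, c in enumerate(cols))
        if total < best_total:
            best, best_total = cols, total
    # Drop pairs whose best available option was still a blocked one.
    return [(r, c) for r, c in enumerate(best) if cost[r][c] < BLOCKED]


def constant_prob(track_id, det):
    """Hypothetical stand-in for the learned match probability."""
    return 0.9


identity = ((1.0, 0.0), (0.0, 1.0))
tracks = [((0.0, 0.0), identity, "A"), ((10.0, 0.0), identity, "B")]
detections = [(10.5, 0.2), (0.3, -0.4), (50.0, 50.0)]
cost = build_cost_matrix(tracks, detections, constant_prob)
print(solve_assignment(cost))  # → [(0, 1), (1, 0)]
```

Track A is matched to detection 1 and track B to detection 0, while the far-away detection 2 is gated out for both tracks and would start a new track. In practice a library solver such as SciPy's `linear_sum_assignment` would replace the exhaustive search; the gate is what keeps the expensive network from scoring pairs the Kalman filter already rules out.
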

The authors tuned the hyperparameters of the convLSTM, Siamese, and BoxNet components extensively, selecting the configurations that minimized their loss functions, which include a double-margin loss and binary cross-entropy. Training also included false positive detections so that the tracker would be robust to erroneous past data. The results indicate that combining the convLSTM and Siamese networks captures spatiotemporal and visual features effectively, yielding more stable object IDs, and that the fault-detection mechanism reduces the computational load by filtering out improbable matches early in the pipeline.

The significance of this work lies in its hybrid design, which pairs the predictive capability of the Kalman filter with the pattern-recognition strength of deep learning. Because association is handled by a neural network that considers each track's history, the system re-identifies objects more reliably in complex scenarios, producing more stable trajectories and fewer identity switches. The authors conclude that integrating deep learning into specific stages of a classical tracking pipeline can outperform both purely classical and fully deep learning-based methods, offering a viable path toward robust autonomous driving applications.
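
The two loss functions named above can be written as follows. The summary gives neither the exact formulation nor the margin values, so this is a sketch assuming one common form of the double-margin contrastive loss (squared hinge terms with a positive margin `m_pos` and a negative margin `m_neg`, whose defaults here are arbitrary) and standard per-sample binary cross-entropy. Which network is trained with which loss is also an assumption: plausibly the double-margin loss for the Siamese embeddings and binary cross-entropy for the final match-probability head.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def double_margin_loss(emb_a, emb_b, same_identity, m_pos=0.5, m_neg=1.5):
    """Assumed double-margin contrastive loss for one embedding pair.

    Matching pairs are penalized only when farther apart than m_pos;
    non-matching pairs are penalized only when closer than m_neg.
    """
    d = euclidean(emb_a, emb_b)
    if same_identity:
        return max(0.0, d - m_pos) ** 2
    return max(0.0, m_neg - d) ** 2


def binary_cross_entropy(p, label, eps=1e-7):
    """BCE for one predicted match probability p and a 0/1 label."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


print(double_margin_loss((0.0, 0.0), (0.0, 1.0), same_identity=False))  # → 0.25
print(double_margin_loss((0.0, 0.0), (3.0, 4.0), same_identity=True))  # → 20.25
```

Compared with a single-margin contrastive loss, whose positive term keeps pulling matching pairs toward zero distance, the positive margin lets same-identity embeddings stop once they are within `m_pos`, while non-matching pairs are only penalized until they are `m_neg` apart.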

Provenance

The full processing record for this entry. Every stage of this paper's journey through the pipeline is logged: what ran, with which tool and model, how many attempts it took, and when it last completed.

Stage | Outcome | Tool | Model | Prompt | Attempts | Completed
discover | success | Crossref | - | - | 1 | 2026-06-20
archive | success | unpaywall | - | - | 2 | 2026-06-26
extract | success | cached | - | - | 2 | 2026-06-26
clean | success | clean | - | - | 1 | 2026-06-20
chunk | success | chunk | - | - | 1 | 2026-06-20
embed | success | embed | Qwen/Qwen3-Embedding-8B | - | 1 | 2026-06-20
promote | success | - | - | - | 1 | 2026-06-20
summarize | success | llm | qwen3.6-27b-prismaquant | summ-v5 | 1 | 2026-06-26
tag | success | vector_similarity | - | - | 6 | 2026-06-20
verify | success | - | - | - | 1 | 2026-06-26

Summary generated by qwen3.6-27b-prismaquant on 2026-06-26; verification: verified.

Topics

Ranked by relevance to this paper.