
Intern 2020-2021 - Transformers for tracking

Location: Paris, France
Job Category: Internship

AI/Machine Learning Internship - Samsung Strategy and Innovation Center - Paris - 6 months
Keywords: machine learning, computer vision, deep learning, tracking, transformers
Transformers for multi-object tracking
The objective of this internship is to explore whether transformer models can improve current SOTA solutions for (multi-)object tracking.
Here we highlight the current trends and limitations of each family of approaches:
Pure visual object tracking networks, such as SiamMask (CVPR 2019), are single-shot networks that detect (and sometimes segment) objects by comparing the input image with a reference template (e.g. the ground-truth bounding box in the first frame).
These networks do not rely on temporal information, only on appearance, so tracking fast dynamics and/or objects that
suddenly change appearance can be an issue.
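To make the template-matching idea concrete, here is a minimal numpy sketch of the cross-correlation step at the heart of Siamese trackers. The feature shapes and the raw dot-product score are illustrative simplifications, not SiamMask's actual architecture:

```python
import numpy as np

def siamese_response(template_feat, search_feat):
    """Slide the template feature map over the search feature map and return
    a response map; the peak marks the most likely target location."""
    th, tw, _ = template_feat.shape
    sh, sw, _ = search_feat.shape
    out = np.empty((sh - th + 1, sw - tw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            window = search_feat[i:i + th, j:j + tw, :]
            out[i, j] = np.sum(window * template_feat)  # cross-correlation score
    return out

# Toy demo: hide the template inside a noisy search region and recover it.
rng = np.random.default_rng(0)
template = rng.standard_normal((4, 4, 8))   # 4x4 spatial, 8 channels (arbitrary)
search = 0.1 * rng.standard_normal((16, 16, 8))
search[5:9, 7:11, :] = template             # plant the target at offset (5, 7)
response = siamese_response(template, search)
peak = np.unravel_index(response.argmax(), response.shape)
```

In a real tracker both feature maps come from a shared convolutional backbone, and the correlation is implemented as a convolution on the GPU; the peak of the response map then gives the new box location.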
FairMOT, the SOTA as of September 2020, trains a re-identification branch so that it produces
embeddings in a space where bounding boxes of the same object lie close together (similar to face recognition). These embeddings are used to
compute a distance score that is then fed to classical techniques (the Hungarian algorithm for association, a Kalman filter for prediction).
Thus the network itself can only cope with visual cues, while temporal ones are treated in a classical fashion.
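To illustrate the classical association step, here is a small sketch: cosine distances between re-ID embeddings are matched one-to-one, gated by a distance threshold. For brevity it enumerates permutations instead of running the actual Hungarian algorithm (the result is the same for a handful of targets); the embeddings and the threshold value are made up:

```python
from itertools import permutations

import numpy as np

def associate(track_embs, det_embs, max_dist=0.4):
    """Match tracks to detections by minimum total cosine distance,
    discarding pairs above the gating threshold."""
    t = track_embs / np.linalg.norm(track_embs, axis=1, keepdims=True)
    d = det_embs / np.linalg.norm(det_embs, axis=1, keepdims=True)
    cost = 1.0 - t @ d.T                      # cosine distance matrix
    n = len(track_embs)
    best = min(permutations(range(len(det_embs)), n),
               key=lambda p: sum(cost[i, p[i]] for i in range(n)))
    return [(i, j) for i, j in zip(range(n), best) if cost[i, j] < max_dist]

# Three tracks; the same three objects are detected in a different order.
tracks = np.eye(3)
dets = np.array([[0.1, 0.95, 0.0],
                 [0.9, 0.1, 0.0],
                 [0.0, 0.1, 0.98]])
matches = associate(tracks, dets)   # track 0 -> detection 1, etc.
```

Production trackers replace the permutation search with `scipy.optimize.linear_sum_assignment` and mix the re-ID distance with a motion term from the Kalman filter before matching.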
Some trials have been performed with RNNs. The idea is that, by processing the subsequent bounding boxes of a person, an RNN can
build an internal memory of that person (the same concept as keeping an internal representation of text semantics).
However, approaches like RAN argue that RNNs alone are not enough to store rich information
about the target, so they assist the network with external memory (template bounding boxes from previous frames). Moreover, the
association is still manual, using scores provided by the RAN network.
These approaches do not achieve accuracy comparable to FairMOT's (perhaps because of a less powerful object feature representation), but they
remain by far the best at avoiding ID switches, so recurrent methods are promising in terms of temporal consistency.
We ask ourselves how transformers can help tracking in several ways:
Can they provide much richer representational power than RNNs (can we avoid template selection and rely solely on the transformer's encoding of the long-term tracking history)?
Can we pair transformers with powerful visual backbones for robust visual representation?
Can we avoid explicit (hard) association in the multi-object case and rely solely on the self-attention mechanism?
Multiple directions can be explored:
Encoder-only: train an encoder to reconstruct missing pieces of a tracklet
Generative: train an encoder-decoder (or decoder-only) model to predict the next position
Hierarchical: an encoder per track, with a transformer on top for association and prediction
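All three variants rest on the same building block: self-attention over the time steps of a tracklet. A minimal numpy sketch of scaled dot-product self-attention over per-frame box features (the feature dimension and the random projection matrices are placeholders; a real model would learn them):

```python
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    """Scaled dot-product self-attention: each time step of the tracklet
    attends to every other step, yielding a temporally contextualised code."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)   # softmax over time steps
    return weights @ v

rng = np.random.default_rng(0)
T, D = 8, 16                       # 8 frames, 16-dim box features (placeholders)
x = rng.standard_normal((T, D))
Wq, Wk, Wv = [rng.standard_normal((D, D)) for _ in range(3)]
out = self_attention(x, Wq, Wk, Wv)   # one contextualised vector per frame
```

In the encoder-only variant this runs over a tracklet with masked positions to reconstruct; in the hierarchical variant a second attention layer of this form would operate across the per-track codes to perform soft association.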
Bonus: trajectory prediction
SOTA trajectory prediction approaches, such as SocialGAN and its variants (e.g. Social Ways, arXiv:1904.09507), rely on LSTM encoders to model each
trajectory independently, then use ad-hoc modules to pool the encoded trajectories (with or without attention) and to predict the next move of a given target trajectory.
Can we use transformers to model individual trajectories (as in the tracking part of the internship)?
Can we use transformers to model attention between trajectories?
We believe this task is perfectly suited to exploiting the capability of transformers to model complex dependencies.
Samsung Strategy and Innovation Center
With offices in San Jose (US), Menlo Park (US), New York (US), Paris (France), Tel Aviv (Israel) and Seoul (Korea), the goal of the Samsung
Strategy and Innovation Center (SSIC) is to bring artificial intelligence into Samsung products intelligently and to promote innovation. Our main
lines of work are automated mobility and the Internet of Things, where we seek and develop high-impact solutions that transform how these technologies are used. We are
customer-centric and build our technologies to respect privacy. In collaboration with Samsung's business teams, SSIC brings the latest research
innovations into AI-optimized products that are quickly accessible to users.