Carnegie Mellon University
A transformer-based single particle tracker for fluorescence microscopy that uses multi-hypothesis attention and position-based relinking to handle low SNR and high particle density.
MoTT (Motion Transformer Tracker) is a deep learning framework for single particle tracking in fluorescence microscopy images, developed by Yudong Zhang and Ge Yang at Carnegie Mellon University. Presented at MICCAI 2023 and available as a bioRxiv preprint (DOI: 10.1101/2023.07.20.549804), MoTT addresses the fundamental challenges that make particle tracking in live fluorescence microscopy difficult: low signal-to-noise ratio (SNR), high particle density, and complex, unpredictable particle motion.
Single particle tracking (SPT) is a cornerstone technique in cell biology for characterizing the spatiotemporal dynamics of subcellular structures (including vesicles, organelles, motor proteins, and signaling molecules) at the nanoscale. By linking particle detections across sequential fluorescence microscopy frames into continuous trajectories, SPT enables measurement of diffusion coefficients, directed transport rates, and confinement properties that reveal the biophysical mechanisms governing intracellular organization. However, the technique is demanding: fluorescent particles are often barely brighter than the background noise, appear and disappear due to photobleaching or movement out of the focal plane, and can be packed so densely that several nearby detections are plausible matches for the same trajectory.
MoTT applies transformer self-attention to learn complex particle motion patterns from trajectory history, simultaneously evaluating multiple competing hypotheses for how each active tracklet might evolve. This multi-hypothesis approach is combined with a relinking strategy that substitutes the predicted particle position when a detection is missed, so a trajectory can continue through the gap instead of being split into fragments. Together, these substantially improve robustness in the challenging imaging conditions typical of live-cell SPT experiments. The authors report that MoTT outperforms prior state-of-the-art methods on the standardized ISBI Particle Tracking Challenge benchmarks, making it a general tool for quantitative analysis of subcellular dynamics.
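The relinking idea can be illustrated with a small sketch. This is not the MoTT implementation: the constant-velocity `predict_next` function is a hypothetical stand-in for the learned transformer predictor, and the `gate` and `max_gap` parameters are assumptions introduced only for this example.

```python
from math import hypot

def predict_next(track):
    """Hypothetical stand-in for the learned predictor: constant-velocity extrapolation."""
    x1, y1, _ = track[-1]
    if len(track) < 2:
        return (x1, y1)
    x0, y0, _ = track[-2]
    return (2 * x1 - x0, 2 * y1 - y0)

def trailing_gap(track):
    """Number of consecutive predicted (not detected) points at the end of the track."""
    gap = 0
    for _, _, is_predicted in reversed(track):
        if not is_predicted:
            break
        gap += 1
    return gap

def extend_track(track, detections, gate=3.0, max_gap=2):
    """Append one point (x, y, is_predicted) to `track`; return False if the track ends.

    `gate` (pixels) and `max_gap` (frames) are illustrative assumptions.
    """
    px, py = predict_next(track)
    # Detection closest to the predicted position, or None if the frame is empty.
    nearest = min(detections, key=lambda d: hypot(d[0] - px, d[1] - py), default=None)
    if nearest is not None and hypot(nearest[0] - px, nearest[1] - py) <= gate:
        track.append((nearest[0], nearest[1], False))
        return True
    # Missed detection: stop if the gap is already too long, otherwise
    # substitute the predicted position so the track survives the gap.
    if trailing_gap(track) >= max_gap:
        return False
    track.append((px, py, True))
    return True

# A particle moving right; its detection is missing in the second new frame.
track = [(0.0, 0.0, False), (1.0, 0.0, False)]
frames = [[(2.1, 0.0)], [], [(4.0, 0.1)]]
for detections in frames:
    extend_track(track, detections)
flags = [is_predicted for _, _, is_predicted in track]
print(flags)
```

In this toy run the empty frame is filled with a predicted point and the track picks up the real detection again in the next frame, instead of ending and restarting as a new fragment.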
MoTT is built on the scaled dot-product attention mechanism from the original Transformer architecture (Vaswani et al. 2017). The tracker represents each active (live) tracklet as a sequence of past detection positions and timestamps, and generates hypothesis tracklets representing possible future continuations. Self-attention is computed across the joint set of live and hypothesis tracklet tokens, allowing the model to capture dependencies between different particles' motion patterns and resolve ambiguous assignments. The matching probability between each live tracklet and each detection candidate, as well as the existence probability for each live tracklet, are predicted from the attention outputs. Final trajectory assignments are determined by global linear programming optimization on the predicted matching probabilities.
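To make the final assignment step concrete, the following sketch compares a global one-to-one assignment with a greedy one on a made-up matching-probability matrix. It rests on several assumptions: the objective (sum of matching probabilities) is one plausible linear objective rather than the paper's exact formulation, the exhaustive search stands in for a linear programming solver and is only practical for tiny inputs, and existence probabilities are left out entirely. For realistic problem sizes, a dedicated solver such as `scipy.optimize.linear_sum_assignment` would be a natural replacement.

```python
from itertools import permutations

def global_assignment(prob):
    """One-to-one tracklet -> candidate assignment with maximal total probability.

    prob[i][j] is the (assumed) matching probability between tracklet i and
    detection candidate j. Exhaustive search is used here purely for
    illustration; it returns the same optimum an assignment solver would.
    """
    n_tracks, n_cands = len(prob), len(prob[0])
    if n_tracks > n_cands:
        raise ValueError("this sketch assumes at least one candidate per tracklet")
    best_score, best = float("-inf"), None
    for cols in permutations(range(n_cands), n_tracks):
        score = sum(prob[i][j] for i, j in enumerate(cols))
        if score > best_score:
            best_score, best = score, cols
    return list(best)

def greedy_assignment(prob):
    """Each tracklet in turn takes its best remaining candidate (for comparison)."""
    taken, out = set(), []
    for row in prob:
        j = max((c for c in range(len(row)) if c not in taken), key=lambda c: row[c])
        taken.add(j)
        out.append(j)
    return out

# Two live tracklets, three detection candidates (illustrative numbers).
prob = [
    [0.60, 0.40, 0.05],
    [0.90, 0.10, 0.05],
]
global_result = global_assignment(prob)
greedy_result = greedy_assignment(prob)
print(global_result, greedy_result)
```

Here the greedy pass gives candidate 0 to the first tracklet and leaves the second with a poor match, while the global optimum assigns candidate 0 to the second tracklet, which is the kind of ambiguity that joint optimization is meant to resolve.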
The model was trained and evaluated on the ISBI Particle Tracking Challenge datasets, which provide standardized simulated fluorescence microscopy sequences spanning a range of particle densities (low, medium, high) and SNR levels (1, 2, 4, and 7), along with ground truth trajectories for quantitative benchmarking. Tracking performance is measured using the standard ISBI challenge metrics, including JSC (Jaccard similarity coefficient for track points), JSC_θ (Jaccard similarity coefficient for entire tracks), and the track-linking accuracy scores α and β. MoTT achieves state-of-the-art performance across these benchmarks, with the largest improvements observed in the low-SNR, high-density settings where track fragmentation is most problematic and where the relinking strategy has the greatest impact.
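As a rough illustration of the Jaccard-style point metric, the sketch below computes JSC = TP / (TP + FP + FN) for a single ground-truth track and a single estimated track. It is a simplification: the official challenge evaluation first finds an optimal pairing between whole sets of tracks, whereas this example assumes the two tracks are already paired. The function names and the 5-pixel gate `eps` are assumptions made for the example.

```python
from math import hypot

def count_point_matches(gt_track, est_track, eps=5.0):
    """Count TP, FP, FN points for two already-paired tracks.

    Tracks are dicts mapping frame index -> (x, y). A point is a true positive
    when both tracks have a point in the same frame within `eps` pixels.
    """
    tp = sum(
        1
        for t, (x, y) in gt_track.items()
        if t in est_track and hypot(x - est_track[t][0], y - est_track[t][1]) <= eps
    )
    fn = len(gt_track) - tp   # ground-truth points with no matching estimate
    fp = len(est_track) - tp  # estimated points with no matching ground truth
    return tp, fp, fn

def jaccard(tp, fp, fn):
    """Jaccard similarity coefficient; 0.0 is returned by convention when empty."""
    total = tp + fp + fn
    return tp / total if total else 0.0

gt = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0), 3: (3.0, 0.0)}
est = {1: (1.2, 0.1), 2: (2.0, 0.3), 3: (9.0, 0.0), 4: (4.0, 0.0)}
tp, fp, fn = count_point_matches(gt, est)
jsc = jaccard(tp, fp, fn)
print(tp, fp, fn, round(jsc, 3))
```

In this example only frames 1 and 2 count as matches, so a late start, a large localization error, and a spurious extra point all lower the score.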
MoTT is broadly applicable to any live-cell fluorescence microscopy experiment where subcellular structures or molecular complexes must be tracked over time. In membrane biology, the model enables tracking of individual receptor molecules, lipid droplets, or endocytic vesicles to characterize confinement zones, diffusion modes, and transport kinetics. In organelle biology, MoTT can follow mitochondria, lysosomes, and early endosomes through complex intracellular trafficking pathways. In viral infection studies, tracking of individual viral particles during cell entry, endosomal trafficking, and nuclear import provides mechanistic insights into the infection process. The model's compatibility with detections from established particle detection tools (such as deepBlink) makes it straightforward to integrate into existing SPT pipelines, and the open-source GitHub repository provides code for training on custom datasets with domain-specific motion characteristics.
MoTT contributed to the growing application of transformer attention mechanisms to bioimage analysis, demonstrating that attention-based multi-hypothesis reasoning, followed by global optimization, is well suited to the tracking assignment problem in dense, noisy fluorescence microscopy data. Its strong performance on the ISBI Particle Tracking Challenge benchmarks, the community-standard evaluation for SPT methods, established MoTT as a competitive tool in the field at the time of publication. The position-based relinking strategy turns the model's trajectory predictions into a practical mechanism for bridging missed detections, addressing a common failure mode of tracking methods deployed on real experimental data. A limitation of MoTT is that it depends on upstream particle detections as input: its performance is bounded by detection quality, and in extremely low-SNR conditions where detectors themselves fail to find particles, the tracker cannot recover the missing information.