MDD-reID — Person Re-Identification with Soft-Biometry

AI
Context: Polytech Nantes (R&D)
PyTorch, TorchVision, Vision Transformer, FAISS, Deep Learning, CNN

R&D project conducted at LS2N (Laboratoire des Sciences du Numérique de Nantes, CNRS-affiliated) under the supervision of a research scientist from the DUKE team. Published as a 73-page research paper covering a full literature review, architecture design, experimental protocol, and cross-domain evaluation. More than 25 models were trained and more than 60 evaluation runs performed across 3 benchmark datasets.


Part-Based Feature Extraction (Vision Transformer)

Custom Vision Transformer architecture producing a body-part-aware identity representation. It processes 224x224 images through 12 Transformer blocks with a local awareness mechanism that maps the 196 image patches (a 14x14 grid) into 14 body regions, producing a 10,752-dimensional identity vector (14 x 768).
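
The shape arithmetic above can be checked with a small sketch. It assumes 16x16-pixel patches (which is what a 224x224 input and 196 patches imply) and, purely for illustration, that each of the 14 regions is one horizontal row of the patch grid, mean-pooled. The project's actual local awareness mechanism may group patches differently. The function names are hypothetical and plain Python lists stand in for tensors.

```python
# Illustrative shape arithmetic only. The row-wise grouping is an ASSUMPTION,
# not the project's confirmed patch-to-region mapping.

IMAGE_SIZE = 224
PATCH_SIZE = 16      # assumed: 224 / 16 = 14 patches per side
EMBED_DIM = 768
NUM_REGIONS = 14


def patch_to_region(patch_index, grid=IMAGE_SIZE // PATCH_SIZE):
    """Map a flat, row-major patch index to a horizontal-stripe region id."""
    return patch_index // grid


def pool_regions(patch_tokens, num_regions=NUM_REGIONS):
    """Mean-pool patch tokens (equal-length lists of floats) per region."""
    dim = len(patch_tokens[0])
    sums = [[0.0] * dim for _ in range(num_regions)]
    counts = [0] * num_regions
    for idx, token in enumerate(patch_tokens):
        region = patch_to_region(idx)
        counts[region] += 1
        for d, value in enumerate(token):
            sums[region][d] += value
    return [[s / counts[r] for s in sums[r]] for r in range(num_regions)]


grid = IMAGE_SIZE // PATCH_SIZE
num_patches = grid * grid
# Dummy patch tokens: token i is filled with the value i.
tokens = [[float(i)] * EMBED_DIM for i in range(num_patches)]
regions = pool_regions(tokens)
# Flatten the 14 region vectors into a single identity vector.
identity_vector = [v for region in regions for v in region]
print(num_patches, len(regions), len(identity_vector))
```

With these assumptions the three printed sizes are 196, 14, and 10752, matching the figures quoted above.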

  • CLS token blending: each local region is fused with the global CLS token via a learnable lambda parameter, capturing both holistic appearance and fine-grained body-part detail
  • Progressive unfreezing: all 12 Transformer blocks frozen initially, then unfrozen one at a time every 2 epochs with learning rate decay (x0.8) to prevent catastrophic forgetting of pretrained weights
  • 14 parallel classification heads with ensemble-style voting: aggregated softmax scores across all parts, joint CrossEntropy loss backpropagated through the full network
  • Trained with CUDA on an NVIDIA GeForce RTX 4060 Ti, ~1h40 per training run
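
The progressive unfreezing bullet can be sketched as a simple schedule. This is a minimal sketch under stated assumptions: blocks are unfrozen starting from the last one (closest to the output), the x0.8 decay is applied once per unfreezing step, and the base learning rate of 1e-4 is a placeholder rather than the project's value. In PyTorch, the "trainable" list would typically be applied by toggling `requires_grad` on each block's parameters.

```python
def unfreezing_schedule(num_blocks=12, every=2, base_lr=1e-4,
                        decay=0.8, num_epochs=26):
    """Return a list of (epoch, trainable_block_indices, lr) tuples.

    Assumptions (not confirmed by the project): the last block is
    unfrozen first, and the learning rate is multiplied by `decay`
    each time a new block is unfrozen.
    """
    schedule = []
    for epoch in range(num_epochs):
        # Number of blocks unfrozen so far, capped at the total.
        steps = min(epoch // every, num_blocks)
        trainable = list(range(num_blocks - steps, num_blocks))
        lr = base_lr * decay ** steps
        schedule.append((epoch, trainable, lr))
    return schedule


schedule = unfreezing_schedule()
for epoch, trainable, lr in schedule[:6]:
    print(f"epoch {epoch}: trainable blocks {trainable}, lr {lr:.2e}")
```

Under these assumptions all 12 blocks become trainable from epoch 24 onward, so a run needs at least 24 epochs before the full backbone is fine-tuned.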

Soft Biometric Attribute Prediction

Attribute prediction module operating on the Transformer embeddings. Five architectures were systematically compared (logistic regression, MLP, CNN1D, Transformer encoder, CNN1D+Transformer hybrid); CNN1D was selected for the final pipeline.

  • Attribute engineering: reduced 28 raw binary annotations to 4 balanced attributes (gender, clothing top category, clothing bottom type, bag presence) by recategorizing mutually exclusive binary columns into single categorical variables, solving a severe class imbalance problem (e.g., 85/15 split on color attributes)
  • Dual-head output: sigmoid activation for 3 binary attributes + softmax over 10 classes for the categorical attribute
  • Oversampling strategies (SMOTE, ADASYN) were evaluated in the embedding space; because attribute interdependencies made synthetic sample generation ineffective, the recategorization approach was adopted instead
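
The recategorization step described above can be illustrated as follows. This is a hedged sketch: the column names (`top_black`, `top_white`, `top_red`, `bag`) and the `unknown` fallback are hypothetical and do not come from the project's annotation files. It only shows the general idea of collapsing mutually exclusive binary columns into one categorical label.

```python
def recategorize(row, columns, fallback="unknown"):
    """Collapse mutually exclusive binary columns into one categorical label.

    Returns the name of the single active column, or `fallback` when zero
    or several columns are set (the exclusivity assumption is violated).
    """
    active = [name for name in columns if row.get(name, 0) == 1]
    if len(active) == 1:
        return active[0]
    return fallback


# Hypothetical column names, for illustration only.
TOP_COLUMNS = ["top_black", "top_white", "top_red"]

rows = [
    {"top_black": 1, "top_white": 0, "top_red": 0, "bag": 1},
    {"top_black": 0, "top_white": 1, "top_red": 0, "bag": 0},
    {"top_black": 0, "top_white": 0, "top_red": 0, "bag": 0},
]

labels = [recategorize(row, TOP_COLUMNS) for row in rows]
# Map each label to an integer class index for a softmax head.
classes = sorted(set(labels))
class_index = {name: i for i, name in enumerate(classes)}
targets = [class_index[label] for label in labels]
print(labels, targets)
```

Each original binary column is a rare-positive target on its own, whereas the merged variable becomes a single multi-class target suited to a softmax head.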

Embedding + Attribute Fusion

3 fusion strategies designed and benchmarked to combine the 10,752D identity embeddings with predicted soft biometric attributes:

  • Weighted additive fusion: learnable scalar alpha balancing embeddings vs. projected attributes (optimal alpha ~0.92, confirming embeddings dominate but attributes add complementary signal)
  • Gating fusion: dimension-wise sigmoid gates for adaptive per-feature weighting via Hadamard product
  • Concatenation fusion: attribute projection to 1,024D via linear layers + learned embeddings, concatenated to produce a 12,800D combined vector
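
The three strategies can be sketched on toy vectors. The exact formulas below are assumptions chosen to match the descriptions (a convex combination for the weighted fusion, a per-dimension convex gate for the gating fusion); the project's learned projections and parameters are not reproduced, and plain Python lists stand in for PyTorch tensors.

```python
import math


def weighted_additive(embedding, projected_attrs, alpha=0.92):
    """Assumed form: alpha * e + (1 - alpha) * a, with one scalar alpha."""
    return [alpha * e + (1.0 - alpha) * a
            for e, a in zip(embedding, projected_attrs)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated(embedding, projected_attrs, gate_logits):
    """Assumed form: g * e + (1 - g) * a element-wise, with g = sigmoid(logits)."""
    gates = [sigmoid(z) for z in gate_logits]
    return [g * e + (1.0 - g) * a
            for g, e, a in zip(gates, embedding, projected_attrs)]


def concatenate(embedding, projected_attrs):
    """Concatenation fusion: the output dimension is the sum of both inputs."""
    return list(embedding) + list(projected_attrs)


# Toy 2-D inputs; real embeddings are far larger.
embedding = [1.0, 0.0]
attrs = [0.0, 1.0]      # attributes already projected to the same size
gate_logits = [0.0, 0.0]  # hypothetical learned values; sigmoid(0) = 0.5

print(weighted_additive(embedding, attrs))
print(gated(embedding, attrs, gate_logits))
print(concatenate(embedding, attrs))
```

The additive and gated variants require the attributes to be projected to the embedding dimension first, while concatenation leaves both inputs at their own sizes.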

Similarity Search & Cross-Domain Evaluation

Two retrieval backends were implemented for large-scale gallery matching and systematically compared on the 10,752D vectors.

  • FAISS (IndexFlatIP): GPU-accelerated inner product search scaling to millions of gallery entries
  • Cosine similarity (PyTorch): reference implementation with L2 normalization scaled by sqrt(14)
  • Cross-domain evaluation matrix: each model trained on one dataset, then evaluated on all 3 test sets to measure generalization beyond same-domain performance
  • Metrics: Rank-1, Rank-5, Rank-10, and mAP (mean Average Precision)
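
To make the retrieval and metric definitions concrete, here is a minimal, framework-free sketch of cosine ranking, Rank-k, and Average Precision on a toy gallery. It is not the project's evaluation code: the vectors and identity labels are invented, and the same-camera filtering used by standard re-ID protocols is omitted. Because the inner product of L2-normalized vectors equals their cosine similarity, an inner-product index over normalized vectors should produce the same ranking.

```python
import math


def l2_normalize(vec):
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def rank_gallery(query, gallery):
    """Return gallery indices sorted by decreasing cosine similarity."""
    q = l2_normalize(query)
    scores = [sum(a * b for a, b in zip(q, l2_normalize(g))) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)


def rank_k_hit(ranked_ids, query_id, k):
    """True if a correct identity appears among the top-k results."""
    return query_id in ranked_ids[:k]


def average_precision(ranked_ids, query_id):
    """Mean of the precision values at each correct match position."""
    hits, precisions = 0, []
    for position, gallery_id in enumerate(ranked_ids, start=1):
        if gallery_id == query_id:
            hits += 1
            precisions.append(hits / position)
    return sum(precisions) / hits if hits else 0.0


# Toy data, invented for illustration.
query, query_id = [1.0, 0.0], "a"
gallery = [[0.9, 0.1], [0.0, 1.0], [0.5, 0.5], [1.0, 0.2]]
gallery_ids = ["a", "b", "a", "c"]

order = rank_gallery(query, gallery)
ranked_ids = [gallery_ids[i] for i in order]
ap = average_precision(ranked_ids, query_id)
print(ranked_ids, rank_k_hit(ranked_ids, query_id, 1), round(ap, 4))
```

mAP is then the mean of `average_precision` over all queries, and Rank-k is the fraction of queries for which `rank_k_hit` is true.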

Results

Evaluated against state-of-the-art models across 3 standard re-identification benchmarks (Market-1501: 1,501 identities / 32K images, DukeMTMC-reID: 1,812 identities / 36K images, CUHK03: 1,360 identities):

  • Market-1501: 99.2% Rank-1, 99.6% Rank-5 (same-domain)
  • DukeMTMC-reID: 94.0% Rank-1, 92.7% mAP (cross-domain)
  • CUHK03: 81.3% Rank-1, 76.5% mAP (cross-domain)
  • Fusion with attributes maintained Rank-1 while improving Rank-10, validating the multi-modal approach as a complementary signal for harder queries

Tech Stack

Python, PyTorch, torchvision, timm, FAISS, CUDA, scikit-learn, NumPy, Pandas, Matplotlib, SciPy