MDD-reID — Person Re-Identification with Soft-Biometry

AI

Context: Polytech Nantes (R&D)

PyTorch, TorchVision, Vision Transformer, FAISS, Deep Learning, CNN

R&D project conducted at LS2N (Laboratoire des Sciences du Numérique de Nantes, CNRS-affiliated) under the supervision of a research scientist from the DUKE team. Published as a 73-page research paper covering a full literature review, architecture design, experimental protocol, and cross-domain evaluation. More than 25 models were trained and more than 60 evaluation runs were carried out across 3 benchmark datasets.

Part-Based Feature Extraction (Vision Transformer)

Custom Vision Transformer architecture producing a body-part-aware identity representation. Processes 224x224 images through 12 Transformer blocks with a local awareness mechanism that maps 196 image patches into 14 body regions, producing a 10,752-dimensional identity vector (14 x 768).
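The shape arithmetic behind this representation can be illustrated with a minimal, dependency-free sketch. It assumes that each of the 14 regions is one horizontal row of the 14 x 14 patch grid and that the CLS blending is a convex combination controlled by a scalar `lam`; both are illustrative assumptions, and `pool_regions` and `flatten` are hypothetical helper names, not the project's code. Plain Python lists stand in for tensors.

```python
import random

NUM_PATCHES = 196   # 14 x 14 grid: a 224x224 image cut into 16x16 patches
GRID = 14           # patches per grid row
NUM_REGIONS = 14    # assumption: one body region per grid row
DIM = 768           # token width


def pool_regions(patch_tokens, cls_token, lam):
    """Mean-pool each grid row into a region vector, then blend it with CLS."""
    regions = []
    for r in range(NUM_REGIONS):
        row = patch_tokens[r * GRID:(r + 1) * GRID]       # the 14 patches of row r
        local = [sum(col) / GRID for col in zip(*row)]    # mean over those patches
        # Assumed blending rule: lam = 1 keeps only the local part,
        # lam = 0 keeps only the global CLS token.
        blended = [lam * l + (1.0 - lam) * c for l, c in zip(local, cls_token)]
        regions.append(blended)
    return regions


def flatten(regions):
    """Concatenate the region vectors into one identity vector."""
    return [x for region in regions for x in region]


random.seed(0)
patches = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_PATCHES)]
cls = [random.gauss(0, 1) for _ in range(DIM)]

identity = flatten(pool_regions(patches, cls, lam=0.5))
print(len(identity))  # 10752 = 14 x 768
```

In a real model `lam` would be a trainable parameter and the pooling would run on batched tensors; the sketch only shows how 196 patch tokens reduce to 14 region vectors and then to a single 10,752-dimensional vector.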

CLS token blending: each local region is fused with the global CLS token via a learnable lambda parameter, capturing both holistic appearance and fine-grained body-part detail
Progressive unfreezing: all 12 Transformer blocks are frozen initially, then unfrozen one at a time every 2 epochs with learning rate decay (×0.8) to prevent catastrophic forgetting of pretrained weights
14 parallel classification heads with ensemble-style voting: softmax scores are aggregated across all parts, and a joint cross-entropy loss is backpropagated through the full network
Trained on a CUDA GPU (NVIDIA GeForce RTX 4060 Ti), ~1h40 per training run
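The unfreezing schedule and the voting step can be sketched as follows. The sketch assumes that the block closest to the output is released first, that the first block is released at epoch 2, that the learning rate is multiplied by 0.8 at every unfreezing step, and that votes are aggregated by summing softmax scores; the base learning rate `1e-4` and all function names are hypothetical. In PyTorch, freezing a block would typically mean setting `requires_grad` to `False` on its parameters.

```python
import math


def trainable_blocks(epoch, num_blocks=12, every=2):
    """Indices of unfrozen blocks at a given epoch (all frozen at epoch 0)."""
    unfrozen = min(epoch // every, num_blocks)
    # Assumption: the block closest to the output is released first.
    return list(range(num_blocks - unfrozen, num_blocks))


def learning_rate(epoch, base_lr=1e-4, decay=0.8, num_blocks=12, every=2):
    """Learning rate after one x0.8 decay per unfreezing step (assumed rule)."""
    steps = min(epoch // every, num_blocks)
    return base_lr * decay ** steps


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def vote(per_part_logits):
    """Sum the softmax scores of all part heads and return the winning class."""
    num_classes = len(per_part_logits[0])
    scores = [0.0] * num_classes
    for logits in per_part_logits:
        for c, p in enumerate(softmax(logits)):
            scores[c] += p
    return max(range(num_classes), key=scores.__getitem__)


for epoch in (0, 2, 4, 24):
    print(epoch, trainable_blocks(epoch), learning_rate(epoch))

# Three toy heads over three identities; two heads favour class 0.
print(vote([[2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [3.0, 0.0, 0.0]]))
```

Under these assumptions all 12 blocks are trainable from epoch 24 onward, at which point the learning rate has been decayed 12 times.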

Soft Biometric Attribute Prediction

Attribute prediction module operating on the Transformer embeddings. Five architectures were systematically compared (logistic regression, MLP, CNN1D, Transformer encoder, CNN1D+Transformer hybrid); CNN1D was selected for the final pipeline.

Attribute engineering: reduced 28 raw binary annotations to 4 balanced attributes (gender, clothing top category, clothing bottom type, bag presence) by recategorizing mutually exclusive binary columns into single categorical variables, solving a severe class imbalance problem (e.g., 85/15 split on color attributes)
Dual-head output: sigmoid activation for 3 binary attributes + softmax over 10 classes for the categorical attribute
Evaluated oversampling strategies (SMOTE, ADASYN) on the embedding space, found that attribute interdependencies made synthetic generation ineffective, and opted for the recategorization approach instead
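The recategorization step and the dual-head output can be illustrated with a small sketch. The column names (`top_black`, `top_white`, `top_red`) are hypothetical stand-ins for the raw annotations, and the layout of 13 logits (3 binary followed by 10 categorical) is an assumption made for the example.

```python
import math


def to_categorical(row, columns):
    """Collapse mutually exclusive binary columns into a single class index."""
    active = [i for i, name in enumerate(columns) if row[name] == 1]
    if len(active) != 1:
        raise ValueError(f"expected exactly one active column, got {len(active)}")
    return active[0]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dual_head(logits, num_binary=3):
    """Split raw outputs into independent binary probabilities and one
    class distribution (assumed layout: binary logits first)."""
    binary = [sigmoid(x) for x in logits[:num_binary]]
    categorical = softmax(logits[num_binary:])
    return binary, categorical


# Hypothetical raw annotation row with three mutually exclusive columns.
columns = ["top_black", "top_white", "top_red"]
row = {"top_black": 0, "top_white": 1, "top_red": 0}
print(to_categorical(row, columns))  # 1

binary, categorical = dual_head([0.0] * 13)
print(len(binary), len(categorical))  # 3 10
```

The sigmoid outputs are independent of one another, whereas the softmax outputs sum to 1, which matches the distinction between the binary attributes and the single multi-class attribute.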

Embedding + Attribute Fusion

3 fusion strategies designed and benchmarked to combine the 10,752D identity embeddings with predicted soft biometric attributes:

Weighted additive fusion: learnable scalar alpha balancing embeddings vs. projected attributes (optimal alpha ~0.92, confirming embeddings dominate but attributes add complementary signal)
Gating fusion: dimension-wise sigmoid gates for adaptive per-feature weighting via Hadamard product
Concatenation fusion: attribute projection to 1,024D via linear layers + learned embeddings, concatenated to produce a 12,800D combined vector
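The three strategies listed above can be sketched on toy vectors. The exact formulas are assumptions: additive fusion is written as a convex combination, the gate is applied as `g * e + (1 - g) * a` per dimension, and the gate logits are passed in directly, whereas a trained model would compute them with a learned layer. The attribute vector is assumed to be already projected to the required size, and all function names are hypothetical.

```python
import math


def additive_fusion(emb, attr_proj, alpha=0.92):
    """Weighted sum of embedding and projected attributes (same length)."""
    return [alpha * e + (1.0 - alpha) * a for e, a in zip(emb, attr_proj)]


def gating_fusion(emb, attr_proj, gate_logits):
    """Per-dimension sigmoid gate applied element-wise (Hadamard product)."""
    gates = [1.0 / (1.0 + math.exp(-z)) for z in gate_logits]
    return [g * e + (1.0 - g) * a for g, e, a in zip(gates, emb, attr_proj)]


def concat_fusion(emb, attr_proj):
    """Append the projected attributes to the embedding."""
    return list(emb) + list(attr_proj)


emb = [1.0, 0.0, 2.0]
attr = [0.0, 1.0, 0.0]

print(additive_fusion(emb, attr))                 # dominated by the embedding
print(gating_fusion(emb, attr, [0.0, 0.0, 0.0]))  # zero logits give an even mix
print(len(concat_fusion(emb, attr)))              # 6
```

Additive and gating fusion keep the dimensionality of the embedding, so the retrieval index is unchanged; concatenation grows the vector by the size of the attribute projection.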

Similarity Search & Cross-Domain Evaluation

Two parallel retrieval backends implemented for large-scale gallery matching, systematically compared on 10,752D vectors.

FAISS (IndexFlatIP): GPU-accelerated inner product search scaling to millions of gallery entries
Cosine similarity (PyTorch): reference implementation with L2 normalization scaled by sqrt(14)
Cross-domain evaluation matrix: each model trained on one dataset, then evaluated on all 3 test sets to measure generalization beyond same-domain performance
Metrics: Rank-1, Rank-5, Rank-10, and mAP (mean Average Precision)
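The retrieval and scoring steps can be illustrated in plain Python. The sketch reads "L2 normalization scaled by sqrt(14)" as normalizing each part separately and dividing by the square root of the number of parts, so that the full vector has unit norm and the inner product equals the mean per-part cosine similarity; that reading is an assumption. The sketch also omits the same-camera filtering that standard re-identification protocols apply, and all function names are hypothetical.

```python
import math


def normalize_parts(vec, num_parts):
    """L2-normalize each part, then divide by sqrt(num_parts) (unit total norm)."""
    size = len(vec) // num_parts
    scale = math.sqrt(num_parts)
    out = []
    for p in range(num_parts):
        part = vec[p * size:(p + 1) * size]
        norm = math.sqrt(sum(x * x for x in part)) or 1.0  # guard all-zero parts
        out.extend(x / (norm * scale) for x in part)
    return out


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def rank_gallery(query, gallery):
    """Gallery indices sorted by decreasing inner product with the query."""
    scores = [inner_product(query, g) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: -scores[i])


def rank_k_hit(ranked_ids, query_id, k):
    """True if the query identity appears among the top-k results."""
    return query_id in ranked_ids[:k]


def average_precision(ranked_ids, query_id):
    """Mean of the precision values at each position holding a correct match."""
    hits, total = 0, 0.0
    for pos, gid in enumerate(ranked_ids, start=1):
        if gid == query_id:
            hits += 1
            total += hits / pos
    return total / hits if hits else 0.0


# Toy example: 2 parts of 2 dimensions each, 3 gallery entries.
query = normalize_parts([1.0, 0.0, 0.0, 1.0], 2)
gallery = [normalize_parts(v, 2) for v in (
    [1.0, 0.0, 0.0, 1.0],   # identical to the query
    [0.0, 1.0, 1.0, 0.0],   # orthogonal in both parts
    [1.0, 0.0, 1.0, 0.0],   # matches one part out of two
)]
gallery_ids = ["a", "a", "b"]

order = rank_gallery(query, gallery)
ranked_ids = [gallery_ids[i] for i in order]
print(order, ranked_ids)
print(rank_k_hit(ranked_ids, "a", 1), average_precision(ranked_ids, "a"))
```

mAP is the mean of `average_precision` over all queries. With vectors normalized this way, an exact inner-product index such as FAISS `IndexFlatIP` would be expected to produce the same ranking as the cosine reference.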

Results

Evaluated against state-of-the-art models across 3 standard re-identification benchmarks (Market-1501: 1,501 identities / 32K images, DukeMTMC-reID: 1,812 identities / 36K images, CUHK03: 1,360 identities):

Market-1501: 99.2% Rank-1, 99.6% Rank-5 (same-domain)
DukeMTMC-reID: 94.0% Rank-1, 92.7% mAP (cross-domain)
CUHK03: 81.3% Rank-1, 76.5% mAP (cross-domain)
Fusion with attributes maintained Rank-1 while improving Rank-10, validating the multi-modal approach as a complementary signal for harder queries

Tech Stack

Python, PyTorch, torchvision, timm, FAISS, CUDA, scikit-learn, NumPy, Pandas, Matplotlib, SciPy