UAV Human Detection
Thermal Imaging AI
Seeing from Above
Can data augmentation improve robustness of UAV detection in adverse conditions?
Search and Rescue (SAR) operations rely on UAVs equipped with thermal imaging to locate missing persons in remote areas. However, current detection models are trained on clean aerial images and fail when faced with adverse conditions like heavy snowfall or wildfire smoke.
We investigate whether data augmentation strategies can simulate realistic SAR environmental conditions to improve model robustness without significantly sacrificing baseline performance on clean images.
Applications
Search & Rescue
Locate missing persons in remote or blocked-off terrain where manual image review is slow and error-prone.
Disaster Response
Operate in wildfire smoke and heavy snowfall conditions that degrade standard detection models.
Reduced False Negatives
Minimize missed detections in SAR operations where a false negative can mean a life lost.
All-Weather Operation
Extend operation windows beyond ideal weather conditions using augmentation-trained models.
Intelligence in the Sky
Faster R-CNN with ResNet-50 backbone for thermal human detection
Model Architecture
Input Layer
512×512 thermal image
HIT-UAV thermal dataset
Backbone
ResNet-50 + FPN
Layers 1–3 frozen, Layer 4+ trained
Detection Head
Faster R-CNN
79.4% params trainable
Output
Bounding Boxes
Class + Confidence
Model Variants
Model A
Baseline
Trained on clean thermal images only. F1 = 0.78 on clean data, but it falls to 0.62 on perturbed data, a relative drop of about 20%.
- Clean image training only
- F1: 0.78 clean, 0.62 perturbed
- mAP drops 27.9% under adverse conditions
Model B
SAR Augmented
Trained with SAR augmentation (snow + smoke) applied to 50% of training images. Costs only about 0.02 F1 on clean data (0.78 to 0.76) for a 19.4% robustness gain.
- Snow + smoke augmentation at 50% rate
- F1: 0.76 clean, 0.70 perturbed
- mAP drops only 8.6% under adverse conditions
Key Finding
Model B is 19.4% more robust under adverse conditions
SAR augmentation trades about 0.02 of clean-data F1 (0.78 to 0.76) for a 19.4% mAP robustness improvement on data perturbed with snow and smoke: mAP drops 8.6% for Model B versus 27.9% for the baseline.
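The percentages quoted for the two models mix absolute and relative changes, so a short worked calculation helps keep them apart. The sketch below uses only the figures reported above; `relative_drop` is a hypothetical helper name. The rounded mAP drops differ by 19.3 points, so the reported 19.4% presumably comes from unrounded values.

```python
def relative_drop(clean, perturbed):
    """Fraction of the clean-data score lost under perturbation."""
    return (clean - perturbed) / clean


# F1 scores reported for each model: (clean, perturbed)
baseline_f1 = (0.78, 0.62)
augmented_f1 = (0.76, 0.70)

baseline_loss = relative_drop(*baseline_f1)    # about 0.205, the "20%" degradation
augmented_loss = relative_drop(*augmented_f1)  # about 0.079
clean_cost = baseline_f1[0] - augmented_f1[0]  # about 0.02 absolute F1

# Reported mAP drops under adverse conditions, in percent
map_gap = 27.9 - 8.6                           # about 19.3 points

print(f"baseline F1 loss:  {baseline_loss:.1%}")
print(f"augmented F1 loss: {augmented_loss:.1%}")
print(f"clean-data F1 cost: {clean_cost:.2f}")
print(f"mAP drop gap: {map_gap:.1f} points")
```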
Impact Across Industries
From rescue missions to commercial deployment
Search & Rescue
Remote Terrain
Our primary target application. SAR teams need reliable detection in snow, smoke, and low-visibility conditions where current models fail. Our augmented model retains 80% recall under adverse conditions.
Training Vision
Building accuracy through diverse aerial perspectives
Thermal Imaging
Infrared captures heat signatures day and night at 60–130m altitude
2,898 Images
Sampled from 43,470 UAV video frames with camera angles 30–90°
YOLO Format Labels
Bounding box annotations for Person, Car, Bicycle, OtherVehicle
HIT-UAV Dataset
Published in Scientific Data by Suo et al., CC0 licensed
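YOLO-format labels store each box as a class id followed by a normalised centre, width, and height, whereas torchvision's Faster R-CNN expects absolute corner coordinates. A minimal conversion sketch follows; `yolo_to_xyxy` is a hypothetical helper, and the sample label line is made up for illustration rather than taken from HIT-UAV.

```python
def yolo_to_xyxy(label_line, img_w, img_h):
    """Convert 'cls cx cy w h' (normalised to 0-1) to (cls, (x1, y1, x2, y2)) in pixels."""
    cls, cx, cy, w, h = label_line.split()
    cx, w = float(cx) * img_w, float(w) * img_w
    cy, h = float(cy) * img_h, float(h) * img_h
    return int(cls), (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


# Illustrative label: a box centred in a 512x512 image
cls_id, box = yolo_to_xyxy("0 0.5 0.5 0.25 0.5", 512, 512)
print(cls_id, box)  # 0 (192.0, 128.0, 320.0, 384.0)
```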
SAR Augmentations
Using the Albumentations library, we perturb images with snow and smoke effects to simulate realistic SAR conditions. Applied to 50% of training images for Model B.
Snow Effect
50%
Albumentations-based snow perturbation with Gaussian noise for realistic SAR winter conditions
Smoke/Fog Effect
50%
Albumentations-based fog overlay with Gaussian noise to simulate wildfire and disaster scenarios
Training Configuration
Training Progress
See It In Action
Try the detection models yourself
Upload Image
Drop any thermal or aerial image for instant detection
Adjust Threshold
Fine-tune confidence threshold for precision vs recall trade-off
Compare Models
Side-by-side comparison of baseline vs augmented model
Robustness Test
Test both models under snow and fog conditions
The demo runs on HuggingFace Spaces. First inference may take a moment to warm up. Upload your own thermal images or use the provided examples.
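The precision vs recall trade-off behind the threshold control can be illustrated with a few lines of arithmetic. The sketch below is not the demo's code: `precision_recall_at` is a hypothetical helper, and the detections are invented, each pairing a confidence score with whether it matched a ground-truth person.

```python
def precision_recall_at(detections, n_ground_truth, threshold):
    """Precision and recall after discarding detections below the threshold."""
    kept = [is_tp for score, is_tp in detections if score >= threshold]
    true_positives = sum(kept)
    precision = true_positives / len(kept) if kept else 1.0
    recall = true_positives / n_ground_truth
    return precision, recall


# Invented detections: (confidence, matched a ground-truth person?)
detections = [(0.95, True), (0.80, True), (0.60, False), (0.40, True), (0.20, False)]
n_people = 4  # ground-truth persons in the illustrative image set

for threshold in (0.5, 0.3):
    p, r = precision_recall_at(detections, n_people, threshold)
    print(f"threshold {threshold}: precision {p:.2f}, recall {r:.2f}")
```

Lowering the threshold from 0.5 to 0.3 raises recall from 0.50 to 0.75 in this toy example. In SAR use, where a missed person is the costlier error, a lower threshold is the more natural starting point.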
Next Horizons
Advancing capabilities for tomorrow's challenges
Multi-Class Detection
Next Priority
Expand beyond person detection to also detect cars, bicycles, and other vehicles. These objects can indicate a missing person's location in SAR operations.
Ensemble Models
Proposed
Combine multiple model types and average their predictions to reduce overfitting and bias, creating a more robust detection system overall.
Video Tracking
Proposed
Extend from single-frame detection to continuous tracking across video sequences, leveraging the 43,470 frames available in the HIT-UAV source data.
Longer Training
Recommended
Both models were still improving at epoch 6. Extending training to 15–20 epochs with early stopping and learning rate warmup would likely improve F1 by an estimated 5–10%.
Improved Localization
Recommended
AP@0.75 is below 0.43 for both models. Switching to a GIoU/CIoU loss and tuning anchor box sizes for UAV data should improve bounding box precision.
Additional Augmentations
Proposed
Add more perturbation types beyond snow and smoke (e.g., rain, dust, varying altitudes) to further improve robustness across diverse SAR scenarios.
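For the GIoU option mentioned under Improved Localization, the quantity involved is plain IoU minus a penalty for empty space in the smallest box enclosing both prediction and target. A minimal pure-Python sketch for a single pair of boxes in (x1, y1, x2, y2) form follows; `giou` is a hypothetical helper, and a training loop would use a batched tensor version instead.

```python
def giou(box_a, box_b):
    """Generalised IoU of two (x1, y1, x2, y2) boxes; ranges from -1 to 1."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Overlap area, clamped to zero when the boxes are disjoint
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union

    # Smallest axis-aligned box enclosing both inputs
    enclosing = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    return iou - (enclosing - union) / enclosing


def giou_loss(box_a, box_b):
    """Loss form: 0 for a perfect match, up to 2 for distant boxes."""
    return 1.0 - giou(box_a, box_b)


print(giou((0, 0, 1, 1), (0, 0, 1, 1)))  # identical boxes
print(giou((0, 0, 1, 1), (2, 0, 3, 1)))  # disjoint boxes
```

Unlike plain IoU, which is zero for every pair of non-overlapping boxes, GIoU keeps decreasing as the boxes move apart, so the loss still provides a gradient when a prediction misses its target entirely.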
Development Roadmap
Training Improvements
- Extend to 15–20 epochs with early stopping
- Increase batch size to 8–16
- Add LR warmup schedule
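The training improvements listed above can be sketched framework-free. This is an illustration only: `warmup_lr` and `EarlyStopping` are hypothetical helpers, linear warmup is one common schedule among several, and the patience value and validation scores are invented.

```python
def warmup_lr(step, base_lr, warmup_steps):
    """Linearly ramp the learning rate to base_lr over the first warmup_steps steps."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr


class EarlyStopping:
    """Signal a stop once the validation metric has not improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = None
        self.bad_epochs = 0

    def step(self, metric):
        """Record one epoch's metric (higher is better); return True to stop training."""
        if self.best is None or metric > self.best + self.min_delta:
            self.best = metric
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Invented validation F1 scores, one per epoch
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, val_f1 in enumerate([0.60, 0.65, 0.64, 0.65, 0.63, 0.66]):
    if stopper.step(val_f1):
        stopped_at = epoch
        break
print("stopped at epoch", stopped_at, "with best F1", stopper.best)
```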
Model Enhancements
- Multi-class detection (person + vehicles)
- GIoU/CIoU loss for localization
- Ensemble model averaging
Data & Robustness
- Additional perturbation types (rain, dust)
- Video frame tracking pipeline
- Larger thermal datasets
Real-World Readiness
- Confidence threshold tuning per scenario
- Human-in-the-loop validation
- Ethics & privacy safeguards
Explore the Project
View our code, training notebook, and results on GitHub. Built by Lindsay Gross, Andrew Jin, and Shreya Mendi.