Apple’s Way of Using Machine Learning in Wearables Through Knowledge Distillation


Since I recently bought an Apple Watch, I have been amazed at how this technology makes our lifestyle healthier and easier. Wear the watch to track your workouts, monitor your sleep patterns, and check your vitals: it’s like having a personal health assistant on your wrist.
The recent wearable device ecosystem offers more accurate health tracking, enabling real-time monitoring of vital signs in daily life. These devices capture bio-signals that vary in quality and power requirements:
High-fidelity Bio-signals (e.g., Photoplethysmogram - PPG) provide rich physiological insights but require optical sensors with high power consumption.
Lower-fidelity Bio-signals (e.g., Accelerometry) have a much smaller power footprint, making them more feasible for continuous monitoring but traditionally less explored for deeper health insights.
Apple recently released a research paper on how wearables can use lower-fidelity bio-signals such as accelerometry to estimate vital signs like heart rate and respiratory rate.
Although accelerometers are widely used for activity recognition (e.g., step counting, motion detection), their potential in health biomarker analysis and diagnosis remains underutilized. However, a machine-learning approach called Knowledge Distillation is changing this landscape, allowing accelerometry models to predict a wide range of health metrics.
The Power of Knowledge Distillation
What is Knowledge Distillation?
Knowledge Distillation is a model compression technique from deep learning in which a smaller, lightweight model (the student) learns from a larger, more complex model (the teacher).
Instead of learning only from labeled data, the student model mimics the representational knowledge of the teacher model, learning its patterns and relationships.
This method is widely used in deep learning to reduce model size and computational cost with little loss in performance.
Knowledge distillation is also a popular approach for building small language models (SLMs): a larger model’s knowledge is essentially “distilled” into a smaller one.
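To make the teacher/student idea concrete, here is a minimal sketch of the classic soft-target form of distillation for a classifier. It illustrates the general technique only and is not the method from Apple's paper, which distills embeddings rather than class probabilities. The function names, the `temperature` and `alpha` values, and the example logits are all assumptions chosen for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives softer probabilities."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=2.0, alpha=0.5):
    """Blend a soft-target KL term (mimic the teacher) with hard-label cross-entropy.

    temperature and alpha are illustrative hyperparameters, not values from the paper.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student_soft = softmax(student_logits, temperature)
    # KL(teacher || student) on the softened distributions
    kl = sum(t * math.log(t / s)
             for t, s in zip(p_teacher, p_student_soft) if t > 0)
    # Standard cross-entropy against the ground-truth label (temperature 1)
    p_student = softmax(student_logits, 1.0)
    ce = -math.log(p_student[true_label])
    # The T^2 factor keeps the soft-target term on a comparable scale
    return alpha * (temperature ** 2) * kl + (1 - alpha) * ce

# Hypothetical logits for a 3-class problem
teacher = [4.0, 1.0, 0.5]
good_student = [3.5, 1.2, 0.4]   # close to the teacher
poor_student = [0.5, 1.0, 4.0]   # disagrees with the teacher

loss_good = distillation_loss(good_student, teacher, true_label=0)
loss_poor = distillation_loss(poor_student, teacher, true_label=0)
```

A student whose outputs track the teacher's gets a lower loss than one that disagrees, which is the signal that pushes the small model toward the large model's behavior.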
Why Use Knowledge Distillation in Wearables?
Wearable devices like watches need to balance data quality and power efficiency. High-fidelity PPG sensors consume more power, whereas accelerometers are low-power but less informative.
Knowledge Distillation addresses this gap by transferring deep physiological insights from PPG models to accelerometry models, with:
Lower computational cost → Lightweight models that run efficiently on wearables and smartphones.
Improved efficiency → Capturing richer health insights without high-end sensors.
Cross-modal knowledge transfer → Accelerometers can predict health metrics once limited to optical sensors.
Accelerometry Foundation Model
Researchers from Apple have successfully trained an Accelerometry Foundation Model that predicts health conditions using only accelerometer data, thanks to knowledge distillation from PPG models.
Here’s how it works:
The Training Process:
Large-Scale Data Collection
- 20 million minutes of data from 172,000 participants (Apple Heart and Movement Study).
PPG Model as the Teacher
- A high-fidelity PPG encoder extracts rich cardiovascular insights.
Accelerometry Model as the Student
- Learns to approximate PPG embeddings, enhancing its predictive power.
Cross-Modal Alignment
- Achieves 99.2% accuracy in retrieving PPG embeddings from accelerometry data.
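The student and alignment steps above can be sketched in a few lines. The example below assumes a contrastive (InfoNCE-style) objective, a common choice for pulling paired embeddings from two modalities together; whether this matches the paper's exact loss is an assumption. It also shows one way a top-1 retrieval accuracy could be computed. All function names, the `temperature` value, and the toy 2-D embeddings are hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit length (assumes v is not the zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def similarity_matrix(accel_embs, ppg_embs):
    """Cosine similarity between every accelerometry and every PPG embedding."""
    a = [normalize(v) for v in accel_embs]
    p = [normalize(v) for v in ppg_embs]
    return [[sum(x * y for x, y in zip(ai, pj)) for pj in p] for ai in a]

def info_nce_loss(accel_embs, ppg_embs, temperature=0.1):
    """Contrastive loss: row i of accel should match row i of ppg, not the others."""
    sims = similarity_matrix(accel_embs, ppg_embs)
    loss = 0.0
    for i, row in enumerate(sims):
        logits = [s / temperature for s in row]
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        loss += log_denom - logits[i]  # negative log-probability of the true pair
    return loss / len(sims)

def top1_retrieval_accuracy(accel_embs, ppg_embs):
    """Fraction of accel embeddings whose nearest PPG embedding is their true pair."""
    sims = similarity_matrix(accel_embs, ppg_embs)
    hits = sum(1 for i, row in enumerate(sims)
               if max(range(len(row)), key=row.__getitem__) == i)
    return hits / len(sims)

# Toy paired embeddings: index i in both lists comes from the same time segment
ppg = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
accel_aligned = [[0.9, 0.1], [0.1, 0.9], [-0.8, 0.1]]
accel_misaligned = [[0.0, 1.0], [-1.0, 0.0], [1.0, 0.0]]

acc_aligned = top1_retrieval_accuracy(accel_aligned, ppg)
acc_misaligned = top1_retrieval_accuracy(accel_misaligned, ppg)
```

In this toy setup the aligned student retrieves every PPG partner correctly and has the lower contrastive loss, while the misaligned one retrieves none. In a real training loop the loss would be minimized by gradient descent over the student encoder's parameters, with the teacher typically kept frozen.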
Performance Gains:
Distilled accelerometry models outperform traditional accelerometer-based health models by 23%-49% in heart rate and heart rate variability prediction.
Acts as a generalist foundation model, capable of predicting multiple health-related outcomes.
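One common way to use a foundation model as a generalist is to freeze the encoder and fit a small linear "probe" per target on top of its embeddings. The sketch below assumes that setup; the probe design, the learning rate, and the synthetic embeddings and heart-rate targets are all made up for illustration and are not taken from the paper.

```python
def fit_linear_probe(embeddings, targets, lr=0.1, epochs=2000):
    """Fit y ~ w.x + b on frozen embeddings with plain gradient descent on MSE."""
    dim = len(embeddings[0])
    n = len(embeddings)
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(embeddings, targets):
            err = sum(w * xi for w, xi in zip(weights, x)) + bias - y
            for j in range(dim):
                grad_w[j] += 2 * err * x[j] / n
            grad_b += 2 * err / n
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias

def predict(weights, bias, x):
    """Apply the fitted probe to one embedding."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

# Synthetic stand-ins for frozen accelerometry embeddings and a heart-rate target
embeddings = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
heart_rate = [60.0, 70.0, 65.0, 75.0]  # generated as 60 + 10*x0 + 5*x1

w, b = fit_linear_probe(embeddings, heart_rate)
estimate = predict(w, b, [1.0, 1.0])
```

Because only the small probe is trained, the same frozen embeddings can be reused for several targets (heart rate, heart rate variability, and so on) by fitting one probe per target.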
Accelerometry as a Universal Health Sensor
With this breakthrough, any wearable device equipped with an accelerometer could become a powerful tool for continuous health monitoring. Potential applications include:
Chronic Disease Management → Passive monitoring for conditions like heart disease and diabetes.
Stress & Mental Health Tracking → Predicting HRV-related biomarkers linked to stress and well-being.
Personalized Healthcare → Enabling AI-driven, real-time insights across consumer wearables.
The integration of Knowledge Distillation with Accelerometry Foundation Models marks a major advancement in digital health. By leveraging low-power sensors with AI-driven insights, we can unlock a new era of affordable, continuous, and non-invasive health monitoring.
Written by Prabakaran Marimuthu
