How Accurate Is the Sleep Tracking on Your Watch?
by Doug Stewart
Many watches, as well as other devices such as Whoop bands and Oura rings, provide sleep tracking features, and they are becoming more advanced in the physiological signals they capture. The data collected is used to estimate total sleep time and the time spent in each phase of sleep. However, these figures come from algorithms that typically rely on movement, cardiac activity and, in some cases, temperature. These wearables are increasingly popular, but given that they do not measure sleep directly and instead infer it by combining other metrics, how accurate are they?
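To make the idea of inferred sleep concrete, here is a minimal Python sketch of a toy rule that labels 30-second epochs as sleep or wake from movement and heart rate alone. The thresholds, the epoch length and the names Epoch, classify_epochs and total_sleep_minutes are hypothetical illustrations. This is not the algorithm used by any of the devices discussed here; theirs are proprietary and far more sophisticated.

```python
from dataclasses import dataclass


@dataclass
class Epoch:
    """One 30-second window of hypothetical wearable data."""
    movement: float    # activity count from the accelerometer (arbitrary units)
    heart_rate: float  # beats per minute


def classify_epochs(epochs, resting_hr, movement_threshold=20.0, hr_margin=10.0):
    """Label each epoch 'sleep' or 'wake' using a toy threshold rule.

    Assumption: an epoch is scored as sleep only when movement is below
    movement_threshold AND heart rate is no more than hr_margin bpm above
    the wearer's resting heart rate. Real devices use proprietary models.
    """
    labels = []
    for epoch in epochs:
        still = epoch.movement < movement_threshold
        calm = epoch.heart_rate <= resting_hr + hr_margin
        labels.append("sleep" if still and calm else "wake")
    return labels


def total_sleep_minutes(labels, epoch_seconds=30):
    """Total time scored as sleep, in minutes."""
    return sum(1 for label in labels if label == "sleep") * epoch_seconds / 60


# Four invented epochs: still and calm, restless, still and calm, still but elevated heart rate.
night = [Epoch(5, 55), Epoch(40, 70), Epoch(2, 52), Epoch(3, 68)]
labels = classify_epochs(night, resting_hr=55)
print(labels)                       # ['sleep', 'wake', 'sleep', 'wake']
print(total_sleep_minutes(labels))  # 1.0
```

With a rule like this, someone lying very still with a low heart rate is scored as asleep even if they are actually awake, which is one reason inferred sleep can diverge from polysomnography.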
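A study published earlier this month compared five consumer devices against a research-grade actigraphy device (the Actiwatch) and the gold-standard measure, polysomnography (PSG). The devices were the Fitbit Versa, the Fitbit HR, the Garmin Vivosmart, the Oura ring and a sleep tracking mat placed under the mattress.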
Fifty-three healthy adults took part in the research (31 female and 22 male, aged 18 to 30 years). Before the experimental days, participants were asked to abstain from caffeine and alcohol and were instructed to get good-quality sleep the night before the assessment period.
On the study night, the Fitbit Versa and Garmin Vivosmart were worn on one wrist and the Fitbit HR and Actiwatch on the other. The Oura ring was worn on whichever finger it fitted best, and the sleep tracking mat was placed under the mattress.
All the devices experienced some failures, such as a poor fit preventing heart rate readings, or problems syncing data. The Fitbit Versa had the most failures of the devices tested.
Compared with PSG, the Actiwatch, Garmin and sleep mat overestimated total sleep time, while the two Fitbits and the Oura ring underestimated it. The devices typically differed from the PSG-recorded sleep time by around 10 to 15 minutes. All the devices tended to overestimate wake after sleep onset, the time spent awake after first falling asleep.
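One simple way to express over- or underestimation is the mean signed difference between each device reading and the PSG value across participants. The Python sketch below assumes paired total sleep times in minutes; the function name mean_bias and the example numbers are invented for illustration and are not the study's data.

```python
from statistics import mean


def mean_bias(device_minutes, psg_minutes):
    """Average of (device - PSG), in minutes.

    Positive: the device overestimates on average.
    Negative: the device underestimates on average.
    """
    if not psg_minutes or len(device_minutes) != len(psg_minutes):
        raise ValueError("need paired, non-empty measurements")
    return mean([d - p for d, p in zip(device_minutes, psg_minutes)])


# Hypothetical total sleep times for three participants (not study data).
psg = [420, 390, 450]
over = [435, 400, 464]   # differences: +15, +10, +14
under = [410, 380, 445]  # differences: -10, -10, -5

print(mean_bias(over, psg))            # 13
print(round(mean_bias(under, psg), 2))  # -8.33
```

Because positive and negative differences cancel out, a device can show a small average bias while still being some way off on individual nights, which is why an error measure that ignores direction is also useful.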
When comparing the time the devices estimated in Light, Deep and REM sleep, all tended to overestimate REM sleep, whereas whether Light and Deep sleep were over- or underestimated varied from device to device.
Mean absolute percentage error (MAPE) measures how far off predictions are on average: it is the average size of the error, ignoring its direction, expressed as a percentage of the reference value. When the devices were compared with PSG on this metric, the error for total sleep time was relatively small, but for the other measures, such as the time spent in each stage of sleep, the percentage error was larger.
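As a rough sketch, mean absolute percentage error (MAPE) over n nights can be written as MAPE = (100 / n) × Σ |device − PSG| / PSG and computed in Python as below. The function name and the figures in the example are hypothetical. Because each error is divided by the reference value, the same error in minutes produces a larger percentage when the reference duration is short, as it is for an individual sleep stage compared with a whole night.

```python
def mape(predicted, reference):
    """Mean absolute percentage error, in percent.

    Each absolute error is divided by its reference (e.g. PSG) value,
    then the results are averaged and multiplied by 100.
    """
    if not reference or len(predicted) != len(reference):
        raise ValueError("need paired, non-empty measurements")
    if any(r == 0 for r in reference):
        raise ValueError("reference values must be non-zero")
    total = sum(abs(p - r) / r for p, r in zip(predicted, reference))
    return 100 * total / len(reference)


# The same 15-minute error, measured against a long and a short reference duration.
print(round(mape([435], [420]), 1))  # whole night: 3.6
print(round(mape([105], [90]), 1))   # deep sleep: 16.7
```

In this invented example, a 15-minute error on a 420-minute night gives a MAPE of about 3.6%, while the same 15-minute error on 90 minutes of Deep sleep gives about 16.7%.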
Therefore, if you want to track your sleep, the devices tested here may be helpful for estimating your total sleep time, but not for determining how that time is split between Light, Deep and REM sleep. This matters because some of these devices combine sleep data into sleep and other ‘scores’ designed to help guide your training. Based on the results shown here, however, they are not accurate enough for these purposes. Keeping a sleep diary, speaking to a qualified sleep expert, or paying attention to how you feel overall are likely more suitable ways to monitor your sleep and guide your training.
References:
Kainec, K. A., Caccavaro, J., Barnes, M., Hoff, C., Berlin, A., & Spencer, R. M. (2024). Evaluating Accuracy in Five Commercial Sleep-Tracking Devices Compared to Research-Grade Actigraphy and Polysomnography. Sensors, 24(2), 635.