I’ve owned an Apple Watch for just over three months now and, like many of the fitness trackers I’ve used, it’s taken a while to get used to. What’s interesting about the Apple Watch is that it’s the first tracker with a genuine chance to function as a platform: one that other apps can connect to through HealthKit, and a device developers can hook into via watchOS 2. Apple has a knack for getting people behind its platform even if the technology and feature set might not be new or even industry leading.
That brings the capabilities of the watch into focus because, ultimately, the measurements from the device need to give some confidence that they at least represent the trends and habits most of us try to monitor and then (hopefully) base informed decisions on.
1: The data was really easy to get hold of; slightly more work was required to put it into a useful format.
HealthKit app, from left to right: categories, daily summary, raw row-level data and source device, and finally sharing settings and the export window in Sync Solver.
Because of the way HealthKit works, I was able to use an app (Sync Solver Export, but there are several other options) to export the data into CSV files, which I could access through the file sharing interface for apps in iTunes.
File transfer from iPhone to desktop from Sync Solver.
Each app formats the data differently. I would recommend trying QS Access first and going from there, as it aggregates the data and tabulates it for you, and it’s free. In my case I wanted the raw data, as granular as it comes, so I could work it into a format I could push through Alteryx and Tableau. If you don’t have access to these tools, you can use Excel to clean the data instead of Alteryx, and Tableau Public for free instead of the professional version. All in, I spent 15 minutes from export to the visualisations in this post.
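If you’d rather script the clean-up step than do it in Excel or Alteryx, here is a minimal sketch in Python. The column names (`start`, `type`, `value`), the timestamp format and the sample rows are all assumptions made up for illustration; each export app lays its CSV out differently, so adjust them to whatever your file actually contains.

```python
import csv
import io
from datetime import datetime

# Hypothetical export layout; real column names depend on the export app.
SAMPLE_EXPORT = """start,type,value
2015-11-01 09:00:05,HeartRate,72
2015-11-01 09:00:11,HeartRate,74
2015-11-01 09:00:30,StepCount,12
2015-11-01 09:01:02,HeartRate,
"""

def load_readings(csv_text, wanted_type):
    """Return (timestamp, value) pairs for one measurement type.

    Rows of other types and rows with a blank value are skipped.
    """
    readings = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["type"] != wanted_type or not row["value"].strip():
            continue
        stamp = datetime.strptime(row["start"], "%Y-%m-%d %H:%M:%S")
        readings.append((stamp, float(row["value"])))
    return readings

heart_rate = load_readings(SAMPLE_EXPORT, "HeartRate")
print(len(heart_rate), "heart rate readings kept")
```

For a real file you would read the text with `open(path, newline="").read()` and pass it to `load_readings` in place of the sample string.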
The step and the heart rate data are the two most compelling bits of data from the device. Calorie and distance data are also useful but the accuracy is really hit and miss (at least for me) largely because they’re based on assumptions and calculations that vary for each individual: height, weight and fitness.
2: watchOS 2 increased the sampling rate for heart rate tracking.
Every heart rate reading over the last 3 months. The watch has a tendency to generate readings in certain heart rate bands.
I also noticed a whole range of stealthy modifications to the HealthKit app in iOS 9 and 9.1 as well as the addition of some new categories and measures. It also appears to be more stable.
3: For activity tracking, the heart rate data really isn’t good enough if you want to take it seriously.
To check this I ran a basic test. I wore both the Apple and Polar devices as advised (I’ll come back to this), used a general open activity on the Apple Watch and, with the Polar heart rate tracker, a custom activity in the Wahoo Fitness app. I started with a couple of minutes standing and walking, then moved to sitting on my sofa watching TV. In the final few minutes I climbed the equivalent of three flights of stairs, an activity that I know gets my heart rate rising quickly.
Below is a trace of the data from the two devices. At first glance, the Apple Watch seems to follow the same general trend as my Polar heart rate monitor, but after looking at it for a while you start to notice some issues.
Heart Rate on the Y axis and Time as a continuous element on the X axis.
The Polar chest-based tracker was far more sensitive to real-time changes in my heart rate, largely due to the sampling rate. On further analysis, the Apple Watch was sampling at roughly 10 times per minute, whereas the data from the Polar heart rate tracker was being sampled every second by Wahoo Fitness. In reality the Polar tracker might be capable of sampling at a sub-second level, but the data I collected doesn’t allow me to arrive at this conclusion. Sampling more often gives the device a better profile of your work rate and how that ramps up and down in a given activity.
Total number of data points collected by each device
In this chart we have each minute as a discrete item, broken down by the two devices. It’s clear that at times the Apple Watch sample rate is much lower and wavers from the usual sample rate of about 12 times per minute. This coincides with small and large movements I was making with my arm.
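The per-minute counts behind a chart like this can be sketched in a few lines of Python. This is only an illustration: the timestamps below are made up, and `samples_per_minute` is a hypothetical helper name, not something from any of the apps mentioned here.

```python
from collections import Counter
from datetime import datetime

def samples_per_minute(timestamps):
    """Count how many readings fall in each clock minute."""
    return Counter(ts.replace(second=0, microsecond=0) for ts in timestamps)

# Made-up reading times for one device: four in the first minute, one in the next.
watch_times = [datetime(2015, 11, 1, 9, 0, s) for s in (5, 11, 17, 40)]
watch_times.append(datetime(2015, 11, 1, 9, 1, 30))

counts = samples_per_minute(watch_times)
for minute, n in sorted(counts.items()):
    print(minute.strftime("%H:%M"), n)
```

Running the same function over each device’s timestamps gives one count per minute per device, which is all the chart needs.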
One of the downsides of the watch having a low sample rate at times is the absence of current heart rate information. In the image below, towards the end of the test, I saw the following message for about 20 seconds before getting a reading.
For me this is a classic case of good design with negative consequences. In most current apps and tracking devices, the number 72 would show in the same colour for some time and I wouldn’t know if it was current or not. Because the design here gives you more information and context (it tells you it’s measuring, but also greys the number out to let you know it’s not current), you become more aware, and then frustrated, that the data isn’t fresh. Considering you see this screen quite often, especially during high-intensity activities like running, the frustration only grows to the point where you generally feel you can’t rely on the device for a reading when you need it.
This is difficult to truly measure because everything has a margin of error. Most people don’t have access to an ECG machine either, but from my own experience and research I’ve found the Polar chest strap to be the most accurate off-the-shelf device you can get and it’s trusted by
% difference from the Polar HR device in the minute-level average heart rate readings from the Apple Watch.
So if we use the data from the Polar device as our base, how far off is the Apple Watch if we were to average out heart rates from the two devices at a minute level then calculate the percentage difference?
In short the following calculation for each minute:
( AVG(Apple Watch HR) - AVG(Polar HR device) ) / AVG(Polar HR device)
… the watch was off in most cases. Considering I wasn’t even that active during this test, it further affirmed that the heart rate from the Apple Watch wasn’t something I could trust during even light levels of activity, although at a resting state it was within an acceptable margin of error. I use the phrase “at least for me” because I have a dark skin colour and this isn’t the only device that I’ve seen have an issue with light-based readings. Similar issues exist with the Fitbit Charge HR and the Fitbit Surge. It’s also why I mentioned earlier that I wore the devices as advised: at times I’ve considered going against the advice and actually wearing devices on the inside of my arm to get a better reading.
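The per-minute comparison described above can be sketched like this in Python. The readings are invented purely to show the arithmetic and are not my test data, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

def minute_averages(readings):
    """Average the bpm values of (timestamp, bpm) pairs within each clock minute."""
    buckets = defaultdict(list)
    for ts, bpm in readings:
        buckets[ts.replace(second=0, microsecond=0)].append(bpm)
    return {minute: sum(vals) / len(vals) for minute, vals in buckets.items()}

def percent_difference(watch, polar):
    """(AVG(watch) - AVG(polar)) / AVG(polar), as a percentage, per shared minute."""
    watch_avg = minute_averages(watch)
    polar_avg = minute_averages(polar)
    shared = sorted(set(watch_avg) & set(polar_avg))
    return {m: 100 * (watch_avg[m] - polar_avg[m]) / polar_avg[m] for m in shared}

# Invented readings: the watch reads low in the first minute, high in the second.
m0 = datetime(2015, 11, 1, 9, 0)
m1 = datetime(2015, 11, 1, 9, 1)
watch = [(m0.replace(second=5), 70), (m0.replace(second=35), 74), (m1.replace(second=10), 99)]
polar = [(m0.replace(second=s), 80) for s in (1, 2, 3)] + [(m1.replace(second=s), 90) for s in (1, 2)]

diffs = percent_difference(watch, polar)
for minute, pct in diffs.items():
    print(minute.strftime("%H:%M"), round(pct, 1))
```

Minutes where only one device produced a reading are dropped rather than guessed at, which seemed the safer choice given how often the watch goes quiet.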
All in all, the data’s not perfect, but here’s the thing: you mostly see the data in aggregated views, so really just knowing what’s going on in different ranges is what will matter, and for that it’s reasonable. I should also mention that the watch supports direct connection to health sensors via Bluetooth, so if you want to use the watch but would rather pair your own heart rate monitor to it and take the reading from that, you can. That’s a promising sign for the platform: virtually anything can plug into it.
4: Build your own visualisations / stories to show the things that matter to you.
The Apple Watch Activity app is okay, but I’d urge developers and hobbyists to build their own creations from the data. Displaying it differently allows you to use the data to answer different questions.
It’s really difficult to draw meaning from these views and some of them are totally pointless, not to mention the poor adherence to data viz best practice.
I just don’t get on well with the standard visualisations, and this is the case with most fitness trackers. They always tell the version of the story I’m least interested in. I rebuilt a visualisation in Tableau. For me it was more of an overview, sort of a step back, but something that still allowed me to see the small details. You could argue this approach introduces more noise in terms of deriving use or understanding, but I take the view of Kenneth Cukier: N=All (read more about it here). View all the data in its glory and build something that shows all the detail; from there you can formulate questions to drive deeper analysis.