Looking at CGM data
That's why computers are useful when it comes to putting some objectivity into accuracy assessments. But computers have issues too: we write the code, and we have a tendency to insert nasty bugs into it.
This is also why I tend to be extremely verbose and slow when I examine data (in this case, calibration values vs CGM-reported data; I also use a similar analysis for BG meter data not used as calibration points).
Here is, for example, how I keep track of the current performance of our CGM. Some of the output of my programs is given below, slightly edited for brevity (comments are in bold).
Period Length: 13 days, starting 2014-10-31 00:02:41 and ending 2014-11-13 07:06:59
Time and date issues are extremely important: meter and CGM clocks drift, use different date formats and generally ignore things such as winter/summer time. I keep track of the time deltas between the different devices and adjust the data accordingly before processing it.
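As a minimal sketch of that alignment step (the function name and the offset value are mine, not the original program's), one can shift the meter timestamps by a measured clock offset before pairing them with CGM samples:

```python
# Hypothetical sketch: align meter timestamps to the CGM clock before
# pairing readings. Assumes the offset between the two device clocks has
# been measured beforehand (e.g. against a common reference clock).
from datetime import datetime, timedelta

def align_meter_times(meter_readings, offset_seconds):
    """Shift meter timestamps by a known clock offset so they can be
    matched against CGM timestamps. meter_readings is a list of
    (datetime, value_mg_dl) tuples; offset_seconds is meter minus CGM."""
    delta = timedelta(seconds=offset_seconds)
    return [(ts - delta, value) for ts, value in meter_readings]

readings = [(datetime(2014, 10, 31, 0, 2, 41), 86)]
aligned = align_meter_times(readings, 90)  # meter runs 90 s ahead of the CGM
```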
Calibration values [72, 86, .., 128, 130]
Measured values [49, 87, .., 148, 148]
It's always good to have some visual confirmation of the data you are working with.
Number of Calibrations: 33 excluding double start-up calibs and calibs after loss of signal or error
Obviously, if you are using a CGM that requires calibrations, the start-up calibrations should be ignored and so should double calibrations entered very close to one another. Entering five calibrations in one hour and then none for twelve hours is likely to introduce a significant bias in the data.
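A sketch of that filtering step could look as follows (the minimum spacing of 30 minutes is my assumption, not a value from the original program):

```python
# Hypothetical sketch: drop calibrations entered too close to the previous
# one, so a burst of entries doesn't bias the statistics.
from datetime import datetime, timedelta

def filter_clustered(calibs, min_gap_minutes=30):
    """calibs: list of (datetime, value) tuples sorted by time. Keeps the
    first calibration of any cluster and drops entries that follow the
    last kept one by less than min_gap_minutes."""
    gap = timedelta(minutes=min_gap_minutes)
    kept = []
    for ts, value in calibs:
        if not kept or ts - kept[-1][0] >= gap:
            kept.append((ts, value))
    return kept
```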
Number of Calibrations per day : 2.54 on average
Average Absolute Delta : -4.48 mg/dL - objective
General Trend at calib time : 1.26 mg/dL - speculative
Average Delta debiased 5 min : -3.22 mg/dL - speculative
Average Delta debiased 10 mins : -1.96 mg/dL - speculative
In the calculation part, we see that the CGM is reporting, on average, values that are 4.48 mg/dL BELOW the meter values. That's interesting, but it is not the whole story. As we all know, CGM readings are delayed, so I calculate a trend at the moment of calibration and project what the result would have been if there were no delay. You can see that the CGM would have been "more correct" if the trend had continued. Note: the maniacal observer will notice that the average projected value and the rate of change don't strictly match. There are two reasons for this: first, we don't really bother checking whether the calibration occurred 4 min 59 s or 1 s before the next value; second, we can't rely on the corrected post-calibration value to estimate an error.
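The projection idea can be sketched like this (a linear extrapolation under my own naming and sample-spacing assumptions; the original program may estimate the trend differently):

```python
# Hypothetical sketch of the delay correction: estimate a linear trend
# from the CGM samples just before calibration and project the sensor
# value forward, then compare the projection with the meter value.
def projected_delta(cgm_before, meter_value, minutes_ahead=5, sample_minutes=5):
    """cgm_before: last few CGM values (mg/dL), oldest first, spaced
    sample_minutes apart. Returns (projected CGM value, delta vs meter)."""
    rate = (cgm_before[-1] - cgm_before[0]) / (sample_minutes * (len(cgm_before) - 1))
    projected = cgm_before[-1] + rate * minutes_ahead
    return projected, projected - meter_value

# Rising 1 mg/dL/min, meter reads 108: projecting 5 min ahead gives 105,
# so the "debiased" delta is -3 instead of -8.
proj, delta = projected_delta([90, 95, 100], 108)
```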
Average % Error vs BG : 15.35 % - objective
That would be the MARD vs meter we are currently running at. Decent, but not great for that sensor.
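The MARD-vs-meter calculation itself is simple; a minimal sketch from the paired values shown above (names are mine, not the original program's):

```python
# Mean absolute relative difference (MARD) vs the meter, in percent:
# average of |CGM - meter| / meter over all calibration pairs.
def mard(meter_values, cgm_values):
    errors = [abs(c - m) / m * 100.0 for m, c in zip(meter_values, cgm_values)]
    return sum(errors) / len(errors)

result = mard([100, 80], [110, 76])  # (10% + 5%) / 2 = 7.5
```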
Average Rate of Change : 0.12 mg/dL/min (linear)
That's a check to make sure we aren't calibrating when rising or falling too fast.
No calibration in low range
0% of calibrations fall in the < 60 mg/dL range. This is intentional: calibrating in that range is playing the lottery.
High Range Calib Percentage : 6.06 % - above 160 mg/dL
Average % error in high calib : 6.11 % - above 160 mg/dL
Not very significant; we have very few calibrations in that range.
Average Delta in high calib : -11.50 mg/dL - above 160 mg/dL
Norm Range Calib Percentage : 93.94 % - between 60 and 160 mg/dL
Average % error in norm calib : 15.94 % - between 60 and 160 mg/dL
Average Delta in norm calib : -4.03 mg/dL - between 60 and 160 mg/dL
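The per-range breakdown above can be sketched as a simple bucketing pass (a hypothetical reconstruction with my own names; the 60/160 mg/dL thresholds come from the output above):

```python
# Hypothetical sketch of the per-range breakdown: bucket each calibration
# by the meter value and report the share of calibrations and the average
# percent error per bucket.
def range_stats(pairs, low=60, high=160):
    """pairs: list of (meter, cgm) values in mg/dL. Returns
    {bucket: (percent of calibs, average % error)} for the
    low (< low), normal, and high (> high) meter ranges."""
    buckets = {"low": [], "norm": [], "high": []}
    for meter, cgm in pairs:
        key = "low" if meter < low else "high" if meter > high else "norm"
        buckets[key].append(abs(cgm - meter) / meter * 100.0)
    total = len(pairs)
    return {k: (len(v) / total * 100.0, sum(v) / len(v) if v else 0.0)
            for k, v in buckets.items()}

stats = range_stats([(100, 110), (200, 190), (100, 90)])
```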
So now that we have our MARD, what about a Clarke error grid?
Errors aren't created equal. Some are really insignificant and don't matter from a clinical point of view. But other errors can kill you. The Clarke error grid is a somewhat arbitrary look at the data that takes that human risk factor into account, based on what the clinical consequences of the error could be. Roughly speaking, falling in zone A is excellent (this is ideally where you would want the CGM that controls an artificial pancreas to be at all times). Zone B is acceptable. Zones C and D could get you into trouble if you based a treatment decision on them and will certainly lead to sub-optimal adjustments. A decision taken in zone E has a very high probability of leading to an emergency room visit.
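As a sketch of how the zone assignment can be coded, here is one version using the zone boundaries from a widely circulated implementation of the Clarke grid; note that the original 1987 paper defines the zones graphically, so implementations can differ slightly at the edges, and this is not necessarily how the author's program does it:

```python
# Sketch of Clarke error grid zone assignment (boundaries follow a
# commonly circulated implementation; edge behavior may vary between
# implementations).
def clarke_zone(ref, pred):
    """ref: reference meter value (mg/dL), pred: CGM value (mg/dL).
    Returns the zone letter 'A'..'E'."""
    if (ref <= 70 and pred <= 70) or (0.8 * ref <= pred <= 1.2 * ref):
        return "A"  # within 20%, or both hypoglycemic
    if (ref >= 180 and pred <= 70) or (ref <= 70 and pred >= 180):
        return "E"  # would confuse hypo- and hyperglycemia
    if (70 <= ref <= 290 and pred >= ref + 110) or \
       (130 <= ref <= 180 and pred <= (7.0 / 5.0) * ref - 182):
        return "C"  # overcorrection likely
    if (ref >= 240 and 70 <= pred <= 180) or \
       (ref <= 175.0 / 3.0 and 70 <= pred <= 180) or \
       (175.0 / 3.0 <= ref <= 70 and pred >= (6.0 / 5.0) * ref):
        return "D"  # dangerous failure to detect
    return "B"      # benign deviation
```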
The Clarke grid analysis of CGM vs meter data is a bit tricky to implement; that is why, before running it on my data, I always check that my program still works by running a validation against a test data set. This is the result of such a test. The use of colors helps in spotting a possible bug.
|CLARKE - actual data|
But what about the recommendation not to take any decision based on CGM data alone? That's a tricky question, and while I certainly wouldn't recommend it in general, I have to confess that I do correct based on CGM data alone at times. But you can't do this blindly. You need:
- to be sure of the reliability of your BG meter (I run a separate analysis on that part; maybe the topic of a later blog post)
- to understand the physiological climate you are in (stress, post-exercise, dawn phenomenon)
- to be able to analyze and understand your CGM data extensively (most of the user-submitted data files I have analyzed do not fit nicely in the A and B zones)
- to take clinical symptoms into account. You are treating a patient, not fixing numbers.
- to be confident that you will be able to react in case something goes wrong.