I think it is rather wrapped up in your use of the term, spurious correlation.
Yes, I used the term to cover a multitude of sins and to save time before going to bed. In particular, I meant the interpretation of statistics by non-statisticians to support points of view without understanding the underlying 'mechanisms' or even looking at the data.
During my postgraduate studies I was asked to analyse the key factors causing children from deprived areas in a metropolitan region to be stunted in comparison with children from more affluent areas. There had been a major survey designed to show this was the case and a huge amount of data had been collected. A statistician had done some form of cluster analysis showing the children fell into two distinct groups: case proven and accepted by the authorities. A few minutes looking at the data revealed that one group was only girls and the other only boys.
Coming closer to home in this forum, studies have shown that fatty liver is a risk factor for T2D and T2D is a risk factor for fatty liver. They have also shown about 60%-70% of T2Ds have fatty liver. However Prof Roy Taylor and his team have demonstrated that all T2Ds they have measured, other than those with pancreatic and other complications, have a fatty liver. The difference between 60%-70% and 95%-100% could well be due to undetected cases of fatty liver.
Anyway from the papers I have seen it is only just being generally accepted that fatty liver is a precursor to T2D, and then T2D only increases the risk of cardiovascular disease, liver disease and other serious conditions associated with metabolic syndrome. In other words we need models as well as data and statistics.
Take the graph I showed above. Mk 1 eyeball would suggest that there is a correlation between my HbA1c results and the mean of my waking results in the preceding 30 days. The R^2 value tells me that there appears to be a linear relationship with a very high probability that it is not due to random chance. I can use the numerical relationship with confidence in predicting my HbA1c results. I have looked at this data further and found that the correlation coefficient increases as the averaging window is lengthened until you get to the BG average over the last 30 days. Beyond 30 days and up to 90 days it changes very little.
This tells me that my average waking reading over the last 30 days can be used to predict, with a very high degree of confidence, my current HbA1c, something shown by my last couple of HbA1c tests where my predictions were very close to the actual result.
As always, this conclusion applies to me and my data. I would love to know whether it has wider validity but do not have enough data to express an opinion on the point. 😉
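For anyone wanting to try the same window comparison on their own readings, here is a rough Python sketch of the idea, not the method actually used for the graph. It assumes waking readings are held in a dict keyed by date and HbA1c tests in a list of (date, result) pairs; the function names (`pearson_r`, `window_mean`, `r_by_window`) and the set of window lengths are made up for illustration.

```python
from datetime import timedelta
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def window_mean(readings, test_day, window_days):
    """Mean waking reading over the window_days days before test_day.

    readings: dict mapping datetime.date -> waking BG value (assumed layout).
    The test day itself is excluded. Raises ZeroDivisionError if the
    window contains no readings.
    """
    start = test_day - timedelta(days=window_days)
    values = [v for d, v in readings.items() if start <= d < test_day]
    return sum(values) / len(values)


def r_by_window(readings, hba1c_tests, windows=(7, 14, 30, 60, 90)):
    """Correlation between HbA1c results and mean waking BG, per window length.

    hba1c_tests: list of (datetime.date, HbA1c result) pairs (assumed layout).
    Returns a dict mapping window length in days -> Pearson r.
    """
    results = [value for _, value in hba1c_tests]
    by_window = {}
    for w in windows:
        means = [window_mean(readings, day, w) for day, _ in hba1c_tests]
        by_window[w] = pearson_r(means, results)
    return by_window
```

Calling `r_by_window(readings, hba1c_tests)` gives one correlation coefficient per window length, so you can see whether it levels off at around 30 days for your own data. With only a handful of HbA1c tests the coefficients will be very noisy, so treat the output as a rough guide only.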
I should have complimented you on your graph. To my mind it's evidence that HbA1c and BG are likely to be closely related at a personal level. We know they are loosely related in a population due to the factors which cause 'everyone to be different'.
It also supports the rule of thumb that 50% of HbA1c is due to the last 30 days. BG levels for the previous 60-90 days might have to be included if BG levels were (say) coming down due to dietary changes.
Bayes tells me your conclusion is highly likely to prove of wider validity.