Tuesday, April 28, 2009

Junk statistics at Natural Resources Canada

Dr Frits P Agterberg is Emeritus Scientist with the Geological Survey of Canada. In the early 1990s he was but one of many scientists with NRCan’s precursor. Several are members of Canadian Advisory Committees to the International Organization for Standardization. I have never met Agterberg at any such event. I derived a method that gives confidence limits for metal contents and grades of mined ores and mineral concentrates. It was approved as ISO DIS 13543, Determination of Mass of Contained Metal in the Lot. What I wanted to do was apply the same method to in-situ ores and coals. Cominco’s geologists taught me a bit about kriging and smoothing but I didn’t get the gist of it. A mining company gave me a set of gold assays for ordered rounds in a drift to play with.

My son and I studied a few geostat books. We found David’s 1977 Geostatistical Ore Reserve Estimation to be short on statistics and long on geostat drivel. David did pay tribute to the ‘famous’ Central Limit Theorem but didn’t take to working with it. Neither did he take to counting degrees of freedom. Degrees of freedom played a cameo role when he pointed to an earlier work of geologists. David didn’t derive confidence limits for metal grades and contents of ore deposits. He was right on cue when he wrote “…statisticians will find many unqualified statements…” David didn’t write he would throw a temper tantrum if anyone dared to.

David was but one of a score of geostat thinkers who thrashed Precision Estimates for Ore Reserves and made a mockery of peer review in the process. But did they ever know how to stake out their own turf! They coined all sorts of terms and never stopped nattering nonsense neologisms among themselves. I’m not gifted in verbal discourse. It took me a while to grasp that kriged estimates, kriged estimators, estimated values, and simulated values, are birds of a feather in geostat speak. What did blow my mind were infinite sets of kriged estimates, zero kriging variances, and a dreadful disrespect for degrees of freedom. Who could have cooked up so much poppycock?

Agterberg did quite a bit of it. He cooked up a distance-weighted average point grade that didn’t have a variance. He failed to put in plain words why his function lost its variance. Neither did he ever tell me why his zero-dimensional distance-weighted average point grade didn’t have a variance. It led me to guess that this lost variance wasn’t his proudest feat. What is still beyond Agterberg’s grasp in 2009 is one-to-one correspondence between functions and variances.
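Here is a minimal sketch of why a distance-weighted average does have a variance. It assumes inverse-distance weights and independent measured grades, so that the variance of the weighted average is the sum of squared weights times the variances of the grades. The function name, the inputs and the numbers are hypothetical and not taken from Agterberg's work.

```python
def distance_weighted_average(grades, distances, grade_variances):
    """Return (average, variance) of an inverse-distance weighted average.

    Assumption: the measured grades are independent, so the variance of
    the weighted average is sum(w_i**2 * var_i).
    """
    if not (len(grades) == len(distances) == len(grade_variances)):
        raise ValueError("inputs must have equal length")
    if any(d <= 0 for d in distances):
        raise ValueError("distances must be positive")
    inverse = [1.0 / d for d in distances]
    total = sum(inverse)
    weights = [w / total for w in inverse]  # weights add up to 1
    average = sum(w * g for w, g in zip(weights, grades))
    variance = sum(w * w * v for w, v in zip(weights, grade_variances))
    return average, variance


# Hypothetical grades, distances to the selected point, and variances
average, variance = distance_weighted_average(
    grades=[2.0, 4.0], distances=[1.0, 1.0], grade_variances=[0.5, 0.5]
)
print(average, variance)
```

Each function of measured values has its own variance. Under the stated assumption, a weighted average's variance is zero only when every measured grade has zero variance.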

Matheron’s new science of geostatistics drifted across the Channel and the Atlantic Ocean and made landfall on the North American continent in 1970. The mining industry was gung-ho to swallow least biased subsets of infinite sets of kriged estimates hook, line and sinker. Kriging and smoothing sounded so soothing. How its practitioners could beat the odds of selecting least biased subsets of infinite sets of kriged estimates troubled but a few. The list of those who couldn’t care less would stack a Mining Hall of Shame. I got to the bottom of Matheron’s odd statistics long before the Bre-X fraud. But no one cared!

I took my time to find out who lost what, when and where. It was Agterberg who brought to light a typical geologic prediction problem in 1970. I took a look and saw a distance-weighted average. He found a typical kriging problem in 1974. But I saw the same distance-weighted average.

Typical geologic prediction problem
Typical kriging problem

Here are a few of Agterberg’s real problems. He didn’t know how to test for spatial dependence between his ordered point grades. He didn’t know how to derive the variance of his distance-weighted average point grade. He didn’t know how to count degrees of freedom either for the set or for the ordered set. Yet, Agterberg does point to degrees of freedom on pages 174, 190 and 254 of his 1974 Geomathematics.

What’s more, Agterberg didn’t take to door-to-door sales walks. Such a walk would visit each point only once and cover the shortest possible distance between all points. He could then have applied Fisher’s F-test to the variance of the set and the first variance term of the ordered set. Agterberg himself did apply Fisher’s F-test on page 187 of his 1974 Geomathematics. And he does refer to Sir Ronald A Fisher’s work on nine (9) pages!
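Such a walk and test can be sketched as follows. The sketch assumes that the first variance term of the ordered set is half the mean squared difference between successive values. It finds the shortest walk by brute force, which works for a handful of points only. The locations and grades are hypothetical. The observed F-ratio still has to be compared with a tabulated F-value at the applicable degrees of freedom, which this sketch does not compute.

```python
import itertools
import math
import statistics


def shortest_walk(points):
    """Return the visiting order with the shortest total path length.

    Brute force over all orders, so only practical for a few points.
    """
    def length(order):
        return sum(math.dist(points[a], points[b])
                   for a, b in zip(order, order[1:]))
    return min(itertools.permutations(range(len(points))), key=length)


def first_variance_term(ordered_values):
    """Half the mean squared difference between successive values."""
    diffs = [b - a for a, b in zip(ordered_values, ordered_values[1:])]
    return sum(d * d for d in diffs) / (2 * len(diffs))


# Hypothetical sample locations (x, y) and point grades
points = [(0, 0), (3, 0), (1, 0), (2, 0), (4, 0)]
grades = [1.0, 4.0, 2.0, 3.0, 5.0]

order = shortest_walk(points)
ordered_grades = [grades[i] for i in order]
var_set = statistics.variance(grades)            # variance of the set
var_first = first_variance_term(ordered_grades)  # ordered set, first term
f_ratio = var_set / var_first
print(order, var_set, var_first, f_ratio)
```

An observed F-ratio well above the tabulated value points to a significant degree of spatial dependence in the ordered set.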

Agterberg does not refer to Dr Jan Visman’s work. Visman was a Dutch coal mining engineer who worked with the Dutch State Mines during the war. His PhD thesis proved the variance of the primary sample selection stage to be the sum of the composition variance and the distribution variance. Visman immigrated to Canada in 1951 and worked with the Department of Mines and Technical Surveys in Ottawa. He wrote Towards a Common Basis for the Sampling of Materials (Research Report R 93, July 1962). The advent of ash analyzers for coal and on-stream analyzers for slurry flows led to a fundamental understanding of what spatial dependence between measured values in ordered sets is all about. Yet, spatial dependence in sampling units and sample spaces stayed as profound a mystery to the geostatocracy as were the properties of variances.

Agterberg prefers oral criticism. Once upon a time he did reply in writing. On October 11, 2004, he called me “…an iconoclast with respect to spatial statistics and kriging.” He insisted, “By now this approach is well established in mathematical statistics.” He got it all wrong again. Kriging and mathematical statistics have as little in common as alchemy and chemistry. What is ringing kriging's bell is climate change. That’s where Agterberg’s zero-dimensional distance-weighted average will always have a variance. Whether he likes it or not!

Wednesday, April 01, 2009

Warming up a little or a lot

Our world is not getting hot any time soon. Where it’s getting hot is under the collars of those who thought up global warming. Al Gore and the UN Intergovernmental Panel on Climate Change were awarded the 2007 Nobel Peace Prize for thinking up global warming. The almost US president and his UN think tank do think it’s getting warmer. So, it’s got to be so! They are telling tall tales and crafting cool books. A few worked up a frenzy worrying it doesn’t get warm soon enough. Others upped the odds by predicting it does so at an alarming rate. That sort of scare works so long as nobody knows the standard rate of warming for a little planet like ours.

Al Gore’s Inconvenient Truth is a tour de force in child psychology. It shows a polar bear on an ice floe drifting at some cool spot somewhere in our world. It’s a scene that pulls a child’s heartstrings just as much as does a puppy under a Christmas tree. Our offspring may be around long enough to witness that what goes up must come down if only because the sun is beyond control of church and state. And that’s just as well. Nobody worries much that the sun itself is running out of hydrogen at an astounding rate. The good news is that its hydrogen will last some 10 billion years. The bad news is that it's numerically a lot less than IMF’s trillion dollars.

Apollo’s Blue Marble photograph

NASA satellites transmit more than stunning photographs. Massive sets of temperatures have been transmitted ever since this famous photograph was shot in December 1972. It took me aback that annual temperatures in the lower troposphere display spatial dependence. Until I was told that long term cycles in ocean currents do impact lower troposphere temperatures. That’s why it makes scientific sense to verify spatial dependence in our own sample space of time. By inverse logic, it is a scientific fraud to assume spatial dependence without proof.

Bad luck had it a while back that a NASA satellite failed to deploy. This one was to measure carbon dioxide concentrations in the troposphere. Geoscientists might have found out some 30 years later how carbon dioxide concentrations drive the greenhouse effect. That’s why patience is as much a virtue in the study of climate change as is a good grasp of statistics. Geostatistical data analysis is a catch-22 in the sense that interpolation between measured values creates an appearance of spatial dependence where it doesn’t exist.
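A small simulation makes the catch-22 visible. It is a sketch under stated assumptions: the measured values are random numbers without any spatial dependence, interpolation is done by putting the mean of each pair of neighbours between them, and the first variance term of the ordered set is taken as half the mean squared difference between successive values. The seed and the function names are hypothetical.

```python
import random
import statistics


def first_variance_term(ordered_values):
    """Half the mean squared difference between successive values."""
    diffs = [b - a for a, b in zip(ordered_values, ordered_values[1:])]
    return sum(d * d for d in diffs) / (2 * len(diffs))


def f_ratio(ordered_values):
    """Variance of the set divided by the first variance term."""
    return statistics.variance(ordered_values) / first_variance_term(ordered_values)


def interpolate_midpoints(values):
    """Insert the mean of each neighbouring pair between the two values."""
    result = [values[0]]
    for a, b in zip(values, values[1:]):
        result.extend([(a + b) / 2, b])
    return result


rng = random.Random(2009)                        # fixed seed for repeatability
raw = [rng.gauss(0.0, 1.0) for _ in range(200)]  # no spatial dependence
smoothed = interpolate_midpoints(raw)            # 200 measured + 199 interpolated

ratio_raw = f_ratio(raw)
ratio_smoothed = f_ratio(smoothed)
print(round(ratio_raw, 2), round(ratio_smoothed, 2))
```

The F-ratio for the random set should be close to one. Midpoint interpolation cuts the first variance term to exactly a quarter of its value for the measured set, so the interpolated set shows a much higher F-ratio even though no spatial dependence was put in.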

Much of the USA and most of Canada are missing in the Apollo photograph. The USA may not have felt like showing a lot below the 49th Parallel in those days. Canada’s vastness stretches from the Atlantic Ocean to the Pacific Ocean, and winds up into the Arctic where Northern Lights shimmer when the sun takes leave during long winters. Canada’s vastness twists and turns into a multitude of different climate zones. Environment Canada (EC) manages a treasure trove for those who take the study of climate change seriously.

EC’s Adjusted Historical Canadian Climate Data Base gives temperatures for a large number of locations dating back to the 1930s. I was given permission to access EC's database. I downloaded temperatures for the international airports at Calgary, Alberta, at Ottawa and Toronto, Ontario, and at Vancouver and Victoria, British Columbia. I also downloaded temperatures for Coral Harbour, Territory of Nunavut. I did so at different times and for different reasons. Excel 2007 spreadsheet templates give the statistics for each set, a plot of the annual means, and a chart with the sampling variogram. The most relevant statistics are summarized below.

Summary of statistics for six locations in Canada

For the Toronto Lester B Pearson International Airport the difference of 2.30 °C between the first annual mean of 6.00 °C in 1940 and the last annual mean of 8.30 °C in 2008 is statistically significant at 95% probability. Higher annual temperatures do account for the observed difference of 2.30 °C. Observed differences at other locations are not significant. Randomly distributed variations in measured temperatures account for those observed differences.

Annual means at Toronto Lester B Pearson International Airport

A plot of annual differences in a chart shows a distinct trend toward higher temperatures. Such a trend also indicates spatial dependence between annual temperatures in the ordered set. A sampling variogram is a chart in which the variance terms of the ordered set are plotted against the variance of the set and the lower limits of its asymmetric 95% and 99% confidence ranges. It shows where orderliness at the selected location in our sample space of time dissipates into randomness.
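The variance terms behind such a chart can be sketched as follows. The sketch assumes that the variance term for a given lag is half the mean squared difference between values that many positions apart in the ordered set. The annual means are hypothetical and not taken from EC's database. The lower limits of the 95% and 99% confidence ranges are left out because they depend on tabulated F-values and degrees of freedom.

```python
import statistics


def variance_terms(ordered_values, max_lag):
    """Return the variance terms of the ordered set for lags 1..max_lag."""
    n = len(ordered_values)
    terms = []
    for lag in range(1, max_lag + 1):
        diffs = [ordered_values[i + lag] - ordered_values[i]
                 for i in range(n - lag)]
        terms.append(sum(d * d for d in diffs) / (2 * len(diffs)))
    return terms


def first_lag_at_or_above(terms, var_set):
    """Return the first lag whose variance term reaches the variance of the set."""
    for lag, term in enumerate(terms, start=1):
        if term >= var_set:
            return lag
    return None


# Hypothetical annual means with a steady upward trend
annual_means = [float(v) for v in range(1, 11)]
var_set = statistics.variance(annual_means)
terms = variance_terms(annual_means, max_lag=5)
lag = first_lag_at_or_above(terms, var_set)
print(terms, round(var_set, 2), lag)
```

Plotting the terms against the lag, with the variance of the set as a horizontal line, gives the bare bones of a sampling variogram. In this sketch the orderliness holds up to the lag at which a variance term first reaches the variance of the set.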

Toronto Lester B Pearson International Airport

The sampling variogram for annual temperatures measured at the Toronto Lester B Pearson International Airport displays a significant degree of spatial dependence. The question is then why geostatisticians like to assume spatial dependence rather than verify it by applying Fisher’s F-test to the variance of the set and the variance terms of the ordered set. The mining industry is pleased to krige and smooth from here to eternity. Professional engineers and geoscientists with provincial securities commissions, too, do krige and smooth with the best. But here's the cinch! NASA and NOAA are not about to krige and smooth because the mining industry says so. What should the Harper Government do when the rules of statistics are rigged? Governments do not sort out abuse of statistics. Good grief! It's the Great Lake Study where sound statistics will resurface. Come cold or warm water!