Monday, July 19, 2010

NRCan stuck with geostatistics

Dr Frederik P Agterberg went to work with geostatistics long before the acronym NRCan stood for Natural Resources Canada. He is still an Emeritus Scientist with NRCan’s Geological Survey of Canada. He is one of the most gifted geostatisticians in the world. As such, he has a soft spot for Professor Dr Georges Matheron and a penchant for his magnum opus. So much so that he called him the Founder of Spatial Statistics. He did so after Matheron had passed away in 2000. Matheron’s disciples didn’t agree with Agterberg’s view. Matheron taught them how to assume, krige and smooth with infinite confidence. So, they thought of him as the mastermind behind the Centre de Géostatistique and the Centre de Morphologie Mathématique. What Matheron taught his disciples was inspired by one innovative theme or another that called on his most creative thinking. That’s why they thought of him as the Creator of Geostatistics.

Professor Dr Georges Matheron (1930-2000)
Creator of Geostatistics
Founder of Spatial Statistics
Self-made Wizard of Odd Statistics

My son and I checked out what sort of new science Matheron had created. We found out in 1989 that his new science of geostatistics is an invalid variant of applied statistics. As a matter of fact, the author of the very first textbook on geostatistics couldn’t possibly have scored a passing grade in Statistics 101. He praised the “famous Central Limit Theorem” but failed to work with it when he should have. He never tested for spatial dependence between measured values in ordered sets. That’s why it didn’t take us long to get to the bottom of what was wrong: the author had seen fit to shelve degrees of freedom. But it took a long time to unravel what else was shelved, who did it, when, where and why. The power of the internet made it possible to trace Matheron’s new science to its roots in the early 1950s. In those bygone days young Matheron was a budding geologist who went to work with what he then thought was applied statistics. As such he showed a tenuous grasp of applied statistics early in his career.
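Working with the Central Limit Theorem means, at the very least, putting confidence limits on a mean. The sketch below assumes independent measured values and uses the normal approximation that the theorem justifies for fairly large sets; the function name and the sample values are hypothetical, and for small sets a Student t value with n − 1 degrees of freedom would be the better choice.

```python
import math
from statistics import NormalDist, mean, variance


def mean_confidence_interval(values, confidence=0.95):
    """Return (mean, lower, upper) confidence limits for the mean.

    Assumes independent values and relies on the Central Limit Theorem's
    normal approximation, which is only reasonable for fairly large sets.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    xbar = mean(values)
    s2 = variance(values)                # sample variance, n - 1 degrees of freedom
    se = math.sqrt(s2 / n)               # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    return xbar, xbar - z * se, xbar + z * se


# Hypothetical grades, for illustration only
xbar, lower, upper = mean_confidence_interval([0.30, 0.50, 0.40, 0.60, 0.45])
print(f"mean {xbar:.3f}, 95% limits {lower:.3f} to {upper:.3f}")
```

With only five values the normal approximation understates the width of the interval; the point of the sketch is that a mean without confidence limits tells only half the story.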

What struck me as a bad omen for Matheron’s new science of geostatistics was the fact that his grasp of the properties of variances can be traced to the French school of sampling in-situ ores and mined ores. What gets me hopping mad is sloppy sampling and statistics. That’s why I want to put in plain words what Matheron did in the early 1950s. In those days he was a novice geologist with the French Geological Survey (BRGM) in Algeria. His very first paper was called Formule des Minerais Connexes. It was dated November 25, 1954 and marked Note Statistique No 1 straight above its title. It would seem that Matheron saw himself as a statistician of sorts. All the same, the webmaster of the Centre de Géostatistique (CdG) early in this century saw fit to relabel his very first paper as Note Géostatistique No 1. Perhaps a dash of deception, but too little and too late to fool anyone but the odd hardcore kriger.

Matheron’s Note Statistique No 1 was not reviewed by his peers. It took me quite a while to find out that Matheron was without peers. That was just as well since he didn’t quote literature on statistics. What was worse is that he didn’t report any primary data for l’Oued-Kebir. In fact, reporting primary data ranked just as low on his list of things to do as did counting degrees of freedom. Matheron derived population means of μ1=0.45% for lead and μ2=100 g/t for silver, and population variances of σ1²=1.82 %² for lead and σ2²=1.46 (g/t)² for silver. How he could have done so much with so little is a mystery. A finite set of measured values gives only estimates of population means and variances, never the population parameters themselves. But that’s what Matheron thought he got! He tested for associative dependence between lead and silver grades and got a correlation coefficient of ρ=0.85. He didn’t point out whether this correlation coefficient was statistically significant at 95%, 99% or 99.9% probability. What put a monkey wrench in Matheron’s first crack at statistics is that he didn’t test for spatial dependence. As luck would have it, Stanford’s Journel, Matheron’s most talented disciple, put forward in 1992 that spatial dependence between measured values in ordered sets may be assumed. Testing for spatial dependence was just as trendy in 1992 as it was in 1954.
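Whether a correlation coefficient of 0.85 is significant depends entirely on how many pairs it came from. A minimal sketch, assuming independent pairs, uses Fisher’s z transform, z = atanh(r)·√(n − 3), with a normal approximation; the exact test uses t = r√(n − 2)/√(1 − r²) with n − 2 degrees of freedom. No primary data were reported for l’Oued-Kebir, so the values of n below are purely hypothetical, as is the function name.

```python
import math
from statistics import NormalDist


def correlation_p_value(r, n):
    """Approximate two-sided p-value for H0: rho = 0 via Fisher's z transform."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    if n < 4:
        raise ValueError("need at least four pairs")
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# n is hypothetical: the number of lead-silver pairs was not reported
for n in (5, 10, 30):
    p = correlation_p_value(0.85, n)
    verdict = "significant at 99%" if p < 0.01 else "not significant at 99%"
    print(n, f"{p:.2e}", verdict)
```

The same r can be significant or not, depending on a number of pairs that only the primary data could supply.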

Matheron may have had a bit of an epiphany when it hit him that l’Oued-Kebir core samples did vary in length. That’s why on January 13, 1955 he tagged on a Rectificatif to his Note Statistique No 1. What he didn’t tag on were the lengths of his core samples. How he derived weighting factors was as clear as drill mud. The same weighting factors should have been applied when he tested for associative dependence between lead and silver. Weighting factors should also have been applied to test for spatial dependence between metal grades determined in ordered core samples of variable length from a single borehole. He didn’t know that degrees of freedom are positive irrationals when core samples vary in length. In other words, he didn’t even know how to fingerprint boreholes. What a shame that peerless Matheron kept on writing more of the same.
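Length weighting changes both the mean and the count of degrees of freedom. The sketch below shows one way to do it, assuming each core sample’s grade is weighted by its length and that samples are independent. The effective sample size uses Kish’s formula n_eff = (Σl)²/Σl², a swapped-in convention that is not necessarily the author’s own; the function name and the data are hypothetical. With unequal lengths the resulting degrees of freedom are generally not whole numbers.

```python
def length_weighted_stats(grades, lengths):
    """Return (weighted mean, weighted variance, degrees of freedom, variance of the mean).

    Assumes independent samples weighted by core length; the effective
    sample size n_eff = (sum l)^2 / sum(l^2) follows Kish's convention.
    """
    if len(grades) != len(lengths) or len(grades) < 2:
        raise ValueError("need at least two grades with matching lengths")
    if any(length <= 0 for length in lengths):
        raise ValueError("core lengths must be positive")
    total = sum(lengths)
    weights = [length / total for length in lengths]        # weights sum to 1
    wmean = sum(w * x for w, x in zip(weights, grades))
    n_eff = 1 / sum(w * w for w in weights)                  # effective sample size
    df = n_eff - 1                  # generally not a whole number when lengths differ
    # Unbiased variance for reliability weights: sum w(x - m)^2 / (1 - sum w^2)
    wvar = sum(w * (x - wmean) ** 2 for w, x in zip(weights, grades)) * n_eff / df
    return wmean, wvar, df, wvar / n_eff


# Hypothetical grades (% Pb) and core lengths (m)
mean_pb, var_pb, df_pb, var_mean_pb = length_weighted_stats(
    [0.31, 0.52, 0.47, 0.40], [1.2, 0.8, 1.5, 1.0])
print(f"mean {mean_pb:.3f}, variance {var_pb:.4f}, df {df_pb:.2f}")
```

When all lengths are equal the sketch reduces to the ordinary sample variance with n − 1 degrees of freedom.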

No matter what odd statistics Matheron cooked up, it would smoothly pass his own peer review. It made a mockery of statistics that his new science of geostatistics made it all the way to the USA in 1970. Tagged along on the trip were A Marechal and J Serra, who would assist Matheron in making a strong case for his novel science. The stage was set on campus at the University of Kansas for a geostatistics colloquium on 7-9 June 1970. In those days, Matheron, Marechal and Serra were geostatistical scholars at the Centre de Morphologie Mathématique at Fontainebleau, France. Matheron himself had thought up Brownian motion along a straight line, which would set the stage for random functions to be continuous between measured values. He did so in his rambling Random Functions and their Application in Geology. What was still beyond Matheron’s grasp was how to verify spatial dependence by applying Fisher’s F-test to the variance of a set of measured values and the first variance term of the ordered set. That was so because counting degrees of freedom had not yet made Matheron’s brief list of significant things to do.
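The F-test of the variance of a set against the first variance term of the ordered set can be sketched as follows. The sketch assumes the first variance term is half the mean squared difference between successive values (von Neumann’s mean square successive difference, halved). The degrees of freedom assigned to each term, and hence the critical F value, are left to the caller because that convention is an assumption here; the function names and the data are hypothetical. An F ratio above its critical value points to spatial dependence.

```python
from statistics import variance


def first_variance_term(ordered):
    """Half the mean squared difference between successive values of an ordered set."""
    n = len(ordered)
    if n < 3:
        raise ValueError("need at least three ordered values")
    return sum((b - a) ** 2 for a, b in zip(ordered, ordered[1:])) / (2 * (n - 1))


def spatial_dependence_test(ordered, f_critical):
    """Return (F, dependent) with F = var(x) / var1(x).

    f_critical must come from an F table at the degrees of freedom the
    analyst assigns to each term (an assumption left to the caller).
    """
    var1 = first_variance_term(ordered)
    if var1 == 0:
        raise ValueError("successive values are all equal; F is undefined")
    f = variance(ordered) / var1
    return f, f > f_critical


# Hypothetical grades in borehole order
borehole = [0.42, 0.45, 0.51, 0.55, 0.53, 0.48, 0.44, 0.40]
f, dependent = spatial_dependence_test(borehole, f_critical=3.0)  # placeholder critical value
print(f"F = {f:.2f}, spatially dependent: {dependent}")
```

A smooth trend drives the first variance term well below the variance of the set, while values that flip back and forth drive it above.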

Marechal and Serra in Figure 10 of Random Kriging show how to derive a set of sixteen distance-weighted averages from a set of nine measured values. What M&S didn’t derive was the variance of each and every distance-weighted average. David in 1977 took a long look at M&S’s data but didn’t derive the variance of each and every distance-weighted average either. What David did run into were infinite sets of distance-weighted averages. Journel in 1978 was so taken with David’s infinite set of distance-weighted averages that he swallowed the zero variance hook, line and sinker. Matheron in 1960 found out about D G Krige’s work at the Witwatersrand gold complex in South Africa and came up with all sorts of krige-inspired adjectives and verbs. That’s in a nutshell why Matheron’s new science of geostatistics took on a life of its own for no reason whatsoever.
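A distance-weighted average is a linear combination of measured values, so it has a variance. The minimal sketch below uses inverse-distance weights (not kriging weights) and assumes independent measured values with a common variance s², for which var(Σwᵢxᵢ) = s²·Σwᵢ². The power of 2, the function name and the data are illustrative choices.

```python
import math
from statistics import variance


def idw_estimate_with_variance(points, values, target, power=2.0):
    """Inverse-distance-weighted average at target, plus its variance.

    Assumes the measured values are independent with a common variance,
    estimated by the sample variance of the set, so that
    var(sum w_i x_i) = s^2 * sum(w_i^2).
    """
    dists = [math.dist(p, target) for p in points]
    if any(d == 0 for d in dists):
        raise ValueError("target coincides with a measured position")
    raw = [1 / d ** power for d in dists]
    total = sum(raw)
    weights = [r / total for r in raw]                 # weights sum to 1
    estimate = sum(w * x for w, x in zip(weights, values))
    var_estimate = variance(values) * sum(w * w for w in weights)
    return estimate, var_estimate


# Hypothetical positions and measured values
corners = [(0, 0), (2, 0), (0, 2), (2, 2)]
est, var_est = idw_estimate_with_variance(corners, [1.0, 2.0, 3.0, 4.0], (1, 1))
print(f"estimate {est:.3f}, variance {var_est:.4f}")
```

Since the weights sum to 1, Σwᵢ² is at least 1/n, so under these assumptions the variance of a distance-weighted average can never be zero.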

Agterberg’s 1970 Autocorrelation Functions in Geology was about what he then called “a geologic prediction problem”. He defined a set of measured values at unevenly distributed positions in a sample space. His problem was not so much how to derive the value of the stochastic variable at a selected position; his real problem was that he didn’t derive the variance of his distance-weighted value at that position. On a positive note, he didn’t infer an infinite set of variance-deprived distance-weighted averages from his finite set of measured values. Neither did he test for spatial dependence by taking a systematic walk that visits each measured value only once and covers the shortest possible distance between all positions. Agterberg’s 1970 geologic prediction problem popped up as “a typical kriging problem” in his 1974 Geomathematics.
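A systematic walk that visits each position once over the shortest total distance is a shortest Hamiltonian path. A brute-force sketch for small sets follows; the function name and the nine-point cap are assumptions, and larger sets would need a heuristic rather than an exhaustive search. Taking the measured values in the returned order gives the ordered set whose first variance term can then be compared with the variance of the set.

```python
import math
from itertools import permutations


def shortest_walk(points):
    """Visiting order with the least total distance, each position once (brute force)."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two positions")
    if n > 9:
        raise ValueError("brute force is only practical for small sets")
    best_order, best_length = None, math.inf
    for order in permutations(range(n)):
        if order[0] > order[-1]:       # a walk and its reverse have the same length
            continue
        length = sum(math.dist(points[a], points[b]) for a, b in zip(order, order[1:]))
        if length < best_length:
            best_order, best_length = order, length
    return list(best_order), best_length


# Hypothetical positions and measured values
positions = [(0, 0), (4, 1), (1, 1), (3, 0)]
grades = [0.42, 0.51, 0.47, 0.55]
order, walk_length = shortest_walk(positions)
ordered_grades = [grades[i] for i in order]
print(order, f"{walk_length:.3f}", ordered_grades)
```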

A geologic prediction problem in 1970! A typical kriging problem in 1974! The eulogy for Matheron in 2000! The silence of NRCan’s Emeritus Scientist in 2010! What’s the matter with NRCan’s brass? When will Dr Frederik P Agterberg be asked to explain why his distance-weighted average doesn’t have a variance?

Back when NRCan was Canmet I knew several of its scientists. Most of all I remember Dr Jan Visman, a Dutch mining engineer with a keen interest in coal processing. He was an accidental sampling expert of sorts because of his need to understand coal processing. We were members of ASTM D05 on sampling and analysis of coal. He headed the Western Regional Laboratories of the Department of Mines and Technical Surveys until his retirement in 1976. I owe him a debt of gratitude. His 1947 PhD thesis made it clear that the variance of the primary sample selection stage is the sum of composition and distribution components. That’s why I’m keeping his memory alive on Wikipedia.
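Visman’s two components are usually written as S² = A/w + B/n. Here w is the total mass of the primary sample, n the number of increments, A a composition constant (variance × mass) and B a distribution constant (variance). The sketch assumes that commonly quoted form, which may be stated differently in the 1947 thesis. The function name is hypothetical, and so are the constants in the usage lines; real values would have to be estimated from sampling experiments.

```python
def visman_variance(a, b, total_mass, increments):
    """Return (total, composition, distribution) for S^2 = A/w + B/n."""
    if total_mass <= 0 or increments < 1:
        raise ValueError("total mass must be positive and increments at least 1")
    composition = a / total_mass      # shrinks as the primary sample gets heavier
    distribution = b / increments     # shrinks as more increments are taken
    return composition + distribution, composition, distribution


# Hypothetical constants: same total mass, more increments
for n in (4, 16):
    total, comp, dist = visman_variance(a=10.0, b=2.0, total_mass=5.0, increments=n)
    print(n, f"S^2 = {total:.3f} (composition {comp:.3f}, distribution {dist:.3f})")
```

Splitting the same mass into more increments cuts only the distribution component; the composition component yields only to a heavier primary sample.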

Dr Robert Sutarno was a true expert on applied statistics in general and interlaboratory test programs in particular. He taught me a lot about statistical analysis of interlaboratory test programs, and about preparation and certification of reference samples. I met him when we were members of the Canadian Advisory Committee to ISO Technical Committee 102 on iron ore. We traveled to Japan in 1974. The fact that Bob spoke Dutch was a bonus for me. I was more conversant with German and French than with English when we came to Canada in 1969. But that's not the end of my case against geostatistics!