Monday, October 27, 2008

How to lie with geostatistics

Here’s how, in a nutshell. The most brazen lie of all was to deny that weighted averages have variances. The stage for this lie was set at the French Geological Survey in Algeria on November 25, 1954. It came about when a novice in geology with a knack for probability theory put together his very first research paper. The author called his paper Formule des Minerais Connexes. He set out to prove associative dependence between lead and silver in lead ore. He worked with symbols on the first four pages. Handwritten on page 5 are arithmetic mean grades of 0.45% lead and 100 g/t silver, variances of 1.82 for lead and 1.46 for silver, and a correlation coefficient of 0.85. He omitted his set of primary data and did not refer to any of his peers. Those were peculiar practices that would remain this author's modus operandi for life.

This budding author was to be the renowned Professor Dr Georges Matheron, the founder of spatial statistics and the creator of geostatistics. What young Matheron had derived in his 1954 paper were arithmetic mean lead and silver grades of drill core samples. But he had not taken into account that his core samples varied in length. So he derived length-weighted average lead and silver grades and appended a correction to his 1954 paper on January 13, 1955. What he did not do was derive the variances of his length-weighted average lead and silver grades. Neither did he test for, or even talk about, spatial dependence between metal grades of ordered core samples. Matheron’s first paper showed that testing for spatial dependence was beyond his grasp in 1954.
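Here is a minimal sketch of what deriving the variance of a length-weighted average could look like. It is not taken from Matheron's paper or from any published template. The core lengths and lead grades are made up, the function name is hypothetical, and the sketch assumes that the grades are independent and share a common variance, in which case the variance of the weighted average is the variance of the set times the sum of the squared weights.

```python
# Hypothetical core samples: (length in metres, lead grade in %).
# These numbers are invented for illustration; they are NOT Matheron's data.
samples = [(1.0, 0.30), (1.5, 0.55), (0.5, 0.20), (2.0, 0.60), (1.0, 0.45)]


def length_weighted_mean_and_variance(samples):
    """Return (weighted mean, variance of the set, variance of the weighted mean).

    Assumption: grades are independent with a common variance, so that
    var(weighted mean) = var(set) * sum of squared weights.
    """
    lengths = [length for length, _ in samples]
    grades = [grade for _, grade in samples]
    n = len(samples)

    # Length weights that add up to one.
    total_length = sum(lengths)
    weights = [length / total_length for length in lengths]

    # Length-weighted average grade.
    weighted_mean = sum(w * g for w, g in zip(weights, grades))

    # Variance of the set, with n - 1 degrees of freedom.
    arithmetic_mean = sum(grades) / n
    var_set = sum((g - arithmetic_mean) ** 2 for g in grades) / (n - 1)

    # Variance of the weighted average under the stated assumption.
    var_weighted_mean = var_set * sum(w * w for w in weights)
    return weighted_mean, var_set, var_weighted_mean


wmean, var_set, var_wmean = length_weighted_mean_and_variance(samples)
print(f"length-weighted mean: {wmean:.4f} % Pb")
print(f"variance of the set: {var_set:.6f}")
print(f"variance of the weighted mean: {var_wmean:.6f}")
```

When all core lengths are equal, each weight is 1/n and the result collapses to the familiar variance of the arithmetic mean, var/n.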

Why was Formule des Minerais Connexes marked Note statistique No 1? Matheron had not derived variances to compute confidence limits for arithmetic mean lead and silver grades but applied correlation-regression analysis. Statisticians do know that the central limit theorem underpins sampling theory and practice. So why didn’t young Matheron derive confidence limits? Surely, he was familiar with this theorem, wasn't he? Or was it because he thought he was some sort of genius at probability theory? That would explain why he worked mostly with symbols and rarely with real data. Had he worked with real data, he would still have cooked up odd statistics because the variances of his central values went missing. That’s why he was but a self-made wizard of odd statistics. It was Matheron who called the weighted average a kriged estimate as a tribute to the first mining engineer who took to working with weighted averages. Matheron never bothered to differentiate area-, count-, density-, distance-, length-, mass- and volume-weighted averages. But then, neither did any of his disciples.
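Confidence limits for an arithmetic mean take only a few lines to compute. The sketch below is an illustration under stated assumptions, not a reconstruction of anything in the 1954 paper. The grades are invented because the primary data were never given, the function name is hypothetical, and the limits use the normal approximation that the central limit theorem justifies for large sets. For a small set, Student's t-value with n - 1 degrees of freedom would give wider limits.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def confidence_limits(values, level=0.95):
    """Return (lower, upper) confidence limits for the arithmetic mean.

    Assumption: the normal approximation applies (central limit theorem).
    For small sets, replace z with Student's t for n - 1 degrees of freedom.
    """
    n = len(values)
    central_value = mean(values)

    # statistics.variance() divides by n - 1 degrees of freedom.
    var_of_mean = variance(values) / n

    # Two-sided z-value, about 1.96 for a 95% confidence level.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sqrt(var_of_mean)
    return central_value - half_width, central_value + half_width


# Hypothetical lead grades in %; NOT Matheron's data.
grades = [0.30, 0.55, 0.20, 0.60, 0.45, 0.50, 0.35, 0.40]
lower, upper = confidence_limits(grades)
print(f"95% confidence limits: {lower:.3f} % to {upper:.3f} % Pb")
```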

Matheron’s followers, unlike real statisticians, didn’t take to counting degrees of freedom. Statisticians do know why and when degrees of freedom should be counted. Geostatisticians don’t know much about degrees of freedom but they do know how to blame others when good grades go bad. They always blame mine planners, grade control engineers, or assayers whenever predicted grades fail to pan out. They claim over-smoothing causes kriging variances of kriged estimates to rise and fall. Kriging variances rise and fall because they are pseudo variances that have but squared dimensions in common with true variances. Of course, Matheron’s odd new science is never to blame for bad grades or bad statistics.

It is a fact that Matheron fumbled the variance of his length-weighted average in 1954. Several years before the Bre-X fraud I derived the variance of a length- and density-weighted average metal grade. The following example is based on core samples from an ore deposit in Canada. The mine itself is no longer as Canadian as it once was. The Excel template with the set of primary data and its derived statistics is posted on a popular but wicked website.

My website was set up early in the Millennium. I loved to send emails with links to my reviews of Matheron’s new science of geostatistics. Students at the Centre de Géostatistique (CDG) in Fontainebleau, France, ranked high on my list of those who ought to pass Statistics 101. I was pleased when PDF files of Matheron’s work were posted in CDG’s online library. But I was surprised to find out that Matheron’s first paper was no longer listed as Note statistique No 1 in the column marked Reference but as Note géostatistique No 1. Just the same, the PDF file of this paper and its appended correction are still marked Note statistique No 1. On October 27, 2008, five out of six of Matheron's 1954 papers were still marked Note statistique Nos 2 to 6.

What was going on? Was the birth date of Matheron’s new science of geostatistics under review? Who reviewed it? And why? Why not retype the whole paper? Why not add the variances of length-weighted average lead and silver grades? And how about testing for spatial dependence between metal grades of ordered core samples? Where have all of Matheron’s sets of primary data gone? And what has happened to his old Underwood typewriter? I have so many questions but hear nothing but silence!
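Testing for spatial dependence between grades of ordered core samples need not be a mystery. One possible sketch, which is an assumption here and not a method named in Matheron's paper, compares the ordinary variance of the set with the variance derived from differences between successive grades in the ordered set. A ratio well above one hints at spatial dependence. Whether it is significant would have to be judged against a tabulated F-value for the applicable degrees of freedom, which this sketch leaves out. The function name and the grades are hypothetical.

```python
from statistics import variance


def spatial_dependence_ratio(ordered_grades):
    """Return var(set) / var(successive differences) for an ordered set.

    The second variance is the sum of squared differences between
    neighbouring grades divided by 2 * (n - 1). For a randomly ordered
    set both variances estimate the same quantity, so the ratio is
    near one. This is an illustrative statistic, not a complete test.
    """
    n = len(ordered_grades)
    if n < 3:
        raise ValueError("need at least three ordered grades")

    # Ordinary variance of the set, n - 1 degrees of freedom.
    var_set = variance(ordered_grades)

    # Variance from differences between successive grades.
    squared_diffs = [
        (b - a) ** 2 for a, b in zip(ordered_grades, ordered_grades[1:])
    ]
    var_successive = sum(squared_diffs) / (2 * (n - 1))
    if var_successive == 0:
        raise ValueError("all successive grades are identical")
    return var_set / var_successive


# Hypothetical downhole silver grades in g/t, in sampling order.
drifting = [80, 85, 92, 98, 105, 112, 118]
print(f"variance ratio: {spatial_dependence_ratio(drifting):.2f}")
```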

Matheron himself moved from odd statistics to geostatistics in 1959 when he went without a glitch from Note statistique no 19 to Note géostatistique no 20. Check it out before geostat revisionists strike again. I admit to having paraphrased Darrell Huff’s How to lie with statistics. But I couldn’t have made up that this delightful little work was published for the first time in 1954. That’s precisely when young Matheron was setting the stage for his new science of geostatistics in North Africa. Matheron, the creator of geostatistics, never read Huff’s work. But then Huff didn't read Matheron’s first paper either. Thank goodness Darrell Huff’s How to lie with statistics is still in print!