Sunday, June 28, 2009

Teaching junk statistics at Stanford

Stanford University is Professor Dr Andre G Journel’s world. He has put down deep roots at Stanford since 1978. Journel teaches the same flaky stats that Professor Dr Georges Matheron taught him between 1969 and 1978. Journel was Matheron’s most gifted student. Matheron taught him all of the ins and outs of his novel science of geostatistics. Matheron may not have told Journel that back in 1954 he thought he was a statistician. It took almost ten years to teach Journel how to assume, krige, and smooth with a lot of confidence and pride. Journel was Mining Project Engineer at the Centre de Morphologie Mathématique from 1969 to 1973, and Maître de Recherches at the Centre de Géostatistique from 1973 to 1978. Not surprisingly, he worked as profusely with symbols as Matheron did in his magnum opus. What Matheron failed to show his star disciple is how to test for spatial dependence between ordered sets of measured values in sample spaces and sampling units. Matheron and Journel never found the lost variance of Agterberg's distance-weighted average point grade.

Journel is the lead author of Mining Geostatistics. When the ink had dried in 1978 he took his book to Stanford’s students and taught them all about assuming, kriging and smoothing. My copy is a “1981 reprint with corrections.” Matheron’s Foreword makes a deeply dense read. In contrast, Dr Isobel Clark’s Preface to her 1979 Practical Geostatistics makes an easy read. Her cradle once rocked on the side of the Channel where Sir R A Fisher was knighted. Clark confessed it was Journel who taught her all she knows about the Theory of Regionalized Variables. Clark messed up degrees of freedom for ordered sets of measured values. For "mathematical convenience" she slashed the factor 2 in df₀=2(n-1) degrees of freedom for ordered sets, cooked up her silly semi-variogram, and scolded the poor souls who “sloppily call it a variogram”. Clearly, Clark and Journel disagreed about semi-variograms and variograms. Neither knew how to test for spatial dependence, how to chart sampling variograms, or how to count degrees of freedom.
Matheron’s 1978 Foreword to Mining Geostatistics went off on a tangent just as much as did his 1954 Note statistique No 1. He beat around the bush about geologists who “stress structure” and statisticians who “stress randomness.” Matheron’s point of view flies in the face of Visman’s sampling theory with its composition and distribution variances. Matheron predicted, “The user of Mining Geostatistics will come across nothing more than variances and covariances, vectors and matrices”. Matrices and vectors do indeed abound from cover to cover but so do pseudo variances and pseudo covariances. All that those so-called “variances” and “covariances” in Mining Geostatistics have in common with genuine variances and covariances is squared dimensions. The concept of degrees of freedom, too, failed to make the grade in Matheronian geostatistics. And that’s what will kill the kriging game!
I came across a genuine variance in a numerical example on page 63 of Mining Geostatistics. The authors divided a stope into four equal units, and assigned to each unit a grade equal to the outcome of a cast of “an unbiased six-sided die.” Now that does indeed give a genuine variance. Casting an unbiased die a large number of times gives a uniform probability distribution with a population mean of μ=3.5 and a population variance of σ²=2.917. The authors deserve praise for giving correct values, and for pointing out that the die ought to be unbiased. Surely, Stanford’s students ought to be taught how to measure the risk of playing all sorts of games of chance.
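The two population figures quoted for the die are easy to check. Here is a minimal sketch, assuming nothing more than a fair six-sided die with faces 1 to 6. The variable names are mine, not Journel's.

```python
from fractions import Fraction

# Faces of a fair six-sided die, each with probability 1/6.
faces = range(1, 7)
p = Fraction(1, 6)

# Population mean and population variance of the discrete uniform distribution.
mean = sum(p * x for x in faces)
variance = sum(p * (x - mean) ** 2 for x in faces)

print(f"mean = {float(mean)}")              # 7/2
print(f"variance = {float(variance):.3f}")  # 35/12
```

Exact fractions avoid rounding until the very end: 7/2 for the mean and 35/12 for the variance, which rounds to 2.917.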

No real data in 1954 - Casting dice in 1978

The set of three (3) stopes is given on the same page. Each set of four units within its stope was put together with a six-sided unbiased die such that each unit has the same mean of 3.5. That sort of applied research is time-consuming but of critical importance when teaching all of the intricacies of geostatistics. A touch of classical statistics is required to test whether or not a given die is unbiased. The question of whether Journel's die was biased may have been solved by assuming it was unbiased. Fisher’s F-test shows that the variances of the sets and the first variance terms of ordered sets are statistically identical. Read what Journel said about “Fischerian (sic) statistics” in October 1992. How’s that for creative thinking and writing?
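For what it is worth, testing a die for bias takes only a few lines. The sketch below uses a chi-square goodness-of-fit test, which is my choice of classical test and not one named in Mining Geostatistics. The observed counts are hypothetical, made up purely for illustration. The tabulated chi-square value for 5 degrees of freedom at 5% probability is 11.07.

```python
# Hypothetical counts for faces 1..6 after 120 casts (illustration only).
observed = [18, 22, 19, 21, 17, 23]

n_casts = sum(observed)
expected = n_casts / 6  # a fair die gives each face the same expected count

# Pearson's chi-square statistic with 6 - 1 = 5 degrees of freedom.
chi_square = sum((o - expected) ** 2 / expected for o in observed)

CHI_SQUARE_TABULATED = 11.07  # chi-square(0.05; 5) from standard tables
reject_unbiased = chi_square > CHI_SQUARE_TABULATED

print(f"chi-square = {chi_square:.2f}, reject unbiased die: {reject_unbiased}")
```

With these made-up counts the statistic stays far below the tabulated value, so there would be no reason to call the die biased.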
The zero kriging variance of σ²k=0 is given on page 308, Chapter V The Estimation of in situ resources in Mining Geostatistics. Another unique feature of Matheronian geostatistics is one-to-one correspondence between zero kriging variances and infinite sets of kriged estimates. Even the OCS might find it a bit of a stretch to report a 95% confidence interval of zero ounces of gold for a mineral inventory with 9.9 million ounces.
Armstrong and Champigny solved this Catch-22 with a strict caution against over-smoothing. They did so in A Study on Kriging Small Blocks, CIM Bulletin, March 1989. The study implies that the requirement of functional independence may be violated a little but not a lot. All that geostatistical gobbledygook is cooked up because one-to-one correspondence between distance-weighted averages and variances became null and void in Agterberg's 1974 Geomathematics.
On a positive note, Dr John L Hennessy, Stanford’s President, is one of the few leaders at institutes of higher learning who did bother to respond to my letters.

On August 23-28, 2009, IAMG’s Annual Conference will be held at Stanford University. What a wonderful opportunity for Stanford's President to peek around the corner and ask why the variance of Agterberg’s distance-weighted average point grade is still missing. Or he might ask Professor Dr Persi Diaconis to pose a few questions on his behalf. Diaconis is Stanford's Mary V Sunseri Professor of Statistics and Mathematics. He’ll know all about the Central Limit Theorem and its role in sampling theory and practice.

Monday, June 15, 2009

Geostatistics continues to evolve as a discipline

That's what Mark Corey wrote when Canada's Minister of Natural Resources asked him to respond to my message. Mark Corey is Director General Mapping Services Branch and Assistant Deputy Minister, Earth Sciences Sector. He is the chief mapmaker for NRCan so to speak. I was ticked off big time when he called geostatistics a discipline. But I told myself it could have been worse. He could have called it a scientific discipline. He is also one of several experts behind NRCan's 2008 "bulletproof" climate report. He testified at the Senate Committee for Energy, Environment and Natural Resources. I wish I could have asked him a few questions.
What I wanted him to tell me in plain words is why each and every distance-weighted average point grade doesn't have its own variance. Dr Frits P Agterberg thought his distance-weighted average point grade didn't have a variance in the early 1970s. Agterberg was wrong then. He's wrong now. It's high time for NRCan's Emeritus Scientist to explain why his distance-weighted average point grade still doesn't have a variance in 2009!
None of the five (5) points in the next picture have anything to do with pixels on a map. Each point stands for some sort of hypothetical uranium concentration that was measured in some way in samples selected in this sample space at positions with known Easting and Northing coordinates. I didn't make it up but Dr Isobel Clark did in her 1979 Practical Geostatistics. She worried whether or not the Central Limit Theorem would hold so she didn't derive it. Clark's figure would have been a dead ringer for Agterberg's 1970 and 1974 figures if it were not for her hypothetical uranium concentrations.

Fig. 1.1. Hypothetical sampling and estimation situation
Fig. 4.1. Hypothetical sampling and estimation situation - a uranium deposit

I want to prove Clark's set of hypothetical uranium concentrations does not display a significant degree of spatial dependence. So, let's take a systematic walk that visits each point only once and covers the shortest possible distance. Clark's selected position is not equidistant to each of her hypothetical uranium concentrations. That's why the number of degrees of freedom is not a positive integer but a positive irrational. Applying Fisher's F-test to var(x) = 4,480, the variance of the set, and var1(x) = 3,640, the first variance term of the ordered set, gives an observed F-value of F = 4,480/3,640 = 1.23. This observed F-value does not exceed the tabulated F-value of F0.05;4;4.90 = 6.38 at 95% probability. Therefore, Clark's distance-weighted average hypothetical uranium concentration of 371 ppm is not an unbiased estimate.
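The arithmetic of this F-test fits in a few lines. This is a minimal sketch that takes the two variances and the tabulated F-value straight from the paragraph above. It does not rederive them from Clark's five points, and the variable names are mine.

```python
# Values quoted in the text; nothing here is recomputed from the raw data.
var_set = 4480.0      # var(x), variance of the set
var_first = 3640.0    # var1(x), first variance term of the ordered set
f_tabulated = 6.38    # F0.05;4;4.90 as quoted

# Observed F-value: ratio of the variance of the set to the first variance term.
f_observed = var_set / var_first

# Spatial dependence is significant only if the observed value exceeds the tabulated one.
significant = f_observed > f_tabulated

print(f"F = {f_observed:.2f}, significant spatial dependence: {significant}")
```

The ratio works out to 1.23, well short of 6.38, so the test gives no evidence of a significant degree of spatial dependence.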
Clark didn't need Agterberg's approval to derive confidence limits and ranges for this point grade. Neither did I, and I came up with a 95% confidence interval of 95% CI = +/-111 ppm or 95% CI = +/-29.8%rel, and a 95% confidence range with a lower limit of 95% CRL = 261 ppm and an upper limit of 95% CRU = 482 ppm.
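Here is how a confidence range follows from a point grade and its confidence interval. The sketch starts from the rounded figures quoted above, 371 ppm and +/-111 ppm. It gives 260 ppm and about 29.9%rel where the text quotes 261 ppm and 29.8%rel. I assume the quoted figures came from unrounded values, which would account for the difference.

```python
# Rounded figures quoted in the text.
estimate_ppm = 371.0   # distance-weighted average concentration
ci_ppm = 111.0         # 95% confidence interval, absolute

# Relative confidence interval in percent of the estimate.
ci_rel = 100.0 * ci_ppm / estimate_ppm

# Lower and upper limits of the 95% confidence range.
lower_ppm = estimate_ppm - ci_ppm
upper_ppm = estimate_ppm + ci_ppm

print(f"95% CI  = +/-{ci_ppm:.0f} ppm or +/-{ci_rel:.1f}%rel")
print(f"95% CRL = {lower_ppm:.0f} ppm, 95% CRU = {upper_ppm:.0f} ppm")
```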
Here's what I would want Mark Corey to do. Visit NRCan's Emeritus Scientist in the privacy of his ivory tower and borrow his 1974 Geomathematics. Go to Chapter 6 Probability and Statistics and look at Fisher's F-test in Section 6.13. That will be all. At least for now!

Monday, June 01, 2009

Not quite fit for professional statisticians

Professor Dr Michel David said so himself. He pointed out his textbook is not for professional statisticians. He was talking about his very first textbook. I bought a copy of Geostatistical Ore Reserve Estimation, and worked my way through it. David was dead on when he predicted, “…statisticians will find many unqualified statements here.” All I really wanted to know is how David derived unbiased confidence limits for metal grades and contents of ore deposits. But he didn't do it! Why would the author of the very first textbook on geostatistics fail to show how to derive unbiased confidence limits?

I had derived unbiased confidence limits for metal grades and contents of concentrate shipments. Mines and smelters want to know the risks associated with trading mineral concentrates. Metal traders were keen to work with my method and several ISO Technical Committees approved it. So, we put together an analogous method, called it Precision Estimates for Ore Reserves, and submitted it for review to CIM Bulletin. I still don’t know why our paper ended up on David’s desk. What I do know is that David blew a fuse when he saw we didn’t even refer to geostatistics let alone work with it.

In Section Combination of point and random kriging, David refers to Maréchal and Serra’s Random kriging. These authors were with the Centre de Morphologie Mathématique when they presented it at the celebrated Geostatistics colloquium on campus at the University of Kansas, Lawrence on June 7-9, 1970. In a section called Punctual Kriging these authors showed nine measured grades and sixteen functionally dependent grades.

Figure 10 - Grades of n samples belonging to nine rectangles P of pattern surrounding x

M&S’s Figure 10 morphed into Figure 203 on page 286 of David’s 1977 book. On the same page David claimed, “Writing all the necessary covariances for that system of equations is a good test to find out whether one really understands geostatistics.” What David didn't do was take a systematic walk that visits each hole only once and covers the shortest distance. But neither did Agterberg in 1970. Nor did M&S take a systematic hike on campus at that time.

Fig. 203. Pattern showing all the points within B, which are estimated from the same nine holes

Each of David's sixteen points within B is in fact a distance-weighted average point grade. It makes no sense at all to derive the false covariance of a set of functionally dependent values and ignore the true variance of the set of nine measured values. David did sense something was amiss. In Section 12.2 Conditional Simulations of Chapter 12 Orebody Modelling he confessed, “There is an infinite set of simulated values which will have these properties.”

Infinite set of distance-weighted average point grades each derived from the same set of nine holes

Counting degrees of freedom for his set of nine holes would have been a foolproof test to find out whether David really understood statistics. What he looked at in this black hole were Agterberg's distance-weighted average point grades. Each is a zero-dimensional point grade. And each one of them lost its variance on Agterberg's watch. Dr F P Agterberg, Emeritus Scientist with Natural Resources Canada, did approve Abuse of Statistics but wasn't himself into testing for spatial dependence and counting degrees of freedom.

David's 1977 Geostatistical Ore Reserve Estimation and Journel and Huijbregts's Mining Geostatistics rank among the worst textbooks I've ever read. Until David's 1988 Handbook of Applied Advanced Geostatistical Ore Reserve Estimation came along. His work was funded by the Natural Sciences and Engineering Research Council of Canada with Grant No 7035. What a waste!