Thursday, January 29, 2009

Born krigers do more with less

Teaching statistics to born krigers takes a long time. What born krigers do best is fit curves. Think what happens when a curve is fitted through a set of measured values. Most of all there’s much pride and joy. A perfect curve is indeed a thing of beauty. Look at the Fourier transform of Wölfer annual sunspot counts from 1700 to 1987. Isn’t it as stunning as the original plot? It does put into perspective the power of mathematics when applied to an ordered set of measured values in our own sample space of time. A perfect fit is of less interest in my work than the statistics behind ordered sets of measured values. For example, I applied mathematical statistics to derive the statistics of Wölfer annual sunspot counts for the period from 1749 to 1924. I work with spreadsheet software because it is such a powerful tool to show and tell. Several Excel files are posted on my website under Statistics for geoscientists.

Wiki’s Kriging doesn’t test for spatial dependence between measured values in an ordered set. Wiki’s keepers of Krige’s grail didn’t even try. Here’s what they wrote about kriging, “The theory behind interpolation and extrapolation by Kriging was developed by the French mathematician Georges Matheron based on the Master's thesis of Daniel Gerhardus Krige.” It’s short and crisp but not to the point.

Krige’s 1951 Master’s thesis brings up ‘knowledge of mathematical statistics’, ‘careful statistical analysis’, ‘science of statistics’, ‘modern statistical basis’, ‘application of statistics’, and so on. It does read like a thesis on statistics, doesn’t it? Nowhere did Krige bring up ‘geostatistics’. A 2003 Tribute to Krige alluded to “…his pioneering work in the application of mathematical statistics…” The same tribute alluded to Krige’s 1952 paper in which he “introduced, inter alias, the basic geostatistical concepts of ‘support’, ‘spatial structure’, ‘selective mining units’, and ‘grade-tonnage curves’”. Did it take Krige one year and a bit of inter alias to switch from real statistics to a pinch of between-the-lines geostatistics? Not quite! He was a committed geostatistician when he wrote the Preface to David’s 1977 Geostatistical Ore Reserve Estimation. But when did Krige really take to kriging?

Matheron’s Note Statistique No 1 saw the light of day in North Africa on November 25, 1954. He coined the first krige-inspired eponym in his 1960 Krigeage d’un panneau rectangulaire par sa périphérie. Matheron didn’t refer to Krige’s 1951 Master’s thesis. Nor did he refer much to anyone’s work but his own. In those early days Matheron himself dawdled between statistics and geostatistics. But he was not much of a statistician even though he thought he was one.

It makes sense to compare Wiki’s Kriging with Krige’s teachings. Look at Figure 1 in Wiki’s Kriging. The graph didn’t irk me quite as much as did the confidence intervals for measured values. Once upon a time I tried to get the set of measured values that underpin Figure 1 but its caretaker(s?) didn’t respond. So, I waited until it was time to take a stand against junk statistics on Wikipedia.

Figure 1

One-dimensional data interpolation by kriging, with confidence intervals.

Squares indicate the location of the data.

Kriging interpolation is in red.

Confidence intervals are in green.

I enlarged Figure 1 and measured X- and Y-coordinates for all points in mm. I tested for spatial dependence by applying Fisher’s F-test to the variance of the set and the first variance term for the ordered set. I applied weighting factors because of unevenly spaced measured values. That’s why degrees of freedom become fractional numbers rather than integers.

Given that the observed value of F = var1(x)/var(x) = 1,504/1,408 = 1.07 is below the tabulated value of F0.05;dfo;df = 6.04, it follows that the ordered set of measured values does not display a significant degree of spatial dependence. Hence, measured values in the ordered set are randomly distributed within this sample space. Therefore, interpolation between measured values makes as much sense as extrapolation beyond the set. As a matter of fact, it does give junk statistics of the worst kind wherever and whenever randomness rules. I do not know how Wiki’s Kriging caretakers cooked up the confidence intervals in Figure 1. I applied plain vanilla statistics and plotted 95% confidence intervals in this bar graph.
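The test described above can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the measured values are taken to be evenly spaced, so the weighting factors mentioned above are left out, and the first variance term is assumed to be the sum of squared differences between successive values divided by twice the number of differences. The function names `first_variance_term` and `observed_f` are made up for illustration, and the `trend` values are hypothetical, not the data behind Figure 1. Looking up the tabulated F-value is left to a table, so the last lines simply repeat the arithmetic quoted in the text.

```python
from statistics import variance


def first_variance_term(ordered_values):
    """Variance term from differences between successive values in the ordered set.

    Assumption: evenly spaced values, so no weighting factors are applied.
    """
    diffs = [b - a for a, b in zip(ordered_values, ordered_values[1:])]
    return sum(d * d for d in diffs) / (2 * len(diffs))


def observed_f(ordered_values):
    """Ratio var1(x)/var(x), in the same order as quoted in the text."""
    return first_variance_term(ordered_values) / variance(ordered_values)


# Hypothetical values for illustration only, NOT the data behind Figure 1.
trend = [1.0, 2.0, 3.0, 4.0, 5.0]
print(variance(trend))             # variance of the set
print(first_variance_term(trend))  # first variance term of the ordered set
print(observed_f(trend))           # ratio of the two

# The arithmetic quoted in the text: 1,504/1,408 against the tabulated 6.04.
f_from_text = 1504 / 1408
print(round(f_from_text, 2), f_from_text < 6.04)
```

A steadily rising set such as `trend` gives a first variance term much smaller than the variance of the set, which is what an ordered set with strong spatial dependence looks like. For the values read off Figure 1 the two variances are nearly equal, which is why the ratio stays far below the tabulated value.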

The bars in this graph, unlike the measured values in Figure 1, are evenly spaced. I’ll show in another blog that interpolation between measured values in an ordered set does indeed give the same sort of junk statistics as did Bre-X and the kriging game.

Tuesday, January 20, 2009

Working with Wikipedians

Wikipedia is a wonderful source of information for all of us while we are doing our time in this universe. Wiki is reliable as a rule and tries to do right when in doubt. For example, under Geostatistics Wiki points out, “This article is in need of attention from an expert on the subject. WikiProject Geography or the Geography Portal may be able to help recruit one”. No kidding! Wiki’s expert would have to be some kind of jack-of-all-sciences. So many disciplines do have a role to play in geography.

Geologists and mining engineers got stuck with geostatistics when Matheron goofed but thought he had dug up a new science. They were taught not to work with the Central Limit Theorem and to infer ore between widely spaced boreholes. To infer ore between step-out boreholes at a spacing of 200 m worked well indeed in the Bre-X case. On the other hand, to infer spatial dependence between closely spaced pixels makes sense. When I tested for spatial dependence between gold grades of ordered rounds in a drift, Journel called me "too encumbered" with Fisher’s statistics. It’s not surprising then that geoscientists at Stanford are taught to assume, krige and smooth voodoo variances. Geoscientists with a passion for order tend to do curve-fitting. Too many are led to believe that geostatistics is good for geoscientists. I know that geoscientists would enjoy working with real statistics just as much as Sir Ronald A Fisher once did.

I tried to add applied statistics to Wiki’s Geostatistics when it was still called Kriging. I did so when I was a new Wikipedian in 2005. I knew then that geostatistics is an invalid variant of applied statistics. My son and I had known why since the early 1990s. What I didn’t know in 2005 is who stripped the variance off the distance-weighted average. But I do know now who did and when! What I do not know is why. I’ll continue to explain my case against geostatistics in concise terms and with significant symbols. I do so not only as a member of several ISO Technical Committees but also as a blogger, as a webmaster, and, last but not least, as a Wikipedian.

Most Wikipedians have a strong need to leave a better informed world than we found. I’m no exception. I hold an edge in always having worked with applied statistics and grasped Visman’s sampling theory and practice. I know that geostatistics converted Bre-X’s bogus grades and Busang’s barren rock into a massive phantom gold resource. What I also know is that bogus assays for three to five salted boreholes would have been enough to nip this mind-boggling fraud in the bud. What the world’s mining industry doesn’t want to know is what I would have done!

Neither does Pierre-Jean Lafleur want to know. He is a Professional Engineer and a reserve and resource expert with Watts, Griffis, and McOuat Limited. He doesn’t believe I called the Bre-X fraud several months before the boss salter vanished. Lafleur wrote, “The information he provides is unclear, and most likely untrue”. So he wiped it off Wiki’s Bre-X Minerals. Nor may he believe that it was not I who put my name on that Wiki subject. But what I did do when my name came up in the wrong context was add the facts and a few links to subjects such as spatial dependence and sampling variogram.

Lafleur deserves some praise because he doesn’t work under a nom-de-plume. Too many Wikipedians work anonymously. When scientists and engineers want to be taken seriously on Wikipedia they should stand up and be counted. Wikipedia should not allow Wikipedians who hide behind pseudonyms to delete indisputable scientific facts. Examples in my discipline of sampling and statistics are the Central Limit Theorem, functional dependence, spatial dependence and degrees of freedom.

Look and see which stats derive from Matheron’s Formule des Minerais Connexes. What a pity that the seminal work of the Creator of Geostatistics and the Founder of Spatial Statistics is no longer posted on the web. In fact, Matheron was a self-made wizard of odd statistics. Here's the link to Matheron's correction of his very first paper. Enter a different number of core samples and see how it impacts 95% confidence limits. Real statistics does all kinds of cool things!

Professional engineers and geoscientists claim to be guided by codes of ethics that protect the public at large. Provincial securities commissions in Canada employ reserve and resource experts to set the rules. But should foxes run henhouses? That’s a good-enough reason why a National Securities and Exchange Commission should turn provincial fiefdoms into branch offices. Reserve and resource experts in branch offices should then be asked to testify under oath and explain why the Central Limit Theorem and degrees of freedom are null and void in geostatistics.

Working with applied statistics is fun. And it’s kind of cool for our planet! Wikipedians should read what the International Organization for Standardization is all about. ISO may violate the odd copyright, and it ignores priority once in a while. And the UN is not perfect either. Only Wikipedia can bring scientific integrity to the world.