Thursday, December 30, 2010

Praise for a scientific fraud

The First Coming of Dr Roussos Dimitrakopoulos from Down Under all the way to McGill University came to pass in 1993. He had come to chair an International Forum on Geostatistics for the Next Century. The stage for this early forum was set at McGill’s Conference Office on June 3-5, 1993. Geostatistical scholars from far and wide had come to praise Professor Dr Michel David. For it was he who had crafted the very first textbook. His 1977 Geostatistical Ore Reserve Estimation does refer to “the famous central limit theorem”. Even degrees of freedom made a token appearance in Table 1.IV. His work did qualify for support from the National Research Council of Canada (Grant NRC7035). I do have my own copy of his book and have worried a lot about it. That’s why I mailed an abstract titled “The Properties of Variances”. In fact, I did send it twice by registered mail. The first was lost and the second rejected. All I wanted to ask was why each and every distance-weighted average AKA kriged estimate did not have its own variance in his 1977 textbook. The more so since a one-to-one correspondence between functions and variances is a sine qua non in our work!

David in November 1989 had rejected Merks and Merks Precision Estimates for Ore Reserves. He had done so because our paper was short on references to 20 years of geostatistical literature. In contrast, Erzmetall praised its “splendid preparation” and published it in October 1991. David didn’t know how to test for spatial dependence, how to count degrees of freedom, and how to derive unbiased confidence limits for gold grades and contents of in-situ ore. We did what David never got around to doing. We derived precision estimates for the mass of contained gold based on assays determined in a set of ordered rounds in a drift. The set of primary increments from each mined round had been put in the same basket. That’s why we couldn’t estimate the intrinsic variance of gold.

My son and I had taken at different times the same stats courses at Simon Fraser University. I had done so shortly after we came to Canada in October 1969. My problem in those days was that I spoke German and French better than English. I left SGS in 1980 to work with Cominco. I wrote a lot on sampling and statistics and lectured all over the world. These days I still write about sampling and statistics but my son travels a lot. Ed has a PhD in Computing Science and was twice awarded the Dean’s Silver Medal. He worked with IBM in Toronto. Nowadays he leads the top-level Eclipse Modeling Project and the Eclipse Modeling Framework subproject. Ed set up Macro Modeling, a small but independent company. My wife and I are pleased that Ed and his partner settled in Vancouver, BC. Applied statistics is still fun and games for both of us. Here’s what keeps us thinking about McGill University.

Variance of a general function

This formula finds its origin in calculus and probability theory. It shows that the population variance of a general function is the sum of n variance terms, each of which is the squared partial derivative with respect to an independent variable multiplied by the variance of that variable. It has built a bridge between probability theory with its infinite set of possible outcomes and sampling practice with its finite sets of measured values. The formula underpins the variance of any central value. The arithmetic mean is the central value of a set of measured values with constant weights. Area-, count-, density-, distance-, length-, mass-, and volume-weighted averages are central values of sets of measured values with variable weights. The transition from sampling theory to sampling practice with finite sets of measured values demands that degrees of freedom be counted. No ifs or buts! Functions without variances have gone where dodos fly!
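In plain text the formula reads var(f) = Σ (∂f/∂x_i)² · var(x_i), summed over the n independent variables. Here is a minimal Python sketch of it. The function names, the finite-difference step, and the four measured values are hypothetical and serve only as an illustration. For the arithmetic mean each partial derivative is 1/n, so the variance of the mean should come out as var(x)/n.

```python
from typing import Callable, Sequence


def function_variance(f: Callable[[Sequence[float]], float],
                      x: Sequence[float],
                      variances: Sequence[float],
                      h: float = 1e-6) -> float:
    """First-order variance of f(x) for independent variables x_i."""
    total = 0.0
    for i, var_i in enumerate(variances):
        step = h * max(abs(x[i]), 1.0)
        up, down = list(x), list(x)
        up[i] += step
        down[i] -= step
        # Central-difference estimate of the partial derivative df/dx_i
        derivative = (f(up) - f(down)) / (2.0 * step)
        total += derivative ** 2 * var_i
    return total


def arithmetic_mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


# Hypothetical measured values, each with the same variance of 0.25
values = [2.0, 4.0, 6.0, 8.0]
variances = [0.25] * len(values)

var_mean = function_variance(arithmetic_mean, values, variances)
print(var_mean)  # close to 0.25 / 4 = 0.0625
```

Swap in any weighted average for arithmetic_mean and the same routine gives that function its own variance.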

In 1970 Professor Dr Georges Matheron brought his new science of geostatistics to North America. In his Random Functions and their Application in Geology Matheron invoked Brownian motion along a straight line. It was just as richly embellished with symbols and as short on primary data as is his magnum opus. Maréchal and Serra in Random Kriging applied the same symbols that Matheron had taught all of his disciples. Figure 10 puts in plain view how to do more with less.

Figure 10 – Grades of n samples belonging to
nine rectangles P of pattern surrounding x

David may have thought that what Maréchal and Serra were doing was kind of cool. So, he explains it on page 286 of his 1977 textbook in Chapter 10 The Practice of Kriging. David dressed up M&S’s Figure 10 with a slightly different caption.

Fig. 203. Pattern showing all the points within B,
which are estimated from the same nine holes

David added a dash of subterfuge when he called his points within B “estimated”. Each point within B derives from the same set of measured values for nine (9) holes. As such, each and every one is a function of the same set of nine (9) holes. Of course, each distance-weighted average does have its own variance in applied statistics. Thus it came about that variance-deprived and zero-dimensional distance-weighted average point grades morphed into kriged estimates.
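A rough Python sketch shows how any point within B gets its own variance once it is treated as a function of the same nine holes. Everything in it is an assumption for illustration: the 3 x 3 grid of hole coordinates, the grades, the inverse-distance-squared weighting, and the use of the variance of the set as the variance of each measured value. None of it comes from David's textbook.

```python
import math
import statistics


def inverse_distance_weights(point, holes, power=2.0):
    """Normalized inverse-distance weights (the point must not sit on a hole)."""
    raw = [1.0 / math.hypot(point[0] - hx, point[1] - hy) ** power
           for hx, hy in holes]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_average_and_variance(weights, grades, grade_variance):
    """Weighted average and its variance: sum of w_i^2 times var(x)."""
    average = sum(w * g for w, g in zip(weights, grades))
    variance = sum(w ** 2 for w in weights) * grade_variance
    return average, variance


# Hypothetical 3 x 3 pattern of holes and hypothetical grades
holes = [(x, y) for y in (0.0, 10.0, 20.0) for x in (0.0, 10.0, 20.0)]
grades = [1.2, 1.5, 1.1, 1.8, 2.0, 1.6, 1.3, 1.7, 1.4]
grade_variance = statistics.variance(grades)  # n - 1 = 8 degrees of freedom

point = (12.0, 7.0)  # one hypothetical point within B
weights = inverse_distance_weights(point, holes)
average, variance = weighted_average_and_variance(weights, grades, grade_variance)
print(average, variance)
```

Move the point and the weights change, and so does the variance. But the set of nine holes still gives no more than eight degrees of freedom, however many points are computed within B.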

Matheronian madness makes a mess in B

In Section 12.2.1 Using a simulated model of Chapter 12 Orebody Modelling (see page 324) David prevaricates, “The criticism to this model is obvious. The simulation is not reality. There is only one answer: The proof of the pudding is…! So far the few simulations made which it has been possible to check have a posteriori proved to be adequate”. Good grief! What about Bre-X? Why didn’t they ask Merks and Merks?

McGill University had set the stage in 1993 to praise Professor Dr Michel David. Those who had come to McGill’s Conference Office to praise him may still not have a clue what was wrong with geostatistics. But Bre-X's rigs were drilling at its Busang property! The Bre-X fraud came about because the geostatocracy had failed to grasp the properties of variances.

Tuesday, December 21, 2010

Unscrambling the French sampling school

My grandma taught me not to put all my eggs in one basket. She was a caring matriarch who told inspiring stories. She played card games but odds were beyond her grasp. She played for pennies but not with other people’s pennies. She didn’t have a PhD in anything. I took her word and never put all my eggs in one basket.

Dr Pierre Gy (1924-...) and Professor Dr Georges Matheron (1930-2000) put the French sampling school on the world map. Matheron never put core samples from a single hole in one basket so to speak. But Gy did put a set of primary increments taken from a sampling unit in one basket. So he didn’t even get a single degree of freedom. The interleaved sampling protocol is described in several ISO Standard Methods. It is also described in Chapter 6 Spatial Dependence in Material Sampling of a textbook on Approaches in Material Sampling. Dr Bastiaan Geelhoed edited the text. IOS Press published the book in 2010.

Matheron marched to a new low when he sampled in situ ores. True, he didn't put in one basket a set of core samples from a single borehole. But he failed to derive measures for precision, to test for spatial dependence between grades of ordered core sections, and to count degrees of freedom. Quel dommage! Matheron thought that Gy knew a lot about sampling theory and sampling practice. Gy’s L’Échantillonnage des Minerais en Vrac was printed in two parts and on 656 pages. Tome 1 is dated January 15, 1967, and Tome 2 hit the shelves on September 15, 1971.

Gy's sampling slide rule

Gy pioneered a slide rule of sorts to simplify the sampling of mined ores. His sampling constant C is a function of c, the mineralogical composition factor, of l, the liberation factor, of f, the particle shape factor, and of g, the size range factor. Hence, Gy’s sampling “constant” is a function of a set of four (4) stochastic variables. As such, Gy's constant C does have its own variance.

Some sampling constant!
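Here is a small Python sketch of why such a "constant" carries a variance. It assumes the usual product form C = c · l · f · g and applies the variance formula for a general function to it. The factor values and the 10% relative standard deviations are hypothetical. They are not taken from Gy's work.

```python
import math


def product_variance(values, variances):
    """First-order variance of a product of independent variables.

    The partial derivative of the product with respect to x_i is product / x_i.
    """
    product = math.prod(values)
    variance = sum((product / v) ** 2 * s for v, s in zip(values, variances))
    return product, variance


# Hypothetical factors: c (composition), l (liberation), f (shape), g (size range)
factors = [100.0, 0.5, 0.5, 0.25]
# Hypothetical 10% relative standard deviation for each factor
factor_variances = [(0.1 * v) ** 2 for v in factors]

C, var_C = product_variance(factors, factor_variances)
print(C, var_C, math.sqrt(var_C) / C)
```

With four factors at 10% each, the relative standard deviation of C works out to about 20%, since relative variances add to first order.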

Matheron wrote a three-page Synopsis to Gy’s Tome 1 Théorie Générale. He praised Gy’s work for defining, “... accuracy and precision, bias and random error, etc...” Gy, in turn, praised Matheron’s 1965 PhD thesis. Gy did refer to Visman’s 1947 PhD Thesis and to his 1962 Towards a common basis for the sampling of materials. Gy didn’t mention Sir R A Fisher, Anders Hald, Karl Pearson, or William Volk. Why then did Gy deserve Matheron’s praise?

Dr Pierre M Gy is a chemical engineer with a deterministic take on sampling. He is the most prolific author of works on sampling. He sent me a copy of his 1979 Sampling of Particulate Materials, Theory and Practice. It was marked Christmas 1979 and signed underneath. Gy pointed to degrees of freedom in Chapter 14. His Index does not list degrees of freedom between “degenerate splitting processes” and “degree of representativeness”. Another odd entry in this Index is “SF = Student-Fisher”. Student’s t-test shows whether or not paired data are biased. Fisher’s F-test shows whether two variances are statistically identical or differ significantly. Both statistical tests demand that degrees of freedom be counted!
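A short Python sketch shows where the degrees of freedom enter both tests. The paired data are hypothetical and the function names are mine. The sketch stops at the test statistics, which would then be compared with tabulated values of t and F at the stated degrees of freedom.

```python
import math
import statistics


def paired_t(a, b):
    """Student's t statistic for paired data, with n - 1 degrees of freedom."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1


def variance_ratio(a, b):
    """Fisher's F statistic with the larger variance as the numerator.

    Returns F and the degrees of freedom of numerator and denominator.
    """
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    if var_a >= var_b:
        return var_a / var_b, len(a) - 1, len(b) - 1
    return var_b / var_a, len(b) - 1, len(a) - 1


# Hypothetical paired results for the same five samples
lab_a = [10.2, 11.1, 9.8, 10.5, 10.9]
lab_b = [10.0, 10.8, 9.9, 10.1, 10.6]

t, df_t = paired_t(lab_a, lab_b)
F, df_num, df_den = variance_ratio(lab_a, lab_b)
print(t, df_t)
print(F, df_num, df_den)
```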

Matheron praised Gy’s work in 1967 and Gy, in turn, praised Matheron’s work in 1979. Here’s what Gy wrote literally:

"The sampling of compact solids and more specifically mineral deposits
is covered by the science known as “Geostatistics”. The fundamentals
of this science, established by Krige, Sichel, deWijs were developed by
Matheron and his team (references in appendix). Worked out in France,
Matheron’s theories are slowly but steadily gaining acceptance in
English speaking countries around the world thanks to
an increasing teaching and to technical textbooks such as
Michel David’s “Geostatistical Ore Reserve Estimation” (1977)".

Now that's a nice little tit-for-tat between scholars who created the French sampling school! Matheron and his disciples cooked up quite a variant of applied statistics! Thank goodness, his magnum opus is posted on CdG’s website. I scanned his 1965 PhD Thesis for degrés de liberté but didn’t find any at all in 301 pages of dense probability theory. But I did find two sets of numerical data. Matheron’s Set A looks a lot less variable than Set B but both sets have the same central value. So, I applied Fisher’s F-test to the variances of the sets and the first variance terms of the ordered sets.
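Here is a sketch in Python of how such a test for spatial dependence may be set up. The ordered data below are hypothetical and are not Matheron's A- or B-set. The first variance term is computed from squared differences between adjacent values, and the 2(n - 1) degrees of freedom assigned to it is an assumption of this sketch.

```python
import statistics


def first_variance_term(ordered):
    """First variance term of an ordered set.

    Sum of squared differences between adjacent values divided by 2(n - 1).
    Returns the variance term and its assumed degrees of freedom, 2(n - 1).
    """
    n = len(ordered)
    ssd = sum((ordered[i + 1] - ordered[i]) ** 2 for i in range(n - 1))
    return ssd / (2 * (n - 1)), 2 * (n - 1)


def spatial_dependence_ratio(ordered):
    """F ratio of the variance of the set to its first variance term."""
    var_set = statistics.variance(ordered)      # n - 1 degrees of freedom
    var_first, df_first = first_variance_term(ordered)
    return var_set / var_first, len(ordered) - 1, df_first


# Hypothetical ordered set with a smooth trend along a line
ordered = [1.0, 1.2, 1.5, 1.9, 2.4, 2.8, 3.1, 3.3]

F, df_set, df_first = spatial_dependence_ratio(ordered)
print(F, df_set, df_first)
```

An F ratio well above the tabulated value at these degrees of freedom points to a significant degree of spatial dependence in the ordered set. A ratio close to one suggests that the order carries no information.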

Data sets in Matheron's PhD thesis

I have pasted Matheron’s A- and B-sets on a truncated title page of his 1965 PhD Thesis. This title page and Fisher’s F-tests for his A- and B-sets are posted on my website. Matheron and Gy didn't know how to test for spatial dependence in sampling units and sample spaces. The root of the problem is these scholars didn’t grasp the properties of variances. But then, neither did my grandma!