geostatscam: 2008

Wednesday, December 31, 2008

Agterberg's way

Here’s what Agterberg wrote to me, “It seems that you are an iconoclast with respect to spatial statistics including kriging.” He did so in his reply to my email of October 7, 2004, on the subject of The Silence of the Pundits. That’s not quite what I had written to him. I didn’t bring up spatial statistics or kriging. It seemed as if Agterberg’s tribute to Matheron had become his new reality. All I had asked were questions about the distance-weighted average. I didn’t know in 2004 that Agterberg himself had derived this distance-weighted average point grade first in his 1970 Autocorrelation Functions in Geology and once more in his 1974 Geomathematics. What kept me spellbound in this Millennium was Matheron’s mind-numbing opus after it was posted on the website of the Centre de Géosciences. Since December 12, 2008, all I get to look at is “Not found.” I was used to Matheron’s prose and symbols but did miss his primary data. I wish his collected works were posted for posterity. It is such stunning stuff.

Agterberg brought up a friend of mine with similar criticisms who had “orally presented his views at IAMG meetings.” Agterberg thought I might wish to do the same. Good grief! What I do is put my thoughts in writing. I did so with The Properties of Variances in 1993. I wanted to bring the properties of variances within the grasp of geostatistical thinkers. Many had gathered at McGill to celebrate Geostatistics for the Next Century. It sounded somewhat premature but geostatistics was growing in leaps and bounds in those heady days. The properties of real variances were rather late in coming and the Bre-X fraud was just around the corner. As luck would have it, the properties of variances didn’t quite suit the tribute to David’s work with its infinite sets of simulated values and zero pseudo variances. That sort of science fiction still underpins McGill’s curriculum for budding geoscientists. McGill University is a source of goofy geosciences.

Philip and Watson’s Matheronian Geostatistics: Quo Vadis? (MG, Vol 18, No 1, 1986) made Matheron fit to be tied. His rebuttal took the form of a Letter to the Editor (MG, Vol 18, No 5) on the subject of Philipian/Watsonian High (Flying) Philosophy. Agterberg’s way is oral criticisms but I really liked Matheron’s written rebuttal. On the other hand, Matheron’s temper-tantrum-driven tirade might have boggled the odd geostatistical mind. I wrote about voodoo statistics in the 1990s but it failed to trigger another mind-numbing tirade.

Matheron was called the Founder of Spatial Statistics and the Creator of Geostatistics. Why did his ramblings merit twin epitaphs? The more so since Berry and Marble’s 1968 Spatial Analysis, a Reader in Statistical Geography, makes no mention of Matheron’s work. Chapter 8 Fourier Analysis in Geology in Section IV Analysis of Spatial Distributions refers to Agterberg’s Methods of Trend-Surface Analysis. Agterberg talked about it at a 1964 symposium with Applications of Statistics in its lengthy title. Just the same, Matheron did dismiss trend surface analysis at the 1970 geostatistics colloquium. Why did the masterminds not see eye-to-eye on spatial statistics when Matheron brought his new science to the USA?

All that gibberish troubled me even more when I read Agterberg’s response to my questions of October 11, 2004. On September 23, 2004, I had posed the same questions to the Councilors of the International Association for Mathematical Geology, and to the Editor and his Associate and Assistant Editors of the Journal for Mathematical Geology.

Who lost the variance of a single distance-weighted average?

Who found the variance of a set of distance-weighted averages?

Only one Assistant Editor responded by pondering, “If geostatistics is not furthering a certain problem, a different type of mathematics may solve it.” Now there’s one partially open JMG mind at work! It didn’t tempt me into giving oral criticisms at any IAMG meeting.

Here’s what I wrote on October 12th in response to Agterberg’s Aberdeen message of October 11, 2004. “I just want to know when and on whose watch the variance of the single distance-weighted average vanished, and when and under whose tutelage the kriging variance and covariance of a set of kriged estimates became the cornerstones of geostatistics, spatial statistics, kriging, smoothing, or any other popular computation that violates the requirement of functional independence and the concept of degrees of freedom”. His way was not to respond.

Agterberg had failed to derive the variance of his distance-weighted average point grade first in 1970 and again in 1974. What he did do was make a sham of scientific integrity when he was IAMG’s President. He did call it the International Association for Mathematical Geosciences. Agterberg’s way was to stay silent. It’s the wrong way in science. The right way would be to revise Geomathematics.

Sunday, December 21, 2008

Agterberg's tribute

It’s high time to try and read Agterberg’s state of mind in his tribute to the life and times of Professor Dr Georges Matheron. It taught me so much more about his way of thinking than I had learned when we talked in the early 1990s. Neither could I have found out what I needed to know had the Centre de Géosciences (CG) not posted Matheron’s works on its website. When I looked at CG’s spiced up website for the first time I found out that he wrote his Note statistique No 1 in 1954. So, it seems safe to assume Matheron thought he was working with statistics. His thoughts are accessible again since CG’s website is back online.

Agterberg said in his tribute that Matheron “commenced work on regionalized random variables inspired by De Wijs and Krige.” Let’s take a look at Matheron’s very first paper and try to figure out what he did in his Formule des Minerais Connexes. He tested for associative dependence between lead and silver grades in lead ore. He derived length-weighted average lead and silver grades of core samples that varied in length. What he didn’t do was derive variances of length-weighted average lead and silver grades. Neither did he test for spatial dependence between metal grades of ordered core samples. He didn’t give his primary data but scribbled a few stats in this 1954 paper. He didn’t refer to De Wijs or to Krige. In fact, Matheron rarely referred to the works of others.

Where’s the Central Limit Theorem?

Matheron was a master at working with symbols. Yet, he wouldn’t have made the grade in statistics because the Central Limit Theorem was beyond his grasp. The Founder of Spatial Statistics did indeed have a long way to go in 1954. So, he penned nothing but Notes statistiques until 1959. That's when he tucked Note géostatistique No 20 tightly behind Note statistique No 19. So, why did he switch from stats to geostats? It took quite a while to explain but here’s what Matheron said in 1978. He did it because “geologists stress structure” and “statisticians stress randomness.” That sort of drivel does stand the test of time in Matheron’s Foreword to Mining Geostatistics just as much as Journel’s mad zero kriging variance does in Section V.A. Theory of Kriging.

What did D G Krige do that so inspired young Matheron? In 1954 Krige had looked at, “A statistical approach to some mine valuation problems on the Witwatersrand.” It still reads like real statistics, doesn’t it? In 1960 he did reflect, “On the departure of ore value distributions from the lognormal model in South African gold mines.” Isn't that the nasty reality at gold mines? So, Krige did indeed work with statistics in those days. He may since have had some epiphany because he cooked up in 1976, “A review of the development of geostatistics.” This is why Krige was highly qualified to put a preface to David’s 1977 Geostatistical Ore Reserve Estimation with its infinite set of simulated values in Section 12.2 Conditional Simulations.

Why did H J De Wijs wind up in Agterberg’s tribute to Matheron? Agterberg had found out in 1958 that De Wijs worked with formulas that “differed drastically from those used by mathematical statisticians.” Agterberg preferred “the conventional method of serial correlation.” Why would Agterberg talk about mathematical statistics and serial correlation in 1958 when he himself had stripped the variance of his own distance-weighted average point grade in 1970 and in 1974? Agterberg ought to explain why in 2009!

De Wijs brought vector analysis without confidence limits to mining engineering at the Technical University of Delft in the Netherlands when he left Bolivia after the Second World War. Jan Visman worked at the Dutch coal mines during the war and surfaced with tuberculosis, a novel approach to sampling theory and practice, and a huge set of test results determined in samples taken from heterogeneous sampling units of coal. So much information, in fact, that he was encouraged to write his PhD thesis on this subject. And that’s exactly what he did! He continued to work as a mining engineer at the Dutch State Mines. When he found out that the Dutch Government was thinking of closing its coal mines he migrated to Canada in 1951. He worked briefly in Ottawa until 1955, and moved to Alberta where his formidable expertise was put to work in the coal industry.

Going, going, gone in geostatistics

Visman’s sampling experiment with pairs of small and large increments is described in ASTM D2234-Collection of a Gross Sample of Coal, Annex A1. Test Method for Determining the Variance Components of a Coal. Visman’s sampling theory has been quoted in a range of works. Following are some surprising references to Visman’s work, and to the lack thereof after Gy's work was widely accepted for no apparent reason.

Gy’s 1967 L’Échantillonnage des Minerais en Vrac, Tome 1 … two

Gy’s 1973 L’Échantillonnage des Minerais en Vrac, Tome 2 … eight

David’s 1977 Geostatistical Ore Reserve Estimation … two

Journel & Huijbregts’s 1978 Mining Geostatistics … zero

Clark’s 1979 Practical Geostatistics … zero

Gy’s 1979 Sampling Particulate Materials, Theory and Practice … zero

Visman's sampling theory is based on the additive property of variances. None of the above works deals with the additive property of variances in a measurement hierarchy.
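The additive property behind Visman's experiment with pairs of small and large increments can be sketched in a few lines. This is a minimal sketch, not the ASTM D2234 procedure itself. It assumes the commonly quoted form of Visman's model, in which the variance of a single increment of mass w is the sum of a composition term A/w and a distribution term B. The function names and all numbers are hypothetical and serve only to illustrate the arithmetic.

```python
def visman_components(var_small, var_large, mass_small, mass_large):
    """Solve s^2 = A/w + B for A and B from variances at two increment masses.

    Assumed model (not quoted from the post): the increment variance is the
    sum of a composition component A/w and a distribution component B.
    """
    if mass_large <= mass_small:
        raise ValueError("mass_large must exceed mass_small")
    a = (var_small - var_large) * mass_small * mass_large / (mass_large - mass_small)
    b = var_small - a / mass_small
    return a, b


def gross_sample_variance(a, b, n_increments, increment_mass):
    """Variance of a gross sample of n equal increments: A/(n*w) + B/n."""
    return a / (n_increments * increment_mass) + b / n_increments


# Hypothetical variances observed for 1 kg and 10 kg increments.
a, b = visman_components(var_small=0.50, var_large=0.14,
                         mass_small=1.0, mass_large=10.0)
print("composition component A:", round(a, 6))
print("distribution component B:", round(b, 6))
print("gross sample variance, 20 increments of 1 kg:",
      round(gross_sample_variance(a, b, 20, 1.0), 6))
```

Because the two components add, the variance of a gross sample can be predicted for any number and mass of increments once A and B are known.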

Monday, December 01, 2008

How to measure what we speak about

NASA satellites have been measuring lower troposphere global temperatures since 1979. At that time I went around the world at a snail’s pace. Lord Kelvin’s thoughts about how to measure what we speak about were much on my mind in those days. I thought a lot of metrology in general, and of sampling and statistics in detail. I was to visit all of Cominco’s operations around the world. My task was to assess the sampling and weighing of a wide range of materials. Of course, it couldn’t possibly have crossed my mind that I would look in 2008 at the statistics for 30 years of lower troposphere global temperatures.

My job with Cominco did have its perks. When I was at the Black Angel mine in Greenland, I saw Wegener’s sledge on a glacier above the Banana ore zone. I knew how geologists had struggled with Wegener’s continental drift, and how they slowed it down to plate tectonics.

Southeast Coast of Greenland

I knew geologists were struggling with Matheron’s new science of geostatistics. I travelled around the world with a bag of red and white beans, an HP41 calculator and a little printer to make the Central Limit Theorem come alive during workshops on sampling and statistics. I lost my bag of beans because it was confiscated at customs in Australia.

On-stream analyzers that measure metal grades of slurry flows at mineral processing plants ranked high on my list of tools to work with. The fact that the printed list of measured values was just peeled off the printer at the end of a shift rubbed me the wrong way. I got into the habit of asking who did what with measured values. It was not much at that time because on-stream analyzers were as rare as weather satellites. Daily sheets made up a monthly pile, and that was the end of it. I entered the odd set in my HP41 to derive the arithmetic mean and its confidence limits for a single shift. But that was too tedious a task. That’s why spreadsheet software ranked high on my list of stuff to work with.

I met a metallurgist who tried to put to work Box and Jenkins’s 1976 Time Series Analysis. So, he did have a few questions. I explained what Visman’s sampling theory had taught me. First of all, the variance terms of an ordered set of measured values give a sampling variogram. Secondly, the lag of a sampling variogram shows where orderliness in a sample space or a sampling unit dissipates into randomness. The problem is that Time Series Analysis doesn’t work with sampling variograms. So, the metallurgist got rid of his Box and Jenkins and I took his Time Series Analysis. Box and Jenkins referred to M S Bartlett, R A Fisher, A Hald, and J W Tukey but not to F P Agterberg or G Matheron. Box and Jenkins provide interesting data sets. I’ve got to look at the statistics for Wölfer’s Yearly Sunspot Numbers for the period from 1770 to 1869.
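Here is a rough sketch of how the variance terms of an ordered set might be put together into a sampling variogram. The post gives no formula, so the sketch assumes that the variance term at lag j is the sum of squared differences between values j steps apart, divided by twice the number of differences. The function names and the data are hypothetical.

```python
def variance_of_set(values):
    """Ordinary variance of the set with n - 1 degrees of freedom."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def variance_term(values, lag):
    """Variance term of the ordered set at a given lag (assumed formula)."""
    n = len(values)
    if not 1 <= lag < n:
        raise ValueError("lag must be between 1 and n - 1")
    squared = [(values[i + lag] - values[i]) ** 2 for i in range(n - lag)]
    return sum(squared) / (2 * len(squared))


def sampling_variogram(values, max_lag):
    """Variance terms for lags 1 to max_lag."""
    return [variance_term(values, lag) for lag in range(1, max_lag + 1)]


# Hypothetical ordered set of measured values, for example grades per shift.
ordered = [1.0, 1.2, 1.1, 1.5, 1.7, 1.6, 2.0, 2.2, 2.1, 2.5]
print("variance of the set:", round(variance_of_set(ordered), 4))
for lag, term in enumerate(sampling_variogram(ordered, 5), start=1):
    print("lag", lag, "variance term", round(term, 4))
```

Under this reading, the lag at which the variance terms climb to the variance of the set marks the point where orderliness gives way to randomness.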

Sunspots

Visman’s sampling theory did come alive while I was working with Cominco. So much so that I decided to put together Sampling and Weighing of Bulk Solids. The interleaved sampling protocol plays a key role in deriving confidence limits for the mass of metal contained in a concentrate shipment. So, I was pleased that ISO Technical Committee 183 approved ISO/DIS 13543–Determination of Mass of Contained Metal in the Lot. I was already thinking about measuring the mass of metal contained in an ore deposit! But CIM’s geostatistical thinkers had different thoughts. For example, CIM’s Geological Society rejected Precision Estimates for Ore Reserves. In contrast, CIM’s Metallurgical Society approved Simulation Models for Mineral Processing Plants.

In other words, testing for spatial dependence is acceptable when applied to an ordered set of metal grades in a slurry flow. Testing for spatial dependence is unacceptable when applied to metal grades of ordered rounds in a drift. So I talked to Dr W D Sinclair, Editor, CIM Bulletin. He was but one of a few who would listen to my objection against such ambiguity. In fact, I put together a technical brief and called it Abuse of Statistics. I mailed it on July 2, 1992, and asked it be reviewed by a statistician. A few weeks later Sinclair called and said Dr F P Agterberg, his Associate Editor, was on the line with a question. What Agterberg wanted to know is when and where Wells did praise statistical thinking. That was all!

H G Wells

I didn’t know when or where Wells said it! I didn’t even know whether he said it or not! What I did know was that Darrell Huff thought he had said it. In fact, he did quote it in How to Lie with Statistics. I didn’t know much about Agterberg in 1992. What I did know then was that David in his 1977 Geostatistical Ore Reserve Estimation referred to Agterberg’s 1974 Geomathematics. And I found out that Agterberg didn’t trust statisticians when he reviewed Abuse of Statistics.

F P Agterberg

Agterberg, CIM Bulletin’s Associate Editor in 1992, was a leading scholar with the Geological Survey of Canada. Yet, he didn’t know that functions do have variances. It does explain why he fumbled the variance of his own distance-weighted average zero-dimensional point grade first in 1970, and again in 1974. He could have told me in 1992 that this variance was gone but chose not to. Agterberg was the President of the International Association for Mathematical Geology when it was recreated as the International Association for Mathematical Geosciences. He is presently IAMG’s Past President. He still denies that his zero-dimensional distance-weighted average point grade does have a variance. Agterberg was wrong in 1970, in 1974, and in 1992. And he is still wrong in 2009. That's bad news for geoscientists!

Friday, November 14, 2008

How to work with real statistics

Lorne Gunter called on skeptics to unite. He did so in the National Post. His story was about scientists who don’t warm up to “the orthodoxy on global warming”. What a shame that but few got this call because it came on Monday, October 20, 2008. The timing couldn’t have been worse. It was another Monday when Wall Street and Bay Street watchers saw stock indices move straight south. Global warming isn’t of as much concern as are shrinking stock portfolios. It may explain why Lorne’s tale was told on a Monday. Sandra Rubin’s story, too, ran on a Monday. NP’s head honchos run their own stories mostly in weekend editions. NP’s very first edition was printed on October 27, 1998. At that time, it was Lord Black’s pride and joy. At this time, Lord Black is doing time and NP’s kingpins are still timing things their own way.

Lorne need not have urged skeptics to unite since they did so long ago. Skeptics do hold a dim view of pseudo scientists who play games with scientific integrity. I may well have been a born skeptic. I was taught more than I could grasp about heaven and hell from a pulpit in a Dutch village. Nowadays I teach how to test for spatial dependence in sampling units and sample spaces. Stanford’s Journel taught in 1992 that spatial dependence between measured values may be assumed. I never thought much of Journel’s thinking. Neither did JMG’s Editor. All I thought about at that time was to rid the world of Matheron’s junk statistics. Come hell or high water! And I still do!

The National Post brought to light on November 7th that President-Elect Barack Obama is set to “Stop global warming”. It brought back that off the wall “Stop continental drift” slogan. Geologists slowed down continental drift by calling it plate tectonics. Plates are still moving, and earthquakes, magma flows and tsunamis are tagging along. The National Post on November 10 claimed that climate change, too, is on some kind of yes-we-can list. Surely, geoscientists should study climate change. What the study of global warming has done so far is set the stage for a constant belief bias.

Lorne’s story about skeptics and global warming came about because of the work of Professor Dr John R Christy. More than 300,000 daily temperature readings around the globe with NASA’s eight weather satellites over 30 years gave Christy and his coauthor a massive data set to work with. It was marked “Lower Troposphere Global Temperature: 1979-2008.” The authors had drawn a trend line through a see-saw plot. It was the shape of this trend line that piqued my interest. What I wanted to do was test for spatial dependence between measured values and determine where orderliness in our own sample space of 30 years dissipates into randomness. So I asked Lorne and he did send me the whole set that underpins the plot in his story!

The first step in the statistical analysis is to verify spatial dependence between observed temperatures in this sample space of time by applying Fisher’s F-test to the variance of the set and the first variance of the ordered set.

The observed value of F=6.27 exceeds the tabulated value of F0.001;df;df_o=1.32 at 99.9% probability by a wide margin. Hence, monthly temperatures display an extraordinarily high degree of spatial dependence. The probability that this inference is false is much less than 0.1%.
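The first step can be sketched as follows. This is a minimal sketch under stated assumptions: the first variance of the ordered set is taken to be the sum of squared differences between consecutive values divided by 2(n - 1), and the degrees of freedom are taken to be n - 1 for the set and 2(n - 1) for the ordered set. Neither convention is spelled out in the post, and the function name and the temperature values are hypothetical.

```python
def f_ratio_for_spatial_dependence(values):
    """Return (F, df, df_o) for the variance of the set over the first
    variance of the ordered set.

    Assumptions: df = n - 1 for the set and df_o = 2 * (n - 1) for the
    ordered set; the post itself does not state these conventions.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least three ordered values")
    mean = sum(values) / n
    var_set = sum((x - mean) ** 2 for x in values) / (n - 1)
    var_first = sum((values[i + 1] - values[i]) ** 2
                    for i in range(n - 1)) / (2 * (n - 1))
    if var_first == 0:
        raise ValueError("first variance term is zero")
    return var_set / var_first, n - 1, 2 * (n - 1)


# Hypothetical monthly temperature anomalies in chronological order.
anomalies = [-0.10, -0.05, 0.00, 0.02, 0.08, 0.05, 0.12, 0.18, 0.15, 0.22]
f_value, df, df_o = f_ratio_for_spatial_dependence(anomalies)
print("F =", round(f_value, 3), "with df =", df, "and df_o =", df_o)
```

The observed F is then compared with a tabulated value at the chosen probability level, just as 6.27 is compared with 1.32 above.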

The second step is to verify whether or not the weighted average difference of 0.063 centigrade is statistically identical to zero. Since the first set and the last one have different degrees of freedom than intermediate sets, Student’s t-test is applied with a month-weighted average variance. Such weighted variances are called pooled variances in applied statistics.
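The pooled variance mentioned here is simply a variance weighted by degrees of freedom. The sketch below shows that one piece of the second step only, not the whole t-test. The function name, the variances and the degrees of freedom are hypothetical.

```python
def pooled_variance(variances, degrees_of_freedom):
    """Average of variances weighted by their degrees of freedom."""
    if len(variances) != len(degrees_of_freedom):
        raise ValueError("need one degrees-of-freedom value per variance")
    total_df = sum(degrees_of_freedom)
    weighted = sum(v * d for v, d in zip(variances, degrees_of_freedom))
    return weighted / total_df


# Hypothetical sets: a short first set, a full year, and a short last set.
variances = [0.04, 0.02, 0.03]
dfs = [5, 11, 8]
print("pooled variance:", round(pooled_variance(variances, dfs), 6))
```

Sets with more degrees of freedom pull the pooled variance toward their own variance, which is why short first and last sets count for less than complete ones.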

The observed value of t=4.245 exceeds the tabulated value of t0.001;df_o=3.674. Hence, the probability is less than 0.1% that this weighted average difference of 0.063 centigrade is statistically identical to zero. Alternatively, this probability of 99.9% points to a statistically significant but small change of 0.063 centigrade during this 30-year period. Detection limits that take into account Type I risk only and the combined Type I and II risks are of critical importance in risk analysis and control. In this case, the Type I risk is ±0.031 centigrade, and the combined Type I risk and Type II risk is ±0.056 centigrade.

The third step is to verify whether or not the variances of ordered temperatures in centigrade constitute a homogeneous set.

Bartlett’s chi square test shows that the observed χ²-value of 22.979 falls between 42.557 at 5% probability and 17.708 at 95% probability. Hence, the set of variances for this 30-year period is homogeneous.
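Bartlett's test statistic is standard textbook material and can be sketched with nothing but the math module. The inputs below are hypothetical and are not the temperature variances behind the result above. The function returns the statistic and its degrees of freedom; the tabulated chi-square values still have to be looked up.

```python
import math


def bartlett_statistic(variances, degrees_of_freedom):
    """Bartlett's chi-square statistic for homogeneity of k variances.

    Returns the statistic and its degrees of freedom, k - 1.
    """
    k = len(variances)
    if k < 2 or k != len(degrees_of_freedom):
        raise ValueError("need at least two variances with matching dfs")
    total_df = sum(degrees_of_freedom)
    pooled = sum(v * d for v, d in zip(variances, degrees_of_freedom)) / total_df
    numerator = total_df * math.log(pooled) - sum(
        d * math.log(v) for v, d in zip(variances, degrees_of_freedom))
    correction = 1 + (sum(1 / d for d in degrees_of_freedom)
                      - 1 / total_df) / (3 * (k - 1))
    return numerator / correction, k - 1


# Hypothetical annual variances, each based on 12 monthly values (df = 11).
annual_variances = [0.021, 0.018, 0.030, 0.025, 0.016]
chi_square, df = bartlett_statistic(annual_variances, [11] * 5)
print("chi-square =", round(chi_square, 3), "with", df, "degrees of freedom")
```

The tabulated values of 42.557 and 17.708 quoted above appear to match a chi-square distribution with 29 degrees of freedom, which would fit 30 annual variances.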

Sir Ronald A Fisher was knighted in 1952 for his work with analysis of variance. Dr F P Agterberg fumbled the variance of his distance-weighted average point grade in 1970 and in 1974. NASA started to measure Lower Troposphere Temperatures in 1979. I showed how to test for spatial dependence between metal grades in ordered sets for the first time in 1985. So why would any geoscientist assume spatial dependence between measured values in ordered sets? Agterberg is the President of the International Association for Mathematical Geosciences. He should explain why his distance-weighted average point grade does not have a variance.

Monday, October 27, 2008

How to lie with geostatistics

Here’s how, in a nutshell. The most brazen lie of all was to deny that weighted averages do have variances. The stage for this lie was set at the French Geological Survey in Algeria on November 25, 1954. It came about when a novice in geology with a knack for probability theory put together his very first research paper. The author had called his paper Formule des Minerais Connexes. He had set out to prove associative dependence between lead and silver in lead ore. He worked with symbols on the first four pages. Handwritten on page 5 are arithmetic mean grades of 0.45% lead and 100 g/t silver, variances of 1.82 for lead and 1.46 for silver, and a correlation coefficient of 0.85. He had worked with symbols until page 5 and did omit his set of primary data. Neither did he refer to any of his peers. Those were peculiar practices that would remain this author's modus operandi for life.

This budding author was to be the renowned Professor Dr Georges Matheron, the founder of spatial statistics and the creator of geostatistics. What young Matheron had derived in his 1954 paper were arithmetic mean lead and silver grades of drill core samples. But he had not taken into account that his core samples varied in length. So he did derive length-weighted average lead and silver grades and appended a correction to his 1954 paper on January 13, 1955. What he had not done is derive the variances of his length-weighted average lead and silver grades. Neither did he test for, or even talk about, spatial dependence between metal grades of ordered core samples. Matheron’s first paper showed that testing for spatial dependence was beyond his grasp in 1954.

Why was Formule des Minerais Connexes marked Note statistique No 1? Matheron had not derived variances to compute confidence limits for arithmetic mean lead and silver grades but applied correlation-regression analysis. Statisticians do know that the central limit theorem underpins sampling theory and practice. So why didn’t young Matheron derive confidence limits? Surely, he was familiar with this theorem, wasn't he? Or was it because he thought he was some sort of genius at probability theory? That would explain why he worked mostly with symbols and rarely with real data. Had he worked with real data, he would still have cooked up odd statistics because the variances of his central values went missing. That’s why he was but a self-made wizard of odd statistics. It was Matheron who called the weighted average a kriged estimate as a tribute to the first mining engineer who took to working with weighted averages. Matheron never bothered to differentiate area-, count-, density-, distance-, length-, mass- and volume-weighted averages. But then, neither did any of his disciples.

Matheron’s followers, unlike real statisticians, didn’t take to counting degrees of freedom. Statisticians do know why and when degrees of freedom should be counted. Geostatisticians don’t know much about degrees of freedom but they do know how to blame others when good grades go bad. They always blame mine planners, grade control engineers, or assayers whenever predicted grades fail to pan out. They claim over-smoothing causes kriging variances of kriged estimates to rise and fall. Kriging variances rise and fall because they are pseudo variances that have but squared dimensions in common with true variances. Of course, Matheron’s odd new science is never to blame for bad grades or bad statistics.

It is a fact that Matheron fumbled the variance of his length-weighted average in 1954. Several years before the Bre-X fraud I derived the variance of a length- and density-weighted average metal grade. The following example is based on core samples from an ore deposit in Canada. The mine itself is no longer as Canadian as it once was. The Excel template with the set of primary data and its derived statistics are posted on a popular but wicked website.
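The gist of that calculation can be sketched in a few lines. The Excel template is not reproduced here, so this is a minimal sketch with hypothetical core data, not the actual example. It assumes each grade carries the same variance, estimated by the variance of the set, and that the grades are independent, so that the variance of the weighted average is the variance of the set times the sum of the squared weights. The function name is hypothetical too.

```python
def weighted_average_and_variance(grades, lengths, densities):
    """Length- and density-weighted average grade and its variance.

    Assumptions: each grade has the same variance, estimated by the variance
    of the set with n - 1 degrees of freedom, and grades are independent.
    """
    n = len(grades)
    if n < 2 or n != len(lengths) or n != len(densities):
        raise ValueError("need matching lists with at least two samples")
    masses = [length * density for length, density in zip(lengths, densities)]
    total = sum(masses)
    weights = [m / total for m in masses]
    average = sum(w * g for w, g in zip(weights, grades))
    mean = sum(grades) / n
    var_set = sum((g - mean) ** 2 for g in grades) / (n - 1)
    var_average = var_set * sum(w * w for w in weights)
    return average, var_average


# Hypothetical core samples: grade in %, length in m, density in t/m3.
grades = [2.1, 3.4, 1.8, 4.0, 2.9]
lengths = [1.5, 0.8, 2.0, 1.0, 1.2]
densities = [2.9, 3.1, 2.8, 3.3, 3.0]
average, var_average = weighted_average_and_variance(grades, lengths, densities)
print("weighted average grade:", round(average, 4))
print("variance of weighted average:", round(var_average, 4))
```

With equal lengths and densities every weight becomes 1/n, and the result collapses to the variance of the set divided by n, which is the Central Limit Theorem at work.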

My website was set up early in the Millennium. I loved to send emails with links to my reviews of Matheron’s new science of geostatistics. Students at the Centre de Géostatistique (CDG) in Fontainebleau, France, ranked high on my list of those who ought to pass Statistics 101. I was pleased when PDF files of Matheron’s work were posted with CDG’s online library. But I was surprised to find out that Matheron’s first paper was no longer listed as Note statistique No 1 in the column marked Reference but as Note géostatistique No 1. Just the same, the PDF file of this paper and its appended correction are still marked Note statistique No 1. On October 27, 2008, five out of six of Matheron's 1954 papers were still marked Note statistique Nos 2 to 6.

What was going on? Was the birth date of Matheron’s new science of geostatistics under review? Who reviewed it? And why? Why not retype the whole paper? Why not add the variances of length-weighted average lead and silver grades? And how about testing for spatial dependence between metal grades of ordered core samples? Where have all of Matheron’s sets of primary data gone? And what has happened to his old Underwood typewriter? I have so many questions but hear nothing but silence!

Matheron himself moved from odd statistics to geostatistics in 1959 when he went without a glitch from Note statistique No 19 to Note géostatistique No 20. Check it out before geostat revisionists strike again. I admit to having paraphrased Darrell Huff’s How to Lie with Statistics. But I couldn’t have made up that this delightful little work was published for the first time in 1954. That’s precisely when young Matheron was setting the stage for his new science of geostatistics in North Africa. Matheron, the creator of geostatistics, never read Huff’s work. But then Huff didn't read Matheron’s first paper either. Thank goodness Darrell Huff’s How to Lie with Statistics is still in print!
