First off, you need to understand me: I'm a self-trained engineer with forty years of commercial experience across numerous disciplines: electronics, software, color, audio, music.
I have been writing commercial software for so long that I have substantial intuition about how data will behave on a very, very large scale.
I am a badass, hardcore, a mercenary. It's what I do.
I design, build and support large, complex data-intensive commercial computer-based systems. Commercial in this context means that I get paid on the results. If the system doesn’t work then I lose - no one pays me, especially not to waste their time or money.
These systems often involve large complex data sets on which a variety of statistical calculations are performed. They tend to operate for many years, often without intervention.
This includes systems which utilize what is effectively “programmed intuition.” What I mean by this is that regardless of the size of a set of data used to develop and design the system there are always things which you cannot know. Sometimes I have to write software that uses “guesses” to complete or bound calculations when all the data needed to verify something is unavailable.
A few examples of this:
1. A system I designed, in production for eight years, to repair Japanese fonts in a printing system used to produce a substantial number of cell phone bills in Japan. I don’t know Japanese or how to write it.
2. A system, in production for nine years, to “color correct and match” documents printing at 1,000 feet per minute when traditional color management systems fail.
Hopefully you get the idea...
So today I found the above article through Ars Technica’s description here (https://arstechnica.com/science/2017/04/the-peer-reviewed-saga-of-mindless-eating-mindless-research-is-bad-too/).
So what’s my blog post about?
Somebody named Brian Wansink from Cornell University (link: https://en.wikipedia.org/wiki/Brian_Wansink), a consumer psychologist, apparently made some blog posts which alerted other "researchers" to problems in some of Wansink's published scientific results. Wansink runs some sort of "lab" at Cornell devoted to the "science" of convincing children to eat carrots by giving them (the carrots) clever names, hiding the goody bowl so you won't eat too many snacks (sigh), and so on.
Apparently Wansink is a “rock star” in this area of “science”.
The researchers questioning the results, Tim van der Zee, Jordan Anaya, and Nicholas J. L. Brown, published their critique here: https://peerj.com/preprints/2748v1/.
The thrust of their paper is that Wansink's published numbers don't add up: the values in his tables are internally inconsistent or exhibit strange mathematical problems.
For example, if my data set is four whole-number counts, say "44 18 46 14", the sum (122) is a whole number, so the mean must be a whole number divided by four: 30.5 here. If a paper reports a mean like 30.7 from four whole-number values, something is wrong, because that would imply a sum of 122.8, and whole numbers can't add up to a fraction.
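This kind of arithmetic consistency check is easy to automate. Here is a minimal sketch, in Python, of testing whether a reported mean could possibly arise from a given number of whole-number values; the function name and tolerance handling are my own, not anything from the papers discussed here.

```python
def mean_is_possible(reported_mean, n, decimals=2):
    """Check whether a mean reported to `decimals` places could arise
    from n whole-number values.

    The sum of n whole numbers is a whole number, so reported_mean * n
    must be within rounding distance of an integer.
    """
    implied_sum = reported_mean * n
    # The reported mean may itself be rounded to `decimals` places,
    # so allow the implied sum to be off by up to n * half a unit
    # in the last reported digit.
    tolerance = n * (0.5 * 10 ** -decimals)
    return abs(implied_sum - round(implied_sum)) <= tolerance + 1e-9

# Four whole-number values can average 30.50 (sum 122)...
print(mean_is_possible(30.50, 4))  # True
# ...but not 30.70, which would require a sum of 122.8.
print(mean_is_possible(30.70, 4))  # False
```

Run against every mean in a paper's tables, a check like this flags values that could not have come from the stated sample size.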
Fair enough so far.
If Wansink’s numbers and papers have these sorts of problems, then van der Zee, et al. are correct to be critical.
But that’s not my point here.
The really, really troubling aspect of this is the “really cool” Wansink data set (link: https://web.archive.org/web/20170312041524/http://www.brianwansink.com/phd-advice/the-grad-student-who-never-said-no) and the subsequent “analysis” data set by van der Zee (see this at github: https://github.com/OmnesRes/pizzapizza).
There is virtually no data here!
The sample sizes are like twenty (20) items.
(Please feel free to review my other posts here on this - follow the links.)
And even with this few data points, the Wansink papers are full of math errors (and, no, I didn’t check van der Zee’s math).
Yet some of the “results” of Wansink's "science" are used, according to the Ars article, in more than “30,000 schools across the US.”
That’s bad enough, but these other guys (van der Zee, et al.) expend enormous effort going through Wansink’s papers and picking them apart.
But again, there’s no significant data in the paper to pick apart (even less, in fact, because Wansink didn’t include his raw data, so they can only check the numbers printed in the paper). So what’s the point?
Apparently they don’t notice this is a problem.
So how is their “science” or “methods” of picking apart Wansink's any different from what they accuse Wansink of?
Incredibly tiny data sets used to make a lot of noise.
Gee, I wonder if this is how "climate science" is done?
But I digress...
This is absolutely astonishing!
Acclaimed scientists who cannot add.
Who cannot see that an insignificant data set is, in fact, insignificant and should be ignored.
Guys sitting around counting pizza slices and carrots.
More guys sitting around counting the lines of numbers in "scientific papers" about carrots and pizza slices...
Reminds me of how I got my PhD in "Shuttle Runs"...
Decades ago a neighbor who was in the university "PhD" program asked me to come over one night.
As it turned out he needed some "math help" on his PhD thesis.
Gee, I worried, I wonder if I know how to help him.
After much hemming and hawing he pulled out his "research."
A small table of times for the "shuttle run" in some gym class.
Perhaps three or four tables of about 10 or 20 numbers each.
Please, he asked, if you have the time, could you help me calculate the "mean"?
I breathed a sigh of relief...
Yup, no problem.
So I guess that, along with all the little carrots, I too have a PhD (some day I'll tell you the story of my second PhD...).