Reading Assignments [C1 and C3] – How to Lie with Statistics
In this chapter, Huff demonstrates that a surprisingly precise figure has a high probability of being false. The creators of such figures often did not sample appropriately, so the results rest on bad samples. A large, well-selected sample will represent the whole population, and conclusions drawn from it can be trusted. A small or poorly selected sample is biased, yet its results can still appear scientific. Bad samples also arise when respondents give false answers in order to please the interviewer.
The creators of samples must get rid of bias. They can do this with a random sample, in which selection is left to chance. However, a truly random sample from a big population is expensive and difficult to obtain, so the stratified random sample is the usual substitute. In this type of sampling, the population is divided into several groups and a sample is drawn from each group in proportion to its size. In addition, sample bias may be caused by respondents understating or overstating the truth, by respondents failing to answer a question out of pride or because they cannot be reached, by poorly worded questionnaires, and by respondents' faulty recall of events that took place before the study.
An example is a biased poll on who will be the next president. A surveyor uses a small sample to gauge who will win and announces the results. Despite discrepancies that are known to him, he reports the results as they are.
Average is a loose and tricky word that advertisers use to influence the sale of their products. Readers are deceived when they do not know which kind of average is meant. The word may refer to the mean, the median, or the mode, so an average is meaningless unless the reader knows which of the three it is.
The mean divides the total equally among the members of the sample, the median is the middle value that splits the ordered sample into two equal halves, and the mode is the value that occurs most frequently.
In a neighborhood, one person may claim that the average income is $45,000, while another claims it is $51,000. Both may be honest; the difference lies in the type of average used.
Income of neighborhood ($)
Therefore, the mean is more sensitive to outliers than the median and the mode.
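The difference between the three averages can be sketched in a few lines of Python. The income figures below are hypothetical and invented only for illustration; they are not Huff's data or the neighborhood figures quoted above.

```python
from statistics import mean, median, mode

def summarize(values):
    """Return the three common 'averages' of a list of numbers."""
    return {"mean": mean(values), "median": median(values), "mode": mode(values)}

# Hypothetical annual incomes ($) for a small neighborhood (illustrative only).
incomes = [30000, 30000, 35000, 40000, 45000, 50000, 250000]

with_outlier = summarize(incomes)
without_outlier = summarize(incomes[:-1])  # drop the single very high income

print("with outlier:   ", with_outlier)
print("without outlier:", without_outlier)
```

In this sketch, removing the one very high income moves the mean far more than it moves the median, which is why a reader needs to know which average is being quoted.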
Chapter 3: Little Figures Not There
In Chapter 3, Huff warns about figures or data that are omitted from a report. Statistics are often based on inadequate samples, which leads to large errors in the conclusions. With a larger sample, any difference produced by chance is likely to be too small to matter, so readers can place more trust in larger samples. With large samples the law of averages holds, because probability governs outcomes only in the long run. It can also be noted that:
Small samples give less reliable estimates of how frequently events occur than large samples do
Omitting the little figures also hides significant margins of error in inference
These figures include measures of spread, such as the range and the standard deviation from the mean
They also govern how far one can infer from the sample to the population
Huff also notes that the law of averages is used in descriptions and predictions. Its usefulness depends on whether enough cases have been sampled to predict a given phenomenon accurately. The required sample size depends on the size of the population and on how much it varies. Sample size alone can be deceptive, so to avoid being deceived the reader should ask for the degree of significance. An average should not be trusted when important figures are omitted. If the sample size, the range, or the deviation from the average is not shown, the statistic is deceptive.
A fair coin has a 50% chance of landing heads. After a run of heads, people often believe that, by the law of averages, tails is due. Yet even if the coin has landed heads 10 consecutive times, the probability of tails on the next toss is still 50%.
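A short simulation can illustrate why the law of averages only holds for large samples. This is a minimal sketch that assumes a fair coin; the function name `heads_fraction` and the seeds are arbitrary choices made for this illustration.

```python
import random

def heads_fraction(n_tosses, seed):
    """Toss a simulated fair coin n_tosses times and return the share of heads."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

# Five small samples of 10 tosses each, and one large sample.
small = [heads_fraction(10, seed) for seed in range(5)]
large = heads_fraction(100_000, seed=0)

print("five samples of 10 tosses:", small)
print("one sample of 100,000 tosses:", large)
```

The small samples can stray widely from one half, while the large sample stays close to it. Each toss is still independent, so a run of heads does not make tails any more likely on the next toss.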
Chapter 4: Much Ado About Practically Nothing
The phrase above means that a difference is a difference only if it makes a difference, and Huff explores this idea in the chapter. He introduces errors and ranges to illustrate that numbers are sometimes made to carry more meaning than they really have. Statistics are collected by sampling, and any statistic obtained from a sample carries a statistical error. A sample is taken to represent the whole population, and how well it does so is expressed through the probable error and the standard error. The probable error is the range within which the sample figure falls half the time (a probability of 0.5) as a result of chance.
An example: you pace off a distance of 50 steps along a road several times and then measure it. You find that you came within 2 meters of the exact distance in half of your trials and missed by more than 2 meters in the other half. So over, say, 9 trials, the probable error of your pacing is ±2 meters, and a paced distance should be reported as the measurement ± 2 meters.
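The standard error serves the same purpose, but it covers a wider range and its calculation requires knowledge of the sample size. Without it, people make much ado about a difference that can be demonstrated mathematically but is so tiny that it is insignificant. One standard error on either side of the figure covers about two cases out of three, and a probability of 0.95 corresponds to about two standard errors.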
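The two error measures can be sketched as follows. This is an illustration under stated assumptions: the scores are hypothetical, the sampling error is assumed to be roughly normal, and under that assumption the probable error is about 0.6745 standard errors.

```python
from math import sqrt
from statistics import mean, stdev

def standard_error(sample):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return stdev(sample) / sqrt(len(sample))

def probable_error(sample):
    """Probable error, assuming a normal distribution (about 0.6745 standard errors)."""
    return 0.6745 * standard_error(sample)

# Hypothetical test scores from 9 trials (illustrative only).
scores = [96, 101, 99, 104, 98, 102, 100, 97, 103]

m = mean(scores)
se = standard_error(scores)
pe = probable_error(scores)

print(f"mean = {m:.1f}, standard error = {se:.2f}, probable error = {pe:.2f}")
print(f"about half of the time within:       {m - pe:.1f} to {m + pe:.1f}")
print(f"about two-thirds of the time within: {m - se:.1f} to {m + se:.1f}")
```

A difference between two figures that is smaller than these ranges is the kind of difference Huff describes as much ado about practically nothing.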
A man is given a box containing 37 fruits and told to give out 20% of the fruits and keep the rest. His share is 80%, which is 29.6 fruits, so in practice he ends up taking 30 fruits.
Chapter 5: Gee-Whiz or Eye-Catching Graphs
This chapter deals with gee-whiz graphs. Eye-catching line graphs are among the easiest graphs to use in statistics, and they are good at demonstrating trends and explaining an issue of interest. Unfortunately, they are equally efficient at misleading readers, whether intentionally or unintentionally. For instance, to make a graph more striking, part of it can be cut away so that it makes a bigger impression while still presenting honest data. Organizations can boost their reputation by changing the proportions of a graph, and no one can be blamed because the figures themselves are true. The problems with gee-whiz graphs lie in the following:
The range shown on the axes has a great impact on the interpretation of the graph, especially on how large a percentage change appears
Changing the proportion between the y-axis and the x-axis also distorts the picture
Bar charts can be started at a positive value instead of a zero baseline, which trims the bars and distorts the comparison
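The effect of a trimmed baseline can be worked out with simple arithmetic. The sketch below uses a hypothetical helper, `apparent_ratio`, and made-up figures; it compares how tall two bars look relative to each other when the axis starts at zero and when it starts just below the smaller value.

```python
def apparent_ratio(a, b, baseline=0.0):
    """Ratio of the drawn bar heights for values a and b when the axis starts at `baseline`."""
    if baseline >= min(a, b):
        raise ValueError("baseline must be below both values")
    return (b - baseline) / (a - baseline)

# Hypothetical figures: a rise from 100 to 110, which is a 10% increase.
honest = apparent_ratio(100, 110)                # axis starts at zero
trimmed = apparent_ratio(100, 110, baseline=95)  # axis trimmed to start at 95

print("bar height ratio with zero baseline:", honest)
print("bar height ratio with baseline at 95:", trimmed)
```

With the zero baseline the second bar is drawn one tenth taller than the first; with the axis starting at 95 it is drawn three times as tall, although the data are unchanged.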
Take a graph titled "Number of frogs in a pond", which is meant to answer how many frogs are in that pond. In one version, one small frog shows the quantity in May while one big frog shows the quantity in September. In a second version of the same graph, the only difference is that several small frogs are used instead of the one large frog. From the first version a person may think that frogs are simply larger in September.
Chapter 6: One-Dimensional Pictures
In this chapter, Huff illustrates how readers can be deceived through pictorial graphs and photographs. Readers usually like pictures because they are appealing to the eye, but they do not always interpret them correctly. When reading pictorial graphs, one should watch whether the width of a figure changes along with its height when it represents a single quantity.
Bar charts and pictorial graphs make a comparison in one dimension, so their proportions should match the values they represent
Picture graphs may be used, but they can be deceptive
Just how many adult frogs are in the south pond? The reader might conclude that frogs are simply bigger in September than in May, even though the title says that the graph displays the number of frogs. The reader notices the area of the image, not just its height.
In a picture graph, a small change in one dimension leads to a big change in another; for example, a small increase in the height of a container drawn to scale leads to a much bigger change in its apparent volume.
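The arithmetic behind this distortion is short enough to sketch. The function name `picture_scale` is an invention for this illustration; it assumes the picture is enlarged uniformly, so width (and depth, for a solid object) grow in step with height.

```python
def picture_scale(height_ratio):
    """Return how height, area, and volume scale when a picture is enlarged uniformly."""
    return {
        "height": height_ratio,        # what the data actually say
        "area": height_ratio ** 2,     # what the eye sees on the page
        "volume": height_ratio ** 3,   # what the eye imagines for a solid object
    }

# A quantity that doubles, drawn as a picture twice as tall.
print(picture_scale(2))
```

A figure that is only twice as large in the data occupies four times the area on the page and suggests eight times the volume.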
Chapter 7: The Semi-Attached Figure
In this chapter, Huff explains what a semi-attached figure can do. A semi-attached figure arises when someone who cannot prove what he wants to prove demonstrates something else and pretends it is the same thing. The figures that sound best are chosen in the belief that no one will notice the mismatch. In such cases, the data used are irrelevant to the claim: the numbers may come from good measurements and appear related, but in actual fact they are not related to the conclusion. A semi-attached figure can be recognized when some information is missing or some variables have been discarded.
Semi-attached figures also result from reporters supplying inaccurate numbers and from inconsistent reporting at the source. For example, when a promoter asks a controversial question, the answers may be misleading because each respondent wants to give what is believed to be the correct response.
A semi-attached figure also occurs when a statistic about a certain population is assumed to hold for a group that the original sample did not represent. Semi-attached figures spread further when information passes through non-technical media such as social media.
Keeping food in a fridge prevents contamination by bacteria. The claim does not follow, because bacteria can still multiply in cold food.
All tomatoes are ripe. This statement is fallacious because the tomatoes might all have been ripe only in a particular season, such as summer or winter.
More people died in road accidents this year than in 1963. The hidden assumption is that there were as many cars on the road in 1963 as there are this year.
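The road-death example can be checked by converting counts into rates. The numbers below are hypothetical and chosen only to show the effect; the helper name `deaths_per_10k_vehicles` is likewise an assumption of this sketch.

```python
def deaths_per_10k_vehicles(deaths, vehicles):
    """Road deaths per 10,000 vehicles on the road."""
    return deaths * 10_000 / vehicles

# Hypothetical figures (illustrative only): deaths rise, but vehicles rise faster.
earlier = deaths_per_10k_vehicles(deaths=1_000, vehicles=500_000)
later = deaths_per_10k_vehicles(deaths=1_500, vehicles=1_500_000)

print("earlier year:", earlier, "deaths per 10,000 vehicles")
print("later year:  ", later, "deaths per 10,000 vehicles")
```

In this made-up case the raw count of deaths goes up by half while the rate per vehicle is cut in half, so the raw count alone is only semi-attached to any claim about road safety.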
Chapter 8: Post Hoc Rides Again
In Chapter 8, Huff explains the problems of the post hoc fallacy. The fallacy results from the belief that if B follows A, then B was caused by A. In other words, since one event occurred before another, the first event is assumed to have caused the second. Nonetheless, the fact that A happened before B does not mean that A caused B. It may be that event B was caused by a third factor. For instance:
Event A: Africa has a high percentage of deserts
Event B: Africa has a higher poverty rate than other continents with a low percentage of deserts
The post hoc fallacy would be: because Africa has a high percentage of deserts and a higher poverty rate than continents with a low percentage of deserts, a high percentage of deserts causes poverty. When there are many possible explanations, no one should pick one just because it suits the situation. The relationship may arise in several ways, such as:
Sometimes the causes and effects change places
Each of the two variables is both a cause and an effect of the other
The two variables do not affect each other, yet they are genuinely correlated
Co-variation, where the relationship is real but it is difficult to determine which variable is the cause and which is the effect
In a room there are two clocks, X and Y, both of which keep perfect time. Whenever clock X points to the hour, clock Y strikes. The question arises whether X caused Y to strike.
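The clock example, where a third factor (time itself) drives both events, can be sketched numerically. The data below are made up: a hidden factor `z` drives both `a` and `b`, and neither `a` nor `b` influences the other. The Pearson correlation is computed by hand so the sketch needs no extra libraries.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

# Hypothetical third factor z drives both a and b (illustrative only).
z = [1, 2, 3, 4, 5, 6, 7, 8]
a = [2 * v + noise for v, noise in zip(z, [1, -1, 0, 1, -1, 0, 1, -1])]
b = [3 * v + noise for v, noise in zip(z, [-1, 0, 1, -1, 0, 1, -1, 0])]

r = pearson(a, b)
print("correlation between a and b:", round(r, 3))
```

The two series are strongly correlated even though neither causes the other, which is the situation the post hoc fallacy overlooks.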
Chapter 9: How to Statisticulate
Huff introduces statisticulation as the manipulation of information. The manipulation works through deceptive decimals, percentages, and averages, including figures that do not actually add up. According to Huff, this misinformation usually comes from people who are not competent statisticians, such as journalists, salesmen, and copywriters, who aim to influence readers. They do this by exaggerating favorable data and minimizing negative things. They are keen to create a good impression and do not mind whether the impression is accurate. Maps can hide facts and distort relationships, while decimals lend an air of exactness that is often deceiving. Percentages are also used to confuse readers, since percentages based on small samples or few cases mislead.
Statisticulation is the fudging of data and includes selective reporting and making up false data. Those who produce such data present results that follow a pattern consistent with the preferred hypothesis while ignoring outcomes that contradict it. Data manipulation is therefore a serious issue that undermines the validity of honest statistical analysis. It is imperative to examine the data and resolve problems before analyzing them.
Two schools, X and Y, with different numbers of students, are each issued 100 computers. School X has 400 students and school Y has 450, so in school X 4 students share one computer at a time, while in school Y 4.5 students share one computer. The statistic invites the wrong question: what is 0.5 of a student?
Another example: an ad for Jumia House Tea emphasized that 55% of people in a previous survey preferred its taste. The issue is the number of people tested in the sample.
Chapter 10: How to Talk Back to a Statistic
In this chapter, the author shifts his focus to how statistical deceptions can be detected. Examining the devices that expose them, Huff points out 5 questions to ask. These are:
Who says so?
How does he know?
What's missing?
Did someone change the subject?
Does it make sense?
It is worth noting that people lie, not the statistics, and a statistic may not relate to what it is claimed to show but may mean something else. Therefore, to protect ourselves from statistical lies, we should first look for a biased sample, and then determine the intention of the creator: to prove a theory, protect a reputation, or earn a fee. The areas to check are shifting units of measurement in graphs, suppressed data, the type of average intended, the source and originality of the claim, names that change or carry different definitions, and the step from the raw figures to the conclusion.
8% interest versus $8 per $100.
It is claimed to be safer to travel at 6 AM than at 6 PM, because more than five times as many accidents occur at 6 PM as at 6 AM. The important factor is not the time of day but the greater number of drivers and cars on the road in the evening. More miles are covered at that time, so there is more opportunity for an accident to occur.
Contemporary Example of a Wrong Statistic
According to a note on the relationship between driving and obesity, The Economist (US) published a story about two authors who presented data on Vehicle Miles Travelled (VMT), licensed drivers (LD), GDP, and adult population for the period 1985-2007. It also included obesity rates for 1995-2007, since obesity rates before 1995 were unavailable. To justify lagging the change in VMT/LD by six years, they said that weight changes become permanent after 2,000 days. They presented a regression analysis with a near-perfect correlation, a coefficient of 98.44%, and claimed that a 1% decrease in VMT leads to a 0.8% decline in obesity rates.
Contemporary Example of a Statistic Done Right
The use of ZMapp as a treatment for Ebola in West Africa in 2014 can serve as an example of a doubted statistic done right. It was first used as an experimental medicine to treat Ebola patients before it had been tested clinically. Some patients recovered after treatment with ZMapp, while only a very small number died. It caused a global uproar over Africans being used as guinea pigs for America's medicine companies. Few acknowledged that, with the epidemic spreading widely, there was an urgent need to find a remedy. The medicine worked well, but people still perceive the claim as a lie, since it has been widely publicized that it is unclear whether the drug effectively treats Ebola or is safe for human health (Huff, 1994).
Huff, D. (1994). How to Lie with Statistics. W. W. Norton & Company.