Fw: Is there something wrong with the scientific method?--sorry--the formatting was off 1st time

January 19, 2011

Sorry---The formatting changed on me the first time I sent it--this should be easier:

Annals of Science

The Truth Wears Off -- Is there something wrong with the scientific method?
by Jonah Lehrer
December 13, 2010

http://www.newyorker.com/reporting/2010/12/13/101213fa_fact_lehrer?currentPage=all

Many results that are rigorously proved and accepted start shrinking in later studies.

On September 18, 2007, a few dozen neuroscientists, psychiatrists, and drug-company executives gathered in a hotel conference room in Brussels to hear some startling news. It had to do with a class of drugs known as atypical or second-generation antipsychotics, which came on the market in the early nineties. The drugs, sold under brand names such as Abilify, Seroquel, and Zyprexa, had been tested on schizophrenics in several large clinical trials, all of which had demonstrated a dramatic decrease in the subjects' psychiatric symptoms. As a result, second-generation antipsychotics had become one of the fastest-growing and most profitable pharmaceutical classes. By 2001, Eli Lilly's Zyprexa was generating more revenue than Prozac. It remains the company's top-selling drug.

But the data presented at the Brussels meeting made it clear that something strange was happening: the therapeutic power of the drugs appeared to be steadily waning. A recent study showed an effect that was less than half of that documented in the first trials, in the early nineteen-nineties. Many researchers began to argue that the expensive pharmaceuticals weren't any better than first-generation antipsychotics, which have been in use since the fifties. "In fact, sometimes they now look even worse," , a professor of psychiatry at the University of Illinois at Chicago, told me.

Before the effectiveness of a drug can be confirmed, it must be tested and tested again. Different scientists in different labs need to repeat the protocols and publish their results. The test of replicability, as it's known, is the foundation of modern research. Replicability is how the community enforces itself. It's a safeguard for the creep of subjectivity. Most of the time, scientists know what results they want, and that can influence the results they get.

The premise of replicability is that the scientific community can correct for these flaws. But now all sorts of well-established, multiply confirmed findings have started to look increasingly uncertain. It's as if our facts were losing their truth: claims that have been enshrined in textbooks are suddenly unprovable. This phenomenon doesn't yet have an official name, but it's occurring across a wide range of fields, from psychology to ecology. In the field of medicine, the phenomenon seems extremely widespread, affecting not only antipsychotics but also therapies ranging from cardiac stents to Vitamin E and antidepressants: has a forthcoming analysis demonstrating that the efficacy of antidepressants has gone down as much as threefold in recent decades.

For many scientists, the effect is especially troubling because of what it exposes about the scientific process. If replication is what separates the rigor of science from the squishiness of pseudoscience, where do we put all these rigorously validated findings that can no longer be proved? Which results should we believe? Francis Bacon, the early-modern philosopher and pioneer of the scientific method, once declared that experiments were essential, because they allowed us to "put nature to the question." But it appears that nature often gives us different answers.

Schooler was a young graduate student at the University of Washington in the nineteen-eighties when he discovered a surprising new fact about language and memory. At the time, it was widely believed that the act of describing our memories improved them. But, in a series of clever experiments, Schooler demonstrated that subjects shown a face and asked to describe it were much less likely to recognize the face when shown it later than those who had simply looked at it. Schooler called the phenomenon "verbal overshadowing." The study turned him into an academic star. Since its initial publication, in 1990, it has been cited more than four hundred times. Before long, Schooler had extended the model to a variety of other tasks, such as remembering the taste of a wine, identifying the best strawberry jam, and solving difficult creative puzzles. In each instance, asking people to put their perceptions into words led to dramatic decreases in performance.

But while Schooler was publishing these results in highly reputable journals, a secret worry gnawed at him: it was proving difficult to replicate his earlier findings. "I'd often still see an effect, but the effect just wouldn't be as strong," he told me. "It was as if verbal overshadowing, my big new idea, was getting weaker." At first, he assumed that he'd made an error in experimental design or a statistical miscalculation. But he couldn't find anything wrong with his research. He then concluded that his initial batch of research subjects must have been unusually susceptible to verbal overshadowing. ( , similarly, has speculated that part of the drop-off in the effectiveness of antipsychotics can be attributed to using subjects who suffer from milder forms of psychosis which are less likely to show dramatic improvement.) "It wasn't a very satisfying explanation," Schooler says. "One of my mentors told me that my real mistake was trying to replicate my work. He told me doing that was just setting myself up for disappointment."

Schooler tried to put the problem out of his mind; his colleagues assured him that such things happened all the time.

Over the next few years, he found new research questions, got married and had kids. But his replication problem kept on getting worse. His first attempt at replicating the 1990 study, in 1995, resulted in an effect that was thirty per cent smaller. The next year, the size of the effect shrank another thirty per cent. When other labs repeated Schooler's experiments, they got a similar spread of data, with a distinct downward trend. "This was profoundly frustrating," he says. "It was as if nature gave me this great result and then tried to take it back." In private, Schooler began referring to the problem as "cosmic habituation," by analogy to the decrease in response that occurs when individuals habituate to particular stimuli. "Habituation is why you don't notice the stuff that's always there," Schooler says. "It's an inevitable process of adjustment, a ratcheting down of excitement. I started joking that it was like the cosmos was habituating to my ideas. I took it very personally."

Schooler is now a tenured professor at the University of California at Santa Barbara. He has curly black hair, pale-green eyes, and the relaxed demeanor of someone who lives five minutes away from his favorite beach. When he speaks, he tends to get distracted by his own digressions. He might begin with a point about memory, which reminds him of a favorite quote, which inspires a long soliloquy on the importance of introspection.

Before long, we're looking at pictures from Burning Man on his iPhone, which leads us back to the fragile nature of memory.

Although verbal overshadowing remains a widely accepted theory--it's often invoked in the context of eyewitness testimony, for instance--Schooler is still a little peeved at the cosmos. "I know I should just move on already," he says. "I really should stop talking about this. But I can't." That's because he is convinced that he has stumbled on a serious problem, one that afflicts many of the most exciting new ideas in psychology.

One of the first demonstrations of this mysterious phenomenon came in the early nineteen-thirties. Joseph Banks Rhine, a psychologist at Duke, had developed an interest in the possibility of extrasensory perception, or E.S.P. Rhine devised an experiment featuring Zener cards, a special deck of twenty-five cards printed with one of five different symbols: a card was drawn from the deck and the subject was asked to guess the symbol. Most of Rhine's subjects guessed about twenty per cent of the cards correctly, as you'd expect, but an undergraduate named Adam Linzmayer averaged nearly fifty per cent during his initial sessions, and pulled off several uncanny streaks, such as guessing nine cards in a row. The odds of this happening by chance are about one in two million.
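As a rough check on the one-in-two-million figure, here is a minimal sketch of the arithmetic in Python. It assumes each guess is an independent one-in-five shot, which ignores the fixed makeup of a real Zener deck (five copies of each symbol), so it is an approximation and not necessarily the calculation Rhine used. The names `P_HIT` and `streak_odds` are made up for illustration.

```python
from fractions import Fraction

# Assumption: every guess is independent and has a 1-in-5 chance of being right.
P_HIT = Fraction(1, 5)


def streak_odds(length: int) -> int:
    """Return N such that a streak of `length` correct guesses has odds of 1 in N."""
    p_streak = P_HIT ** length  # exact probability of `length` hits in a row
    return p_streak.denominator


if __name__ == "__main__":
    print(f"Nine in a row: about 1 in {streak_odds(9):,}")
```

Under that assumption the odds come to 1 in 1,953,125 (five to the ninth power), which is consistent with "about one in two million."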

Linzmayer did it three times.

Rhine documented these stunning results in his notebook and prepared several papers for publication. But then, just as he began to believe in the possibility of extrasensory perception, the student lost his spooky talent. Between 1931 and 1933, Linzmayer guessed at the identity of another several thousand cards, but his success rate was now barely above chance. Rhine was forced to conclude that the student's "extra-sensory perception ability has gone through a marked decline." And Linzmayer wasn't the only subject to experience such a drop-off: in nearly every case in which Rhine and others documented E.S.P. the effect dramatically diminished over time. Rhine called this trend the "decline effect."

Schooler was fascinated by Rhine's experimental struggles. Here was a scientist who had repeatedly documented the decline of his data; he seemed to have a talent for finding results that fell apart. In 2004, Schooler embarked on an ironic imitation of Rhine's research: he tried to replicate this failure to replicate. In homage to Rhine's interests, he decided to test for a parapsychological phenomenon known as precognition. The experiment itself was straightforward: he flashed a set of images to a subject and asked him or her to identify each one. Most of the time, the response was negative--the images were displayed too quickly to register. Then Schooler randomly selected half of the images to be shown again. What he wanted to know was whether the images that got a second showing were more likely to have been identified the first time around.

Could subsequent exposure have somehow influenced the initial results? Could the effect become the cause?

The craziness of the hypothesis was the point: Schooler knows that precognition lacks a scientific explanation. But he wasn't testing extrasensory powers; he was testing the decline effect. "At first, the data looked amazing, just as we'd expected," Schooler says. "I couldn't believe the amount of precognition we were finding. But then, as we kept on running subjects, the effect size"--a standard statistical measure--"kept on getting smaller and smaller." The scientists eventually tested more than two thousand undergraduates. "In the end, our results looked just like Rhine's," Schooler said. "We found this strong paranormal effect, but it disappeared on us."

The most likely explanation for the decline is an obvious one: regression to the mean. As the experiment is repeated, that is, an early statistical fluke gets cancelled out. The extrasensory powers of Schooler's subjects didn't decline--they were simply an illusion that vanished over time. And yet Schooler has noticed that many of the data sets that end up declining seem statistically solid--that is, they contain enough data that any regression to the mean shouldn't be dramatic. "These are the results that pass all the tests," he says.

"The odds of them being random are typically quite remote, like one in a million. This means that the decline effect should almost never happen. But it happens all the time! Hell, it's happened to me multiple times." And this is why Schooler believes that the decline effect deserves more attention: its ubiquity seems to violate the laws of statistics. "Whenever I start talking about this, scientists get very nervous," he says. "But I still want to know what happened to my results. Like most scientists, I assumed that it would get easier to document my effect over time. I'd get better at doing the experiments, at zeroing in on the conditions that produce verbal overshadowing. So why did the opposite happen? I'm convinced that we can use the tools of science to figure this out. First, though, we have to admit that we've got a problem."

In 1991, the Danish zoologist Anders Møller, at Uppsala University, in Sweden, made a remarkable discovery about sex, barn swallows, and symmetry. It had long been known that the asymmetrical appearance of a creature was directly linked to the amount of mutation in its genome, so that more mutations led to more "fluctuating asymmetry." (An easy way to measure asymmetry in humans is to compare the length of the fingers on each hand.) What Møller discovered is that female barn swallows were far more likely to mate with male birds that had long, symmetrical feathers. This suggested that the picky females were using symmetry as a proxy for the quality of male genes. Møller's paper, which was published in Nature, set off a frenzy of research. Here was an easily measured, widely applicable indicator of genetic quality, and females could be shown to gravitate toward it. Aesthetics was really about genetics.

In the three years following, there were ten independent tests of the role of fluctuating asymmetry in sexual selection, and nine of them found a relationship between symmetry and male reproductive success. It didn't matter if scientists were looking at the hairs on fruit flies or replicating the swallow studies--females seemed to prefer males with mirrored halves. Before long, the theory was applied to humans. Researchers found, for instance, that women preferred the smell of symmetrical men, but only during the fertile phase of the menstrual cycle. Other studies claimed that females had more orgasms when their partners were symmetrical, while a paper by anthropologists at Rutgers analyzed forty Jamaican dance routines and discovered that symmetrical men were consistently rated as better dancers.

Then the theory started to fall apart. In 1994, there were fourteen published tests of symmetry and sexual selection, and only eight found a correlation. In 1995, there were eight papers on the subject, and only four got a positive result.

By 1998, when there were twelve additional investigations of fluctuating asymmetry, only a third of them confirmed the theory. Worse still, even the studies that yielded some positive result showed a steadily declining effect size. Between 1992 and 1997, the average effect size shrank by eighty per cent.

And it's not just fluctuating asymmetry. In 2001, Jennions, a biologist at the Australian National University, set out to analyze "temporal trends" across a wide range of subjects in ecology and evolutionary biology. He looked at hundreds of papers and forty-four meta-analyses (that is, statistical syntheses of related studies), and discovered a consistent decline effect over time, as many of the theories seemed to fade into irrelevance. In fact, even when numerous variables were controlled for--Jennions knew, for instance, that the same author might publish several critical papers, which could distort his analysis--there was still a significant decrease in the validity of the hypothesis, often within a year of publication. Jennions admits that his findings are troubling, but expresses a reluctance to talk about them publicly. "This is a very sensitive issue for scientists," he says. "You know, we're supposed to be dealing with hard facts, the stuff that's supposed to stand the test of time. But when you see these trends you become a little more skeptical of things."

What happened? Leigh , a biologist at the University of Western Australia, suggested one explanation when he told me about his initial enthusiasm for the theory: "I was really excited by fluctuating asymmetry. The early studies made the effect look very robust." He decided to conduct a few experiments of his own, investigating symmetry in male horned beetles.

"Unfortunately, I couldn't find the effect," he said. "But the worst part was that when I submitted these null results I had difficulty getting them published. The journals only wanted confirming data. It was too exciting an idea to disprove, at least back then." For , the steep rise and slow fall of fluctuating asymmetry is a clear example of a scientific paradigm, one of those intellectual fads that both guide and constrain research: after a new paradigm is proposed, the peer-review process is tilted toward positive results. But then, after a few years, the academic incentives shift--the paradigm has become entrenched--so that the most notable results are now those that disprove the theory.

Jennions, similarly, argues that the decline effect is largely a product of publication bias, or the tendency of scientists and scientific journals to prefer positive data over null results, which is what happens when no effect is found. The bias was first identified by the statistician Theodore Sterling, in 1959, after he noticed that ninety-seven per cent of all published psychological studies with statistically significant data found the effect they were looking for. A "significant" result is defined as any data point that would be produced by chance less than five per cent of the time. This ubiquitous test was invented in 1922 by the English mathematician Fisher, who picked five per cent as the boundary line, somewhat arbitrarily, because it made pencil and slide-rule calculations easier. Sterling saw that if ninety-seven per cent of psychology studies were proving their hypotheses, either psychologists were extraordinarily lucky or they published only the outcomes of successful experiments. In recent years, publication bias has mostly been seen as a problem for clinical trials, since pharmaceutical companies are less interested in publishing results that aren't favorable. But it's becoming increasingly clear that publication bias also produces major distortions in fields without large corporate incentives, such as psychology and ecology.

While publication bias almost certainly plays a role in the decline effect, it remains an incomplete explanation. For one thing, it fails to account for the initial prevalence of positive results among studies that never even get submitted to journals. It also fails to explain the experience of people like Schooler, who have been unable to replicate their initial data despite their best efforts.

Palmer, a biologist at the University of Alberta, who has studied the problems surrounding fluctuating asymmetry, suspects that an equally significant issue is the selective reporting of results--the data that scientists choose to document in the first place. Palmer's most convincing evidence relies on a statistical tool known as a funnel graph. When a large number of studies have been done on a single subject, the data should follow a pattern: studies with a large sample size should all cluster around a common value--the true result--whereas those with a smaller sample size should exhibit a random scattering, since they're subject to greater sampling error. This pattern gives the graph its name, since the distribution resembles a funnel.

The funnel graph visually captures the distortions of selective reporting. For instance, after Palmer plotted every study of fluctuating asymmetry, he noticed that the distribution of results with smaller sample sizes wasn't random at all but instead skewed heavily toward positive results. Palmer has since documented a similar problem in several other contested subject areas.
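To make the funnel pattern concrete, here is a small simulation sketch in Python. It is not Palmer's analysis: the true effect of zero, the unit noise, the sample sizes, the seeds, and the "write up only positive results" rule are all assumptions chosen purely for illustration, and every name in the code is hypothetical.

```python
import random
import statistics

TRUE_EFFECT = 0.0  # assumption: there is no real effect at all


def run_study(rng, n):
    """Simulate one study: the mean of n noisy measurements around TRUE_EFFECT."""
    return statistics.fmean([rng.gauss(TRUE_EFFECT, 1.0) for _ in range(n)])


def simulate(n, studies, seed):
    """Run `studies` independent studies, each with sample size n."""
    rng = random.Random(seed)
    return [run_study(rng, n) for _ in range(studies)]


small = simulate(n=10, studies=200, seed=1)    # wide mouth of the funnel
large = simulate(n=1000, studies=200, seed=2)  # narrow neck of the funnel

# Selective reporting (hypothetical rule): a small study gets written up
# only when its estimate happens to come out positive.
reported_small = [estimate for estimate in small if estimate > 0]

if __name__ == "__main__":
    print("spread of small studies:", round(statistics.stdev(small), 3))
    print("spread of large studies:", round(statistics.stdev(large), 3))
    print("mean of all small studies:", round(statistics.fmean(small), 3))
    print("mean of reported small studies:",
          round(statistics.fmean(reported_small), 3))
```

In this toy setup the small studies scatter much more widely than the large ones, which is the funnel shape. Once the small studies are filtered to positive results only, their average drifts above zero even though no effect exists, which is the kind of skew the article describes.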

"Once I realized that selective reporting is everywhere in science, I got quite depressed," Palmer told me. "As a researcher, you're always aware that there might be some nonrandom patterns, but I had no idea how widespread it is." In a recent review article, Palmer summarized the impact of selective reporting on his field: "We cannot escape the troubling conclusion that some--perhaps many--cherished generalities are at best exaggerated in their biological significance and at worst a collective illusion nurtured by strong a-priori beliefs often repeated."

Palmer emphasizes that selective reporting is not the same as scientific fraud. Rather, the problem seems to be one of subtle omissions and unconscious misperceptions, as researchers struggle to make sense of their results. Stephen Jay Gould referred to this as the "shoehorning" process. "A lot of scientific measurement is really hard," told me. "If you're talking about fluctuating asymmetry, then it's a matter of minuscule differences between the right and left sides of an animal. It's millimetres of a tail feather. And so maybe a researcher knows that he's measuring a good male"--an animal that has successfully mated--"and he knows that it's supposed to be symmetrical. Well, that act of measurement is going to be vulnerable to all sorts of perception biases. That's not a cynical statement. That's just the way human beings work."

One of the classic examples of selective reporting concerns the testing of acupuncture in different countries. While acupuncture is widely accepted as a medical treatment in various Asian countries, its use is much more contested in the West. These cultural differences have profoundly influenced the results of clinical trials. Between 1966 and 1995, there were forty-seven studies of acupuncture in China, Taiwan, and Japan, and every single trial concluded that acupuncture was an effective treatment. During the same period, there were ninety-four clinical trials of acupuncture in the United States, Sweden, and the U.K., and only fifty-six per cent of these studies found any therapeutic benefits. As Palmer notes, this wide discrepancy suggests that scientists find ways to confirm their preferred hypothesis, disregarding what they don't want to see. Our beliefs are a form of blindness.

Ioannidis, an epidemiologist at Stanford University, argues that such distortions are a serious issue in biomedical research. "These exaggerations are why the decline has become so common," he says. "It'd be really great if the initial studies gave us an accurate summary of things. But they don't. And so what happens is we waste a lot of money treating millions of patients and doing lots of follow-up studies on other themes based on results that are misleading." In 2005, Ioannidis published an article in the Journal of the American Medical Association that looked at the forty-nine most cited clinical-research studies in three major medical journals. Forty-five of these studies reported positive results, suggesting that the intervention being tested was effective.

Because most of these studies were randomized controlled trials--the "gold standard" of medical evidence--they tended to have a significant impact on clinical practice, and led to the spread of treatments such as hormone replacement therapy for menopausal women and daily low-dose aspirin to prevent heart attacks and strokes. Nevertheless, the data Ioannidis found were disturbing: of the thirty-four claims that had been subject to replication, forty-one per cent had either been directly contradicted or had their effect sizes significantly downgraded.

The situation is even worse when a subject is fashionable. In recent years, for instance, there have been hundreds of studies on the various genes that control the differences in disease risk between men and women. These findings have included everything from the mutations responsible for the increased risk of schizophrenia to the genes underlying hypertension. Ioannidis and his colleagues looked at four hundred and thirty-two of these claims. They quickly discovered that the vast majority had serious flaws. But the most troubling fact emerged when he looked at the test of replication: out of four hundred and thirty-two claims, only a single one was consistently replicable. "This doesn't mean that none of these claims will turn out to be true," he says. "But, given that most of them were done badly, I wouldn't hold my breath."

According to Ioannidis, the main problem is that too many researchers engage in what he calls "significance chasing," or finding ways to interpret the data so that it passes the statistical test of significance--the ninety-five-per-cent boundary invented by Fisher. "The scientists are so eager to pass this magical test that they start playing around with the numbers, trying to find anything that seems worthy," Ioannidis says. In recent years, Ioannidis has become increasingly blunt about the pervasiveness of the problem.

One of his most cited papers has a deliberately provocative title: "Why Most Published Research Findings Are False."

The problem of selective reporting is rooted in a fundamental cognitive flaw, which is that we like proving ourselves right and hate being wrong. "It feels good to validate a hypothesis," Ioannidis said. "It feels even better when you've got a financial interest in the idea or your career depends upon it. And that's why, even after a claim has been systematically disproven"--he cites, for instance, the early work on hormone replacement therapy, or claims involving various vitamins--"you still see some stubborn researchers citing the first few studies that show a strong effect. They really want to believe that it's true."

That's why Schooler argues that scientists need to become more rigorous about data collection before they publish. "We're wasting too much time chasing after bad studies and underpowered experiments," he says. The current "obsession" with replicability distracts from the real problem, which is faulty design.

He notes that nobody even tries to replicate most science papers--there are simply too many. (According to Nature, a third of all studies never even get cited, let alone repeated.) "I've learned the hard way to be exceedingly careful," Schooler says. "Every researcher should have to spell out, in advance, how many subjects they're going to use, and what exactly they're testing, and what constitutes a sufficient level of proof. We have the tools to be much more transparent about our experiments."

In a forthcoming paper, Schooler recommends the establishment of an open-source database, in which researchers are required to outline their planned investigations and document all their results. "I think this would provide a huge increase in access to scientific work and give us a much better way to judge the quality of an experiment," Schooler says. "It would help us finally deal with all these issues that the decline effect is exposing."

Although such reforms would mitigate the dangers of publication bias and selective reporting, they still wouldn't erase the decline effect. This is largely because scientific research will always be shadowed by a force that can't be curbed, only contained: sheer randomness. Although little research has been done on the experimental dangers of chance and happenstance, the research that exists isn't encouraging.

In the late nineteen-nineties, Crabbe, a neuroscientist at the Oregon Health and Science University, conducted an experiment that showed how unknowable chance events can skew tests of replicability. He performed a series of experiments on mouse behavior in three different science labs: in Albany, New York; Edmonton, Alberta; and Portland, Oregon. Before he conducted the experiments, he tried to standardize every variable he could think of. The same strains of mice were used in each lab, shipped on the same day from the same supplier.

The animals were raised in the same kind of enclosure, with the same brand of sawdust bedding. They had been exposed to the same amount of incandescent light, were living with the same number of littermates, and were fed the exact same type of chow pellets. When the mice were handled, it was with the same kind of surgical glove, and when they were tested it was on the same equipment, at the same time in the morning.

The premise of this test of replicability, of course, is that each of the labs should have generated the same pattern of results. "If any set of experiments should have passed the test, it should have been ours," Crabbe says. "But that's not the way it turned out." In one experiment, Crabbe injected a particular strain of mouse with cocaine. In Portland the mice given the drug moved, on average, six hundred centimetres more than they normally did; in Albany they moved seven hundred and one additional centimetres. But in the Edmonton lab they moved more than five thousand additional centimetres. Similar deviations were observed in a test of anxiety. Furthermore, these inconsistencies didn't follow any detectable pattern. In Portland one strain of mouse proved most anxious, while in Albany another strain won that distinction.

The disturbing implication of the Crabbe study is that a lot of extraordinary scientific data are nothing but noise. The hyperactivity of those coked-up Edmonton mice wasn't an interesting new fact--it was a meaningless outlier, a by-product of invisible variables we don't understand.

The problem, of course, is that such dramatic findings are also the most likely to get published in prestigious journals, since the data are both statistically significant and entirely unexpected. Grants get written, follow-up studies are conducted. The end result is a scientific accident that can take years to unravel.

This suggests that the decline effect is actually a decline of illusion. While Karl Popper imagined falsification occurring with a single, definitive experiment--Galileo refuted Aristotelian mechanics in an afternoon--the process turns out to be much messier than that. Many scientific theories continue to be considered true even after failing numerous experimental tests. Verbal overshadowing might exhibit the decline effect, but it remains extensively relied upon within the field. The same holds for any number of phenomena, from the disappearing benefits of second-generation antipsychotics to the weak coupling ratio exhibited by decaying neutrons, which appears to have fallen by more than ten standard deviations between 1969 and 2001. Even the law of gravity hasn't always been perfect at predicting real-world phenomena. (In one test, physicists measuring gravity by means of deep boreholes in the Nevada desert found a two-and-a-half-per-cent discrepancy between the theoretical predictions and the actual data.) Despite these findings, second-generation antipsychotics are still widely prescribed, and our model of the neutron hasn't changed. The law of gravity remains the same.

Such anomalies demonstrate the slipperiness of empiricism. Although many scientific ideas generate conflicting results and suffer from falling effect sizes, they continue to get cited in the textbooks and drive standard medical practice. Why? Because these ideas seem true. Because they make sense. Because we can't bear to let them go. And this is why the decline effect is so troubling. Not because it reveals the human fallibility of science, in which data are tweaked and beliefs shape perceptions. (Such shortcomings aren't surprising, at least for scientists.) And not because it reveals that many of our most exciting theories are fleeting fads and will soon be rejected. (That idea has been around since Kuhn.) The decline effect is troubling because it reminds us how difficult it is to prove anything. We like to pretend that our experiments define the truth for us. But that's often not the case. Just because an idea is true doesn't mean it can be proved. And just because an idea can be proved doesn't mean it's true. When the experiments are done, we still have to choose what to believe. ♦
