Behavioral Treatments for Insomnia
By Luke Muehlhauser
Editor’s note: This article was published under our former name, Open Philanthropy. Some content may be outdated. You can see our latest writing here.
Updated: November 2016
It is widely believed, and seems likely, that regular, high-quality sleep is important for personal performance and well-being, as well as for public safety and other important outcomes. Unfortunately, many people are unable to fall asleep as quickly as desired and/or unable to stay asleep as long as desired — a condition known as insomnia. Reliable and scalable treatments for insomnia could bring substantial humanitarian benefit.
The Open Philanthropy Project hopes to survey a wide range of potential cause areas in the social sciences, only some of which will turn out to look promising enough to warrant deeper investigation and potential grantmaking. We chose to conduct a brief, surface-level investigation of the evidence for the effectiveness of standard behavioral treatments for insomnia because we thought it might turn out to be promising enough to warrant deeper investigation, and because it seemed to be a well-contained topic on which we could experiment with variations on our process for generating such reports (more on this below). We might or might not investigate non-behavioral treatments for insomnia later.
I (Luke Muehlhauser) had two goals for this project: to identify the most well-regarded behavioral treatments for insomnia, and then to evaluate the state of the evidence on those treatments’ effectiveness. I did not attempt to closely examine any studies I found. Mostly, I only evaluated “surface features” such as what methods the study authors claim to have used.
My overall conclusion, described in more detail here, is that I don’t think we have strong evidence to suggest that standard behavioral treatments for insomnia are effective at ≥1mo after treatment.
To increase the speed with which we can survey the evidence concerning many different potential cause areas, we decided not to invest as much time on exposition and thoroughness as we have for some other investigations.
If you know of studies or reviews which seem like they should have been mentioned or cited in this report, or which are important but were published after the initial release of this report, please send them to socialscienceupdates+insomnia@coefficientgiving.org along with your comments, if any.
My process
For this report, I will describe my literature search process in less detail than I did for my carbs-obesity report. This time, we experimented with a different investigation process that we hoped would require less overhead on our part but still allow the report to be vetted for accuracy by Open Philanthropy Project staff and by external readers. To that end, the sections describing my tentative conclusions about treatments for insomnia are footnoted with vettable probability statements such as “To be more precise: I am X% confident that my spreadsheet of RCTs on this topic includes at least Y% of RCTs on this topic which have features A, B, and C.” We suspect these probability statements will be useful only for some readers. I explain our motivations for providing these probability statements in more detail in a footnote.[1]Both GiveWell and the Open Philanthropy Project aim to communicate the evidence and reasoning behind our claims in a transparent manner whenever such transparency is not cost-prohibitive (see here, here, and here). One way to achieve such transparency is to provide support (from the scientific … Continue reading
Here is a brief account of my literature search process. First, I searched for general overview articles on treatments for insomnia,[2]Most of my searches use Google Scholar, because in my own experiments I have found that it (1) uses a more comprehensive database of sources than e.g. PubMed or EMBASE, and that it (2) more reliably brings the most relevant results to the top of the search results pages than other literature … Continue reading and quickly learned that the relevant literature is organized under the heading of “sleep medicine.” I used these general overview articles to familiarize myself with the standard concepts, treatments, and outcome measures used in the field.
Once I learned that standard insomnia treatments have been tested by many randomized controlled trials (RCTs),[3] The way I use the term, an RCT can have either a placebo control or an “active” control or both. I decided to focus only on the evidence for treatment effectiveness from RCTs. I then searched for systematic reviews (SRs) of RCTs for insomnia treatments, and found that the literature could be divided into three major categories of treatments: psychological/behavioral treatments for insomnia (I’ll call them “BTIs”), pharmacological treatments for insomnia (“PTIs”), and alternative treatments for insomnia (“ATIs”) such as acupuncture. For now, I decided to investigate only BTIs.
To survey SR-included RCTs testing the effectiveness of commonly-tested BTIs, I did the following:
- I made a spreadsheet (here) of all the SRs of RCTs (plus other studies, in some cases) testing the effectiveness of any kind of insomnia treatment, published online before October 2015. I found ~70 such SRs, and I think this is a fairly complete list.[4] To be more precise: I’m 70% confident there are fewer than 5 SRs on this topic, that I did not find, published online before October 2015, which include at least 5 RCTs testing the effectiveness of one or more treatments for insomnia.
- I identified the SRs on this list that focused entirely or mostly on BTIs (rather than PTIs or ATIs).
- Two GiveWell staff members[5] Karalyn Lacey and Tracy Williams collectively put in more than 30 hours of work into this project. My thanks to them both! identified all unique RCTs across these SRs,[6] I also used this table of RCTs to identify additional SRs. Specifically, I identified the RCTs cited by the most SRs, and then checked Google Scholar for sources citing those RCTs and using SR-related keywords. and identified which ones met certain criteria discussed below, for example which outcome measures were used at each study’s last follow-up assessment. I spot-checked their work.[7]The chance of errors in our spreadsheet of SR-included RCTs was considered when stating my probabilities (later in this document) that there are RCTs of various types that I did not review. So, too, was the chance that some RCTs we could not retrieve (and thus could not evaluate) actually do meet … Continue reading Our spreadsheet of SR-included RCTs is available here.
- I quickly reviewed the RCTs matching certain criteria, and wrote my tentative conclusions below.
An initial draft of this report was internally vetted in April and May of 2016 by Sarah Ward, which led to a few minor error corrections. We considered publishing the details of the vet and the edits it prompted, but in this case doing so would have been prohibitively time-expensive, and we decided not to do so.
How insomnia treatments are studied
First, let me set the stage for my later substantive claims about BTIs by summarizing some basic concepts related to BTIs, as explained by recent narrative reviews on the topic.[8] E.g. Morin (2010); Morin & Benca (2012); Lichstein et al. (2012); Perlis et al. (2010); Miller et al. (2015); Bastien et al. (2012); Berry (2011), ch. 25; Wood & McCall (2013). Note that some of my sources in this report are chapters from the 5th edition of Kryger et …
A patient with insomnia can’t fall asleep as quickly as they’d like, and/or can’t stay asleep as long as they’d like. Insomnia with one or more obvious medical, psychiatric, or environmental causes (e.g. acute pain) is known as comorbid or secondary insomnia; otherwise the condition is known as primary insomnia. Common BTIs for primary or comorbid insomnia include:
- Sleep restriction: Instruction to avoid the bed as much as possible when not sleeping in it.[9] Morin (2010) defines sleep restriction as “A method designed to restrict time spent in bed… as close as possible to the actual sleep time, thereby strengthening the homeostatic sleep drive” (p. 867). Lichstein et al. (2012) defines sleep restriction as “Prescribed time in bed is abruptly …
- Stimulus control: Instructions which aim to strengthen the mental and physiological association between the bed and sleep, and to establish a regular sleeping schedule. E.g.: “Go to bed only when sleepy,” “Get out of bed when unable to sleep,” “No napping,” and “Arise at the same time every morning.”[10] Morin (2010) defines stimulus control as “A set of instructions designed to reinforce the association between the bed and bedroom with sleep and to re-establish a consistent sleep-wake schedule” and lists the following instructions: “Go to bed only when sleepy,” “Get out of bed when …
- Sleep hygiene: Education about health practices (e.g. diet, exercise, substance use) and environmental factors (e.g. noise, light, temperature) that may affect sleep success.[11] Morin (2010) defines sleep hygiene education as “General guidelines about health practices (e.g., diet, exercise, substance use) and environmental factors (e.g., light, noise, excessive temperature) that may promote or interfere with sleep” (p. 867).
- Relaxation training: Procedures aimed at reducing arousal, muscle tension, and thoughts that may interfere with sleep, e.g. meditation and progressive muscle relaxation. Most of these procedures require some initial training and practice.[12] Morin (2010) defines relaxation training as “Clinical procedures (e.g., progressive muscle relaxation, meditation) aimed at reducing autonomic arousal, muscle tension, and intrusive thoughts interfering with sleep. Most relaxation procedures require some professional guidance initially and daily …
- Cognitive therapy: Psychotherapy aimed at treating anxiety about sleep problems and reframing false beliefs about insomnia.[13] Morin (2010) defines cognitive therapy (for insomnia) as “Psychological approach using socratic questioning and behavioral experiments to reduce excessive worrying about sleep and to reframe faulty beliefs about insomnia and its daytime consequences” (p. 867).
- Cognitive behavioral therapy for insomnia (CBT-I): A combination of several different treatments from the above list, perhaps most commonly of sleep restriction, stimulus control, and sleep hygiene.[14] Morin (2010) defines CBT (for insomnia) as “A multimodal intervention combining some of the above cognitive and behavioral… procedures” (p. 867).
Hereafter, I’ll refer to these BTIs as “standard” BTIs.[15] Physical exercise is an example of another BTI, but the narrative reviews of BTIs I found tended to say little or nothing about physical exercise as a treatment for insomnia, and instead focused on the “standard” BTIs I’ve listed here.
CBT-I appears to be the most commonly-discussed BTI in the research literature, and is plausibly the most common BTI in clinical practice. It can be delivered on an individual basis or in a group setting, via self-help (with or without phone support), and via computerized delivery (with or without phone support). It can also be delivered simultaneously with PTIs and ATIs.[16] I found many or several RCTs testing each of these types of CBT-I.
In RCTs testing the effectiveness of standard BTIs, night-time sleep outcomes are typically measured with one or more of the following measures:
- Polysomnography (PSG): PSG combines objective measures of brain activity, eye movement, muscle activity and perhaps also heart rhythm, respiration, blood oxygen saturation, and other measures. PSG is widely considered the “gold standard” measure of sleep, but it has several disadvantages. It is expensive, complicated to interpret, requires some adaptation by the patient (people aren’t used to sleeping with wires attached to them), and is usually (but not always) administered at a sleep lab rather than at home.[17]Miller et al. (2015) explains: “Sleep and wake states are measured by EEG whereby electrodes on the scalp record electrical brain activity… The recording of brain activity by EEG is only one aspect of the overall diagnostic sleep study. The study can gather other information about the body … Continue reading
- Actigraphy (ACT): An actigraph is a watch-like device that uses an accelerometer to record movement during the night. Supposedly (I haven’t checked), it correlates well with PSG on at least two key variables — total sleep time (TST) and sleep efficiency (SE: percentage of time in bed spent asleep) — in healthy subjects, but agreement rates are lower in patients with insomnia. Actigraphy is less expensive and more convenient than PSG, and can easily be used at home.[18]Miller et al. (2015) explains: “Actigraphy is cost effective and more convenient than a full PSG… and it can be repeated across many nights to build an ecologically valid assessment of sleep without the first-night effect of PSG… Actigraphs are typically watch-like devices worn on the … Continue reading
- Sleep diary (SD): Subjects are asked to fill out a daily diary of sleep outcomes, usually including TST, SE, sleep onset latency (SOL: how long it took to fall asleep), wake-time after initial sleep onset (WASO), and perhaps other variables. Usually, subjects are asked to self-report these outcomes in the morning, for the previous night’s sleep. Supposedly (I haven’t checked), SD is known to be less accurate than PSG or ACT, but it seems to be the most common measure of sleep outcomes in RCTs of BTIs.[19]Miller et al. (2015) explains: “Sleep diaries are widely used in sleep science… Self-monitoring of sleep through a sleep diary… normally includes the following estimated measures: sleep onset latency (SOL), wake-time after initial sleep onset (WASO), TST, total time spent in bed, sleep … Continue reading
- Questionnaires: A variety of standardized questionnaires are available to measure sleep outcomes, the most common of which is probably the Pittsburgh sleep quality index (PSQI). The PSQI includes 19 questions about sleep quality over the past month, and results in a total score for overall sleep quality as well as 7 component scores (e.g. sleep duration and sleep quality). I haven’t checked how valid and reliable this measure is.[20]Miller et al. (2015) explains: “Sleep can be profiled subjectively through self-report questionnaire measures… The Pittsburgh sleep quality index… is one of the most widely used self-report measures for the assessment of sleep quality… This is a … retrospective assessment of sleep … Continue reading
In the section below, I focus on measurements of TST and SE, because these two variables seem (to me) to capture the most relevant outcome information without requiring that I check the results for a cumbersomely long list of outcome variables, and because they are two of the most commonly measured outcome variables.
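To make the two focal variables concrete, here is a minimal sketch of how TST and SE could be derived from one night of diary-style data. The `NightRecord` schema and the example numbers are hypothetical, and approximating TST as time in bed minus SOL and WASO is a simplifying assumption (for instance, it ignores time spent awake in bed after the final awakening); individual studies may define these quantities differently.

```python
from dataclasses import dataclass


@dataclass
class NightRecord:
    """One night of diary-style data, in minutes (hypothetical schema)."""
    time_in_bed: float  # total minutes spent in bed
    sol: float          # sleep onset latency: minutes taken to fall asleep
    waso: float         # wake-time after initial sleep onset


def total_sleep_time(night: NightRecord) -> float:
    """Approximate TST as time in bed minus time spent awake in bed."""
    return night.time_in_bed - night.sol - night.waso


def sleep_efficiency(night: NightRecord) -> float:
    """SE: percentage of time in bed spent asleep."""
    if night.time_in_bed <= 0:
        raise ValueError("time_in_bed must be positive")
    return 100.0 * total_sleep_time(night) / night.time_in_bed


# Hypothetical night: 8 hours in bed, 30 minutes to fall asleep, 45 minutes awake later.
night = NightRecord(time_in_bed=480, sol=30, waso=45)
print(total_sleep_time(night))   # 405 minutes of sleep
print(sleep_efficiency(night))   # 84.375 percent
```

Because SE is a ratio, it can improve when time in bed shrinks even if TST barely changes, which is one reason to look at both variables together.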
How effective are commonly tested BTIs?
To quickly assess the likely effectiveness of commonly tested BTIs, I looked only at SR-included RCTs testing the effectiveness of standard BTIs for adults (or mostly adults). Approximately 180 unique RCTs were included across all the SRs I found (published online before October 2015).[21]I did not look at some SRs that included RCTs on all kinds of treatments (rather than focusing on BTIs): Belanger et al. (2007), McCurry et al. (2007), and Smith et al. (2002). Finally, I did not look at SRs published earlier than Morin et al. (1999), the first SR of nonpharmacological … Continue reading
Long-term effectiveness, measured objectively
First, I looked at RCTs that (1) measured the long-term (≥6mo) effectiveness of one or more BTIs, and that (2) used at least one objective measure of sleep (PSG or ACT) during the last follow-up measurement.[22] See below for more details on why I focused on objective measures of sleep.
These criteria yielded ~20 RCTs. Unfortunately, only 7 of these RCTs had a neutral control, retained it through a follow-up period of at least 6 months (thus allowing meaningful comparisons between active treatment and neutral control at that follow-up), and reported objectively-measured TST or SE at that follow-up.[23]“Neutral control” is contrasted with a “positive” control, i.e. another active treatment. I excluded Hauri (1981) from the tally of RCTs because, though it was included in one of the SRs I found, it does not test one of the standard modern BTIs, but instead tests biofeedback treatments. I … Continue reading The results of these studies, focusing on objectively-measured TST and SE, are:
| STUDY [LAST FOLLOW-UP] | TREATMENT CONDITIONS AT LAST FOLLOW-UP | PARTICIPANTS (AT LAST FOLLOW-UP) | OBJECTIVELY MEASURED TST AND SE AT LAST FOLLOW-UP |
|---|---|---|---|
| Lichstein et al. (2001) [12mo] | Relaxation therapy vs. sleep compression vs. placebo desensitization | 74 subjects from Memphis, 59 or older, chronic primary insomnia, no sleep apnea, no sleep medications, plus some other criteria | PSG: TST and SE were worse for relaxation therapy subjects than placebo subjects. Sleep compression subjects averaged ~40 more minutes of TST than placebo subjects, and ~7 percentage points greater sleep efficiency. Statistical significance of these differences not reported.[24]The paper doesn’t seem to report whether these differences are statistically significant, and I did not take the time myself to compute their statistical significance and check whether e.g. distributional assumptions were met. The discussion section makes the general claim that “Our main … Continue reading |
| Wu et al. (2006) [8mo] | CBT-I vs. placebo tablets[25] This trial also included pharmacotherapy treatments that are not listed here because they are not BTIs. | 36 subjects[26] The study reports that 71 subjects completed the treatment protocol (36 in the CBT-I and placebo conditions), but doesn’t say whether any subjects dropped out between post-treatment and the last follow-up measurements. from an unspecified location (Beijing?), chronic primary insomnia, no sleep apnea, no sleep medication, plus some other criteria | PSG: CBT-I subjects averaged ~53 more minutes of TST than controls, and ~10 percentage points greater sleep efficiency. Statistical significance of these differences not reported.[27] See Table 1. The paper doesn’t seem to report whether these differences (between CBT-I and placebo) are statistically significant, and I did not take the time myself to compute their statistical significance and check whether e.g. distributional assumptions were met. |
| Berger et al. (2009) [12mo] | CBT-I vs. healthy eating instructions[28]The treatment condition is called the “Individualized Sleep Promotion Plan,” but this turns out to be a combination of stimulus control, sleep restriction, relaxation therapy, and sleep hygiene, which is consistent with what is normally called CBT-I. I counted the “healthy eating” control … Continue reading | 155 female subjects from the U.S. Midwest, breast cancer-related fatigue, receiving chemotherapy, no pre-cancer insomnia or sleep apnea, plus some other criteria | ACT: CBT-I patients averaged 16 more minutes of TST than controls, and 1 percentage point higher “sleep percent after onset.” Statistical significance of these differences not reported.[29]The published paper refers to this information as being in “Table B,” but Table B was never published. The numbers I provide here — both for outcomes and for subject count (counting only subjects measured by ACT) — are taken from personal communication with the study’s lead author, Dr. … Continue reading |
| Espie et al. (2008) [6mo] | CBT-I vs. treatment as usual (TAU) | 106 subjects from Scotland, chronic insomnia, diagnosed with cancer, no sleep apnea, plus some other criteria | ACT: No effect of CBT-I over TAU for either TST or SE.[30] See Table 4. |
| Edinger et al. (2005) [6mo] | CBT-I vs. sleep hygiene vs. TAU | 20 subjects from an unspecified location (near Durham, NC?), insomnia and fibromyalgia but not other comorbidities, no sleep apnea, plus some other criteria | ACT: No group differences for either TST or SE.[31] See Table 3. |
| McCurry et al. (2014) [18mo] | CBT for pain vs. CBT for pain and insomnia vs. education only | 320 subjects, members of a health maintenance organization in Washington state (“Group Health”), aged 60 or older, had received care for osteoarthritis at Group Health in the past 3 years, with chronic pain and insomnia, no sleep apnea, plus some other criteria[32] Details about the subjects of this RCT are provided on p. 948 of an earlier paper, Vitiello et al. (2013). | ACT: TST not measured. No differences between groups for SE.[33] “…benefits for insomnia observed over 9 mo were reduced at 18 mo and did not achieve statistical significance for any group comparison” (p. 302). |
| Lichstein et al. (2013) [12mo] | CBT vs. placebo biofeedback vs. withdrawal | 61 subjects from (or near) Memphis, diagnosed with hypnotic-dependent insomnia, aged 50 or older, no sleep apnea, plus some other criteria | PSG: Probably no group differences for either TST or SE (but statistical significance not reported).[34] See p. 792. |
The results of these studies are inconsistent. Though three of these seven studies did not report the statistical significance of the comparisons that most interested me,[35] This tally does not include McCurry et al. (2014), which reported follow-up data for SE but never collected TST measurements. I would guess (based on effect sizes) that the first two studies (as listed above) each found positive effects of at least one BTI on TST and SE (objectively measured, at last follow-up) that are both statistically significant and large enough to be “practically” significant, whereas the last five studies (as listed above) found no statistically or practically significant effects (objectively measured, at last follow-up).
Moreover, these seven trials were only moderately pragmatic in design.[36] To be more precise: If three professionally trained users of the PRECIS-2 tool used that tool to assess the pragmaticness of these seven trials, I’m 70% confident that none of these trials would achieve an average domain score of 3.7 or higher (after averaging the domain scores from each of … For example, subject eligibility was usually tightly restricted, resulting in a sample of subjects that is not especially representative of the population we’d like to treat for insomnia. All else equal, I consider pragmatic trials to provide stronger evidence of broad intervention effectiveness than explanatory trials do, for reasons described here.
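Several entries in the table above note that statistical significance was not reported. As a rough illustration of the kind of check that would be involved, the sketch below computes Welch's t statistic and its approximate degrees of freedom from group means, standard deviations, and sample sizes. The function name `welch_t` and all input numbers are hypothetical and are not taken from any of the trials; a real check would also need to verify distributional assumptions, which this sketch does not attempt.

```python
import math


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return (t, df) for Welch's unequal-variance t-test from summary statistics."""
    var_a = sd_a ** 2 / n_a  # squared standard error of group A's mean
    var_b = sd_b ** 2 / n_b  # squared standard error of group B's mean
    t = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1)
    )
    return t, df


# Hypothetical inputs: mean TST in minutes for a treatment and a control group.
t, df = welch_t(mean_a=400, sd_a=60, n_a=25, mean_b=360, sd_b=60, n_b=25)
print(f"t = {t:.2f}, df = {df:.1f}")
```

Turning `t` and `df` into a p-value requires the tail probability of the t distribution, for example via `scipy.stats.t.sf` if SciPy is available.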
Medium-term effectiveness, measured objectively
Perhaps it is too much to hope that we have good evidence that BTIs are effective ≥6mo after treatment. What if we look instead at standard BTIs’ effects on objectively measured TST and SE at the last follow-up occurring ≥1mo and ≤3mo after treatment?
These criteria yielded 16 RCTs. Unfortunately, only 3 of these RCTs had a neutral control, retained it through the designated follow-up period, and reported objectively-measured TST or SE at the designated follow-up period.[37]Currie et al. (2000) is excluded from this tally because, though it used ACT at follow-up, it did not report ACT-measured SE or TST at follow-up. I also excluded Taylor et al. (2014), Fiorentino et al. (2010), and Hoch et al. (2001) because they are pilot studies. Barsevick et al. (2010) is … Continue reading The results of these studies, focusing on objectively measured TST and SE, are:
| STUDY [LAST FOLLOW-UP ≥1MO AND ≤3MO AFTER TREATMENT] | TREATMENT CONDITIONS AT DESIGNATED FOLLOW-UP | PARTICIPANTS (AT DESIGNATED FOLLOW-UP) | OBJECTIVELY MEASURED TST AND SE AT DESIGNATED FOLLOW-UP |
|---|---|---|---|
| Wu et al. (2006) [3mo] | CBT-I vs. placebo tablets[38] This trial also included pharmacotherapy treatments that are not listed here because they are not BTIs. | 36 subjects[39] The study reports that 71 subjects completed the treatment protocol (36 in the CBT-I and placebo conditions), but doesn’t say whether any subjects dropped out between post-treatment and the 3mo follow-up measurements. from an unspecified location (Beijing?), chronic primary insomnia, no sleep apnea, no sleep medication, plus some other criteria | PSG: CBT-I subjects averaged ~71 more minutes of TST than controls, and ~17 percentage points greater sleep efficiency. Statistical significance of these differences not reported.[40] See Table 1. The paper doesn’t seem to report whether these differences (between CBT-I and placebo) are statistically significant, and I did not take the time myself to compute their statistical significance and check whether e.g. distributional assumptions were met. |
| Lovato et al. (2014) [3mo] | CBT-I vs. wait list control | 99 subjects from near Adelaide, South Australia, chronic insomnia, no sleep apnea, plus some other criteria | ACT: CBT-I subjects averaged ~30 fewer minutes of TST than controls. No difference for sleep efficiency.[41] See Table 2. |
| Berger et al. (2009) [3mo] | CBT-I vs. healthy eating instructions | 160 female subjects from the U.S. Midwest, breast cancer-related fatigue, receiving chemotherapy, no pre-cancer insomnia or sleep apnea, plus some other criteria | ACT: CBT-I patients averaged 10 more minutes of TST than controls, and 1 percentage point higher “sleep percent after onset.” Statistical significance of these differences not reported.[42]As stated about this study in the previous table, the numbers I provide here — both for outcomes and for subject count (counting only subjects measured by ACT) — are taken from personal communication with the study’s lead author, Dr. Ann Berger, in January 2016. The paper doesn’t report … Continue reading |
The results of these studies are inconsistent. The first study listed above reported a CBT-I advantage that is plausibly practically and statistically significant, whereas the other two studies did not. Moreover, as with the RCTs summarized in the previous section, these three trials were only moderately pragmatic in design.[43] To be more precise: If three professionally trained users of the PRECIS-2 tool used that tool to assess the pragmaticness of these three trials, I’m 75% confident that none of these trials would achieve an average domain score of 3.7 or higher (after averaging the domain scores from each of …
Immediate effectiveness, measured via self-report
Finally, what if we look at self-reported TST and SE, immediately after treatment? This is the kind of summary statistic typically reported in meta-analyses of RCTs on the topic. Here are the findings from the most recent (2015-2016) SRs of RCTs of standard BTIs I reviewed:
| SR | FOCUS OF THE SR | INCLUDED RCTS | BASIC RESULTS FOR SELF-REPORTED TST AND SE, AT POST-TREATMENT |
|---|---|---|---|
| Johnson et al. (2016) | CBT-I for cancer survivors | 8 | “CBT-I resulted in a 15.5% improvement in SE relative to control conditions.” TST not reported. |
| Geiger-Brown et al. (2015) | CBT-I for comorbid insomnia | 23 | Standardized mean difference for TST was .25 and for SE was .93.[44] See Table 3. |
| Koffel et al. (2015) | Group CBT-I | 8 | Mean effect size for TST was -.04 and for SE was .84.[45] See Table 4. |
| Ho et al. (2015) | Self-help CBT-I | 20 | Mean effect size for TST was .24 and for SE was .80.[46] See Table S3. |
| Zachariae et al. (2015) | Internet-delivered CBT-I | 11 | Hedges’ g for TST was .29 and for SE was .58.[47] See Table 2, which claims these numbers were “adjusted for publication bias” using the trim and fill method. |
| Trauer et al. (2015) | CBT-I, excluding studies focused on comorbid insomnia | 20 | “TST improved by 7.61… minutes, and SE improved by 9.91%.” |
In short, these SRs tend to report practically relevant average effects on SE but not so much for TST.
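For readers unfamiliar with the standardized effect sizes in the table above, here is a minimal sketch of how a standardized mean difference (Cohen's d) and its small-sample-corrected variant (Hedges' g) are commonly computed from two groups' summary statistics. The function names and input numbers are hypothetical, and this is not a claim about the exact procedure any of the listed SRs used. Note that figures such as "7.61 minutes" are raw mean differences, not standardized ones.

```python
import math


def pooled_sd(sd_t, n_t, sd_c, n_c):
    """Pooled standard deviation of a treatment and a control group."""
    return math.sqrt(
        ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    )


def cohens_d(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Standardized mean difference: group difference in pooled-SD units."""
    return (mean_t - mean_c) / pooled_sd(sd_t, n_t, sd_c, n_c)


def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Cohen's d multiplied by the usual small-sample correction factor."""
    correction = 1 - 3 / (4 * (n_t + n_c) - 9)
    return cohens_d(mean_t, sd_t, n_t, mean_c, sd_c, n_c) * correction


# Hypothetical post-treatment SE (%): treatment 85 (SD 8), control 79 (SD 8), 30 per arm.
d = cohens_d(85, 8, 30, 79, 8, 30)
g = hedges_g(85, 8, 30, 79, 8, 30)
print(f"d = {d:.3f}, g = {g:.3f}")
```

Because the difference is divided by the pooled standard deviation, the same standardized effect can correspond to very different changes in minutes or percentage points, depending on how variable the outcome is across subjects.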
However, I don’t summarize more details from these SRs, or summarize details from any SR with an official publication date earlier than 2015, because I don’t weight their meta-analytic findings very heavily in my consideration of the evidence, for two reasons.
First, I don’t trust the accuracy of self-reported sleep diary measurements. In part, this is because some (but not all) narrative reviews on insomnia report that sleep diaries are considered a less accurate measure of sleep than PSG or ACT.[48] For example, Miller et al. (2015) cites “Poor correlation with PSG” as a disadvantage of sleep diary measures, and the chapter later states that “When compared to gold standard PSG and actigraphy, sleep diaries tend to be less accurate…” Bastien et al. (2012) is less clear, and … And while I couldn’t find any SRs of studies comparing self-report and objective measures of sleep in adults, I did find two SRs of studies comparing self-report (or parent-report) of sleep and objective measures in children and adolescents, and both of those SRs reported low correspondence between self-report/parent-report and objective measures.[49] Bauer & Blunden (2008) found 17 studies matching their criteria, and concluded that there is “a great deal of variance between subjective and objective sleep reports.” Hodge et al. (2012) focused on studies of children with autism spectrum disorders, found 11 studies matching their … Moreover, both a priori reasoning about self-report measures and empirical reviews of the accuracy of self-report measures (across multiple domains) lead me to be suspicious of self-reported measures of sleep.[50] I am still researching the accuracy of self-report measures across multiple domains, and might or might not produce a separate report on the topic. In the meantime, I only have time to point to some of the sources that have informed my preliminary judgments on this question, without further comment … Finally, it’s my impression, from the dozens of studies I skim-read for this investigation, that objective and self-report measures of sleep often disagree, with the self-report measures typically showing more beneficial effects of treatment than objective measures show.[51] To be more precise, I’ll describe two tests. First test: Select at random 20 RCTs from our spreadsheet of SR-included RCTs which reported both SD-measured TST and SE and objectively-measured TST and SE at ≥1mo follow-up (if there are multiple such follow-ups, select one at random for each …
Second, I’m interested in lasting effects of treatment, not immediate post-treatment effects.
Finally, a point that applies to studies using either self-report measures or objective measures or both: I expect few to no RCTs on this topic to be both high quality and highly pragmatic.[52]To be more precise, I’ll describe two tests. The test for study quality is this: If three professionally trained users of the Cochrane Risk of Bias Tool (from version 5.1 of the Cochrane Handbook) used that tool to assess the quality of an RCT, and the RCT received a “low risk” rating for … Continue reading
My overall tentative conclusion
Standard BTIs have only rarely been tested against a neutral control at ≥1mo follow-up using objective measures of TST or SE in RCTs, and these results are inconsistent, with most such studies showing no practically significant effect of treatment at the follow-ups I checked. Moreover, I would guess that standard BTIs have never been tested in this way in a high-quality, highly pragmatic RCT. Given this, and given that I have many reasons to be suspicious of self-report measures of sleep quality, I don’t think we have strong evidence to suggest that standard BTIs are effective at ≥1mo.
I would be quite surprised if a more thorough search for RCTs testing the effectiveness of standard BTIs challenged this tentative conclusion.[53]A more precise statement of the 2nd sentence in my conclusion paragraph can be found in the previous footnote. To be more precise about the 1st sentence in my conclusion paragraph: I’m 75% confident that there are fewer than 12 RCTs, published online before October 2015, that I didn’t describe … Continue reading
If I were to substantially change my mind about this upon further investigation, my guess is that the most likely reasons for this change of mind would be:
- There turn out to be reasons to think self-reported sleep diary measurements of sleep are more accurate than I currently suspect they are, and a well-designed recent meta-analysis of RCTs relying on sleep diary measurements shows substantial positive effects of standard BTIs at ≥1mo (when sleep diary measures are used), in a variety of populations and contexts.
- There is at least one well-conducted, highly pragmatic RCT which shows that a standard BTI improves sleep outcomes at ≥1mo (using objective measures), but I didn’t find this RCT in my search. If I found one well-conducted pragmatic RCT of this nature, that could be more persuasive to me than meta-analyses of many small, weak, mostly explanatory RCTs, for reasons described here.
Despite my skepticism about the state of the evidence on the effectiveness of standard BTIs, I continue to suggest some standard BTIs (in particular sleep restriction and sleep hygiene) to insomnia sufferers who ask me for advice. I make this suggestion not based on scientific evidence, but based on my intuitive priors about which interventions seem to me like they might work, and the fact that these interventions are usually cheap to try.
In other words, my personal recommendation that insomnia sufferers at least try the sleep restriction and sleep hygiene treatments is given from the following perspective: “The effectiveness evidence in this area is weak. But sleep restriction and sleep hygiene seem intuitively to me like they might help at least some insomnia sufferers, and that’s not true of most possible insomnia treatments one could propose (e.g. various herbal treatments, about which I have no intuitions concerning effectiveness). If you’ve got insomnia, you might as well try sleep restriction and sleep hygiene and see whether they help you. But if I wanted to predict how much human welfare (via insomnia reduction) would accrue if someone spent several million dollars improving or scaling the delivery of standard BTIs, I would say I have no idea because the scientific evidence is too weak to allow me to make that kind of judgment, even as a guesstimate.”
What might I recommend funding in this area?
Obviously, I would want to investigate this topic more deeply before making any funding recommendations. But if I had to guess, on the basis of what I know now, which funding recommendations I’d make upon investigating further, I would guess I’d end up recommending something like the following.
Before anyone funds the first large, expensive, highly pragmatic RCT on this topic, I think we should make sure we’ve got an accurate and ecologically valid measure of sleep, and I’m worried that current actigraphs aren’t accurate enough, even if they’re more accurate than sleep diaries. So, I’d be curious to learn more about the feasibility of developing a night-time sleep measure that will strongly agree with PSG for approximately all populations and conditions. It seems to me like this might be feasible, plausibly via a combination method: e.g. perhaps a comfortable-to-wear headband or skullcap, plus an improved actigraph, and maybe also some little device that listens to one’s breathing throughout the night (or even something similar to this micro-CPAP device,[54] but only for measuring respiration). Basically: if the startup incubator X (formerly Google X) wanted to build a highly accurate measure of sleep that didn’t require attaching wires to people, what would they build?
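To make "strongly agree with PSG" a little more concrete, here is a minimal sketch of one way agreement between a candidate device and PSG could be quantified. It is an illustration under stated assumptions, not a method taken from any study cited here: it assumes both recordings have been scored as sleep/wake in matching 30-second epochs, that the recording spans the whole time in bed, and it uses made-up toy data. The function names are hypothetical.

```python
from typing import Dict, List

EPOCH_SECONDS = 30  # assumption: both recordings are scored in 30-second epochs


def agreement(reference: List[int], device: List[int]) -> Dict[str, float]:
    """Epoch-by-epoch agreement of a device with a reference (1 = asleep, 0 = awake)."""
    if len(reference) != len(device) or not reference:
        raise ValueError("recordings must be non-empty and the same length")
    pairs = list(zip(reference, device))
    tp = sum(1 for r, d in pairs if r == 1 and d == 1)  # both say asleep
    tn = sum(1 for r, d in pairs if r == 0 and d == 0)  # both say awake
    fp = sum(1 for r, d in pairs if r == 0 and d == 1)  # device misses wake
    fn = sum(1 for r, d in pairs if r == 1 and d == 0)  # device misses sleep
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


def tst_minutes(epochs: List[int]) -> float:
    """Total sleep time (TST) in minutes."""
    return sum(epochs) * EPOCH_SECONDS / 60


def sleep_efficiency(epochs: List[int]) -> float:
    """Sleep efficiency (SE): TST divided by time in bed (assumed = recording length)."""
    return sum(epochs) / len(epochs)


# Toy data, invented for illustration only.
psg = [0, 0, 1, 1, 1, 1, 0, 1, 1, 1]
candidate = [0, 1, 1, 1, 1, 1, 1, 1, 1, 1]

print(agreement(psg, candidate))
print(tst_minutes(psg), tst_minutes(candidate))
print(sleep_efficiency(psg), sleep_efficiency(candidate))
```

In this toy case the device detects every sleep epoch but misses two of the three wake epochs, so it overstates both TST and SE. That is the kind of disagreement a better measure would need to avoid across populations and conditions.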
If we had a highly accurate measure that subjects could use relatively cheaply at home — either because a new measure was developed or because actigraphy looks more accurate to me upon deeper investigation than it does now — then my next step would probably be to recommend a relatively small pre-registered RCT with ≥6mo follow-up, open data, blinding for everything that can be blinded, and so on — just to see if we could get some preliminary good news about person-delivered CBT-I vs. computerized CBT-I vs. placebo once we’re using an accurate and ecologically valid sleep measure and checking off basic methodological boxes like pre-registration. I’d also want to make sure more development effort goes into the computerized CBT-I intervention than is usually the case.
If one such RCT was promising, or perhaps only if a few such RCTs were promising, then I might be ready to recommend a large, well-designed, multi-site, highly pragmatic RCT with ≥6mo follow-up, testing the effectiveness of person-delivered CBT-I vs. computerized CBT-I vs. placebo.
I have very little sense of how much these things would cost. My guess is that if a better measure of sleep (of the sort I described) can be developed, it could be developed for $2M-$20M. I would guess that the “relatively small” RCTs I suggested might cost $1M-$5M each, whereas I would guess that a large, pragmatic RCT of the sort I described could cost $20M-$50M. But these numbers are just pulled from vague memories of conversations I’ve had with people about how much certain kinds of product development and RCT implementation cost, and my estimates could easily be off by a large factor, and maybe even an order of magnitude.
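For a sense of what these guesses add up to, the arithmetic can be sketched as below. The dollar ranges are the ones given in the paragraph above. The number of small RCTs (one or three) is an assumption standing in for "one" or "a few", and the `widen` helper is a hypothetical illustration of what "off by an order of magnitude" would do to a range.

```python
from typing import Tuple

# Rough cost guesses from the text, in millions of USD (low, high).
MEASURE = (2, 20)     # developing a better sleep measure
SMALL_RCT = (1, 5)    # each "relatively small" pre-registered RCT
LARGE_RCT = (20, 50)  # one large, highly pragmatic RCT


def program_cost(n_small_rcts: int) -> Tuple[int, int]:
    """Return the (low, high) total in $M for the measure, n small RCTs, and one large RCT."""
    low = MEASURE[0] + n_small_rcts * SMALL_RCT[0] + LARGE_RCT[0]
    high = MEASURE[1] + n_small_rcts * SMALL_RCT[1] + LARGE_RCT[1]
    return low, high


def widen(cost_range: Tuple[float, float], factor: float) -> Tuple[float, float]:
    """Stretch a range to allow for the estimates being off by `factor` in either direction."""
    low, high = cost_range
    return low / factor, high * factor


for n in (1, 3):  # assumption: "one" or "a few" small RCTs
    low, high = program_cost(n)
    print(f"{n} small RCT(s): ${low}M to ${high}M")

print(widen(program_cost(1), 10))
```

With one small RCT the whole sequence comes to roughly $23M to $75M, and with three to roughly $25M to $85M. Allowing for an order-of-magnitude error in either direction stretches the one-RCT range to about $2.3M to $750M, which shows how loose these figures are.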
Sources
| DOCUMENT | SOURCE |
|---|---|
| Adamo et al. (2009) | Source |
| Airing micro-CPAP | Source (archive) |
| Amazon | Source (archive) |
| Barnow & Greenberg (2014) | Source (archive) |
| Barsevick et al. (2010) | Source (archive) |
| Bastien et al. (2012) | Source |
| Bauer & Blunden (2008) | Source (archive) |
| Belanger et al. (2007) | Source (archive) |
| Berger et al. (2009) | Source (archive) |
| Berry (2011) | Source |
| Bhandari & Wagner (2006) | Source (archive) |
| Bound et al. (2001) | Source (archive) |
| Bryant et al. (2014) | Source (archive) |
| Chan (2009) | Source |
| Cochrane Collaboration Risk of Bias Tool | Source (archive) |
| Currie et al. (2000) | Source (archive) |
| Donaldson & Grant-Vallone (2002) | Source (archive) |
| DynaMed | Source (archive) |
| Edinger et al. (2005) | Source (archive) |
| Espie et al. (2008) | Source (archive) |
| Fayers & Machin (2016) | Source (archive) |
| Fernandez-Ballesteros & Botella (2007) | Source |
| Fiorentino et al. (2010) | Source (archive) |
| Geiger-Brown et al. (2015) | Source (archive) |
| Geretsegger et al. (2012) | Source (archive) |
| Google Scholar | Source (archive) |
| Gorber et al. (2007) | Source (archive) |
| Gorber et al. (2009) | Source (archive) |
| Groves et al. (2009) | Source (archive) |
| Hauri (1981) | Source (archive) |
| Ho et al. (2015) | Source (archive) |
| Hoch et al. (2001) | Source (archive) |
| Hodge et al. (2012) | Source (archive) |
| Johnson et al. (2016) | Source (archive) |
| Jungquist et al. (2012) | Source (archive) |
| Koffel et al. (2015) | Source (archive) |
| Kowalski et al. (2012) | Source (archive) |
| Kryger et al. (2016), 6th Edition | Source |
| Kryger et al. (2016), 5th Edition | Source (archive) |
| Kuncel et al. (2005) | Source (archive) |
| Lichstein et al. (2001) | Source (archive) |
| Lichstein et al. (2012) | Source |
| Lichstein et al. (2013) | Source (archive) |
| Loudon et al. (2015) | Source (archive) |
| Lovato et al. (2014) | Source (archive) |
| Luke Muehlhauser, Insomnia treatment SRs | Source |
| McCurry et al. (2007) | Source (archive) |
| McCurry et al. (2014) | Source (archive) |
| Meyer et al. (2009) | Source (archive) |
| Miller et al. (2015) | Source |
| Morin (2010) | Source |
| Morin & Benca (2012) | Source (archive) |
| Morin et al. (1999) | Source (archive) |
| Open Philanthropy Project, RCTs included in SRs on behavioral treatments of insomnia | Source |
| Payne et al. (2008) | Source (archive) |
| Perlis et al. (2010) | Source (archive) |
| PRECIS-2 | Source (archive) |
| Price et al. (2011) | Source (archive) |
| Prince et al. (2008) | Source (archive) |
| Schwarz et al. (2008) | Source |
| Smith (2011) | Source (archive) |
| Smith et al. (2002) | Source (archive) |
| Stalans (2012) | Source (archive) |
| Stone et al. (1999) | Source (archive) |
| Stone et al. (2007) | Source (archive) |
| Streiner & Norman (2008) | Source (archive) |
| Suziedelyte & Johar (2013) | Source (archive) |
| Taylor et al. (2014) | Source (archive) |
| Thomas & Frankenberg (2002) | Source |
| Trauer et al. (2015) | Source (archive) |
| UpToDate | Source (archive) |
| Vitiello et al. (2013) | Source (archive) |
| Wikipedia, X | Source (archive) |
| Wood & McCall (2013) | Source (archive) |
| Wu et al. (2006) | Source (archive) |
| Zacharie et al. (2015) | Source (archive) |
Footnotes