Reducing Blood Pressure using Deep Breathing: No Observable Effect with Additional Data

Last week, I reported my first attempt at reducing blood pressure using a deep breathing protocol. I saw a drop in blood pressure with deep breathing, but due to the high variance in the measurements, I couldn’t tell if this was a real effect or just due to chance.

Based on my repeatability study, I’ve repeated the experiment, this time measuring my blood pressure 5 times for each observation. Here’s the result:

Summary

  • Background:
    • Numerous studies, reviews, and meta-analyses have shown deep breathing to lower blood pressure in both the short and long term (example 1, example 2).
    • Effect sizes are moderate (3-5 mmHg) and statistically significant for large patient populations (>10,000 patients in some studies).
    • Numerous breathing protocols have been tested, with varying results.
    • My own tests suggested a possible effect: first, second.
  • Approach:
    • Blood pressure and pulse were measured each morning before and after the following protocols:
      • 8s inhale, 8s exhale, 5 min.
      • Normal activity, 5 min.
    • For each measurement, I took 5 readings and averaged the results.
    • Protocols were alternated by day for 10 days (5 days each protocol).
    • Average and 95% confidence intervals were compared for each metric & protocol.
  • Results & Conclusions:
    • With additional, lower variance measurements, I did not observe a meaningful drop in blood pressure or pulse. For all metrics, the difference between deep breathing and normal activity overlapped zero effect and was lower than my target for “clinical” significance.
    • While the variance is still too large to rule out a clinically significant effect size, such an effect is sufficiently unlikely that I’m not going to continue testing the short-term effect of deep breathing.
  • Next Steps:
    • Retrospective analysis of self tracking data
      • I’ve finished the analysis and just need to write it up for posting.
      • There were no effects that were practically meaningful and statistically significant, but a few things were worth keeping an eye on.
    • Inspiratory muscle training:
      • On my last post u/OrganicTransistor suggested trying to strengthen my respiratory muscles based on the results in this paper.
      • I’m going to replicate their protocol as best I can (pre-registration to follow in another post).
      • This study will take six weeks, but I will do an interim analysis every two weeks.

Details

Purpose

  • To determine the effect of deep breathing protocols on short-term blood pressure.

Background

For additional background, see previous post.

  • Numerous studies, reviews, and meta-analyses have shown deep breathing to lower blood pressure in both the short and long term (example 1, example 2).
  • Effect sizes are moderate (3-5 mmHg) and statistically significant for large patient populations (>10,000 patients in some studies).
  • Numerous breathing protocols have been tested, with varying results.
  • My own tests suggested a possible effect: first, second.

Results & Discussion

First, let’s take a look at the change in blood pressure for each protocol (deep breathing & normal activity). As shown in both the table and graphs above, on average:

  • Systolic pressure dropped for both deep breathing and normal activity.
    • In both cases, the magnitude was modest, 2.0 & 1.5 mmHg for deep breathing and normal activity, respectively.
  • Diastolic pressure was nearly unchanged with deep breathing (0.1 mmHg drop), but showed a modest drop for normal activity (1.2 mmHg)
  • Pulse increased during deep breathing (1.3 bpm) and stayed the same during normal activity (0.1 bpm increase).
  • Since I took these measurements ~1h after waking up, these effects, if real, are presumably related to my morning routine in some way (e.g. dissipation of the initial stress from waking up, relaxing during morning computer work, etc.)
  • Several of these effects are different from my previous observations. Notably:
    • I saw a drop in systolic and diastolic blood pressure in the normal activity condition vs. no change or increase previously.
    • I saw an increase in pulse in the normal activity condition vs. a decrease previously.
    • In no case was the difference outside of what would be expected due to the high variance in the previous experiments. As such, the differences are likely due to chance.
    • Given the much lower variance in the current experiment (5 measurements per condition vs. 1) I have a lot more confidence in the current conclusions.

Looking at the difference between means (deep breathing – normal activity) for each metric, I see a decrease of only 0.5 mmHg for systolic pressure, an increase of 1.1 mmHg for diastolic pressure, and an increase of 1.4 bpm for pulse. In all cases, the 95% CI for the difference of means overlaps zero.
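The comparison here can be sketched in a few lines. This is a minimal illustration with made-up numbers (not my actual readings), using a normal approximation for the 95% CI of a difference of means:

```python
# Minimal sketch of the difference-of-means comparison.
# The readings below are hypothetical, not the actual data.
from math import sqrt
from statistics import mean, stdev

def diff_ci_95(a, b, z=1.96):
    """Approximate 95% CI for mean(a) - mean(b) (normal approximation)."""
    d = mean(a) - mean(b)
    se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return d - z * se, d + z * se

# Five daily observations per condition (each already an average of 5 readings).
deep   = [118.2, 121.0, 119.5, 120.3, 117.9]  # hypothetical systolic values
normal = [119.0, 120.5, 121.2, 118.7, 120.1]  # hypothetical systolic values

lo, hi = diff_ci_95(deep, normal)
print(f"difference of means CI: [{lo:.1f}, {hi:.1f}] mmHg")
```

If the interval contains zero, as it does for these hypothetical numbers, the difference is consistent with no effect at the p = 0.05 level.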

Since the measured effects are below my target for “clinical” significance and have a low probability of reaching the target with a larger sample size, it looks like deep breathing doesn’t meaningfully lower my blood pressure.

As mentioned in the background section, there are numerous published studies showing moderate effect sizes (3-5 mmHg) and statistically significant blood pressure drop during deep breathing for large patient populations. While my experiments indicate that this doesn’t work for me, it doesn’t mean the literature is mistaken. Some hypotheses:

  • Most literature experiments were done in a clinical environment during the day. Due to the environment, the patients might have been more stressed, which raises blood pressure and could be mitigated by the deep breathing.
  • My baseline stress may be lower than average and therefore methods to reduce stress (e.g. deep breathing) have a reduced effect on me.
  • I breathe more deeply during normal activity than average.
  • Other natural person-to-person variation
    • This is obviously a catch-all, but in the published studies, it was not the case that every patient showed a drop in blood pressure, just that there was a drop on average.

Conclusions & Next Experiments

It looks like deep breathing doesn’t meaningfully lower my blood pressure. The measured effects are below my target for “clinical” significance and have a low probability of reaching the target with a larger sample size.

Given that, I’m not going to continue testing the short-term effect of deep breathing on blood pressure. For my next experiments, I’m going to look at the following:

  • Retrospective analysis of self tracking data
    • I’ve finished the analysis and just need to write it up for posting. There were no effects that were practically meaningful and statistically significant, but a few things were worth keeping an eye on.
  • Inspiratory muscle training:
    • On my last post u/OrganicTransistor suggested trying to strengthen my respiratory muscles based on the results in this paper. I’m going to replicate their protocol as best I can (pre-registration to follow in another post).
    • This study will take six weeks, but I will do an interim analysis every two weeks.
  • Increasing my Potassium:Sodium ratio
    • Still figuring out how to test this in a rigorous way. Will pre-register as soon as I work it out.


– QD


Methods

Pre-registration

Here & here.

Differences from original pre-registration:

  • Instead of using Student’s t-test, I compared 95% confidence intervals between conditions (mathematically equivalent for a threshold of p = 0.05)


Blinding

This experiment was not blinded.


Procedure

  • Each morning at ~6am, I measured my blood pressure before and after the following protocols:
    • 8s inhale, 8s exhale, 5 min.
    • Normal activity, 5 min.
  • Breath timing was controlled using the iBreath app.
  • Blood pressure measurements were performed using an Omron Evolve blood pressure meter.
    • For each measurement, I placed the meter on my left arm, ~4 cm above my elbow. Measurements were taken seated, with my feet on the ground and arms resting on a flat surface at a comfortable height (same every time).
    • 5 measurements were taken with no pause in between (other than to write down the result), and the average of the 5 measurements was used.


Data Visualization

Data was visualized using Tableau.


Data



Reducing Blood Pressure using Deep Breathing: No Statistically Significant Effect Observed in First Attempt (Data too noisy)



For my studies to determine ways to reduce my blood pressure, the first intervention I’m testing is deep breathing protocols. Here I report my initial results:

Summary

  • Background:
  • Approach:
    • Blood pressure and pulse were measured each morning before and after the following protocols:
      • 8s inhale, 8s exhale, 5 min.
      • Normal activity, 5 min.
      • 8s inhale, 8s exhale, 15 min.
      • Normal activity, 15 min.
    • Each protocol/time combination was measured 5 times.
    • Average and 95% confidence intervals were compared for each metric & protocol.
  • Results & Conclusions:
    • For each time condition, a blood pressure drop was observed on average during deep breathing, while an increase was observed during normal activity. The opposite effect was observed for pulse (increased during deep breathing).
    • Due to the high variance in the measurements, the 95% confidence interval for the difference overlaps zero, so the results are not statistically significant and could easily be due to chance.
  • Next Steps:
    • I will repeat the experiments, but measure blood pressure 5 times for each observation, increasing measurement precision.
    • For these experiments, I will test only 5 min. deep breathing and normal activity, but run 10 trials of each, with an interim analysis at 5 trials each.

Details

Purpose

  • To determine the effect of deep breathing protocols on short-term blood pressure.

Background

For additional background, see previous post.


Results & Discussion

Caveat

All of these experiments were done before I tested the repeatability of my blood pressure meter, and I only took one measurement per observation (i.e. one measurement before and one after each period). This was a big mistake on my part, as the variance between measurements was way too high and no results are statistically significant (i.e. they could easily be due to chance).

Given this, please take all discussion/conclusions presented here as only suggestive for further experiments. I will be repeating this work with 5 measurements per observation.

Blood Pressure & Pulse Change during the Interventions

First, let’s take a look at the change in blood pressure during each session. As shown in both the table and graphs above, on average:

  • Systolic pressure dropped in both the 5 & 15 min. deep breathing conditions, while it increased during normal activity.
  • Diastolic pressure dropped in the 5 min. deep breathing condition, increased during 15 min., and increased at both durations during normal activity
  • Pulse increased during 5 min. deep breathing, dropped during 15 min., and dropped at both durations during normal activity

As discussed above, the 95% confidence interval overlaps zero for all of these measurements, so the results could easily be due to chance. However, they are consistent with my initial exploratory measurements.

Looking at the difference between means for each time condition, I see a drop of ~2.5 mmHg for systolic pressure, ~2 mmHg for diastolic, and an increase of ~2 bpm for pulse for deep breathing vs. normal activity. Again, 95% CI overlaps zero for all conditions, but the effect size is on the edge of worthwhile (I had pre-registered that I would follow up on effect sizes >3 mmHg).


Conclusions & Next Experiments

Given the high observed variance, I am going to repeat these experiments with my new measurement protocol (5 measurements/observation). I have already started these experiments and should have them completed in ~2 weeks. In the meantime, I will finish up analyzing my historical data and report that out next week.


– QD


Methods

Pre-registration

Here.

Differences from original pre-registration:

  • Instead of using Student’s t-test, I compared 95% confidence intervals between conditions (mathematically equivalent for a threshold of p = 0.05)


Blinding

This experiment was not blinded.


Procedure

  • Each morning at ~6am, I measured my blood pressure before and after the following protocols:
    • 8s inhale, 8s exhale, 5 min.
    • Normal activity, 5 min.
    • 8s inhale, 8s exhale, 15 min.
    • Normal activity, 15 min.
  • Breath timing was controlled using the iBreath app.
  • Blood pressure measurements were performed using an Omron Evolve blood pressure meter.
    • For each measurement, I placed the meter on my left arm, ~4 cm above my elbow. Measurements were taken seated, with my feet on the ground and arms resting on a flat surface at a comfortable height (same every time).


Data Visualization

Data was visualized using Tableau.


Data



Reducing Blood Pressure without Medication Phase 0: Measurement Repeatability & Reproducibility



For my studies to determine interventions to reduce my blood pressure, the main measurement device I’ll be using will be an Omron Evolve blood pressure monitor. In order to understand the measurements I make, I’m going to need to know the repeatability and reproducibility of the device as well as any systematic biases in the measurements. Notably, some people see blood pressure readings drop with repeat measurements (see comment from Gary Wolf here) and I need to know the magnitude of the effect for any paired sample testing over short time periods.

To measure the repeatability and reproducibility of my Omron Evolve blood pressure meters, I tested (details below):

  • Repeatability: 19 sets of 5 measurements on the same meter
  • Reproducibility: 56 paired measurements on two different meters (one immediately following the other)

Here’s what I found.

Summary

  • Experiments:
    • Repeatability: 19 sets of 5 measurements on the same meter
    • Reproducibility: 56 paired measurements on two different meters (one immediately following the other)
  • Results:
    • Within-meter standard deviation was ~3 mmHg, which is high compared to my target reduction of 10 mmHg.
    • I see a drop in blood pressure with repeat readings, but it’s relatively small (~0.5-1 mmHg/measurement over 5 measurements), and safe to ignore.
    • There’s no detectable difference between my two meters. Since the older one has been used for ~4 months, that indicates that there’s likely no change in the meter over time.
  • Conclusions:
    • Given the high variance vs. my target change in blood pressure, going forward I will take sets of 5 measurements for every observation.
    • This gives an estimated 95% CI of 2.6 mmHg systolic. Still higher than I’d like, but it should allow me to identify reasonable effect sizes (I’ll, of course, need to do power calculations for each planned experiment).

Details

Purpose

  • To determine the repeatability & reproducibility of blood pressure measurements using my Omron Evolve blood pressure meters.
  • To quantify the drop in blood pressure with repeat measurements at the same sitting.

Background

See previous post.


Results & Discussion

Within-meter Repeatability

First, let’s take a look at the within-meter precision. The pooled standard deviation over 19 sets of 5 measurements was 2.5-3.5 mmHg (95% CI) for systolic and a bit lower for diastolic. This means that for a single-point measurement, I’d have a 95% confidence interval of ~6 mmHg, larger than most effect sizes seen for BP interventions and half the reduction I need to get to normal blood pressure.

To quantify the drop in blood pressure with repeat measurement, I looked at both the initial drop (1st – 2nd measurement) and the slope over all 5 measurements. I observed a drop for systolic and diastolic pressure in both cases. Only the diastolic slope was statistically significant (95% CI does not overlap 0), but given that I see an effect for all four metrics and of consistent magnitude, the drop is likely real. That said, the drop is only ~0.5-1 mmHg/measurement, small enough to safely ignore for most experiments I plan to do.
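For anyone wanting to reproduce this, the pooled-SD arithmetic can be sketched as below. The measurement sets are hypothetical stand-ins, not the real 19 sets:

```python
# Pooled within-meter SD across sets of repeat measurements, and the
# shrinkage of the 95% CI when averaging 5 readings. Hypothetical data.
from math import sqrt
from statistics import variance

def pooled_sd(sets):
    """Pooled standard deviation across several sets of repeats."""
    num = sum((len(s) - 1) * variance(s) for s in sets)
    den = sum(len(s) - 1 for s in sets)
    return sqrt(num / den)

sets = [  # hypothetical systolic repeat-measurement sets
    [120, 117, 123, 119, 121],
    [125, 122, 126, 124, 120],
    [118, 121, 117, 122, 119],
]
sd = pooled_sd(sets)
ci_single = 1.96 * sd          # 95% half-width for a single reading
ci_avg5 = 1.96 * sd / sqrt(5)  # 95% half-width for the mean of 5 readings
print(f"pooled SD = {sd:.2f} mmHg, single reading ±{ci_single:.1f}, mean of 5 ±{ci_avg5:.1f}")
```

With the ~3 mmHg SD reported here, 1.96 × 3 / √5 ≈ 2.6 mmHg, which is where the 2.6 mmHg estimate in the conclusions comes from.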


Between-meter Reproducibility

Next, let’s look at the variation between meters. For this experiment, I used an older meter that I’ve been using daily for ~4 months and compared it to a newer meter of the same make/model that I bought when I mistakenly thought I had lost the original.

For the 56 paired reproducibility measurements, I alternated which meter I used first, giving me another data set to test for a drop in reading with repeat measurements. In this case, I saw a drop with diastolic pressure, 1.4 mmHg [0.4, 2.4 95% CI], but not systolic pressure, -0.3 [-1.4, 0.8 95% CI]. However, the confidence intervals are consistent with the previous measurements, again indicating the effect is likely real.

Comparing the two meters, there’s no measurable difference. Average difference is <0.3 mmHg with 95% confidence intervals comfortably overlapping zero. Since the older one has been used for ~4 months, that also indicates that there’s likely no change in the meter over time.


Conclusions & Next Experiments

Given the high observed variance, going forward I will start measuring sets of 5 repeat measurements for each observation. This gives an estimated 95% CI of 2.6 mmHg systolic. Still higher than I’d like, but it should allow me to identify reasonable effect sizes (I’ll, of course, need to do power calculations for each planned experiment).

Unfortunately, I’ve already finished my initial testing of deep breathing protocols using only single-point measurements. I’ll go ahead and analyze that data, but if the results are inconclusive, I will repeat the experiment with this new protocol.


– QD


Methods

Pre-registration

This experiment was not pre-registered.


Blinding

This experiment was not blinded.


Procedure

  • General:
    • Blood pressure measurements were performed using an Omron Evolve blood pressure meter.
    • For each measurement, I placed the meter on my left arm, ~4 cm above my elbow. Measurements were taken seated, with my feet on the ground and arms resting on a flat surface at a comfortable height (same every time).
  • Repeatability
    • For 8 days, whenever I measured my blood pressure, I would repeat the measurement 5 times, with no breaks in between measurements.
  • Reproducibility
    • For 14 days, whenever I measured my blood pressure, I would repeat the measurement twice, once with each of two meters.


Data Visualization

Data was visualized using Tableau.


Data



Blood Glucose Testing of Whole Foods: Initial Results & Request for Suggestions



This post is an update on my experiments measuring the effect of low-carb foods and dietary supplements on blood sugar.

I’m still working my way through whole foods, but it’s going to take a while to get through them all.

In the meantime, I wanted to share my preliminary results and see if anyone has suggestions/requests for what I should include.

If you have any whole foods you like or would like to see tested, please post them in the comments or send me a PM.


Testing Queue:


Whole Foods

For the last several months I’ve been testing the blood glucose impact of tons of different low-carb prepared foods and ingredients. While those tests have been very informative and uncovered a number of surprises (especially around what fibers do/don’t impact my blood glucose), most of what I eat is food I prepare myself using regular meats, vegetables, nuts, and seeds.

Given that, I wanted to test the blood glucose impact of regular foods and see how it compares to their macronutrients (total carbs, net carbs, protein, etc.). Towards that end, I’m going to test as many low-carb foods as I can, then see if I can determine any consistent trends.

So far, I’ve tested 15 foods from 4 categories:

The initial results have been pretty interesting. Here are the key insights:

  • All foods tested so far had very low BG impact, so the nutrition labels must be accurate and all of the fibers must be relatively indigestible.
  • The vegetables were the lowest impact per gram, largely due to being such a high percentage of water. I was really shocked by how much I could eat (250g mushrooms, 434g celery).
    • If you look at BG impact per calorie, of course, the trend flips, with meat, fish, and nuts having much lower impact than vegetables.
  • I was also pleasantly surprised by how much I could eat of the lowest carb fruits. Raspberries, blackberries, and strawberries were pretty similar to meats on a per gram basis (though not per calorie). I think I’ll try adding some into recipes in small quantities.
  • The zero carb foods (lupini, sacha inchi, salmon, tuna, pork cracklings) still had a noticeable BG impact, presumably coming from the protein content. Once I have more data, I’ll try to fit a model for BG impact as a function of carbs, protein, and fat. It will be interesting to see if there are any interaction effects.

As mentioned above, there are so many different foods to test that it’s going to take me a while to get a comprehensive set tested. Once I do, I’ll post a full update with a more detailed analysis.

In the meantime, since I’ve gotten such great recommendations from the readers, I wanted to solicit suggestions for additional foods to add to this study.

If you have any whole foods you like or would like to see tested, please post them in the comments or send me a PM.

I’ll test all the requests over the next couple weeks and post the results.


– QD



Using Chess Puzzles to Assess Cognition: Exploratory Analysis of CO2 and other Mediators Shows Suggestive, but not Conclusive Effects



About three months ago, Scott Alexander from Astral Codex Ten posted an observational study looking at his performance on WordTwist as a function of CO2 level. In a dataset of ~800 games, he saw no correlation between his relative performance (vs. all players) and CO2 levels (R = 0.001, p = 0.97).

This was in stark contrast to a study by Fisk and co-workers, which found that CO2 levels of 1,000 and 2,500 ppm significantly reduced cognitive performance across a broad range of tasks.

I was really interested to see this. Back in 2014, I started a company, Mosaic Materials, to commercialize a CO2 capture material. At the time, a lot of people I talked with were excited about this study, but I was always really suspicious of the effect size. Since then, studies have come out that both did and did not observe this effect, though the lack of greater follow up further increased my skepticism.

In addition to being curious regarding the effect of CO2 on cognition, I found the idea of using simple, fun games to study cognitive effects to be extremely interesting. Since even small cognitive effects would be extremely important/valuable, a quick, fun-to-use test like WordTwist would allow for the required large dataset.

I don’t enjoy word games, but Scott pointed to a post on LessWrong by KPier that suggested using Chess, which I play regularly. Actual games seemed too high variance and time consuming, but puzzles seemed like a good choice.

Based on all that, I got a CO2 meter and started doing 10 chess puzzles every morning when I woke up, recording the CO2 level in addition to all my standard metrics. So far, I have ~100 data points, so I did an interim analysis to see if I could detect any significant correlations.

Here’s a summary of what I found:

  • Chess puzzles are a low-effort (for me) but high-variance and streaky measure of cognitive performance
    • Note: I didn’t test whether performance on chess puzzles generalizes to other cognitive tasks
  • No statistically significant effects were observed, but I saw modest effect sizes and p-values for:
    • CO2 Levels >600 ppm:
      • R2 = 0.14
      • p = 0.067
    • Coefficient of Variation in blood glucose
      • R2 = 0.079
      • p = 0.16
  • The current sample size is underpowered; I likely need 3-4x as much data to reliably detect the effect sizes I’m looking for.
  • Given how many correlations I looked at, the lack of pre-registration of analyses, and the small number of data points, these effects are likely due to chance/noise in the data, but they’re suggestive enough for me to continue the study.


Next Steps

  • Continue the study with the same protocol. Analyze the data again in another 3 months.


Questions/Requests for assistance:

  • My variation in rating has long stretches of better or worse than average performance that seem unlikely to be due to chance. Does anyone know of a way to test if this is the case?
  • Any statisticians interested in taking a deeper/more rigorous look at my data or have advice on how I should do so?
  • Any suggestions on other quick cognitive assessments that would be less noisy?


– QD


Details

Purpose

  • To determine if any of the metrics I track correlates with chess puzzle performance.
  • To assess the usefulness of Chess puzzles as a cognitive assessment.

Background

About three months ago, Scott Alexander from Astral Codex Ten posted an observational study looking at his performance on WordTwist as a function of CO2 level. In a dataset of ~800 games, he saw no correlation between his relative performance (vs. all players) and CO2 levels (R = 0.001, p = 0.97).

This was in stark contrast to a study by Fisk and co-workers, which found that CO2 levels of 1,000 and 2,500 ppm significantly reduced cognitive performance across a broad range of tasks.

I was really interested to see this. Back in 2014, I started a company, Mosaic Materials, to commercialize a CO2 capture material. At the time, a lot of people I talked with were excited about this study, but I was always really suspicious of the effect size. Since then, studies have come out that both did and did not observe this effect, though the lack of greater follow up further increased my skepticism.

In addition to being curious regarding the effect of CO2 on cognition, I found the idea of using simple, fun games to study cognitive effects to be extremely interesting. Since even small cognitive effects would be extremely important/valuable, a quick, fun-to-use test like WordTwist would allow for the required large dataset.

I don’t enjoy word games, but Scott pointed to a post on LessWrong by KPier that suggested using Chess, which I play regularly. Actual games seemed too high variance and time consuming, but puzzles seemed like a good choice.

Based on all that, I got a CO2 meter and started doing 10 chess puzzles every morning when I woke up, recording the CO2 level in addition to all my standard metrics. So far, I have ~100 data points, so I did an interim analysis to see if I could detect any significant correlations.


Results & Discussion

Performance vs. Time

Before checking for correlations, I first looked at my puzzle performance over time. As shown above (top left), over the course of this study, my rating improved from 1873 to 2085, a substantial practice effect. To correct for this, all further analyses were done using the daily change in rating.

Looking at the daily change, we see a huge variation:

  • Average = 3
  • 1σ = 29

Moreover, the variation is clearly not random, with long stretches of better or worse than average performance that seem unlikely to occur by chance (does anyone know how to test for this?).
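One standard answer to that question is a Wald-Wolfowitz runs test: classify each day as above or below the median and check whether there are fewer sign runs than chance would produce. A minimal sketch on a deliberately streaky, hypothetical series:

```python
# Wald-Wolfowitz runs test: are there fewer above/below-median runs
# than expected by chance? The series below is hypothetical and streaky.
from math import erfc, sqrt
from statistics import median

def runs_test(xs):
    """Two-sided runs test on signs relative to the median; returns (runs, p)."""
    m = median(xs)
    signs = [x > m for x in xs if x != m]
    n1 = sum(signs)          # days above the median
    n2 = len(signs) - n1     # days below the median
    runs = 1 + sum(a != b for a, b in zip(signs, signs[1:]))
    mu = 2 * n1 * n2 / (n1 + n2) + 1
    var = (mu - 1) * (mu - 2) / (n1 + n2 - 1)
    z = (runs - mu) / sqrt(var)
    return runs, erfc(abs(z) / sqrt(2))  # two-sided normal p-value

# Hypothetical daily rating changes: a long good stretch, then a long bad one.
daily_change = [5, 8, 3, 9, 6, 7, 4, 8, -6, -9, -4, -7, -5, -8, -3, -6]
runs, p = runs_test(daily_change)
print(f"{runs} runs, p = {p:.4f}")  # few runs -> streakier than chance
```

The normal approximation is rough for short series; shuffling the data and recounting runs gives an exact permutation alternative.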

All this points to Chess puzzles not being a great metric for cognitive performance (high variance, streaky), but I enjoy it and therefore am willing to do it long-term, which is a big plus.


CO2 Levels

During the course of this study, CO2 levels varied from 414 to 979 ppm. Anecdotally, this seemed to be driven largely by how many windows were open in my house, which was affected by the outside temperature. Before October, when it was relatively warm and we kept the windows open, CO2 levels were almost exclusively <550 ppm. After that, it got colder and we tended to keep the windows closed, leading to much higher and more varied CO2 levels.

Unfortunately for the study, the CO2 levels I measured were much lower than those seen by Scott Alexander and tested by Fisk and co-workers. In particular, Fisk and co-workers only compared levels of 600 ppm to 1000 & 2500 ppm.

Given this difference in the data, I performed a regression analysis on both my full dataset and the subset of data with CO2 > 600 ppm. The results are shown below:

For the full dataset, I see a small effect size (R2 = 0.02) with p=0.19. Restricting to only CO2 > 600 ppm, the effect size is much larger (R2 = 0.14), with p = 0.067. Given how many comparisons I’m making, the lack of pre-registration of the CO2 > 600 ppm filter, and the small number of data points (only 27 samples with CO2 > 600), this is likely due to chance/noise in the data, but it’s suggestive enough for me to continue the experiment. We’ve got a few more months of cold weather, so I should be able to collect a decent number of samples with higher CO2 values.
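This kind of split regression can be sketched as below, with hypothetical data points (not my measurements) and R2 computed as the squared Pearson correlation:

```python
# R^2 (squared Pearson correlation) on the full dataset vs. the
# CO2 > 600 ppm subset. All data points here are hypothetical.

def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)

co2    = [430, 450, 520, 480, 610, 650, 700, 820, 900, 950]  # ppm, hypothetical
rating = [ 10,  -5,   3,   8,  -2,  -8, -12, -15, -20, -25]  # daily change, hypothetical

r2_full = r_squared(co2, rating)
subset = [(c, r) for c, r in zip(co2, rating) if c > 600]
r2_high = r_squared([c for c, _ in subset], [r for _, r in subset])
print(f"full: R2 = {r2_full:.2f}, CO2 > 600 ppm: R2 = {r2_high:.2f}")
```

With real data, a p-value for each fit (e.g. from a t-test on the slope) is needed on top of R2, since a small subset can show a large R2 by chance.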


Sleep

Since I had all this puzzle data, I decided to check for correlations with all the other metrics I track. Intuitively, sleep seemed like it would have a large effect on cognitive ability, but the data shows otherwise. Looking at time asleep from both my Apple Watch and manual recording, I see low R2 (0.035 & 0.01) with p-values of 0.10 and 0.34, respectively. Moreover, the trend is in the opposite direction from expected, with performance getting worse with increasing sleep.

I was surprised not to see an effect here. It’s possible this is due to the lack of reliability in my measurement of sleep. Neither the Apple Watch nor manual recording is particularly accurate, which may obscure smaller effects. I have ordered an Oura Ring 3, which is supposed to be much more accurate. I’ll see if I can measure an effect with that.

The other possibility is that since I’m doing the puzzles first thing in the morning, when I’m most rested, sleep doesn’t have as strong an effect. I could test this by also doing puzzles in the evening, but not sure whether I’m up for that…


Blood Pressure & Pulse

Not much to say for blood pressure. R2 was extremely small and p-values were extremely high for all metrics. Clearly no effect of a meaningful magnitude.


Blood Glucose

With the exception of coefficient of variation, no sign of an impact of blood glucose on puzzle performance (low R2, high p-value). For coefficient of variation, there was only a modest R2 of 0.079 and a p-value of 0.16. Still likely to be chance, especially with the number of comparisons I’m doing, but worth keeping an eye on as I collect more data.

Similar to sleep, I was surprised not to see an effect here. Low blood glucose is widely reported to impair cognitive performance, every doctor I’ve been to since getting diabetes has commented on low blood sugar impairing cognitive performance, and subjectively I feel as though I’m thinking less clearly when my blood sugar is outside my normal range and am worn out by it for a while after the fact.

All that said, as mentioned in the section on sleep, doing the puzzles first thing in the morning, when I’m most rested, might be masking the effect. The only way I can think to test this is to do puzzles in the evening, but that’s much less convenient.


Power Analysis

One concern with all of these analyses is whether the study had sufficient power to detect an effect. To check this, I looked at the statistical power for the observed sample and effect sizes.

For sample size, there were 100 total samples, 88 with CO2 measurements and 27 with CO2 levels >600 ppm. With 88 samples, there was a ~90% chance of detecting an R2 of 0.1, but this dropped to only ~40% with 27 samples. Given that R2 = 0.1 would be a practically meaningful effect size for the impact of natural variation in room atmosphere on cognitive ability, it’s not surprising that the CO2 analyses did not reach statistical significance, and substantially more data is needed to rule out an effect.

In terms of detectable effect sizes, 88 samples gives a pretty good chance of detecting R2 = 0.1 (~90%), but the power drops rapidly below that, with an R2 of 0.025 having a power of only ~35%. Again, given the practical importance of cognitive performance, I’m interested in detecting small effect sizes, so it seems worthwhile to collect more data, especially as I enjoy the chess puzzles and am already collecting all the other metrics.
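These power figures can also be reproduced from scratch with the noncentral F distribution, assuming the standard setup for a one-predictor regression: noncentrality n * f2 with Cohen’s f2 = R2 / (1 - R2). A sketch:

```python
from scipy import stats

def power_r2(n, r2, alpha=0.05):
    """Power of the F-test for R2 in simple linear regression
    with one predictor (degrees of freedom 1 and n - 2)."""
    f2 = r2 / (1 - r2)  # Cohen's f2 effect size
    ncp = n * f2        # noncentrality parameter
    fcrit = stats.f.ppf(1 - alpha, 1, n - 2)
    return 1 - stats.ncf.cdf(fcrit, 1, n - 2, ncp)

print(f"n=88, R2=0.100: {power_r2(88, 0.100):.2f}")
print(f"n=27, R2=0.100: {power_r2(27, 0.100):.2f}")
print(f"n=88, R2=0.025: {power_r2(88, 0.025):.2f}")
```

These land close to the ~90%, ~40%, and ~35% figures quoted above.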


Conclusions & Next Experiments

Conclusions

  • Chess puzzles are a low-effort (for me) but high-variance, streaky measure of cognitive performance
    • Note: I didn’t test whether performance on chess puzzles generalizes to other cognitive tasks
  • No statistically significant effects were observed, but I saw modest effect sizes and p-values for:
    • CO2 Levels >600 ppm:
      • R2 = 0.14
      • p = 0.067
    • Coefficient of Variation in blood glucose
      • R2 = 0.079
      • p = 0.16
  • The current sample size is underpowered to detect the effect sizes I’m looking for; I likely need 3-4x as much data to detect them reliably.
  • Given how many correlations I looked at, the lack of pre-registration of analyses, and the small number of data points, these effects are likely due to chance/noise in the data, but they’re suggestive enough for me to continue the study.


Next Steps

  • Continue the study with the same protocol. Analyze the data again in another 3 months.


Questions/Requests for assistance:

  • My variation in rating has long stretches of better or worse than average performance that seem unlikely to be due to chance. Does anyone know of a way to test if this is the case?
  • Any statisticians interested in taking a deeper/more rigorous look at my data or have advice on how I should do so?
  • Any suggestions on other quick cognitive assessments that would be less noisy?
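On the streakiness question, one standard option is the Wald-Wolfowitz runs test: classify each day as above or below the median rating and ask whether there are fewer unbroken runs than independence would produce. A minimal sketch (the streaky input is made up for illustration):

```python
import numpy as np
from scipy import stats

def runs_test(x):
    """Wald-Wolfowitz runs test: are there fewer (or more) runs of
    above/below-median values than expected under independence?
    Returns (z, two-sided p); z well below 0 indicates streakiness."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    signs = (x > med)[x != med]  # drop values tied with the median
    n1 = int(signs.sum())        # days above the median
    n2 = int((~signs).sum())     # days below the median
    n = n1 + n2
    runs = 1 + int(np.sum(signs[1:] != signs[:-1]))
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n**2 * (n - 1))
    z = (runs - mean) / np.sqrt(var)
    return z, 2 * stats.norm.sf(abs(z))

# A deliberately streaky sequence: 10 good days, then 10 bad days.
z, p = runs_test([1] * 10 + [0] * 10)
```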


– QD


Methods

Pre-registration

My intention to study the effect of CO2 on cognition was pre-registered in the ACX comment section, but I never ended up pre-registering the exact protocol or analysis.

Differences from the original pre-registration:

  • I only used chess puzzles to assess cognition and did not include working memory or math tests.
  • I evaluated other mediators (blood pressure, blood glucose, and sleep) in addition to CO2 levels.


Procedure

  • Chess puzzles:
    • Each morning, ~15 min. after I woke up, I played 10 puzzles on Chess.com and recorded my final rating.
    • No puzzles were played on Chess.com at any other time, though I occasionally played puzzles on other sites.
  • Manual measurements:
    • Manual recording of sleep, blood pressure, and pulse was performed upon waking, before playing the chess puzzles.
    • CO2 was recorded immediately after completion of the chess puzzles.


Measurements

  • Chess puzzles: Chess.com iPhone app
  • Sleep: Apple Watch + AutoSleep app
  • Blood glucose: Dexcom G6 CGM
  • Blood pressure & pulse: Omron Evolve 


Analysis & Visualization

  • Sleep and blood glucose data was processed using custom Python scripts (sleep, Dexcom)
  • Linear regression was performed using the analysis function in Tableau.
  • Data was visualized using Tableau.
  • Power calculations for linear regression were performed using Statistics Kingdom with alpha=0.05 and digits=10.

Data

