Accuracy in Parameter Estimation and Simulation Approaches for Sample Size Planning

Erin M. Buchanan

Harrisburg University

Power and Sample Size Planning

  • Sample Size Planning: New Tools and Innovations
    • Accuracy in Parameter Estimation and Simulation Approaches for Sample Size Planning, Erin M. Buchanan
    • Power Analyses for Interaction Effects in Observational Studies, David A. Baranger
    • Empowering Sample Size Justification with the Superpower R Package, Aaron Caldwell

A Blender Mix

  • Accuracy in Parameter Estimation and Simulation Approaches for Sample Size Planning
  • How we took a bunch of interesting ideas and mixed them together

Sample Size Planning

  • Sample size planning is often thought of as “point and click”
    • G*Power: https://www.psychologie.hhu.de/arbeitsgruppen/allgemeine-psychologie-und-arbeitspsychologie/gpower
    • https://jakewestfall.shinyapps.io/pangea/
    • https://pwrss.shinyapps.io/index/
    • https://designingexperiments.com/
  • For many analysis plans, sample size planning is technically a closed-form solution
  • There are many excellent R packages for sample size planning, such as pwr (see the sketch after this list)
  • So, why do we need new innovations for power?
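As an example of those closed-form tools, a one-call calculation with the pwr package (the effect size and targets here are arbitrary illustrations):

library(pwr)
pwr.t.test(d = 0.5, # expected standardized mean difference
  power = 0.80, # desired power
  sig.level = 0.05, # alpha
  type = "two.sample") # independent samples t-test
# n = 63.77 -> plan on 64 participants per group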

The Need

  • TOP (Transparency and Openness Promotion) movement + pre-registration + grants + registered reports = need for power analyses
  • Power analyses are just our best guesses and are likely wrong
  • Many-analysts studies show that one design does not map onto one correct analysis
  • The smallest effect size of interest may be unknown
  • Some research papers do not have one specific hypothesis (e.g., dataset-creation projects)
  • Once you leave the t-test behind, power becomes more complicated and is often simulation-based (a sketch follows this list)
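As a flavor of the simulation-based approach, a minimal power simulation for the same two-sample t-test (all values are arbitrary illustrations):

set.seed(1)
mean(replicate(2000, { # 2000 simulated experiments
  x <- rnorm(64) # control group, n = 64
  y <- rnorm(64, mean = 0.5) # treatment group, shifted by d = 0.5
  t.test(x, y)$p.value < .05 # was this experiment significant?
})) # proportion significant = estimated power, ~.80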

Our Use Case

  • Research studies that use many items to assess the parameter of interest
  • Research studies designed to collect data on many items and share the data
  • We should be careful not to assume all items are equal …
  • And move away from using item-level averages as parameters of interest

Combining Toolkits

  • Accuracy in Parameter Estimation (AIPE): finding the sample size that allows for “accurately measured” parameters (see the sketch after this list)
    • Determine a “sufficiently narrow” confidence interval around your parameter
    • Determine the sample size that should provide that CI
  • Bootstrapping (sort of) and Simulation
    • Take pilot data and simulate various sample sizes by bootstrapping your sample
    • Use this technique to find the sample size that yields a “sufficiently narrow” CI for items
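A minimal sketch of the AIPE idea for a single mean, assuming a normal approximation and a target full CI width (an illustration of the logic, not the package's method):

aipe_n_mean <- function(sd, target_width, conf = .95) {
  z <- qnorm(1 - (1 - conf) / 2) # critical value, e.g., 1.96
  ceiling((2 * z * sd / target_width)^2) # full CI width = 2 * z * sd / sqrt(n)
}
aipe_n_mean(sd = 50, target_width = 20) # e.g., RTs with SD = 50 ms, +/- 10 ms
# [1] 97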

Sequential Testing

  • Sequential testing: check whether the parameter of interest has reached the intended CI precision
    • After each participant
    • At regular intervals during data collection
  • Benefits:
    • Maximizes the usefulness of data collection
  • Cons:
    • Usually requires coding skills (a sketch follows below)
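A hedged sketch of such a check in R (the function and object names here are assumptions for illustration, not part of semanticprimeR):

check_stop <- function(data, score, items, cutoff, prop_needed = .90) {
  ses <- tapply(data[[score]], data[[items]], # SE of each item
    function(x) sd(x) / sqrt(length(x)))
  mean(ses <= cutoff) >= prop_needed # TRUE when enough items are precise
}
# e.g., after every batch of participants:
# if (check_stop(current_data, "RT", "Stimulus", cutoff$cutoff)) stop_collection()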

Proposed Method

Proposed Procedure for Powering Studies with Multiple Items

  1. Use representative pilot data.
  2. Calculate the standard error (SE) of each item in the pilot data. Use the 40th percentile of these item SEs as the cutoff and stopping rule.
  3. Create bootstrapped samples of your pilot data, starting with at least 20 participants and increasing to a maximum number of participants.
  4. Calculate the SE of each item in the bootstrapped data. From these values, calculate the percent of items below the cutoff from Step 2.
  5. Determine the sample sizes at which 80%, 85%, 90%, and 95% of items fall below the cutoff. Use the correction formula to adjust the proposed sample size for pilot sample size, power, and proportion of variability.
  6. Report all values. Designate one as the minimum sample size, the cutoff as the stopping rule for adaptive designs, and the largest as the maximum sample size.

Package

  • Upcoming package semanticprimeR as part of a larger project
  • devtools::install_github("SemanticPriming/semanticprimeR")
  • Functions for each step of the proposed process
  • Functionality for when you have pilot data and when you do not (i.e., you can simulate example multiple-item data)
  • As part of the manuscript and semanticprimeR package, we provide 12+ examples online
  • Psycholinguistics, social psychology, COVID-related research, and traditional cognitive psychology

Example: Step 1 (Pilot Sample)

  • You want to run a lexical decision project measuring response latencies for concrete and abstract words
  • You can use the English Lexicon Project as pilot data + previous publications of concreteness ratings
  • In these studies, we also have to factor in data loss!
    • Combined data include 27,031 real words, filtered down to 40 selected stimuli
    • Average sample size per word: 32.67 (SD = 0.53)
    • Pilot sample size: n = 33 (a sketch of preparing these objects follows below)
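A hypothetical sketch of building the pilot objects used in the next steps (elp_use and pilot_size_e are the names the later code expects; elp_raw and selected_stimuli are placeholders, not real package objects):

library(dplyr)
elp_use <- elp_raw %>% # hypothetical full ELP trial data
  filter(Stimulus %in% selected_stimuli) # keep the 40 selected words

pilot_size_e <- elp_use %>%
  count(Stimulus) %>% # trials per word, M = 32.67
  summarize(n = round(mean(n))) %>% # rounds to 33
  pull(n)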

Example: Step 2 (Calculate Cutoff)

library(semanticprimeR)
cutoff <- calculate_cutoff(population = elp_use, # pilot data or simulated data
  grouping_items = "Stimulus", # name of the item indicator column
  score = "RT", # name of the dependent variable column
  minimum = min(elp_use$RT), # minimum possible/found score
  maximum = max(elp_use$RT)) # maximum possible/found score

cutoff$se_items # all standard errors of items
 [1]  56.83131  58.59754  38.40305  69.22966  76.96831  51.80277  89.16515
 [8]  55.81059  36.93046  80.47134  42.17122  17.72957  39.32024  46.65783
[15]  72.07065 248.68735  93.89229  89.69502  46.02416  87.82424 140.39440
[22]  24.65804  45.83884  51.05279  36.09320  56.19962  79.21760  41.87754
[29]  59.16929  32.45934  62.30085  21.44458  30.91690  37.13134  55.69565
[36]  39.11986  66.73485  77.64671  34.97541 208.74359
cutoff$sd_items # standard deviation of the standard errors
[1] 45.19364
cutoff$cutoff # cutoff score (40th percentile of the item SEs)
     40% 
46.40436 
cutoff$prop_var # proportion of possible variance 
[1] 0.02466902

Example: Step 3 (Bootstrapped Samples)

samples <- bootstrap_samples(start = 20, # starting sample size
  stop = 100, # stopping sample size
  increase = 5, # step between bootstrapped sample sizes
  population = elp_use, # population or pilot data
  replace = TRUE, # bootstrap with replacement? 
  nsim = 500, # number of simulations to run
  grouping_items = "Stimulus") # item column label  

head(samples[[1]])
# A tibble: 6 × 6
# Groups:   Stimulus [1]
  Trial  Type Accuracy    RT Stimulus  Participant   
  <int> <int>    <int> <int> <chr>     <chr>         
1  1521     1        1   563 admirable participant629
2  2512     1        1   692 admirable participant63 
3  3078     1        1   781 admirable participant102
4  2354     1        1   635 admirable participant39 
5   634     1        1   463 admirable participant344
6  2274     1        1   729 admirable participant404

Example: Steps 4-5 (Calculate Proportion)

proportion_summary <- calculate_proportion(samples = samples, # samples list
  cutoff = cutoff$cutoff, # cutoff score from Step 2
  grouping_items = "Stimulus", # item column name
  score = "RT") # dependent variable column name 

head(proportion_summary)
# A tibble: 6 × 2
  sample_size percent_below
        <dbl>         <dbl>
1          20         0.35 
2          25         0.375
3          30         0.425
4          35         0.5  
5          40         0.575
6          45         0.7  
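Conceptually, each percent_below value is the share of item SEs under the cutoff at that simulated sample size; a sketch for a single bootstrapped dataset (assuming each list element holds one simulated sample):

one_sim <- samples[[1]] # one bootstrapped n = 20 dataset
ses <- tapply(one_sim$RT, one_sim$Stimulus, # SE per item
  function(x) sd(x) / sqrt(length(x)))
mean(ses <= cutoff$cutoff) # proportion of items below the cutoff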

Example: Step 6 (Apply Correction)

corrected_summary <- calculate_correction(
  proportion_summary = proportion_summary, # prop from above
  pilot_sample_size = pilot_size_e, # number of participants in the pilot data 
  proportion_variability = cutoff$prop_var, # proportion variance from cutoff scores
  power_levels = c(80, 85, 90, 95)) # what levels of power to calculate 

corrected_summary
# A tibble: 3 × 3
  percent_below sample_size corrected_sample_size
          <dbl>       <dbl>                 <dbl>
1          82.5          80                  74.1
2          90            90                  82.3
3          90            90                  82.3

Last Thoughts

  • Use case: studies with multiple items that intend to run item-level analyses
  • Simulate only what a participant is expected to do in the study
    • Large numbers of items may bias estimates
  • Could be combined with “traditional” power analysis (a sketch follows this list)
  • Provides “well-measured” data -> not a specific decision for a specific sample
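For instance, one way to combine the two is to take the larger of the two recommendations (a sketch under that assumption; the effect size is an arbitrary illustration):

library(pwr)
n_power <- ceiling(pwr.t.test(d = 0.4, power = .90)$n) # per-group n for the test
n_aipe <- 83 # corrected n from Step 6 above (82.3, rounded up)
max(n_power, n_aipe) # plan for whichever is larger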

Thanks

  • Thanks for listening!
  • Reproducible manuscript: https://github.com/SemanticPriming/stimuli-power
  • Package: https://github.com/SemanticPriming/semanticprimeR
  • A QR code on the slide links to a copy of this talk

Simulation Method

  • To evaluate our approach, we ran a simulation study:
    • Scale size: popular cognitive scale types (1-7 Likert ratings, 0-100 percent ratings, and 0-3000 ms response latencies)
    • Item heterogeneity: small, medium, large
    • Skew: normal distributions versus skewed (ceiling) distributions
    • Pilot sample size: 20 to 100, increasing in units of 10
  • 1,620,000 simulations across the 3 × 3 × 2 × 9 design (10,000 per cell); parameter values appear below, and a sketch of one cell follows the table
Parameter Values for Data Simulation

Information                  Likert   Percent   Milliseconds
Minimum                        1.00         0              0
Maximum                        7.00       100           3000
Mu                             4.00        50           1000
Skewed Mu                      6.00        85           2500
Sigma Mu                       0.25        10            150
Sigma                          2.00        25            400
Small Sigma Sigma              0.20         4             50
Medium Sigma Sigma             0.40         8            100
Large Sigma Sigma              0.80        16            200
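As an illustration, one cell of the design might be generated as below (a sketch based on the table's labels; the generative model, item count, and truncation are assumptions):

set.seed(42)
n_items <- 30 # hypothetical number of items
n_subs <- 50 # hypothetical cell sample size
item_mu <- rnorm(n_items, mean = 4.00, sd = 0.25) # Likert Mu, Sigma Mu
item_sigma <- abs(rnorm(n_items, mean = 2.00, sd = 0.40)) # Sigma, Medium Sigma Sigma
scores <- mapply(function(m, s)
  pmin(pmax(rnorm(n_subs, m, s), 1), 7), # truncate to the 1-7 range
  item_mu, item_sigma) # returns a participants x items matrix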

Simulation Results: Scale Size (figure)

Simulation Results: Skew (figure)

Simulation Results: Item Heterogeneity (figure)

Dealing with Pilot Sample Size

  • At some point, power usually asymptotes with increasing sample size
  • So, we need a correction:

\[ 1 - \left( \sqrt{\frac{N_{Pilot} - \min(N_{Simulation})}{N_{Pilot}}} \right)^{\log_2(N_{Pilot})} \]
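In R, the correction term translates directly (a sketch; the packaged calculate_correction() applies this for you and may differ in details):

pilot_correction <- function(n_pilot, n_sim_min) {
  1 - sqrt((n_pilot - n_sim_min) / n_pilot)^log2(n_pilot)
}
pilot_correction(n_pilot = 33, n_sim_min = 20) # pilot n and smallest simulated n
# [1] 0.9046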

Researchers Have One Sample

  • Long story short: the package provides a function researchers can use to correct for pilot sample size
  • We also determined which cutoff level for “sufficiently small” likely works best: the 40% decile (model below, applied in the sketch after the table)
Parameters for 40% Decile Cutoff Scores

Term                            Estimate        SE         t        p
Intercept                        206.589   128.861     1.603     .109
Projected Sample Size              0.368     0.005    71.269   < .001
Pilot Sample Size                 -0.770     0.013   -59.393   < .001
Log2 Projected Sample Size        27.541     0.552    49.883   < .001
Log2 Pilot Sample Size             2.583     0.547     4.725   < .001
Log2 Power                       -66.151    25.760    -2.568     .010
Proportion Variability            16.405     6.005     2.732     .006
Log2 Proportion Variability       -1.367     0.382    -3.577   < .001
Power                              1.088     0.426     2.552     .011
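Applying the table's coefficients directly shows where the corrected values come from (a sketch; the coefficients are copied from the table above, and calculate_correction() wraps this model for you):

predict_corrected <- function(projected_n, pilot_n, power, prop_var) {
  206.589 +
    0.368 * projected_n - 0.770 * pilot_n +
    27.541 * log2(projected_n) + 2.583 * log2(pilot_n) -
    66.151 * log2(power) + 16.405 * prop_var -
    1.367 * log2(prop_var) + 1.088 * power
}
predict_corrected(projected_n = 80, pilot_n = 33,
  power = 80, prop_var = 0.0247) # ~74, matching the Step 6 output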