Compare commits

...

4 Commits

d0d0e0f8d4 - Add DA00 F-max, DA01 ANOVA script, knit Rmds; ignore lock files in .gitignore (#6)
Reviewed-on: #6
Co-authored-by: Irina Levit <irina.levit.rn@gmail.com>
Co-committed-by: Irina Levit <irina.levit.rn@gmail.com>
2026-02-03 15:19:27 -05:00

ira - 42ea52d859 - Merge pull request 'eohi3 var creation and recode scripts' (#5) from workB into master
Reviewed-on: #5
2026-02-02 10:26:22 -05:00

094201eeb4 - eohi3 var creation and recode scripts
2026-01-28 17:25:10 -05:00

ba54687da2 - eohi3-updates (#3)
updating eohi folder w/ third eohi exp.
Reviewed-on: #3
Co-authored-by: Irina Levit <irina.levit.rn@gmail.com>
Co-committed-by: Irina Levit <irina.levit.rn@gmail.com>
2026-01-26 16:30:09 -05:00
50 changed files with 12702 additions and 1 deletion

.gitignore vendored

@@ -1 +1,4 @@
.history/
eohi3_2.csv
*~
.~lock*

.vscode/launch.json vendored Normal file

@@ -0,0 +1,55 @@
{
// Use IntelliSense to learn about possible attributes.
// Hover to view descriptions of existing attributes.
// For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
"version": "0.2.0",
"configurations": [
{
"type": "R-Debugger",
"name": "Launch R-Workspace",
"request": "launch",
"debugMode": "workspace",
"workingDirectory": "${workspaceFolder}",
"splitOverwrittenOutput": true
},
{
"type": "R-Debugger",
"name": "Debug R-File",
"request": "launch",
"debugMode": "file",
"workingDirectory": "${workspaceFolder}",
"file": "${file}",
"splitOverwrittenOutput": true,
"stopOnEntry": false
},
{
"type": "R-Debugger",
"name": "Debug R-Function",
"request": "launch",
"debugMode": "function",
"workingDirectory": "${workspaceFolder}",
"file": "${file}",
"mainFunction": "main",
"allowGlobalDebugging": false,
"splitOverwrittenOutput": true
},
{
"type": "R-Debugger",
"name": "Debug R-Package",
"request": "launch",
"debugMode": "workspace",
"workingDirectory": "${workspaceFolder}",
"includePackageScopes": true,
"loadPackages": [
"."
],
"splitOverwrittenOutput": true
},
{
"type": "R-Debugger",
"request": "attach",
"name": "Attach to R process",
"splitOverwrittenOutput": true
}
]
}

eohi3/00 - var creation.md Normal file

@@ -0,0 +1,315 @@
# Variable Creation Scripts Documentation
This document describes the data processing scripts used to create derived variables in the EOHI3 dataset. Each script performs specific transformations and should be run in sequence.
---
## datap 04 - combined vars.r
### Goal
Combine self-perspective and other-perspective variables into single columns. For each row, values exist in either the self-perspective variables OR the other-perspective variables, never both.
### Transformations
#### Past Variables (p5 = past)
Combines `self[VAL/PERS/PREF]_p5_[string]` and `other[VAL/PERS/PREF]_p5_[string]` into `past_[val/pers/pref]_[string]`.
**Source Variables:**
- **Values (VAL)**: `selfVAL_p5_trad`, `otherVAL_p5_trad`, `selfVAL_p5_autonomy`, `otherVAL_p5_autonomy`, `selfVAL_p5_personal`, `otherVAL_p5_personal`, `selfVAL_p5_justice`, `otherVAL_p5_justice`, `selfVAL_p5_close`, `otherVAL_p5_close`, `selfVAL_p5_connect`, `otherVAL_p5_connect`, `selfVAL_p5_dgen`, `otherVAL_p5_dgen`
- **Personality (PERS)**: `selfPERS_p5_open`, `otherPESR_p5_open` (note: typo in source data), `selfPERS_p5_goal`, `otherPERS_p5_goal`, `selfPERS_p5_social`, `otherPERS_p5_social`, `selfPERS_p5_agree`, `otherPERS_p5_agree`, `selfPERS_p5_stress`, `otherPERS_p5_stress`, `selfPERS_p5_dgen`, `otherPERS_p5_dgen`
- **Preferences (PREF)**: `selfPREF_p5_hobbies`, `otherPREF_p5_hobbies`, `selfPREF_p5_music`, `otherPREF_p5_music`, `selfPREF_p5_dress`, `otherPREF_p5_dress`, `selfPREF_p5_exer`, `otherPREF_p5_exer`, `selfPREF_p5_food`, `otherPREF_p5_food`, `selfPREF_p5_friends`, `otherPREF_p5_friends`, `selfPREF_p5_dgen`, `otherPREF_p5_dgen`
**Target Variables:**
- `past_val_trad`, `past_val_autonomy`, `past_val_personal`, `past_val_justice`, `past_val_close`, `past_val_connect`, `past_val_DGEN`
- `past_pers_open`, `past_pers_goal`, `past_pers_social`, `past_pers_agree`, `past_pers_stress`, `past_pers_DGEN`
- `past_pref_hobbies`, `past_pref_music`, `past_pref_dress`, `past_pref_exer`, `past_pref_food`, `past_pref_friends`, `past_pref_DGEN`
#### Future Variables (f5 = future)
Combines `self[VAL/PERS/PREF]_f5_[string]` and `other[VAL/PERS/PREF]_f5_[string]` into `fut_[val/pers/pref]_[string]`.
**Source Variables:**
- **Values (VAL)**: `selfVAL_f5_trad`, `otherVAL_f5_trad`, `selfVAL_f5_autonomy`, `otherVAL_f5_autonomy`, `selfVAL_f5_personal`, `otherVAL_f5_personal`, `selfVAL_f5_justice`, `otherVAL_f5_justice`, `selfVAL_f5_close`, `otherVAL_f5_close`, `selfVAL_f5_connect`, `otherVAL_f5_connect`, `selfVAL_f5_dgen`, `otherVAL_f5_dgen`
- **Personality (PERS)**: `selfPERS_f5_open`, `otherPERS_f5_open`, `selfPERS_f5_goal`, `otherPERS_f5_goal`, `selfPERS_f5_social`, `otherPERS_f5_social`, `selfPERS_f5_agree`, `otherPERS_f5_agree`, `selfPERS_f5_stress`, `otherPERS_f5_stress`, `selfPERS_f5_dgen`, `otherPERS_f5_dgen`
- **Preferences (PREF)**: `selfPREF_f5_hobbies`, `otherPREF_f5_hobbies`, `selfPREF_f5_music`, `otherPREF_f5_music`, `selfPREF_f5_dress`, `otherPREF_f5_dress`, `selfPREF_f5_exer`, `otherPREF_f5_exer`, `selfPREF_f5_food`, `otherPREF_f5_food`, `selfPREF_f5_friends`, `otherPREF_f5_friends`, `selfPREF_f5_dgen`, `otherPREF_f5_dgen`
**Target Variables:**
- `fut_val_trad`, `fut_val_autonomy`, `fut_val_personal`, `fut_val_justice`, `fut_val_close`, `fut_val_connect`, `fut_val_DGEN`
- `fut_pers_open`, `fut_pers_goal`, `fut_pers_social`, `fut_pers_agree`, `fut_pers_stress`, `fut_pers_DGEN`
- `fut_pref_hobbies`, `fut_pref_music`, `fut_pref_dress`, `fut_pref_exer`, `fut_pref_food`, `fut_pref_friends`, `fut_pref_DGEN`
### Logic
- Uses self value if present (not empty/NA), otherwise uses other value
- If both are empty/NA, result is NA
- Assumes mutual exclusivity: each row has values in either self OR other, never both
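The combine rule above can be sketched in base R. This is a minimal illustration, not the script itself: `combine_self_other` and the toy vectors are hypothetical stand-ins for one self/other column pair.

```r
# Combine one self/other column pair: self wins when present,
# otherwise fall back to other; both missing yields NA.
combine_self_other <- function(self_vals, other_vals) {
  self_empty <- is.na(self_vals) | self_vals == ""
  out <- ifelse(self_empty, other_vals, self_vals)
  ifelse(is.na(out) | out == "", NA, out)
}

self  <- c("4", "",  NA,  "2")
other <- c("",  "5", "",  NA)
res <- combine_self_other(self, other)
res   # "4" "5" NA "2"
```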
### Validation Checks
1. **Conflict Check**: Verifies no rows have values in both self and other for the same variable
2. **Coverage Check**: Verifies combined columns have expected number of non-empty values (self_count + other_count = combined_count)
3. **Sample Row Check**: Shows examples of how values were combined
### Output
- Updates existing target columns in `eohi3.csv`
- Creates backup `eohi3_2.csv` before processing
---
## datap 05 - ehi vars.r
### Goal
Calculate EHI (End of History Illusion) variables as the difference between past and future variables. Each EHI variable represents the change from past to future perspective.
### Transformations
**Calculation Formula:** `ehi_[pref/pers/val]_[string] = past_[pref/pers/val]_[string] - fut_[pref/pers/val]_[string]`
#### EHI Variables Created
**EHI Preferences:**
- `ehi_pref_hobbies` = `past_pref_hobbies` - `fut_pref_hobbies`
- `ehi_pref_music` = `past_pref_music` - `fut_pref_music`
- `ehi_pref_dress` = `past_pref_dress` - `fut_pref_dress`
- `ehi_pref_exer` = `past_pref_exer` - `fut_pref_exer`
- `ehi_pref_food` = `past_pref_food` - `fut_pref_food`
- `ehi_pref_friends` = `past_pref_friends` - `fut_pref_friends`
- `ehi_pref_DGEN` = `past_pref_DGEN` - `fut_pref_DGEN`
**EHI Personality:**
- `ehi_pers_open` = `past_pers_open` - `fut_pers_open`
- `ehi_pers_goal` = `past_pers_goal` - `fut_pers_goal`
- `ehi_pers_social` = `past_pers_social` - `fut_pers_social`
- `ehi_pers_agree` = `past_pers_agree` - `fut_pers_agree`
- `ehi_pers_stress` = `past_pers_stress` - `fut_pers_stress`
- `ehi_pers_DGEN` = `past_pers_DGEN` - `fut_pers_DGEN`
**EHI Values:**
- `ehi_val_trad` = `past_val_trad` - `fut_val_trad`
- `ehi_val_autonomy` = `past_val_autonomy` - `fut_val_autonomy`
- `ehi_val_personal` = `past_val_personal` - `fut_val_personal`
- `ehi_val_justice` = `past_val_justice` - `fut_val_justice`
- `ehi_val_close` = `past_val_close` - `fut_val_close`
- `ehi_val_connect` = `past_val_connect` - `fut_val_connect`
- `ehi_val_DGEN` = `past_val_DGEN` - `fut_val_DGEN`
### Logic
- Converts source variables to numeric (handling empty strings and NA)
- Calculates difference: past - future
- Result can be positive (past > future), negative (past < future), or zero (past = future)
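The difference score can be sketched for one hypothetical past/future column pair, stored as text the way the CSV delivers it (empty strings and NA are both treated as missing):

```r
# Past-minus-future difference for one illustrative column pair.
past <- c("4", "6", "",  NA)
fut  <- c("1", "6", "3", "2")
to_num <- function(x) suppressWarnings(as.numeric(ifelse(x == "", NA, x)))
ehi <- to_num(past) - to_num(fut)
ehi   # 3 (past > future), 0 (no change), NA, NA (missing source)
```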
### Validation Checks
1. **Variable Existence**: Checks that all target variables exist before processing
2. **Source Variable Check**: Verifies source columns exist
3. **Random Row Validation**: Checks 5 random rows showing source values, target value, expected calculation, and match status
### Output
- Updates existing target columns in `eohi3.csv`
- Creates backup `eohi3_2.csv` before processing
---
## datap 06 - mean vars.r
### Goal
Calculate mean variables for various scales by averaging multiple related variables. Creates both domain-specific means and overall composite means.
### Transformations
#### Domain-Specific Means
**Past Preferences MEAN:**
- **Source Variables**: `past_pref_hobbies`, `past_pref_music`, `past_pref_dress`, `past_pref_exer`, `past_pref_food`, `past_pref_friends` (6 variables)
- **Target Variable**: `past_pref_MEAN`
**Future Preferences MEAN:**
- **Source Variables**: `fut_pref_hobbies`, `fut_pref_music`, `fut_pref_dress`, `fut_pref_exer`, `fut_pref_food`, `fut_pref_friends` (6 variables)
- **Target Variable**: `fut_pref_MEAN`
**Past Personality MEAN:**
- **Source Variables**: `past_pers_open`, `past_pers_goal`, `past_pers_social`, `past_pers_agree`, `past_pers_stress` (5 variables)
- **Target Variable**: `past_pers_MEAN`
**Future Personality MEAN:**
- **Source Variables**: `fut_pers_open`, `fut_pers_goal`, `fut_pers_social`, `fut_pers_agree`, `fut_pers_stress` (5 variables)
- **Target Variable**: `fut_pers_MEAN`
**Past Values MEAN:**
- **Source Variables**: `past_val_trad`, `past_val_autonomy`, `past_val_personal`, `past_val_justice`, `past_val_close`, `past_val_connect` (6 variables)
- **Target Variable**: `past_val_MEAN`
**Future Values MEAN:**
- **Source Variables**: `fut_val_trad`, `fut_val_autonomy`, `fut_val_personal`, `fut_val_justice`, `fut_val_close`, `fut_val_connect` (6 variables)
- **Target Variable**: `fut_val_MEAN`
**EHI Preferences MEAN:**
- **Source Variables**: `ehi_pref_hobbies`, `ehi_pref_music`, `ehi_pref_dress`, `ehi_pref_exer`, `ehi_pref_food`, `ehi_pref_friends` (6 variables)
- **Target Variable**: `ehi_pref_MEAN`
**EHI Personality MEAN:**
- **Source Variables**: `ehi_pers_open`, `ehi_pers_goal`, `ehi_pers_social`, `ehi_pers_agree`, `ehi_pers_stress` (5 variables)
- **Target Variable**: `ehi_pers_MEAN`
**EHI Values MEAN:**
- **Source Variables**: `ehi_val_trad`, `ehi_val_autonomy`, `ehi_val_personal`, `ehi_val_justice`, `ehi_val_close`, `ehi_val_connect` (6 variables)
- **Target Variable**: `ehi_val_MEAN`
#### Composite Means
**EHI Domain-Specific Mean:**
- **Source Variables**: `ehi_pref_MEAN`, `ehi_pers_MEAN`, `ehi_val_MEAN` (3 variables)
- **Target Variable**: `ehiDS_mean`
**EHI Domain-General Mean:**
- **Source Variables**: `ehi_pref_DGEN`, `ehi_pers_DGEN`, `ehi_val_DGEN` (3 variables)
- **Target Variable**: `ehiDGEN_mean`
### Logic
- Converts source variables to numeric (handling empty strings and NA)
- Calculates row means using `rowMeans()` with `na.rm = TRUE` (ignores NA values)
- Each mean represents the average of non-missing values for that row
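The `rowMeans()` behaviour described above can be seen on a toy data frame (column names here are hypothetical stand-ins for a source set such as `past_pref_hobbies`, ..., `past_pref_friends`):

```r
# Row means ignore NA values; a row with no non-missing values yields NaN.
d <- data.frame(a = c(1, 2, NA), b = c(3, NA, NA), c = c(5, 4, NA))
m <- rowMeans(d[, c("a", "b", "c")], na.rm = TRUE)
m   # 3 (mean of 1,3,5), 3 (mean of 2,4), NaN (all sources missing)
```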
### Validation Checks
1. **Variable Existence**: Uses `setdiff()` to check source and target variables exist
2. **Random Row Validation**: Checks 5 random rows showing source variable names, source values, target value, expected mean calculation, and match status
### Output
- Updates existing target columns in `eohi3.csv`
- Creates backup `eohi3_2.csv` before processing
---
## datap 07 - scales and recodes.r
### Goal
Recode various variables and calculate scale scores. Includes recoding categorical variables, processing cognitive reflection test (CRT) items, calculating ICAR scores, and recoding demographic variables.
### Transformations
#### 1. Recode other_length2 → other_length
**Source Variable**: `other_length2`
**Target Variable**: `other_length`
**Recoding Rules:**
- Values 5-9 → "5-9"
- Values 10-14 → "10-14"
- Values 15-19 → "15-19"
- Value "20+" → "20+" (handled as special case)
- Empty strings → preserved as empty string (not NA)
- NA → NA
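The binning rules above can be sketched as follows (a hedged illustration: `recode_length` and the input vector are hypothetical; the "20+" level, empty strings, and NA all pass through unchanged):

```r
# Bin numeric responses into ranges; non-numeric values pass through.
recode_length <- function(x) {
  n <- suppressWarnings(as.numeric(x))
  ifelse(!is.na(n) & n >= 5  & n <= 9,  "5-9",
  ifelse(!is.na(n) & n >= 10 & n <= 14, "10-14",
  ifelse(!is.na(n) & n >= 15 & n <= 19, "15-19", x)))
}
r <- recode_length(c("7", "12", "19", "20+", "", NA))
r   # "5-9" "10-14" "15-19" "20+" "" NA
```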
#### 2. Recode other_like2 → other_like
**Source Variable**: `other_like2`
**Target Variable**: `other_like`
**Recoding Rules:**
- "Dislike a great deal" → "-2"
- "Dislike somewhat" → "-1"
- "Neither like nor dislike" → "0"
- "Like somewhat" → "1"
- "Like a great deal" → "2"
- Empty strings → preserved as empty string (not NA)
- NA → NA
#### 3. Calculate aot_total (Actively Open-Minded Thinking)
**Source Variables**: `aot01`, `aot02`, `aot03`, `aot04_r`, `aot05_r`, `aot06_r`, `aot07_r`, `aot08`
**Target Variable**: `aot_total`
**Calculation:**
1. Reverse code `aot04_r`, `aot05_r`, `aot06_r`, `aot07_r` by multiplying by -1
2. Calculate mean of all 8 variables: 4 original (`aot01`, `aot02`, `aot03`, `aot08`) + 4 reversed (`aot04_r`, `aot05_r`, `aot06_r`, `aot07_r`)
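The two steps above can be sketched on one hypothetical row of eight responses (names mirror the source variables; the values are illustrative):

```r
# Reverse-code the four _r items by multiplying by -1, then average all 8.
aot <- c(aot01 = 3, aot02 = 2, aot03 = 3, aot04_r = 1,
         aot05_r = 2, aot06_r = 1, aot07_r = 2, aot08 = 3)
rev_items <- c("aot04_r", "aot05_r", "aot06_r", "aot07_r")
aot[rev_items] <- -1 * aot[rev_items]
aot_total <- mean(aot)
aot_total   # mean of c(3, 2, 3, -1, -2, -1, -2, 3) = 0.625
```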
#### 4. Process CRT Questions → crt_correct and crt_int
**Source Variables**: `crt01`, `crt02`, `crt03`
**Target Variables**: `crt_correct`, `crt_int`
**CRT01:**
- "5 cents" → `crt_correct` = 1, `crt_int` = 0
- "10 cents" → `crt_correct` = 0, `crt_int` = 1
- Other values → `crt_correct` = 0, `crt_int` = 0
**CRT02:**
- "5 minutes" → `crt_correct` += 1, `crt_int` unchanged
- "100 minutes" → `crt_correct` unchanged, `crt_int` += 1
- Other values → both unchanged
**CRT03:**
- "47 days" → `crt_correct` += 1, `crt_int` unchanged
- "24 days" → `crt_correct` unchanged, `crt_int` += 1
- Other values → both unchanged
**Note**: `crt_correct` and `crt_int` are cumulative across all 3 questions (range: 0-3)
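The cumulative scoring can be sketched with the answer keys from the rules above and a hypothetical response row:

```r
# Count correct and intuitive-but-wrong answers across the three items.
correct_key   <- c(crt01 = "5 cents",  crt02 = "5 minutes",   crt03 = "47 days")
intuitive_key <- c(crt01 = "10 cents", crt02 = "100 minutes", crt03 = "24 days")
responses <- c(crt01 = "5 cents", crt02 = "100 minutes", crt03 = "47 days")
crt_correct <- sum(responses == correct_key)    # 2
crt_int     <- sum(responses == intuitive_key)  # 1
```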
#### 5. Calculate icar_verbal
**Source Variables**: `verbal01`, `verbal02`, `verbal03`, `verbal04`, `verbal05`
**Target Variable**: `icar_verbal`
**Correct Answers:**
- `verbal01` = "5"
- `verbal02` = "8"
- `verbal03` = "It's impossible to tell"
- `verbal04` = "47"
- `verbal05` = "Sunday"
**Calculation**: Proportion correct = (number of correct responses) / 5
#### 6. Calculate icar_matrix
**Source Variables**: `matrix01`, `matrix02`, `matrix03`, `matrix04`, `matrix05`
**Target Variable**: `icar_matrix`
**Correct Answers:**
- `matrix01` = "D"
- `matrix02` = "E"
- `matrix03` = "B"
- `matrix04` = "B"
- `matrix05` = "D"
**Calculation**: Proportion correct = (number of correct responses) / 5
#### 7. Calculate icar_total
**Source Variables**: `verbal01`-`verbal05`, `matrix01`-`matrix05` (10 variables total)
**Target Variable**: `icar_total`
**Calculation**: Proportion correct across all 10 items = (number of correct responses) / 10
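The proportion-correct calculation for `icar_total` can be sketched with the answer keys listed above and a hypothetical response row:

```r
# Score 10 items against the key; the mean of the logical vector is
# the proportion correct.
key <- c(verbal01 = "5", verbal02 = "8",
         verbal03 = "It's impossible to tell",
         verbal04 = "47", verbal05 = "Sunday",
         matrix01 = "D", matrix02 = "E", matrix03 = "B",
         matrix04 = "B", matrix05 = "D")
resp <- key
resp[c("verbal02", "matrix05")] <- c("7", "A")   # two wrong answers
icar_total <- mean(resp == key)
icar_total   # 8/10 = 0.8
```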
#### 8. Recode demo_sex → sex
**Source Variable**: `demo_sex`
**Target Variable**: `sex`
**Recoding Rules:**
- "Male" (case-insensitive) → 0
- "Female" (case-insensitive) → 1
- Other values (e.g., "Prefer not to say") → 2
- Empty/NA → NA
#### 9. Recode demo_edu → education
**Source Variable**: `demo_edu`
**Target Variable**: `education` (ordered factor)
**Recoding Rules:**
- "High School (or equivalent)" or "Trade School" → "HS_TS"
- "College Diploma/Certificate" or "University - Undergraduate" → "C_Ug"
- "University - Graduate (Masters)" or "University - PhD" or "Professional Degree (ex. JD/MD)" → "grad_prof"
- Empty/NA → NA
**Factor Levels**: `HS_TS` < `C_Ug` < `grad_prof` (ordered)
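The ordered factor can be sketched on a hypothetical recoded vector; the ordering makes level comparisons meaningful in later modelling:

```r
# Ordered factor preserving HS_TS < C_Ug < grad_prof.
education <- factor(c("C_Ug", "HS_TS", "grad_prof", "HS_TS"),
                    levels = c("HS_TS", "C_Ug", "grad_prof"),
                    ordered = TRUE)
education[1] > education[2]   # TRUE: C_Ug ranks above HS_TS
```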
### Validation Checks
Each transformation includes:
1. **Variable Existence Check**: Verifies source and target variables exist
2. **Value Check**: Verifies expected values exist in source variables (warns about unexpected values)
3. **Post-Processing Verification**: Checks 5 random rows showing source values, target values, and calculations
### Output
- Updates existing target columns in `eohi3.csv`
- Creates backup `eohi3_2.csv` before processing
---
## Script Execution Order
These scripts should be run in the following order:
1. **datap 04 - combined vars.r** - Combines self/other variables into past/future variables
2. **datap 05 - ehi vars.r** - Calculates EHI variables from past/future differences
3. **datap 06 - mean vars.r** - Calculates mean variables for scales
4. **datap 07 - scales and recodes.r** - Recodes variables and calculates scale scores
Each script creates a backup (`eohi3_2.csv`) before processing and includes validation checks to ensure transformations are performed correctly.
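A minimal driver for the order above (hypothetical, not part of the repo; it assumes the scripts sit in the current working directory and skips any that are absent):

```r
# Run the four processing scripts in sequence.
scripts <- c(
  "datap 04 - combined vars.r",
  "datap 05 - ehi vars.r",
  "datap 06 - mean vars.r",
  "datap 07 - scales and recodes.r"
)
for (s in scripts) {
  if (file.exists(s)) source(s) else message("skipping missing script: ", s)
}
```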

eohi3/DA00_fmaxVALS.r Normal file

@@ -0,0 +1,149 @@
library(SuppDists)
library(dplyr)
library(tidyr)
setwd("/home/ladmin/Documents/DND/EOHI/eohi3")
between_vars <- c("perspective", "temporalDO")
within_vars_MEAN <- c(
"past_pref_MEAN", "past_pers_MEAN", "past_val_MEAN",
"fut_pref_MEAN", "fut_pers_MEAN", "fut_val_MEAN"
)
within_vars_DGEN <- c(
"past_pref_DGEN", "past_pers_DGEN", "past_val_DGEN",
"fut_pref_DGEN", "fut_pers_DGEN", "fut_val_DGEN"
)
df <- read.csv("eohi3.csv", stringsAsFactors = FALSE, check.names = FALSE, na.strings = "NA")
anova_data_MEAN <- df %>%
select(pID, all_of(between_vars), all_of(within_vars_MEAN)) %>%
filter(!is.na(perspective), perspective != "",
!is.na(temporalDO), temporalDO != "")
long_data_MEAN <- anova_data_MEAN %>%
pivot_longer(
cols = all_of(within_vars_MEAN),
names_to = "variable",
values_to = "MEAN_SCORE"
) %>%
mutate(
time = ifelse(grepl("^past_", variable), "past", "fut"),
domain = case_when(
grepl("_pref_MEAN$", variable) ~ "pref",
grepl("_pers_MEAN$", variable) ~ "pers",
grepl("_val_MEAN$", variable) ~ "val",
TRUE ~ NA_character_
)
) %>%
mutate(
TIME = factor(time, levels = c("past", "fut")),
DOMAIN = factor(domain, levels = c("pref", "pers", "val")),
perspective = factor(perspective),
temporalDO = factor(temporalDO)
) %>%
select(pID, perspective, temporalDO, TIME, DOMAIN, MEAN_SCORE) %>%
filter(!is.na(MEAN_SCORE))
cell_vars_MEAN <- long_data_MEAN %>%
group_by(perspective, temporalDO, TIME, DOMAIN) %>%
summarise(
n = n(),
variance = var(MEAN_SCORE, na.rm = TRUE),
.groups = "drop"
)
fmax_by_cell_MEAN <- cell_vars_MEAN %>%
group_by(TIME, DOMAIN) %>%
summarise(
Fmax_observed = max(variance, na.rm = TRUE) / min(variance, na.rm = TRUE),
df_min = min(n) - 1L,
.groups = "drop"
)
k <- 4
fmax_table_MEAN <- fmax_by_cell_MEAN %>%
rowwise() %>%
mutate(
alpha_0.05 = SuppDists::qmaxFratio(0.95, df = df_min, k = k),
alpha_0.01 = SuppDists::qmaxFratio(0.99, df = df_min, k = k)
) %>%
ungroup() %>%
mutate(
Fmax_observed = round(Fmax_observed, 4),
alpha_0.05 = round(alpha_0.05, 4),
alpha_0.01 = round(alpha_0.01, 4)
) %>%
select(TIME, DOMAIN, Fmax_observed, alpha_0.05, alpha_0.01)
# ---- MEAN: Print observed Hartley ratios ----
cat("\n--- Hartley ratios (MEAN) ---\n")
fmax_table_MEAN %>%
mutate(across(where(is.numeric), ~ format(round(., 4), nsmall = 4))) %>%
print()
# ---- DGEN: Observed Hartley ratios ----
anova_data_DGEN <- df %>%
select(pID, all_of(between_vars), all_of(within_vars_DGEN)) %>%
filter(!is.na(perspective), perspective != "",
!is.na(temporalDO), temporalDO != "")
long_data_DGEN <- anova_data_DGEN %>%
pivot_longer(
cols = all_of(within_vars_DGEN),
names_to = "variable",
values_to = "DGEN_SCORE"
) %>%
mutate(
time = ifelse(grepl("^past_", variable), "past", "fut"),
domain = case_when(
grepl("_pref_DGEN$", variable) ~ "pref",
grepl("_pers_DGEN$", variable) ~ "pers",
grepl("_val_DGEN$", variable) ~ "val",
TRUE ~ NA_character_
)
) %>%
mutate(
TIME = factor(time, levels = c("past", "fut")),
DOMAIN = factor(domain, levels = c("pref", "pers", "val")),
perspective = factor(perspective),
temporalDO = factor(temporalDO)
) %>%
select(pID, perspective, temporalDO, TIME, DOMAIN, DGEN_SCORE) %>%
filter(!is.na(DGEN_SCORE))
cell_vars_DGEN <- long_data_DGEN %>%
group_by(perspective, temporalDO, TIME, DOMAIN) %>%
summarise(
n = n(),
variance = var(DGEN_SCORE, na.rm = TRUE),
.groups = "drop"
)
fmax_by_cell_DGEN <- cell_vars_DGEN %>%
group_by(TIME, DOMAIN) %>%
summarise(
Fmax_observed = max(variance, na.rm = TRUE) / min(variance, na.rm = TRUE),
df_min = min(n) - 1L,
.groups = "drop"
)
fmax_table_DGEN <- fmax_by_cell_DGEN %>%
rowwise() %>%
mutate(
alpha_0.05 = SuppDists::qmaxFratio(0.95, df = df_min, k = k),
alpha_0.01 = SuppDists::qmaxFratio(0.99, df = df_min, k = k)
) %>%
ungroup() %>%
mutate(
Fmax_observed = round(Fmax_observed, 4),
alpha_0.05 = round(alpha_0.05, 4),
alpha_0.01 = round(alpha_0.01, 4)
) %>%
select(TIME, DOMAIN, Fmax_observed, alpha_0.05, alpha_0.01)
cat("\n--- Hartley ratios (DGEN) ---\n")
fmax_table_DGEN %>%
mutate(across(where(is.numeric), ~ format(round(., 4), nsmall = 4))) %>%
print()

eohi3/DA01_anova_DS.r Normal file

@@ -0,0 +1,235 @@
library(tidyverse)
library(rstatix)
library(emmeans)
library(effectsize)
library(afex)
library(car)
options(scipen = 999)
afex::set_sum_contrasts()
setwd("/home/ladmin/Documents/DND/EOHI/eohi3")
df <- read.csv(
"eohi3.csv",
stringsAsFactors = FALSE,
check.names = FALSE,
na.strings = "NA"
)
between_vars <- c("perspective", "temporalDO")
within_vars <- c(
"past_pref_MEAN", "past_pers_MEAN", "past_val_MEAN",
"fut_pref_MEAN", "fut_pers_MEAN", "fut_val_MEAN"
)
missing_vars <- setdiff(c(between_vars, within_vars, "pID"), names(df))
if (length(missing_vars) > 0) {
stop(paste("Missing required variables:", paste(missing_vars, collapse = ", ")))
}
anova_data <- df %>%
select(pID, all_of(between_vars), all_of(within_vars)) %>%
filter(
!is.na(perspective), perspective != "",
!is.na(temporalDO), temporalDO != ""
)
long_data <- anova_data %>%
pivot_longer(
cols = all_of(within_vars),
names_to = "variable",
values_to = "MEAN_SCORE"
) %>%
mutate(
time = case_when(
grepl("^past_", variable) ~ "past",
grepl("^fut_", variable) ~ "fut",
TRUE ~ NA_character_
),
domain = case_when(
grepl("_pref_MEAN$", variable) ~ "pref",
grepl("_pers_MEAN$", variable) ~ "pers",
grepl("_val_MEAN$", variable) ~ "val",
TRUE ~ NA_character_
)
) %>%
mutate(
TIME = factor(time, levels = c("past", "fut")),
DOMAIN = factor(domain, levels = c("pref", "pers", "val")),
perspective = factor(perspective),
temporalDO = factor(temporalDO),
pID = factor(pID)
) %>%
select(pID, perspective, temporalDO, TIME, DOMAIN, MEAN_SCORE) %>%
filter(!is.na(MEAN_SCORE))
desc_stats <- long_data %>%
group_by(perspective, temporalDO, TIME, DOMAIN) %>%
summarise(
n = n(),
mean = round(mean(MEAN_SCORE), 5),
variance = round(var(MEAN_SCORE), 5),
sd = round(sd(MEAN_SCORE), 5),
median = round(median(MEAN_SCORE), 5),
q1 = round(quantile(MEAN_SCORE, 0.25), 5),
q3 = round(quantile(MEAN_SCORE, 0.75), 5),
min = round(min(MEAN_SCORE), 5),
max = round(max(MEAN_SCORE), 5),
.groups = "drop"
)
print(desc_stats, n = Inf)
missing_summary <- long_data %>%
group_by(perspective, temporalDO, TIME, DOMAIN) %>%
summarise(
n_total = n(),
n_missing = sum(is.na(MEAN_SCORE)),
pct_missing = round(100 * n_missing / n_total, 2),
.groups = "drop"
)
print(missing_summary, n = Inf)
outlier_summary <- long_data %>%
group_by(perspective, temporalDO, TIME, DOMAIN) %>%
summarise(
n = n(),
n_outliers = sum(abs(scale(MEAN_SCORE)) > 3),
.groups = "drop"
)
print(outlier_summary, n = Inf)
homogeneity_between <- long_data %>%
group_by(TIME, DOMAIN) %>%
rstatix::levene_test(MEAN_SCORE ~ perspective * temporalDO)
print(homogeneity_between, n = Inf)
# Normality: within-subjects residuals (deviation from each participant's mean)
resid_within <- long_data %>%
group_by(pID) %>%
mutate(person_mean = mean(MEAN_SCORE, na.rm = TRUE)) %>%
ungroup() %>%
mutate(resid = MEAN_SCORE - person_mean) %>%
pull(resid)
resid_within <- resid_within[!is.na(resid_within)]
n_resid <- length(resid_within)
if (n_resid < 3L) {
message("Too few within-subjects residuals (n < 3); skipping Shapiro-Wilk.")
} else {
resid_for_shapiro <- if (n_resid > 5000L) {
set.seed(1L)
sample(resid_within, 5000L)
} else {
resid_within
}
print(shapiro.test(resid_for_shapiro))
}
# qqnorm(resid_within)
# qqline(resid_within)
aov_afex <- aov_ez(
id = "pID",
dv = "MEAN_SCORE",
data = long_data,
between = c("perspective", "temporalDO"),
within = c("TIME", "DOMAIN"),
type = 3
)
# ANOVA table: uncorrected and Greenhouse-Geisser
cat("\n--- ANOVA Table (Type 3, uncorrected) ---\n")
print(nice(aov_afex, correction = "none"))
cat("\n--- ANOVA Table (Type 3, Greenhouse-Geisser correction) ---\n")
print(nice(aov_afex, correction = "GG"))
# Mauchly's test of sphericity and epsilon (via car::Anova on wide data)
anova_wide <- anova_data %>%
select(pID, perspective, temporalDO, all_of(within_vars)) %>%
filter(if_all(all_of(within_vars), ~ !is.na(.)))
response_matrix <- as.matrix(anova_wide[, within_vars])
rm_model <- lm(response_matrix ~ perspective * temporalDO, data = anova_wide)
idata <- data.frame(
TIME = factor(rep(c("past", "fut"), each = 3), levels = c("past", "fut")),
DOMAIN = factor(rep(c("pref", "pers", "val"), 2), levels = c("pref", "pers", "val"))
)
rm_anova <- car::Anova(rm_model, idata = idata, idesign = ~ TIME * DOMAIN, type = 3)
rm_summary <- summary(rm_anova, multivariate = FALSE)
if (!is.null(rm_summary$sphericity.tests)) {
cat("\nMauchly's Test of Sphericity:\n")
print(rm_summary$sphericity.tests)
}
if (!is.null(rm_summary$epsilon)) {
cat("\nEpsilon (GG, HF):\n")
print(rm_summary$epsilon)
}
# Within-subjects residuals: deviation from each participant's mean (one per observation)
resid_within <- long_data %>%
group_by(pID) %>%
mutate(person_mean = mean(MEAN_SCORE, na.rm = TRUE)) %>%
ungroup() %>%
mutate(resid = MEAN_SCORE - person_mean) %>%
pull(resid)
resid_within <- resid_within[!is.na(resid_within)]
# R's shapiro.test() allows 3 <= n <= 5000; use a random sample of 5000 if we have more
n_resid <- length(resid_within)
if (n_resid < 3L) {
message("Too few within-subjects residuals (n < 3); skipping Shapiro-Wilk.")
} else {
resid_for_shapiro <- if (n_resid > 5000L) {
set.seed(1L)
sample(resid_within, 5000L)
} else {
resid_within
}
print(shapiro.test(resid_for_shapiro))
}
# qqnorm(resid_within)
# qqline(resid_within)
# POST-HOC COMPARISONS (significant effects only)
# TIME (main effect)
emm_TIME <- emmeans(aov_afex, ~ TIME)
print(pairs(emm_TIME, adjust = "bonferroni"))
# temporalDO:TIME — ~TIME and ~temporalDO
emm_temporalDO_TIME <- emmeans(aov_afex, ~ TIME | temporalDO)
print(pairs(emm_temporalDO_TIME, adjust = "bonferroni"))
emm_temporalDO_temporalDO <- emmeans(aov_afex, ~ temporalDO | TIME)
print(pairs(emm_temporalDO_temporalDO, adjust = "bonferroni"))
# perspective:temporalDO:TIME — ~TIME, ~perspective, ~temporalDO
emm_pt_TIME <- emmeans(aov_afex, ~ TIME | perspective + temporalDO)
print(pairs(emm_pt_TIME, adjust = "bonferroni"))
emm_pt_perspective <- emmeans(aov_afex, ~ perspective | temporalDO + TIME)
print(pairs(emm_pt_perspective, adjust = "bonferroni"))
emm_pt_temporalDO <- emmeans(aov_afex, ~ temporalDO | perspective + TIME)
print(pairs(emm_pt_temporalDO, adjust = "bonferroni"))
# perspective:DOMAIN — ~perspective and ~DOMAIN
emm_perspective_DOMAIN <- emmeans(aov_afex, ~ perspective | DOMAIN)
print(pairs(emm_perspective_DOMAIN, adjust = "bonferroni"))
emm_perspective_DOMAIN_domain <- emmeans(aov_afex, ~ DOMAIN | perspective)
print(pairs(emm_perspective_DOMAIN_domain, adjust = "bonferroni"))
# perspective:TIME:DOMAIN — ~TIME, ~perspective, ~DOMAIN
emm_pt_TIME_domain <- emmeans(aov_afex, ~ TIME | perspective + DOMAIN)
print(pairs(emm_pt_TIME_domain, adjust = "bonferroni"))
emm_pt_domain_perspective <- emmeans(aov_afex, ~ perspective | TIME + DOMAIN)
print(pairs(emm_pt_domain_perspective, adjust = "bonferroni"))
emm_pt_domain_domain <- emmeans(aov_afex, ~ DOMAIN | perspective + TIME)
print(pairs(emm_pt_domain_domain, adjust = "bonferroni"))
# perspective:temporalDO:TIME:DOMAIN — ~TIME, ~perspective, ~temporalDO
emm_ptt_TIME <- emmeans(aov_afex, ~ TIME | perspective + temporalDO + DOMAIN)
print(pairs(emm_ptt_TIME, adjust = "bonferroni"))
emm_ptt_perspective <- emmeans(aov_afex, ~ perspective | temporalDO + TIME + DOMAIN)
print(pairs(emm_ptt_perspective, adjust = "bonferroni"))
emm_ptt_temporalDO <- emmeans(aov_afex, ~ temporalDO | perspective + TIME + DOMAIN)
print(pairs(emm_ptt_temporalDO, adjust = "bonferroni"))

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long


@@ -0,0 +1,68 @@
ResponseId,RATIONALE
R_12EXYt8gHauPaCb,duration
R_142iZtlDp1Vam14,duration
R_16eRiaoFPG5CpE4,duration
R_1aK2JWzCFkpefUg,duration
R_1FEuEk6VzuwxZby,duration
R_1IsHUv4sb6oOphv,duration
R_1J2cryciskOYjOV,duration
R_1JFsZ1GXM7jDWmh,duration
R_1JlV9H7AJKtNZ8g,duration
R_1kgjhkT4sJwhfuV,duration
R_1MAMwGkBHTTSyAh,duration
R_1O6dV9hTlqpsYjP,duration
R_1qatgZwcLPGctnd,age mismatch
R_1QE5KaKNkt66Cer,duration
R_1QsYazd3eOH62js,duration
R_1vwOg7l0kSLHGRX,duration
R_1YJ2G01dpxYqKAm,duration
R_1YoddNWqybPbaNN,feedback in french
R_1ZOjQ97Ph1VtRwp,duration
R_347ABt6LFPUeVZS,duration
R_34Ain6V2NbEDeQm,duration
R_38J0VDB8JE8Dd0o,duration
R_3DptQmS26X0Z8Wu,IP duplicate
R_3Foc2aYGpXFrbnX,age mismatch + duration
R_3HLz0FyaULkIPKu,IP duplicate
R_3jUhefm4hAEQ6PC,duration
R_3n8b0ndM4habNjB,age mismatch
R_3nTLzs9jMwDHbFy,duration
R_3rGudTtAd2oVze3,duration
R_3t6giyCy5IwZgom,duration
R_3WwXkl4IatPYDZ0,age mismatch
R_5ByssDsdjMcQgUV,duration
R_5cNBH4nxBlH8OSB,duration
R_5FkttTgBeMePzhk,sex mismatch
R_5FyLW7dHpyFojo5,duration
R_5M3urkuYhhSG06E,duration
R_5MRp7eFKMm59t14,feedback in french
R_5n6H7xuYTQgvFEf,duration
R_5rrbHXjKol6Zl9U,duration
R_5youAGSa5hLGkuZ,age mismatch + duration
R_5z5DYfTnai5Pj3j,duration
R_64nOi2TWI4XCYkt,duration
R_6BcdSiP0Nibxx1D,duration
R_6C4v9kRnGm9Iqyj,IP duplicate
R_6CpjN5tJoj8dYuB,duration
R_6cwKXrr8R99m5ez,duration
R_6F4ld4gRlKjsb06,age mismatch + duration
R_6GqjTqXrehkbG0x,duration
R_6HCtgHyy16nNMQ4,age mismatch
R_6hQN1DUFkxGpDGD,IP duplicate
R_6JKscJDUeAt7k1y,age mismatch
R_6lKqtees5Z1hj2L,duration
R_6m1NYZLedxbAxui,duration
R_6pM4ierZhbT1FEb,duration
R_6rQCiwlJHKrWWKB,duration
R_7AwVrmL8AM0KLKx,duration
R_7bH15XzvHpDCZO1,duration
R_7Cl7KFkEiuYwdZn,duration
R_7EfALTPED13tduG,duration
R_7flJBV9qf88XSM5,duration
R_7H0dTzsyEC1Pzyh,duration
R_7HM0FXjrAoTeGqt,duration
R_7HRMvwMPw3OBE7g,duration
R_7o7FORJHlgWAahS,age mismatch
R_7sTsQ9AI42QQgSV,duration
R_7VJCRyovK5KAddn,duration
R_7w4ggvRoPBkyTle,duration

View File

@@ -0,0 +1,189 @@
library(dplyr)
setwd("/home/ladmin/Documents/DND/EOHI/eohi3/dataREVIEW-JAN21")
# Read the data (with check.names=FALSE to preserve original column names)
# Keep empty cells as empty strings, not NA
# Only convert the literal string "NA" to NA, not empty strings
df <- read.csv("eohi3_raw.csv", stringsAsFactors = FALSE, check.names = FALSE, na.strings = "NA")
# RATIONALE column should exist in the CSV
# Ensure RATIONALE is character and convert any NA values to empty strings
if (!is.character(df$RATIONALE)) {
df$RATIONALE <- as.character(df$RATIONALE)
}
df$RATIONALE[is.na(df$RATIONALE)] <- ""
# Function to check if age falls within range
check_age_range <- function(age_num, age_range_str) {
  # Check NULL/NA before any string comparison: `NA == ""` evaluates to NA,
  # which would make the if() condition itself error out
  if (is.null(age_num) || is.null(age_range_str) ||
      is.na(age_num) || is.na(age_range_str) || trimws(age_range_str) == "") {
    return(NULL) # Can't check if data is missing - return NULL to indicate skip
  }
# Parse range string (e.g., "46 - 52" or "25 - 31")
range_parts <- strsplit(trimws(age_range_str), "\\s*-\\s*")[[1]]
if (length(range_parts) != 2) {
return(NULL) # Invalid range format - return NULL to indicate skip
}
min_age <- as.numeric(trimws(range_parts[1]))
max_age <- as.numeric(trimws(range_parts[2]))
if (is.na(min_age) || is.na(max_age)) {
return(NULL) # Couldn't parse numbers - return NULL to indicate skip
}
# Check if age falls within range (inclusive)
return(age_num >= min_age && age_num <= max_age)
}
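# --- Quick sanity notes on check_age_range (hypothetical inputs, not pipeline data) ---
# Boundaries are inclusive and unparseable range strings are skipped:
#   check_age_range(46, "46 - 52")  # TRUE  (lower bound counts as inside)
#   check_age_range(53, "46 - 52")  # FALSE (just outside the range)
#   check_age_range(50, "46 to 52") # NULL  (not "min - max" format -> skip)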
# Function to check if a value is empty (NA, empty string, or whitespace only)
# Empty cells are kept as empty strings, not NA
# Vectorized: works the same for single values and vectors
is_empty <- function(x) {
  if (is.null(x) || length(x) == 0) return(TRUE)
  result <- is.na(x)
  if (is.character(x)) {
    # which() drops NAs from the logical mask; assigning with a mask that
    # contains NA would error ("NAs are not allowed in subscripted assignments")
    result[which(trimws(x) == "")] <- TRUE
  }
  result
}
# 1. Check sex match
# Only flag rows where both values are non-empty and differ (case-insensitive)
demo_sex_val <- ifelse(is.na(df$demo_sex), "", trimws(df$demo_sex))
taq_sex_val <- ifelse(is.na(df$taq_sex), "", trimws(df$taq_sex))
sex_mismatch <- demo_sex_val != "" & taq_sex_val != "" &
  tolower(demo_sex_val) != tolower(taq_sex_val)
# 2. Check age range match
age_mismatch <- rep(FALSE, nrow(df))
for (i in seq_len(nrow(df))) {
# Only check if demo_age is not empty/NA and taq_age is not empty
if (!is.na(df$demo_age[i]) && !is_empty(df$taq_age[i])) {
age_check <- check_age_range(df$demo_age[i], df$taq_age[i])
# age_check is NULL if we can't check, FALSE if mismatch, TRUE if match
if (!is.null(age_check) && !age_check) {
age_mismatch[i] <- TRUE
}
}
}
# 3. Check citizenship (taq_cit_1 or taq_cit_2)
no_cit <- is_empty(df$taq_cit_1) & is_empty(df$taq_cit_2)
# 4. Check IP address duplicates
# Find IP addresses that appear more than once (non-empty IPs only)
ip_duplicate <- rep(FALSE, nrow(df))
if ("IPAddress" %in% colnames(df)) {
# Get non-empty IP addresses
ip_addresses <- ifelse(is.na(df$IPAddress), "", trimws(df$IPAddress))
# Count occurrences of each IP
ip_counts <- table(ip_addresses)
# Get IPs that appear more than once (and are not empty)
duplicate_ips <- names(ip_counts)[ip_counts > 1 & names(ip_counts) != ""]
  # Mark every row whose IP appears more than once
  ip_duplicate <- ip_addresses %in% duplicate_ips
}
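# Equivalent base-R idiom (sketch): duplicated() scanned in both directions
# flags every occurrence of a repeated, non-empty IP:
#   nonempty <- ip_addresses != ""
#   ip_duplicate <- nonempty &
#     (duplicated(ip_addresses) | duplicated(ip_addresses, fromLast = TRUE))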
# Build RATIONALE column - only populate when there are issues
# Start with empty strings to preserve existing empty cells
rationale_parts <- rep("", nrow(df))
# Helper: append a label to each rationale entry where the flag is TRUE,
# joining multiple issues with "; "
append_flag <- function(parts, flag, label) {
  parts[flag] <- ifelse(parts[flag] == "", label, paste(parts[flag], label, sep = "; "))
  parts
}
rationale_parts <- append_flag(rationale_parts, sex_mismatch, "sex mismatch")
rationale_parts <- append_flag(rationale_parts, age_mismatch, "age mismatch")
rationale_parts <- append_flag(rationale_parts, no_cit, "no cit")
rationale_parts <- append_flag(rationale_parts, ip_duplicate, "IP duplicate")
# Update RATIONALE column - only set when there are issues, otherwise keep existing value
# If no issues found, keep the cell empty (or existing value if any)
for (i in seq_len(nrow(df))) {
if (rationale_parts[i] != "") {
df$RATIONALE[i] <- rationale_parts[i]
}
# If rationale_parts[i] is empty, leave RATIONALE as is (preserves existing empty or other values)
}
# Summary - emitted twice on purpose: message() writes to stderr (visible in the
# debug console), while the cat() block below mirrors it on stdout for terminal runs
message("Validation Summary:")
message("Sex mismatches: ", sum(sex_mismatch))
message("Age mismatches: ", sum(age_mismatch))
message("No citizenship: ", sum(no_cit))
message("IP duplicates: ", sum(ip_duplicate))
message("Total rows with issues: ", sum(rationale_parts != ""))
# Also use cat() to stdout (for terminal)
cat("Validation Summary:\n", file = stdout())
cat("Sex mismatches:", sum(sex_mismatch), "\n", file = stdout())
cat("Age mismatches:", sum(age_mismatch), "\n", file = stdout())
cat("No citizenship:", sum(no_cit), "\n", file = stdout())
cat("IP duplicates:", sum(ip_duplicate), "\n", file = stdout())
cat("Total rows with issues:", sum(rationale_parts != ""), "\n", file = stdout())
flush(stdout())
# Write the updated data
# Preserve empty strings as empty (not NA)
# Convert character column NAs to empty strings to preserve empty cells
for (col in names(df)) {
if (is.character(df[[col]])) {
df[[col]][is.na(df[[col]])] <- ""
}
}
write.csv(df, "eohi3_raw2.csv", row.names = FALSE, na = "", quote = TRUE)

View File

@@ -0,0 +1,39 @@
library(dplyr)
setwd("/home/ladmin/Documents/DND/EOHI/eohi3/dataREVIEW-JAN21")
# Read the data (with check.names=FALSE to preserve original column names)
# Keep empty cells as empty strings, not NA
# Only convert the literal string "NA" to NA, not empty strings
df <- read.csv("eohi3_raw.csv", stringsAsFactors = FALSE, check.names = FALSE, na.strings = "NA")
# Populate citizenship column from taq_cit_1 and taq_cit_2
# If both have values, set to "Both"
# Otherwise, use the value from whichever column has a value
# Empty values remain as empty strings (not NA)
# Ensure citizenship column exists, initialize with empty strings if needed
if (!"citizenship" %in% names(df)) {
df$citizenship <- ""
}
# Convert NA to empty string for taq_cit columns to ensure consistent handling
df$taq_cit_1[is.na(df$taq_cit_1)] <- ""
df$taq_cit_2[is.na(df$taq_cit_2)] <- ""
# Populate citizenship based on taq_cit_1 and taq_cit_2 using base R
# Check if both have values (non-empty)
both_have_values <- df$taq_cit_1 != "" & df$taq_cit_2 != ""
# Check if only taq_cit_1 has a value
only_cit1 <- df$taq_cit_1 != "" & df$taq_cit_2 == ""
# Check if only taq_cit_2 has a value
only_cit2 <- df$taq_cit_2 != "" & df$taq_cit_1 == ""
# Assign values
df$citizenship[both_have_values] <- "Both"
df$citizenship[only_cit1] <- df$taq_cit_1[only_cit1]
df$citizenship[only_cit2] <- df$taq_cit_2[only_cit2]
# For rows where neither has a value, citizenship keeps its original value (may be empty string)
# Note: this overwrites the input file in place; no separate backup is written here
write.csv(df, "eohi3_raw.csv", row.names = FALSE, na = "", quote = TRUE)
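# --- Hedged alternative: the three masks above collapse into one case_when() ---
# (dplyr is already loaded; the .default argument needs dplyr >= 1.1.0)
#   df$citizenship <- case_when(
#     df$taq_cit_1 != "" & df$taq_cit_2 != "" ~ "Both",
#     df$taq_cit_1 != ""                      ~ df$taq_cit_1,
#     df$taq_cit_2 != ""                      ~ df$taq_cit_2,
#     .default = df$citizenship
#   )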

View File

@@ -0,0 +1,130 @@
library(dplyr)
setwd("/home/ladmin/Documents/DND/EOHI/eohi3/dataREVIEW-JAN21")
# Read the data (with check.names=FALSE to preserve original column names)
# Keep empty cells as empty strings, not NA
# Only convert the literal string "NA" to NA, not empty strings
df <- read.csv("eohi3_raw.csv", stringsAsFactors = FALSE, check.names = FALSE, na.strings = "NA")
# Remove trailing columns with empty names (dplyr requires all columns to have names)
empty_cols <- which(names(df) == "" | is.na(names(df)))
if (length(empty_cols) > 0) {
df <- df[, -empty_cols, drop = FALSE]
}
# Set to TRUE to save all distributions to a document file
save_to_doc <- TRUE
doc_filename <- "eohi3_quotas.txt"
# =============================================================================
# SINGLE VARIABLE DISTRIBUTIONS
# =============================================================================
dist_age <- df %>% count(taq_age, sort = TRUE)
print(dist_age)
dist_sex <- df %>% count(taq_sex, sort = TRUE)
print(dist_sex)
dist_citizenship <- df %>% count(citizenship, sort = TRUE)
print(dist_citizenship)
dist_group <- df %>% count(group, sort = TRUE)
print(dist_group)
dist_temporalDO <- df %>% count(temporalDO, sort = TRUE)
print(dist_temporalDO)
dist_perspective <- df %>% count(perspective, sort = TRUE)
print(dist_perspective)
# =============================================================================
# NESTED DISTRIBUTIONS
# =============================================================================
dist_age_citizenship <- df %>% count(citizenship, taq_age) %>% arrange(citizenship, taq_age)
print(dist_age_citizenship)
dist_sex_citizenship <- df %>% count(citizenship, taq_sex) %>% arrange(citizenship, taq_sex)
print(dist_sex_citizenship)
dist_age_temporalDO <- df %>% count(temporalDO, taq_age) %>% arrange(temporalDO, taq_age)
print(dist_age_temporalDO)
dist_age_perspective <- df %>% count(perspective, taq_age) %>% arrange(perspective, taq_age)
print(dist_age_perspective)
dist_sex_temporalDO <- df %>% count(temporalDO, taq_sex) %>% arrange(temporalDO, taq_sex)
print(dist_sex_temporalDO)
dist_sex_perspective <- df %>% count(perspective, taq_sex) %>% arrange(perspective, taq_sex)
print(dist_sex_perspective)
# =============================================================================
# OPTIONAL: SAVE ALL DISTRIBUTIONS TO DOCUMENT
# =============================================================================
if (save_to_doc) {
sink(doc_filename)
cat("DISTRIBUTION REPORT\n")
cat("==================\n\n")
cat("SINGLE VARIABLE DISTRIBUTIONS\n")
cat("------------------------------\n\n")
cat("Distribution of taq_age:\n")
print(dist_age)
cat("\n\n")
cat("Distribution of taq_sex:\n")
print(dist_sex)
cat("\n\n")
cat("Distribution of citizenship:\n")
print(dist_citizenship)
cat("\n\n")
cat("Distribution of group:\n")
print(dist_group)
cat("\n\n")
cat("Distribution of temporalDO:\n")
print(dist_temporalDO)
cat("\n\n")
cat("Distribution of perspective:\n")
print(dist_perspective)
cat("\n\n")
cat("NESTED DISTRIBUTIONS\n")
cat("---------------------\n\n")
cat("Age within Citizenship:\n")
print(dist_age_citizenship)
cat("\n\n")
cat("Sex within Citizenship:\n")
print(dist_sex_citizenship)
cat("\n\n")
cat("Age within temporalDO:\n")
print(dist_age_temporalDO)
cat("\n\n")
cat("Age within perspective:\n")
print(dist_age_perspective)
cat("\n\n")
cat("Sex within temporalDO:\n")
print(dist_sex_temporalDO)
cat("\n\n")
cat("Sex within perspective:\n")
print(dist_sex_perspective)
cat("\n")
sink()
cat("Distributions saved to:", doc_filename, "\n")
}
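# Caveat: if any print() above errors while sink() is active, console output stays
# redirected for the rest of the session. A more defensive sketch:
#   con <- file(doc_filename, open = "wt")
#   sink(con); on.exit({ sink(); close(con) }, add = TRUE)
# (on.exit() only takes effect inside a function, so wrap the report in one to use it.)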

View File

@@ -0,0 +1,177 @@
DISTRIBUTION REPORT
==================
SINGLE VARIABLE DISTRIBUTIONS
------------------------------
Distribution of taq_age:
taq_age n
1 18 - 24 73
2 53 - 59 67
3 60 - 66 67
4 67 - 73 65
5 39 - 45 64
6 46 - 52 63
7 25 - 31 62
8 32 - 38 61
Distribution of taq_sex:
taq_sex n
1 Female 260
2 Male 257
3 Prefer not to say 5
Distribution of citizenship:
citizenship n
1 American 262
2 Canadian 258
3 Both 2
Distribution of group:
group n
1 01FPV 177
2 03VFP 174
3 02PVF 171
Distribution of temporalDO:
temporalDO n
1 past 262
2 future 260
Distribution of perspective:
perspective n
1 other 261
2 self 261
NESTED DISTRIBUTIONS
---------------------
Age within Citizenship:
citizenship taq_age n
1 American 18 - 24 38
2 American 25 - 31 30
3 American 32 - 38 29
4 American 39 - 45 33
5 American 46 - 52 31
6 American 53 - 59 34
7 American 60 - 66 34
8 American 67 - 73 33
9 Both 32 - 38 1
10 Both 46 - 52 1
11 Canadian 18 - 24 35
12 Canadian 25 - 31 32
13 Canadian 32 - 38 31
14 Canadian 39 - 45 31
15 Canadian 46 - 52 31
16 Canadian 53 - 59 33
17 Canadian 60 - 66 33
18 Canadian 67 - 73 32
Sex within Citizenship:
citizenship taq_sex n
1 American Female 130
2 American Male 129
3 American Prefer not to say 3
4 Both Female 1
5 Both Male 1
6 Canadian Female 129
7 Canadian Male 127
8 Canadian Prefer not to say 2
Age within temporalDO:
temporalDO taq_age n
1 future 18 - 24 38
2 future 25 - 31 31
3 future 32 - 38 29
4 future 39 - 45 34
5 future 46 - 52 35
6 future 53 - 59 36
7 future 60 - 66 29
8 future 67 - 73 28
9 past 18 - 24 35
10 past 25 - 31 31
11 past 32 - 38 32
12 past 39 - 45 30
13 past 46 - 52 28
14 past 53 - 59 31
15 past 60 - 66 38
16 past 67 - 73 37
Age within perspective:
perspective taq_age n
1 other 18 - 24 41
2 other 25 - 31 36
3 other 32 - 38 28
4 other 39 - 45 32
5 other 46 - 52 28
6 other 53 - 59 33
7 other 60 - 66 30
8 other 67 - 73 33
9 self 18 - 24 32
10 self 25 - 31 26
11 self 32 - 38 33
12 self 39 - 45 32
13 self 46 - 52 35
14 self 53 - 59 34
15 self 60 - 66 37
16 self 67 - 73 32
Sex within temporalDO:
temporalDO taq_sex n
1 future Female 130
2 future Male 129
3 future Prefer not to say 1
4 past Female 130
5 past Male 128
6 past Prefer not to say 4
Sex within perspective:
perspective taq_sex n
1 other Female 130
2 other Male 128
3 other Prefer not to say 3
4 self Female 130
5 self Male 129
6 self Prefer not to say 2

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

View File

@@ -0,0 +1,343 @@
library(dplyr)
setwd("/home/ladmin/Documents/DND/EOHI/eohi3")
# Read the data (with check.names=FALSE to preserve original column names)
# Keep empty cells as empty strings, not NA
# Only convert the literal string "NA" to NA, not empty strings
df <- read.csv("eohi3.csv", stringsAsFactors = FALSE, check.names = FALSE, na.strings = "NA")
# =============================================================================
# 1. CREATE BACKUP
# =============================================================================
#file.copy("eohi3.csv", "eohi3_2.csv", overwrite = TRUE)
# =============================================================================
# 2. DEFINE VARIABLE MAPPINGS
# =============================================================================
# Past variables mapping: [self/other][VAL/PERS/PREF]_p5_[string] -> past_[val/pers/pref]_[string]
past_mappings <- list(
# Values (VAL)
"past_val_trad" = c("selfVAL_p5_trad", "otherVAL_p5_trad"),
"past_val_autonomy" = c("selfVAL_p5_autonomy", "otherVAL_p5_autonomy"),
"past_val_personal" = c("selfVAL_p5_personal", "otherVAL_p5_personal"),
"past_val_justice" = c("selfVAL_p5_justice", "otherVAL_p5_justice"),
"past_val_close" = c("selfVAL_p5_close", "otherVAL_p5_close"),
"past_val_connect" = c("selfVAL_p5_connect", "otherVAL_p5_connect"),
"past_val_DGEN" = c("selfVAL_p5_dgen", "otherVAL_p5_dgen"),
# Personality (PERS)
"past_pers_open" = c("selfPERS_p5_open", "otherPERS_p5_open"),
"past_pers_goal" = c("selfPERS_p5_goal", "otherPERS_p5_goal"),
"past_pers_social" = c("selfPERS_p5_social", "otherPERS_p5_social"),
"past_pers_agree" = c("selfPERS_p5_agree", "otherPERS_p5_agree"),
"past_pers_stress" = c("selfPERS_p5_stress", "otherPERS_p5_stress"),
"past_pers_DGEN" = c("selfPERS_p5_dgen", "otherPERS_p5_dgen"),
# Preferences (PREF)
"past_pref_hobbies" = c("selfPREF_p5_hobbies", "otherPREF_p5_hobbies"),
"past_pref_music" = c("selfPREF_p5_music", "otherPREF_p5_music"),
"past_pref_dress" = c("selfPREF_p5_dress", "otherPREF_p5_dress"),
"past_pref_exer" = c("selfPREF_p5_exer", "otherPREF_p5_exer"),
"past_pref_food" = c("selfPREF_p5_food", "otherPREF_p5_food"),
"past_pref_friends" = c("selfPREF_p5_friends", "otherPREF_p5_friends"),
"past_pref_DGEN" = c("selfPREF_p5_dgen", "otherPREF_p5_dgen")
)
# Future variables mapping: [self/other][VAL/PERS/PREF]_f5_[string] -> fut_[val/pers/pref]_[string]
future_mappings <- list(
# Values (VAL)
"fut_val_trad" = c("selfVAL_f5_trad", "otherVAL_f5_trad"),
"fut_val_autonomy" = c("selfVAL_f5_autonomy", "otherVAL_f5_autonomy"),
"fut_val_personal" = c("selfVAL_f5_personal", "otherVAL_f5_personal"),
"fut_val_justice" = c("selfVAL_f5_justice", "otherVAL_f5_justice"),
"fut_val_close" = c("selfVAL_f5_close", "otherVAL_f5_close"),
"fut_val_connect" = c("selfVAL_f5_connect", "otherVAL_f5_connect"),
"fut_val_DGEN" = c("selfVAL_f5_dgen", "otherVAL_f5_dgen"),
# Personality (PERS)
"fut_pers_open" = c("selfPERS_f5_open", "otherPERS_f5_open"),
"fut_pers_goal" = c("selfPERS_f5_goal", "otherPERS_f5_goal"),
"fut_pers_social" = c("selfPERS_f5_social", "otherPERS_f5_social"),
"fut_pers_agree" = c("selfPERS_f5_agree", "otherPERS_f5_agree"),
"fut_pers_stress" = c("selfPERS_f5_stress", "otherPERS_f5_stress"),
"fut_pers_DGEN" = c("selfPERS_f5_dgen", "otherPERS_f5_dgen"),
# Preferences (PREF)
"fut_pref_hobbies" = c("selfPREF_f5_hobbies", "otherPREF_f5_hobbies"),
"fut_pref_music" = c("selfPREF_f5_music", "otherPREF_f5_music"),
"fut_pref_dress" = c("selfPREF_f5_dress", "otherPREF_f5_dress"),
"fut_pref_exer" = c("selfPREF_f5_exer", "otherPREF_f5_exer"),
"fut_pref_food" = c("selfPREF_f5_food", "otherPREF_f5_food"),
"fut_pref_friends" = c("selfPREF_f5_friends", "otherPREF_f5_friends"),
"fut_pref_DGEN" = c("selfPREF_f5_dgen", "otherPREF_f5_dgen")
)
# =============================================================================
# 3. COMBINE VARIABLES
# =============================================================================
# Function to combine self and other variables
# For each row, values exist in either self OR other, never both
# NOTE: Column existence should be checked before calling this function
combine_vars <- function(df, self_col, other_col) {
# Safety check: if columns don't exist, return appropriate fallback
if (!self_col %in% names(df)) {
stop(paste("ERROR: Column", self_col, "not found. This should have been caught earlier."))
}
if (!other_col %in% names(df)) {
stop(paste("ERROR: Column", other_col, "not found. This should have been caught earlier."))
}
# Combine: use self value if not empty/NA, otherwise use other value
# Handle both NA and empty strings
result <- ifelse(
!is.na(df[[self_col]]) & df[[self_col]] != "",
df[[self_col]],
ifelse(
!is.na(df[[other_col]]) & df[[other_col]] != "",
df[[other_col]],
NA
)
)
return(result)
}
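# Hedged equivalent: once "" is mapped to NA, the nested ifelse() above is a
# coalesce - take the self value if present, otherwise the other value.
# Using dplyr (already loaded):
#   combined <- dplyr::coalesce(dplyr::na_if(df[[self_col]], ""),
#                               dplyr::na_if(df[[other_col]], ""))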
# Apply past mappings
cat("\nCombining past variables...\n")
missing_cols <- list()
for (new_col in names(past_mappings)) {
self_col <- past_mappings[[new_col]][1]
other_col <- past_mappings[[new_col]][2]
# Check if all required columns exist
missing <- c()
if (!new_col %in% names(df)) {
missing <- c(missing, paste("target:", new_col))
}
if (!self_col %in% names(df)) {
missing <- c(missing, paste("self:", self_col))
}
if (!other_col %in% names(df)) {
missing <- c(missing, paste("other:", other_col))
}
if (length(missing) > 0) {
missing_cols[[new_col]] <- missing
warning(paste("Skipping", new_col, "- missing columns:", paste(missing, collapse = ", ")))
next
}
# All columns exist, proceed with combination
df[[new_col]] <- combine_vars(df, self_col, other_col)
cat(paste(" Updated:", new_col, "\n"))
}
# Report any missing columns
if (length(missing_cols) > 0) {
cat("\n⚠ Missing columns detected in PAST variables:\n")
for (var in names(missing_cols)) {
cat(paste(" ", var, ":", paste(missing_cols[[var]], collapse = ", "), "\n"))
}
}
# Apply future mappings
cat("\nCombining future variables...\n")
missing_cols_future <- list()
for (new_col in names(future_mappings)) {
self_col <- future_mappings[[new_col]][1]
other_col <- future_mappings[[new_col]][2]
# Check if all required columns exist
missing <- c()
if (!new_col %in% names(df)) {
missing <- c(missing, paste("target:", new_col))
}
if (!self_col %in% names(df)) {
missing <- c(missing, paste("self:", self_col))
}
if (!other_col %in% names(df)) {
missing <- c(missing, paste("other:", other_col))
}
if (length(missing) > 0) {
missing_cols_future[[new_col]] <- missing
warning(paste("Skipping", new_col, "- missing columns:", paste(missing, collapse = ", ")))
next
}
# All columns exist, proceed with combination
df[[new_col]] <- combine_vars(df, self_col, other_col)
cat(paste(" Updated:", new_col, "\n"))
}
# Report any missing columns
if (length(missing_cols_future) > 0) {
cat("\n⚠ Missing columns detected in FUTURE variables:\n")
for (var in names(missing_cols_future)) {
cat(paste(" ", var, ":", paste(missing_cols_future[[var]], collapse = ", "), "\n"))
}
}
# =============================================================================
# 4. VALIDATION CHECKS
# =============================================================================
cat("\n=== VALIDATION CHECKS ===\n\n")
# Check 1: Ensure no row has values in both self and other for the same variable
check_conflicts <- function(df, mappings) {
conflicts <- data.frame()
for (new_col in names(mappings)) {
self_col <- mappings[[new_col]][1]
other_col <- mappings[[new_col]][2]
if (self_col %in% names(df) && other_col %in% names(df)) {
# Find rows where both self and other have non-empty values
both_filled <- !is.na(df[[self_col]]) & df[[self_col]] != "" &
!is.na(df[[other_col]]) & df[[other_col]] != ""
if (any(both_filled, na.rm = TRUE)) {
conflict_rows <- which(both_filled)
conflicts <- rbind(conflicts, data.frame(
variable = new_col,
self_col = self_col,
other_col = other_col,
n_conflicts = length(conflict_rows),
example_rows = paste(head(conflict_rows, 5), collapse = ", ")
))
}
}
}
return(conflicts)
}
past_conflicts <- check_conflicts(df, past_mappings)
future_conflicts <- check_conflicts(df, future_mappings)
if (nrow(past_conflicts) > 0) {
cat("WARNING: Found conflicts in PAST variables (both self and other have values):\n")
print(past_conflicts)
} else {
cat("✓ No conflicts found in PAST variables\n")
}
if (nrow(future_conflicts) > 0) {
cat("\nWARNING: Found conflicts in FUTURE variables (both self and other have values):\n")
print(future_conflicts)
} else {
cat("✓ No conflicts found in FUTURE variables\n")
}
# Check 2: Verify that combined columns have values where expected
check_coverage <- function(df, mappings) {
coverage <- data.frame()
for (new_col in names(mappings)) {
self_col <- mappings[[new_col]][1]
other_col <- mappings[[new_col]][2]
# Check if columns exist before counting
self_exists <- self_col %in% names(df)
other_exists <- other_col %in% names(df)
target_exists <- new_col %in% names(df)
# Count non-empty values in original columns (only if they exist)
self_count <- if (self_exists) {
sum(!is.na(df[[self_col]]) & df[[self_col]] != "", na.rm = TRUE)
} else {
NA
}
other_count <- if (other_exists) {
sum(!is.na(df[[other_col]]) & df[[other_col]] != "", na.rm = TRUE)
} else {
NA
}
combined_count <- if (target_exists) {
sum(!is.na(df[[new_col]]) & df[[new_col]] != "", na.rm = TRUE)
} else {
NA
}
# Combined should equal sum of self and other (since they don't overlap)
expected_count <- if (!is.na(self_count) && !is.na(other_count)) {
self_count + other_count
} else {
NA
}
match <- if (!is.na(combined_count) && !is.na(expected_count)) {
combined_count == expected_count
} else {
NA
}
coverage <- rbind(coverage, data.frame(
variable = new_col,
self_non_empty = self_count,
other_non_empty = other_count,
combined_non_empty = combined_count,
expected_non_empty = expected_count,
match = match
))
}
return(coverage)
}
past_coverage <- check_coverage(df, past_mappings)
future_coverage <- check_coverage(df, future_mappings)
cat("\n=== COVERAGE CHECK ===\n")
cat("\nPAST variables:\n")
print(past_coverage)
cat("\nFUTURE variables:\n")
print(future_coverage)
# Check if all coverage matches
all_past_match <- all(past_coverage$match, na.rm = TRUE)
all_future_match <- all(future_coverage$match, na.rm = TRUE)
if (all_past_match && all_future_match) {
cat("\n✓ All combined variables have correct coverage\n")
} else {
cat("\n⚠ Some variables may have missing coverage - check the table above\n")
}
# Check 3: Sample check - verify a few rows manually
cat("\n=== SAMPLE ROW CHECK ===\n")
sample_rows <- min(5, nrow(df))
cat(paste("Checking first", sample_rows, "rows:\n\n"))
for (i in 1:sample_rows) {
cat(paste("Row", i, ":\n"))
# Check one past variable
test_var <- "past_val_trad"
self_val <- if (past_mappings[[test_var]][1] %in% names(df)) df[i, past_mappings[[test_var]][1]] else NA
other_val <- if (past_mappings[[test_var]][2] %in% names(df)) df[i, past_mappings[[test_var]][2]] else NA
combined_val <- df[i, test_var]
cat(sprintf(" %s: self=%s, other=%s, combined=%s\n",
test_var,
ifelse(is.na(self_val) || self_val == "", "empty", self_val),
ifelse(is.na(other_val) || other_val == "", "empty", other_val),
ifelse(is.na(combined_val) || combined_val == "", "empty", combined_val)))
}
# =============================================================================
# 5. SAVE UPDATED DATA
# =============================================================================
write.csv(df, "eohi3.csv", row.names = FALSE, na = "")
cat("Updated data saved to: eohi3.csv\n")
cat(paste("Total rows:", nrow(df), "\n"))
cat(paste("Total columns:", ncol(df), "\n"))

187
eohi3/datap 05 - ehi vars.r Normal file
View File

@@ -0,0 +1,187 @@
library(dplyr)
setwd("/home/ladmin/Documents/DND/EOHI/eohi3")
# Read the data (with check.names=FALSE to preserve original column names)
# Keep empty cells as empty strings, not NA
# Only convert the literal string "NA" to NA, not empty strings
df <- read.csv("eohi3.csv", stringsAsFactors = FALSE, check.names = FALSE, na.strings = "NA")
# =============================================================================
# 1. CREATE BACKUP
# =============================================================================
file.copy("eohi3.csv", "eohi3_2.csv", overwrite = TRUE)
# =============================================================================
# 2. DEFINE VARIABLE MAPPINGS
# =============================================================================
# Target variables (excluding those ending in _MEAN)
# Each target var = past_var - fut_var
ehi_mappings <- list(
# Preferences (PREF)
"ehi_pref_hobbies" = c("past_pref_hobbies", "fut_pref_hobbies"),
"ehi_pref_music" = c("past_pref_music", "fut_pref_music"),
"ehi_pref_dress" = c("past_pref_dress", "fut_pref_dress"),
"ehi_pref_exer" = c("past_pref_exer", "fut_pref_exer"),
"ehi_pref_food" = c("past_pref_food", "fut_pref_food"),
"ehi_pref_friends" = c("past_pref_friends", "fut_pref_friends"),
"ehi_pref_DGEN" = c("past_pref_DGEN", "fut_pref_DGEN"),
# Personality (PERS)
"ehi_pers_open" = c("past_pers_open", "fut_pers_open"),
"ehi_pers_goal" = c("past_pers_goal", "fut_pers_goal"),
"ehi_pers_social" = c("past_pers_social", "fut_pers_social"),
"ehi_pers_agree" = c("past_pers_agree", "fut_pers_agree"),
"ehi_pers_stress" = c("past_pers_stress", "fut_pers_stress"),
"ehi_pers_DGEN" = c("past_pers_DGEN", "fut_pers_DGEN"),
# Values (VAL)
"ehi_val_trad" = c("past_val_trad", "fut_val_trad"),
"ehi_val_autonomy" = c("past_val_autonomy", "fut_val_autonomy"),
"ehi_val_personal" = c("past_val_personal", "fut_val_personal"),
"ehi_val_justice" = c("past_val_justice", "fut_val_justice"),
"ehi_val_close" = c("past_val_close", "fut_val_close"),
"ehi_val_connect" = c("past_val_connect", "fut_val_connect"),
"ehi_val_DGEN" = c("past_val_DGEN", "fut_val_DGEN")
)
# =============================================================================
# 3. CHECK IF TARGET VARIABLES EXIST
# =============================================================================
missing_targets <- c()
for (target_var in names(ehi_mappings)) {
if (!target_var %in% names(df)) {
missing_targets <- c(missing_targets, target_var)
cat(paste("⚠ Target variable not found:", target_var, "\n"))
}
}
if (length(missing_targets) > 0) {
cat("\nERROR: The following target variables are missing from eohi3.csv:\n")
for (var in missing_targets) {
cat(paste(" -", var, "\n"))
}
stop("Cannot proceed without target variables. Please add them to the CSV file.")
}
# =============================================================================
# 4. CALCULATE EHI VARIABLES (past - future)
# =============================================================================
missing_source_cols <- list()
for (target_var in names(ehi_mappings)) {
past_var <- ehi_mappings[[target_var]][1]
fut_var <- ehi_mappings[[target_var]][2]
# Check if source columns exist
missing <- c()
if (!past_var %in% names(df)) {
missing <- c(missing, past_var)
}
if (!fut_var %in% names(df)) {
missing <- c(missing, fut_var)
}
if (length(missing) > 0) {
missing_source_cols[[target_var]] <- missing
warning(paste("Skipping", target_var, "- missing source columns:", paste(missing, collapse = ", ")))
next
}
# Convert to numeric, handling empty strings and NA
past_vals <- as.numeric(ifelse(df[[past_var]] == "" | is.na(df[[past_var]]), NA, df[[past_var]]))
fut_vals <- as.numeric(ifelse(df[[fut_var]] == "" | is.na(df[[fut_var]]), NA, df[[fut_var]]))
# Calculate difference: past - future
ehi_vals <- past_vals - fut_vals
# Update target column
df[[target_var]] <- ehi_vals
cat(paste(" Calculated:", target_var, "=", past_var, "-", fut_var, "\n"))
}
# Report any missing source columns
if (length(missing_source_cols) > 0) {
  cat("\n⚠ Missing source columns detected:\n")
  for (var in names(missing_source_cols)) {
    cat(paste(" ", var, ":", paste(missing_source_cols[[var]], collapse = ", "), "\n"))
  }
}
# =============================================================================
# 5. VALIDATION: CHECK 5 RANDOM ROWS
# =============================================================================
cat("\n=== VALIDATION: CHECKING 5 RANDOM ROWS ===\n\n")
# Set seed for reproducibility
set.seed(123)
sample_rows <- sample(1:nrow(df), min(5, nrow(df)))
sample_rows <- sort(sample_rows)
for (i in sample_rows) {
cat(paste("Row", i, ":\n"))
# Check a few representative variables from each category
test_vars <- c(
"ehi_pref_hobbies",
"ehi_pers_open",
"ehi_val_trad"
)
for (target_var in test_vars) {
if (target_var %in% names(ehi_mappings)) {
past_var <- ehi_mappings[[target_var]][1]
fut_var <- ehi_mappings[[target_var]][2]
if (past_var %in% names(df) && fut_var %in% names(df)) {
past_val <- df[i, past_var]
fut_val <- df[i, fut_var]
ehi_val <- df[i, target_var]
# Convert to numeric for calculation check
past_num <- as.numeric(ifelse(past_val == "" | is.na(past_val), NA, past_val))
fut_num <- as.numeric(ifelse(fut_val == "" | is.na(fut_val), NA, fut_val))
ehi_num <- as.numeric(ifelse(is.na(ehi_val), NA, ehi_val))
# Calculate expected value
expected <- if (!is.na(past_num) && !is.na(fut_num)) {
past_num - fut_num
} else {
NA
}
# Check if calculation is correct
match <- if (!is.na(expected) && !is.na(ehi_num)) {
abs(expected - ehi_num) < 0.0001 # Allow for floating point precision
} else {
is.na(expected) && is.na(ehi_num)
}
cat(sprintf(" %s:\n", target_var))
cat(sprintf(" %s = %s\n", past_var, ifelse(is.na(past_val) || past_val == "", "NA/empty", past_val)))
cat(sprintf(" %s = %s\n", fut_var, ifelse(is.na(fut_val) || fut_val == "", "NA/empty", fut_val)))
cat(sprintf(" %s = %s\n", target_var, ifelse(is.na(ehi_val), "NA", ehi_val)))
cat(sprintf(" Expected: %s - %s = %s\n",
ifelse(is.na(past_num), "NA", past_num),
ifelse(is.na(fut_num), "NA", fut_num),
ifelse(is.na(expected), "NA", expected)))
cat(sprintf(" Match: %s\n\n", ifelse(match, "✓", "✗ ERROR")))
}
}
}
}
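# Why the 1e-4 tolerance above: scores round-tripped through CSV as text can pick
# up floating-point noise, so exact equality on doubles is unreliable.
# Self-contained demo:
stopifnot(0.3 - 0.1 != 0.2)                # exact comparison fails
stopifnot(abs((0.3 - 0.1) - 0.2) < 0.0001) # tolerance comparison passes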
# =============================================================================
# 6. SAVE UPDATED DATA
# =============================================================================
# COMMENTED OUT: Uncomment when ready to save
# cat("\n=== SAVING DATA ===\n")
# write.csv(df, "eohi3.csv", row.names = FALSE, na = "")
# cat("Updated data saved to: eohi3.csv\n")
# cat(paste("Total rows:", nrow(df), "\n"))
# cat(paste("Total columns:", ncol(df), "\n"))


@ -0,0 +1,225 @@
library(dplyr)
setwd("/home/ladmin/Documents/DND/EOHI/eohi3")
# Read the data (with check.names=FALSE to preserve original column names)
# Keep empty cells as empty strings, not NA
# Only convert the literal string "NA" to NA, not empty strings
df <- read.csv("eohi3.csv", stringsAsFactors = FALSE, check.names = FALSE, na.strings = "NA")
# =============================================================================
# 1. CREATE BACKUP
# =============================================================================
file.copy("eohi3.csv", "eohi3_2.csv", overwrite = TRUE)
# =============================================================================
# 2. DEFINE MEAN VARIABLE MAPPINGS
# =============================================================================
mean_mappings <- list(
# Past Preferences MEAN
"past_pref_MEAN" = c("past_pref_hobbies", "past_pref_music", "past_pref_dress",
"past_pref_exer", "past_pref_food", "past_pref_friends"),
# Future Preferences MEAN
"fut_pref_MEAN" = c("fut_pref_hobbies", "fut_pref_music", "fut_pref_dress",
"fut_pref_exer", "fut_pref_food", "fut_pref_friends"),
# Past Personality MEAN
"past_pers_MEAN" = c("past_pers_open", "past_pers_goal", "past_pers_social",
"past_pers_agree", "past_pers_stress"),
# Future Personality MEAN
"fut_pers_MEAN" = c("fut_pers_open", "fut_pers_goal", "fut_pers_social",
"fut_pers_agree", "fut_pers_stress"),
# Past Values MEAN
"past_val_MEAN" = c("past_val_trad", "past_val_autonomy", "past_val_personal",
"past_val_justice", "past_val_close", "past_val_connect"),
# Future Values MEAN
"fut_val_MEAN" = c("fut_val_trad", "fut_val_autonomy", "fut_val_personal",
"fut_val_justice", "fut_val_close", "fut_val_connect"),
# EHI Preferences MEAN
"ehi_pref_MEAN" = c("ehi_pref_hobbies", "ehi_pref_music", "ehi_pref_dress",
"ehi_pref_exer", "ehi_pref_food", "ehi_pref_friends"),
# EHI Personality MEAN
"ehi_pers_MEAN" = c("ehi_pers_open", "ehi_pers_goal", "ehi_pers_social",
"ehi_pers_agree", "ehi_pers_stress"),
# EHI Values MEAN
"ehi_val_MEAN" = c("ehi_val_trad", "ehi_val_autonomy", "ehi_val_personal",
"ehi_val_justice", "ehi_val_close", "ehi_val_connect")
)
# Additional means
additional_means <- list(
"ehiDS_mean" = c("ehi_pref_MEAN", "ehi_pers_MEAN", "ehi_val_MEAN"),
"ehiDGEN_mean" = c("ehi_pref_DGEN", "ehi_pers_DGEN", "ehi_val_DGEN")
)
# =============================================================================
# 3. CHECK IF VARIABLES EXIST
# =============================================================================
# Check source variables for mean_mappings
missing_source_vars <- list()
for (target_var in names(mean_mappings)) {
source_vars <- mean_mappings[[target_var]]
missing <- setdiff(source_vars, names(df))
if (length(missing) > 0) {
missing_source_vars[[target_var]] <- missing
cat(paste("⚠ Missing source variables for", target_var, ":", paste(missing, collapse = ", "), "\n"))
}
}
# Check source variables for additional_means
missing_additional_vars <- list()
for (target_var in names(additional_means)) {
source_vars <- additional_means[[target_var]]
missing <- setdiff(source_vars, names(df))
if (length(missing) > 0) {
missing_additional_vars[[target_var]] <- missing
cat(paste("⚠ Missing source variables for", target_var, ":", paste(missing, collapse = ", "), "\n"))
}
}
# Check if target variables exist
expected_targets <- c(names(mean_mappings), names(additional_means))
actual_targets <- names(df)
missing_targets <- setdiff(expected_targets, actual_targets)
if (length(missing_targets) > 0) {
cat("\nERROR: The following target variables are missing from eohi3.csv:\n")
for (var in missing_targets) {
cat(paste(" -", var, "\n"))
}
stop("Cannot proceed without target variables. Please add them to the CSV file.")
}
# =============================================================================
# 4. CALCULATE MEAN VARIABLES
# =============================================================================
# Function to calculate row means, handling NA and empty strings
calculate_mean <- function(df, source_vars) {
# Extract columns and convert to numeric
cols_data <- df[, source_vars, drop = FALSE]
# Convert to numeric matrix, treating empty strings and "NA" as NA
numeric_matrix <- apply(cols_data, 2, function(x) {
as.numeric(ifelse(x == "" | is.na(x) | x == "NA", NA, x))
})
# Calculate row means, ignoring NA values
rowMeans(numeric_matrix, na.rm = TRUE)
}
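A quick standalone sketch of the missing-value behaviour this function implements (toy data, not the study data): empty strings and the literal "NA" become NA, and rows with no valid values yield NaN from rowMeans.

```r
# Toy data frame mimicking the CSV's character columns
toy <- data.frame(a = c("1", "", "3"), b = c("2", "NA", NA),
                  stringsAsFactors = FALSE)
toy_num <- apply(toy, 2, function(x) {
  as.numeric(ifelse(x == "" | is.na(x) | x == "NA", NA, x))
})
res <- rowMeans(toy_num, na.rm = TRUE)
res  # 1.5, NaN (all values missing), 3.0
```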
# Calculate means for main mappings
for (target_var in names(mean_mappings)) {
source_vars <- mean_mappings[[target_var]]
# Check if all source variables exist
missing <- setdiff(source_vars, names(df))
if (length(missing) > 0) {
warning(paste("Skipping", target_var, "- missing source variables:", paste(missing, collapse = ", ")))
next
}
# Calculate mean
df[[target_var]] <- calculate_mean(df, source_vars)
cat(paste(" Calculated:", target_var, "from", length(source_vars), "variables\n"))
}
# Calculate additional means
for (target_var in names(additional_means)) {
source_vars <- additional_means[[target_var]]
# Check if all source variables exist
missing <- setdiff(source_vars, names(df))
if (length(missing) > 0) {
warning(paste("Skipping", target_var, "- missing source variables:", paste(missing, collapse = ", ")))
next
}
# Calculate mean
df[[target_var]] <- calculate_mean(df, source_vars)
cat(paste(" Calculated:", target_var, "from", length(source_vars), "variables\n"))
}
# =============================================================================
# 5. VALIDATION: CHECK 5 RANDOM ROWS
# =============================================================================
# Set seed for reproducibility
set.seed(123)
sample_rows <- sample(1:nrow(df), min(5, nrow(df)))
sample_rows <- sort(sample_rows)
for (i in sample_rows) {
cat(paste("Row", i, ":\n"))
# Check a few representative mean variables
test_vars <- c(
"past_pref_MEAN",
"ehi_pref_MEAN",
"ehiDS_mean"
)
for (target_var in test_vars) {
# Determine which mapping to use
if (target_var %in% names(mean_mappings)) {
source_vars <- mean_mappings[[target_var]]
} else if (target_var %in% names(additional_means)) {
source_vars <- additional_means[[target_var]]
} else {
next
}
# Check if all source variables exist
if (!all(source_vars %in% names(df))) {
next
}
# Get values
source_vals <- df[i, source_vars]
target_val <- df[i, target_var]
# Convert to numeric for calculation
source_nums <- as.numeric(ifelse(source_vals == "" | is.na(source_vals) | source_vals == "NA", NA, source_vals))
target_num <- as.numeric(ifelse(is.na(target_val), NA, target_val))
# Calculate expected mean (ignoring NA)
expected <- mean(source_nums, na.rm = TRUE)
if (all(is.na(source_nums))) {
expected <- NA
}
# Check if calculation is correct
match <- if (!is.na(expected) && !is.na(target_num)) {
abs(expected - target_num) < 0.0001 # Allow for floating point precision
} else {
is.na(expected) && is.na(target_num)
}
cat(sprintf(" %s:\n", target_var))
cat(sprintf(" Source variables: %s\n", paste(source_vars, collapse = ", ")))
cat(sprintf(" Source values: %s\n", paste(ifelse(is.na(source_vals) | source_vals == "", "NA/empty", source_vals), collapse = ", ")))
cat(sprintf(" %s = %s\n", target_var, ifelse(is.na(target_val), "NA", round(target_val, 4))))
cat(sprintf(" Expected mean: %s\n", ifelse(is.na(expected), "NA", round(expected, 4))))
cat(sprintf(" Match: %s\n\n", ifelse(match, "✓", "✗ ERROR")))
}
}
# =============================================================================
# 6. SAVE UPDATED DATA
# =============================================================================
# COMMENTED OUT: Uncomment when ready to save
# write.csv(df, "eohi3.csv", row.names = FALSE, na = "")
# cat("Updated data saved to: eohi3.csv\n")
# cat(paste("Total rows:", nrow(df), "\n"))
# cat(paste("Total columns:", ncol(df), "\n"))


@ -0,0 +1,462 @@
library(dplyr)
setwd("/home/ladmin/Documents/DND/EOHI/eohi3")
# Read the data (with check.names=FALSE to preserve original column names)
# Keep empty cells as empty strings, not NA
# Only convert the literal string "NA" to NA, not empty strings
df <- read.csv("eohi3.csv", stringsAsFactors = FALSE, check.names = FALSE, na.strings = "NA")
# =============================================================================
# 1. CREATE BACKUP
# =============================================================================
file.copy("eohi3.csv", "eohi3_2.csv", overwrite = TRUE)
# =============================================================================
# HELPER FUNCTION: Check variable existence and values
# =============================================================================
check_vars_exist <- function(source_vars, target_vars) {
missing_source <- setdiff(source_vars, names(df))
missing_target <- setdiff(target_vars, names(df))
if (length(missing_source) > 0) {
stop(paste("Missing source variables:", paste(missing_source, collapse = ", ")))
}
if (length(missing_target) > 0) {
stop(paste("Missing target variables:", paste(missing_target, collapse = ", ")))
}
return(TRUE)
}
check_values_exist <- function(var_name, expected_values) {
unique_vals <- unique(df[[var_name]])
unique_vals <- unique_vals[!is.na(unique_vals) & unique_vals != ""]
missing_vals <- setdiff(expected_values, unique_vals)
extra_vals <- setdiff(unique_vals, expected_values)
if (length(missing_vals) > 0) {
cat(paste(" ⚠ Expected values not found in", var_name, ":", paste(missing_vals, collapse = ", "), "\n"))
}
if (length(extra_vals) > 0) {
cat(paste(" ⚠ Unexpected values found in", var_name, ":", paste(extra_vals, collapse = ", "), "\n"))
}
return(list(missing = missing_vals, extra = extra_vals))
}
# =============================================================================
# 2. RECODE other_length2 TO other_length
# =============================================================================
cat("\n=== 1. RECODING other_length2 TO other_length ===\n\n")
# Check variables exist
check_vars_exist("other_length2", "other_length")
# Check values in source
cat("Checking source variable values...\n")
length_vals <- unique(df$other_length2[!is.na(df$other_length2) & df$other_length2 != ""])
cat(paste(" Unique values in other_length2:", paste(length_vals, collapse = ", "), "\n"))
# Recode - handle "20+" as special case first, then convert to numeric for ranges
# Convert to numeric once, suppressing warnings for non-numeric values
num_length <- suppressWarnings(as.numeric(df$other_length2))
df$other_length <- ifelse(
is.na(df$other_length2),
NA,
ifelse(
df$other_length2 == "",
"",
ifelse(
df$other_length2 == "20+",
"20+",
ifelse(
!is.na(num_length) & num_length >= 5 & num_length <= 9,
"5-9",
ifelse(
!is.na(num_length) & num_length >= 10 & num_length <= 14,
"10-14",
ifelse(
!is.na(num_length) & num_length >= 15 & num_length <= 19,
"15-19",
NA
)
)
)
)
)
)
# Verification check
cat("\nVerification (5 random rows):\n")
set.seed(123)
sample_rows <- sample(1:nrow(df), min(5, nrow(df)))
for (i in sample_rows) {
source_val <- df$other_length2[i]
target_val <- df$other_length[i]
cat(sprintf(" Row %d: other_length2 = %s -> other_length = %s\n",
i, ifelse(is.na(source_val), "NA", ifelse(source_val == "", "empty", source_val)),
ifelse(is.na(target_val), "NA", ifelse(target_val == "", "empty", target_val))))
}
# =============================================================================
# 3. RECODE other_like2 TO other_like
# =============================================================================
cat("\n=== 2. RECODING other_like2 TO other_like ===\n\n")
# Check variables exist
check_vars_exist("other_like2", "other_like")
# Check expected values exist
expected_like <- c("Dislike a great deal", "Dislike somewhat", "Neither like nor dislike",
"Like somewhat", "Like a great deal")
check_values_exist("other_like2", expected_like)
# Recode
df$other_like <- ifelse(
is.na(df$other_like2),
NA,
ifelse(
df$other_like2 == "",
"",
ifelse(
df$other_like2 == "Dislike a great deal",
"-2",
ifelse(
df$other_like2 == "Dislike somewhat",
"-1",
ifelse(
df$other_like2 == "Neither like nor dislike",
"0",
ifelse(
df$other_like2 == "Like somewhat",
"1",
ifelse(
df$other_like2 == "Like a great deal",
"2",
NA
)
)
)
)
)
)
)
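The nested `ifelse` above works, but the same mapping can also be expressed as a named lookup vector, which keeps the label/value pairs in one place; a standalone sketch on hypothetical values (not the study data):

```r
# Named lookup: unmatched values (including NA) index to NA
like_map <- c("Dislike a great deal" = "-2", "Dislike somewhat" = "-1",
              "Neither like nor dislike" = "0", "Like somewhat" = "1",
              "Like a great deal" = "2")
toy_like <- c("Like somewhat", "", NA, "Dislike a great deal")
recoded <- unname(like_map[toy_like])
recoded[!is.na(toy_like) & toy_like == ""] <- ""  # preserve empty strings
recoded  # "1" "" NA "-2"
```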
# Verification check
cat("\nVerification (5 random rows):\n")
set.seed(456)
sample_rows <- sample(1:nrow(df), min(5, nrow(df)))
for (i in sample_rows) {
source_val <- df$other_like2[i]
target_val <- df$other_like[i]
cat(sprintf(" Row %d: other_like2 = %s -> other_like = %s\n",
i, ifelse(is.na(source_val), "NA", ifelse(source_val == "", "empty", source_val)),
ifelse(is.na(target_val), "NA", ifelse(target_val == "", "empty", target_val))))
}
# =============================================================================
# 4. CALCULATE aot_total
# =============================================================================
cat("\n=== 3. CALCULATING aot_total ===\n\n")
# Check variables exist
aot_vars <- c("aot01", "aot02", "aot03", "aot04_r", "aot05_r", "aot06_r", "aot07_r", "aot08")
check_vars_exist(aot_vars, "aot_total")
# Reverse code aot04_r through aot07_r
reverse_vars <- c("aot04_r", "aot05_r", "aot06_r", "aot07_r")
for (var in reverse_vars) {
df[[paste0(var, "_reversed")]] <- as.numeric(ifelse(
df[[var]] == "" | is.na(df[[var]]),
NA,
as.numeric(df[[var]]) * -1
))
}
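Note that multiplying by -1 only reverse-codes correctly when the response scale is symmetric around 0 (e.g. -3..3, which this assumes); a standalone sketch with hypothetical values:

```r
# Sign-flip reverse coding on a scale centred at 0 (hypothetical values)
x <- c(-3, 0, 2, NA)
x_rev <- -1 * x  # 3 0 -2 NA; NA is preserved
# For a 1..k scale you would instead use (k + 1) - x
```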
# Calculate mean of all 8 variables (4 reversed + 4 original)
all_aot_vars <- c("aot01", "aot02", "aot03", "aot04_r_reversed", "aot05_r_reversed",
"aot06_r_reversed", "aot07_r_reversed", "aot08")
# Convert to numeric matrix
aot_matrix <- df[, all_aot_vars]
aot_numeric <- apply(aot_matrix, 2, function(x) {
as.numeric(ifelse(x == "" | is.na(x), NA, x))
})
# Calculate mean
df$aot_total <- rowMeans(aot_numeric, na.rm = TRUE)
# Verification check
cat("\nVerification (5 random rows):\n")
set.seed(789)
sample_rows <- sample(1:nrow(df), min(5, nrow(df)))
for (i in sample_rows) {
aot_vals <- df[i, all_aot_vars]
aot_nums <- as.numeric(ifelse(aot_vals == "" | is.na(aot_vals), NA, aot_vals))
expected_mean <- mean(aot_nums, na.rm = TRUE)
actual_mean <- df$aot_total[i]
cat(sprintf(" Row %d: aot_total = %s (expected: %s)\n",
i, ifelse(is.na(actual_mean), "NA", round(actual_mean, 4)),
ifelse(is.na(expected_mean), "NA", round(expected_mean, 4))))
}
# =============================================================================
# 5. PROCESS CRT QUESTIONS
# =============================================================================
cat("\n=== 4. PROCESSING CRT QUESTIONS ===\n\n")
# Check variables exist
check_vars_exist(c("crt01", "crt02", "crt03"), c("crt_correct", "crt_int"))
# Initialize CRT variables
df$crt_correct <- 0
df$crt_int <- 0
# Guard with !is.na(): a bare `crt == value` comparison returns NA for missing
# responses, which ifelse() would propagate into the scores instead of 0.
# CRT01: "5 cents" = correct, "10 cents" = intuitive, anything else (incl. NA) = 0
df$crt_correct <- ifelse(!is.na(df$crt01) & df$crt01 == "5 cents", 1, df$crt_correct)
df$crt_int <- ifelse(!is.na(df$crt01) & df$crt01 == "10 cents", 1, df$crt_int)
# CRT02: "5 minutes" = correct, "100 minutes" = intuitive, anything else (incl. NA) = 0
df$crt_correct <- ifelse(!is.na(df$crt02) & df$crt02 == "5 minutes", df$crt_correct + 1, df$crt_correct)
df$crt_int <- ifelse(!is.na(df$crt02) & df$crt02 == "100 minutes", df$crt_int + 1, df$crt_int)
# CRT03: "47 days" = correct, "24 days" = intuitive, anything else (incl. NA) = 0
df$crt_correct <- ifelse(!is.na(df$crt03) & df$crt03 == "47 days", df$crt_correct + 1, df$crt_correct)
df$crt_int <- ifelse(!is.na(df$crt03) & df$crt03 == "24 days", df$crt_int + 1, df$crt_int)
# Check expected values exist
expected_crt01 <- c("5 cents", "10 cents")
expected_crt02 <- c("5 minutes", "100 minutes")
expected_crt03 <- c("47 days", "24 days")
check_values_exist("crt01", expected_crt01)
check_values_exist("crt02", expected_crt02)
check_values_exist("crt03", expected_crt03)
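An NA-safe alternative for this kind of scoring is `%in%`, which returns FALSE for NA, so missing responses score 0 rather than propagating NA; a standalone sketch on toy responses (not the study data):

```r
# %in% never returns NA, unlike == on NA values
toy_crt01 <- c("5 cents", "10 cents", NA, "1 cent")
correct01 <- as.integer(toy_crt01 %in% "5 cents")   # 1 0 0 0
intuit01  <- as.integer(toy_crt01 %in% "10 cents")  # 0 1 0 0
```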
# Verification check
cat("\nVerification (5 random rows):\n")
set.seed(1011)
sample_rows <- sample(1:nrow(df), min(5, nrow(df)))
for (i in sample_rows) {
cat(sprintf(" Row %d:\n", i))
# Guard with !is.na() so sprintf("%d", ...) never receives NA (which errors)
cat(sprintf(" crt01 = %s -> crt_correct = %d, crt_int = %d\n",
ifelse(is.na(df$crt01[i]) || df$crt01[i] == "", "NA/empty", df$crt01[i]),
ifelse(!is.na(df$crt01[i]) && df$crt01[i] == "5 cents", 1, 0),
ifelse(!is.na(df$crt01[i]) && df$crt01[i] == "10 cents", 1, 0)))
cat(sprintf(" crt02 = %s -> crt_correct = %d, crt_int = %d\n",
ifelse(is.na(df$crt02[i]) || df$crt02[i] == "", "NA/empty", df$crt02[i]),
ifelse(!is.na(df$crt02[i]) && df$crt02[i] == "5 minutes", 1, 0),
ifelse(!is.na(df$crt02[i]) && df$crt02[i] == "100 minutes", 1, 0)))
cat(sprintf(" crt03 = %s -> crt_correct = %d, crt_int = %d\n",
ifelse(is.na(df$crt03[i]) || df$crt03[i] == "", "NA/empty", df$crt03[i]),
ifelse(!is.na(df$crt03[i]) && df$crt03[i] == "47 days", 1, 0),
ifelse(!is.na(df$crt03[i]) && df$crt03[i] == "24 days", 1, 0)))
cat(sprintf(" Total: crt_correct = %d, crt_int = %d\n\n",
df$crt_correct[i], df$crt_int[i]))
}
}
# =============================================================================
# 6. CALCULATE icar_verbal
# =============================================================================
cat("\n=== 5. CALCULATING icar_verbal ===\n\n")
# Check variables exist
verbal_vars <- c("verbal01", "verbal02", "verbal03", "verbal04", "verbal05")
check_vars_exist(verbal_vars, "icar_verbal")
# Correct answers
correct_verbal <- c("5", "8", "It's impossible to tell", "47", "Sunday")
# Calculate proportion correct
verbal_responses <- df[, verbal_vars]
correct_count <- rowSums(
sapply(1:5, function(i) {
verbal_responses[, i] == correct_verbal[i]
}),
na.rm = TRUE
)
df$icar_verbal <- correct_count / 5
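The `sapply` comparison above builds a logical matrix of hits column by column; the same idea in a standalone toy example (hypothetical key and responses, not the study items):

```r
# Score a response matrix against an answer key, column-wise
key  <- c("A", "B")
resp <- matrix(c("A", "B",
                 "A", "C"), nrow = 2, byrow = TRUE)
hits <- sweep(resp, 2, key, `==`)                  # logical hit matrix
prop <- rowSums(hits, na.rm = TRUE) / length(key)  # 1.0, 0.5
```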
# Verification check
cat("\nVerification (5 random rows):\n")
set.seed(1213)
sample_rows <- sample(1:nrow(df), min(5, nrow(df)))
for (i in sample_rows) {
responses <- df[i, verbal_vars]
correct <- sum(sapply(1:5, function(j) responses[j] == correct_verbal[j]), na.rm = TRUE)
prop <- correct / 5
cat(sprintf(" Row %d: Correct = %d/5, icar_verbal = %s\n",
i, correct, round(prop, 4)))
}
# =============================================================================
# 7. CALCULATE icar_matrix
# =============================================================================
cat("\n=== 6. CALCULATING icar_matrix ===\n\n")
# Check variables exist
matrix_vars <- c("matrix01", "matrix02", "matrix03", "matrix04", "matrix05")
check_vars_exist(matrix_vars, "icar_matrix")
# Correct answers
correct_matrix <- c("D", "E", "B", "B", "D")
# Calculate proportion correct
matrix_responses <- df[, matrix_vars]
correct_count <- rowSums(
sapply(1:5, function(i) {
matrix_responses[, i] == correct_matrix[i]
}),
na.rm = TRUE
)
df$icar_matrix <- correct_count / 5
# Verification check
cat("\nVerification (5 random rows):\n")
set.seed(1415)
sample_rows <- sample(1:nrow(df), min(5, nrow(df)))
for (i in sample_rows) {
responses <- df[i, matrix_vars]
correct <- sum(sapply(1:5, function(j) responses[j] == correct_matrix[j]), na.rm = TRUE)
prop <- correct / 5
cat(sprintf(" Row %d: Correct = %d/5, icar_matrix = %s\n",
i, correct, round(prop, 4)))
}
# =============================================================================
# 8. CALCULATE icar_total
# =============================================================================
cat("\n=== 7. CALCULATING icar_total ===\n\n")
# Check variables exist
check_vars_exist(c(verbal_vars, matrix_vars), "icar_total")
# Calculate proportion correct across all 10 items
all_correct <- c(correct_verbal, correct_matrix)
all_responses <- df[, c(verbal_vars, matrix_vars)]
correct_count <- rowSums(
sapply(1:10, function(i) {
all_responses[, i] == all_correct[i]
}),
na.rm = TRUE
)
df$icar_total <- correct_count / 10
# Verification check
cat("\nVerification (5 random rows):\n")
set.seed(1617)
sample_rows <- sample(1:nrow(df), min(5, nrow(df)))
for (i in sample_rows) {
responses <- df[i, c(verbal_vars, matrix_vars)]
correct <- sum(sapply(1:10, function(j) responses[j] == all_correct[j]), na.rm = TRUE)
prop <- correct / 10
cat(sprintf(" Row %d: Correct = %d/10, icar_total = %s\n",
i, correct, round(prop, 4)))
}
# =============================================================================
# 9. RECODE demo_sex TO sex
# =============================================================================
cat("\n=== 8. RECODING demo_sex TO sex ===\n\n")
# Check variables exist
check_vars_exist("demo_sex", "sex")
# Check values
sex_vals <- unique(df$demo_sex[!is.na(df$demo_sex) & df$demo_sex != ""])
cat(paste(" Unique values in demo_sex:", paste(sex_vals, collapse = ", "), "\n"))
# Recode: male = 0, female = 1, else = 2
df$sex <- ifelse(
is.na(df$demo_sex) | df$demo_sex == "",
NA,
ifelse(
tolower(df$demo_sex) == "male",
0,
ifelse(
tolower(df$demo_sex) == "female",
1,
2
)
)
)
# Verification check
cat("\nVerification (5 random rows):\n")
set.seed(1819)
sample_rows <- sample(1:nrow(df), min(5, nrow(df)))
for (i in sample_rows) {
source_val <- df$demo_sex[i]
target_val <- df$sex[i]
cat(sprintf(" Row %d: demo_sex = %s -> sex = %s\n",
i, ifelse(is.na(source_val) || source_val == "", "NA/empty", source_val),
ifelse(is.na(target_val), "NA", target_val)))
}
# =============================================================================
# 10. RECODE demo_edu TO education
# =============================================================================
cat("\n=== 9. RECODING demo_edu TO education ===\n\n")
# Check variables exist
check_vars_exist("demo_edu", "education")
# Check values
edu_vals <- unique(df$demo_edu[!is.na(df$demo_edu) & df$demo_edu != ""])
cat(paste(" Unique values in demo_edu:", paste(edu_vals, collapse = ", "), "\n"))
# Recode
df$education <- ifelse(
is.na(df$demo_edu) | df$demo_edu == "",
NA,
ifelse(
df$demo_edu %in% c("High School (or equivalent)", "Trade School"),
"HS_TS",
ifelse(
df$demo_edu %in% c("College Diploma/Certificate", "University - Undergraduate"),
"C_Ug",
ifelse(
df$demo_edu %in% c("University - Graduate (Masters)", "University - PhD", "Professional Degree (ex. JD/MD)"),
"grad_prof",
NA
)
)
)
)
# Convert to ordered factor
df$education <- factor(df$education,
levels = c("HS_TS", "C_Ug", "grad_prof"),
ordered = TRUE)
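Making `education` an ordered factor means the levels carry a rank, so inequality comparisons (and polynomial contrasts in models) respect the ordering; a standalone sketch with hypothetical values:

```r
edu <- factor(c("HS_TS", "grad_prof", "C_Ug"),
              levels = c("HS_TS", "C_Ug", "grad_prof"),
              ordered = TRUE)
edu > "HS_TS"  # FALSE TRUE TRUE: comparisons follow level order
```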
# Verification check
cat("\nVerification (5 random rows):\n")
set.seed(2021)
sample_rows <- sample(1:nrow(df), min(5, nrow(df)))
for (i in sample_rows) {
source_val <- df$demo_edu[i]
target_val <- df$education[i]
cat(sprintf(" Row %d: demo_edu = %s -> education = %s\n",
i, ifelse(is.na(source_val) || source_val == "", "NA/empty", source_val),
ifelse(is.na(target_val), "NA", as.character(target_val))))
}
# =============================================================================
# 11. SAVE UPDATED DATA
# =============================================================================
# COMMENTED OUT: Uncomment when ready to save
# write.csv(df, "eohi3.csv", row.names = FALSE, na = "")
# cat("\nUpdated data saved to: eohi3.csv\n")
# cat(paste("Total rows:", nrow(df), "\n"))
# cat(paste("Total columns:", ncol(df), "\n"))

eohi3/eohi3.csv (new file, 523 lines). File diff suppressed because one or more lines are too long.


@ -0,0 +1,501 @@
---
title: "Mixed ANOVA - Domain Specific Means (DA01)"
author: ""
date: "`r Sys.Date()`"
output:
html_document:
toc: true
toc_float: true
code_folding: hide
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = TRUE)
```
# Setup
```{r libraries}
library(tidyverse)
library(rstatix)
library(emmeans)
library(effectsize)
library(afex)
library(car)
options(scipen = 999)
afex::set_sum_contrasts()
```
# Data
```{r read-data}
# Data file is in parent of knit/ (eohi3/eohi3.csv)
df <- read.csv(
"/home/ladmin/Documents/DND/EOHI/eohi3/eohi3.csv",
stringsAsFactors = FALSE,
check.names = FALSE,
na.strings = "NA"
)
between_vars <- c("perspective", "temporalDO")
within_vars <- c(
"past_pref_MEAN", "past_pers_MEAN", "past_val_MEAN",
"fut_pref_MEAN", "fut_pers_MEAN", "fut_val_MEAN"
)
missing_vars <- setdiff(c(between_vars, within_vars, "pID"), names(df))
if (length(missing_vars) > 0) {
stop(paste("Missing required variables:", paste(missing_vars, collapse = ", ")))
}
anova_data <- df %>%
select(pID, all_of(between_vars), all_of(within_vars)) %>%
filter(
!is.na(perspective), perspective != "",
!is.na(temporalDO), temporalDO != ""
)
```
# Long format
```{r long-format}
long_data <- anova_data %>%
pivot_longer(
cols = all_of(within_vars),
names_to = "variable",
values_to = "MEAN_SCORE"
) %>%
mutate(
time = case_when(
grepl("^past_", variable) ~ "past",
grepl("^fut_", variable) ~ "fut",
TRUE ~ NA_character_
),
domain = case_when(
grepl("_pref_MEAN$", variable) ~ "pref",
grepl("_pers_MEAN$", variable) ~ "pers",
grepl("_val_MEAN$", variable) ~ "val",
TRUE ~ NA_character_
)
) %>%
mutate(
TIME = factor(time, levels = c("past", "fut")),
DOMAIN = factor(domain, levels = c("pref", "pers", "val")),
perspective = factor(perspective),
temporalDO = factor(temporalDO),
pID = factor(pID)
) %>%
select(pID, perspective, temporalDO, TIME, DOMAIN, MEAN_SCORE) %>%
filter(!is.na(MEAN_SCORE))
```
# Descriptive statistics
```{r desc-stats}
desc_stats <- long_data %>%
group_by(perspective, temporalDO, TIME, DOMAIN) %>%
summarise(
n = n(),
mean = round(mean(MEAN_SCORE), 5),
variance = round(var(MEAN_SCORE), 5),
sd = round(sd(MEAN_SCORE), 5),
median = round(median(MEAN_SCORE), 5),
q1 = round(quantile(MEAN_SCORE, 0.25), 5),
q3 = round(quantile(MEAN_SCORE, 0.75), 5),
min = round(min(MEAN_SCORE), 5),
max = round(max(MEAN_SCORE), 5),
.groups = "drop"
)
# Show all rows and columns (no truncation)
options(tibble.width = Inf)
print(desc_stats, n = Inf)
```
Interpretations:
1. Mean and median values are similar, indicating the distributions are relatively symmetric with minimal skew; any outliers are not extreme.
2. The ratio of the largest to smallest group n is 1.14 (139/122), acceptable for ANOVA (under 1.5).
3. The ratio of the largest to smallest group variance is 1.67 (7.93/4.74), acceptable for ANOVA (under 4).
For consistency with the other EHI studies, I also calculated Hartley's F-max ratio.
The conservative F-max critical value is 1.60, which is still higher than the highest observed F-max ratio of 1.53.
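For transparency, the variance-ratio check can be reproduced directly; a minimal sketch using the overall group variances quoted in point 3 above (7.93 and 4.74), rather than recomputing from the data:

```{r fmax-demo}
# Hartley's F-max: largest group variance over smallest group variance
group_vars <- c(7.93, 4.74)
round(max(group_vars) / min(group_vars), 2)  # 1.67
```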
# Assumption checks
## Missing values
```{r missing}
missing_summary <- long_data %>%
group_by(perspective, temporalDO, TIME, DOMAIN) %>%
summarise(
n_total = n(),
n_missing = sum(is.na(MEAN_SCORE)),
pct_missing = round(100 * n_missing / n_total, 2),
.groups = "drop"
)
print(missing_summary, n = Inf)
```
No missing values. As expected.
## Outliers
```{r outliers}
outlier_summary <- long_data %>%
group_by(perspective, temporalDO, TIME, DOMAIN) %>%
summarise(
n = n(),
n_outliers = sum(abs(scale(MEAN_SCORE)) > 3),
.groups = "drop"
)
print(outlier_summary, n = Inf)
```
No outliers present. Good.
## Homogeneity of variance
```{r homogeneity}
homogeneity_between <- long_data %>%
group_by(TIME, DOMAIN) %>%
rstatix::levene_test(MEAN_SCORE ~ perspective * temporalDO)
print(homogeneity_between, n = Inf)
```
Levene's test is significant for two cells: fut-pers and fut-val.
However, the variance ratios and F-max tests above show that any heteroscedasticity is mild.
## Normality (within-subjects residuals)
```{r normality}
resid_within <- long_data %>%
group_by(pID) %>%
mutate(person_mean = mean(MEAN_SCORE, na.rm = TRUE)) %>%
ungroup() %>%
mutate(resid = MEAN_SCORE - person_mean) %>%
pull(resid)
resid_within <- resid_within[!is.na(resid_within)]
n_resid <- length(resid_within)
if (n_resid < 3L) {
message("Too few within-subjects residuals (n < 3); skipping Shapiro-Wilk.")
} else {
resid_for_shapiro <- if (n_resid > 5000L) {
set.seed(1L)
sample(resid_within, 5000L)
} else {
resid_within
}
print(shapiro.test(resid_for_shapiro))
}
```
### Q-Q plot
```{r qqplot, fig.height = 4}
qqnorm(resid_within)
qqline(resid_within)
```
Shapiro-Wilk is significant, but the test is highly sensitive at large sample sizes.
The Q-Q plot shows that the central residuals are approximately normal, with some heaviness in the tails.
ANOVA is robust to violations of normality with large samples, so the analysis can proceed.
# Mixed ANOVA
```{r anova}
aov_afex <- aov_ez(
id = "pID",
dv = "MEAN_SCORE",
data = long_data,
between = c("perspective", "temporalDO"),
within = c("TIME", "DOMAIN"),
type = 3
)
cat("\n--- ANOVA Table (Type 3, uncorrected) ---\n")
print(nice(aov_afex, correction = "none"))
cat("\n--- ANOVA Table (Type 3, GreenhouseGeisser correction) ---\n")
print(nice(aov_afex, correction = "GG"))
```
Mauchly's test of sphericity is significant for the DOMAIN main effect and its interactions (except with TIME). Use the GG correction for these:
## 8  DOMAIN                         1.94, 1004.66 1.21  0.63     <.001  p = .529
## 9  perspective:DOMAIN             1.94, 1004.66 1.21  7.79 *** <.001  p < .001
## 10 temporalDO:DOMAIN              1.94, 1004.66 1.21  0.76     <.001  p = .466
## 11 perspective:temporalDO:DOMAIN  1.94, 1004.66 1.21  0.17     <.001  p = .837
The following main effects and interactions are significant:
## 15 perspective:temporalDO:TIME:DOMAIN  2, 1036       0.75  3.11 *   <.001  .045
## 13 perspective:TIME:DOMAIN             2, 1036       0.75  3.58 *   <.001  .028
## 9  perspective:DOMAIN                  1.94, 1004.66 1.21  7.79 *** <.001  p < .001 (GG corrected)
## 6  temporalDO:TIME                     1, 518        1.86  9.81 **  <.001  .002
## 7  perspective:temporalDO:TIME         1, 518        1.86  7.91 **  <.001  .005
## 4  TIME                                1, 518        1.86 10.05 **  <.001  .002
# Mauchly and epsilon
```{r mauchly}
anova_wide <- anova_data %>%
select(pID, perspective, temporalDO, all_of(within_vars)) %>%
filter(if_all(all_of(within_vars), ~ !is.na(.)))
response_matrix <- as.matrix(anova_wide[, within_vars])
rm_model <- lm(response_matrix ~ perspective * temporalDO, data = anova_wide)
idata <- data.frame(
TIME = factor(rep(c("past", "fut"), each = 3), levels = c("past", "fut")),
DOMAIN = factor(rep(c("pref", "pers", "val"), 2), levels = c("pref", "pers", "val"))
)
rm_anova <- car::Anova(rm_model, idata = idata, idesign = ~ TIME * DOMAIN, type = 3)
rm_summary <- summary(rm_anova, multivariate = FALSE)
if (!is.null(rm_summary$sphericity.tests)) {
cat("\nMauchly's Test of Sphericity:\n")
print(rm_summary$sphericity.tests)
}
if (!is.null(rm_summary$epsilon)) {
cat("\nEpsilon (GG, HF):\n")
print(rm_summary$epsilon)
}
```
# Post-hoc comparisons
## TIME (main effect)
```{r posthoc-TIME}
emm_TIME <- emmeans(aov_afex, ~ TIME)
print(pairs(emm_TIME, adjust = "bonferroni"))
```
Pairwise comparisons provide support for the EHI effect.
## temporalDO:TIME
```{r posthoc-temporalDO-TIME}
emm_temporalDO_TIME <- emmeans(aov_afex, ~ TIME | temporalDO)
print(pairs(emm_temporalDO_TIME, adjust = "bonferroni"))
```
The contrast is significant only for the past-first temporal display order.
When grouped by TIME instead of temporalDO, no contrasts are significant.
## perspective:temporalDO:TIME
```{r posthoc-pt-TIME}
emm_pt_TIME <- emmeans(aov_afex, ~ TIME | perspective + temporalDO)
print(pairs(emm_pt_TIME, adjust = "bonferroni"))
```
The EHI effect is significant only for the self perspective with the past-first temporal display order.
When grouped by perspective or temporalDO instead of TIME, no contrasts are significant.
## perspective:DOMAIN
```{r posthoc-perspective-DOMAIN}
emm_perspective_DOMAIN <- emmeans(aov_afex, ~ perspective | DOMAIN)
print(pairs(emm_perspective_DOMAIN, adjust = "bonferroni"))
emm_perspective_DOMAIN_domain <- emmeans(aov_afex, ~ DOMAIN | perspective)
print(pairs(emm_perspective_DOMAIN_domain, adjust = "bonferroni"))
```
Significance is driven by the change from preferences to values in the "other" perspective.
## perspective:TIME:DOMAIN
```{r posthoc-pt-DOMAIN}
emm_pt_TIME_domain <- emmeans(aov_afex, ~ TIME | perspective + DOMAIN)
print(pairs(emm_pt_TIME_domain, adjust = "bonferroni"))
```
EHI effects present for other-perspective in the preferences domain and for self-perspective in the values domain.
Estimate is higher in the self-perspective than in the other-perspective.
```{r posthoc-pt-DOMAIN-domain}
emm_pt_domain_domain <- emmeans(aov_afex, ~ DOMAIN | perspective + TIME)
print(pairs(emm_pt_domain_domain, adjust = "bonferroni"))
```
Significant contrasts are driven by the preferences-to-values domain change in both the self- and other-perspectives, for the past-oriented questions.
Trends reverse by perspective: values have higher estimates than preferences in the self-perspective, but lower estimates than preferences in the other-perspective.
## perspective:temporalDO:TIME:DOMAIN
```{r posthoc-ptt-TIME}
emm_ptt_TIME <- emmeans(aov_afex, ~ TIME | perspective + temporalDO + DOMAIN)
print(pairs(emm_ptt_TIME, adjust = "bonferroni"))
```
EHI effects are present for three contrasts:
- past - fut 0.2806 0.118 518 2.380 0.0177 (other-perspective, preferences domain, past-first temporal display order)
- past - fut 0.4358 0.138 518 3.156 0.0017 (self-perspective, personality domain, past-first temporal display order)
- past - fut 0.7276 0.141 518 5.169 <0.0001 (self-perspective, values domain, past-first temporal display order)
A reverse EHI effect is present for 1 contrast:
- past - fut -0.2367 0.118 518 -2.001 0.0459 (self-perspective, preferences domain, future-first temporal display order)
```{r posthoc-ptt-perspective}
emm_ptt_perspective <- emmeans(aov_afex, ~ perspective | temporalDO + TIME + DOMAIN)
print(pairs(emm_ptt_perspective, adjust = "bonferroni"))
```
1 significant contrast:
- other - self -0.6972 0.314 518 -2.220 0.0268 (values domain, past-oriented questions, past-first temporal display order)
Not of direct theoretical interest, but it speaks to the perspective:TIME:DOMAIN interaction.
No contrasts are significant when grouped by temporalDO instead of TIME or perspective.
## Cohen's d (significant contrasts only)
```{r cohens-d-significant}
d_data <- anova_data %>%
mutate(
past_mean = (past_pref_MEAN + past_pers_MEAN + past_val_MEAN) / 3,
fut_mean = (fut_pref_MEAN + fut_pers_MEAN + fut_val_MEAN) / 3,
pref_mean = (past_pref_MEAN + fut_pref_MEAN) / 2,
pers_mean = (past_pers_MEAN + fut_pers_MEAN) / 2,
val_mean = (past_val_MEAN + fut_val_MEAN) / 2
)
cohens_d_results <- tibble(
contrast = character(),
condition = character(),
d = double()
)
# TIME main: past vs fut
cohens_d_results <- cohens_d_results %>%
add_row(
contrast = "TIME (past - fut)",
condition = "overall",
d = suppressMessages(effectsize::cohens_d(d_data$past_mean, d_data$fut_mean, paired = TRUE)$Cohens_d)
)
# temporalDO:TIME: past vs fut for temporalDO = past
d_past_tdo <- d_data %>% filter(temporalDO == "past")
cohens_d_results <- cohens_d_results %>%
add_row(
contrast = "TIME (past - fut)",
condition = "temporalDO = past",
d = suppressMessages(effectsize::cohens_d(d_past_tdo$past_mean, d_past_tdo$fut_mean, paired = TRUE)$Cohens_d)
)
# perspective:temporalDO:TIME: past vs fut for self, past temporalDO
d_self_past <- d_data %>% filter(perspective == "self", temporalDO == "past")
cohens_d_results <- cohens_d_results %>%
add_row(
contrast = "TIME (past - fut)",
condition = "self, temporalDO = past",
d = suppressMessages(effectsize::cohens_d(d_self_past$past_mean, d_self_past$fut_mean, paired = TRUE)$Cohens_d)
)
# perspective:DOMAIN: pref vs val for perspective = other
d_other <- d_data %>% filter(perspective == "other")
cohens_d_results <- cohens_d_results %>%
add_row(
contrast = "DOMAIN (pref - val)",
condition = "perspective = other",
d = suppressMessages(effectsize::cohens_d(d_other$pref_mean, d_other$val_mean, paired = TRUE)$Cohens_d)
)
# perspective:TIME:DOMAIN - TIME: other, pref
d_other_pref <- d_data %>% filter(perspective == "other")
cohens_d_results <- cohens_d_results %>%
add_row(
contrast = "TIME (past - fut)",
condition = "other, pref domain",
d = suppressMessages(effectsize::cohens_d(d_other_pref$past_pref_MEAN, d_other_pref$fut_pref_MEAN, paired = TRUE)$Cohens_d)
)
# perspective:TIME:DOMAIN - TIME: self, val
d_self_val <- d_data %>% filter(perspective == "self")
cohens_d_results <- cohens_d_results %>%
add_row(
contrast = "TIME (past - fut)",
condition = "self, val domain",
d = suppressMessages(effectsize::cohens_d(d_self_val$past_val_MEAN, d_self_val$fut_val_MEAN, paired = TRUE)$Cohens_d)
)
# perspective:TIME:DOMAIN - DOMAIN: other, past TIME
d_other_past <- d_data %>% filter(perspective == "other")
cohens_d_results <- cohens_d_results %>%
add_row(
contrast = "DOMAIN (pref - val)",
condition = "other, past TIME",
d = suppressMessages(effectsize::cohens_d(d_other_past$past_pref_MEAN, d_other_past$past_val_MEAN, paired = TRUE)$Cohens_d)
)
# perspective:TIME:DOMAIN - DOMAIN: self, past TIME: pref - pers
d_self_past_t <- d_data %>% filter(perspective == "self")
cohens_d_results <- cohens_d_results %>%
add_row(
contrast = "DOMAIN (pref - pers)",
condition = "self, past TIME",
d = suppressMessages(effectsize::cohens_d(d_self_past_t$past_pref_MEAN, d_self_past_t$past_pers_MEAN, paired = TRUE)$Cohens_d)
)
# perspective:TIME:DOMAIN - DOMAIN: self, past TIME: pref - val
cohens_d_results <- cohens_d_results %>%
add_row(
contrast = "DOMAIN (pref - val)",
condition = "self, past TIME",
d = suppressMessages(effectsize::cohens_d(d_self_past_t$past_pref_MEAN, d_self_past_t$past_val_MEAN, paired = TRUE)$Cohens_d)
)
# 4-way TIME: self, future temporalDO, pref (reverse EHI)
d_self_fut_pref <- d_data %>% filter(perspective == "self", temporalDO == "future")
cohens_d_results <- cohens_d_results %>%
add_row(
contrast = "TIME (past - fut) [reverse]",
condition = "self, future temporalDO, pref domain",
d = suppressMessages(effectsize::cohens_d(d_self_fut_pref$past_pref_MEAN, d_self_fut_pref$fut_pref_MEAN, paired = TRUE)$Cohens_d)
)
# 4-way TIME: other, past temporalDO, pref
d_other_past_pref <- d_data %>% filter(perspective == "other", temporalDO == "past")
cohens_d_results <- cohens_d_results %>%
add_row(
contrast = "TIME (past - fut)",
condition = "other, past temporalDO, pref domain",
d = suppressMessages(effectsize::cohens_d(d_other_past_pref$past_pref_MEAN, d_other_past_pref$fut_pref_MEAN, paired = TRUE)$Cohens_d)
)
# 4-way TIME: self, past temporalDO, pers
cohens_d_results <- cohens_d_results %>%
add_row(
contrast = "TIME (past - fut)",
condition = "self, past temporalDO, pers domain",
d = suppressMessages(effectsize::cohens_d(d_self_past$past_pers_MEAN, d_self_past$fut_pers_MEAN, paired = TRUE)$Cohens_d)
)
# 4-way TIME: self, past temporalDO, val
cohens_d_results <- cohens_d_results %>%
add_row(
contrast = "TIME (past - fut)",
condition = "self, past temporalDO, val domain",
d = suppressMessages(effectsize::cohens_d(d_self_past$past_val_MEAN, d_self_past$fut_val_MEAN, paired = TRUE)$Cohens_d)
)
# 4-way perspective: past temporalDO, past TIME, val domain (other - self, between-subjects)
d_ptt_val <- d_data %>%
filter(temporalDO == "past") %>%
select(perspective, past_val_MEAN)
d_other_ptt <- d_ptt_val %>% filter(perspective == "other") %>% pull(past_val_MEAN)
d_self_ptt <- d_ptt_val %>% filter(perspective == "self") %>% pull(past_val_MEAN)
cohens_d_results <- cohens_d_results %>%
add_row(
contrast = "perspective (other - self)",
condition = "past temporalDO, past TIME, val domain",
d = suppressMessages(effectsize::cohens_d(d_other_ptt, d_self_ptt, paired = FALSE)$Cohens_d)
)
cohens_d_results %>%
mutate(d = round(d, 3)) %>%
print(n = Inf)
```

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

View File

@ -0,0 +1,434 @@
---
title: "Mixed ANOVA - Domain General Vars"
author: ""
date: "`r Sys.Date()`"
output:
html_document:
toc: true
toc_float: true
code_folding: hide
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = TRUE)
```
# Setup
```{r libraries}
library(tidyverse)
library(rstatix)
library(emmeans)
library(effectsize)
library(afex)
library(car)
options(scipen = 999)
afex::set_sum_contrasts()
```
# Data
```{r read-data}
# Data file is in parent of knit/ (eohi3/eohi3.csv)
df <- read.csv(
"/home/ladmin/Documents/DND/EOHI/eohi3/eohi3.csv",
stringsAsFactors = FALSE,
check.names = FALSE,
na.strings = "NA"
)
between_vars <- c("perspective", "temporalDO")
within_vars <- c(
"past_pref_DGEN", "past_pers_DGEN", "past_val_DGEN",
"fut_pref_DGEN", "fut_pers_DGEN", "fut_val_DGEN"
)
missing_vars <- setdiff(c(between_vars, within_vars, "pID"), names(df))
if (length(missing_vars) > 0) {
stop(paste("Missing required variables:", paste(missing_vars, collapse = ", ")))
}
anova_data <- df %>%
select(pID, all_of(between_vars), all_of(within_vars)) %>%
filter(
!is.na(perspective), perspective != "",
!is.na(temporalDO), temporalDO != ""
)
```
# Long format
```{r long-format}
long_data <- anova_data %>%
pivot_longer(
cols = all_of(within_vars),
names_to = "variable",
values_to = "DGEN_SCORE"
) %>%
mutate(
time = case_when(
grepl("^past_", variable) ~ "past",
grepl("^fut_", variable) ~ "fut",
TRUE ~ NA_character_
),
domain = case_when(
grepl("_pref_DGEN$", variable) ~ "pref",
grepl("_pers_DGEN$", variable) ~ "pers",
grepl("_val_DGEN$", variable) ~ "val",
TRUE ~ NA_character_
)
) %>%
mutate(
TIME = factor(time, levels = c("past", "fut")),
DOMAIN = factor(domain, levels = c("pref", "pers", "val")),
perspective = factor(perspective),
temporalDO = factor(temporalDO),
pID = factor(pID)
) %>%
select(pID, perspective, temporalDO, TIME, DOMAIN, DGEN_SCORE) %>%
filter(!is.na(DGEN_SCORE))
```
# Descriptive statistics
```{r desc-stats}
desc_stats <- long_data %>%
group_by(perspective, temporalDO, TIME, DOMAIN) %>%
summarise(
n = n(),
mean = round(mean(DGEN_SCORE), 5),
variance = round(var(DGEN_SCORE), 5),
sd = round(sd(DGEN_SCORE), 5),
median = round(median(DGEN_SCORE), 5),
q1 = round(quantile(DGEN_SCORE, 0.25), 5),
q3 = round(quantile(DGEN_SCORE, 0.75), 5),
min = round(min(DGEN_SCORE), 5),
max = round(max(DGEN_SCORE), 5),
.groups = "drop"
)
# Show all rows and columns (no truncation)
options(tibble.width = Inf)
print(desc_stats, n = Inf)
```
Interpretation:
1. Mean and median values are similar, w/ slightly more variation than in the domain-specific ANOVA.
2. The highest-to-lowest group n ratio is 1.14 (139/122), an acceptable ratio for ANOVA (under 1.5).
3. The highest-to-lowest group variance ratio is 1.40 (9.32/6.65), an acceptable ratio for ANOVA (under 4).
For consistency w/ the other EHI studies, I also calculated Hartley's F-max ratio.
The conservative F-max critical value is 1.60 (same as the DS ANOVA, since the number of groups and n sizes don't change), which is still higher than the highest observed F-max ratio of 1.28.
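As a sketch of the check described above, Hartley's F-max is simply the largest cell variance divided by the smallest. The two boundary values (9.32, 6.65) come from the descriptives; the middle two values in the vector are hypothetical filler, and in the analysis itself the input would be `desc_stats$variance`:

```r
# Hartley's F-max: largest group variance over smallest group variance.
# Boundary values (9.32, 6.65) are from the descriptives above;
# the middle two values are hypothetical filler.
group_vars <- c(9.32, 8.10, 7.45, 6.65)
f_max <- max(group_vars) / min(group_vars)
round(f_max, 2)  # 1.40, below the conservative critical value of 1.60
```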
# Assumption checks
## Missing values
```{r missing}
missing_summary <- long_data %>%
group_by(perspective, temporalDO, TIME, DOMAIN) %>%
summarise(
n_total = n(),
n_missing = sum(is.na(DGEN_SCORE)),
pct_missing = round(100 * n_missing / n_total, 2),
.groups = "drop"
)
print(missing_summary, n = Inf)
```
No missing values. As expected.
## Outliers
```{r outliers}
outlier_summary <- long_data %>%
group_by(perspective, temporalDO, TIME, DOMAIN) %>%
summarise(
n = n(),
n_outliers = sum(abs(scale(DGEN_SCORE)) > 3),
.groups = "drop"
)
print(outlier_summary, n = Inf)
```
No outliers present. Good. Same as the DS ANOVA.
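The outlier screen above flags scores more than 3 SDs from their cell mean; a minimal sketch of the same `scale()`-based rule on made-up scores (the analysis applies it per design cell):

```r
# |z| > 3 rule: standardize scores within a cell, then count extremes.
# toy_scores is hypothetical data with one obvious extreme value.
toy_scores <- c(rep(5, 20), 30)
z <- as.numeric(scale(toy_scores))  # center and scale to z-scores
sum(abs(z) > 3)                     # 1 -- the single extreme value
```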
## Homogeneity of variance
```{r homogeneity}
homogeneity_between <- long_data %>%
group_by(TIME, DOMAIN) %>%
rstatix::levene_test(DGEN_SCORE ~ perspective * temporalDO)
print(homogeneity_between, n = Inf)
```
Levene's test is significant for 1 cell only: past-val. However, variance ratios and F-max tests show that any heteroscedasticity is mild.
## Normality (within-subjects residuals)
```{r normality}
resid_within <- long_data %>%
group_by(pID) %>%
mutate(person_mean = mean(DGEN_SCORE, na.rm = TRUE)) %>%
ungroup() %>%
mutate(resid = DGEN_SCORE - person_mean) %>%
pull(resid)
resid_within <- resid_within[!is.na(resid_within)]
n_resid <- length(resid_within)
if (n_resid < 3L) {
message("Too few within-subjects residuals (n < 3); skipping Shapiro-Wilk.")
} else {
resid_for_shapiro <- if (n_resid > 5000L) {
set.seed(1L)
sample(resid_within, 5000L)
} else {
resid_within
}
print(shapiro.test(resid_for_shapiro))
}
```
### Q-Q plot
```{r qqplot, fig.height = 4}
qqnorm(resid_within)
qqline(resid_within)
```
The Shapiro-Wilk test is significant but is sensitive to large sample sizes.
The Q-Q plot shows that residuals near the centre are approximately normal, with some deviation from normality in the tails.
ANOVA is robust to violations of normality w/ large sample sizes.
Overall, the ANOVA can proceed.
# Mixed ANOVA
```{r anova}
aov_afex <- aov_ez(
id = "pID",
dv = "DGEN_SCORE",
data = long_data,
between = c("perspective", "temporalDO"),
within = c("TIME", "DOMAIN"),
type = 3,
anova_table = list(correction = "none")
)
print(aov_afex)
```
Mauchly's test of sphericity is not significant. Using uncorrected values for interpretation and analysis.
Significant main effects and interactions:
Effect df MSE F ges p
4 TIME 1, 518 3.11 8.39 ** .001 .004
8 DOMAIN 2, 1036 2.13 7.85 *** .001 <.001
10 temporalDO:DOMAIN 2, 1036 2.13 5.00 ** <.001 .007
15 perspective:temporalDO:TIME:DOMAIN 2, 1036 1.52 3.12 * <.001 .045
# Mauchly and epsilon
```{r mauchly}
anova_wide <- anova_data %>%
select(pID, perspective, temporalDO, all_of(within_vars)) %>%
filter(if_all(all_of(within_vars), ~ !is.na(.)))
response_matrix <- as.matrix(anova_wide[, within_vars])
rm_model <- lm(response_matrix ~ perspective * temporalDO, data = anova_wide)
idata <- data.frame(
TIME = factor(rep(c("past", "fut"), each = 3), levels = c("past", "fut")),
DOMAIN = factor(rep(c("pref", "pers", "val"), 2), levels = c("pref", "pers", "val"))
)
rm_anova <- car::Anova(rm_model, idata = idata, idesign = ~ TIME * DOMAIN, type = 3)
rm_summary <- summary(rm_anova, multivariate = FALSE)
if (!is.null(rm_summary$sphericity.tests)) {
print(rm_summary$sphericity.tests)
}
if (!is.null(rm_summary$epsilon)) {
print(rm_summary$epsilon)
}
```
# Post-hoc comparisons
## TIME (main effect)
```{r posthoc-TIME}
emm_TIME <- emmeans(aov_afex, ~ TIME)
print(pairs(emm_TIME, adjust = "none"))
```
Supports the presence of an EHI effect.
## DOMAIN (main effect)
```{r posthoc-domain}
emm_DOMAIN <- emmeans(aov_afex, ~ DOMAIN)
print(pairs(emm_DOMAIN, adjust = "tukey"))
```
Only the preferences-to-values contrast is significant.
## temporalDO:DOMAIN
```{r posthoc-temporaldo-domain}
emmeans(aov_afex, pairwise ~ temporalDO | DOMAIN, adjust = "none")$contrasts
emmeans(aov_afex, pairwise ~ DOMAIN | temporalDO, adjust = "tukey")$contrasts
```
When grouped by domain, no contrasts are significant.
When grouped by temporalDO, some contrasts are significant:
Future-first temporal display order:
contrast estimate SE df t.ratio p.value
pref - pers 0.25065 0.0892 518 2.810 0.0142
Past-first temporal display order:
contrast estimate SE df t.ratio p.value
pref - val 0.33129 0.0895 518 3.702 0.0007
pers - val 0.32478 0.0921 518 3.527 0.0013
## perspective:temporalDO:TIME:DOMAIN
### contrasts for TIME grouped by perspective, temporalDO, and DOMAIN
```{r posthoc-fourway}
emm_fourway <- emmeans(aov_afex, pairwise ~ TIME | perspective * temporalDO * DOMAIN, adjust = "tukey")
print(emm_fourway$contrasts)
```
Significant contrasts:
contrast estimate SE df t.ratio p.value
past - fut 0.5285 0.179 518 2.957 0.0032 (self-perspective, personality domain, past-first temporal display order)
past - fut 0.5366 0.187 518 2.863 0.0044 (self-perspective, values domain, past-first temporal display order)
### contrasts for DOMAIN grouped by perspective, TIME, and temporalDO
```{r posthoc-fourway2}
emm_fourway2 <- emmeans(aov_afex, pairwise ~ DOMAIN | perspective * TIME * temporalDO, adjust = "tukey")
print(emm_fourway2$contrasts)
```
Significant contrasts:
contrast estimate SE df t.ratio p.value
pref - val 0.6259 0.166 518 3.778 0.0005 (other-perspective, past-directed questions, past-first temporal display order)
pers - val 0.4892 0.160 518 3.056 0.0066 (other-perspective, past-directed questions, past-first temporal display order)
pref - val 0.4309 0.168 518 2.559 0.0290 (self-perspective, future-directed questions, past-first temporal display order)
## Cohen's d (significant contrasts only)
```{r cohens-d-significant}
d_data <- anova_data %>%
mutate(
past_mean = (past_pref_DGEN + past_pers_DGEN + past_val_DGEN) / 3,
fut_mean = (fut_pref_DGEN + fut_pers_DGEN + fut_val_DGEN) / 3,
pref_mean = (past_pref_DGEN + fut_pref_DGEN) / 2,
pers_mean = (past_pers_DGEN + fut_pers_DGEN) / 2,
val_mean = (past_val_DGEN + fut_val_DGEN) / 2
)
cohens_d_results <- tibble(
contrast = character(),
condition = character(),
d = double()
)
# TIME main: past vs fut
cohens_d_results <- cohens_d_results %>%
add_row(
contrast = "TIME (past - fut)",
condition = "overall",
d = suppressMessages(effectsize::cohens_d(d_data$past_mean, d_data$fut_mean, paired = TRUE)$Cohens_d)
)
# DOMAIN main: pref vs val
cohens_d_results <- cohens_d_results %>%
add_row(
contrast = "DOMAIN (pref - val)",
condition = "overall",
d = suppressMessages(effectsize::cohens_d(d_data$pref_mean, d_data$val_mean, paired = TRUE)$Cohens_d)
)
# temporalDO:DOMAIN - future: pref vs pers
d_fut <- d_data %>% filter(temporalDO == "future")
cohens_d_results <- cohens_d_results %>%
add_row(
contrast = "DOMAIN (pref - pers)",
condition = "temporalDO = future",
d = suppressMessages(effectsize::cohens_d(d_fut$pref_mean, d_fut$pers_mean, paired = TRUE)$Cohens_d)
)
# temporalDO:DOMAIN - past: pref vs val, pers vs val
d_past <- d_data %>% filter(temporalDO == "past")
cohens_d_results <- cohens_d_results %>%
add_row(
contrast = "DOMAIN (pref - val)",
condition = "temporalDO = past",
d = suppressMessages(effectsize::cohens_d(d_past$pref_mean, d_past$val_mean, paired = TRUE)$Cohens_d)
) %>%
add_row(
contrast = "DOMAIN (pers - val)",
condition = "temporalDO = past",
d = suppressMessages(effectsize::cohens_d(d_past$pers_mean, d_past$val_mean, paired = TRUE)$Cohens_d)
)
# 4-way TIME: self, past temporalDO, pers
d_self_past <- d_data %>% filter(perspective == "self", temporalDO == "past")
cohens_d_results <- cohens_d_results %>%
add_row(
contrast = "TIME (past - fut)",
condition = "self, past temporalDO, pers domain",
d = suppressMessages(effectsize::cohens_d(d_self_past$past_pers_DGEN, d_self_past$fut_pers_DGEN, paired = TRUE)$Cohens_d)
)
# 4-way TIME: self, past temporalDO, val
cohens_d_results <- cohens_d_results %>%
add_row(
contrast = "TIME (past - fut)",
condition = "self, past temporalDO, val domain",
d = suppressMessages(effectsize::cohens_d(d_self_past$past_val_DGEN, d_self_past$fut_val_DGEN, paired = TRUE)$Cohens_d)
)
# 4-way DOMAIN: other, past TIME, past temporalDO - pref vs val
d_other_past_tpast <- d_data %>% filter(perspective == "other", temporalDO == "past")
cohens_d_results <- cohens_d_results %>%
add_row(
contrast = "DOMAIN (pref - val)",
condition = "other, past TIME, past temporalDO",
d = suppressMessages(effectsize::cohens_d(d_other_past_tpast$past_pref_DGEN, d_other_past_tpast$past_val_DGEN, paired = TRUE)$Cohens_d)
)
# 4-way DOMAIN: other, past TIME, past temporalDO - pers vs val
cohens_d_results <- cohens_d_results %>%
add_row(
contrast = "DOMAIN (pers - val)",
condition = "other, past TIME, past temporalDO",
d = suppressMessages(effectsize::cohens_d(d_other_past_tpast$past_pers_DGEN, d_other_past_tpast$past_val_DGEN, paired = TRUE)$Cohens_d)
)
# 4-way DOMAIN: self, fut TIME, past temporalDO - pref vs val
d_self_fut_tpast <- d_data %>% filter(perspective == "self", temporalDO == "past")
cohens_d_results <- cohens_d_results %>%
add_row(
contrast = "DOMAIN (pref - val)",
condition = "self, fut TIME, past temporalDO",
d = suppressMessages(effectsize::cohens_d(d_self_fut_tpast$fut_pref_DGEN, d_self_fut_tpast$fut_val_DGEN, paired = TRUE)$Cohens_d)
)
cohens_d_results %>%
mutate(d = round(d, 3)) %>%
print(n = Inf)
```
| Size   | d   | Interpretation  |
|--------|-----|-----------------|
| Small  | 0.2 | Weak effect     |
| Medium | 0.5 | Moderate effect |
| Large  | 0.8 | Strong effect   |
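For reference, the paired d reported by `effectsize::cohens_d(..., paired = TRUE)` is the mean of the within-participant difference scores divided by their SD; a minimal sketch on hypothetical per-participant scores:

```r
# Paired Cohen's d: mean(diff) / sd(diff).
# x and y are hypothetical scores for two within-subject conditions.
x <- c(5, 4, 6, 5, 7, 4)
y <- c(4, 4, 5, 3, 6, 4)
diffs <- x - y
round(mean(diffs) / sd(diffs), 3)  # 1.107
```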

File diff suppressed because one or more lines are too long

Binary file not shown.

Binary file not shown.

BIN
lit review/carmen_ehi2.pdf Normal file

Binary file not shown.

Binary file not shown.

BIN
lit review/guo_ehi2.pdf Normal file

Binary file not shown.

BIN
lit review/gutral_ehi2.pdf Normal file

Binary file not shown.

BIN
lit review/haas_ehi1.pdf Normal file

Binary file not shown.

Binary file not shown.

BIN
lit review/harris_ehi1.pdf Normal file

Binary file not shown.

Binary file not shown.

Binary file not shown.

128
lit review/pdf_to_txt.py Normal file
View File

@ -0,0 +1,128 @@
#!/home/ladmin/miniconda3/envs/nlp/bin/python
"""
PDF to Text Converter
Converts PDF files to plain text files.
Usage:
python pdf_to_txt.py <input.pdf> # Creates input.txt
python pdf_to_txt.py <input.pdf> <output.txt> # Custom output name
python pdf_to_txt.py --all # Convert all PDFs in current directory
Requirements:
pip install pypdf
"""
import sys
import os
from pathlib import Path
try:
from pypdf import PdfReader
except ImportError:
print("Error: pypdf library not found.")
print("Please install it with: pip install pypdf")
sys.exit(1)
def pdf_to_text(pdf_path, output_path=None):
"""
Convert a PDF file to a text file.
Args:
pdf_path: Path to the PDF file
output_path: Path to the output text file (optional)
Returns:
True if successful, False otherwise
"""
try:
# Convert to Path objects
pdf_path = Path(pdf_path)
if not pdf_path.exists():
print(f"Error: File not found: {pdf_path}")
return False
# Determine output path
if output_path is None:
output_path = pdf_path.with_suffix('.txt')
else:
output_path = Path(output_path)
print(f"Converting: {pdf_path.name}")
# Read the PDF
reader = PdfReader(str(pdf_path))
# Extract text from all pages
text_content = []
for i, page in enumerate(reader.pages, 1):
text = page.extract_text()
if text:
text_content.append(f"--- Page {i} ---\n{text}\n")
# Write to text file
full_text = "\n".join(text_content)
output_path.write_text(full_text, encoding='utf-8')
print(f"✓ Created: {output_path.name} ({len(reader.pages)} pages, {len(full_text):,} characters)")
return True
except Exception as e:
print(f"✗ Error processing {pdf_path.name}: {str(e)}")
return False
def convert_all_pdfs():
"""Convert all PDF files in the current directory to text files."""
current_dir = Path.cwd()
pdf_files = list(current_dir.glob("*.pdf"))
if not pdf_files:
print("No PDF files found in the current directory.")
return
print(f"Found {len(pdf_files)} PDF file(s) to convert.\n")
successful = 0
failed = 0
for pdf_file in pdf_files:
if pdf_to_text(pdf_file):
successful += 1
else:
failed += 1
print(f"\n{'='*60}")
print(f"Conversion complete: {successful} successful, {failed} failed")
def main():
if len(sys.argv) < 2:
print(__doc__)
sys.exit(1)
# Convert all PDFs in directory
if sys.argv[1] == "--all":
convert_all_pdfs()
# Convert single PDF
elif len(sys.argv) == 2:
pdf_path = sys.argv[1]
pdf_to_text(pdf_path)
# Convert single PDF with custom output name
elif len(sys.argv) == 3:
pdf_path = sys.argv[1]
output_path = sys.argv[2]
pdf_to_text(pdf_path, output_path)
else:
print("Error: Too many arguments")
print(__doc__)
sys.exit(1)
if __name__ == "__main__":
main()

BIN
lit review/quoidbach.sm.pdf Normal file

Binary file not shown.

Binary file not shown.

BIN
lit review/reiff_ehi2.pdf Normal file

Binary file not shown.

BIN
lit review/rutt_ehi2.pdf Normal file

Binary file not shown.

BIN
lit review/sachi_ehi1.pdf Normal file

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

BIN
lit review/yue_ehi2.pdf Normal file

Binary file not shown.

Binary file not shown.