Compare commits
4 Commits
a31f490499
...
master
| Author | SHA1 | Date | |
|---|---|---|---|
| d0d0e0f8d4 | |||
| 42ea52d859 | |||
| 094201eeb4 | |||
| ba54687da2 |
5
.gitignore
vendored
5
.gitignore
vendored
@ -1 +1,4 @@
|
||||
.history/
|
||||
.history/
|
||||
eohi3_2.csv
|
||||
*~
|
||||
.~lock*
|
||||
55
.vscode/launch.json
vendored
Normal file
55
.vscode/launch.json
vendored
Normal file
@ -0,0 +1,55 @@
|
||||
{
|
||||
// Use IntelliSense to learn about possible attributes.
|
||||
// Hover to view descriptions of existing attributes.
|
||||
// For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
|
||||
"version": "0.2.0",
|
||||
"configurations": [
|
||||
{
|
||||
"type": "R-Debugger",
|
||||
"name": "Launch R-Workspace",
|
||||
"request": "launch",
|
||||
"debugMode": "workspace",
|
||||
"workingDirectory": "${workspaceFolder}",
|
||||
"splitOverwrittenOutput": true
|
||||
},
|
||||
{
|
||||
"type": "R-Debugger",
|
||||
"name": "Debug R-File",
|
||||
"request": "launch",
|
||||
"debugMode": "file",
|
||||
"workingDirectory": "${workspaceFolder}",
|
||||
"file": "${file}",
|
||||
"splitOverwrittenOutput": true,
|
||||
"stopOnEntry": false
|
||||
},
|
||||
{
|
||||
"type": "R-Debugger",
|
||||
"name": "Debug R-Function",
|
||||
"request": "launch",
|
||||
"debugMode": "function",
|
||||
"workingDirectory": "${workspaceFolder}",
|
||||
"file": "${file}",
|
||||
"mainFunction": "main",
|
||||
"allowGlobalDebugging": false,
|
||||
"splitOverwrittenOutput": true
|
||||
},
|
||||
{
|
||||
"type": "R-Debugger",
|
||||
"name": "Debug R-Package",
|
||||
"request": "launch",
|
||||
"debugMode": "workspace",
|
||||
"workingDirectory": "${workspaceFolder}",
|
||||
"includePackageScopes": true,
|
||||
"loadPackages": [
|
||||
"."
|
||||
],
|
||||
"splitOverwrittenOutput": true
|
||||
},
|
||||
{
|
||||
"type": "R-Debugger",
|
||||
"request": "attach",
|
||||
"name": "Attach to R process",
|
||||
"splitOverwrittenOutput": true
|
||||
}
|
||||
]
|
||||
}
|
||||
315
eohi3/00 - var creation.md
Normal file
315
eohi3/00 - var creation.md
Normal file
@ -0,0 +1,315 @@
|
||||
# Variable Creation Scripts Documentation
|
||||
|
||||
This document describes the data processing scripts used to create derived variables in the EOHI3 dataset. Each script performs specific transformations and should be run in sequence.
|
||||
|
||||
---
|
||||
|
||||
## datap 04 - combined vars.r
|
||||
|
||||
### Goal
|
||||
Combine self-perspective and other-perspective variables into single columns. For each row, values exist in either the self-perspective variables OR the other-perspective variables, never both.
|
||||
|
||||
### Transformations
|
||||
|
||||
#### Past Variables (p5 = past)
|
||||
Combines `self[VAL/PERS/PREF]_p5_[string]` and `other[VAL/PERS/PREF]_p5_[string]` into `past_[val/pers/pref]_[string]`.
|
||||
|
||||
**Source Variables:**
|
||||
- **Values (VAL)**: `selfVAL_p5_trad`, `otherVAL_p5_trad`, `selfVAL_p5_autonomy`, `otherVAL_p5_autonomy`, `selfVAL_p5_personal`, `otherVAL_p5_personal`, `selfVAL_p5_justice`, `otherVAL_p5_justice`, `selfVAL_p5_close`, `otherVAL_p5_close`, `selfVAL_p5_connect`, `otherVAL_p5_connect`, `selfVAL_p5_dgen`, `otherVAL_p5_dgen`
|
||||
- **Personality (PERS)**: `selfPERS_p5_open`, `otherPESR_p5_open` (note: typo in source data), `selfPERS_p5_goal`, `otherPERS_p5_goal`, `selfPERS_p5_social`, `otherPERS_p5_social`, `selfPERS_p5_agree`, `otherPERS_p5_agree`, `selfPERS_p5_stress`, `otherPERS_p5_stress`, `selfPERS_p5_dgen`, `otherPERS_p5_dgen`
|
||||
- **Preferences (PREF)**: `selfPREF_p5_hobbies`, `otherPREF_p5_hobbies`, `selfPREF_p5_music`, `otherPREF_p5_music`, `selfPREF_p5_dress`, `otherPREF_p5_dress`, `selfPREF_p5_exer`, `otherPREF_p5_exer`, `selfPREF_p5_food`, `otherPREF_p5_food`, `selfPREF_p5_friends`, `otherPREF_p5_friends`, `selfPREF_p5_dgen`, `otherPREF_p5_dgen`
|
||||
|
||||
**Target Variables:**
|
||||
- `past_val_trad`, `past_val_autonomy`, `past_val_personal`, `past_val_justice`, `past_val_close`, `past_val_connect`, `past_val_DGEN`
|
||||
- `past_pers_open`, `past_pers_goal`, `past_pers_social`, `past_pers_agree`, `past_pers_stress`, `past_pers_DGEN`
|
||||
- `past_pref_hobbies`, `past_pref_music`, `past_pref_dress`, `past_pref_exer`, `past_pref_food`, `past_pref_friends`, `past_pref_DGEN`
|
||||
|
||||
#### Future Variables (f5 = future)
|
||||
Combines `self[VAL/PERS/PREF]_f5_[string]` and `other[VAL/PERS/PREF]_f5_[string]` into `fut_[val/pers/pref]_[string]`.
|
||||
|
||||
**Source Variables:**
|
||||
- **Values (VAL)**: `selfVAL_f5_trad`, `otherVAL_f5_trad`, `selfVAL_f5_autonomy`, `otherVAL_f5_autonomy`, `selfVAL_f5_personal`, `otherVAL_f5_personal`, `selfVAL_f5_justice`, `otherVAL_f5_justice`, `selfVAL_f5_close`, `otherVAL_f5_close`, `selfVAL_f5_connect`, `otherVAL_f5_connect`, `selfVAL_f5_dgen`, `otherVAL_f5_dgen`
|
||||
- **Personality (PERS)**: `selfPERS_f5_open`, `otherPERS_f5_open`, `selfPERS_f5_goal`, `otherPERS_f5_goal`, `selfPERS_f5_social`, `otherPERS_f5_social`, `selfPERS_f5_agree`, `otherPERS_f5_agree`, `selfPERS_f5_stress`, `otherPERS_f5_stress`, `selfPERS_f5_dgen`, `otherPERS_f5_dgen`
|
||||
- **Preferences (PREF)**: `selfPREF_f5_hobbies`, `otherPREF_f5_hobbies`, `selfPREF_f5_music`, `otherPREF_f5_music`, `selfPREF_f5_dress`, `otherPREF_f5_dress`, `selfPREF_f5_exer`, `otherPREF_f5_exer`, `selfPREF_f5_food`, `otherPREF_f5_food`, `selfPREF_f5_friends`, `otherPREF_f5_friends`, `selfPREF_f5_dgen`, `otherPREF_f5_dgen`
|
||||
|
||||
**Target Variables:**
|
||||
- `fut_val_trad`, `fut_val_autonomy`, `fut_val_personal`, `fut_val_justice`, `fut_val_close`, `fut_val_connect`, `fut_val_DGEN`
|
||||
- `fut_pers_open`, `fut_pers_goal`, `fut_pers_social`, `fut_pers_agree`, `fut_pers_stress`, `fut_pers_DGEN`
|
||||
- `fut_pref_hobbies`, `fut_pref_music`, `fut_pref_dress`, `fut_pref_exer`, `fut_pref_food`, `fut_pref_friends`, `fut_pref_DGEN`
|
||||
|
||||
### Logic
|
||||
- Uses self value if present (not empty/NA), otherwise uses other value
|
||||
- If both are empty/NA, result is NA
|
||||
- Assumes mutual exclusivity: each row has values in either self OR other, never both
|
||||
|
||||
### Validation Checks
|
||||
1. **Conflict Check**: Verifies no rows have values in both self and other for the same variable
|
||||
2. **Coverage Check**: Verifies combined columns have expected number of non-empty values (self_count + other_count = combined_count)
|
||||
3. **Sample Row Check**: Shows examples of how values were combined
|
||||
|
||||
### Output
|
||||
- Updates existing target columns in `eohi3.csv`
|
||||
- Creates backup `eohi3_2.csv` before processing
|
||||
|
||||
---
|
||||
|
||||
## datap 05 - ehi vars.r
|
||||
|
||||
### Goal
|
||||
Calculate EHI (End of History Illusion) variables as the difference between past and future variables. Each EHI variable represents the change from past to future perspective.
|
||||
|
||||
### Transformations
|
||||
|
||||
**Calculation Formula:** `ehi_[pref/pers/val]_[string] = past_[pref/pers/val]_[string] - fut_[pref/pers/val]_[string]`
|
||||
|
||||
#### EHI Variables Created
|
||||
|
||||
**EHI Preferences:**
|
||||
- `ehi_pref_hobbies` = `past_pref_hobbies` - `fut_pref_hobbies`
|
||||
- `ehi_pref_music` = `past_pref_music` - `fut_pref_music`
|
||||
- `ehi_pref_dress` = `past_pref_dress` - `fut_pref_dress`
|
||||
- `ehi_pref_exer` = `past_pref_exer` - `fut_pref_exer`
|
||||
- `ehi_pref_food` = `past_pref_food` - `fut_pref_food`
|
||||
- `ehi_pref_friends` = `past_pref_friends` - `fut_pref_friends`
|
||||
- `ehi_pref_DGEN` = `past_pref_DGEN` - `fut_pref_DGEN`
|
||||
|
||||
**EHI Personality:**
|
||||
- `ehi_pers_open` = `past_pers_open` - `fut_pers_open`
|
||||
- `ehi_pers_goal` = `past_pers_goal` - `fut_pers_goal`
|
||||
- `ehi_pers_social` = `past_pers_social` - `fut_pers_social`
|
||||
- `ehi_pers_agree` = `past_pers_agree` - `fut_pers_agree`
|
||||
- `ehi_pers_stress` = `past_pers_stress` - `fut_pers_stress`
|
||||
- `ehi_pers_DGEN` = `past_pers_DGEN` - `fut_pers_DGEN`
|
||||
|
||||
**EHI Values:**
|
||||
- `ehi_val_trad` = `past_val_trad` - `fut_val_trad`
|
||||
- `ehi_val_autonomy` = `past_val_autonomy` - `fut_val_autonomy`
|
||||
- `ehi_val_personal` = `past_val_personal` - `fut_val_personal`
|
||||
- `ehi_val_justice` = `past_val_justice` - `fut_val_justice`
|
||||
- `ehi_val_close` = `past_val_close` - `fut_val_close`
|
||||
- `ehi_val_connect` = `past_val_connect` - `fut_val_connect`
|
||||
- `ehi_val_DGEN` = `past_val_DGEN` - `fut_val_DGEN`
|
||||
|
||||
### Logic
|
||||
- Converts source variables to numeric (handling empty strings and NA)
|
||||
- Calculates difference: past - future
|
||||
- Result can be positive (past > future), negative (past < future), or zero (past = future)
|
||||
|
||||
### Validation Checks
|
||||
1. **Variable Existence**: Checks that all target variables exist before processing
|
||||
2. **Source Variable Check**: Verifies source columns exist
|
||||
3. **Random Row Validation**: Checks 5 random rows showing source values, target value, expected calculation, and match status
|
||||
|
||||
### Output
|
||||
- Updates existing target columns in `eohi3.csv`
|
||||
- Creates backup `eohi3_2.csv` before processing
|
||||
|
||||
---
|
||||
|
||||
## datap 06 - mean vars.r
|
||||
|
||||
### Goal
|
||||
Calculate mean variables for various scales by averaging multiple related variables. Creates both domain-specific means and overall composite means.
|
||||
|
||||
### Transformations
|
||||
|
||||
#### Domain-Specific Means
|
||||
|
||||
**Past Preferences MEAN:**
|
||||
- **Source Variables**: `past_pref_hobbies`, `past_pref_music`, `past_pref_dress`, `past_pref_exer`, `past_pref_food`, `past_pref_friends` (6 variables)
|
||||
- **Target Variable**: `past_pref_MEAN`
|
||||
|
||||
**Future Preferences MEAN:**
|
||||
- **Source Variables**: `fut_pref_hobbies`, `fut_pref_music`, `fut_pref_dress`, `fut_pref_exer`, `fut_pref_food`, `fut_pref_friends` (6 variables)
|
||||
- **Target Variable**: `fut_pref_MEAN`
|
||||
|
||||
**Past Personality MEAN:**
|
||||
- **Source Variables**: `past_pers_open`, `past_pers_goal`, `past_pers_social`, `past_pers_agree`, `past_pers_stress` (5 variables)
|
||||
- **Target Variable**: `past_pers_MEAN`
|
||||
|
||||
**Future Personality MEAN:**
|
||||
- **Source Variables**: `fut_pers_open`, `fut_pers_goal`, `fut_pers_social`, `fut_pers_agree`, `fut_pers_stress` (5 variables)
|
||||
- **Target Variable**: `fut_pers_MEAN`
|
||||
|
||||
**Past Values MEAN:**
|
||||
- **Source Variables**: `past_val_trad`, `past_val_autonomy`, `past_val_personal`, `past_val_justice`, `past_val_close`, `past_val_connect` (6 variables)
|
||||
- **Target Variable**: `past_val_MEAN`
|
||||
|
||||
**Future Values MEAN:**
|
||||
- **Source Variables**: `fut_val_trad`, `fut_val_autonomy`, `fut_val_personal`, `fut_val_justice`, `fut_val_close`, `fut_val_connect` (6 variables)
|
||||
- **Target Variable**: `fut_val_MEAN`
|
||||
|
||||
**EHI Preferences MEAN:**
|
||||
- **Source Variables**: `ehi_pref_hobbies`, `ehi_pref_music`, `ehi_pref_dress`, `ehi_pref_exer`, `ehi_pref_food`, `ehi_pref_friends` (6 variables)
|
||||
- **Target Variable**: `ehi_pref_MEAN`
|
||||
|
||||
**EHI Personality MEAN:**
|
||||
- **Source Variables**: `ehi_pers_open`, `ehi_pers_goal`, `ehi_pers_social`, `ehi_pers_agree`, `ehi_pers_stress` (5 variables)
|
||||
- **Target Variable**: `ehi_pers_MEAN`
|
||||
|
||||
**EHI Values MEAN:**
|
||||
- **Source Variables**: `ehi_val_trad`, `ehi_val_autonomy`, `ehi_val_personal`, `ehi_val_justice`, `ehi_val_close`, `ehi_val_connect` (6 variables)
|
||||
- **Target Variable**: `ehi_val_MEAN`
|
||||
|
||||
#### Composite Means
|
||||
|
||||
**EHI Domain-Specific Mean:**
|
||||
- **Source Variables**: `ehi_pref_MEAN`, `ehi_pers_MEAN`, `ehi_val_MEAN` (3 variables)
|
||||
- **Target Variable**: `ehiDS_mean`
|
||||
|
||||
**EHI Domain-General Mean:**
|
||||
- **Source Variables**: `ehi_pref_DGEN`, `ehi_pers_DGEN`, `ehi_val_DGEN` (3 variables)
|
||||
- **Target Variable**: `ehiDGEN_mean`
|
||||
|
||||
### Logic
|
||||
- Converts source variables to numeric (handling empty strings and NA)
|
||||
- Calculates row means using `rowMeans()` with `na.rm = TRUE` (ignores NA values)
|
||||
- Each mean represents the average of non-missing values for that row
|
||||
|
||||
### Validation Checks
|
||||
1. **Variable Existence**: Uses `setdiff()` to check source and target variables exist
|
||||
2. **Random Row Validation**: Checks 5 random rows showing source variable names, source values, target value, expected mean calculation, and match status
|
||||
|
||||
### Output
|
||||
- Updates existing target columns in `eohi3.csv`
|
||||
- Creates backup `eohi3_2.csv` before processing
|
||||
|
||||
---
|
||||
|
||||
## datap 07 - scales and recodes.r
|
||||
|
||||
### Goal
|
||||
Recode various variables and calculate scale scores. Includes recoding categorical variables, processing cognitive reflection test (CRT) items, calculating ICAR scores, and recoding demographic variables.
|
||||
|
||||
### Transformations
|
||||
|
||||
#### 1. Recode other_length2 → other_length
|
||||
**Source Variable**: `other_length2`
|
||||
**Target Variable**: `other_length`
|
||||
|
||||
**Recoding Rules:**
|
||||
- Values 5-9 → "5-9"
|
||||
- Values 10-14 → "10-14"
|
||||
- Values 15-19 → "15-19"
|
||||
- Value "20+" → "20+" (handled as special case)
|
||||
- Empty strings → preserved as empty string (not NA)
|
||||
- NA → NA
|
||||
|
||||
#### 2. Recode other_like2 → other_like
|
||||
**Source Variable**: `other_like2`
|
||||
**Target Variable**: `other_like`
|
||||
|
||||
**Recoding Rules:**
|
||||
- "Dislike a great deal" → "-2"
|
||||
- "Dislike somewhat" → "-1"
|
||||
- "Neither like nor dislike" → "0"
|
||||
- "Like somewhat" → "1"
|
||||
- "Like a great deal" → "2"
|
||||
- Empty strings → preserved as empty string (not NA)
|
||||
- NA → NA
|
||||
|
||||
#### 3. Calculate aot_total (Actively Open-Minded Thinking)
|
||||
**Source Variables**: `aot01`, `aot02`, `aot03`, `aot04_r`, `aot05_r`, `aot06_r`, `aot07_r`, `aot08`
|
||||
**Target Variable**: `aot_total`
|
||||
|
||||
**Calculation:**
|
||||
1. Reverse code `aot04_r`, `aot05_r`, `aot06_r`, `aot07_r` by multiplying by -1
|
||||
2. Calculate mean of all 8 variables: 4 original (`aot01`, `aot02`, `aot03`, `aot08`) + 4 reversed (`aot04_r`, `aot05_r`, `aot06_r`, `aot07_r`)
|
||||
|
||||
#### 4. Process CRT Questions → crt_correct and crt_int
|
||||
**Source Variables**: `crt01`, `crt02`, `crt03`
|
||||
**Target Variables**: `crt_correct`, `crt_int`
|
||||
|
||||
**CRT01:**
|
||||
- "5 cents" → `crt_correct` = 1, `crt_int` = 0
|
||||
- "10 cents" → `crt_correct` = 0, `crt_int` = 1
|
||||
- Other values → `crt_correct` = 0, `crt_int` = 0
|
||||
|
||||
**CRT02:**
|
||||
- "5 minutes" → `crt_correct` += 1, `crt_int` unchanged
|
||||
- "100 minutes" → `crt_correct` unchanged, `crt_int` += 1
|
||||
- Other values → both unchanged
|
||||
|
||||
**CRT03:**
|
||||
- "47 days" → `crt_correct` += 1, `crt_int` unchanged
|
||||
- "24 days" → `crt_correct` unchanged, `crt_int` += 1
|
||||
- Other values → both unchanged
|
||||
|
||||
**Note**: `crt_correct` and `crt_int` are cumulative across all 3 questions (range: 0-3)
|
||||
|
||||
#### 5. Calculate icar_verbal
|
||||
**Source Variables**: `verbal01`, `verbal02`, `verbal03`, `verbal04`, `verbal05`
|
||||
**Target Variable**: `icar_verbal`
|
||||
|
||||
**Correct Answers:**
|
||||
- `verbal01` = "5"
|
||||
- `verbal02` = "8"
|
||||
- `verbal03` = "It's impossible to tell"
|
||||
- `verbal04` = "47"
|
||||
- `verbal05` = "Sunday"
|
||||
|
||||
**Calculation**: Proportion correct = (number of correct responses) / 5
|
||||
|
||||
#### 6. Calculate icar_matrix
|
||||
**Source Variables**: `matrix01`, `matrix02`, `matrix03`, `matrix04`, `matrix05`
|
||||
**Target Variable**: `icar_matrix`
|
||||
|
||||
**Correct Answers:**
|
||||
- `matrix01` = "D"
|
||||
- `matrix02` = "E"
|
||||
- `matrix03` = "B"
|
||||
- `matrix04` = "B"
|
||||
- `matrix05` = "D"
|
||||
|
||||
**Calculation**: Proportion correct = (number of correct responses) / 5
|
||||
|
||||
#### 7. Calculate icar_total
|
||||
**Source Variables**: `verbal01`-`verbal05`, `matrix01`-`matrix05` (10 variables total)
|
||||
**Target Variable**: `icar_total`
|
||||
|
||||
**Calculation**: Proportion correct across all 10 items = (number of correct responses) / 10
|
||||
|
||||
#### 8. Recode demo_sex → sex
|
||||
**Source Variable**: `demo_sex`
|
||||
**Target Variable**: `sex`
|
||||
|
||||
**Recoding Rules:**
|
||||
- "Male" (case-insensitive) → 0
|
||||
- "Female" (case-insensitive) → 1
|
||||
- Other values (e.g., "Prefer not to say") → 2
|
||||
- Empty/NA → NA
|
||||
|
||||
#### 9. Recode demo_edu → education
|
||||
**Source Variable**: `demo_edu`
|
||||
**Target Variable**: `education` (ordered factor)
|
||||
|
||||
**Recoding Rules:**
|
||||
- "High School (or equivalent)" or "Trade School" → "HS_TS"
|
||||
- "College Diploma/Certificate" or "University - Undergraduate" → "C_Ug"
|
||||
- "University - Graduate (Masters)" or "University - PhD" or "Professional Degree (ex. JD/MD)" → "grad_prof"
|
||||
- Empty/NA → NA
|
||||
|
||||
**Factor Levels**: `HS_TS` < `C_Ug` < `grad_prof` (ordered)
|
||||
|
||||
### Validation Checks
|
||||
Each transformation includes:
|
||||
1. **Variable Existence Check**: Verifies source and target variables exist
|
||||
2. **Value Check**: Verifies expected values exist in source variables (warns about unexpected values)
|
||||
3. **Post-Processing Verification**: Checks 5 random rows showing source values, target values, and calculations
|
||||
|
||||
### Output
|
||||
- Updates existing target columns in `eohi3.csv`
|
||||
- Creates backup `eohi3_2.csv` before processing
|
||||
|
||||
---
|
||||
|
||||
## Script Execution Order
|
||||
|
||||
These scripts should be run in the following order:
|
||||
|
||||
1. **datap 04 - combined vars.r** - Combines self/other variables into past/future variables
|
||||
2. **datap 05 - ehi vars.r** - Calculates EHI variables from past/future differences
|
||||
3. **datap 06 - mean vars.r** - Calculates mean variables for scales
|
||||
4. **datap 07 - scales and recodes.r** - Recodes variables and calculates scale scores
|
||||
|
||||
Each script creates a backup (`eohi3_2.csv`) before processing and includes validation checks to ensure transformations are performed correctly.
|
||||
149
eohi3/DA00_fmaxVALS.r
Normal file
149
eohi3/DA00_fmaxVALS.r
Normal file
@ -0,0 +1,149 @@
|
||||
library(SuppDists)
|
||||
library(dplyr)
|
||||
library(tidyr)
|
||||
|
||||
setwd("/home/ladmin/Documents/DND/EOHI/eohi3")
|
||||
|
||||
between_vars <- c("perspective", "temporalDO")
|
||||
within_vars_MEAN <- c(
|
||||
"past_pref_MEAN", "past_pers_MEAN", "past_val_MEAN",
|
||||
"fut_pref_MEAN", "fut_pers_MEAN", "fut_val_MEAN"
|
||||
)
|
||||
within_vars_DGEN <- c(
|
||||
"past_pref_DGEN", "past_pers_DGEN", "past_val_DGEN",
|
||||
"fut_pref_DGEN", "fut_pers_DGEN", "fut_val_DGEN"
|
||||
)
|
||||
|
||||
df <- read.csv("eohi3.csv", stringsAsFactors = FALSE, check.names = FALSE, na.strings = "NA")
|
||||
|
||||
anova_data_MEAN <- df %>%
|
||||
select(pID, all_of(between_vars), all_of(within_vars_MEAN)) %>%
|
||||
filter(!is.na(perspective), perspective != "",
|
||||
!is.na(temporalDO), temporalDO != "")
|
||||
|
||||
long_data_MEAN <- anova_data_MEAN %>%
|
||||
pivot_longer(
|
||||
cols = all_of(within_vars_MEAN),
|
||||
names_to = "variable",
|
||||
values_to = "MEAN_SCORE"
|
||||
) %>%
|
||||
mutate(
|
||||
time = ifelse(grepl("^past_", variable), "past", "fut"),
|
||||
domain = case_when(
|
||||
grepl("_pref_MEAN$", variable) ~ "pref",
|
||||
grepl("_pers_MEAN$", variable) ~ "pers",
|
||||
grepl("_val_MEAN$", variable) ~ "val",
|
||||
TRUE ~ NA_character_
|
||||
)
|
||||
) %>%
|
||||
mutate(
|
||||
TIME = factor(time, levels = c("past", "fut")),
|
||||
DOMAIN = factor(domain, levels = c("pref", "pers", "val")),
|
||||
perspective = factor(perspective),
|
||||
temporalDO = factor(temporalDO)
|
||||
) %>%
|
||||
select(pID, perspective, temporalDO, TIME, DOMAIN, MEAN_SCORE) %>%
|
||||
filter(!is.na(MEAN_SCORE))
|
||||
|
||||
cell_vars_MEAN <- long_data_MEAN %>%
|
||||
group_by(perspective, temporalDO, TIME, DOMAIN) %>%
|
||||
summarise(
|
||||
n = n(),
|
||||
variance = var(MEAN_SCORE, na.rm = TRUE),
|
||||
.groups = "drop"
|
||||
)
|
||||
|
||||
fmax_by_cell_MEAN <- cell_vars_MEAN %>%
|
||||
group_by(TIME, DOMAIN) %>%
|
||||
summarise(
|
||||
Fmax_observed = max(variance, na.rm = TRUE) / min(variance, na.rm = TRUE),
|
||||
df_min = min(n) - 1L,
|
||||
.groups = "drop"
|
||||
)
|
||||
|
||||
k <- 4
|
||||
|
||||
fmax_table_MEAN <- fmax_by_cell_MEAN %>%
|
||||
rowwise() %>%
|
||||
mutate(
|
||||
alpha_0.05 = SuppDists::qmaxFratio(0.95, df = df_min, k = k),
|
||||
alpha_0.01 = SuppDists::qmaxFratio(0.99, df = df_min, k = k)
|
||||
) %>%
|
||||
ungroup() %>%
|
||||
mutate(
|
||||
Fmax_observed = round(Fmax_observed, 4),
|
||||
alpha_0.05 = round(alpha_0.05, 4),
|
||||
alpha_0.01 = round(alpha_0.01, 4)
|
||||
) %>%
|
||||
select(TIME, DOMAIN, Fmax_observed, alpha_0.05, alpha_0.01)
|
||||
|
||||
# ---- MEAN: Print observed Hartley ratios ----
|
||||
cat("\n--- Hartley ratios (MEAN) ---\n")
|
||||
fmax_table_MEAN %>%
|
||||
mutate(across(where(is.numeric), ~ format(round(., 4), nsmall = 4))) %>%
|
||||
print()
|
||||
|
||||
# ---- DGEN: Observed Hartley ratios ----
|
||||
anova_data_DGEN <- df %>%
|
||||
select(pID, all_of(between_vars), all_of(within_vars_DGEN)) %>%
|
||||
filter(!is.na(perspective), perspective != "",
|
||||
!is.na(temporalDO), temporalDO != "")
|
||||
|
||||
long_data_DGEN <- anova_data_DGEN %>%
|
||||
pivot_longer(
|
||||
cols = all_of(within_vars_DGEN),
|
||||
names_to = "variable",
|
||||
values_to = "DGEN_SCORE"
|
||||
) %>%
|
||||
mutate(
|
||||
time = ifelse(grepl("^past_", variable), "past", "fut"),
|
||||
domain = case_when(
|
||||
grepl("_pref_DGEN$", variable) ~ "pref",
|
||||
grepl("_pers_DGEN$", variable) ~ "pers",
|
||||
grepl("_val_DGEN$", variable) ~ "val",
|
||||
TRUE ~ NA_character_
|
||||
)
|
||||
) %>%
|
||||
mutate(
|
||||
TIME = factor(time, levels = c("past", "fut")),
|
||||
DOMAIN = factor(domain, levels = c("pref", "pers", "val")),
|
||||
perspective = factor(perspective),
|
||||
temporalDO = factor(temporalDO)
|
||||
) %>%
|
||||
select(pID, perspective, temporalDO, TIME, DOMAIN, DGEN_SCORE) %>%
|
||||
filter(!is.na(DGEN_SCORE))
|
||||
|
||||
cell_vars_DGEN <- long_data_DGEN %>%
|
||||
group_by(perspective, temporalDO, TIME, DOMAIN) %>%
|
||||
summarise(
|
||||
n = n(),
|
||||
variance = var(DGEN_SCORE, na.rm = TRUE),
|
||||
.groups = "drop"
|
||||
)
|
||||
|
||||
fmax_by_cell_DGEN <- cell_vars_DGEN %>%
|
||||
group_by(TIME, DOMAIN) %>%
|
||||
summarise(
|
||||
Fmax_observed = max(variance, na.rm = TRUE) / min(variance, na.rm = TRUE),
|
||||
df_min = min(n) - 1L,
|
||||
.groups = "drop"
|
||||
)
|
||||
|
||||
fmax_table_DGEN <- fmax_by_cell_DGEN %>%
|
||||
rowwise() %>%
|
||||
mutate(
|
||||
alpha_0.05 = SuppDists::qmaxFratio(0.95, df = df_min, k = k),
|
||||
alpha_0.01 = SuppDists::qmaxFratio(0.99, df = df_min, k = k)
|
||||
) %>%
|
||||
ungroup() %>%
|
||||
mutate(
|
||||
Fmax_observed = round(Fmax_observed, 4),
|
||||
alpha_0.05 = round(alpha_0.05, 4),
|
||||
alpha_0.01 = round(alpha_0.01, 4)
|
||||
) %>%
|
||||
select(TIME, DOMAIN, Fmax_observed, alpha_0.05, alpha_0.01)
|
||||
|
||||
cat("\n--- Hartley ratios (DGEN) ---\n")
|
||||
fmax_table_DGEN %>%
|
||||
mutate(across(where(is.numeric), ~ format(round(., 4), nsmall = 4))) %>%
|
||||
print()
|
||||
235
eohi3/DA01_anova_DS.r
Normal file
235
eohi3/DA01_anova_DS.r
Normal file
@ -0,0 +1,235 @@
|
||||
library(tidyverse)
|
||||
library(rstatix)
|
||||
library(emmeans)
|
||||
library(effectsize)
|
||||
library(afex)
|
||||
library(car)
|
||||
|
||||
options(scipen = 999)
|
||||
afex::set_sum_contrasts()
|
||||
|
||||
setwd("/home/ladmin/Documents/DND/EOHI/eohi3")
|
||||
|
||||
df <- read.csv(
|
||||
"eohi3.csv",
|
||||
stringsAsFactors = FALSE,
|
||||
check.names = FALSE,
|
||||
na.strings = "NA"
|
||||
)
|
||||
|
||||
between_vars <- c("perspective", "temporalDO")
|
||||
within_vars <- c(
|
||||
"past_pref_MEAN", "past_pers_MEAN", "past_val_MEAN",
|
||||
"fut_pref_MEAN", "fut_pers_MEAN", "fut_val_MEAN"
|
||||
)
|
||||
|
||||
missing_vars <- setdiff(c(between_vars, within_vars, "pID"), names(df))
|
||||
if (length(missing_vars) > 0) {
|
||||
stop(paste("Missing required variables:", paste(missing_vars, collapse = ", ")))
|
||||
}
|
||||
|
||||
anova_data <- df %>%
|
||||
select(pID, all_of(between_vars), all_of(within_vars)) %>%
|
||||
filter(
|
||||
!is.na(perspective), perspective != "",
|
||||
!is.na(temporalDO), temporalDO != ""
|
||||
)
|
||||
|
||||
long_data <- anova_data %>%
|
||||
pivot_longer(
|
||||
cols = all_of(within_vars),
|
||||
names_to = "variable",
|
||||
values_to = "MEAN_SCORE"
|
||||
) %>%
|
||||
mutate(
|
||||
time = case_when(
|
||||
grepl("^past_", variable) ~ "past",
|
||||
grepl("^fut_", variable) ~ "fut",
|
||||
TRUE ~ NA_character_
|
||||
),
|
||||
domain = case_when(
|
||||
grepl("_pref_MEAN$", variable) ~ "pref",
|
||||
grepl("_pers_MEAN$", variable) ~ "pers",
|
||||
grepl("_val_MEAN$", variable) ~ "val",
|
||||
TRUE ~ NA_character_
|
||||
)
|
||||
) %>%
|
||||
mutate(
|
||||
TIME = factor(time, levels = c("past", "fut")),
|
||||
DOMAIN = factor(domain, levels = c("pref", "pers", "val")),
|
||||
perspective = factor(perspective),
|
||||
temporalDO = factor(temporalDO),
|
||||
pID = factor(pID)
|
||||
) %>%
|
||||
select(pID, perspective, temporalDO, TIME, DOMAIN, MEAN_SCORE) %>%
|
||||
filter(!is.na(MEAN_SCORE))
|
||||
|
||||
desc_stats <- long_data %>%
|
||||
group_by(perspective, temporalDO, TIME, DOMAIN) %>%
|
||||
summarise(
|
||||
n = n(),
|
||||
mean = round(mean(MEAN_SCORE), 5),
|
||||
variance = round(var(MEAN_SCORE), 5),
|
||||
sd = round(sd(MEAN_SCORE), 5),
|
||||
median = round(median(MEAN_SCORE), 5),
|
||||
q1 = round(quantile(MEAN_SCORE, 0.25), 5),
|
||||
q3 = round(quantile(MEAN_SCORE, 0.75), 5),
|
||||
min = round(min(MEAN_SCORE), 5),
|
||||
max = round(max(MEAN_SCORE), 5),
|
||||
.groups = "drop"
|
||||
)
|
||||
|
||||
print(desc_stats, n = Inf)
|
||||
|
||||
missing_summary <- long_data %>%
|
||||
group_by(perspective, temporalDO, TIME, DOMAIN) %>%
|
||||
summarise(
|
||||
n_total = n(),
|
||||
n_missing = sum(is.na(MEAN_SCORE)),
|
||||
pct_missing = round(100 * n_missing / n_total, 2),
|
||||
.groups = "drop"
|
||||
)
|
||||
|
||||
print(missing_summary, n = Inf)
|
||||
|
||||
outlier_summary <- long_data %>%
|
||||
group_by(perspective, temporalDO, TIME, DOMAIN) %>%
|
||||
summarise(
|
||||
n = n(),
|
||||
n_outliers = sum(abs(scale(MEAN_SCORE)) > 3),
|
||||
.groups = "drop"
|
||||
)
|
||||
|
||||
print(outlier_summary, n = Inf)
|
||||
|
||||
homogeneity_between <- long_data %>%
|
||||
group_by(TIME, DOMAIN) %>%
|
||||
rstatix::levene_test(MEAN_SCORE ~ perspective * temporalDO)
|
||||
|
||||
print(homogeneity_between, n = Inf)
|
||||
|
||||
# Normality: within-subjects residuals (deviation from each participant's mean)
|
||||
resid_within <- long_data %>%
|
||||
group_by(pID) %>%
|
||||
mutate(person_mean = mean(MEAN_SCORE, na.rm = TRUE)) %>%
|
||||
ungroup() %>%
|
||||
mutate(resid = MEAN_SCORE - person_mean) %>%
|
||||
pull(resid)
|
||||
resid_within <- resid_within[!is.na(resid_within)]
|
||||
n_resid <- length(resid_within)
|
||||
if (n_resid < 3L) {
|
||||
message("Too few within-subjects residuals (n < 3); skipping Shapiro-Wilk.")
|
||||
} else {
|
||||
resid_for_shapiro <- if (n_resid > 5000L) {
|
||||
set.seed(1L)
|
||||
sample(resid_within, 5000L)
|
||||
} else {
|
||||
resid_within
|
||||
}
|
||||
print(shapiro.test(resid_for_shapiro))
|
||||
}
|
||||
# qqnorm(resid_within)
|
||||
# qqline(resid_within)
|
||||
|
||||
aov_afex <- aov_ez(
|
||||
id = "pID",
|
||||
dv = "MEAN_SCORE",
|
||||
data = long_data,
|
||||
between = c("perspective", "temporalDO"),
|
||||
within = c("TIME", "DOMAIN"),
|
||||
type = 3
|
||||
)
|
||||
|
||||
# ANOVA table: uncorrected and Greenhouse–Geisser
|
||||
cat("\n--- ANOVA Table (Type 3, uncorrected) ---\n")
|
||||
print(nice(aov_afex, correction = "none"))
|
||||
cat("\n--- ANOVA Table (Type 3, Greenhouse–Geisser correction) ---\n")
|
||||
print(nice(aov_afex, correction = "GG"))
|
||||
|
||||
# Mauchly's test of sphericity and epsilon (via car::Anova on wide data)
|
||||
anova_wide <- anova_data %>%
|
||||
select(pID, perspective, temporalDO, all_of(within_vars)) %>%
|
||||
filter(if_all(all_of(within_vars), ~ !is.na(.)))
|
||||
response_matrix <- as.matrix(anova_wide[, within_vars])
|
||||
rm_model <- lm(response_matrix ~ perspective * temporalDO, data = anova_wide)
|
||||
idata <- data.frame(
|
||||
TIME = factor(rep(c("past", "fut"), each = 3), levels = c("past", "fut")),
|
||||
DOMAIN = factor(rep(c("pref", "pers", "val"), 2), levels = c("pref", "pers", "val"))
|
||||
)
|
||||
rm_anova <- car::Anova(rm_model, idata = idata, idesign = ~ TIME * DOMAIN, type = 3)
|
||||
rm_summary <- summary(rm_anova, multivariate = FALSE)
|
||||
if (!is.null(rm_summary$sphericity.tests)) {
|
||||
cat("\nMauchly's Test of Sphericity:\n")
|
||||
print(rm_summary$sphericity.tests)
|
||||
}
|
||||
if (!is.null(rm_summary$epsilon)) {
|
||||
cat("\nEpsilon (GG, HF):\n")
|
||||
print(rm_summary$epsilon)
|
||||
}
|
||||
|
||||
# Within-subjects residuals: deviation from each participant's mean (one per observation)
|
||||
resid_within <- long_data %>%
|
||||
group_by(pID) %>%
|
||||
mutate(person_mean = mean(MEAN_SCORE, na.rm = TRUE)) %>%
|
||||
ungroup() %>%
|
||||
mutate(resid = MEAN_SCORE - person_mean) %>%
|
||||
pull(resid)
|
||||
resid_within <- resid_within[!is.na(resid_within)]
|
||||
# R's shapiro.test() allows 3 <= n <= 5000; use a random sample of 5000 if we have more
|
||||
n_resid <- length(resid_within)
|
||||
if (n_resid < 3L) {
|
||||
message("Too few within-subjects residuals (n < 3); skipping Shapiro-Wilk.")
|
||||
} else {
|
||||
resid_for_shapiro <- if (n_resid > 5000L) {
|
||||
set.seed(1L)
|
||||
sample(resid_within, 5000L)
|
||||
} else {
|
||||
resid_within
|
||||
}
|
||||
print(shapiro.test(resid_for_shapiro))
|
||||
}
|
||||
|
||||
# qqnorm(resid_within)
|
||||
# qqline(resid_within)
|
||||
|
||||
# POST-HOC COMPARISONS (significant effects only)
|
||||
|
||||
# TIME (main effect)
|
||||
emm_TIME <- emmeans(aov_afex, ~ TIME)
|
||||
print(pairs(emm_TIME, adjust = "bonferroni"))
|
||||
|
||||
# temporalDO:TIME — ~TIME and ~temporalDO
|
||||
emm_temporalDO_TIME <- emmeans(aov_afex, ~ TIME | temporalDO)
|
||||
print(pairs(emm_temporalDO_TIME, adjust = "bonferroni"))
|
||||
emm_temporalDO_temporalDO <- emmeans(aov_afex, ~ temporalDO | TIME)
|
||||
print(pairs(emm_temporalDO_temporalDO, adjust = "bonferroni"))
|
||||
|
||||
# perspective:temporalDO:TIME — ~TIME, ~perspective, ~temporalDO
|
||||
emm_pt_TIME <- emmeans(aov_afex, ~ TIME | perspective + temporalDO)
|
||||
print(pairs(emm_pt_TIME, adjust = "bonferroni"))
|
||||
emm_pt_perspective <- emmeans(aov_afex, ~ perspective | temporalDO + TIME)
|
||||
print(pairs(emm_pt_perspective, adjust = "bonferroni"))
|
||||
emm_pt_temporalDO <- emmeans(aov_afex, ~ temporalDO | perspective + TIME)
|
||||
print(pairs(emm_pt_temporalDO, adjust = "bonferroni"))
|
||||
|
||||
# perspective:DOMAIN — ~perspective and ~DOMAIN
|
||||
emm_perspective_DOMAIN <- emmeans(aov_afex, ~ perspective | DOMAIN)
|
||||
print(pairs(emm_perspective_DOMAIN, adjust = "bonferroni"))
|
||||
emm_perspective_DOMAIN_domain <- emmeans(aov_afex, ~ DOMAIN | perspective)
|
||||
print(pairs(emm_perspective_DOMAIN_domain, adjust = "bonferroni"))
|
||||
|
||||
# perspective:TIME:DOMAIN — ~TIME, ~perspective, ~DOMAIN
|
||||
emm_pt_TIME_domain <- emmeans(aov_afex, ~ TIME | perspective + DOMAIN)
|
||||
print(pairs(emm_pt_TIME_domain, adjust = "bonferroni"))
|
||||
emm_pt_domain_perspective <- emmeans(aov_afex, ~ perspective | TIME + DOMAIN)
|
||||
print(pairs(emm_pt_domain_perspective, adjust = "bonferroni"))
|
||||
emm_pt_domain_domain <- emmeans(aov_afex, ~ DOMAIN | perspective + TIME)
|
||||
print(pairs(emm_pt_domain_domain, adjust = "bonferroni"))
|
||||
|
||||
# perspective:temporalDO:TIME:DOMAIN — ~TIME, ~perspective, ~temporalDO
|
||||
emm_ptt_TIME <- emmeans(aov_afex, ~ TIME | perspective + temporalDO + DOMAIN)
|
||||
print(pairs(emm_ptt_TIME, adjust = "bonferroni"))
|
||||
emm_ptt_perspective <- emmeans(aov_afex, ~ perspective | temporalDO + TIME + DOMAIN)
|
||||
print(pairs(emm_ptt_perspective, adjust = "bonferroni"))
|
||||
emm_ptt_temporalDO <- emmeans(aov_afex, ~ temporalDO | perspective + TIME + DOMAIN)
|
||||
print(pairs(emm_ptt_temporalDO, adjust = "bonferroni"))
|
||||
425
eohi3/dataREVIEW-JAN05/eohi3_filter2.csv
Normal file
425
eohi3/dataREVIEW-JAN05/eohi3_filter2.csv
Normal file
File diff suppressed because one or more lines are too long
490
eohi3/dataREVIEW-JAN05/eohi3_raw.csv
Normal file
490
eohi3/dataREVIEW-JAN05/eohi3_raw.csv
Normal file
File diff suppressed because one or more lines are too long
490
eohi3/dataREVIEW-JAN05/eohi3_raw2.csv
Normal file
490
eohi3/dataREVIEW-JAN05/eohi3_raw2.csv
Normal file
File diff suppressed because one or more lines are too long
543
eohi3/dataREVIEW-JAN05/eohi3_unprocessed.csv
Normal file
543
eohi3/dataREVIEW-JAN05/eohi3_unprocessed.csv
Normal file
File diff suppressed because one or more lines are too long
68
eohi3/dataREVIEW-JAN05/response ID for replacement.csv
Normal file
68
eohi3/dataREVIEW-JAN05/response ID for replacement.csv
Normal file
@ -0,0 +1,68 @@
|
||||
ResponseId,RATIONALE
|
||||
R_12EXYt8gHauPaCb,duration
|
||||
R_142iZtlDp1Vam14,duration
|
||||
R_16eRiaoFPG5CpE4,duration
|
||||
R_1aK2JWzCFkpefUg,duration
|
||||
R_1FEuEk6VzuwxZby,duration
|
||||
R_1IsHUv4sb6oOphv,duration
|
||||
R_1J2cryciskOYjOV,duration
|
||||
R_1JFsZ1GXM7jDWmh,duration
|
||||
R_1JlV9H7AJKtNZ8g,duration
|
||||
R_1kgjhkT4sJwhfuV,duration
|
||||
R_1MAMwGkBHTTSyAh,duration
|
||||
R_1O6dV9hTlqpsYjP,duration
|
||||
R_1qatgZwcLPGctnd,age mismatch
|
||||
R_1QE5KaKNkt66Cer,duration
|
||||
R_1QsYazd3eOH62js,duration
|
||||
R_1vwOg7l0kSLHGRX,duration
|
||||
R_1YJ2G01dpxYqKAm,duration
|
||||
R_1YoddNWqybPbaNN,feedback in french
|
||||
R_1ZOjQ97Ph1VtRwp,duration
|
||||
R_347ABt6LFPUeVZS,duration
|
||||
R_34Ain6V2NbEDeQm,duration
|
||||
R_38J0VDB8JE8Dd0o,duration
|
||||
R_3DptQmS26X0Z8Wu,IP duplicate
|
||||
R_3Foc2aYGpXFrbnX,age mismatch + duration
|
||||
R_3HLz0FyaULkIPKu,IP duplicate
|
||||
R_3jUhefm4hAEQ6PC,duration
|
||||
R_3n8b0ndM4habNjB,age mismatch
|
||||
R_3nTLzs9jMwDHbFy,duration
|
||||
R_3rGudTtAd2oVze3,duration
|
||||
R_3t6giyCy5IwZgom,duration
|
||||
R_3WwXkl4IatPYDZ0,age mismatch
|
||||
R_5ByssDsdjMcQgUV,duration
|
||||
R_5cNBH4nxBlH8OSB,duration
|
||||
R_5FkttTgBeMePzhk,sex mismatch
|
||||
R_5FyLW7dHpyFojo5,duration
|
||||
R_5M3urkuYhhSG06E,duration
|
||||
R_5MRp7eFKMm59t14,feedback in french
|
||||
R_5n6H7xuYTQgvFEf,duration
|
||||
R_5rrbHXjKol6Zl9U,duration
|
||||
R_5youAGSa5hLGkuZ,age mismatch + duration
|
||||
R_5z5DYfTnai5Pj3j,duration
|
||||
R_64nOi2TWI4XCYkt,duration
|
||||
R_6BcdSiP0Nibxx1D,duration
|
||||
R_6C4v9kRnGm9Iqyj,IP duplicate
|
||||
R_6CpjN5tJoj8dYuB,duration
|
||||
R_6cwKXrr8R99m5ez,duration
|
||||
R_6F4ld4gRlKjsb06,age mismatch + duration
|
||||
R_6GqjTqXrehkbG0x,duration
|
||||
R_6HCtgHyy16nNMQ4,age mismatch
|
||||
R_6hQN1DUFkxGpDGD,IP duplicate
|
||||
R_6JKscJDUeAt7k1y,age mismatch
|
||||
R_6lKqtees5Z1hj2L,duration
|
||||
R_6m1NYZLedxbAxui,duration
|
||||
R_6pM4ierZhbT1FEb,duration
|
||||
R_6rQCiwlJHKrWWKB,duration
|
||||
R_7AwVrmL8AM0KLKx,duration
|
||||
R_7bH15XzvHpDCZO1,duration
|
||||
R_7Cl7KFkEiuYwdZn,duration
|
||||
R_7EfALTPED13tduG,duration
|
||||
R_7flJBV9qf88XSM5,duration
|
||||
R_7H0dTzsyEC1Pzyh,duration
|
||||
R_7HM0FXjrAoTeGqt,duration
|
||||
R_7HRMvwMPw3OBE7g,duration
|
||||
R_7o7FORJHlgWAahS,age mismatch
|
||||
R_7sTsQ9AI42QQgSV,duration
|
||||
R_7VJCRyovK5KAddn,duration
|
||||
R_7w4ggvRoPBkyTle,duration
|
||||
|
189
eohi3/dataREVIEW-JAN21/datap 01 - age and sex match.r
Normal file
189
eohi3/dataREVIEW-JAN21/datap 01 - age and sex match.r
Normal file
@ -0,0 +1,189 @@
|
||||
library(dplyr)

# Working directory is machine-specific; adjust as needed.
setwd("/home/ladmin/Documents/DND/EOHI/eohi3/dataREVIEW-JAN21")

# Read the data. check.names = FALSE preserves the original column names;
# na.strings = "NA" converts only the literal string "NA" to NA, so empty
# cells stay as empty strings.
df <- read.csv("eohi3_raw.csv", stringsAsFactors = FALSE, check.names = FALSE, na.strings = "NA")

# Ensure a character RATIONALE column exists. The original assumed the CSV
# always carried one; if it is absent, df$RATIONALE is NULL and assigning
# as.character(NULL) (a zero-length vector) into the data frame errors.
# Creating the column first makes the script robust to that case.
if (!"RATIONALE" %in% names(df)) {
  df$RATIONALE <- ""
}
if (!is.character(df$RATIONALE)) {
  df$RATIONALE <- as.character(df$RATIONALE)
}
# Normalise NA entries to empty strings so later checks can compare to "".
df$RATIONALE[is.na(df$RATIONALE)] <- ""
|
||||
|
||||
# Return TRUE if age_num falls inside the range described by age_range_str
# (e.g. "46 - 52", inclusive at both ends), FALSE if it does not, and NULL
# when either input is missing or unparseable so the caller can skip the check.
check_age_range <- function(age_num, age_range_str) {
  # NULL must be tested BEFORE is.na(): is.na(NULL) returns logical(0),
  # which makes `||` throw "invalid length" — the original checked is.na()
  # first and would error on a NULL argument. NULL age_range_str likewise
  # crashed on `character(0) == ""`.
  if (is.null(age_num) || is.null(age_range_str)) {
    return(NULL)
  }
  if (is.na(age_num) || is.na(age_range_str) || trimws(age_range_str) == "") {
    return(NULL)  # can't check with missing data
  }

  # Parse "min - max" with optional whitespace around the dash.
  range_parts <- strsplit(trimws(age_range_str), "\\s*-\\s*")[[1]]
  if (length(range_parts) != 2) {
    return(NULL)  # invalid range format
  }

  min_age <- as.numeric(trimws(range_parts[1]))
  max_age <- as.numeric(trimws(range_parts[2]))

  if (is.na(min_age) || is.na(max_age)) {
    return(NULL)  # endpoints were not numbers
  }

  # Inclusive range check.
  return(age_num >= min_age && age_num <= max_age)
}
|
||||
|
||||
# TRUE where x is "missing": NULL, NA, or (for character data) an empty or
# whitespace-only string. Vectorised: returns a logical the same length as x
# (a scalar TRUE for NULL). Empty CSV cells are stored as "" in this pipeline,
# so both "" and NA must count as empty.
is_empty <- function(x) {
  if (is.null(x)) {
    return(TRUE)
  }
  missing_mask <- is.na(x)
  if (is.character(x)) {
    # trimws() also catches whitespace-only cells; the | with is.na() keeps
    # NA positions TRUE. The original's vector branch indexed with
    # `trimws(x) == ""`, which is NA at NA positions and made the
    # subscripted assignment error ("NAs are not allowed in subscripted
    # assignments") for character vectors containing NA.
    missing_mask <- missing_mask | trimws(x) == ""
  }
  missing_mask
}
|
||||
|
||||
# 1. Sex match: flag rows where demo_sex and taq_sex are both present but
#    disagree (case-insensitive). Rows with a blank/NA on either side are
#    never flagged. Fully vectorised version of the original row loop.
demo_sex_clean <- ifelse(is.na(df$demo_sex), "", trimws(df$demo_sex))
taq_sex_clean <- ifelse(is.na(df$taq_sex), "", trimws(df$taq_sex))
sex_mismatch <- demo_sex_clean != "" &
  taq_sex_clean != "" &
  tolower(demo_sex_clean) != tolower(taq_sex_clean)
|
||||
|
||||
# 2. Age range match: flag rows where the numeric demo_age falls outside the
#    taq_age bracket (e.g. "46 - 52"). Rows with missing data, or brackets
#    check_age_range() cannot parse (it returns NULL), are skipped.
age_mismatch <- rep(FALSE, nrow(df))
for (row_idx in seq_len(nrow(df))) {
  if (is.na(df$demo_age[row_idx]) || is_empty(df$taq_age[row_idx])) {
    next  # nothing to compare
  }
  in_range <- check_age_range(df$demo_age[row_idx], df$taq_age[row_idx])
  # Only a definite FALSE (parsed bracket, age outside it) counts as a mismatch.
  if (!is.null(in_range) && !in_range) {
    age_mismatch[row_idx] <- TRUE
  }
}
|
||||
|
||||
# 3. Citizenship: flag rows where BOTH taq_cit_1 and taq_cit_2 are empty.
no_cit <- is_empty(df$taq_cit_1) & is_empty(df$taq_cit_2)

# 4. IP duplicates: flag every row whose non-empty IP address occurs more than
#    once in the file (all members of a duplicate set are flagged).
ip_duplicate <- rep(FALSE, nrow(df))
if ("IPAddress" %in% colnames(df)) {
  ip_addresses <- ifelse(is.na(df$IPAddress), "", trimws(df$IPAddress))
  # duplicated() from both directions marks every occurrence of a repeated
  # value — equivalent to the original table()-count approach, without
  # building the frequency table.
  seen_more_than_once <- duplicated(ip_addresses) | duplicated(ip_addresses, fromLast = TRUE)
  ip_duplicate <- seen_more_than_once & ip_addresses != ""
}
|
||||
|
||||
# Assemble the RATIONALE text. Each flagged issue is appended with "; " so a
# row can accumulate several reasons. Replaces the original's four near-
# identical row loops with one vectorised helper.
append_reason <- function(parts, flagged, label) {
  # Where flagged: append label ("; "-separated) or start with it; elsewhere
  # leave the accumulated text untouched.
  ifelse(flagged,
         ifelse(parts != "", paste(parts, label, sep = "; "), label),
         parts)
}

rationale_parts <- rep("", nrow(df))
# Same order as the original: sex, age, citizenship, IP.
rationale_parts <- append_reason(rationale_parts, sex_mismatch, "sex mismatch")
rationale_parts <- append_reason(rationale_parts, age_mismatch, "age mismatch")
rationale_parts <- append_reason(rationale_parts, no_cit, "no cit")
rationale_parts <- append_reason(rationale_parts, ip_duplicate, "IP duplicate")

# Only overwrite RATIONALE where an issue was found; rows with no issues keep
# whatever value (usually "") the column already held.
has_issue <- rationale_parts != ""
df$RATIONALE[has_issue] <- rationale_parts[has_issue]
|
||||
|
||||
# Summary, emitted twice on purpose: message() targets the debug console
# (stderr) and cat() targets the terminal (stdout) — some front-ends only
# display one of the two streams.
message("Validation Summary:")
message("Sex mismatches: ", sum(sex_mismatch))
message("Age mismatches: ", sum(age_mismatch))
message("No citizenship: ", sum(no_cit))
message("IP duplicates: ", sum(ip_duplicate))
message("Total rows with issues: ", sum(rationale_parts != ""))

cat("Validation Summary:\n", file = stdout())
cat("Sex mismatches:", sum(sex_mismatch), "\n", file = stdout())
cat("Age mismatches:", sum(age_mismatch), "\n", file = stdout())
cat("No citizenship:", sum(no_cit), "\n", file = stdout())
cat("IP duplicates:", sum(ip_duplicate), "\n", file = stdout())
cat("Total rows with issues:", sum(rationale_parts != ""), "\n", file = stdout())
flush(stdout())

# Before writing, convert NA back to "" in every character column so empty
# cells round-trip as empty strings rather than literal NA.
character_columns <- names(df)[vapply(df, is.character, logical(1))]
for (col_name in character_columns) {
  df[[col_name]][is.na(df[[col_name]])] <- ""
}
write.csv(df, "eohi3_raw2.csv", row.names = FALSE, na = "", quote = TRUE)
|
||||
39
eohi3/dataREVIEW-JAN21/datap 02 - citizenship.r
Normal file
39
eohi3/dataREVIEW-JAN21/datap 02 - citizenship.r
Normal file
@ -0,0 +1,39 @@
|
||||
library(dplyr)

setwd("/home/ladmin/Documents/DND/EOHI/eohi3/dataREVIEW-JAN21")

# Read the data; empty cells stay as "" (only the literal string "NA" becomes NA).
df <- read.csv("eohi3_raw.csv", stringsAsFactors = FALSE, check.names = FALSE, na.strings = "NA")

# Derive a single `citizenship` column from taq_cit_1 / taq_cit_2:
#   both filled -> "Both"
#   one filled  -> that value
#   none filled -> the column keeps its existing value (usually "")
if (!"citizenship" %in% names(df)) {
  df$citizenship <- ""
}

# Normalise NA to "" in the source columns so all comparisons are against "".
df$taq_cit_1[is.na(df$taq_cit_1)] <- ""
df$taq_cit_2[is.na(df$taq_cit_2)] <- ""

cit_first <- df$taq_cit_1
cit_second <- df$taq_cit_2

df$citizenship[cit_first != "" & cit_second != ""] <- "Both"
only_first <- cit_first != "" & cit_second == ""
only_second <- cit_second != "" & cit_first == ""
df$citizenship[only_first] <- cit_first[only_first]
df$citizenship[only_second] <- cit_second[only_second]

# NOTE(review): this writes back over the input file (eohi3_raw.csv) rather
# than a new file, unlike the sibling scripts — confirm that overwriting the
# raw data is intentional.
write.csv(df, "eohi3_raw.csv", row.names = FALSE, na = "", quote = TRUE)
|
||||
130
eohi3/dataREVIEW-JAN21/datap 03 - quotas.r
Normal file
130
eohi3/dataREVIEW-JAN21/datap 03 - quotas.r
Normal file
@ -0,0 +1,130 @@
|
||||
library(dplyr)

setwd("/home/ladmin/Documents/DND/EOHI/eohi3/dataREVIEW-JAN21")

# Read the data; empty cells stay as "" (only the literal string "NA" becomes NA).
df <- read.csv("eohi3_raw.csv", stringsAsFactors = FALSE, check.names = FALSE, na.strings = "NA")

# Drop trailing columns with empty names — dplyr requires every column to be
# named.
unnamed_cols <- which(names(df) == "" | is.na(names(df)))
if (length(unnamed_cols) > 0) {
  df <- df[, -unnamed_cols, drop = FALSE]
}

# Set to TRUE to also write every distribution to a text report at the end.
save_to_doc <- TRUE
doc_filename <- "eohi3_quotas.txt"

# =============================================================================
# SINGLE VARIABLE DISTRIBUTIONS
# =============================================================================

dist_age <- df %>% count(taq_age, sort = TRUE)
print(dist_age)

dist_sex <- df %>% count(taq_sex, sort = TRUE)
print(dist_sex)

dist_citizenship <- df %>% count(citizenship, sort = TRUE)
print(dist_citizenship)

dist_group <- df %>% count(group, sort = TRUE)
print(dist_group)

dist_temporalDO <- df %>% count(temporalDO, sort = TRUE)
print(dist_temporalDO)

dist_perspective <- df %>% count(perspective, sort = TRUE)
print(dist_perspective)
|
||||
|
||||
# =============================================================================
# NESTED DISTRIBUTIONS (counts of one variable within levels of another)
# =============================================================================

dist_age_citizenship <- df %>% count(citizenship, taq_age) %>% arrange(citizenship, taq_age)
print(dist_age_citizenship)

dist_sex_citizenship <- df %>% count(citizenship, taq_sex) %>% arrange(citizenship, taq_sex)
print(dist_sex_citizenship)

dist_age_temporalDO <- df %>% count(temporalDO, taq_age) %>% arrange(temporalDO, taq_age)
print(dist_age_temporalDO)

dist_age_perspective <- df %>% count(perspective, taq_age) %>% arrange(perspective, taq_age)
print(dist_age_perspective)

dist_sex_temporalDO <- df %>% count(temporalDO, taq_sex) %>% arrange(temporalDO, taq_sex)
print(dist_sex_temporalDO)

dist_sex_perspective <- df %>% count(perspective, taq_sex) %>% arrange(perspective, taq_sex)
print(dist_sex_perspective)
|
||||
|
||||
# =============================================================================
# OPTIONAL: SAVE ALL DISTRIBUTIONS TO DOCUMENT
# =============================================================================

if (save_to_doc) {
  # Section title -> distribution table, in report order. Replaces the twelve
  # hand-written cat/print blocks of the original.
  single_sections <- list(
    "Distribution of taq_age" = dist_age,
    "Distribution of taq_sex" = dist_sex,
    "Distribution of citizenship" = dist_citizenship,
    "Distribution of group" = dist_group,
    "Distribution of temporalDO" = dist_temporalDO,
    "Distribution of perspective" = dist_perspective
  )
  nested_sections <- list(
    "Age within Citizenship" = dist_age_citizenship,
    "Sex within Citizenship" = dist_sex_citizenship,
    "Age within temporalDO" = dist_age_temporalDO,
    "Age within perspective" = dist_age_perspective,
    "Sex within temporalDO" = dist_sex_temporalDO,
    "Sex within perspective" = dist_sex_perspective
  )

  sink(doc_filename)
  # The original never re-ran sink() on error, so a failing print() left all
  # console output redirected to the file for the rest of the session.
  # tryCatch(finally = ...) guarantees the sink is always closed.
  tryCatch({
    cat("DISTRIBUTION REPORT\n")
    cat("==================\n\n")

    cat("SINGLE VARIABLE DISTRIBUTIONS\n")
    cat("------------------------------\n\n")
    for (title in names(single_sections)) {
      cat(title, ":\n", sep = "")
      print(single_sections[[title]])
      cat("\n\n")
    }

    cat("NESTED DISTRIBUTIONS\n")
    cat("---------------------\n\n")
    for (idx in seq_along(nested_sections)) {
      cat(names(nested_sections)[idx], ":\n", sep = "")
      print(nested_sections[[idx]])
      # The report ends with a single newline after the final table.
      if (idx < length(nested_sections)) cat("\n\n") else cat("\n")
    }
  }, finally = sink())

  cat("Distributions saved to:", doc_filename, "\n")
}
|
||||
177
eohi3/dataREVIEW-JAN21/eohi3_quotas.txt
Normal file
177
eohi3/dataREVIEW-JAN21/eohi3_quotas.txt
Normal file
@ -0,0 +1,177 @@
|
||||
DISTRIBUTION REPORT
|
||||
|
||||
==================
|
||||
|
||||
|
||||
SINGLE VARIABLE DISTRIBUTIONS
|
||||
|
||||
------------------------------
|
||||
|
||||
|
||||
Distribution of taq_age:
|
||||
|
||||
taq_age n
|
||||
1 18 - 24 73
|
||||
2 53 - 59 67
|
||||
3 60 - 66 67
|
||||
4 67 - 73 65
|
||||
5 39 - 45 64
|
||||
6 46 - 52 63
|
||||
7 25 - 31 62
|
||||
8 32 - 38 61
|
||||
|
||||
|
||||
|
||||
Distribution of taq_sex:
|
||||
|
||||
taq_sex n
|
||||
1 Female 260
|
||||
2 Male 257
|
||||
3 Prefer not to say 5
|
||||
|
||||
|
||||
|
||||
Distribution of citizenship:
|
||||
|
||||
citizenship n
|
||||
1 American 262
|
||||
2 Canadian 258
|
||||
3 Both 2
|
||||
|
||||
|
||||
|
||||
Distribution of group:
|
||||
|
||||
group n
|
||||
1 01FPV 177
|
||||
2 03VFP 174
|
||||
3 02PVF 171
|
||||
|
||||
|
||||
|
||||
Distribution of temporalDO:
|
||||
|
||||
temporalDO n
|
||||
1 past 262
|
||||
2 future 260
|
||||
|
||||
|
||||
|
||||
Distribution of perspective:
|
||||
|
||||
perspective n
|
||||
1 other 261
|
||||
2 self 261
|
||||
|
||||
|
||||
|
||||
NESTED DISTRIBUTIONS
|
||||
|
||||
---------------------
|
||||
|
||||
|
||||
Age within Citizenship:
|
||||
|
||||
citizenship taq_age n
|
||||
1 American 18 - 24 38
|
||||
2 American 25 - 31 30
|
||||
3 American 32 - 38 29
|
||||
4 American 39 - 45 33
|
||||
5 American 46 - 52 31
|
||||
6 American 53 - 59 34
|
||||
7 American 60 - 66 34
|
||||
8 American 67 - 73 33
|
||||
9 Both 32 - 38 1
|
||||
10 Both 46 - 52 1
|
||||
11 Canadian 18 - 24 35
|
||||
12 Canadian 25 - 31 32
|
||||
13 Canadian 32 - 38 31
|
||||
14 Canadian 39 - 45 31
|
||||
15 Canadian 46 - 52 31
|
||||
16 Canadian 53 - 59 33
|
||||
17 Canadian 60 - 66 33
|
||||
18 Canadian 67 - 73 32
|
||||
|
||||
|
||||
|
||||
Sex within Citizenship:
|
||||
|
||||
citizenship taq_sex n
|
||||
1 American Female 130
|
||||
2 American Male 129
|
||||
3 American Prefer not to say 3
|
||||
4 Both Female 1
|
||||
5 Both Male 1
|
||||
6 Canadian Female 129
|
||||
7 Canadian Male 127
|
||||
8 Canadian Prefer not to say 2
|
||||
|
||||
|
||||
|
||||
Age within temporalDO:
|
||||
|
||||
temporalDO taq_age n
|
||||
1 future 18 - 24 38
|
||||
2 future 25 - 31 31
|
||||
3 future 32 - 38 29
|
||||
4 future 39 - 45 34
|
||||
5 future 46 - 52 35
|
||||
6 future 53 - 59 36
|
||||
7 future 60 - 66 29
|
||||
8 future 67 - 73 28
|
||||
9 past 18 - 24 35
|
||||
10 past 25 - 31 31
|
||||
11 past 32 - 38 32
|
||||
12 past 39 - 45 30
|
||||
13 past 46 - 52 28
|
||||
14 past 53 - 59 31
|
||||
15 past 60 - 66 38
|
||||
16 past 67 - 73 37
|
||||
|
||||
|
||||
|
||||
Age within perspective:
|
||||
|
||||
perspective taq_age n
|
||||
1 other 18 - 24 41
|
||||
2 other 25 - 31 36
|
||||
3 other 32 - 38 28
|
||||
4 other 39 - 45 32
|
||||
5 other 46 - 52 28
|
||||
6 other 53 - 59 33
|
||||
7 other 60 - 66 30
|
||||
8 other 67 - 73 33
|
||||
9 self 18 - 24 32
|
||||
10 self 25 - 31 26
|
||||
11 self 32 - 38 33
|
||||
12 self 39 - 45 32
|
||||
13 self 46 - 52 35
|
||||
14 self 53 - 59 34
|
||||
15 self 60 - 66 37
|
||||
16 self 67 - 73 32
|
||||
|
||||
|
||||
|
||||
Sex within temporalDO:
|
||||
|
||||
temporalDO taq_sex n
|
||||
1 future Female 130
|
||||
2 future Male 129
|
||||
3 future Prefer not to say 1
|
||||
4 past Female 130
|
||||
5 past Male 128
|
||||
6 past Prefer not to say 4
|
||||
|
||||
|
||||
|
||||
Sex within perspective:
|
||||
|
||||
perspective taq_sex n
|
||||
1 other Female 130
|
||||
2 other Male 128
|
||||
3 other Prefer not to say 3
|
||||
4 self Female 130
|
||||
5 self Male 129
|
||||
6 self Prefer not to say 2
|
||||
|
||||
|
||||
531
eohi3/dataREVIEW-JAN21/eohi3_raw.csv
Normal file
531
eohi3/dataREVIEW-JAN21/eohi3_raw.csv
Normal file
File diff suppressed because one or more lines are too long
532
eohi3/dataREVIEW-JAN21/eohi3_raw2.csv
Normal file
532
eohi3/dataREVIEW-JAN21/eohi3_raw2.csv
Normal file
File diff suppressed because one or more lines are too long
586
eohi3/dataREVIEW-JAN21/eohi3_unprocessed_final.csv
Normal file
586
eohi3/dataREVIEW-JAN21/eohi3_unprocessed_final.csv
Normal file
File diff suppressed because one or more lines are too long
343
eohi3/datap 04 - combined vars.r
Normal file
343
eohi3/datap 04 - combined vars.r
Normal file
@ -0,0 +1,343 @@
|
||||
library(dplyr)

setwd("/home/ladmin/Documents/DND/EOHI/eohi3")

# Read the data; empty cells stay as "" (only the literal string "NA" becomes NA).
df <- read.csv("eohi3.csv", stringsAsFactors = FALSE, check.names = FALSE, na.strings = "NA")

# =============================================================================
# 1. CREATE BACKUP (currently disabled)
# =============================================================================
#file.copy("eohi3.csv", "eohi3_2.csv", overwrite = TRUE)
|
||||
|
||||
# =============================================================================
|
||||
# 2. DEFINE VARIABLE MAPPINGS
|
||||
# =============================================================================
|
||||
|
||||
# Past variables mapping: past_[dom]_[item] is combined from
# [self|other][DOM]_p5_[item]. Built programmatically instead of as a literal
# 20-entry list; entry order (val, pers, pref) matches the original. Note the
# source suffix for the DGEN items is lowercase ("dgen") while the target
# keeps "DGEN".
past_mappings <- local({
  domain_items <- list(
    val  = c("trad", "autonomy", "personal", "justice", "close", "connect", "DGEN"),
    pers = c("open", "goal", "social", "agree", "stress", "DGEN"),
    pref = c("hobbies", "music", "dress", "exer", "food", "friends", "DGEN")
  )
  mappings <- list()
  for (dom in names(domain_items)) {
    for (item in domain_items[[dom]]) {
      src_item <- if (item == "DGEN") "dgen" else item
      mappings[[paste0("past_", dom, "_", item)]] <- c(
        paste0("self", toupper(dom), "_p5_", src_item),
        paste0("other", toupper(dom), "_p5_", src_item)
      )
    }
  }
  mappings
})
|
||||
|
||||
# Future variables mapping: fut_[dom]_[item] is combined from
# [self|other][DOM]_f5_[item]. Same construction as past_mappings, with the
# "fut_" target prefix and "_f5_" source infix; DGEN sources are lowercase.
future_mappings <- local({
  domain_items <- list(
    val  = c("trad", "autonomy", "personal", "justice", "close", "connect", "DGEN"),
    pers = c("open", "goal", "social", "agree", "stress", "DGEN"),
    pref = c("hobbies", "music", "dress", "exer", "food", "friends", "DGEN")
  )
  mappings <- list()
  for (dom in names(domain_items)) {
    for (item in domain_items[[dom]]) {
      src_item <- if (item == "DGEN") "dgen" else item
      mappings[[paste0("fut_", dom, "_", item)]] <- c(
        paste0("self", toupper(dom), "_f5_", src_item),
        paste0("other", toupper(dom), "_f5_", src_item)
      )
    }
  }
  mappings
})
|
||||
|
||||
# =============================================================================
|
||||
# 3. COMBINE VARIABLES
|
||||
# =============================================================================
|
||||
|
||||
# Merge the self- and other-perspective columns into one vector. For each row
# exactly one of the two should be filled (self wins if both are, matching the
# original nested-ifelse order); rows where neither has a value become NA.
# Column existence should already have been checked by the caller; the stop()
# calls are a safety net.
combine_vars <- function(df, self_col, other_col) {
  if (!self_col %in% names(df)) {
    stop(paste("ERROR: Column", self_col, "not found. This should have been caught earlier."))
  }
  if (!other_col %in% names(df)) {
    stop(paste("ERROR: Column", other_col, "not found. This should have been caught earlier."))
  }

  self_vals <- df[[self_col]]
  other_vals <- df[[other_col]]

  # Start from the self column (blank/NA -> NA), then fill remaining gaps
  # from the other column where it is non-empty.
  combined <- ifelse(!is.na(self_vals) & self_vals != "", self_vals, NA)
  take_other <- is.na(combined) & !is.na(other_vals) & other_vals != ""
  combined[take_other] <- other_vals[take_other]
  combined
}
|
||||
|
||||
# Build each combined past_* column; any mapping whose target/self/other
# column is absent from the data is skipped with a warning and reported below.
cat("\nCombining past variables...\n")
missing_cols <- list()
for (new_col in names(past_mappings)) {
  self_col <- past_mappings[[new_col]][1]
  other_col <- past_mappings[[new_col]][2]

  # Label each required column by its role so the report reads "role: name".
  required <- c(target = new_col, self = self_col, other = other_col)
  absent <- required[!required %in% names(df)]

  if (length(absent) > 0) {
    notes <- paste(names(absent), absent, sep = ": ")
    missing_cols[[new_col]] <- notes
    warning(paste("Skipping", new_col, "- missing columns:", paste(notes, collapse = ", ")))
    next
  }

  df[[new_col]] <- combine_vars(df, self_col, other_col)
  cat(paste("  Updated:", new_col, "\n"))
}

# Report any mappings that could not be applied.
if (length(missing_cols) > 0) {
  cat("\n⚠ Missing columns detected in PAST variables:\n")
  for (var in names(missing_cols)) {
    cat(paste("  ", var, ":", paste(missing_cols[[var]], collapse = ", "), "\n"))
  }
}
|
||||
|
||||
# Build each combined fut_* column; same skip-and-report handling as the past
# variables above.
cat("\nCombining future variables...\n")
missing_cols_future <- list()
for (new_col in names(future_mappings)) {
  self_col <- future_mappings[[new_col]][1]
  other_col <- future_mappings[[new_col]][2]

  required <- c(target = new_col, self = self_col, other = other_col)
  absent <- required[!required %in% names(df)]

  if (length(absent) > 0) {
    notes <- paste(names(absent), absent, sep = ": ")
    missing_cols_future[[new_col]] <- notes
    warning(paste("Skipping", new_col, "- missing columns:", paste(notes, collapse = ", ")))
    next
  }

  df[[new_col]] <- combine_vars(df, self_col, other_col)
  cat(paste("  Updated:", new_col, "\n"))
}

# Report any mappings that could not be applied.
if (length(missing_cols_future) > 0) {
  cat("\n⚠ Missing columns detected in FUTURE variables:\n")
  for (var in names(missing_cols_future)) {
    cat(paste("  ", var, ":", paste(missing_cols_future[[var]], collapse = ", "), "\n"))
  }
}
|
||||
|
||||
# =============================================================================
|
||||
# 4. VALIDATION CHECKS
|
||||
# =============================================================================
|
||||
|
||||
cat("\n=== VALIDATION CHECKS ===\n\n")
|
||||
|
||||
# Check 1: for each mapping, report rows where BOTH the self and the other
# source column are filled — by design each row should only ever have one.
# Returns a data frame with one row per conflicting variable (empty if none),
# including a count and up to five example row numbers.
check_conflicts <- function(df, mappings) {
  conflicts <- data.frame()

  for (new_col in names(mappings)) {
    self_col <- mappings[[new_col]][1]
    other_col <- mappings[[new_col]][2]

    # Mappings whose source columns are absent cannot conflict; skip them.
    if (!(self_col %in% names(df)) || !(other_col %in% names(df))) {
      next
    }

    filled <- function(col) !is.na(df[[col]]) & df[[col]] != ""
    both_filled <- filled(self_col) & filled(other_col)

    if (any(both_filled, na.rm = TRUE)) {
      conflict_rows <- which(both_filled)
      conflicts <- rbind(conflicts, data.frame(
        variable = new_col,
        self_col = self_col,
        other_col = other_col,
        n_conflicts = length(conflict_rows),
        example_rows = paste(head(conflict_rows, 5), collapse = ", ")
      ))
    }
  }

  conflicts
}
|
||||
|
||||
past_conflicts <- check_conflicts(df, past_mappings)
future_conflicts <- check_conflicts(df, future_mappings)

# Print one summary per variable set. Only the FUTURE warning carries a
# leading newline (matching the original output exactly).
report_conflicts <- function(found, label, warn_prefix = "") {
  if (nrow(found) > 0) {
    cat(paste0(warn_prefix, "WARNING: Found conflicts in ", label,
               " variables (both self and other have values):\n"))
    print(found)
  } else {
    cat(paste0("✓ No conflicts found in ", label, " variables\n"))
  }
}

report_conflicts(past_conflicts, "PAST")
report_conflicts(future_conflicts, "FUTURE", warn_prefix = "\n")
|
||||
|
||||
# Check 2: the combined column should contain exactly as many non-empty
# values as the self and other source columns together (the two are mutually
# exclusive by design). NA counts indicate a column was absent from the data.
check_coverage <- function(df, mappings) {
  # Count non-empty, non-NA entries in a column; NA when the column is absent.
  count_filled <- function(col) {
    if (col %in% names(df)) {
      sum(!is.na(df[[col]]) & df[[col]] != "", na.rm = TRUE)
    } else {
      NA
    }
  }

  coverage <- data.frame()
  for (new_col in names(mappings)) {
    self_col <- mappings[[new_col]][1]
    other_col <- mappings[[new_col]][2]

    self_count <- count_filled(self_col)
    other_count <- count_filled(other_col)
    combined_count <- count_filled(new_col)

    expected_count <- if (!is.na(self_count) && !is.na(other_count)) {
      self_count + other_count
    } else {
      NA
    }

    match <- if (!is.na(combined_count) && !is.na(expected_count)) {
      combined_count == expected_count
    } else {
      NA
    }

    coverage <- rbind(coverage, data.frame(
      variable = new_col,
      self_non_empty = self_count,
      other_non_empty = other_count,
      combined_non_empty = combined_count,
      expected_non_empty = expected_count,
      match = match
    ))
  }

  coverage
}
|
||||
|
||||
past_coverage <- check_coverage(df, past_mappings)
future_coverage <- check_coverage(df, future_mappings)

cat("\n=== COVERAGE CHECK ===\n")
cat("\nPAST variables:\n")
print(past_coverage)

cat("\nFUTURE variables:\n")
print(future_coverage)

# Pass only when every row of both tables matched (NAs — absent columns —
# are ignored, as in the original).
if (all(past_coverage$match, na.rm = TRUE) && all(future_coverage$match, na.rm = TRUE)) {
  cat("\n✓ All combined variables have correct coverage\n")
} else {
  cat("\n⚠ Some variables may have missing coverage - check the table above\n")
}

# Check 3: spot-check the first few rows for one representative past variable.
cat("\n=== SAMPLE ROW CHECK ===\n")
sample_rows <- min(5, nrow(df))
cat(paste("Checking first", sample_rows, "rows:\n\n"))

# Render a cell for display: blanks and NA print as "empty".
show_val <- function(v) ifelse(is.na(v) || v == "", "empty", v)

for (i in 1:sample_rows) {
  cat(paste("Row", i, ":\n"))

  test_var <- "past_val_trad"
  self_val <- if (past_mappings[[test_var]][1] %in% names(df)) df[i, past_mappings[[test_var]][1]] else NA
  other_val <- if (past_mappings[[test_var]][2] %in% names(df)) df[i, past_mappings[[test_var]][2]] else NA
  combined_val <- df[i, test_var]

  cat(sprintf("  %s: self=%s, other=%s, combined=%s\n",
              test_var, show_val(self_val), show_val(other_val), show_val(combined_val)))
}

# =============================================================================
# 5. SAVE UPDATED DATA
# =============================================================================

write.csv(df, "eohi3.csv", row.names = FALSE, na = "")
cat("Updated data saved to: eohi3.csv\n")
cat(paste("Total rows:", nrow(df), "\n"))
cat(paste("Total columns:", ncol(df), "\n"))
||||
187
eohi3/datap 05 - ehi vars.r
Normal file
187
eohi3/datap 05 - ehi vars.r
Normal file
@ -0,0 +1,187 @@
|
||||
library(dplyr)
|
||||
|
||||
setwd("/home/ladmin/Documents/DND/EOHI/eohi3")
|
||||
|
||||
# Read the data (with check.names=FALSE to preserve original column names)
|
||||
# Keep empty cells as empty strings, not NA
|
||||
# Only convert the literal string "NA" to NA, not empty strings
|
||||
df <- read.csv("eohi3.csv", stringsAsFactors = FALSE, check.names = FALSE, na.strings = "NA")
|
||||
|
||||
# =============================================================================
|
||||
# 1. CREATE BACKUP
|
||||
# =============================================================================
|
||||
file.copy("eohi3.csv", "eohi3_2.csv", overwrite = TRUE)
|
||||
|
||||
# =============================================================================
|
||||
# 2. DEFINE VARIABLE MAPPINGS
|
||||
# =============================================================================
|
||||
|
||||
# Target variables (excluding those ending in _MEAN)
|
||||
# Each target var = past_var - fut_var
|
||||
ehi_mappings <- list(
|
||||
# Preferences (PREF)
|
||||
"ehi_pref_hobbies" = c("past_pref_hobbies", "fut_pref_hobbies"),
|
||||
"ehi_pref_music" = c("past_pref_music", "fut_pref_music"),
|
||||
"ehi_pref_dress" = c("past_pref_dress", "fut_pref_dress"),
|
||||
"ehi_pref_exer" = c("past_pref_exer", "fut_pref_exer"),
|
||||
"ehi_pref_food" = c("past_pref_food", "fut_pref_food"),
|
||||
"ehi_pref_friends" = c("past_pref_friends", "fut_pref_friends"),
|
||||
"ehi_pref_DGEN" = c("past_pref_DGEN", "fut_pref_DGEN"),
|
||||
|
||||
# Personality (PERS)
|
||||
"ehi_pers_open" = c("past_pers_open", "fut_pers_open"),
|
||||
"ehi_pers_goal" = c("past_pers_goal", "fut_pers_goal"),
|
||||
"ehi_pers_social" = c("past_pers_social", "fut_pers_social"),
|
||||
"ehi_pers_agree" = c("past_pers_agree", "fut_pers_agree"),
|
||||
"ehi_pers_stress" = c("past_pers_stress", "fut_pers_stress"),
|
||||
"ehi_pers_DGEN" = c("past_pers_DGEN", "fut_pers_DGEN"),
|
||||
|
||||
# Values (VAL)
|
||||
"ehi_val_trad" = c("past_val_trad", "fut_val_trad"),
|
||||
"ehi_val_autonomy" = c("past_val_autonomy", "fut_val_autonomy"),
|
||||
"ehi_val_personal" = c("past_val_personal", "fut_val_personal"),
|
||||
"ehi_val_justice" = c("past_val_justice", "fut_val_justice"),
|
||||
"ehi_val_close" = c("past_val_close", "fut_val_close"),
|
||||
"ehi_val_connect" = c("past_val_connect", "fut_val_connect"),
|
||||
"ehi_val_DGEN" = c("past_val_DGEN", "fut_val_DGEN")
|
||||
)
|
||||
|
||||
# =============================================================================
|
||||
# 3. CHECK IF TARGET VARIABLES EXIST
|
||||
# =============================================================================
|
||||
|
||||
missing_targets <- c()
|
||||
for (target_var in names(ehi_mappings)) {
|
||||
if (!target_var %in% names(df)) {
|
||||
missing_targets <- c(missing_targets, target_var)
|
||||
cat(paste("⚠ Target variable not found:", target_var, "\n"))
|
||||
}
|
||||
}
|
||||
|
||||
if (length(missing_targets) > 0) {
|
||||
cat("\nERROR: The following target variables are missing from eohi3.csv:\n")
|
||||
for (var in missing_targets) {
|
||||
cat(paste(" -", var, "\n"))
|
||||
}
|
||||
stop("Cannot proceed without target variables. Please add them to the CSV file.")
|
||||
}
|
||||
|
||||
# =============================================================================
|
||||
# 4. CALCULATE EHI VARIABLES (past - future)
|
||||
# =============================================================================
|
||||
|
||||
missing_source_cols <- list()
|
||||
|
||||
for (target_var in names(ehi_mappings)) {
|
||||
past_var <- ehi_mappings[[target_var]][1]
|
||||
fut_var <- ehi_mappings[[target_var]][2]
|
||||
|
||||
# Check if source columns exist
|
||||
missing <- c()
|
||||
if (!past_var %in% names(df)) {
|
||||
missing <- c(missing, past_var)
|
||||
}
|
||||
if (!fut_var %in% names(df)) {
|
||||
missing <- c(missing, fut_var)
|
||||
}
|
||||
|
||||
if (length(missing) > 0) {
|
||||
missing_source_cols[[target_var]] <- missing
|
||||
warning(paste("Skipping", target_var, "- missing source columns:", paste(missing, collapse = ", ")))
|
||||
next
|
||||
}
|
||||
|
||||
# Convert to numeric, handling empty strings and NA
|
||||
past_vals <- as.numeric(ifelse(df[[past_var]] == "" | is.na(df[[past_var]]), NA, df[[past_var]]))
|
||||
fut_vals <- as.numeric(ifelse(df[[fut_var]] == "" | is.na(df[[fut_var]]), NA, df[[fut_var]]))
|
||||
|
||||
# Calculate difference: past - future
|
||||
ehi_vals <- past_vals - fut_vals
|
||||
|
||||
# Update target column
|
||||
df[[target_var]] <- ehi_vals
|
||||
cat(paste(" Calculated:", target_var, "=", past_var, "-", fut_var, "\n"))
|
||||
}
|
||||
|
||||
# Report any missing source columns
|
||||
if (length(missing_source_cols) > 0) {
|
||||
for (var in names(missing_source_cols)) {
|
||||
cat(paste(" ", var, ":", paste(missing_source_cols[[var]], collapse = ", "), "\n"))
|
||||
}
|
||||
}
|
||||
|
||||
# =============================================================================
|
||||
# 5. VALIDATION: CHECK 5 RANDOM ROWS
|
||||
# =============================================================================
|
||||
|
||||
cat("\n=== VALIDATION: CHECKING 5 RANDOM ROWS ===\n\n")
|
||||
|
||||
# Set seed for reproducibility
|
||||
set.seed(123)
|
||||
sample_rows <- sample(1:nrow(df), min(5, nrow(df)))
|
||||
sample_rows <- sort(sample_rows)
|
||||
|
||||
for (i in sample_rows) {
|
||||
cat(paste("Row", i, ":\n"))
|
||||
|
||||
# Check a few representative variables from each category
|
||||
test_vars <- c(
|
||||
"ehi_pref_hobbies",
|
||||
"ehi_pers_open",
|
||||
"ehi_val_trad"
|
||||
)
|
||||
|
||||
for (target_var in test_vars) {
|
||||
if (target_var %in% names(ehi_mappings)) {
|
||||
past_var <- ehi_mappings[[target_var]][1]
|
||||
fut_var <- ehi_mappings[[target_var]][2]
|
||||
|
||||
if (past_var %in% names(df) && fut_var %in% names(df)) {
|
||||
past_val <- df[i, past_var]
|
||||
fut_val <- df[i, fut_var]
|
||||
ehi_val <- df[i, target_var]
|
||||
|
||||
# Convert to numeric for calculation check
|
||||
past_num <- as.numeric(ifelse(past_val == "" | is.na(past_val), NA, past_val))
|
||||
fut_num <- as.numeric(ifelse(fut_val == "" | is.na(fut_val), NA, fut_val))
|
||||
ehi_num <- as.numeric(ifelse(is.na(ehi_val), NA, ehi_val))
|
||||
|
||||
# Calculate expected value
|
||||
expected <- if (!is.na(past_num) && !is.na(fut_num)) {
|
||||
past_num - fut_num
|
||||
} else {
|
||||
NA
|
||||
}
|
||||
|
||||
# Check if calculation is correct
|
||||
match <- if (!is.na(expected) && !is.na(ehi_num)) {
|
||||
abs(expected - ehi_num) < 0.0001 # Allow for floating point precision
|
||||
} else {
|
||||
is.na(expected) && is.na(ehi_num)
|
||||
}
|
||||
|
||||
cat(sprintf(" %s:\n", target_var))
|
||||
cat(sprintf(" %s = %s\n", past_var, ifelse(is.na(past_val) || past_val == "", "NA/empty", past_val)))
|
||||
cat(sprintf(" %s = %s\n", fut_var, ifelse(is.na(fut_val) || fut_val == "", "NA/empty", fut_val)))
|
||||
cat(sprintf(" %s = %s\n", target_var, ifelse(is.na(ehi_val), "NA", ehi_val)))
|
||||
cat(sprintf(" Expected: %s - %s = %s\n",
|
||||
ifelse(is.na(past_num), "NA", past_num),
|
||||
ifelse(is.na(fut_num), "NA", fut_num),
|
||||
ifelse(is.na(expected), "NA", expected)))
|
||||
cat(sprintf(" Match: %s\n\n", ifelse(match, "✓", "✗ ERROR")))
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
# =============================================================================
|
||||
# 6. SAVE UPDATED DATA
|
||||
# =============================================================================
|
||||
# COMMENTED OUT: Uncomment when ready to save
|
||||
|
||||
# cat("\n=== SAVING DATA ===\n")
|
||||
write.csv(df, "eohi3.csv", row.names = FALSE, na = "")
|
||||
# cat("Updated data saved to: eohi3.csv\n")
|
||||
# cat(paste("Total rows:", nrow(df), "\n"))
|
||||
# cat(paste("Total columns:", ncol(df), "\n"))
|
||||
|
||||
225
eohi3/datap 06 - mean vars.r
Normal file
225
eohi3/datap 06 - mean vars.r
Normal file
@ -0,0 +1,225 @@
|
||||
library(dplyr)
|
||||
|
||||
setwd("/home/ladmin/Documents/DND/EOHI/eohi3")
|
||||
|
||||
# Read the data (with check.names=FALSE to preserve original column names)
|
||||
# Keep empty cells as empty strings, not NA
|
||||
# Only convert the literal string "NA" to NA, not empty strings
|
||||
df <- read.csv("eohi3.csv", stringsAsFactors = FALSE, check.names = FALSE, na.strings = "NA")
|
||||
|
||||
# =============================================================================
|
||||
# 1. CREATE BACKUP
|
||||
# =============================================================================
|
||||
file.copy("eohi3.csv", "eohi3_2.csv", overwrite = TRUE)
|
||||
|
||||
# =============================================================================
|
||||
# 2. DEFINE MEAN VARIABLE MAPPINGS
|
||||
# =============================================================================
|
||||
|
||||
mean_mappings <- list(
|
||||
# Past Preferences MEAN
|
||||
"past_pref_MEAN" = c("past_pref_hobbies", "past_pref_music", "past_pref_dress",
|
||||
"past_pref_exer", "past_pref_food", "past_pref_friends"),
|
||||
|
||||
# Future Preferences MEAN
|
||||
"fut_pref_MEAN" = c("fut_pref_hobbies", "fut_pref_music", "fut_pref_dress",
|
||||
"fut_pref_exer", "fut_pref_food", "fut_pref_friends"),
|
||||
|
||||
# Past Personality MEAN
|
||||
"past_pers_MEAN" = c("past_pers_open", "past_pers_goal", "past_pers_social",
|
||||
"past_pers_agree", "past_pers_stress"),
|
||||
|
||||
# Future Personality MEAN
|
||||
"fut_pers_MEAN" = c("fut_pers_open", "fut_pers_goal", "fut_pers_social",
|
||||
"fut_pers_agree", "fut_pers_stress"),
|
||||
|
||||
# Past Values MEAN
|
||||
"past_val_MEAN" = c("past_val_trad", "past_val_autonomy", "past_val_personal",
|
||||
"past_val_justice", "past_val_close", "past_val_connect"),
|
||||
|
||||
# Future Values MEAN
|
||||
"fut_val_MEAN" = c("fut_val_trad", "fut_val_autonomy", "fut_val_personal",
|
||||
"fut_val_justice", "fut_val_close", "fut_val_connect"),
|
||||
|
||||
# EHI Preferences MEAN
|
||||
"ehi_pref_MEAN" = c("ehi_pref_hobbies", "ehi_pref_music", "ehi_pref_dress",
|
||||
"ehi_pref_exer", "ehi_pref_food", "ehi_pref_friends"),
|
||||
|
||||
# EHI Personality MEAN
|
||||
"ehi_pers_MEAN" = c("ehi_pers_open", "ehi_pers_goal", "ehi_pers_social",
|
||||
"ehi_pers_agree", "ehi_pers_stress"),
|
||||
|
||||
# EHI Values MEAN
|
||||
"ehi_val_MEAN" = c("ehi_val_trad", "ehi_val_autonomy", "ehi_val_personal",
|
||||
"ehi_val_justice", "ehi_val_close", "ehi_val_connect")
|
||||
)
|
||||
|
||||
# Additional means
|
||||
additional_means <- list(
|
||||
"ehiDS_mean" = c("ehi_pref_MEAN", "ehi_pers_MEAN", "ehi_val_MEAN"),
|
||||
"ehiDGEN_mean" = c("ehi_pref_DGEN", "ehi_pers_DGEN", "ehi_val_DGEN")
|
||||
)
|
||||
|
||||
# =============================================================================
|
||||
# 3. CHECK IF VARIABLES EXIST
|
||||
# =============================================================================
|
||||
|
||||
# Check source variables for mean_mappings
|
||||
missing_source_vars <- list()
|
||||
for (target_var in names(mean_mappings)) {
|
||||
source_vars <- mean_mappings[[target_var]]
|
||||
missing <- setdiff(source_vars, names(df))
|
||||
if (length(missing) > 0) {
|
||||
missing_source_vars[[target_var]] <- missing
|
||||
cat(paste("⚠ Missing source variables for", target_var, ":", paste(missing, collapse = ", "), "\n"))
|
||||
}
|
||||
}
|
||||
|
||||
# Check source variables for additional_means
|
||||
missing_additional_vars <- list()
|
||||
for (target_var in names(additional_means)) {
|
||||
source_vars <- additional_means[[target_var]]
|
||||
missing <- setdiff(source_vars, names(df))
|
||||
if (length(missing) > 0) {
|
||||
missing_additional_vars[[target_var]] <- missing
|
||||
cat(paste("⚠ Missing source variables for", target_var, ":", paste(missing, collapse = ", "), "\n"))
|
||||
}
|
||||
}
|
||||
|
||||
# Check if target variables exist
|
||||
expected_targets <- c(names(mean_mappings), names(additional_means))
|
||||
actual_targets <- names(df)
|
||||
missing_targets <- setdiff(expected_targets, actual_targets)
|
||||
|
||||
if (length(missing_targets) > 0) {
|
||||
cat("\nERROR: The following target variables are missing from eohi3.csv:\n")
|
||||
for (var in missing_targets) {
|
||||
cat(paste(" -", var, "\n"))
|
||||
}
|
||||
stop("Cannot proceed without target variables. Please add them to the CSV file.")
|
||||
}
|
||||
|
||||
# =============================================================================
|
||||
# 4. CALCULATE MEAN VARIABLES
|
||||
# =============================================================================
|
||||
|
||||
# Function to calculate row means, handling NA and empty strings
|
||||
calculate_mean <- function(df, source_vars) {
|
||||
# Extract columns and convert to numeric
|
||||
cols_data <- df[, source_vars, drop = FALSE]
|
||||
|
||||
# Convert to numeric matrix, treating empty strings and "NA" as NA
|
||||
numeric_matrix <- apply(cols_data, 2, function(x) {
|
||||
as.numeric(ifelse(x == "" | is.na(x) | x == "NA", NA, x))
|
||||
})
|
||||
|
||||
# Calculate row means, ignoring NA values
|
||||
rowMeans(numeric_matrix, na.rm = TRUE)
|
||||
}
|
||||
|
||||
# Calculate means for main mappings
|
||||
for (target_var in names(mean_mappings)) {
|
||||
source_vars <- mean_mappings[[target_var]]
|
||||
|
||||
# Check if all source variables exist
|
||||
missing <- setdiff(source_vars, names(df))
|
||||
if (length(missing) > 0) {
|
||||
warning(paste("Skipping", target_var, "- missing source variables:", paste(missing, collapse = ", ")))
|
||||
next
|
||||
}
|
||||
|
||||
# Calculate mean
|
||||
df[[target_var]] <- calculate_mean(df, source_vars)
|
||||
cat(paste(" Calculated:", target_var, "from", length(source_vars), "variables\n"))
|
||||
}
|
||||
|
||||
# Calculate additional means
|
||||
for (target_var in names(additional_means)) {
|
||||
source_vars <- additional_means[[target_var]]
|
||||
|
||||
# Check if all source variables exist
|
||||
missing <- setdiff(source_vars, names(df))
|
||||
if (length(missing) > 0) {
|
||||
warning(paste("Skipping", target_var, "- missing source variables:", paste(missing, collapse = ", ")))
|
||||
next
|
||||
}
|
||||
|
||||
# Calculate mean
|
||||
df[[target_var]] <- calculate_mean(df, source_vars)
|
||||
cat(paste(" Calculated:", target_var, "from", length(source_vars), "variables\n"))
|
||||
}
|
||||
|
||||
# =============================================================================
|
||||
# 5. VALIDATION: CHECK 5 RANDOM ROWS
|
||||
# =============================================================================
|
||||
|
||||
# Set seed for reproducibility
|
||||
set.seed(123)
|
||||
sample_rows <- sample(1:nrow(df), min(5, nrow(df)))
|
||||
sample_rows <- sort(sample_rows)
|
||||
|
||||
for (i in sample_rows) {
|
||||
cat(paste("Row", i, ":\n"))
|
||||
|
||||
# Check a few representative mean variables
|
||||
test_vars <- c(
|
||||
"past_pref_MEAN",
|
||||
"ehi_pref_MEAN",
|
||||
"ehiDS_mean"
|
||||
)
|
||||
|
||||
for (target_var in test_vars) {
|
||||
# Determine which mapping to use
|
||||
if (target_var %in% names(mean_mappings)) {
|
||||
source_vars <- mean_mappings[[target_var]]
|
||||
} else if (target_var %in% names(additional_means)) {
|
||||
source_vars <- additional_means[[target_var]]
|
||||
} else {
|
||||
next
|
||||
}
|
||||
|
||||
# Check if all source variables exist
|
||||
if (!all(source_vars %in% names(df))) {
|
||||
next
|
||||
}
|
||||
|
||||
# Get values
|
||||
source_vals <- df[i, source_vars]
|
||||
target_val <- df[i, target_var]
|
||||
|
||||
# Convert to numeric for calculation
|
||||
source_nums <- as.numeric(ifelse(source_vals == "" | is.na(source_vals) | source_vals == "NA", NA, source_vals))
|
||||
target_num <- as.numeric(ifelse(is.na(target_val), NA, target_val))
|
||||
|
||||
# Calculate expected mean (ignoring NA)
|
||||
expected <- mean(source_nums, na.rm = TRUE)
|
||||
if (all(is.na(source_nums))) {
|
||||
expected <- NA
|
||||
}
|
||||
|
||||
# Check if calculation is correct
|
||||
match <- if (!is.na(expected) && !is.na(target_num)) {
|
||||
abs(expected - target_num) < 0.0001 # Allow for floating point precision
|
||||
} else {
|
||||
is.na(expected) && is.na(target_num)
|
||||
}
|
||||
|
||||
cat(sprintf(" %s:\n", target_var))
|
||||
cat(sprintf(" Source variables: %s\n", paste(source_vars, collapse = ", ")))
|
||||
cat(sprintf(" Source values: %s\n", paste(ifelse(is.na(source_vals) | source_vals == "", "NA/empty", source_vals), collapse = ", ")))
|
||||
cat(sprintf(" %s = %s\n", target_var, ifelse(is.na(target_val), "NA", round(target_val, 4))))
|
||||
cat(sprintf(" Expected mean: %s\n", ifelse(is.na(expected), "NA", round(expected, 4))))
|
||||
cat(sprintf(" Match: %s\n\n", ifelse(match, "✓", "✗ ERROR")))
|
||||
}
|
||||
}
|
||||
|
||||
# =============================================================================
|
||||
# 6. SAVE UPDATED DATA
|
||||
# =============================================================================
|
||||
# COMMENTED OUT: Uncomment when ready to save
|
||||
|
||||
write.csv(df, "eohi3.csv", row.names = FALSE, na = "")
|
||||
# cat("Updated data saved to: eohi3.csv\n")
|
||||
# cat(paste("Total rows:", nrow(df), "\n"))
|
||||
# cat(paste("Total columns:", ncol(df), "\n"))
|
||||
|
||||
462
eohi3/datap 07 - scales and recodes.r
Normal file
462
eohi3/datap 07 - scales and recodes.r
Normal file
@ -0,0 +1,462 @@
|
||||
library(dplyr)
|
||||
|
||||
setwd("/home/ladmin/Documents/DND/EOHI/eohi3")
|
||||
|
||||
# Read the data (with check.names=FALSE to preserve original column names)
|
||||
# Keep empty cells as empty strings, not NA
|
||||
# Only convert the literal string "NA" to NA, not empty strings
|
||||
df <- read.csv("eohi3.csv", stringsAsFactors = FALSE, check.names = FALSE, na.strings = "NA")
|
||||
|
||||
# =============================================================================
|
||||
# 1. CREATE BACKUP
|
||||
# =============================================================================
|
||||
file.copy("eohi3.csv", "eohi3_2.csv", overwrite = TRUE)
|
||||
|
||||
# =============================================================================
|
||||
# HELPER FUNCTION: Check variable existence and values
|
||||
# =============================================================================
|
||||
|
||||
check_vars_exist <- function(source_vars, target_vars) {
|
||||
missing_source <- setdiff(source_vars, names(df))
|
||||
missing_target <- setdiff(target_vars, names(df))
|
||||
|
||||
if (length(missing_source) > 0) {
|
||||
stop(paste("Missing source variables:", paste(missing_source, collapse = ", ")))
|
||||
}
|
||||
if (length(missing_target) > 0) {
|
||||
stop(paste("Missing target variables:", paste(missing_target, collapse = ", ")))
|
||||
}
|
||||
return(TRUE)
|
||||
}
|
||||
|
||||
check_values_exist <- function(var_name, expected_values) {
|
||||
unique_vals <- unique(df[[var_name]])
|
||||
unique_vals <- unique_vals[!is.na(unique_vals) & unique_vals != ""]
|
||||
missing_vals <- setdiff(expected_values, unique_vals)
|
||||
extra_vals <- setdiff(unique_vals, expected_values)
|
||||
|
||||
if (length(missing_vals) > 0) {
|
||||
cat(paste(" ⚠ Expected values not found in", var_name, ":", paste(missing_vals, collapse = ", "), "\n"))
|
||||
}
|
||||
if (length(extra_vals) > 0) {
|
||||
cat(paste(" ⚠ Unexpected values found in", var_name, ":", paste(extra_vals, collapse = ", "), "\n"))
|
||||
}
|
||||
return(list(missing = missing_vals, extra = extra_vals))
|
||||
}
|
||||
|
||||
# =============================================================================
|
||||
# 2. RECODE other_length2 TO other_length
|
||||
# =============================================================================
|
||||
|
||||
cat("\n=== 1. RECODING other_length2 TO other_length ===\n\n")
|
||||
|
||||
# Check variables exist
|
||||
check_vars_exist("other_length2", "other_length")
|
||||
|
||||
# Check values in source
|
||||
cat("Checking source variable values...\n")
|
||||
length_vals <- unique(df$other_length2[!is.na(df$other_length2) & df$other_length2 != ""])
|
||||
cat(paste(" Unique values in other_length2:", paste(length_vals, collapse = ", "), "\n"))
|
||||
|
||||
# Recode - handle "20+" as special case first, then convert to numeric for ranges
|
||||
# Convert to numeric once, suppressing warnings for non-numeric values
|
||||
num_length <- suppressWarnings(as.numeric(df$other_length2))
|
||||
|
||||
df$other_length <- ifelse(
|
||||
is.na(df$other_length2),
|
||||
NA,
|
||||
ifelse(
|
||||
df$other_length2 == "",
|
||||
"",
|
||||
ifelse(
|
||||
df$other_length2 == "20+",
|
||||
"20+",
|
||||
ifelse(
|
||||
!is.na(num_length) & num_length >= 5 & num_length <= 9,
|
||||
"5-9",
|
||||
ifelse(
|
||||
!is.na(num_length) & num_length >= 10 & num_length <= 14,
|
||||
"10-14",
|
||||
ifelse(
|
||||
!is.na(num_length) & num_length >= 15 & num_length <= 19,
|
||||
"15-19",
|
||||
NA
|
||||
)
|
||||
)
|
||||
)
|
||||
)
|
||||
)
|
||||
)
|
||||
|
||||
# Verification check
|
||||
cat("\nVerification (5 random rows):\n")
|
||||
set.seed(123)
|
||||
sample_rows <- sample(1:nrow(df), min(5, nrow(df)))
|
||||
for (i in sample_rows) {
|
||||
source_val <- df$other_length2[i]
|
||||
target_val <- df$other_length[i]
|
||||
cat(sprintf(" Row %d: other_length2 = %s -> other_length = %s\n",
|
||||
i, ifelse(is.na(source_val), "NA", ifelse(source_val == "", "empty", source_val)),
|
||||
ifelse(is.na(target_val), "NA", ifelse(target_val == "", "empty", target_val))))
|
||||
}
|
||||
|
||||
# =============================================================================
|
||||
# 3. RECODE other_like2 TO other_like
|
||||
# =============================================================================
|
||||
|
||||
cat("\n=== 2. RECODING other_like2 TO other_like ===\n\n")
|
||||
|
||||
# Check variables exist
|
||||
check_vars_exist("other_like2", "other_like")
|
||||
|
||||
# Check expected values exist
|
||||
expected_like <- c("Dislike a great deal", "Dislike somewhat", "Neither like nor dislike",
|
||||
"Like somewhat", "Like a great deal")
|
||||
check_values_exist("other_like2", expected_like)
|
||||
|
||||
# Recode
|
||||
df$other_like <- ifelse(
|
||||
is.na(df$other_like2),
|
||||
NA,
|
||||
ifelse(
|
||||
df$other_like2 == "",
|
||||
"",
|
||||
ifelse(
|
||||
df$other_like2 == "Dislike a great deal",
|
||||
"-2",
|
||||
ifelse(
|
||||
df$other_like2 == "Dislike somewhat",
|
||||
"-1",
|
||||
ifelse(
|
||||
df$other_like2 == "Neither like nor dislike",
|
||||
"0",
|
||||
ifelse(
|
||||
df$other_like2 == "Like somewhat",
|
||||
"1",
|
||||
ifelse(
|
||||
df$other_like2 == "Like a great deal",
|
||||
"2",
|
||||
NA
|
||||
)
|
||||
)
|
||||
)
|
||||
)
|
||||
)
|
||||
)
|
||||
)
|
||||
|
||||
# Verification check
|
||||
cat("\nVerification (5 random rows):\n")
|
||||
set.seed(456)
|
||||
sample_rows <- sample(1:nrow(df), min(5, nrow(df)))
|
||||
for (i in sample_rows) {
|
||||
source_val <- df$other_like2[i]
|
||||
target_val <- df$other_like[i]
|
||||
cat(sprintf(" Row %d: other_like2 = %s -> other_like = %s\n",
|
||||
i, ifelse(is.na(source_val), "NA", ifelse(source_val == "", "empty", source_val)),
|
||||
ifelse(is.na(target_val), "NA", ifelse(target_val == "", "empty", target_val))))
|
||||
}
|
||||
|
||||
# =============================================================================
|
||||
# 4. CALCULATE aot_total
|
||||
# =============================================================================
|
||||
|
||||
cat("\n=== 3. CALCULATING aot_total ===\n\n")
|
||||
|
||||
# Check variables exist
|
||||
aot_vars <- c("aot01", "aot02", "aot03", "aot04_r", "aot05_r", "aot06_r", "aot07_r", "aot08")
|
||||
check_vars_exist(aot_vars, "aot_total")
|
||||
|
||||
# Reverse code aot04_r through aot07_r
|
||||
reverse_vars <- c("aot04_r", "aot05_r", "aot06_r", "aot07_r")
|
||||
for (var in reverse_vars) {
|
||||
df[[paste0(var, "_reversed")]] <- as.numeric(ifelse(
|
||||
df[[var]] == "" | is.na(df[[var]]),
|
||||
NA,
|
||||
as.numeric(df[[var]]) * -1
|
||||
))
|
||||
}
|
||||
|
||||
# Calculate mean of all 8 variables (4 reversed + 4 original)
|
||||
all_aot_vars <- c("aot01", "aot02", "aot03", "aot04_r_reversed", "aot05_r_reversed",
|
||||
"aot06_r_reversed", "aot07_r_reversed", "aot08")
|
||||
|
||||
# Convert to numeric matrix
|
||||
aot_matrix <- df[, all_aot_vars]
|
||||
aot_numeric <- apply(aot_matrix, 2, function(x) {
|
||||
as.numeric(ifelse(x == "" | is.na(x), NA, x))
|
||||
})
|
||||
|
||||
# Calculate mean
|
||||
df$aot_total <- rowMeans(aot_numeric, na.rm = TRUE)
|
||||
|
||||
# Verification check
|
||||
cat("\nVerification (5 random rows):\n")
|
||||
set.seed(789)
|
||||
sample_rows <- sample(1:nrow(df), min(5, nrow(df)))
|
||||
for (i in sample_rows) {
|
||||
aot_vals <- df[i, all_aot_vars]
|
||||
aot_nums <- as.numeric(ifelse(aot_vals == "" | is.na(aot_vals), NA, aot_vals))
|
||||
expected_mean <- mean(aot_nums, na.rm = TRUE)
|
||||
actual_mean <- df$aot_total[i]
|
||||
cat(sprintf(" Row %d: aot_total = %s (expected: %s)\n",
|
||||
i, ifelse(is.na(actual_mean), "NA", round(actual_mean, 4)),
|
||||
ifelse(is.na(expected_mean), "NA", round(expected_mean, 4))))
|
||||
}
|
||||
|
||||
# =============================================================================
|
||||
# 5. PROCESS CRT QUESTIONS
|
||||
# =============================================================================
|
||||
|
||||
cat("\n=== 4. PROCESSING CRT QUESTIONS ===\n\n")
|
||||
|
||||
# Check variables exist
|
||||
check_vars_exist(c("crt01", "crt02", "crt03"), c("crt_correct", "crt_int"))
|
||||
|
||||
# Initialize CRT variables
|
||||
df$crt_correct <- 0
|
||||
df$crt_int <- 0
|
||||
|
||||
# CRT01: "5 cents" = correct (1,0), "10 cents" = intuitive (0,1), else (0,0)
|
||||
df$crt_correct <- ifelse(df$crt01 == "5 cents", 1, df$crt_correct)
|
||||
df$crt_int <- ifelse(df$crt01 == "10 cents", 1, df$crt_int)
|
||||
|
||||
# CRT02: "5 minutes" = correct (1,0), "100 minutes" = intuitive (0,1), else (0,0)
|
||||
df$crt_correct <- ifelse(df$crt02 == "5 minutes", df$crt_correct + 1, df$crt_correct)
|
||||
df$crt_int <- ifelse(df$crt02 == "100 minutes", df$crt_int + 1, df$crt_int)
|
||||
|
||||
# CRT03: "47 days" = correct (1,0), "24 days" = intuitive (0,1), else (0,0)
|
||||
df$crt_correct <- ifelse(df$crt03 == "47 days", df$crt_correct + 1, df$crt_correct)
|
||||
df$crt_int <- ifelse(df$crt03 == "24 days", df$crt_int + 1, df$crt_int)
|
||||
|
||||
# Check expected values exist
|
||||
expected_crt01 <- c("5 cents", "10 cents")
|
||||
expected_crt02 <- c("5 minutes", "100 minutes")
|
||||
expected_crt03 <- c("47 days", "24 days")
|
||||
check_values_exist("crt01", expected_crt01)
|
||||
check_values_exist("crt02", expected_crt02)
|
||||
check_values_exist("crt03", expected_crt03)
|
||||
|
||||
# Verification check
|
||||
cat("\nVerification (5 random rows):\n")
|
||||
set.seed(1011)
|
||||
sample_rows <- sample(1:nrow(df), min(5, nrow(df)))
|
||||
for (i in sample_rows) {
|
||||
cat(sprintf(" Row %d:\n", i))
|
||||
cat(sprintf(" crt01 = %s -> crt_correct = %d, crt_int = %d\n",
|
||||
ifelse(is.na(df$crt01[i]) || df$crt01[i] == "", "NA/empty", df$crt01[i]),
|
||||
ifelse(df$crt01[i] == "5 cents", 1, 0),
|
||||
ifelse(df$crt01[i] == "10 cents", 1, 0)))
|
||||
cat(sprintf(" crt02 = %s -> crt_correct = %d, crt_int = %d\n",
|
||||
ifelse(is.na(df$crt02[i]) || df$crt02[i] == "", "NA/empty", df$crt02[i]),
|
||||
ifelse(df$crt02[i] == "5 minutes", 1, 0),
|
||||
ifelse(df$crt02[i] == "100 minutes", 1, 0)))
|
||||
cat(sprintf(" crt03 = %s -> crt_correct = %d, crt_int = %d\n",
|
||||
ifelse(is.na(df$crt03[i]) || df$crt03[i] == "", "NA/empty", df$crt03[i]),
|
||||
ifelse(df$crt03[i] == "47 days", 1, 0),
|
||||
ifelse(df$crt03[i] == "24 days", 1, 0)))
|
||||
cat(sprintf(" Total: crt_correct = %d, crt_int = %d\n\n",
|
||||
df$crt_correct[i], df$crt_int[i]))
|
||||
}
|
||||
|
||||
# =============================================================================
|
||||
# 6. CALCULATE icar_verbal
|
||||
# =============================================================================
|
||||
|
||||
cat("\n=== 5. CALCULATING icar_verbal ===\n\n")
|
||||
|
||||
# Check variables exist
|
||||
verbal_vars <- c("verbal01", "verbal02", "verbal03", "verbal04", "verbal05")
|
||||
check_vars_exist(verbal_vars, "icar_verbal")
|
||||
|
||||
# Correct answers
|
||||
correct_verbal <- c("5", "8", "It's impossible to tell", "47", "Sunday")
|
||||
|
||||
# Calculate proportion correct
|
||||
verbal_responses <- df[, verbal_vars]
|
||||
correct_count <- rowSums(
|
||||
sapply(1:5, function(i) {
|
||||
verbal_responses[, i] == correct_verbal[i]
|
||||
}),
|
||||
na.rm = TRUE
|
||||
)
|
||||
df$icar_verbal <- correct_count / 5
|
||||
|
||||
# Verification check
|
||||
cat("\nVerification (5 random rows):\n")
|
||||
set.seed(1213)
|
||||
sample_rows <- sample(1:nrow(df), min(5, nrow(df)))
|
||||
for (i in sample_rows) {
|
||||
responses <- df[i, verbal_vars]
|
||||
correct <- sum(sapply(1:5, function(j) responses[j] == correct_verbal[j]), na.rm = TRUE)
|
||||
prop <- correct / 5
|
||||
cat(sprintf(" Row %d: Correct = %d/5, icar_verbal = %s\n",
|
||||
i, correct, round(prop, 4)))
|
||||
}
|
||||
|
||||
# =============================================================================
|
||||
# 7. CALCULATE icar_matrix
|
||||
# =============================================================================
|
||||
|
||||
cat("\n=== 6. CALCULATING icar_matrix ===\n\n")
|
||||
|
||||
# Check variables exist
|
||||
matrix_vars <- c("matrix01", "matrix02", "matrix03", "matrix04", "matrix05")
|
||||
check_vars_exist(matrix_vars, "icar_matrix")
|
||||
|
||||
# Correct answers
|
||||
correct_matrix <- c("D", "E", "B", "B", "D")
|
||||
|
||||
# Calculate proportion correct
|
||||
matrix_responses <- df[, matrix_vars]
|
||||
correct_count <- rowSums(
|
||||
sapply(1:5, function(i) {
|
||||
matrix_responses[, i] == correct_matrix[i]
|
||||
}),
|
||||
na.rm = TRUE
|
||||
)
|
||||
df$icar_matrix <- correct_count / 5
|
||||
|
||||
# Verification check
|
||||
cat("\nVerification (5 random rows):\n")
|
||||
set.seed(1415)
|
||||
sample_rows <- sample(1:nrow(df), min(5, nrow(df)))
|
||||
for (i in sample_rows) {
|
||||
responses <- df[i, matrix_vars]
|
||||
correct <- sum(sapply(1:5, function(j) responses[j] == correct_matrix[j]), na.rm = TRUE)
|
||||
prop <- correct / 5
|
||||
cat(sprintf(" Row %d: Correct = %d/5, icar_matrix = %s\n",
|
||||
i, correct, round(prop, 4)))
|
||||
}
|
||||
|
||||
# =============================================================================
|
||||
# 8. CALCULATE icar_total
|
||||
# =============================================================================
|
||||
|
||||
cat("\n=== 7. CALCULATING icar_total ===\n\n")
|
||||
|
||||
# Check variables exist
|
||||
check_vars_exist(c(verbal_vars, matrix_vars), "icar_total")
|
||||
|
||||
# Calculate proportion correct across all 10 items
|
||||
all_correct <- c(correct_verbal, correct_matrix)
|
||||
all_responses <- df[, c(verbal_vars, matrix_vars)]
|
||||
correct_count <- rowSums(
|
||||
sapply(1:10, function(i) {
|
||||
all_responses[, i] == all_correct[i]
|
||||
}),
|
||||
na.rm = TRUE
|
||||
)
|
||||
df$icar_total <- correct_count / 10
|
||||
|
||||
# Verification check
|
||||
cat("\nVerification (5 random rows):\n")
|
||||
set.seed(1617)
|
||||
sample_rows <- sample(1:nrow(df), min(5, nrow(df)))
|
||||
for (i in sample_rows) {
|
||||
responses <- df[i, c(verbal_vars, matrix_vars)]
|
||||
correct <- sum(sapply(1:10, function(j) responses[j] == all_correct[j]), na.rm = TRUE)
|
||||
prop <- correct / 10
|
||||
cat(sprintf(" Row %d: Correct = %d/10, icar_total = %s\n",
|
||||
i, correct, round(prop, 4)))
|
||||
}
|
||||
|
||||
# =============================================================================
|
||||
# 9. RECODE demo_sex TO sex
|
||||
# =============================================================================
|
||||
|
||||
cat("\n=== 8. RECODING demo_sex TO sex ===\n\n")
|
||||
|
||||
# Check variables exist
|
||||
check_vars_exist("demo_sex", "sex")
|
||||
|
||||
# Check values
|
||||
sex_vals <- unique(df$demo_sex[!is.na(df$demo_sex) & df$demo_sex != ""])
|
||||
cat(paste(" Unique values in demo_sex:", paste(sex_vals, collapse = ", "), "\n"))
|
||||
|
||||
# Recode: male = 0, female = 1, else = 2
|
||||
df$sex <- ifelse(
|
||||
is.na(df$demo_sex) | df$demo_sex == "",
|
||||
NA,
|
||||
ifelse(
|
||||
tolower(df$demo_sex) == "male",
|
||||
0,
|
||||
ifelse(
|
||||
tolower(df$demo_sex) == "female",
|
||||
1,
|
||||
2
|
||||
)
|
||||
)
|
||||
)
|
||||
|
||||
# Verification check
|
||||
cat("\nVerification (5 random rows):\n")
|
||||
set.seed(1819)
|
||||
sample_rows <- sample(1:nrow(df), min(5, nrow(df)))
|
||||
for (i in sample_rows) {
|
||||
source_val <- df$demo_sex[i]
|
||||
target_val <- df$sex[i]
|
||||
cat(sprintf(" Row %d: demo_sex = %s -> sex = %s\n",
|
||||
i, ifelse(is.na(source_val) || source_val == "", "NA/empty", source_val),
|
||||
ifelse(is.na(target_val), "NA", target_val)))
|
||||
}
|
||||
|
||||
# =============================================================================
|
||||
# 10. RECODE demo_edu TO education
|
||||
# =============================================================================
|
||||
|
||||
cat("\n=== 9. RECODING demo_edu TO education ===\n\n")
|
||||
|
||||
# Check variables exist
|
||||
check_vars_exist("demo_edu", "education")
|
||||
|
||||
# Check values
|
||||
edu_vals <- unique(df$demo_edu[!is.na(df$demo_edu) & df$demo_edu != ""])
|
||||
cat(paste(" Unique values in demo_edu:", paste(edu_vals, collapse = ", "), "\n"))
|
||||
|
||||
# Recode
|
||||
df$education <- ifelse(
|
||||
is.na(df$demo_edu) | df$demo_edu == "",
|
||||
NA,
|
||||
ifelse(
|
||||
df$demo_edu %in% c("High School (or equivalent)", "Trade School"),
|
||||
"HS_TS",
|
||||
ifelse(
|
||||
df$demo_edu %in% c("College Diploma/Certificate", "University - Undergraduate"),
|
||||
"C_Ug",
|
||||
ifelse(
|
||||
df$demo_edu %in% c("University - Graduate (Masters)", "University - PhD", "Professional Degree (ex. JD/MD)"),
|
||||
"grad_prof",
|
||||
NA
|
||||
)
|
||||
)
|
||||
)
|
||||
)
|
||||
|
||||
# Convert to ordered factor
|
||||
df$education <- factor(df$education,
|
||||
levels = c("HS_TS", "C_Ug", "grad_prof"),
|
||||
ordered = TRUE)
|
||||
|
||||
# Verification check
|
||||
cat("\nVerification (5 random rows):\n")
|
||||
set.seed(2021)
|
||||
sample_rows <- sample(1:nrow(df), min(5, nrow(df)))
|
||||
for (i in sample_rows) {
|
||||
source_val <- df$demo_edu[i]
|
||||
target_val <- df$education[i]
|
||||
cat(sprintf(" Row %d: demo_edu = %s -> education = %s\n",
|
||||
i, ifelse(is.na(source_val) || source_val == "", "NA/empty", source_val),
|
||||
ifelse(is.na(target_val), "NA", as.character(target_val))))
|
||||
}
|
||||
|
||||
# =============================================================================
|
||||
# 11. SAVE UPDATED DATA
|
||||
# =============================================================================
|
||||
# COMMENTED OUT: Uncomment when ready to save
|
||||
|
||||
# write.csv(df, "eohi3.csv", row.names = FALSE, na = "")
|
||||
# cat("\nUpdated data saved to: eohi3.csv\n")
|
||||
# cat(paste("Total rows:", nrow(df), "\n"))
|
||||
# cat(paste("Total columns:", ncol(df), "\n"))
|
||||
|
||||
523
eohi3/eohi3.csv
Normal file
523
eohi3/eohi3.csv
Normal file
File diff suppressed because one or more lines are too long
501
eohi3/knit/DA01_anova_DS.Rmd
Normal file
501
eohi3/knit/DA01_anova_DS.Rmd
Normal file
@ -0,0 +1,501 @@
|
||||
---
|
||||
title: "Mixed ANOVA - Domain Specific Means (DA01)"
|
||||
author: ""
|
||||
date: "`r Sys.Date()`"
|
||||
output:
|
||||
html_document:
|
||||
toc: true
|
||||
toc_float: true
|
||||
code_folding: hide
|
||||
---
|
||||
|
||||
```{r setup, include = FALSE}
|
||||
knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = TRUE)
|
||||
```
|
||||
|
||||
# Setup
|
||||
|
||||
```{r libraries}
|
||||
library(tidyverse)
|
||||
library(rstatix)
|
||||
library(emmeans)
|
||||
library(effectsize)
|
||||
library(afex)
|
||||
library(car)
|
||||
|
||||
options(scipen = 999)
|
||||
afex::set_sum_contrasts()
|
||||
```
|
||||
|
||||
# Data
|
||||
|
||||
```{r read-data}
# Data file lives in the parent of knit/ (eohi3/eohi3.csv). Prefer the
# relative path so the document knits on any machine; fall back to the
# author's original absolute path if the relative one is not found.
data_path <- "../eohi3.csv"
if (!file.exists(data_path)) {
  data_path <- "/home/ladmin/Documents/DND/EOHI/eohi3/eohi3.csv"
}
df <- read.csv(
  data_path,
  stringsAsFactors = FALSE,
  check.names = FALSE,
  na.strings = "NA"
)

between_vars <- c("perspective", "temporalDO")
within_vars <- c(
  "past_pref_MEAN", "past_pers_MEAN", "past_val_MEAN",
  "fut_pref_MEAN", "fut_pers_MEAN", "fut_val_MEAN"
)

# Fail fast with a clear message if the CSV lacks any required column.
missing_vars <- setdiff(c(between_vars, within_vars, "pID"), names(df))
if (length(missing_vars) > 0) {
  stop(paste("Missing required variables:", paste(missing_vars, collapse = ", ")))
}

# Keep only rows with non-missing, non-empty between-subjects factors.
anova_data <- df %>%
  select(pID, all_of(between_vars), all_of(within_vars)) %>%
  filter(
    !is.na(perspective), perspective != "",
    !is.na(temporalDO), temporalDO != ""
  )
```
|
||||
|
||||
# Long format
|
||||
|
||||
```{r long-format}
# Reshape the six MEAN columns into one row per (participant, TIME, DOMAIN)
# cell, deriving the two within-subjects factors from the column names.
long_data <- anova_data %>%
  pivot_longer(
    cols = all_of(within_vars),
    names_to = "variable",
    values_to = "MEAN_SCORE"
  ) %>%
  mutate(
    TIME = factor(
      case_when(
        grepl("^past_", variable) ~ "past",
        grepl("^fut_", variable) ~ "fut",
        TRUE ~ NA_character_
      ),
      levels = c("past", "fut")
    ),
    DOMAIN = factor(
      case_when(
        grepl("_pref_MEAN$", variable) ~ "pref",
        grepl("_pers_MEAN$", variable) ~ "pers",
        grepl("_val_MEAN$", variable) ~ "val",
        TRUE ~ NA_character_
      ),
      levels = c("pref", "pers", "val")
    ),
    perspective = factor(perspective),
    temporalDO = factor(temporalDO),
    pID = factor(pID)
  ) %>%
  select(pID, perspective, temporalDO, TIME, DOMAIN, MEAN_SCORE) %>%
  filter(!is.na(MEAN_SCORE))
```
|
||||
|
||||
# Descriptive statistics
|
||||
|
||||
```{r desc-stats}
# Cell-level descriptives for every between x within combination,
# rounded to 5 decimals for the report.
r5 <- function(x) round(x, 5)

desc_stats <- long_data %>%
  group_by(perspective, temporalDO, TIME, DOMAIN) %>%
  summarise(
    n = n(),
    mean = r5(mean(MEAN_SCORE)),
    variance = r5(var(MEAN_SCORE)),
    sd = r5(sd(MEAN_SCORE)),
    median = r5(median(MEAN_SCORE)),
    q1 = r5(quantile(MEAN_SCORE, 0.25)),
    q3 = r5(quantile(MEAN_SCORE, 0.75)),
    min = r5(min(MEAN_SCORE)),
    max = r5(max(MEAN_SCORE)),
    .groups = "drop"
  )

# Show all rows and columns (no truncation)
options(tibble.width = Inf)
print(desc_stats, n = Inf)
```
|
||||
|
||||
Interpretations:
|
||||
1. Mean and median values are similar, indicating distribution is relatively symmetric and any skew is minimal. Any outliers are not extreme.
|
||||
2. Highest to lowest group n size ratio is 1.14 (139/122). Acceptable ratio for ANOVA (under 1.5).
|
||||
3. Highest to lowest overall group variance ratio is 1.67 (7.93/4.74). Acceptable ratio for ANOVA (under 4).
|
||||
For the sake of consistency w/ the other EHI studies, I also calculated Hartley's F-max ratio.
|
||||
The conservative F-max critical value is 1.60, which is still higher than the highest observed F-max ratio of 1.53.
|
||||
|
||||
# Assumption checks
|
||||
|
||||
## Missing values
|
||||
|
||||
```{r missing}
# NOTE(review): long_data was built with filter(!is.na(MEAN_SCORE)), so
# counting NAs in long_data is always zero by construction. Assess
# missingness on the pre-filter wide data (anova_data) instead, per column.
missing_summary <- anova_data %>%
  summarise(across(all_of(within_vars), ~ sum(is.na(.)))) %>%
  pivot_longer(everything(), names_to = "variable", values_to = "n_missing") %>%
  mutate(
    n_total = nrow(anova_data),
    pct_missing = round(100 * n_missing / n_total, 2)
  )

print(missing_summary, n = Inf)
```
|
||||
|
||||
No missing values. As expected.
|
||||
|
||||
## Outliers
|
||||
|
||||
```{r outliers}
# Count per-cell outliers, defined as |z| > 3 within the design cell.
# (x - mean) / sd is the same standardisation scale() performs.
outlier_summary <- long_data %>%
  group_by(perspective, temporalDO, TIME, DOMAIN) %>%
  summarise(
    n = n(),
    n_outliers = sum(abs((MEAN_SCORE - mean(MEAN_SCORE)) / sd(MEAN_SCORE)) > 3),
    .groups = "drop"
  )

print(outlier_summary, n = Inf)
```
|
||||
|
||||
No outliers present. Good.
|
||||
|
||||
## Homogeneity of variance
|
||||
|
||||
```{r homogeneity}
# Levene's test for homogeneity of variance across the four between-subjects
# cells (perspective x temporalDO), run separately within each TIME x DOMAIN
# combination of the within-subjects design.
homogeneity_between <- long_data %>%
  group_by(TIME, DOMAIN) %>%
  rstatix::levene_test(MEAN_SCORE ~ perspective * temporalDO)

print(homogeneity_between, n = Inf)
```
|
||||
|
||||
Levene's test is significant for two cells: fut-pers and fut-val.
|
||||
However, variance ratios and F-max tests show that any heteroscedasticity is mild.
|
||||
|
||||
## Normality (within-subjects residuals)
|
||||
|
||||
```{r normality}
# Within-subjects residuals: each score centred on that participant's own
# mean, i.e. the deviations that the repeated-measures error term acts on.
resid_within <- long_data %>%
  group_by(pID) %>%
  mutate(person_mean = mean(MEAN_SCORE, na.rm = TRUE)) %>%
  ungroup() %>%
  mutate(resid = MEAN_SCORE - person_mean) %>%
  pull(resid)
resid_within <- resid_within[!is.na(resid_within)]

# shapiro.test() only accepts 3-5000 observations, so subsample
# (reproducibly, via set.seed) when there are more than 5000 residuals.
n_resid <- length(resid_within)
if (n_resid < 3L) {
  message("Too few within-subjects residuals (n < 3); skipping Shapiro-Wilk.")
} else {
  resid_for_shapiro <- if (n_resid > 5000L) {
    set.seed(1L)
    sample(resid_within, 5000L)
  } else {
    resid_within
  }
  print(shapiro.test(resid_for_shapiro))
}
```
|
||||
|
||||
### Q-Q plot
|
||||
|
||||
```{r qqplot, fig.height = 4}
# Visual normality check on the within-subjects residuals computed above.
qqnorm(resid_within)
qqline(resid_within)
```
|
||||
|
||||
Shapiro-Wilk is significant but is sensitive to large sample size.
|
||||
QQ plot shows that centre residuals are normally distributed, with some tail heaviness.
|
||||
ANOVA is robust to violations of normality w/ large sample size.
|
||||
|
||||
Overall, ANOVA can proceed.
|
||||
|
||||
# Mixed ANOVA
|
||||
|
||||
```{r anova}
# Type-3 mixed ANOVA: 2 (perspective) x 2 (temporalDO) between-subjects,
# 2 (TIME) x 3 (DOMAIN) within-subjects. Sum contrasts were set globally in
# the setup chunk, as Type-3 tests require.
aov_afex <- aov_ez(
  id = "pID",
  dv = "MEAN_SCORE",
  data = long_data,
  between = c("perspective", "temporalDO"),
  within = c("TIME", "DOMAIN"),
  type = 3
)

# Print both the uncorrected and Greenhouse-Geisser-corrected tables so the
# GG values can be used wherever sphericity is violated (see next section).
cat("\n--- ANOVA Table (Type 3, uncorrected) ---\n")
print(nice(aov_afex, correction = "none"))
cat("\n--- ANOVA Table (Type 3, Greenhouse–Geisser correction) ---\n")
print(nice(aov_afex, correction = "GG"))
```
|
||||
|
||||
Mauchly's test of sphericity is sig for DOMAIN main effect and interaction (except w/ TIME). Use GG correction for these:
|
||||
- 8 DOMAIN 1.94, 1004.66 1.21 0.63 <.001, p = .529
|
||||
## 9 perspective:DOMAIN 1.94, 1004.66 1.21 7.79 *** <.001, p <.001
|
||||
## 10 temporalDO:DOMAIN 1.94, 1004.66 1.21 0.76 <.001, p = .466
|
||||
## 11 perspective:temporalDO:DOMAIN 1.94, 1004.66 1.21 0.17 <.001, p = .837
|
||||
|
||||
|
||||
The following are significant main effects and interactions:
|
||||
## 15 perspective:temporalDO:TIME:DOMAIN 2, 1036 0.75 3.11 * <.001 .045
|
||||
## 13 perspective:TIME:DOMAIN 2, 1036 0.75 3.58 * <.001 .028
|
||||
## 9 perspective:DOMAIN 1.94, 1004.66 1.21 7.79 *** <.001, p <.001 (GG corrected)
|
||||
## 6 temporalDO:TIME 1, 518 1.86 9.81 ** <.001 .002
|
||||
## 7 perspective:temporalDO:TIME 1, 518 1.86 7.91 ** <.001 .005
|
||||
## 4 TIME 1, 518 1.86 10.05 ** <.001 .002
|
||||
|
||||
|
||||
# Mauchly and epsilon
|
||||
|
||||
```{r mauchly}
# Refit the design with car::Anova to obtain Mauchly's sphericity tests and
# the GG/HF epsilon estimates explicitly (afex applies corrections but this
# prints the underlying tests).
anova_wide <- anova_data %>%
  select(pID, perspective, temporalDO, all_of(within_vars)) %>%
  filter(if_all(all_of(within_vars), ~ !is.na(.)))
response_matrix <- as.matrix(anova_wide[, within_vars])
rm_model <- lm(response_matrix ~ perspective * temporalDO, data = anova_wide)
# idata rows must follow the column order of response_matrix:
# past_pref, past_pers, past_val, fut_pref, fut_pers, fut_val.
idata <- data.frame(
  TIME = factor(rep(c("past", "fut"), each = 3), levels = c("past", "fut")),
  DOMAIN = factor(rep(c("pref", "pers", "val"), 2), levels = c("pref", "pers", "val"))
)
rm_anova <- car::Anova(rm_model, idata = idata, idesign = ~ TIME * DOMAIN, type = 3)
rm_summary <- summary(rm_anova, multivariate = FALSE)
if (!is.null(rm_summary$sphericity.tests)) {
  cat("\nMauchly's Test of Sphericity:\n")
  print(rm_summary$sphericity.tests)
}
if (!is.null(rm_summary$epsilon)) {
  cat("\nEpsilon (GG, HF):\n")
  print(rm_summary$epsilon)
}
```
|
||||
|
||||
# Post-hoc comparisons
|
||||
|
||||
## TIME (main effect)
|
||||
|
||||
```{r posthoc-TIME}
# Marginal means for TIME collapsed over all other factors;
# Bonferroni-adjusted pairwise comparison (past vs fut).
emm_TIME <- emmeans(aov_afex, ~ TIME)
print(pairs(emm_TIME, adjust = "bonferroni"))
```
|
||||
|
||||
Pairwise comparisons provide support for the EHI effect.
|
||||
|
||||
## temporalDO:TIME
|
||||
|
||||
```{r posthoc-temporalDO-TIME}
# past vs fut contrast within each temporal display order.
emm_temporalDO_TIME <- emmeans(aov_afex, ~ TIME | temporalDO)
print(pairs(emm_temporalDO_TIME, adjust = "bonferroni"))
```
|
||||
|
||||
Contrast significant only for temporal display order of past first, then future.
|
||||
When grouped by time instead of temporalDO, no contrasts are significant.
|
||||
|
||||
## perspective:temporalDO:TIME
|
||||
|
||||
```{r posthoc-pt-TIME}
# past vs fut contrast within each perspective x temporalDO cell.
emm_pt_TIME <- emmeans(aov_afex, ~ TIME | perspective + temporalDO)
print(pairs(emm_pt_TIME, adjust = "bonferroni"))
```
|
||||
|
||||
EHI is significant only for self perspective and past first temporal display order.
|
||||
|
||||
When grouped by perspective or temporalDO instead of TIME, no contrasts are significant.
|
||||
|
||||
## perspective:DOMAIN
|
||||
|
||||
```{r posthoc-perspective-DOMAIN}
# Unpack the perspective:DOMAIN interaction from both directions:
# perspective contrast within each domain, then domain contrasts within
# each perspective.
emm_perspective_DOMAIN <- emmeans(aov_afex, ~ perspective | DOMAIN)
print(pairs(emm_perspective_DOMAIN, adjust = "bonferroni"))
emm_perspective_DOMAIN_domain <- emmeans(aov_afex, ~ DOMAIN | perspective)
print(pairs(emm_perspective_DOMAIN_domain, adjust = "bonferroni"))
```
|
||||
|
||||
significance is driven by the change from preferences to values in the "other" perspective.
|
||||
|
||||
## perspective:TIME:DOMAIN
|
||||
|
||||
```{r posthoc-pt-DOMAIN}
# past vs fut contrast within each perspective x DOMAIN cell.
emm_pt_TIME_domain <- emmeans(aov_afex, ~ TIME | perspective + DOMAIN)
print(pairs(emm_pt_TIME_domain, adjust = "bonferroni"))
```
|
||||
|
||||
EHI effects present for other-perspective in the preferences domain and for self-perspective in the values domain.
|
||||
Estimate is higher in the self-perspective than in the other-perspective.
|
||||
|
||||
```{r posthoc-pt-DOMAIN-domain}
# Domain contrasts within each perspective x TIME cell.
emm_pt_domain_domain <- emmeans(aov_afex, ~ DOMAIN | perspective + TIME)
print(pairs(emm_pt_domain_domain, adjust = "bonferroni"))
```
|
||||
|
||||
Significant contrasts are driven by domain changes from preferences to values in the self vs other perspectives, in the past-oriented questions.
|
||||
Trends reverse depending on perspective, where values have higher estimates than preferences in the self-perspective, but lower estimates than preferences in the other-perspective.
|
||||
|
||||
## perspective:temporalDO:TIME:DOMAIN
|
||||
|
||||
```{r posthoc-ptt-TIME}
# past vs fut contrast within every perspective x temporalDO x DOMAIN cell
# (unpacks the four-way interaction).
emm_ptt_TIME <- emmeans(aov_afex, ~ TIME | perspective + temporalDO + DOMAIN)
print(pairs(emm_ptt_TIME, adjust = "bonferroni"))
```
|
||||
EHI effects are present for three contrasts:
|
||||
- past - fut 0.2806 0.118 518 2.380 0.0177 (other-perspective, preferences domain, past-first temporal display order)
|
||||
- past - fut 0.4358 0.138 518 3.156 0.0017 (self-perspective, personality domain, past-first temporal display order)
|
||||
- past - fut 0.7276 0.141 518 5.169 <0.0001 (self-perspective, values domain, past-first temporal display order)
|
||||
|
||||
A reverse EHI effect is present for 1 contrast:
|
||||
- past - fut -0.2367 0.118 518 -2.001 0.0459 (self-perspective, preferences domain, future-first temporal display order)
|
||||
|
||||
```{r posthoc-ptt-perspective}
# other vs self contrast within every temporalDO x TIME x DOMAIN cell.
emm_ptt_perspective <- emmeans(aov_afex, ~ perspective | temporalDO + TIME + DOMAIN)
print(pairs(emm_ptt_perspective, adjust = "bonferroni"))
```
|
||||
1 significant contrast:
|
||||
- other - self -0.6972 0.314 518 -2.220 0.0268 (values domain, past-oriented questions, past-first temporal display order)
|
||||
|
||||
|
||||
not really of theoretical interest but speaks to the perspective:TIME:DOMAIN interaction.
|
||||
|
||||
no significant contrasts when grouped by temporalDO instead of TIME or perspective.
|
||||
|
||||
## Cohen's d (significant contrasts only)
|
||||
|
||||
```{r cohens-d-significant}
# Cohen's d for each contrast that reached significance above.
# Paired d for within-subjects contrasts; independent-groups d for the one
# between-subjects (perspective) contrast. The original built this table
# with repeated add_row() calls and several duplicate filter() subsets
# (e.g. d_other_pref / d_other_past were identical to d_other); this
# version deduplicates the subsets and uses one helper + tribble, producing
# the same rows in the same order.
d_data <- anova_data %>%
  mutate(
    past_mean = (past_pref_MEAN + past_pers_MEAN + past_val_MEAN) / 3,
    fut_mean = (fut_pref_MEAN + fut_pers_MEAN + fut_val_MEAN) / 3,
    pref_mean = (past_pref_MEAN + fut_pref_MEAN) / 2,
    pers_mean = (past_pers_MEAN + fut_pers_MEAN) / 2,
    val_mean = (past_val_MEAN + fut_val_MEAN) / 2
  )

# Helper: paired Cohen's d between two numeric vectors.
paired_d <- function(x, y) {
  suppressMessages(effectsize::cohens_d(x, y, paired = TRUE)$Cohens_d)
}

# Condition subsets (each defined once, reused below).
d_past_tdo <- d_data %>% filter(temporalDO == "past")
d_self_past <- d_data %>% filter(perspective == "self", temporalDO == "past")
d_self_fut <- d_data %>% filter(perspective == "self", temporalDO == "future")
d_other_past_tdo <- d_data %>% filter(perspective == "other", temporalDO == "past")
d_other <- d_data %>% filter(perspective == "other")
d_self <- d_data %>% filter(perspective == "self")

cohens_d_results <- tribble(
  ~contrast, ~condition, ~d,

  # TIME main effect
  "TIME (past - fut)", "overall",
    paired_d(d_data$past_mean, d_data$fut_mean),

  # temporalDO:TIME
  "TIME (past - fut)", "temporalDO = past",
    paired_d(d_past_tdo$past_mean, d_past_tdo$fut_mean),

  # perspective:temporalDO:TIME
  "TIME (past - fut)", "self, temporalDO = past",
    paired_d(d_self_past$past_mean, d_self_past$fut_mean),

  # perspective:DOMAIN
  "DOMAIN (pref - val)", "perspective = other",
    paired_d(d_other$pref_mean, d_other$val_mean),

  # perspective:TIME:DOMAIN - TIME contrasts
  "TIME (past - fut)", "other, pref domain",
    paired_d(d_other$past_pref_MEAN, d_other$fut_pref_MEAN),
  "TIME (past - fut)", "self, val domain",
    paired_d(d_self$past_val_MEAN, d_self$fut_val_MEAN),

  # perspective:TIME:DOMAIN - DOMAIN contrasts
  "DOMAIN (pref - val)", "other, past TIME",
    paired_d(d_other$past_pref_MEAN, d_other$past_val_MEAN),
  "DOMAIN (pref - pers)", "self, past TIME",
    paired_d(d_self$past_pref_MEAN, d_self$past_pers_MEAN),
  "DOMAIN (pref - val)", "self, past TIME",
    paired_d(d_self$past_pref_MEAN, d_self$past_val_MEAN),

  # 4-way TIME contrasts
  "TIME (past - fut) [reverse]", "self, future temporalDO, pref domain",
    paired_d(d_self_fut$past_pref_MEAN, d_self_fut$fut_pref_MEAN),
  "TIME (past - fut)", "other, past temporalDO, pref domain",
    paired_d(d_other_past_tdo$past_pref_MEAN, d_other_past_tdo$fut_pref_MEAN),
  "TIME (past - fut)", "self, past temporalDO, pers domain",
    paired_d(d_self_past$past_pers_MEAN, d_self_past$fut_pers_MEAN),
  "TIME (past - fut)", "self, past temporalDO, val domain",
    paired_d(d_self_past$past_val_MEAN, d_self_past$fut_val_MEAN),

  # 4-way perspective contrast (between-subjects, so unpaired d)
  "perspective (other - self)", "past temporalDO, past TIME, val domain",
    suppressMessages(effectsize::cohens_d(
      d_other_past_tdo$past_val_MEAN,
      d_self_past$past_val_MEAN,
      paired = FALSE
    )$Cohens_d)
)

cohens_d_results %>%
  mutate(d = round(d, 3)) %>%
  print(n = Inf)
```
|
||||
2535
eohi3/knit/DA01_anova_DS.html
Normal file
2535
eohi3/knit/DA01_anova_DS.html
Normal file
File diff suppressed because one or more lines are too long
2345
eohi3/knit/DA02_anova_DGEN.html
Normal file
2345
eohi3/knit/DA02_anova_DGEN.html
Normal file
File diff suppressed because one or more lines are too long
434
eohi3/knit/DA02_anova_DGEN.rmd
Normal file
434
eohi3/knit/DA02_anova_DGEN.rmd
Normal file
@ -0,0 +1,434 @@
|
||||
---
|
||||
title: "Mixed ANOVA - Domain General Vars"
|
||||
author: ""
|
||||
date: "`r Sys.Date()`"
|
||||
output:
|
||||
html_document:
|
||||
toc: true
|
||||
toc_float: true
|
||||
code_folding: hide
|
||||
---
|
||||
|
||||
```{r setup, include = FALSE}
|
||||
knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = TRUE)
|
||||
```
|
||||
|
||||
# Setup
|
||||
|
||||
```{r libraries}
|
||||
library(tidyverse)
|
||||
library(rstatix)
|
||||
library(emmeans)
|
||||
library(effectsize)
|
||||
library(afex)
|
||||
library(car)
|
||||
|
||||
options(scipen = 999)
|
||||
afex::set_sum_contrasts()
|
||||
```
|
||||
|
||||
# Data
|
||||
|
||||
```{r read-data}
# Data file lives in the parent of knit/ (eohi3/eohi3.csv). Prefer the
# relative path so the document knits on any machine; fall back to the
# author's original absolute path if the relative one is not found.
data_path <- "../eohi3.csv"
if (!file.exists(data_path)) {
  data_path <- "/home/ladmin/Documents/DND/EOHI/eohi3/eohi3.csv"
}
df <- read.csv(
  data_path,
  stringsAsFactors = FALSE,
  check.names = FALSE,
  na.strings = "NA"
)

between_vars <- c("perspective", "temporalDO")
within_vars <- c(
  "past_pref_DGEN", "past_pers_DGEN", "past_val_DGEN",
  "fut_pref_DGEN", "fut_pers_DGEN", "fut_val_DGEN"
)

# Fail fast with a clear message if the CSV lacks any required column.
missing_vars <- setdiff(c(between_vars, within_vars, "pID"), names(df))
if (length(missing_vars) > 0) {
  stop(paste("Missing required variables:", paste(missing_vars, collapse = ", ")))
}

# Keep only rows with non-missing, non-empty between-subjects factors.
anova_data <- df %>%
  select(pID, all_of(between_vars), all_of(within_vars)) %>%
  filter(
    !is.na(perspective), perspective != "",
    !is.na(temporalDO), temporalDO != ""
  )
```
|
||||
|
||||
# Long format
|
||||
|
||||
```{r long-format}
# Pivot the six DGEN columns to one row per (participant, TIME, DOMAIN)
# cell, parsing the two within-subjects factors out of the column names.
long_data <- anova_data %>%
  pivot_longer(
    cols = all_of(within_vars),
    names_to = "variable",
    values_to = "DGEN_SCORE"
  ) %>%
  mutate(
    TIME = factor(
      case_when(
        grepl("^past_", variable) ~ "past",
        grepl("^fut_", variable) ~ "fut",
        TRUE ~ NA_character_
      ),
      levels = c("past", "fut")
    ),
    DOMAIN = factor(
      case_when(
        grepl("_pref_DGEN$", variable) ~ "pref",
        grepl("_pers_DGEN$", variable) ~ "pers",
        grepl("_val_DGEN$", variable) ~ "val",
        TRUE ~ NA_character_
      ),
      levels = c("pref", "pers", "val")
    ),
    perspective = factor(perspective),
    temporalDO = factor(temporalDO),
    pID = factor(pID)
  ) %>%
  select(pID, perspective, temporalDO, TIME, DOMAIN, DGEN_SCORE) %>%
  filter(!is.na(DGEN_SCORE))
```
|
||||
|
||||
# Descriptive statistics
|
||||
|
||||
```{r desc-stats}
# Cell-level descriptives for every between x within combination,
# rounded to 5 decimals for the report.
round5 <- function(v) round(v, 5)

desc_stats <- long_data %>%
  group_by(perspective, temporalDO, TIME, DOMAIN) %>%
  summarise(
    n = n(),
    mean = round5(mean(DGEN_SCORE)),
    variance = round5(var(DGEN_SCORE)),
    sd = round5(sd(DGEN_SCORE)),
    median = round5(median(DGEN_SCORE)),
    q1 = round5(quantile(DGEN_SCORE, 0.25)),
    q3 = round5(quantile(DGEN_SCORE, 0.75)),
    min = round5(min(DGEN_SCORE)),
    max = round5(max(DGEN_SCORE)),
    .groups = "drop"
  )

# Show all rows and columns (no truncation)
options(tibble.width = Inf)
print(desc_stats, n = Inf)
```
|
||||
|
||||
Interpretation:
|
||||
1. Mean and median values are similar w/ slightly more variation than in the domain specific anova.
|
||||
2. Highest to lowest group n size ratio is 1.14 (139/122). Acceptable ratio for ANOVA (under 1.5).
|
||||
3. Highest to lowest overall group variance ratio is 1.40 (9.32/6.65). Acceptable ratio for ANOVA (under 4).
|
||||
For the sake of consistency w/ the other EHI studies, I also calculated Hartley's F-max ratio.
|
||||
The conservative F-max critical value is 1.60 (same as DS anova since number of groups and n sizes doesn't change), which is still higher than the highest observed F-max ratio of 1.28.
|
||||
|
||||
# Assumption checks
|
||||
|
||||
## Missing values
|
||||
|
||||
```{r missing}
# NOTE(review): long_data was built with filter(!is.na(DGEN_SCORE)), so
# counting NAs in long_data is always zero by construction. Assess
# missingness on the pre-filter wide data (anova_data) instead, per column.
missing_summary <- anova_data %>%
  summarise(across(all_of(within_vars), ~ sum(is.na(.)))) %>%
  pivot_longer(everything(), names_to = "variable", values_to = "n_missing") %>%
  mutate(
    n_total = nrow(anova_data),
    pct_missing = round(100 * n_missing / n_total, 2)
  )

print(missing_summary, n = Inf)
```
|
||||
|
||||
No missing values. As expected.
|
||||
|
||||
## Outliers
|
||||
|
||||
```{r outliers}
# Count per-cell outliers, defined as |z| > 3 within the design cell.
# (x - mean) / sd is the same standardisation scale() performs.
outlier_summary <- long_data %>%
  group_by(perspective, temporalDO, TIME, DOMAIN) %>%
  summarise(
    n = n(),
    n_outliers = sum(abs((DGEN_SCORE - mean(DGEN_SCORE)) / sd(DGEN_SCORE)) > 3),
    .groups = "drop"
  )

print(outlier_summary, n = Inf)
```
|
||||
|
||||
No outliers present. Good. Same as DS anova.
|
||||
|
||||
## Homogeneity of variance
|
||||
|
||||
```{r homogeneity}
# Levene's test for homogeneity of variance across the four between-subjects
# cells (perspective x temporalDO), run separately within each TIME x DOMAIN
# combination of the within-subjects design.
homogeneity_between <- long_data %>%
  group_by(TIME, DOMAIN) %>%
  rstatix::levene_test(DGEN_SCORE ~ perspective * temporalDO)

print(homogeneity_between, n = Inf)
```
|
||||
|
||||
Levene's test is significant for 1 cell only: past-val. However, variance ratios and F-max tests show that any heteroscedasticity is mild.
|
||||
|
||||
## Normality (within-subjects residuals)
|
||||
|
||||
```{r normality}
# Within-subjects residuals: each score centred on that participant's own
# mean, i.e. the deviations that the repeated-measures error term acts on.
resid_within <- long_data %>%
  group_by(pID) %>%
  mutate(person_mean = mean(DGEN_SCORE, na.rm = TRUE)) %>%
  ungroup() %>%
  mutate(resid = DGEN_SCORE - person_mean) %>%
  pull(resid)
resid_within <- resid_within[!is.na(resid_within)]

# shapiro.test() only accepts 3-5000 observations, so subsample
# (reproducibly, via set.seed) when there are more than 5000 residuals.
n_resid <- length(resid_within)
if (n_resid < 3L) {
  message("Too few within-subjects residuals (n < 3); skipping Shapiro-Wilk.")
} else {
  resid_for_shapiro <- if (n_resid > 5000L) {
    set.seed(1L)
    sample(resid_within, 5000L)
  } else {
    resid_within
  }
  print(shapiro.test(resid_for_shapiro))
}
```
|
||||
|
||||
### Q-Q plot
|
||||
|
||||
```{r qqplot, fig.height = 4}
# Visual normality check on the within-subjects residuals computed above.
qqnorm(resid_within)
qqline(resid_within)
```
|
||||
|
||||
Shapiro-Wilk test is significant but is sensitive to large sample size.
|
||||
QQ plot shows that strict centre residuals are normally distributed, but there is some deviation from normality.
|
||||
ANOVA is robust to violations of normality w/ large sample size.
|
||||
|
||||
Overall, ANOVA can proceed.
|
||||
|
||||
# Mixed ANOVA
|
||||
|
||||
```{r anova}
# Type-3 mixed ANOVA on the domain-general scores. Mauchly's test (next
# section) is non-significant for this DV, so the table is reported
# without a sphericity correction.
aov_afex <- aov_ez(
  id = "pID",
  dv = "DGEN_SCORE",
  data = long_data,
  between = c("perspective", "temporalDO"),
  within = c("TIME", "DOMAIN"),
  type = 3,
  anova_table = list(correction = "none")
)

print(aov_afex)
```
|
||||
|
||||
Mauchly's test of sphericity is not significant. Using uncorrected values for interpretation and analysis.
|
||||
|
||||
|
||||
Significant main effects and interactions:
|
||||
Effect df MSE F ges p
|
||||
4 TIME 1, 518 3.11 8.39 ** .001 .004
|
||||
8 DOMAIN 2, 1036 2.13 7.85 *** .001 <.001
|
||||
10 temporalDO:DOMAIN 2, 1036 2.13 5.00 ** <.001 .007
|
||||
15 perspective:temporalDO:TIME:DOMAIN 2, 1036 1.52 3.12 * <.001 .045
|
||||
|
||||
|
||||
# Mauchly and epsilon
|
||||
|
||||
```{r mauchly}
# Refit the design with car::Anova to obtain Mauchly's sphericity tests and
# the GG/HF epsilon estimates explicitly.
anova_wide <- anova_data %>%
  select(pID, perspective, temporalDO, all_of(within_vars)) %>%
  filter(if_all(all_of(within_vars), ~ !is.na(.)))
response_matrix <- as.matrix(anova_wide[, within_vars])
rm_model <- lm(response_matrix ~ perspective * temporalDO, data = anova_wide)
# idata rows must follow the column order of response_matrix:
# past_pref, past_pers, past_val, fut_pref, fut_pers, fut_val.
idata <- data.frame(
  TIME = factor(rep(c("past", "fut"), each = 3), levels = c("past", "fut")),
  DOMAIN = factor(rep(c("pref", "pers", "val"), 2), levels = c("pref", "pers", "val"))
)
rm_anova <- car::Anova(rm_model, idata = idata, idesign = ~ TIME * DOMAIN, type = 3)
rm_summary <- summary(rm_anova, multivariate = FALSE)
if (!is.null(rm_summary$sphericity.tests)) {
  print(rm_summary$sphericity.tests)
}
if (!is.null(rm_summary$epsilon)) {
  print(rm_summary$epsilon)
}
```
|
||||
|
||||
# Post hoc comparisons
|
||||
|
||||
## TIME (main effect)
|
||||
|
||||
```{r posthoc-TIME}
# past vs fut marginal contrast (a single comparison, hence no adjustment).
emm_TIME <- emmeans(aov_afex, ~ TIME)
print(pairs(emm_TIME, adjust = "none"))
```
|
||||
|
||||
Supports presence of EHI effect.
|
||||
|
||||
## DOMAIN (main effect)
|
||||
|
||||
```{r posthoc-domain}
# Tukey-adjusted pairwise contrasts among the three domains.
emm_DOMAIN <- emmeans(aov_afex, ~ DOMAIN)
print(pairs(emm_DOMAIN, adjust = "tukey"))
```
|
||||
|
||||
Only preference to values contrast is significant.
|
||||
|
||||
## temporalDO:DOMAIN
|
||||
|
||||
```{r posthoc-temporaldo-domain}
# temporalDO contrast within each domain (2 levels, so no adjustment),
# then Tukey-adjusted domain contrasts within each temporalDO level.
emmeans(aov_afex, pairwise ~ temporalDO | DOMAIN, adjust = "none")$contrasts
emmeans(aov_afex, pairwise ~ DOMAIN | temporalDO, adjust = "tukey")$contrasts
```
|
||||
|
||||
When grouped by domain, no contrasts are significant.
|
||||
|
||||
|
||||
When grouped by temporalDO, some contrasts are significant:
|
||||
|
||||
Future-first temporal display order:
|
||||
contrast estimate SE df t.ratio p.value
|
||||
pref - pers 0.25065 0.0892 518 2.810 0.0142
|
||||
|
||||
|
||||
Past-first temporal display order:
|
||||
contrast estimate SE df t.ratio p.value
|
||||
pref - val 0.33129 0.0895 518 3.702 0.0007
|
||||
pers - val 0.32478 0.0921 518 3.527 0.0013
|
||||
|
||||
## perspective:temporalDO:TIME:DOMAIN
|
||||
|
||||
### contrasts for TIME grouped by perspective, temporalDO, and DOMAIN
|
||||
```{r posthoc-fourway}
# past vs fut contrast within every perspective x temporalDO x DOMAIN cell
# (unpacks the four-way interaction).
emm_fourway <- emmeans(aov_afex, pairwise ~ TIME | perspective * temporalDO * DOMAIN, adjust = "tukey")
print(emm_fourway$contrasts)
```
|
||||
|
||||
Significant contrasts:
|
||||
|
||||
contrast estimate SE df t.ratio p.value
|
||||
past - fut 0.5285 0.179 518 2.957 0.0032 (self-perspective, personality domain, past-first temporal display order)
|
||||
past - fut 0.5366 0.187 518 2.863 0.0044 (self-perspective, values domain, past-first temporal display order)
|
||||
|
||||
### contrasts for DOMAIN grouped by perspective, TIME, and temporalDO
|
||||
```{r posthoc-fourway2}
# Domain contrasts within every perspective x TIME x temporalDO cell.
emm_fourway2 <- emmeans(aov_afex, pairwise ~ DOMAIN | perspective * TIME * temporalDO, adjust = "tukey")
print(emm_fourway2$contrasts)
```
|
||||
|
||||
Significant contrasts:
|
||||
|
||||
contrast estimate SE df t.ratio p.value
|
||||
pref - val 0.6259 0.166 518 3.778 0.0005 (other-perspective, past-directed questions, past-first temporal display order)
|
||||
pers - val 0.4892 0.160 518 3.056 0.0066 (other-perspective, past-directed questions, past-first temporal display order)
|
||||
pref - val 0.4309 0.168 518 2.559 0.0290 (self-perspective, future-directed questions, past-first temporal display order)
|
||||
|
||||
## Cohen's d (significant contrasts only)
|
||||
|
||||
```{r cohens-d-significant}
# Paired (within-subject) Cohen's d for each contrast that reached
# significance in the ANOVA post-hocs above.
# First build per-participant marginal means for TIME and DOMAIN.
d_data <- anova_data %>%
  mutate(
    past_mean = (past_pref_DGEN + past_pers_DGEN + past_val_DGEN) / 3,
    fut_mean  = (fut_pref_DGEN + fut_pers_DGEN + fut_val_DGEN) / 3,
    pref_mean = (past_pref_DGEN + fut_pref_DGEN) / 2,
    pers_mean = (past_pers_DGEN + fut_pers_DGEN) / 2,
    val_mean  = (past_val_DGEN + fut_val_DGEN) / 2
  )

# Helper: paired Cohen's d between two within-subject columns,
# with effectsize's informational messages suppressed.
paired_d <- function(x, y) {
  suppressMessages(effectsize::cohens_d(x, y, paired = TRUE)$Cohens_d)
}

# Condition subsets used by the interaction contrasts below.
d_fut  <- d_data %>% filter(temporalDO == "future")
d_past <- d_data %>% filter(temporalDO == "past")
d_self_past <- d_data %>% filter(perspective == "self", temporalDO == "past")
d_other_past_tpast <- d_data %>% filter(perspective == "other", temporalDO == "past")
# NOTE(review): same filter as d_self_past; kept as a separate name so any
# later chunk referencing d_self_fut_tpast still works.
d_self_fut_tpast <- d_data %>% filter(perspective == "self", temporalDO == "past")

# One row per significant contrast (row order matches the original listing).
cohens_d_results <- tibble::tribble(
  ~contrast,              ~condition,                           ~d,
  "TIME (past - fut)",    "overall",                            paired_d(d_data$past_mean, d_data$fut_mean),
  "DOMAIN (pref - val)",  "overall",                            paired_d(d_data$pref_mean, d_data$val_mean),
  "DOMAIN (pref - pers)", "temporalDO = future",                paired_d(d_fut$pref_mean, d_fut$pers_mean),
  "DOMAIN (pref - val)",  "temporalDO = past",                  paired_d(d_past$pref_mean, d_past$val_mean),
  "DOMAIN (pers - val)",  "temporalDO = past",                  paired_d(d_past$pers_mean, d_past$val_mean),
  "TIME (past - fut)",    "self, past temporalDO, pers domain", paired_d(d_self_past$past_pers_DGEN, d_self_past$fut_pers_DGEN),
  "TIME (past - fut)",    "self, past temporalDO, val domain",  paired_d(d_self_past$past_val_DGEN, d_self_past$fut_val_DGEN),
  "DOMAIN (pref - val)",  "other, past TIME, past temporalDO",  paired_d(d_other_past_tpast$past_pref_DGEN, d_other_past_tpast$past_val_DGEN),
  "DOMAIN (pers - val)",  "other, past TIME, past temporalDO",  paired_d(d_other_past_tpast$past_pers_DGEN, d_other_past_tpast$past_val_DGEN),
  "DOMAIN (pref - val)",  "self, fut TIME, past temporalDO",    paired_d(d_self_fut_tpast$fut_pref_DGEN, d_self_fut_tpast$fut_val_DGEN)
)

cohens_d_results %>%
  mutate(d = round(d, 3)) %>%
  print(n = Inf)
```
|
||||
|
||||
| Size   | d   | Interpretation  |
|--------|-----|-----------------|
| Small  | 0.2 | Weak effect     |
| Medium | 0.5 | Moderate effect |
| Large  | 0.8 | Strong effect   |
|
||||
61
eohi3/test-DEC29/eohi3-test.csv
Normal file
61
eohi3/test-DEC29/eohi3-test.csv
Normal file
File diff suppressed because one or more lines are too long
BIN
lit review/Diener_lifeScale.pdf
Normal file
BIN
lit review/Diener_lifeScale.pdf
Normal file
Binary file not shown.
BIN
lit review/brietzke_ehi2.pdf
Normal file
BIN
lit review/brietzke_ehi2.pdf
Normal file
Binary file not shown.
BIN
lit review/carmen_ehi2.pdf
Normal file
BIN
lit review/carmen_ehi2.pdf
Normal file
Binary file not shown.
BIN
lit review/fleming_pro-retro.pdf
Normal file
BIN
lit review/fleming_pro-retro.pdf
Normal file
Binary file not shown.
BIN
lit review/guo_ehi2.pdf
Normal file
BIN
lit review/guo_ehi2.pdf
Normal file
Binary file not shown.
BIN
lit review/gutral_ehi2.pdf
Normal file
BIN
lit review/gutral_ehi2.pdf
Normal file
Binary file not shown.
BIN
lit review/haas_ehi1.pdf
Normal file
BIN
lit review/haas_ehi1.pdf
Normal file
Binary file not shown.
BIN
lit review/haddock_futResearch.pdf
Normal file
BIN
lit review/haddock_futResearch.pdf
Normal file
Binary file not shown.
BIN
lit review/harris_ehi1.pdf
Normal file
BIN
lit review/harris_ehi1.pdf
Normal file
Binary file not shown.
BIN
lit review/hershfield_FUTforecast.pdf
Normal file
BIN
lit review/hershfield_FUTforecast.pdf
Normal file
Binary file not shown.
BIN
lit review/lechner_valueScale.pdf
Normal file
BIN
lit review/lechner_valueScale.pdf
Normal file
Binary file not shown.
128
lit review/pdf_to_txt.py
Normal file
128
lit review/pdf_to_txt.py
Normal file
@ -0,0 +1,128 @@
|
||||
#!/home/ladmin/miniconda3/envs/nlp/bin/python
|
||||
"""
|
||||
PDF to Text Converter
|
||||
Converts PDF files to plain text files.
|
||||
|
||||
Usage:
|
||||
python pdf_to_txt.py <input.pdf> # Creates input.txt
|
||||
python pdf_to_txt.py <input.pdf> <output.txt> # Custom output name
|
||||
python pdf_to_txt.py --all # Convert all PDFs in current directory
|
||||
|
||||
Requirements:
|
||||
pip install pypdf
|
||||
"""
|
||||
|
||||
import sys
|
||||
import os
|
||||
from pathlib import Path
|
||||
|
||||
try:
|
||||
from pypdf import PdfReader
|
||||
except ImportError:
|
||||
print("Error: pypdf library not found.")
|
||||
print("Please install it with: pip install pypdf")
|
||||
sys.exit(1)
|
||||
|
||||
|
||||
def pdf_to_text(pdf_path, output_path=None):
    """
    Convert a PDF file to a text file.

    Args:
        pdf_path: Path to the PDF file
        output_path: Path to the output text file (optional; defaults to
            the PDF path with its suffix replaced by .txt)

    Returns:
        True if successful, False otherwise
    """
    # Normalize BEFORE the try block so the except handler below can always
    # rely on pdf_path being a Path. Previously the conversion happened
    # inside try: if Path(pdf_path) itself raised (e.g. pdf_path=None),
    # the handler's `pdf_path.name` failed with AttributeError on the raw
    # argument instead of reporting the real error.
    pdf_path = Path(pdf_path)

    try:
        if not pdf_path.exists():
            print(f"Error: File not found: {pdf_path}")
            return False

        # Default output: same name with a .txt extension.
        if output_path is None:
            output_path = pdf_path.with_suffix('.txt')
        else:
            output_path = Path(output_path)

        print(f"Converting: {pdf_path.name}")

        # Read the PDF (PdfReader comes from the module-level pypdf import).
        reader = PdfReader(str(pdf_path))

        # Extract text page by page; pages with no extractable text are
        # skipped but still counted in the page numbering.
        text_content = []
        for i, page in enumerate(reader.pages, 1):
            text = page.extract_text()
            if text:
                text_content.append(f"--- Page {i} ---\n{text}\n")

        # Write everything to the text file in one go.
        full_text = "\n".join(text_content)
        output_path.write_text(full_text, encoding='utf-8')

        print(f"✓ Created: {output_path.name} ({len(reader.pages)} pages, {len(full_text):,} characters)")
        return True

    except Exception as e:
        # Deliberately broad: this is best-effort batch behavior — report
        # the failure and let the caller count it (see convert_all_pdfs).
        print(f"✗ Error processing {pdf_path.name}: {e}")
        return False
|
||||
|
||||
|
||||
def convert_all_pdfs():
    """Convert all PDF files in the current directory to text files."""
    pdf_files = list(Path.cwd().glob("*.pdf"))

    # Nothing to do: report and bail out early.
    if not pdf_files:
        print("No PDF files found in the current directory.")
        return

    print(f"Found {len(pdf_files)} PDF file(s) to convert.\n")

    # pdf_to_text returns True on success, False on failure; summing the
    # booleans gives the success count directly.
    outcomes = [pdf_to_text(pdf_file) for pdf_file in pdf_files]
    successful = sum(outcomes)
    failed = len(outcomes) - successful

    print(f"\n{'='*60}")
    print(f"Conversion complete: {successful} successful, {failed} failed")
|
||||
|
||||
|
||||
def main():
    """Dispatch on the command line: --all, single PDF, or PDF + output name."""
    args = sys.argv[1:]

    # No arguments: show usage and exit with an error status.
    if not args:
        print(__doc__)
        sys.exit(1)

    if args[0] == "--all":
        # Convert every PDF in the current directory.
        convert_all_pdfs()
    elif len(args) == 1:
        # Single PDF, default output name (<input>.txt).
        pdf_to_text(args[0])
    elif len(args) == 2:
        # Single PDF with an explicit output path.
        pdf_to_text(args[0], args[1])
    else:
        print("Error: Too many arguments")
        print(__doc__)
        sys.exit(1)
|
||||
|
||||
|
||||
# Script entry point: run main() only when executed directly, not on import.
if __name__ == "__main__":
    main()
|
||||
|
||||
BIN
lit review/quoidbach.sm.pdf
Normal file
BIN
lit review/quoidbach.sm.pdf
Normal file
Binary file not shown.
BIN
lit review/quoidbach_ehi1.pdf
Normal file
BIN
lit review/quoidbach_ehi1.pdf
Normal file
Binary file not shown.
BIN
lit review/reiff_ehi2.pdf
Normal file
BIN
lit review/reiff_ehi2.pdf
Normal file
Binary file not shown.
BIN
lit review/rutt_ehi2.pdf
Normal file
BIN
lit review/rutt_ehi2.pdf
Normal file
Binary file not shown.
BIN
lit review/sachi_ehi1.pdf
Normal file
BIN
lit review/sachi_ehi1.pdf
Normal file
Binary file not shown.
BIN
lit review/siedlecka_pro-retro.pdf
Normal file
BIN
lit review/siedlecka_pro-retro.pdf
Normal file
Binary file not shown.
BIN
lit review/vanRyzin_ehi1.pdf
Normal file
BIN
lit review/vanRyzin_ehi1.pdf
Normal file
Binary file not shown.
BIN
lit review/verner_FUTforecast.pdf
Normal file
BIN
lit review/verner_FUTforecast.pdf
Normal file
Binary file not shown.
BIN
lit review/wilson_FUTforecast.pdf
Normal file
BIN
lit review/wilson_FUTforecast.pdf
Normal file
Binary file not shown.
BIN
lit review/yue_ehi2.pdf
Normal file
BIN
lit review/yue_ehi2.pdf
Normal file
Binary file not shown.
BIN
manuscript/EOHI - M&Methods - 2024APR28.docx
Normal file
BIN
manuscript/EOHI - M&Methods - 2024APR28.docx
Normal file
Binary file not shown.
Loading…
x
Reference in New Issue
Block a user