# Variable Creation Scripts Documentation

This document describes the data processing scripts used to create derived variables in the EOHI3 dataset. Each script performs specific transformations and should be run in sequence.

---

## datap 04 - combined vars.r

### Goal

Combine self-perspective and other-perspective variables into single columns. For each row, values exist in either the self-perspective variables OR the other-perspective variables, never both.


### Transformations

#### Past Variables (p5 = past)

Combines `self[VAL/PERS/PREF]_p5_[string]` and `other[VAL/PERS/PREF]_p5_[string]` into `past_[val/pers/pref]_[string]`.

**Source Variables:**

- **Values (VAL)**: `selfVAL_p5_trad`, `otherVAL_p5_trad`, `selfVAL_p5_autonomy`, `otherVAL_p5_autonomy`, `selfVAL_p5_personal`, `otherVAL_p5_personal`, `selfVAL_p5_justice`, `otherVAL_p5_justice`, `selfVAL_p5_close`, `otherVAL_p5_close`, `selfVAL_p5_connect`, `otherVAL_p5_connect`, `selfVAL_p5_dgen`, `otherVAL_p5_dgen`
- **Personality (PERS)**: `selfPERS_p5_open`, `otherPESR_p5_open` (note: typo in source data), `selfPERS_p5_goal`, `otherPERS_p5_goal`, `selfPERS_p5_social`, `otherPERS_p5_social`, `selfPERS_p5_agree`, `otherPERS_p5_agree`, `selfPERS_p5_stress`, `otherPERS_p5_stress`, `selfPERS_p5_dgen`, `otherPERS_p5_dgen`
- **Preferences (PREF)**: `selfPREF_p5_hobbies`, `otherPREF_p5_hobbies`, `selfPREF_p5_music`, `otherPREF_p5_music`, `selfPREF_p5_dress`, `otherPREF_p5_dress`, `selfPREF_p5_exer`, `otherPREF_p5_exer`, `selfPREF_p5_food`, `otherPREF_p5_food`, `selfPREF_p5_friends`, `otherPREF_p5_friends`, `selfPREF_p5_dgen`, `otherPREF_p5_dgen`

**Target Variables:**

- `past_val_trad`, `past_val_autonomy`, `past_val_personal`, `past_val_justice`, `past_val_close`, `past_val_connect`, `past_val_DGEN`
- `past_pers_open`, `past_pers_goal`, `past_pers_social`, `past_pers_agree`, `past_pers_stress`, `past_pers_DGEN`
- `past_pref_hobbies`, `past_pref_music`, `past_pref_dress`, `past_pref_exer`, `past_pref_food`, `past_pref_friends`, `past_pref_DGEN`


#### Future Variables (f5 = future)

Combines `self[VAL/PERS/PREF]_f5_[string]` and `other[VAL/PERS/PREF]_f5_[string]` into `fut_[val/pers/pref]_[string]`.

**Source Variables:**

- **Values (VAL)**: `selfVAL_f5_trad`, `otherVAL_f5_trad`, `selfVAL_f5_autonomy`, `otherVAL_f5_autonomy`, `selfVAL_f5_personal`, `otherVAL_f5_personal`, `selfVAL_f5_justice`, `otherVAL_f5_justice`, `selfVAL_f5_close`, `otherVAL_f5_close`, `selfVAL_f5_connect`, `otherVAL_f5_connect`, `selfVAL_f5_dgen`, `otherVAL_f5_dgen`
- **Personality (PERS)**: `selfPERS_f5_open`, `otherPERS_f5_open`, `selfPERS_f5_goal`, `otherPERS_f5_goal`, `selfPERS_f5_social`, `otherPERS_f5_social`, `selfPERS_f5_agree`, `otherPERS_f5_agree`, `selfPERS_f5_stress`, `otherPERS_f5_stress`, `selfPERS_f5_dgen`, `otherPERS_f5_dgen`
- **Preferences (PREF)**: `selfPREF_f5_hobbies`, `otherPREF_f5_hobbies`, `selfPREF_f5_music`, `otherPREF_f5_music`, `selfPREF_f5_dress`, `otherPREF_f5_dress`, `selfPREF_f5_exer`, `otherPREF_f5_exer`, `selfPREF_f5_food`, `otherPREF_f5_food`, `selfPREF_f5_friends`, `otherPREF_f5_friends`, `selfPREF_f5_dgen`, `otherPREF_f5_dgen`

**Target Variables:**

- `fut_val_trad`, `fut_val_autonomy`, `fut_val_personal`, `fut_val_justice`, `fut_val_close`, `fut_val_connect`, `fut_val_DGEN`
- `fut_pers_open`, `fut_pers_goal`, `fut_pers_social`, `fut_pers_agree`, `fut_pers_stress`, `fut_pers_DGEN`
- `fut_pref_hobbies`, `fut_pref_music`, `fut_pref_dress`, `fut_pref_exer`, `fut_pref_food`, `fut_pref_friends`, `fut_pref_DGEN`


### Logic

- Uses self value if present (not empty/NA), otherwise uses other value
- If both are empty/NA, result is NA
- Assumes mutual exclusivity: each row has values in either self OR other, never both

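
The rules above can be sketched in base R as follows. This is a minimal illustration, not the script's actual code: the helper names `is_blank()` and `combine_self_other()` and the toy data are assumptions, and it treats both empty strings and `NA` as missing.

```r
# Hypothetical helper: TRUE where a value is NA or an empty string.
is_blank <- function(x) is.na(x) | trimws(as.character(x)) == ""

# Hypothetical helper: take the self value when present, otherwise the
# other value; rows where both are blank end up as NA.
combine_self_other <- function(self, other) {
  self  <- as.character(self)
  other <- as.character(other)
  out <- ifelse(!is_blank(self), self, other)
  out[is_blank(out)] <- NA_character_
  out
}

# Toy data, not taken from eohi3.csv
df <- data.frame(
  selfVAL_p5_trad  = c("3", "",  NA,  ""),
  otherVAL_p5_trad = c("",  "5", "2", ""),
  stringsAsFactors = FALSE
)

df$past_val_trad <- combine_self_other(df$selfVAL_p5_trad, df$otherVAL_p5_trad)
print(df)
```

The same call would be repeated for every self/other pair listed under Transformations, including the misspelled `otherPESR_p5_open` column.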

### Validation Checks

1. **Conflict Check**: Verifies no rows have values in both self and other for the same variable
2. **Coverage Check**: Verifies combined columns have expected number of non-empty values (self_count + other_count = combined_count)
3. **Sample Row Check**: Shows examples of how values were combined

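
A sketch of how the conflict and coverage checks might look for a single variable, assuming a value counts as present when it is neither `NA` nor an empty string. The helper `has_value()` and the toy data are illustrative only.

```r
# Hypothetical helper: TRUE where a value is present (not NA, not empty).
has_value <- function(x) !is.na(x) & trimws(as.character(x)) != ""

# Toy data, not taken from eohi3.csv
df <- data.frame(
  selfPREF_p5_music  = c("4", "",  NA, "2"),
  otherPREF_p5_music = c("",  "1", "", ""),
  past_pref_music    = c("4", "1", NA, "2"),
  stringsAsFactors = FALSE
)

self_has  <- has_value(df$selfPREF_p5_music)
other_has <- has_value(df$otherPREF_p5_music)

# Conflict check: no row may have both a self and an other value.
n_conflicts <- sum(self_has & other_has)

# Coverage check: self_count + other_count should equal combined_count.
coverage_ok <- sum(self_has) + sum(other_has) == sum(has_value(df$past_pref_music))

cat("conflicts:", n_conflicts, " coverage ok:", coverage_ok, "\n")
```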

### Output

- Updates existing target columns in `eohi3.csv`
- Creates backup `eohi3_2.csv` before processing


---

## datap 05 - ehi vars.r

### Goal

Calculate EHI (End of History Illusion) variables as the difference between past and future variables. Each EHI variable represents the change from past to future perspective.

### Transformations

**Calculation Formula:** `ehi_[pref/pers/val]_[string] = past_[pref/pers/val]_[string] - fut_[pref/pers/val]_[string]`


#### EHI Variables Created

**EHI Preferences:**

- `ehi_pref_hobbies` = `past_pref_hobbies` - `fut_pref_hobbies`
- `ehi_pref_music` = `past_pref_music` - `fut_pref_music`
- `ehi_pref_dress` = `past_pref_dress` - `fut_pref_dress`
- `ehi_pref_exer` = `past_pref_exer` - `fut_pref_exer`
- `ehi_pref_food` = `past_pref_food` - `fut_pref_food`
- `ehi_pref_friends` = `past_pref_friends` - `fut_pref_friends`
- `ehi_pref_DGEN` = `past_pref_DGEN` - `fut_pref_DGEN`

**EHI Personality:**

- `ehi_pers_open` = `past_pers_open` - `fut_pers_open`
- `ehi_pers_goal` = `past_pers_goal` - `fut_pers_goal`
- `ehi_pers_social` = `past_pers_social` - `fut_pers_social`
- `ehi_pers_agree` = `past_pers_agree` - `fut_pers_agree`
- `ehi_pers_stress` = `past_pers_stress` - `fut_pers_stress`
- `ehi_pers_DGEN` = `past_pers_DGEN` - `fut_pers_DGEN`

**EHI Values:**

- `ehi_val_trad` = `past_val_trad` - `fut_val_trad`
- `ehi_val_autonomy` = `past_val_autonomy` - `fut_val_autonomy`
- `ehi_val_personal` = `past_val_personal` - `fut_val_personal`
- `ehi_val_justice` = `past_val_justice` - `fut_val_justice`
- `ehi_val_close` = `past_val_close` - `fut_val_close`
- `ehi_val_connect` = `past_val_connect` - `fut_val_connect`
- `ehi_val_DGEN` = `past_val_DGEN` - `fut_val_DGEN`


### Logic

- Converts source variables to numeric (handling empty strings and NA)
- Calculates difference: past - future
- Result can be positive (past > future), negative (past < future), or zero (past = future)

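
As a rough illustration of these steps, the sketch below converts two past/future pairs to numeric and subtracts them. The helper `to_numeric()` and the toy values are assumptions; the real script may convert columns differently.

```r
# Hypothetical helper: turn empty strings into NA, then convert to numeric.
to_numeric <- function(x) {
  x <- trimws(as.character(x))
  x[!is.na(x) & x == ""] <- NA_character_
  as.numeric(x)
}

# Toy data, not taken from eohi3.csv
df <- data.frame(
  past_pref_music = c("5", "2", "3", ""),
  fut_pref_music  = c("3", "4", "3", "1"),
  past_pref_food  = c("1", "1", NA,  "4"),
  fut_pref_food   = c("1", "0", "2", "2"),
  stringsAsFactors = FALSE
)

# ehi = past - future, one target column per item
for (item in c("music", "food")) {
  past <- to_numeric(df[[paste0("past_pref_", item)]])
  fut  <- to_numeric(df[[paste0("fut_pref_", item)]])
  df[[paste0("ehi_pref_", item)]] <- past - fut
}

print(df[c("ehi_pref_music", "ehi_pref_food")])
```

A row with a missing past or future value yields `NA` for that EHI variable, since subtraction with `NA` returns `NA`.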

### Validation Checks

1. **Variable Existence**: Checks that all target variables exist before processing
2. **Source Variable Check**: Verifies source columns exist
3. **Random Row Validation**: Checks 5 random rows showing source values, target value, expected calculation, and match status

### Output

- Updates existing target columns in `eohi3.csv`
- Creates backup `eohi3_2.csv` before processing


---

## datap 06 - mean vars.r

### Goal

Calculate mean variables for various scales by averaging multiple related variables. Creates both domain-specific means and overall composite means.


### Transformations

#### Domain-Specific Means

**Past Preferences MEAN:**

- **Source Variables**: `past_pref_hobbies`, `past_pref_music`, `past_pref_dress`, `past_pref_exer`, `past_pref_food`, `past_pref_friends` (6 variables)
- **Target Variable**: `past_pref_MEAN`

**Future Preferences MEAN:**

- **Source Variables**: `fut_pref_hobbies`, `fut_pref_music`, `fut_pref_dress`, `fut_pref_exer`, `fut_pref_food`, `fut_pref_friends` (6 variables)
- **Target Variable**: `fut_pref_MEAN`

**Past Personality MEAN:**

- **Source Variables**: `past_pers_open`, `past_pers_goal`, `past_pers_social`, `past_pers_agree`, `past_pers_stress` (5 variables)
- **Target Variable**: `past_pers_MEAN`

**Future Personality MEAN:**

- **Source Variables**: `fut_pers_open`, `fut_pers_goal`, `fut_pers_social`, `fut_pers_agree`, `fut_pers_stress` (5 variables)
- **Target Variable**: `fut_pers_MEAN`

**Past Values MEAN:**

- **Source Variables**: `past_val_trad`, `past_val_autonomy`, `past_val_personal`, `past_val_justice`, `past_val_close`, `past_val_connect` (6 variables)
- **Target Variable**: `past_val_MEAN`

**Future Values MEAN:**

- **Source Variables**: `fut_val_trad`, `fut_val_autonomy`, `fut_val_personal`, `fut_val_justice`, `fut_val_close`, `fut_val_connect` (6 variables)
- **Target Variable**: `fut_val_MEAN`

**EHI Preferences MEAN:**

- **Source Variables**: `ehi_pref_hobbies`, `ehi_pref_music`, `ehi_pref_dress`, `ehi_pref_exer`, `ehi_pref_food`, `ehi_pref_friends` (6 variables)
- **Target Variable**: `ehi_pref_MEAN`

**EHI Personality MEAN:**

- **Source Variables**: `ehi_pers_open`, `ehi_pers_goal`, `ehi_pers_social`, `ehi_pers_agree`, `ehi_pers_stress` (5 variables)
- **Target Variable**: `ehi_pers_MEAN`

**EHI Values MEAN:**

- **Source Variables**: `ehi_val_trad`, `ehi_val_autonomy`, `ehi_val_personal`, `ehi_val_justice`, `ehi_val_close`, `ehi_val_connect` (6 variables)
- **Target Variable**: `ehi_val_MEAN`


#### Composite Means

**EHI Domain-Specific Mean:**

- **Source Variables**: `ehi_pref_MEAN`, `ehi_pers_MEAN`, `ehi_val_MEAN` (3 variables)
- **Target Variable**: `ehiDS_mean`

**EHI Domain-General Mean:**

- **Source Variables**: `ehi_pref_DGEN`, `ehi_pers_DGEN`, `ehi_val_DGEN` (3 variables)
- **Target Variable**: `ehiDGEN_mean`


### Logic

- Converts source variables to numeric (handling empty strings and NA)
- Calculates row means using `rowMeans()` with `na.rm = TRUE` (ignores NA values)
- Each mean represents the average of non-missing values for that row

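
The following sketch shows the row-mean step for `past_pers_MEAN` on toy data. The helper `to_numeric()` is an assumed stand-in for the script's own conversion. One behavior of `rowMeans()` worth knowing: with `na.rm = TRUE`, a row in which every source value is missing returns `NaN` rather than `NA`.

```r
# Hypothetical helper: turn empty strings into NA, then convert to numeric.
to_numeric <- function(x) {
  x <- trimws(as.character(x))
  x[!is.na(x) & x == ""] <- NA_character_
  as.numeric(x)
}

# Toy data, not taken from eohi3.csv
df <- data.frame(
  past_pers_open   = c("4", "2", ""),
  past_pers_goal   = c("5", "",  ""),
  past_pers_social = c("3", "4", ""),
  past_pers_agree  = c("4", NA,  NA),
  past_pers_stress = c("4", "",  ""),
  stringsAsFactors = FALSE
)

src <- c("past_pers_open", "past_pers_goal", "past_pers_social",
         "past_pers_agree", "past_pers_stress")

# Convert every source column, then average the non-missing values per row.
num <- as.data.frame(lapply(df[src], to_numeric))
df$past_pers_MEAN <- rowMeans(num, na.rm = TRUE)

print(df$past_pers_MEAN)  # the third row has no values, so its mean is NaN
```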

### Validation Checks

1. **Variable Existence**: Uses `setdiff()` to check source and target variables exist
2. **Random Row Validation**: Checks 5 random rows showing source variable names, source values, target value, expected mean calculation, and match status

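
One possible shape for these two checks, shown for `ehiDS_mean`. The toy data, the seed, and the numeric tolerance are assumptions made for the illustration; the script's real output format is not documented here.

```r
# Toy data, not taken from eohi3.csv
df <- data.frame(
  ehi_pref_MEAN = c(1, 0, -1, 2, 0.5, NA),
  ehi_pers_MEAN = c(0, 0,  1, 2, 1.5, 1),
  ehi_val_MEAN  = c(2, 0,  0, 2, 1,   3)
)
src <- c("ehi_pref_MEAN", "ehi_pers_MEAN", "ehi_val_MEAN")
df$ehiDS_mean <- rowMeans(df[src], na.rm = TRUE)

# Variable existence: setdiff() returns the names that are NOT in the data.
missing_cols <- setdiff(c(src, "ehiDS_mean"), names(df))
if (length(missing_cols) > 0) {
  stop("Missing columns: ", paste(missing_cols, collapse = ", "))
}

# Random row validation: recompute the mean for 5 sampled rows and compare.
set.seed(1)  # assumed seed, only to make the sketch reproducible
rows     <- sample(nrow(df), 5)
expected <- rowMeans(df[rows, src], na.rm = TRUE)
matches  <- abs(df$ehiDS_mean[rows] - expected) < 1e-9

print(data.frame(row = rows, target = df$ehiDS_mean[rows],
                 expected = expected, match = matches))
```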

### Output

- Updates existing target columns in `eohi3.csv`
- Creates backup `eohi3_2.csv` before processing


---

## datap 07 - scales and recodes.r

### Goal

Recode various variables and calculate scale scores. Includes recoding categorical variables, processing cognitive reflection test (CRT) items, calculating ICAR scores, and recoding demographic variables.


### Transformations

#### 1. Recode other_length2 → other_length

- **Source Variable**: `other_length2`
- **Target Variable**: `other_length`

**Recoding Rules:**

- Values 5-9 → "5-9"
- Values 10-14 → "10-14"
- Values 15-19 → "15-19"
- Value "20+" → "20+" (handled as special case)
- Empty strings → preserved as empty string (not NA)
- NA → NA

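
A minimal sketch of these binning rules, assuming `other_length2` is stored as text. The function name `recode_length()` is hypothetical, and mapping values outside the listed ranges to `NA` is an assumption, since the rules above do not cover them.

```r
# Hypothetical recode function for other_length2 -> other_length.
recode_length <- function(x) {
  x   <- as.character(x)
  num <- suppressWarnings(as.numeric(x))  # "20+" and "" become NA here
  out <- rep(NA_character_, length(x))    # assumption: unlisted values -> NA
  out[!is.na(num) & num >= 5  & num <= 9]  <- "5-9"
  out[!is.na(num) & num >= 10 & num <= 14] <- "10-14"
  out[!is.na(num) & num >= 15 & num <= 19] <- "15-19"
  out[!is.na(x) & x == "20+"] <- "20+"    # special case: already a label
  out[!is.na(x) & x == ""]    <- ""       # keep empty strings as-is
  out
}

other_length <- recode_length(c("5", "9", "12", "19", "20+", "", NA))
print(other_length)
```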

#### 2. Recode other_like2 → other_like

- **Source Variable**: `other_like2`
- **Target Variable**: `other_like`

**Recoding Rules:**

- "Dislike a great deal" → "-2"
- "Dislike somewhat" → "-1"
- "Neither like nor dislike" → "0"
- "Like somewhat" → "1"
- "Like a great deal" → "2"
- Empty strings → preserved as empty string (not NA)
- NA → NA

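
These rules map naturally onto a named lookup vector. The sketch below is one way to write it; `like_map` and `recode_like()` are illustrative names, and labels not listed above are assumed to become `NA`.

```r
# Lookup table built from the recoding rules; output stays character.
like_map <- c(
  "Dislike a great deal"     = "-2",
  "Dislike somewhat"         = "-1",
  "Neither like nor dislike" = "0",
  "Like somewhat"            = "1",
  "Like a great deal"        = "2"
)

# Hypothetical recode function for other_like2 -> other_like.
recode_like <- function(x) {
  x   <- as.character(x)
  out <- unname(like_map[x])         # unmatched labels, "" and NA give NA
  out[!is.na(x) & x == ""] <- ""     # restore empty strings
  out
}

other_like <- recode_like(c("Like somewhat", "Dislike a great deal", "",
                            NA, "Neither like nor dislike"))
print(other_like)
```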

#### 3. Calculate aot_total (Actively Open-Minded Thinking)

- **Source Variables**: `aot01`, `aot02`, `aot03`, `aot04_r`, `aot05_r`, `aot06_r`, `aot07_r`, `aot08`
- **Target Variable**: `aot_total`

**Calculation:**

1. Reverse code `aot04_r`, `aot05_r`, `aot06_r`, `aot07_r` by multiplying by -1
2. Calculate mean of all 8 variables: 4 original (`aot01`, `aot02`, `aot03`, `aot08`) + 4 reversed (`aot04_r`, `aot05_r`, `aot06_r`, `aot07_r`)

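
The two steps can be sketched as below. Multiplying by -1 only mirrors a scale that is coded symmetrically around zero, so the toy data assume responses from -2 to 2. How the script treats missing items is not stated; this sketch leaves `rowMeans()` at its default, so a row with any missing item would get `NA`.

```r
# Toy data on an assumed -2..2 response scale, not taken from eohi3.csv
df <- data.frame(
  aot01 = c(2, 1),    aot02 = c(1, -1),  aot03 = c(2, 0),
  aot04_r = c(-1, 2), aot05_r = c(-2, 2),
  aot06_r = c(0, 1),  aot07_r = c(-1, -1),
  aot08 = c(1, 0)
)

forward  <- c("aot01", "aot02", "aot03", "aot08")
reversed <- c("aot04_r", "aot05_r", "aot06_r", "aot07_r")

# Step 1: reverse code the _r items by multiplying by -1.
# Step 2: average the 4 forward items and the 4 reversed items.
items <- cbind(df[forward], -1 * df[reversed])
df$aot_total <- rowMeans(items)

print(df$aot_total)
```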

#### 4. Process CRT Questions → crt_correct and crt_int

- **Source Variables**: `crt01`, `crt02`, `crt03`
- **Target Variables**: `crt_correct`, `crt_int`

**CRT01:**

- "5 cents" → `crt_correct` = 1, `crt_int` = 0
- "10 cents" → `crt_correct` = 0, `crt_int` = 1
- Other values → `crt_correct` = 0, `crt_int` = 0

**CRT02:**

- "5 minutes" → `crt_correct` += 1, `crt_int` unchanged
- "100 minutes" → `crt_correct` unchanged, `crt_int` += 1
- Other values → both unchanged

**CRT03:**

- "47 days" → `crt_correct` += 1, `crt_int` unchanged
- "24 days" → `crt_correct` unchanged, `crt_int` += 1
- Other values → both unchanged

**Note**: `crt_correct` and `crt_int` are cumulative across all 3 questions (range: 0-3)

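
The cumulative scoring can be written as a loop over the three items. This is a sketch under the assumption that responses match the listed strings exactly (same case and spacing); `score_crt()` is a hypothetical name.

```r
# Hypothetical scoring function: counts correct and intuitive answers.
score_crt <- function(df) {
  correct   <- c(crt01 = "5 cents",  crt02 = "5 minutes",   crt03 = "47 days")
  intuitive <- c(crt01 = "10 cents", crt02 = "100 minutes", crt03 = "24 days")

  crt_correct <- integer(nrow(df))
  crt_int     <- integer(nrow(df))
  for (item in names(correct)) {
    resp <- as.character(df[[item]])
    # NA and any other response add nothing to either counter.
    crt_correct <- crt_correct + (!is.na(resp) & resp == correct[[item]])
    crt_int     <- crt_int     + (!is.na(resp) & resp == intuitive[[item]])
  }
  df$crt_correct <- crt_correct
  df$crt_int     <- crt_int
  df
}

# Toy data, not taken from eohi3.csv
df <- data.frame(
  crt01 = c("5 cents",   "10 cents",    "5 cents",     NA),
  crt02 = c("5 minutes", "100 minutes", "100 minutes", "other"),
  crt03 = c("47 days",   "24 days",     "",            "47 days"),
  stringsAsFactors = FALSE
)
out <- score_crt(df)
print(out[c("crt_correct", "crt_int")])
```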

#### 5. Calculate icar_verbal

- **Source Variables**: `verbal01`, `verbal02`, `verbal03`, `verbal04`, `verbal05`
- **Target Variable**: `icar_verbal`

**Correct Answers:**

- `verbal01` = "5"
- `verbal02` = "8"
- `verbal03` = "It's impossible to tell"
- `verbal04` = "47"
- `verbal05` = "Sunday"

**Calculation**: Proportion correct = (number of correct responses) / 5

#### 6. Calculate icar_matrix

- **Source Variables**: `matrix01`, `matrix02`, `matrix03`, `matrix04`, `matrix05`
- **Target Variable**: `icar_matrix`

**Correct Answers:**

- `matrix01` = "D"
- `matrix02` = "E"
- `matrix03` = "B"
- `matrix04` = "B"
- `matrix05` = "D"

**Calculation**: Proportion correct = (number of correct responses) / 5

#### 7. Calculate icar_total

- **Source Variables**: `verbal01`-`verbal05`, `matrix01`-`matrix05` (10 variables total)
- **Target Variable**: `icar_total`

**Calculation**: Proportion correct across all 10 items = (number of correct responses) / 10

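
The three ICAR scores share one pattern: compare each response with an answer key and divide the number of hits by the number of items. The sketch below assumes exact string matching and counts missing or empty responses as incorrect; `prop_correct()` and the toy responses are illustrative.

```r
# Answer keys taken from the lists above.
verbal_key <- c(verbal01 = "5", verbal02 = "8",
                verbal03 = "It's impossible to tell",
                verbal04 = "47", verbal05 = "Sunday")
matrix_key <- c(matrix01 = "D", matrix02 = "E", matrix03 = "B",
                matrix04 = "B", matrix05 = "D")

# Hypothetical helper: proportion of key items answered correctly per row.
prop_correct <- function(df, key) {
  hits <- sapply(names(key), function(item) {
    resp <- as.character(df[[item]])
    !is.na(resp) & resp == key[[item]]   # assumption: NA counts as incorrect
  })
  # matrix() keeps one row per participant even when df has a single row.
  rowSums(matrix(hits, nrow = nrow(df))) / length(key)
}

# Toy data, not taken from eohi3.csv
df <- data.frame(
  verbal01 = c("5", "5"), verbal02 = c("8", "7"),
  verbal03 = c("It's impossible to tell", "It's impossible to tell"),
  verbal04 = c("47", "47"), verbal05 = c("Sunday", ""),
  matrix01 = c("D", "D"), matrix02 = c("E", "A"), matrix03 = c("B", NA),
  matrix04 = c("B", "B"), matrix05 = c("D", "D"),
  stringsAsFactors = FALSE
)

df$icar_verbal <- prop_correct(df, verbal_key)
df$icar_matrix <- prop_correct(df, matrix_key)
df$icar_total  <- prop_correct(df, c(verbal_key, matrix_key))

print(df[c("icar_verbal", "icar_matrix", "icar_total")])
```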

#### 8. Recode demo_sex → sex

- **Source Variable**: `demo_sex`
- **Target Variable**: `sex`

**Recoding Rules:**

- "Male" (case-insensitive) → 0
- "Female" (case-insensitive) → 1
- Other values (e.g., "Prefer not to say") → 2
- Empty/NA → NA

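
A sketch of this recode, assuming surrounding whitespace should be ignored along with case. The function name `recode_sex()` is hypothetical.

```r
# Hypothetical recode function for demo_sex -> sex.
recode_sex <- function(x) {
  x     <- as.character(x)
  blank <- is.na(x) | trimws(x) == ""
  key   <- tolower(trimws(x))           # case-insensitive comparison

  out <- rep(2, length(x))              # default: any other response
  out[key %in% "male"]   <- 0
  out[key %in% "female"] <- 1
  out[blank]             <- NA          # empty or NA input stays missing
  out
}

sex <- recode_sex(c("Male", "FEMALE", "Prefer not to say", "", NA))
print(sex)
```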

#### 9. Recode demo_edu → education

- **Source Variable**: `demo_edu`
- **Target Variable**: `education` (ordered factor)

**Recoding Rules:**

- "High School (or equivalent)" or "Trade School" → "HS_TS"
- "College Diploma/Certificate" or "University - Undergraduate" → "C_Ug"
- "University - Graduate (Masters)" or "University - PhD" or "Professional Degree (ex. JD/MD)" → "grad_prof"
- Empty/NA → NA

**Factor Levels**: `HS_TS` < `C_Ug` < `grad_prof` (ordered)

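
The recode and the ordered factor can be sketched with a lookup vector and `factor(..., ordered = TRUE)`. `edu_map` and `recode_edu()` are illustrative names; labels that are not listed above are assumed to become `NA`.

```r
# Lookup table built from the recoding rules.
edu_map <- c(
  "High School (or equivalent)"     = "HS_TS",
  "Trade School"                    = "HS_TS",
  "College Diploma/Certificate"     = "C_Ug",
  "University - Undergraduate"      = "C_Ug",
  "University - Graduate (Masters)" = "grad_prof",
  "University - PhD"                = "grad_prof",
  "Professional Degree (ex. JD/MD)" = "grad_prof"
)

# Hypothetical recode function for demo_edu -> education.
recode_edu <- function(x) {
  codes <- unname(edu_map[as.character(x)])  # "", NA, unlisted labels -> NA
  factor(codes, levels = c("HS_TS", "C_Ug", "grad_prof"), ordered = TRUE)
}

education <- recode_edu(c("Trade School", "University - PhD", "",
                          "College Diploma/Certificate", NA))
print(education)
print(education[1] < education[4])  # ordered factors support comparisons
```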

### Validation Checks

Each transformation includes:

1. **Variable Existence Check**: Verifies source and target variables exist
2. **Value Check**: Verifies expected values exist in source variables (warns about unexpected values)
3. **Post-Processing Verification**: Checks 5 random rows showing source values, target values, and calculations

### Output

- Updates existing target columns in `eohi3.csv`
- Creates backup `eohi3_2.csv` before processing


---

## Script Execution Order

These scripts should be run in the following order:

1. **datap 04 - combined vars.r** - Combines self/other variables into past/future variables
2. **datap 05 - ehi vars.r** - Calculates EHI variables from past/future differences
3. **datap 06 - mean vars.r** - Calculates mean variables for scales
4. **datap 07 - scales and recodes.r** - Recodes variables and calculates scale scores

Each script creates a backup (`eohi3_2.csv`) before processing and includes validation checks to ensure transformations are performed correctly.

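
The backup step might look like the sketch below, which copies the data file before reading it. The helper `backup_then_read()` is hypothetical, and the demonstration runs in a temporary directory so that it does not touch the real `eohi3.csv`.

```r
# Script names in the documented run order.
scripts <- c(
  "datap 04 - combined vars.r",
  "datap 05 - ehi vars.r",
  "datap 06 - mean vars.r",
  "datap 07 - scales and recodes.r"
)

# Hypothetical helper: copy the data file to the backup path, then read it.
backup_then_read <- function(data_path, backup_path) {
  ok <- file.copy(data_path, backup_path, overwrite = TRUE)
  if (!ok) stop("Backup failed for: ", data_path)
  read.csv(data_path, stringsAsFactors = FALSE)
}

# Demonstration on a throwaway file in a temporary directory.
tmp <- tempfile("eohi3_demo_")
dir.create(tmp)
data_path   <- file.path(tmp, "eohi3.csv")
backup_path <- file.path(tmp, "eohi3_2.csv")
write.csv(data.frame(id = 1:2), data_path, row.names = FALSE)

dat <- backup_then_read(data_path, backup_path)
print(file.exists(backup_path))

# Running the real pipeline in order would then be (not executed here):
# for (s in scripts) source(s)
```

Since every script writes its backup to the same file name, the backup presumably holds only the state before the most recently run script.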