eohi/eohi2/README_Variable_Creation.txt
2025-12-23 15:47:09 -05:00

================================================================================
EOHI2 DATA PROCESSING PIPELINE - VARIABLE CREATION DOCUMENTATION
================================================================================
This README documents the complete data processing pipeline for eohi2.csv.
All processing scripts should be run in the order listed below.
Source File: eohi2.csv
Processing Scripts: dataP 01 through datap 16
================================================================================
SCRIPT 01: dataP 01 - recode and combine past & future vars.r
================================================================================
PURPOSE:
Combines responses from two survey versions (01 and 02) and recodes Likert
scale text responses to numeric values for past and future time periods.
VARIABLES CREATED: 60 total (15 items × 4 time periods)
SOURCE COLUMNS:
- Set A: 01past5PrefItem_1 through 01fut10ValItem_5 (60 columns)
- Set B: 02past5PrefItem_1 through 02fut10ValItem_5 (60 columns)
TARGET VARIABLES:
Past 5 Years (15 variables):
- past_5_pref_read, past_5_pref_music, past_5_pref_TV, past_5_pref_nap,
past_5_pref_travel
- past_5_pers_extravert, past_5_pers_critical, past_5_pers_dependable,
past_5_pers_anxious, past_5_pers_complex
- past_5_val_obey, past_5_val_trad, past_5_val_opinion,
past_5_val_performance, past_5_val_justice
Past 10 Years (15 variables):
- past_10_pref_read, past_10_pref_music, past_10_pref_TV, past_10_pref_nap,
past_10_pref_travel
- past_10_pers_extravert, past_10_pers_critical, past_10_pers_dependable,
past_10_pers_anxious, past_10_pers_complex
- past_10_val_obey, past_10_val_trad, past_10_val_opinion,
past_10_val_performance, past_10_val_justice
Future 5 Years (15 variables):
- fut_5_pref_read, fut_5_pref_music, fut_5_pref_TV, fut_5_pref_nap,
fut_5_pref_travel
- fut_5_pers_extravert, fut_5_pers_critical, fut_5_pers_dependable,
fut_5_pers_anxious, fut_5_pers_complex
- fut_5_val_obey, fut_5_val_trad, fut_5_val_opinion,
fut_5_val_performance, fut_5_val_justice
Future 10 Years (15 variables):
- fut_10_pref_read, fut_10_pref_music, fut_10_pref_TV, fut_10_pref_nap,
fut_10_pref_travel
- fut_10_pers_extravert, fut_10_pers_critical, fut_10_pers_dependable,
fut_10_pers_anxious, fut_10_pers_complex
- fut_10_val_obey, fut_10_val_trad, fut_10_val_opinion,
fut_10_val_performance, fut_10_val_justice
TRANSFORMATION LOGIC:
Step 1: Combine responses from Set A (01) and Set B (02)
- If Set A has a value, use Set A
- If Set A is empty, use Set B
Step 2: Recode text responses to numeric values:
"Strongly Disagree" → -3
"Disagree" → -2
"Somewhat Disagree" → -1
"Neither Agree nor Disagree" → 0
"Somewhat Agree" → 1
"Agree" → 2
"Strongly Agree" → 3
Empty/Missing → NA
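EXAMPLE SKETCH:
The two steps above could look roughly like this in R. This is an illustrative
sketch, not the project script: the toy data, the helper names `combine_sets`
and `recode_likert`, and the pairing of Item_1 with "read" are assumptions.

```r
# Toy responses for one item from the two survey versions (values are made up).
raw <- data.frame(
  `01past5PrefItem_1` = c("Agree", "", NA, "Strongly Disagree"),
  `02past5PrefItem_1` = c("", "Somewhat Agree", "", ""),
  check.names = FALSE,          # keep names that start with a digit
  stringsAsFactors = FALSE
)

# Text-to-number mapping taken from the table above.
likert_map <- c(
  "Strongly Disagree" = -3,
  "Disagree" = -2,
  "Somewhat Disagree" = -1,
  "Neither Agree nor Disagree" = 0,
  "Somewhat Agree" = 1,
  "Agree" = 2,
  "Strongly Agree" = 3
)

# Step 1 (hypothetical helper): use Set A unless it is empty or NA, else Set B.
combine_sets <- function(a, b) {
  a_empty <- is.na(a) | trimws(a) == ""
  ifelse(a_empty, b, a)
}

# Step 2 (hypothetical helper): look up the numeric code.
# Empty strings, NA, and unrecognized text all become NA.
recode_likert <- function(x) {
  unname(likert_map[trimws(x)])
}

combined <- combine_sets(raw[["01past5PrefItem_1"]], raw[["02past5PrefItem_1"]])
raw$past_5_pref_read <- recode_likert(combined)
```

Row 3 is empty in both sets, so it ends up as NA after recoding.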
ITEM DOMAINS:
- Preferences (pref): Reading, Music, TV, Nap, Travel
- Personality (pers): Extravert, Critical, Dependable, Anxious, Complex
- Values (val): Obey, Tradition, Opinion, Performance, Justice
================================================================================
SCRIPT 02: dataP 02 - recode present VARS.r
================================================================================
PURPOSE:
Recodes present-time Likert scale text responses to numeric values.
VARIABLES CREATED: 15 total
SOURCE COLUMNS:
- prePrefItem_1 through prePrefItem_5 (5 columns)
- prePersItem_1 through prePersItem_5 (5 columns)
- preValItem_1 through preValItem_5 (5 columns)
TARGET VARIABLES:
Present Time (15 variables):
- present_pref_read, present_pref_music, present_pref_tv, present_pref_nap,
present_pref_travel
- present_pers_extravert, present_pers_critical, present_pers_dependable,
present_pers_anxious, present_pers_complex
- present_val_obey, present_val_trad, present_val_opinion,
present_val_performance, present_val_justice
TRANSFORMATION LOGIC:
Recode text responses to numeric values:
"Strongly Disagree" → -3
"Disagree" → -2
"Somewhat Disagree" → -1
"Neither Agree nor Disagree" → 0
"Somewhat Agree" → 1
"Agree" → 2
"Strongly Agree" → 3
Empty/Missing → NA
SPECIAL NOTE:
Present time uses "present_pref_tv" (lowercase) while past/future use
"past_5_pref_TV" (uppercase). This is intentional and preserved from the
original data structure.
================================================================================
SCRIPT 03: dataP 03 - recode DGEN vars.r
================================================================================
PURPOSE:
Combines DGEN (domain-general) responses from two survey versions (01 and 02).
These are single-item measures for each domain/time combination.
NO RECODING - just copies numeric values as-is.
VARIABLES CREATED: 12 total (3 domains × 4 time periods)
SOURCE COLUMNS:
- Set A: 01past5PrefDGEN_1, 01past5PersDGEN_1, 01past5ValDGEN_1, etc.
- Set B: 02past5PrefDGEN_1, 02past5PersDGEN_1, 02past5ValDGEN_1, etc.
TARGET VARIABLES:
- DGEN_past_5_Pref, DGEN_past_5_Pers, DGEN_past_5_Val
- DGEN_past_10_Pref, DGEN_past_10_Pers, DGEN_past_10_Val
- DGEN_fut_5_Pref, DGEN_fut_5_Pers, DGEN_fut_5_Val
- DGEN_fut_10_Pref, DGEN_fut_10_Pers, DGEN_fut_10_Val
TRANSFORMATION LOGIC:
- If Set A (01) has a value, use Set A
- If Set A is empty, use Set B (02)
- NO RECODING: Values are copied directly as numeric
SPECIAL NOTES:
- Future columns in raw data use "_8" suffix for Pref/Pers items
- Future Val columns use "ValuesDGEN" spelling in Set A, "ValDGEN" in Set B
================================================================================
SCRIPT 04: dataP 04 - DGEN means.r
================================================================================
PURPOSE:
Calculates mean DGEN scores by averaging the three domain scores (Preferences,
Personality, Values) for each time period.
VARIABLES CREATED: 4 total (1 per time period)
SOURCE COLUMNS:
- DGEN_past_5_Pref, DGEN_past_5_Pers, DGEN_past_5_Val
- DGEN_past_10_Pref, DGEN_past_10_Pers, DGEN_past_10_Val
- DGEN_fut_5_Pref, DGEN_fut_5_Pers, DGEN_fut_5_Val
- DGEN_fut_10_Pref, DGEN_fut_10_Pers, DGEN_fut_10_Val
TARGET VARIABLES:
- DGEN_past_5_mean
- DGEN_past_10_mean
- DGEN_fut_5_mean
- DGEN_fut_10_mean
TRANSFORMATION LOGIC:
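Each mean = mean(Pref, Pers, Val), i.e. (Pref + Pers + Val) / 3 when all three
scores are present
- NA values are excluded from calculation (na.rm = TRUE), so the divisor is the
number of non-missing domain scores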
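EXAMPLE SKETCH:
A row-wise mean with NA exclusion can be sketched with base R `rowMeans()`.
Whether the script itself uses `rowMeans()` is an assumption, and the data
values are made up.

```r
# Toy DGEN scores for the Past 5 time period (values are made up).
dgen <- data.frame(
  DGEN_past_5_Pref = c(4, 2, NA, NA),
  DGEN_past_5_Pers = c(5, NA, 3, NA),
  DGEN_past_5_Val  = c(6, 4, NA, NA)
)

cols <- c("DGEN_past_5_Pref", "DGEN_past_5_Pers", "DGEN_past_5_Val")

# na.rm = TRUE averages over the non-missing scores in each row.
dgen$DGEN_past_5_mean <- rowMeans(dgen[, cols], na.rm = TRUE)
```

Rows 2 and 3 are averaged over two and one scores respectively. Row 4 has no
scores at all, and `rowMeans(..., na.rm = TRUE)` returns NaN (not NA) there.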
================================================================================
SCRIPT 05: dataP 05 - recode scales VARS.r
================================================================================
PURPOSE:
Processes two cognitive scales:
1. AOT (Actively Open-minded Thinking): 8-item scale with reverse coding
2. CRT (Cognitive Reflection Test): 3-item test with correct/intuitive scoring
VARIABLES CREATED: 3 total
SOURCE COLUMNS:
AOT Scale:
- aot_1, aot_2, aot_3, aot_4, aot_5, aot_6, aot_7, aot_8
CRT Test:
- crt_1, crt_2, crt_3
TARGET VARIABLES:
- aot_total (mean of 8 items with reverse coding)
- crt_correct (proportion of correct answers)
- crt_int (proportion of intuitive/incorrect answers)
TRANSFORMATION LOGIC:
AOT Scale (aot_total):
1. Items 4, 5, 6, 7 are reverse coded by multiplying by -1
2. Calculate mean of all 8 items (with reverse coding applied)
3. Original source values are NOT modified in the dataframe
4. NA values excluded from calculation (na.rm = TRUE)
CRT Correct (crt_correct):
Correct answers:
- crt_1: "5 cents"
- crt_2: "5 minutes"
- crt_3: "47 days"
Calculation: (Number of correct answers) / (Number of non-missing answers)
CRT Intuitive (crt_int):
Intuitive (common incorrect) answers:
- crt_1: "10 cents"
- crt_2: "100 minutes"
- crt_3: "24 days"
Calculation: (Number of intuitive answers) / (Number of non-missing answers)
SPECIAL NOTES:
- CRT scoring is case-insensitive and trims whitespace
- Both CRT scores are proportions (0.00 to 1.00)
- Empty/missing CRT responses are excluded from denominator
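EXAMPLE SKETCH:
The AOT and CRT scoring rules above can be sketched as follows. The toy
responses, the assumption that AOT items are coded on a scale centered at 0,
and the helper name `crt_proportion` are illustrative and not taken from the
script.

```r
# Toy data: two respondents (values are made up).
scales <- data.frame(
  aot_1 = c(2, -1), aot_2 = c(3, 0), aot_3 = c(1, 1), aot_4 = c(-2, 2),
  aot_5 = c(-3, NA), aot_6 = c(-1, 1), aot_7 = c(-2, 0), aot_8 = c(2, -1),
  crt_1 = c(" 5 Cents", "10 cents"),
  crt_2 = c("100 minutes", ""),
  crt_3 = c("47 days", "24 DAYS"),
  stringsAsFactors = FALSE
)

# AOT: reverse-code items 4-7 on a copy so the source columns stay untouched.
aot_items <- scales[, paste0("aot_", 1:8)]
reverse_cols <- paste0("aot_", 4:7)
aot_items[, reverse_cols] <- -1 * aot_items[, reverse_cols]
scales$aot_total <- rowMeans(aot_items, na.rm = TRUE)

# CRT: proportion of answers matching a key, among non-missing answers.
# Matching ignores case and surrounding whitespace.
crt_proportion <- function(df, key) {
  hits <- rep(0, nrow(df))
  answered <- rep(0, nrow(df))
  for (item in names(key)) {
    resp <- tolower(trimws(df[[item]]))
    has_answer <- !is.na(resp) & resp != ""
    hits <- hits + (has_answer & resp == key[[item]])
    answered <- answered + has_answer
  }
  hits / answered  # NaN when a respondent answered no CRT item
}

correct_key   <- c(crt_1 = "5 cents",  crt_2 = "5 minutes",   crt_3 = "47 days")
intuitive_key <- c(crt_1 = "10 cents", crt_2 = "100 minutes", crt_3 = "24 days")

scales$crt_correct <- crt_proportion(scales, correct_key)
scales$crt_int     <- crt_proportion(scales, intuitive_key)
```

The second respondent left crt_2 empty, so both CRT proportions for that row
use a denominator of 2.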
================================================================================
SCRIPT 06: dataP 06 - time interval differences.r
================================================================================
PURPOSE:
Calculates absolute differences between time intervals to measure perceived
change across time periods for all 15 items.
VARIABLES CREATED: 90 total (6 difference types × 15 items)
SOURCE COLUMNS:
- present_pref_read through present_val_justice (15 columns)
- past_5_pref_read through past_5_val_justice (15 columns)
- past_10_pref_read through past_10_val_justice (15 columns)
- fut_5_pref_read through fut_5_val_justice (15 columns)
- fut_10_pref_read through fut_10_val_justice (15 columns)
TARGET VARIABLES (by difference type):
NPast_5 (Present vs Past 5 years) - 15 variables:
Formula: |present - past_5|
- NPast_5_pref_read, NPast_5_pref_music, NPast_5_pref_TV, NPast_5_pref_nap,
NPast_5_pref_travel
- NPast_5_pers_extravert, NPast_5_pers_critical, NPast_5_pers_dependable,
NPast_5_pers_anxious, NPast_5_pers_complex
- NPast_5_val_obey, NPast_5_val_trad, NPast_5_val_opinion,
NPast_5_val_performance, NPast_5_val_justice
NPast_10 (Present vs Past 10 years) - 15 variables:
Formula: |present - past_10|
- NPast_10_pref_read, NPast_10_pref_music, NPast_10_pref_TV,
NPast_10_pref_nap, NPast_10_pref_travel
- NPast_10_pers_extravert, NPast_10_pers_critical, NPast_10_pers_dependable,
NPast_10_pers_anxious, NPast_10_pers_complex
- NPast_10_val_obey, NPast_10_val_trad, NPast_10_val_opinion,
NPast_10_val_performance, NPast_10_val_justice
NFut_5 (Present vs Future 5 years) - 15 variables:
Formula: |present - fut_5|
- NFut_5_pref_read, NFut_5_pref_music, NFut_5_pref_TV, NFut_5_pref_nap,
NFut_5_pref_travel
- NFut_5_pers_extravert, NFut_5_pers_critical, NFut_5_pers_dependable,
NFut_5_pers_anxious, NFut_5_pers_complex
- NFut_5_val_obey, NFut_5_val_trad, NFut_5_val_opinion,
NFut_5_val_performance, NFut_5_val_justice
NFut_10 (Present vs Future 10 years) - 15 variables:
Formula: |present - fut_10|
- NFut_10_pref_read, NFut_10_pref_music, NFut_10_pref_TV, NFut_10_pref_nap,
NFut_10_pref_travel
- NFut_10_pers_extravert, NFut_10_pers_critical, NFut_10_pers_dependable,
NFut_10_pers_anxious, NFut_10_pers_complex
- NFut_10_val_obey, NFut_10_val_trad, NFut_10_val_opinion,
NFut_10_val_performance, NFut_10_val_justice
5.10past (Past 5 vs Past 10 years) - 15 variables:
Formula: |past_5 - past_10|
- 5.10past_pref_read, 5.10past_pref_music, 5.10past_pref_TV,
5.10past_pref_nap, 5.10past_pref_travel
- 5.10past_pers_extravert, 5.10past_pers_critical, 5.10past_pers_dependable,
5.10past_pers_anxious, 5.10past_pers_complex
- 5.10past_val_obey, 5.10past_val_trad, 5.10past_val_opinion,
5.10past_val_performance, 5.10past_val_justice
5.10fut (Future 5 vs Future 10 years) - 15 variables:
Formula: |fut_5 - fut_10|
- 5.10fut_pref_read, 5.10fut_pref_music, 5.10fut_pref_TV, 5.10fut_pref_nap,
5.10fut_pref_travel
- 5.10fut_pers_extravert, 5.10fut_pers_critical, 5.10fut_pers_dependable,
5.10fut_pers_anxious, 5.10fut_pers_complex
- 5.10fut_val_obey, 5.10fut_val_trad, 5.10fut_val_opinion,
5.10fut_val_performance, 5.10fut_val_justice
TRANSFORMATION LOGIC:
All calculations use absolute differences:
- NPast_5: |present_[item] - past_5_[item]|
- NPast_10: |present_[item] - past_10_[item]|
- NFut_5: |present_[item] - fut_5_[item]|
- NFut_10: |present_[item] - fut_10_[item]|
- 5.10past: |past_5_[item] - past_10_[item]|
- 5.10fut: |fut_5_[item] - fut_10_[item]|
Result: Always non-negative values (0 = no change) representing magnitude of change
Missing values in either source column result in NA
SPECIAL NOTES:
- Present time uses "pref_tv" (lowercase) while past/future use "pref_TV"
(uppercase), so the script handles this naming inconsistency
- All values are absolute differences (non-negative)
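EXAMPLE SKETCH:
One way to compute the absolute differences while bridging the "pref_tv" /
"pref_TV" spelling is shown below. The helper `present_name` and the toy
values are assumptions, and only two of the 15 items are included.

```r
# Toy data for two items (values are made up).
df <- data.frame(
  present_pref_read = c(2, -1, NA),
  present_pref_tv   = c(0, 3, 1),    # lowercase "tv" in present columns
  past_5_pref_read  = c(-1, -1, 2),
  past_5_pref_TV    = c(1, -2, 1)    # uppercase "TV" in past/future columns
)

items <- c("pref_read", "pref_TV")

# Hypothetical helper: translate an item name to its present-time spelling.
present_name <- function(item) sub("pref_TV", "pref_tv", item, fixed = TRUE)

for (item in items) {
  present_col <- paste0("present_", present_name(item))
  past_col    <- paste0("past_5_", item)
  # NA in either source column propagates to the result.
  df[[paste0("NPast_5_", item)]] <- abs(df[[present_col]] - df[[past_col]])
}
```

The same loop would apply to the other five difference types by swapping the
two source prefixes and the target prefix.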
================================================================================
SCRIPT 07: dataP 07 - domain means.r
================================================================================
PURPOSE:
Calculates domain-level means by averaging the 5 items within each domain
(Preferences, Personality, Values) for each of the 6 time interval difference
types.
VARIABLES CREATED: 18 total (6 time intervals × 3 domains)
SOURCE COLUMNS:
- NPast_5_pref_read through NPast_5_val_justice (15 columns)
- NPast_10_pref_read through NPast_10_val_justice (15 columns)
- NFut_5_pref_read through NFut_5_val_justice (15 columns)
- NFut_10_pref_read through NFut_10_val_justice (15 columns)
- 5.10past_pref_read through 5.10past_val_justice (15 columns)
- 5.10fut_pref_read through 5.10fut_val_justice (15 columns)
Total: 90 difference columns (created in Script 06)
TARGET VARIABLES:
NPast_5 Domain Means (3 variables):
- NPast_5_pref_MEAN (mean of 5 preference items)
- NPast_5_pers_MEAN (mean of 5 personality items)
- NPast_5_val_MEAN (mean of 5 values items)
NPast_10 Domain Means (3 variables):
- NPast_10_pref_MEAN
- NPast_10_pers_MEAN
- NPast_10_val_MEAN
NFut_5 Domain Means (3 variables):
- NFut_5_pref_MEAN
- NFut_5_pers_MEAN
- NFut_5_val_MEAN
NFut_10 Domain Means (3 variables):
- NFut_10_pref_MEAN
- NFut_10_pers_MEAN
- NFut_10_val_MEAN
5.10past Domain Means (3 variables):
- 5.10past_pref_MEAN
- 5.10past_pers_MEAN
- 5.10past_val_MEAN
5.10fut Domain Means (3 variables):
- 5.10fut_pref_MEAN
- 5.10fut_pers_MEAN
- 5.10fut_val_MEAN
TRANSFORMATION LOGIC:
Each domain mean = average of 5 items within that domain
Example for NPast_5_pref_MEAN:
= mean(NPast_5_pref_read, NPast_5_pref_music, NPast_5_pref_TV,
NPast_5_pref_nap, NPast_5_pref_travel)
Example for NFut_10_pers_MEAN:
= mean(NFut_10_pers_extravert, NFut_10_pers_critical,
NFut_10_pers_dependable, NFut_10_pers_anxious,
NFut_10_pers_complex)
NA values excluded from calculation (na.rm = TRUE)
PURPOSE OF DOMAIN MEANS:
- Provides higher-level summary of perceived change by domain
- Reduces item-level noise by aggregating across related items
- Enables domain-level comparisons across time intervals
- Parallel to Script 04 (DGEN means) but for difference scores instead of
raw DGEN ratings
SPECIAL NOTES:
- This script depends on Script 06 being run first
- Creates domain-level aggregates of absolute difference scores
- All means are averages of non-negative values (absolute differences)
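EXAMPLE SKETCH:
The domain means can be built by constructing column names from the interval
prefix, the domain, and the item labels listed in this README. The loop
structure and toy values below are an illustration, not the script's code.

```r
# Item labels per domain, as listed in this README.
domain_items <- list(
  pref = c("read", "music", "TV", "nap", "travel"),
  pers = c("extravert", "critical", "dependable", "anxious", "complex"),
  val  = c("obey", "trad", "opinion", "performance", "justice")
)
prefix <- "NPast_5"

# Toy absolute differences for two respondents (values are made up).
values <- rbind(
  c(0, 1, 2, 3, 4,   1, 1, 1, 1, 1,   0, 0, 3, NA, 3),
  c(2, 2, 2, 2, 2,   0, NA, 6, 0, 0,  1, 1, 1, 1, 1)
)
diffs <- as.data.frame(values)
names(diffs) <- unlist(lapply(names(domain_items), function(d) {
  paste(prefix, d, domain_items[[d]], sep = "_")
}))

# One mean per domain, averaging that domain's 5 items with NAs dropped.
for (d in names(domain_items)) {
  cols <- paste(prefix, d, domain_items[[d]], sep = "_")
  diffs[[paste(prefix, d, "MEAN", sep = "_")]] <- rowMeans(diffs[, cols], na.rm = TRUE)
}
```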
================================================================================
SCRIPT 08: dataP 08 - DGEN 510 vars.r
================================================================================
PURPOSE:
Calculates absolute differences between 5-year and 10-year DGEN ratings for
both Past and Future time directions. These variables measure the perceived
difference in domain-general change between the two time intervals.
VARIABLES CREATED: 6 total (3 domains × 2 time directions)
SOURCE COLUMNS:
- DGEN_past_5_Pref, DGEN_past_5_Pers, DGEN_past_5_Val
- DGEN_past_10_Pref, DGEN_past_10_Pers, DGEN_past_10_Val
- DGEN_fut_5_Pref, DGEN_fut_5_Pers, DGEN_fut_5_Val
- DGEN_fut_10_Pref, DGEN_fut_10_Pers, DGEN_fut_10_Val
Total: 12 DGEN columns (created in Script 03)
TARGET VARIABLES:
Past Direction (3 variables):
- X5_10DGEN_past_pref (|DGEN_past_5_Pref - DGEN_past_10_Pref|)
- X5_10DGEN_past_pers (|DGEN_past_5_Pers - DGEN_past_10_Pers|)
- X5_10DGEN_past_val (|DGEN_past_5_Val - DGEN_past_10_Val|)
Future Direction (3 variables):
- X5_10DGEN_fut_pref (|DGEN_fut_5_Pref - DGEN_fut_10_Pref|)
- X5_10DGEN_fut_pers (|DGEN_fut_5_Pers - DGEN_fut_10_Pers|)
- X5_10DGEN_fut_val (|DGEN_fut_5_Val - DGEN_fut_10_Val|)
TRANSFORMATION LOGIC:
Formula: |DGEN_5 - DGEN_10|
All calculations use absolute differences:
- Past Preferences: |DGEN_past_5_Pref - DGEN_past_10_Pref|
- Past Personality: |DGEN_past_5_Pers - DGEN_past_10_Pers|
- Past Values: |DGEN_past_5_Val - DGEN_past_10_Val|
- Future Preferences: |DGEN_fut_5_Pref - DGEN_fut_10_Pref|
- Future Personality: |DGEN_fut_5_Pers - DGEN_fut_10_Pers|
- Future Values: |DGEN_fut_5_Val - DGEN_fut_10_Val|
Result: Always non-negative values (0 = no difference) representing magnitude of difference
Missing values in either source column result in NA
SPECIAL NOTES:
- Variable names use "X" prefix because R's default name checking (for example
in read.csv) prepends it to column names starting with a digit (5_10 becomes
X5_10)
- This script depends on Script 03 being run first
- Measures interval effects within time direction (past vs future)
- Parallel to Script 06's 5.10past and 5.10fut variables but for DGEN scores
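EXAMPLE SKETCH:
The "X" prefix behavior can be reproduced with base R `make.names()` and
`read.csv()`. The inline CSV text is made up for illustration. Whether the
pipeline reads eohi2.csv with the default `check.names = TRUE` is an
assumption.

```r
# make.names() prepends "X" to names that start with a digit.
raw_names <- c("5_10DGEN_past_pref", "5.10past_pref_read", "NPast_5_pref_read")
fixed <- make.names(raw_names)
# fixed is "X5_10DGEN_past_pref" "X5.10past_pref_read" "NPast_5_pref_read"

# read.csv() applies the same repair to the header unless check.names = FALSE.
csv_text <- "5.10past_pref_read,NPast_5_pref_read\n1,2\n"
default_names <- names(read.csv(text = csv_text))
kept_names    <- names(read.csv(text = csv_text, check.names = FALSE))
```

With the default setting, a column written as 5.10past_pref_read comes back as
X5.10past_pref_read on the next read, which would account for the two
spellings of the 5.10 variables in this README.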
================================================================================
SCRIPT 09: dataP 09 - interval x direction means.r
================================================================================
PURPOSE:
Calculates comprehensive mean scores by averaging item-level differences
across intervals and directions. Creates both narrow-scope means (single
time interval) and broad-scope global means (combining multiple intervals).
VARIABLES CREATED: 11 total (6 narrow-scope + 5 global-scope)
SOURCE COLUMNS:
All 90 difference variables created in Script 06:
- NPast_5_[domain]_[item] (15 variables)
- NPast_10_[domain]_[item] (15 variables)
- NFut_5_[domain]_[item] (15 variables)
- NFut_10_[domain]_[item] (15 variables)
- X5.10past_[domain]_[item] (15 variables)
- X5.10fut_[domain]_[item] (15 variables)
TARGET VARIABLES:
Narrow-Scope Means (15 source items each):
- NPast_5_mean (mean across all 15 NPast_5 items)
- NPast_10_mean (mean across all 15 NPast_10 items)
- NFut_5_mean (mean across all 15 NFut_5 items)
- NFut_10_mean (mean across all 15 NFut_10 items)
- X5.10past_mean (mean across all 15 X5.10past items)
- X5.10fut_mean (mean across all 15 X5.10fut items)
Global-Scope Means (30 source items each):
- NPast_global_mean (NPast_5 + NPast_10: all past intervals)
- NFut_global_mean (NFut_5 + NFut_10: all future intervals)
- X5.10_global_mean (X5.10past + X5.10fut: all 5-vs-10 intervals)
- N5_global_mean (NPast_5 + NFut_5: all 5-year intervals)
- N10_global_mean (NPast_10 + NFut_10: all 10-year intervals)
TRANSFORMATION LOGIC:
Narrow-Scope Means (15 items each):
Each mean averages all 15 difference items within one time interval
Example for NPast_5_mean:
= mean(NPast_5_pref_read, NPast_5_pref_music, NPast_5_pref_TV,
NPast_5_pref_nap, NPast_5_pref_travel,
NPast_5_pers_extravert, NPast_5_pers_critical,
NPast_5_pers_dependable, NPast_5_pers_anxious,
NPast_5_pers_complex,
NPast_5_val_obey, NPast_5_val_trad, NPast_5_val_opinion,
NPast_5_val_performance, NPast_5_val_justice)
Global-Scope Means (30 items each):
Each mean averages 30 difference items across two related intervals
Example for NPast_global_mean:
= mean(all 15 NPast_5 items + all 15 NPast_10 items)
Represents overall perceived change from present to any past timepoint
Example for N5_global_mean:
= mean(all 15 NPast_5 items + all 15 NFut_5 items)
Represents overall perceived change at 5-year interval regardless of
direction
NA values excluded from calculation (na.rm = TRUE)
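EXAMPLE SKETCH:
As described above, a global mean pools the items of both intervals rather
than averaging the two narrow-scope means. The two can differ when NAs are
unevenly distributed. The sketch below uses two items per interval as a
shortened stand-in for the 15 real items, with made-up values.

```r
# Shortened stand-in: 2 items per interval instead of 15 (values are made up).
df <- data.frame(
  NPast_5_pref_read   = c(1, 2),
  NPast_5_pref_music  = c(3, NA),
  NPast_10_pref_read  = c(2, 4),
  NPast_10_pref_music = c(2, 6)
)
p5  <- c("NPast_5_pref_read", "NPast_5_pref_music")
p10 <- c("NPast_10_pref_read", "NPast_10_pref_music")

# Narrow-scope means: one interval each.
df$NPast_5_mean  <- rowMeans(df[, p5],  na.rm = TRUE)
df$NPast_10_mean <- rowMeans(df[, p10], na.rm = TRUE)

# Global-scope mean: pool the items of both intervals in one average.
df$NPast_global_mean <- rowMeans(df[, c(p5, p10)], na.rm = TRUE)

# For comparison only: averaging the two narrow-scope means instead.
mean_of_means <- (df$NPast_5_mean + df$NPast_10_mean) / 2
```

In row 2 the pooled mean is (2 + 4 + 6) / 3 = 4, while the mean of the two
narrow-scope means is (2 + 5) / 2 = 3.5.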
PURPOSE OF INTERVAL × DIRECTION MEANS:
- Narrow-scope means: Single-interval summaries across all domains and items
- Global-scope means: Cross-interval summaries for testing:
* Direction effects (past vs future)
* Interval effects (5-year vs 10-year)
* Combined temporal distance effects
- Enables comprehensive analysis of temporal self-perception patterns
- Reduces item-level and domain-level noise through broad aggregation
QUALITY ASSURANCE:
- Script includes automated QA checks for first 5 rows
- Manually recalculates each mean and verifies against stored values
- Prints TRUE/FALSE match status for each variable
- Ensures calculation accuracy before further analysis
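EXAMPLE SKETCH:
A recompute-and-compare check of this kind might look like the following. The
function name `qa_match`, the two-item stand-in for the 15 source columns, and
the toy values are assumptions for illustration.

```r
# Toy data: two source items standing in for the 15 (values are made up).
df <- data.frame(
  NPast_5_pref_read  = c(1, 2, 0, 3, NA, 4),
  NPast_5_pref_music = c(3, NA, 0, 1, 2, 4)
)
src <- names(df)
df$NPast_5_mean <- rowMeans(df[, src], na.rm = TRUE)

# Hypothetical QA helper: recompute the mean for the first n rows and
# report TRUE only if it matches the stored column within tolerance.
qa_match <- function(data, target, source_cols, n = 5) {
  rows <- seq_len(min(n, nrow(data)))
  recomputed <- rowMeans(data[rows, source_cols, drop = FALSE], na.rm = TRUE)
  isTRUE(all.equal(unname(recomputed), data[[target]][rows]))
}

ok <- qa_match(df, "NPast_5_mean", src)

# A deliberately corrupted copy should fail the check.
tampered <- df
tampered$NPast_5_mean[2] <- 99
bad <- qa_match(tampered, "NPast_5_mean", src)
```

Using `all.equal()` rather than `==` avoids false mismatches from
floating-point rounding.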
SPECIAL NOTES:
- This script depends on Script 06 being run first
- All means are averages of absolute difference scores (non-negative)
- Global means provide the broadest temporal self-perception summaries
- Naming convention: 30-item means end in "_global_mean"; 15-item means end in
plain "_mean"
================================================================================
SCRIPT 10: dataP 10 - DGEN mean vars.r
================================================================================
PURPOSE:
Calculates mean DGEN scores by averaging across different time combinations.
Creates means for Past, Future, and interval-based (5-year, 10-year) groupings.
VARIABLES CREATED: 6 total
SOURCE COLUMNS:
- DGEN_past_5_Pref, DGEN_past_5_Pers, DGEN_past_5_Val
- DGEN_past_10_Pref, DGEN_past_10_Pers, DGEN_past_10_Val
- DGEN_fut_5_Pref, DGEN_fut_5_Pers, DGEN_fut_5_Val
- DGEN_fut_10_Pref, DGEN_fut_10_Pers, DGEN_fut_10_Val
TARGET VARIABLES:
Direction-Based Means (2 variables):
- DGEN_past_mean (mean of past_5_mean and past_10_mean)
- DGEN_fut_mean (mean of fut_5_mean and fut_10_mean)
Interval-Based Means (2 variables):
- DGEN_5_mean (mean of past_5_mean and fut_5_mean)
- DGEN_10_mean (mean of past_10_mean and fut_10_mean)
Domain-Based Means (2 variables):
- DGEN_pref_mean (mean across all 4 time periods for Preferences)
- DGEN_pers_mean (mean across all 4 time periods for Personality)
TRANSFORMATION LOGIC:
Direction-based:
- DGEN_past_mean = mean(DGEN_past_5_mean, DGEN_past_10_mean)
- DGEN_fut_mean = mean(DGEN_fut_5_mean, DGEN_fut_10_mean)
Interval-based:
- DGEN_5_mean = mean(DGEN_past_5_mean, DGEN_fut_5_mean)
- DGEN_10_mean = mean(DGEN_past_10_mean, DGEN_fut_10_mean)
Domain-based:
- DGEN_pref_mean = mean across all 4 Pref scores
- DGEN_pers_mean = mean across all 4 Pers scores
NA values excluded from calculation (na.rm = TRUE)
================================================================================
SCRIPT 11: dataP 11 - CORRECT ehi vars.r
================================================================================
PURPOSE:
Creates Enduring Hedonic Impact (EHI) variables by calculating differences
between Past and Future responses for each item across different time intervals.
Formula: NPast - NFut (positive values indicate greater change in the past
direction than in the future direction)
VARIABLES CREATED: 45 total (15 items × 3 time intervals)
SOURCE COLUMNS:
5-year intervals:
- NPast_5_pref_read through NPast_5_val_justice (15 columns)
- NFut_5_pref_read through NFut_5_val_justice (15 columns)
10-year intervals:
- NPast_10_pref_read through NPast_10_val_justice (15 columns)
- NFut_10_pref_read through NFut_10_val_justice (15 columns)
5-10 year change:
- X5.10past_pref_read through X5.10past_val_justice (15 columns)
- X5.10fut_pref_read through X5.10fut_val_justice (15 columns)
TARGET VARIABLES:
5-Year EHI Variables (15 variables):
- ehi5_pref_read, ehi5_pref_music, ehi5_pref_TV, ehi5_pref_nap,
ehi5_pref_travel
- ehi5_pers_extravert, ehi5_pers_critical, ehi5_pers_dependable,
ehi5_pers_anxious, ehi5_pers_complex
- ehi5_val_obey, ehi5_val_trad, ehi5_val_opinion, ehi5_val_performance,
ehi5_val_justice
10-Year EHI Variables (15 variables):
- ehi10_pref_read, ehi10_pref_music, ehi10_pref_TV, ehi10_pref_nap,
ehi10_pref_travel
- ehi10_pers_extravert, ehi10_pers_critical, ehi10_pers_dependable,
ehi10_pers_anxious, ehi10_pers_complex
- ehi10_val_obey, ehi10_val_trad, ehi10_val_opinion, ehi10_val_performance,
ehi10_val_justice
5-10 Year Change EHI Variables (15 variables):
- ehi5.10_pref_read, ehi5.10_pref_music, ehi5.10_pref_TV, ehi5.10_pref_nap,
ehi5.10_pref_travel
- ehi5.10_pers_extravert, ehi5.10_pers_critical, ehi5.10_pers_dependable,
ehi5.10_pers_anxious, ehi5.10_pers_complex
- ehi5.10_val_obey, ehi5.10_val_trad, ehi5.10_val_opinion,
ehi5.10_val_performance, ehi5.10_val_justice
TRANSFORMATION LOGIC:
Formula: NPast - NFut
All calculations use signed differences:
- ehi5_[item] = NPast_5_[item] - NFut_5_[item]
- ehi10_[item] = NPast_10_[item] - NFut_10_[item]
- ehi5.10_[item] = X5.10past_[item] - X5.10fut_[item]
Result: Positive = greater past change, Negative = greater future change
Missing values in either source column result in NA
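EXAMPLE SKETCH:
The three signed differences can be sketched for a single item as below. The
lookup list `pairs` and the toy values are illustrative, not the script's code.

```r
# Toy difference scores for one item (values are made up).
df <- data.frame(
  NPast_5_pref_read   = c(3, 1, NA),
  NFut_5_pref_read    = c(1, 1, 2),
  NPast_10_pref_read  = c(4, 0, 2),
  NFut_10_pref_read   = c(2, 3, 2),
  X5.10past_pref_read = c(1, 1, 0),
  X5.10fut_pref_read  = c(1, 2, 0)
)

# Target prefix -> (past prefix, future prefix), following the formulas above.
pairs <- list(
  ehi5    = c("NPast_5",   "NFut_5"),
  ehi10   = c("NPast_10",  "NFut_10"),
  ehi5.10 = c("X5.10past", "X5.10fut")
)

item <- "pref_read"
for (target in names(pairs)) {
  past_col <- paste0(pairs[[target]][1], "_", item)
  fut_col  <- paste0(pairs[[target]][2], "_", item)
  # Signed difference: no abs(), so the direction of the asymmetry is kept.
  df[[paste0(target, "_", item)]] <- df[[past_col]] - df[[fut_col]]
}
```

Unlike the Script 06 variables, these values can be negative, and an NA in
either source column yields NA.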
QUALITY ASSURANCE:
- Comprehensive QA checks for all 45 variables across all rows
- First 5 rows displayed with detailed calculations showing source values,
computed differences, and stored values
- Pass/Fail status for each variable reported
================================================================================
SCRIPT 12: dataP 12 - CORRECT DGEN ehi vars.r
================================================================================
PURPOSE:
Creates domain-general EHI variables by calculating differences between Past
and Future DGEN responses. These are the domain-general parallel to Script 11's
domain-specific EHI variables.
VARIABLES CREATED: 6 total (3 domains × 2 time intervals)
SOURCE COLUMNS:
- DGEN_past_5_Pref, DGEN_past_5_Pers, DGEN_past_5_Val
- DGEN_past_10_Pref, DGEN_past_10_Pers, DGEN_past_10_Val
- DGEN_fut_5_Pref, DGEN_fut_5_Pers, DGEN_fut_5_Val
- DGEN_fut_10_Pref, DGEN_fut_10_Pers, DGEN_fut_10_Val
TARGET VARIABLES:
5-Year DGEN EHI (3 variables):
- ehiDGEN_5_Pref
- ehiDGEN_5_Pers
- ehiDGEN_5_Val
10-Year DGEN EHI (3 variables):
- ehiDGEN_10_Pref
- ehiDGEN_10_Pers
- ehiDGEN_10_Val
TRANSFORMATION LOGIC:
Formula: DGEN_past - DGEN_fut
All calculations use signed differences:
- ehiDGEN_5_Pref = DGEN_past_5_Pref - DGEN_fut_5_Pref
- ehiDGEN_5_Pers = DGEN_past_5_Pers - DGEN_fut_5_Pers
- ehiDGEN_5_Val = DGEN_past_5_Val - DGEN_fut_5_Val
- ehiDGEN_10_Pref = DGEN_past_10_Pref - DGEN_fut_10_Pref
- ehiDGEN_10_Pers = DGEN_past_10_Pers - DGEN_fut_10_Pers
- ehiDGEN_10_Val = DGEN_past_10_Val - DGEN_fut_10_Val
Result: Positive = greater past change, Negative = greater future change
QUALITY ASSURANCE:
- QA checks for all 6 variables across all rows
- First 5 rows displayed with detailed calculations
- Pass/Fail status for each variable reported
================================================================================
SCRIPT 13: datap 13 - ehi domain specific means.r
================================================================================
PURPOSE:
Calculates domain-level mean EHI scores by averaging the 5 items within each
domain (Preferences, Personality, Values) for each time interval.
VARIABLES CREATED: 9 total (3 domains × 3 time intervals)
SOURCE COLUMNS:
- ehi5_pref_read through ehi5_val_justice (15 columns)
- ehi10_pref_read through ehi10_val_justice (15 columns)
- ehi5.10_pref_read through ehi5.10_val_justice (15 columns)
TARGET VARIABLES:
5-Year Domain Means (3 variables):
- ehi5_pref_MEAN (mean of 5 preference items)
- ehi5_pers_MEAN (mean of 5 personality items)
- ehi5_val_MEAN (mean of 5 values items)
10-Year Domain Means (3 variables):
- ehi10_pref_MEAN
- ehi10_pers_MEAN
- ehi10_val_MEAN
5-10 Year Change Domain Means (3 variables):
- ehi5.10_pref_MEAN
- ehi5.10_pers_MEAN
- ehi5.10_val_MEAN
TRANSFORMATION LOGIC:
Each domain mean = average of 5 items within that domain
Example for ehi5_pref_MEAN:
= mean(ehi5_pref_read, ehi5_pref_music, ehi5_pref_TV,
ehi5_pref_nap, ehi5_pref_travel)
NA values excluded from calculation (na.rm = TRUE)
QUALITY ASSURANCE:
- Comprehensive QA for all 9 variables across all rows
- First 5 rows displayed for multiple domain means
- Pass/Fail status for each variable
================================================================================
SCRIPT 14: datap 14 - all ehi global means.r
================================================================================
PURPOSE:
Calculates global EHI means by averaging domain-level means. Creates the
highest-level summary scores for EHI across both domain-general and
domain-specific measures.
VARIABLES CREATED: 5 total
SOURCE COLUMNS:
- ehiDGEN_5_Pref, ehiDGEN_5_Pers, ehiDGEN_5_Val
- ehiDGEN_10_Pref, ehiDGEN_10_Pers, ehiDGEN_10_Val
- ehi5_pref_MEAN, ehi5_pers_MEAN, ehi5_val_MEAN
- ehi10_pref_MEAN, ehi10_pers_MEAN, ehi10_val_MEAN
- ehi5.10_pref_MEAN, ehi5.10_pers_MEAN, ehi5.10_val_MEAN
TARGET VARIABLES:
DGEN Global Means (2 variables):
- ehiDGEN_5_mean (mean of 3 DGEN domains for 5-year)
- ehiDGEN_10_mean (mean of 3 DGEN domains for 10-year)
Domain-Specific Global Means (3 variables):
- ehi5_global_mean (mean of 3 domain means for 5-year)
- ehi10_global_mean (mean of 3 domain means for 10-year)
- ehi5.10_global_mean (mean of 3 domain means for 5-10 change)
TRANSFORMATION LOGIC:
Each global mean = average of 3 domain-level scores
Example for ehiDGEN_5_mean:
= mean(ehiDGEN_5_Pref, ehiDGEN_5_Pers, ehiDGEN_5_Val)
Example for ehi5_global_mean:
= mean(ehi5_pref_MEAN, ehi5_pers_MEAN, ehi5_val_MEAN)
NA values excluded from calculation (na.rm = TRUE)
QUALITY ASSURANCE:
- QA for all 5 global means across all rows
- First 5 rows displayed with detailed calculations
- Values shown with 5 decimal precision
- Pass/Fail status for each variable
================================================================================
SCRIPT 15: datap 15 - education recoded ordinal 3.r
================================================================================
PURPOSE:
Recodes raw education categories (`demo_edu`) into an ordered 3-level factor
for analyses requiring an ordinal education variable.
VARIABLES CREATED: 1 total
SOURCE COLUMNS:
- demo_edu
TARGET VARIABLES:
- edu3 (ordered factor with 3 levels)
TRANSFORMATION LOGIC:
Map `demo_edu` to 3 ordered levels and store as an ordered factor:
- "HS_TS": High School (or equivalent), Trade School (non-military)
- "C_Ug": College Diploma/Certificate, University - Undergraduate
- "grad_prof": University - Graduate (Masters), University - PhD, Professional Degree (ex. JD/MD)
Levels and order:
edu3 = factor(edu3, levels = c("HS_TS", "C_Ug", "grad_prof"), ordered = TRUE)
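EXAMPLE SKETCH:
The mapping and the ordered factor can be sketched with a named lookup vector.
The exact spelling of the raw `demo_edu` labels is assumed to match the list
above, and the toy data are made up.

```r
# Lookup from raw education label to 3-level code (labels as listed above).
edu_map <- c(
  "High School (or equivalent)"     = "HS_TS",
  "Trade School (non-military)"     = "HS_TS",
  "College Diploma/Certificate"     = "C_Ug",
  "University - Undergraduate"      = "C_Ug",
  "University - Graduate (Masters)" = "grad_prof",
  "University - PhD"                = "grad_prof",
  "Professional Degree (ex. JD/MD)" = "grad_prof"
)

# Toy data (values are made up).
demo <- data.frame(
  demo_edu = c("University - PhD", "Trade School (non-military)",
               "College Diploma/Certificate", NA),
  stringsAsFactors = FALSE
)

# Unmatched or missing labels become NA in edu3.
demo$edu3 <- factor(unname(edu_map[demo$demo_edu]),
                    levels = c("HS_TS", "C_Ug", "grad_prof"),
                    ordered = TRUE)

# Cross-tab of raw vs recoded values, including any NAs.
xtab <- table(demo$demo_edu, demo$edu3, useNA = "ifany")
```

Because the factor is ordered, comparisons such as
`demo$edu3 >= "C_Ug"` are meaningful.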
QUALITY ASSURANCE:
- Prints frequency table for `edu3` and a cross-tab of `demo_edu` × `edu3` to
verify correct mapping and absence of unintended NAs.
- Saves updated dataset to `eohi2.csv`.
================================================================================
SCRIPT 16: datap 16 - ehi vars standardized .r
================================================================================
PURPOSE:
Standardizes key EHI summary variables (z-scores) and creates a composite
standardized EHI mean (`stdEHI_mean`) for use in correlational and regression
analyses.
VARIABLES CREATED: 5 total
SOURCE COLUMNS:
- ehiDGEN_5_mean, ehiDGEN_10_mean
- ehi5_global_mean, ehi10_global_mean
TARGET VARIABLES:
- stdDGEN_5 = z(ehiDGEN_5_mean)
- stdDGEN_10 = z(ehiDGEN_10_mean)
- stdDS_5 = z(ehi5_global_mean)
- stdDS_10 = z(ehi10_global_mean)
- stdEHI_mean = mean(stdDGEN_5, stdDGEN_10, stdDS_5, stdDS_10), row-wise
TRANSFORMATION LOGIC:
Standardize each source variable using sample mean and SD (na.rm = TRUE):
stdX = (X - mean(X)) / sd(X)
Then compute row-wise average across the four standardized variables:
stdEHI_mean = rowMeans(cbind(stdDGEN_5, stdDGEN_10, stdDS_5, stdDS_10),
na.rm = TRUE)
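EXAMPLE SKETCH:
The standardization and composite can be sketched as below. The helper `z` and
the toy values are assumptions. The formula follows the one stated above.

```r
# Toy EHI summary scores (values are made up).
ehi <- data.frame(
  ehiDGEN_5_mean    = c(1, 2, 3, NA),
  ehiDGEN_10_mean   = c(0, 2, 4, 6),
  ehi5_global_mean  = c(-1, 0, 1, 2),
  ehi10_global_mean = c(2, 2, 4, 4)
)

# Hypothetical helper: z-score using the sample mean and SD, ignoring NAs.
z <- function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

ehi$stdDGEN_5  <- z(ehi$ehiDGEN_5_mean)
ehi$stdDGEN_10 <- z(ehi$ehiDGEN_10_mean)
ehi$stdDS_5    <- z(ehi$ehi5_global_mean)
ehi$stdDS_10   <- z(ehi$ehi10_global_mean)

# Row-wise composite over whichever standardized scores are available.
std_cols <- c("stdDGEN_5", "stdDGEN_10", "stdDS_5", "stdDS_10")
ehi$stdEHI_mean <- rowMeans(ehi[, std_cols], na.rm = TRUE)
```

Row 4 has no stdDGEN_5 value, so its composite is the mean of the remaining
three standardized scores.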
QUALITY ASSURANCE:
- Prints pre-standardization means/SDs and post-standardization means/SDs to
confirm ~0 mean and ~1 SD for each standardized variable (allowing for NAs).
- Spot-checks random rows by recomputing standardized values and comparing to
stored columns.
- Saves updated dataset to `eohi2.csv`.
================================================================================
SUMMARY OF ALL CREATED VARIABLES
================================================================================
Total Variables Created: 296
By Script:
- Script 01: 60 variables (past/future recoded items)
- Script 02: 15 variables (present recoded items)
- Script 03: 12 variables (DGEN domain scores)
- Script 04: 4 variables (DGEN time period means)
- Script 05: 3 variables (AOT & CRT scales)
- Script 06: 90 variables (time interval differences)
- Script 07: 18 variables (domain means for differences)
- Script 08: 6 variables (DGEN 5-vs-10 differences)
- Script 09: 11 variables (interval × direction means)
- Script 10: 6 variables (DGEN combined means)
- Script 11: 45 variables (domain-specific EHI scores)
- Script 12: 6 variables (DGEN EHI scores)
- Script 13: 9 variables (EHI domain means)
- Script 14: 5 variables (EHI global means)
- Script 15: 1 variable (education ordinal factor)
- Script 16: 5 variables (standardized EHI summaries and composite)
By Category:
- Time Period Items (75 total):
* Present: 15 items
* Past 5: 15 items
* Past 10: 15 items
* Future 5: 15 items
* Future 10: 15 items
- DGEN Variables (28 total):
* Domain scores: 12 (3 domains × 4 time periods)
* Time period means: 4 (1 per time period)
* 5-vs-10 differences: 6 (3 domains × 2 directions)
* Combined means: 6 (past, future, interval-based, domain-based)
- Cognitive Scales (3 total):
* AOT total
* CRT correct
* CRT intuitive
- Time Differences (90 total):
* NPast_5: 15 differences
* NPast_10: 15 differences
* NFut_5: 15 differences
* NFut_10: 15 differences
* 5.10past: 15 differences
* 5.10fut: 15 differences
- Domain Means for Differences (18 total):
* NPast_5: 3 domain means
* NPast_10: 3 domain means
* NFut_5: 3 domain means
* NFut_10: 3 domain means
* 5.10past: 3 domain means
* 5.10fut: 3 domain means
- Interval × Direction Means (11 total):
* Narrow-scope means: 6 (NPast_5, NPast_10, NFut_5, NFut_10,
X5.10past, X5.10fut)
* Global-scope means: 5 (NPast_global, NFut_global, X5.10_global,
N5_global, N10_global)
- EHI Variables (65 total):
* Domain-specific EHI: 45 (15 items × 3 time intervals)
* DGEN EHI: 6 (3 domains × 2 time intervals)
* Domain means: 9 (3 domains × 3 time intervals)
* Global means: 5 (2 DGEN + 3 domain-specific)
- Standardized EHI Variables (5 total):
* stdDGEN_5, stdDGEN_10, stdDS_5, stdDS_10, stdEHI_mean
- Demographic Variables (1 total):
* edu3 (education ordinal factor)
================================================================================
DATA PROCESSING NOTES
================================================================================
1. PROCESSING ORDER:
Scripts MUST be run in numerical order (01 → 16) as later scripts depend
on variables created by earlier scripts.
Key Dependencies:
- Script 03 required before Script 04, 08, 10, 12 (DGEN scores)
- Script 04 required before Script 10 (DGEN time period means)
- Script 06 required before Script 07, 09, 11 (time interval differences)
- Script 11 required before Script 13 (domain-specific EHI items)
- Script 12 required before Script 14 (DGEN EHI scores)
- Script 13 required before Script 14 (EHI domain means)
- Script 14 required before Script 16 (uses ehiDGEN_5/10_mean, ehi5/10_global_mean)
- Script 15 can run anytime after raw `demo_edu` is present; run before
analyses needing `edu3`
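As a quick sanity check, the dependencies listed above can be encoded and
tested against a proposed run order. This is a sketch only; the object names
`deps`, `run_order`, and `order_ok` are hypothetical and not part of the
pipeline.

```r
# Prerequisites as listed under "Key Dependencies" (script -> required scripts)
deps <- list(
  "04" = "03", "08" = "03", "10" = c("03", "04"), "12" = "03",
  "07" = "06", "09" = "06", "11" = "06",
  "13" = "11", "14" = c("12", "13"), "16" = "14"
)

# Proposed run order: "01", "02", ..., "16"
run_order <- sprintf("%02d", 1:16)

# TRUE if every prerequisite appears earlier in run_order than its dependent
order_ok <- all(vapply(names(deps), function(s) {
  all(match(deps[[s]], run_order) < match(s, run_order))
}, logical(1)))
print(order_ok)
```

Running the scripts in plain numerical order satisfies every listed
dependency, which is why that order is the recommended default.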
2. SURVEY VERSION HANDLING:
- Two survey versions (01 and 02) were used
- Scripts 01 and 03 combine these versions
- Preference given to version 01 when both exist
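The "version 01 first" rule can be illustrated with a small R sketch. The
column names follow the Set A / Set B pattern from Script 01; the response
labels, the toy data, and the helper `blank_to_na` are assumptions for this
example, and the actual scripts may implement the rule differently.

```r
# Toy rows: a participant may have answered version 01, version 02, or both.
raw <- data.frame(
  `01past5PrefItem_1` = c("Agree", "", NA, "Disagree"),
  `02past5PrefItem_1` = c("", "Strongly agree", NA, "Agree"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Treat empty strings as missing (hypothetical helper)
blank_to_na <- function(x) ifelse(is.na(x) | trimws(x) == "", NA_character_, x)

v01 <- blank_to_na(raw[["01past5PrefItem_1"]])
v02 <- blank_to_na(raw[["02past5PrefItem_1"]])

# Version 01 wins when present; otherwise fall back to version 02
combined <- ifelse(!is.na(v01), v01, v02)
print(combined)
```

In the last toy row both versions hold a response, so the version 01 value is
kept; rows with no response in either version stay NA.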
3. MISSING DATA:
- Empty cells and NA values are preserved throughout processing
- Calculations use na.rm=TRUE to exclude missing values from means
- Difference calculations result in NA if either source value is missing
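The two missing-data behaviours above can be seen in a short R example. The
data frame `x` and its values are invented for illustration.

```r
x <- data.frame(a = c(1, 2, NA), b = c(3, NA, NA), c = c(5, 6, NA))

# Means skip missing values; a row with no observed values yields NaN
row_mean <- rowMeans(x, na.rm = TRUE)
print(row_mean)

# Differences propagate NA when either operand is missing
diff_ab <- x$a - x$b
print(diff_ab)
```

Note that rowMeans() with na.rm = TRUE returns NaN rather than NA for a row
that is entirely missing, so downstream checks may want to use is.na(), which
is TRUE for both.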
4. QUALITY ASSURANCE:
- Each script includes QA checks with row verification
- Manual calculation checks confirm proper transformations
- Column existence checks prevent errors from missing source data
- Scripts 09-16 include comprehensive QA with first 5 rows displayed
- All EHI scripts (11-14, 16) verify calculations against stored values
- Pass/Fail status reported for all variables in QA-enabled scripts
5. FILE SAVING:
- Most scripts save directly to eohi2.csv
- Scripts 04, 06, and 07 have commented-out write commands for review
- Scripts 08 and 09 save directly to eohi2.csv
- Each script overwrites existing target columns if present
6. SPECIAL NAMING CONVENTIONS:
- "pref_tv" vs "pref_TV" inconsistency maintained from source data
- DGEN variables use underscores (DGEN_past_5_Pref)
- Difference variables use descriptive prefixes (NPast_5_, 5.10past_)
- "X" prefix added to variables starting with numbers (X5.10past_mean)
- Global means use "_global_" to distinguish from narrow-scope means
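The "X" prefix matches what base R does when it makes column names
syntactically valid. The sketch below shows that behaviour with make.names()
and read.csv(); whether the pipeline relies on the read.csv() default or adds
the prefix some other way is an assumption here, and the CSV text is invented.

```r
# make.names() prepends "X" to names that start with a digit
fixed <- make.names(c("5.10past_mean", "NPast_5_mean"))
print(fixed)

# read.csv() applies the same rule by default (check.names = TRUE)
csv_text <- "5.10past_mean,NPast_5_mean\n0.5,1.2\n"
checked_names <- names(read.csv(text = csv_text))
raw_names     <- names(read.csv(text = csv_text, check.names = FALSE))
print(checked_names)
print(raw_names)
```

With check.names = FALSE the original header is kept, but such columns must
then be referenced with backticks or [[ ]] because the name is not a valid R
identifier.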
================================================================================
ITEM REFERENCE GUIDE
================================================================================
15 Core Items (Used across all time periods):
PREFERENCES (5 items):
1. pref_read - Reading preferences
2. pref_music - Music preferences
3. pref_TV/tv - TV watching preferences (note case variation)
4. pref_nap - Napping preferences
5. pref_travel - Travel preferences
PERSONALITY (5 items):
6. pers_extravert - Extraverted personality
7. pers_critical - Critical thinking personality
8. pers_dependable - Dependable personality
9. pers_anxious - Anxious personality
10. pers_complex - Complex personality
VALUES (5 items):
11. val_obey - Value of obedience
12. val_trad - Value of tradition
13. val_opinion - Value of expressing opinions
14. val_performance - Value of performance
15. val_justice - Value of justice
================================================================================
EHI CONCEPT AND INTERPRETATION
================================================================================
END OF HISTORY ILLUSION (EHI):
EHI measures the asymmetry between perceived past and future change in
psychological attributes. The concept is based on the premise that people
may perceive their past and future selves differently, even when considering
equivalent time distances.
KEY EHI VARIABLES:
- Domain-Specific EHI (Scripts 11, 13, 14):
Calculated from item-level differences between past and future responses
Formula: NPast - NFut
* Positive values: Greater perceived change from past to present
* Negative values: Greater perceived change from present to future
* Zero: Symmetric perception of past and future change
- Domain-General EHI (Scripts 12, 14):
Calculated from DGEN single-item responses
Formula: DGEN_past - DGEN_fut
* Measures broader temporal self-perception without item-level detail
HIERARCHICAL STRUCTURE:
Level 1: Item-level EHI (45 domain-specific, 6 DGEN)
Level 2: Domain means (9 domain-specific, combining 5 items each)
Level 3: Global means (5 highest-level summaries)
INTERPRETATION:
- EHI > 0: "Past asymmetry" - Person perceives greater change from past to
present than from present to future
- EHI < 0: "Future asymmetry" - Person perceives greater change from present
to future than from past to present
- EHI ≈ 0: "Temporal symmetry" - Balanced perception of past/future change
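The formula and sign interpretation can be sketched in R as below. The column
names `NPast_5_pref_read`, `NFut_5_pref_read`, and `ehi_5_pref_read`, the
helper `label_ehi`, and the data are hypothetical; the sketch also treats
"approximately zero" as exactly zero, which is a simplifying assumption.

```r
# Toy difference scores for one item at the 5-year interval (invented values)
d <- data.frame(
  NPast_5_pref_read = c(2, 0, 1, NA),
  NFut_5_pref_read  = c(1, 2, 1, 3)
)

# EHI = NPast - NFut; NA if either source value is missing
d$ehi_5_pref_read <- d$NPast_5_pref_read - d$NFut_5_pref_read

# Map the sign of EHI onto the interpretation labels (hypothetical helper)
label_ehi <- function(x) {
  ifelse(is.na(x), NA_character_,
    ifelse(x > 0, "past asymmetry",
      ifelse(x < 0, "future asymmetry", "temporal symmetry")))
}
d$ehi_label <- label_ehi(d$ehi_5_pref_read)
print(d)
```

For continuous means (Levels 2 and 3) an exact zero is unlikely, so any
labelling of "symmetry" there would need a tolerance chosen by the analyst.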
================================================================================
END OF DOCUMENTATION
================================================================================
Last Updated: October 29, 2025
Processing Pipeline: Scripts 01-16