================================================================================
EOHI2 DATA PROCESSING PIPELINE - VARIABLE CREATION DOCUMENTATION
================================================================================

This README documents the complete data processing pipeline for eohi2.csv.
All processing scripts should be run in the order listed below.

Source File: eohi2.csv
Processing Scripts: dataP 01 through dataP 07

================================================================================
SCRIPT 01: dataP 01 - recode and combine past & future vars.r
================================================================================

PURPOSE:
Combines responses from two survey versions (01 and 02) and recodes Likert
scale text responses to numeric values for past and future time periods.

VARIABLES CREATED: 60 total (15 items × 4 time periods)

SOURCE COLUMNS:
- Set A: 01past5PrefItem_1 through 01fut10ValItem_5 (60 columns)
- Set B: 02past5PrefItem_1 through 02fut10ValItem_5 (60 columns)

TARGET VARIABLES:

Past 5 Years (15 variables):
- past_5_pref_read, past_5_pref_music, past_5_pref_TV, past_5_pref_nap, past_5_pref_travel
- past_5_pers_extravert, past_5_pers_critical, past_5_pers_dependable, past_5_pers_anxious, past_5_pers_complex
- past_5_val_obey, past_5_val_trad, past_5_val_opinion, past_5_val_performance, past_5_val_justice

Past 10 Years (15 variables):
- past_10_pref_read, past_10_pref_music, past_10_pref_TV, past_10_pref_nap, past_10_pref_travel
- past_10_pers_extravert, past_10_pers_critical, past_10_pers_dependable, past_10_pers_anxious, past_10_pers_complex
- past_10_val_obey, past_10_val_trad, past_10_val_opinion, past_10_val_performance, past_10_val_justice

Future 5 Years (15 variables):
- fut_5_pref_read, fut_5_pref_music, fut_5_pref_TV, fut_5_pref_nap, fut_5_pref_travel
- fut_5_pers_extravert, fut_5_pers_critical, fut_5_pers_dependable, fut_5_pers_anxious, fut_5_pers_complex
- fut_5_val_obey, fut_5_val_trad, fut_5_val_opinion, fut_5_val_performance, fut_5_val_justice

Future 10 Years (15 variables):
- fut_10_pref_read, fut_10_pref_music, fut_10_pref_TV, fut_10_pref_nap, fut_10_pref_travel
- fut_10_pers_extravert, fut_10_pers_critical, fut_10_pers_dependable, fut_10_pers_anxious, fut_10_pers_complex
- fut_10_val_obey, fut_10_val_trad, fut_10_val_opinion, fut_10_val_performance, fut_10_val_justice

TRANSFORMATION LOGIC:

Step 1: Combine responses from Set A (01) and Set B (02)
- If Set A has a value, use Set A
- If Set A is empty, use Set B

Step 2: Recode text responses to numeric values:
  "Strongly Disagree"          → -3
  "Disagree"                   → -2
  "Somewhat Disagree"          → -1
  "Neither Agree nor Disagree" →  0
  "Somewhat Agree"             →  1
  "Agree"                      →  2
  "Strongly Agree"             →  3
  Empty/Missing                → NA

ITEM DOMAINS:
- Preferences (pref): Reading, Music, TV, Nap, Travel
- Personality (pers): Extravert, Critical, Dependable, Anxious, Complex
- Values (val): Obey, Tradition, Opinion, Performance, Justice

================================================================================
SCRIPT 02: dataP 02 - recode present VARS.r
================================================================================

PURPOSE:
Recodes present-time Likert scale text responses to numeric values.
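Scripts 01 and 02 share the same label-to-number mapping. As an illustration only, here is a minimal Python sketch of that lookup; the pipeline itself is written in R, the helper name `recode_likert` is hypothetical, and returning None (standing in for NA) for labels outside the mapping is an assumption rather than documented script behavior.

```python
from typing import Optional

# Label-to-code mapping as listed in the TRANSFORMATION LOGIC of Scripts 01 and 02.
LIKERT_MAP = {
    "Strongly Disagree": -3,
    "Disagree": -2,
    "Somewhat Disagree": -1,
    "Neither Agree nor Disagree": 0,
    "Somewhat Agree": 1,
    "Agree": 2,
    "Strongly Agree": 3,
}


def recode_likert(response: Optional[str]) -> Optional[int]:
    """Return the numeric code for a Likert label; None stands in for R's NA.

    Hypothetical helper for illustration. Empty or missing responses map to None,
    as documented. Labels outside LIKERT_MAP also map to None here, which is an
    assumption, not confirmed script behavior.
    """
    if response is None or response == "":
        return None
    return LIKERT_MAP.get(response)


if __name__ == "__main__":
    print([recode_likert(x) for x in ["Agree", "", "Strongly Disagree"]])  # [2, None, -3]
```

A dictionary lookup keeps the mapping in one place, so the same table can serve both the past/future items and the present-time items.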

VARIABLES CREATED: 15 total

SOURCE COLUMNS:
- prePrefItem_1 through prePrefItem_5 (5 columns)
- prePersItem_1 through prePersItem_5 (5 columns)
- preValItem_1 through preValItem_5 (5 columns)

TARGET VARIABLES:

Present Time (15 variables):
- present_pref_read, present_pref_music, present_pref_tv, present_pref_nap, present_pref_travel
- present_pers_extravert, present_pers_critical, present_pers_dependable, present_pers_anxious, present_pers_complex
- present_val_obey, present_val_trad, present_val_opinion, present_val_performance, present_val_justice

TRANSFORMATION LOGIC:
Recode text responses to numeric values:
  "Strongly Disagree"          → -3
  "Disagree"                   → -2
  "Somewhat Disagree"          → -1
  "Neither Agree nor Disagree" →  0
  "Somewhat Agree"             →  1
  "Agree"                      →  2
  "Strongly Agree"             →  3
  Empty/Missing                → NA

SPECIAL NOTE:
Present time uses "present_pref_tv" (lowercase) while past/future use
"past_5_pref_TV" (uppercase). This is intentional and preserved from the
original data structure.

================================================================================
SCRIPT 03: dataP 03 - recode DGEN vars.r
================================================================================

PURPOSE:
Combines DGEN (domain general) responses from two survey versions (01 and 02).
These are single-item measures for each domain/time combination.
NO RECODING - just copies numeric values as-is.

VARIABLES CREATED: 12 total (3 domains × 4 time periods)

SOURCE COLUMNS:
- Set A: 01past5PrefDGEN_1, 01past5PersDGEN_1, 01past5ValDGEN_1, etc.
- Set B: 02past5PrefDGEN_1, 02past5PersDGEN_1, 02past5ValDGEN_1, etc.
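Like Script 01, this script keeps the Set A (01) value when it exists and otherwise falls back to Set B (02). A minimal Python sketch of that rule follows; it is not the R script, `combine_versions` and `is_empty` are hypothetical names, and treating None, blank strings, and NaN as "empty" is an assumption about how the raw CSV cells are read.

```python
import math
from typing import Union

Cell = Union[str, float, int, None]


def is_empty(value: Cell) -> bool:
    """Assumption: None, blank strings, and NaN all count as an empty cell."""
    if value is None:
        return True
    if isinstance(value, str):
        return value.strip() == ""
    if isinstance(value, float):
        return math.isnan(value)
    return False


def combine_versions(set_a: Cell, set_b: Cell) -> Cell:
    """Prefer the Set A (01) value; fall back to Set B (02); None if both are empty."""
    if not is_empty(set_a):
        return set_a
    if not is_empty(set_b):
        return set_b
    return None


if __name__ == "__main__":
    # Illustrative row with a made-up value: the Set A cell is blank, so Set B is used.
    row = {"01past5PrefDGEN_1": "", "02past5PrefDGEN_1": 4}
    row["DGEN_past_5_Pref"] = combine_versions(row["01past5PrefDGEN_1"], row["02past5PrefDGEN_1"])
    print(row["DGEN_past_5_Pref"])  # 4
```

Checking emptiness explicitly, rather than relying on truthiness, keeps a legitimate 0 in Set A from being overwritten by the Set B value.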

TARGET VARIABLES:
- DGEN_past_5_Pref, DGEN_past_5_Pers, DGEN_past_5_Val
- DGEN_past_10_Pref, DGEN_past_10_Pers, DGEN_past_10_Val
- DGEN_fut_5_Pref, DGEN_fut_5_Pers, DGEN_fut_5_Val
- DGEN_fut_10_Pref, DGEN_fut_10_Pers, DGEN_fut_10_Val

TRANSFORMATION LOGIC:
- If Set A (01) has a value, use Set A
- If Set A is empty, use Set B (02)
- NO RECODING: Values are copied directly as numeric

SPECIAL NOTES:
- Future columns in raw data use "_8" suffix for Pref/Pers items
- Future Val columns use "ValuesDGEN" spelling in Set A, "ValDGEN" in Set B

================================================================================
SCRIPT 04: dataP 04 - DGEN means.r
================================================================================

PURPOSE:
Calculates mean DGEN scores by averaging the three domain scores (Preferences,
Personality, Values) for each time period.

VARIABLES CREATED: 4 total (1 per time period)

SOURCE COLUMNS:
- DGEN_past_5_Pref, DGEN_past_5_Pers, DGEN_past_5_Val
- DGEN_past_10_Pref, DGEN_past_10_Pers, DGEN_past_10_Val
- DGEN_fut_5_Pref, DGEN_fut_5_Pers, DGEN_fut_5_Val
- DGEN_fut_10_Pref, DGEN_fut_10_Pers, DGEN_fut_10_Val

TARGET VARIABLES:
- DGEN_past_5_mean
- DGEN_past_10_mean
- DGEN_fut_5_mean
- DGEN_fut_10_mean

TRANSFORMATION LOGIC:
Each mean = mean of the Pref, Pers, and Val scores for that time period
- When all three scores are present, this equals (Pref + Pers + Val) / 3
- NA values are excluded from the calculation (na.rm = TRUE), so a row with a
  missing domain score is averaged over the remaining scores

================================================================================
SCRIPT 05: dataP 05 - recode scales VARS.r
================================================================================

PURPOSE:
Processes two cognitive scales:
1. AOT (Actively Open-minded Thinking): 8-item scale with reverse coding
2. CRT (Cognitive Reflection Test): 3-item test with correct/intuitive scoring

VARIABLES CREATED: 3 total

SOURCE COLUMNS:
AOT Scale:
- aot_1, aot_2, aot_3, aot_4, aot_5, aot_6, aot_7, aot_8
CRT Test:
- crt_1, crt_2, crt_3

TARGET VARIABLES:
- aot_total (mean of 8 items with reverse coding)
- crt_correct (proportion of correct answers)
- crt_int (proportion of intuitive/incorrect answers)

TRANSFORMATION LOGIC:

AOT Scale (aot_total):
1. Items 4, 5, 6, 7 are reverse coded by multiplying by -1
2. Calculate the mean of all 8 items (with reverse coding applied)
3. Original source values are NOT modified in the dataframe
4. NA values are excluded from the calculation (na.rm = TRUE)

CRT Correct (crt_correct):
Correct answers:
- crt_1: "5 cents"
- crt_2: "5 minutes"
- crt_3: "47 days"
Calculation: (Number of correct answers) / (Number of non-missing answers)

CRT Intuitive (crt_int):
Intuitive (common incorrect) answers:
- crt_1: "10 cents"
- crt_2: "100 minutes"
- crt_3: "24 days"
Calculation: (Number of intuitive answers) / (Number of non-missing answers)

SPECIAL NOTES:
- CRT scoring is case-insensitive and trims whitespace
- Both CRT scores are proportions (0.00 to 1.00)
- Empty/missing CRT responses are excluded from the denominator

================================================================================
SCRIPT 06: dataP 06 - time interval differences.r
================================================================================

PURPOSE:
Calculates absolute differences between time intervals to measure perceived
change across time periods for all 15 items.
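All six difference types follow one pattern: pick two time-period prefixes, look up the same item under each, and take the absolute difference. The Python sketch below illustrates that pattern, including the lowercase present_pref_tv lookup. It is not the R script itself: the dictionary-per-row representation and the function names are hypothetical, and an absent source column is treated as NA here rather than raising an error.

```python
import math
from typing import Dict, Optional

# The 15 items, spelled as they appear in past/future and difference variable names.
ITEMS = [
    "pref_read", "pref_music", "pref_TV", "pref_nap", "pref_travel",
    "pers_extravert", "pers_critical", "pers_dependable", "pers_anxious", "pers_complex",
    "val_obey", "val_trad", "val_opinion", "val_performance", "val_justice",
]

# Difference type -> (first time prefix, second time prefix), per Script 06's formulas.
DIFF_TYPES = {
    "NPast_5": ("present", "past_5"),
    "NPast_10": ("present", "past_10"),
    "NFut_5": ("present", "fut_5"),
    "NFut_10": ("present", "fut_10"),
    "5.10past": ("past_5", "past_10"),
    "5.10fut": ("fut_5", "fut_10"),
}

Row = Dict[str, Optional[float]]


def source_column(prefix: str, item: str) -> str:
    """Build a source column name; present-time TV is spelled present_pref_tv."""
    if prefix == "present" and item == "pref_TV":
        return "present_pref_tv"
    return f"{prefix}_{item}"


def abs_diff(a: Optional[float], b: Optional[float]) -> Optional[float]:
    """|a - b|, or None (NA) when either value is missing."""
    if a is None or b is None or math.isnan(a) or math.isnan(b):
        return None
    return abs(a - b)


def time_interval_differences(row: Row) -> Row:
    """Return all 90 difference variables (6 types x 15 items) for one respondent."""
    out: Row = {}
    for diff_name, (first, second) in DIFF_TYPES.items():
        for item in ITEMS:
            # row.get() yields None for an absent column, so it becomes NA (assumption).
            out[f"{diff_name}_{item}"] = abs_diff(
                row.get(source_column(first, item)),
                row.get(source_column(second, item)),
            )
    return out
```

Keeping the item list in the uppercase pref_TV spelling and special-casing only the present-time lookup produces output names that match the documented NPast_5_pref_TV style.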

VARIABLES CREATED: 90 total (6 difference types × 15 items)

SOURCE COLUMNS:
- present_pref_read through present_val_justice (15 columns)
- past_5_pref_read through past_5_val_justice (15 columns)
- past_10_pref_read through past_10_val_justice (15 columns)
- fut_5_pref_read through fut_5_val_justice (15 columns)
- fut_10_pref_read through fut_10_val_justice (15 columns)

TARGET VARIABLES (by difference type):

NPast_5 (Present vs Past 5 years) - 15 variables:
Formula: |present - past_5|
- NPast_5_pref_read, NPast_5_pref_music, NPast_5_pref_TV, NPast_5_pref_nap, NPast_5_pref_travel
- NPast_5_pers_extravert, NPast_5_pers_critical, NPast_5_pers_dependable, NPast_5_pers_anxious, NPast_5_pers_complex
- NPast_5_val_obey, NPast_5_val_trad, NPast_5_val_opinion, NPast_5_val_performance, NPast_5_val_justice

NPast_10 (Present vs Past 10 years) - 15 variables:
Formula: |present - past_10|
- NPast_10_pref_read, NPast_10_pref_music, NPast_10_pref_TV, NPast_10_pref_nap, NPast_10_pref_travel
- NPast_10_pers_extravert, NPast_10_pers_critical, NPast_10_pers_dependable, NPast_10_pers_anxious, NPast_10_pers_complex
- NPast_10_val_obey, NPast_10_val_trad, NPast_10_val_opinion, NPast_10_val_performance, NPast_10_val_justice

NFut_5 (Present vs Future 5 years) - 15 variables:
Formula: |present - fut_5|
- NFut_5_pref_read, NFut_5_pref_music, NFut_5_pref_TV, NFut_5_pref_nap, NFut_5_pref_travel
- NFut_5_pers_extravert, NFut_5_pers_critical, NFut_5_pers_dependable, NFut_5_pers_anxious, NFut_5_pers_complex
- NFut_5_val_obey, NFut_5_val_trad, NFut_5_val_opinion, NFut_5_val_performance, NFut_5_val_justice

NFut_10 (Present vs Future 10 years) - 15 variables:
Formula: |present - fut_10|
- NFut_10_pref_read, NFut_10_pref_music, NFut_10_pref_TV, NFut_10_pref_nap, NFut_10_pref_travel
- NFut_10_pers_extravert, NFut_10_pers_critical, NFut_10_pers_dependable, NFut_10_pers_anxious, NFut_10_pers_complex
- NFut_10_val_obey, NFut_10_val_trad, NFut_10_val_opinion, NFut_10_val_performance, NFut_10_val_justice

5.10past (Past 5 vs Past 10 years) - 15 variables:
Formula: |past_5 - past_10|
- 5.10past_pref_read, 5.10past_pref_music, 5.10past_pref_TV, 5.10past_pref_nap, 5.10past_pref_travel
- 5.10past_pers_extravert, 5.10past_pers_critical, 5.10past_pers_dependable, 5.10past_pers_anxious, 5.10past_pers_complex
- 5.10past_val_obey, 5.10past_val_trad, 5.10past_val_opinion, 5.10past_val_performance, 5.10past_val_justice

5.10fut (Future 5 vs Future 10 years) - 15 variables:
Formula: |fut_5 - fut_10|
- 5.10fut_pref_read, 5.10fut_pref_music, 5.10fut_pref_TV, 5.10fut_pref_nap, 5.10fut_pref_travel
- 5.10fut_pers_extravert, 5.10fut_pers_critical, 5.10fut_pers_dependable, 5.10fut_pers_anxious, 5.10fut_pers_complex
- 5.10fut_val_obey, 5.10fut_val_trad, 5.10fut_val_opinion, 5.10fut_val_performance, 5.10fut_val_justice

TRANSFORMATION LOGIC:
All calculations use absolute differences:
- NPast_5:  |present_[item] - past_5_[item]|
- NPast_10: |present_[item] - past_10_[item]|
- NFut_5:   |present_[item] - fut_5_[item]|
- NFut_10:  |present_[item] - fut_10_[item]|
- 5.10past: |past_5_[item] - past_10_[item]|
- 5.10fut:  |fut_5_[item] - fut_10_[item]|

Result: Non-negative values (0 = no change) representing the magnitude of
change. A missing value in either source column results in NA.

SPECIAL NOTES:
- Present time uses "pref_tv" (lowercase) while past/future use "pref_TV"
  (uppercase); the script handles this naming inconsistency, and all
  difference variables use "pref_TV"
- All values are absolute differences (non-negative)

================================================================================
SCRIPT 07: dataP 07 - domain means.r
================================================================================

PURPOSE:
Calculates domain-level means by averaging the 5 items within each domain
(Preferences, Personality, Values) for each of the 6 time interval difference
types.
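Each domain mean averages five difference columns and skips missing values, the same na.rm = TRUE rule that Script 04 applies to the DGEN scores. The Python sketch below mirrors R's mean(x, na.rm = TRUE), which returns NaN when every value is missing; whether the R script stores NaN or NA in that case is not stated in this README, and the function names here are hypothetical.

```python
import math
from typing import Dict, List, Optional

# Item suffixes per domain, matching the difference variable names from Script 06.
DOMAIN_ITEMS = {
    "pref": ["read", "music", "TV", "nap", "travel"],
    "pers": ["extravert", "critical", "dependable", "anxious", "complex"],
    "val": ["obey", "trad", "opinion", "performance", "justice"],
}
DIFF_TYPES = ["NPast_5", "NPast_10", "NFut_5", "NFut_10", "5.10past", "5.10fut"]


def mean_na_rm(values: List[Optional[float]]) -> float:
    """Mean of the non-missing values, like R's mean(x, na.rm = TRUE).

    As in R, an input with no non-missing values yields NaN.
    """
    present = [v for v in values if v is not None and not math.isnan(v)]
    if not present:
        return math.nan
    return sum(present) / len(present)


def domain_means(row: Dict[str, Optional[float]]) -> Dict[str, float]:
    """Return the 18 domain means (6 difference types x 3 domains) for one row."""
    out: Dict[str, float] = {}
    for diff in DIFF_TYPES:
        for domain, items in DOMAIN_ITEMS.items():
            cols = [f"{diff}_{domain}_{item}" for item in items]
            out[f"{diff}_{domain}_MEAN"] = mean_na_rm([row.get(c) for c in cols])
    return out
```

Because missing items are dropped rather than counted as zero, a respondent with four of five items still gets a mean on the same scale as one with all five.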

VARIABLES CREATED: 18 total (6 time intervals × 3 domains)

SOURCE COLUMNS:
- NPast_5_pref_read through NPast_5_val_justice (15 columns)
- NPast_10_pref_read through NPast_10_val_justice (15 columns)
- NFut_5_pref_read through NFut_5_val_justice (15 columns)
- NFut_10_pref_read through NFut_10_val_justice (15 columns)
- 5.10past_pref_read through 5.10past_val_justice (15 columns)
- 5.10fut_pref_read through 5.10fut_val_justice (15 columns)
Total: 90 difference columns (created in Script 06)

TARGET VARIABLES:

NPast_5 Domain Means (3 variables):
- NPast_5_pref_MEAN (mean of 5 preference items)
- NPast_5_pers_MEAN (mean of 5 personality items)
- NPast_5_val_MEAN (mean of 5 values items)

NPast_10 Domain Means (3 variables):
- NPast_10_pref_MEAN
- NPast_10_pers_MEAN
- NPast_10_val_MEAN

NFut_5 Domain Means (3 variables):
- NFut_5_pref_MEAN
- NFut_5_pers_MEAN
- NFut_5_val_MEAN

NFut_10 Domain Means (3 variables):
- NFut_10_pref_MEAN
- NFut_10_pers_MEAN
- NFut_10_val_MEAN

5.10past Domain Means (3 variables):
- 5.10past_pref_MEAN
- 5.10past_pers_MEAN
- 5.10past_val_MEAN

5.10fut Domain Means (3 variables):
- 5.10fut_pref_MEAN
- 5.10fut_pers_MEAN
- 5.10fut_val_MEAN

TRANSFORMATION LOGIC:
Each domain mean = average of the 5 items within that domain

Example for NPast_5_pref_MEAN:
= mean(NPast_5_pref_read, NPast_5_pref_music, NPast_5_pref_TV,
       NPast_5_pref_nap, NPast_5_pref_travel)

Example for NFut_10_pers_MEAN:
= mean(NFut_10_pers_extravert, NFut_10_pers_critical, NFut_10_pers_dependable,
       NFut_10_pers_anxious, NFut_10_pers_complex)

NA values are excluded from the calculation (na.rm = TRUE), so a row with
missing items is averaged over the items that are present.

PURPOSE OF DOMAIN MEANS:
- Provides a higher-level summary of perceived change by domain
- Reduces item-level noise by aggregating across related items
- Enables domain-level comparisons across time intervals
- Parallel to Script 04 (DGEN means), but for difference scores instead of
  raw DGEN ratings

SPECIAL NOTES:
- This script depends on Script 06 being run first
- Creates domain-level aggregates of absolute difference scores
- All means are averages of non-negative values (absolute differences)

================================================================================
SUMMARY OF ALL CREATED VARIABLES
================================================================================

Total Variables Created: 202

By Script:
- Script 01: 60 variables (past/future recoded items)
- Script 02: 15 variables (present recoded items)
- Script 03: 12 variables (DGEN domain scores)
- Script 04: 4 variables (DGEN means)
- Script 05: 3 variables (AOT & CRT scales)
- Script 06: 90 variables (time interval differences)
- Script 07: 18 variables (domain means for differences)

By Category:
- Time Period Items (75 total):
  * Present: 15 items
  * Past 5: 15 items
  * Past 10: 15 items
  * Future 5: 15 items
  * Future 10: 15 items
- DGEN Variables (16 total):
  * Domain scores: 12 (3 domains × 4 time periods)
  * Mean scores: 4 (1 per time period)
- Cognitive Scales (3 total):
  * AOT total
  * CRT correct
  * CRT intuitive
- Time Differences (90 total):
  * NPast_5: 15 differences
  * NPast_10: 15 differences
  * NFut_5: 15 differences
  * NFut_10: 15 differences
  * 5.10past: 15 differences
  * 5.10fut: 15 differences
- Domain Means for Differences (18 total):
  * NPast_5: 3 domain means
  * NPast_10: 3 domain means
  * NFut_5: 3 domain means
  * NFut_10: 3 domain means
  * 5.10past: 3 domain means
  * 5.10fut: 3 domain means

================================================================================
DATA PROCESSING NOTES
================================================================================

1. PROCESSING ORDER:
   Scripts MUST be run in numerical order (01 → 07), as later scripts depend
   on variables created by earlier scripts.

2. SURVEY VERSION HANDLING:
   - Two survey versions (01 and 02) were used
   - Scripts 01 and 03 combine these versions
   - Preference is given to version 01 when both exist

3. MISSING DATA:
   - Empty cells and NA values are preserved throughout processing
   - Calculations use na.rm=TRUE to exclude missing values from means
   - Difference calculations result in NA if either source value is missing

4. QUALITY ASSURANCE:
   - Each script includes QA checks with random row verification
   - Manual calculation checks confirm proper transformations
   - Column existence checks prevent errors from missing source data

5. FILE SAVING:
   - Most scripts save directly to eohi2.csv
   - Scripts 04, 06, and 07 have commented-out write commands for review;
     their results are not written to eohi2.csv until those commands are
     uncommented
   - Each script overwrites existing target columns if present

6. SPECIAL NAMING CONVENTIONS:
   - "pref_tv" vs "pref_TV" inconsistency maintained from source data
   - DGEN variables use underscores (DGEN_past_5_Pref)
   - Difference variables use descriptive prefixes (NPast_5_, 5.10past_)

================================================================================
ITEM REFERENCE GUIDE
================================================================================

15 Core Items (Used across all time periods):

PREFERENCES (5 items):
1. pref_read - Reading preferences
2. pref_music - Music preferences
3. pref_TV/tv - TV watching preferences (note case variation)
4. pref_nap - Napping preferences
5. pref_travel - Travel preferences

PERSONALITY (5 items):
6. pers_extravert - Extraverted personality
7. pers_critical - Critical thinking personality
8. pers_dependable - Dependable personality
9. pers_anxious - Anxious personality
10. pers_complex - Complex personality

VALUES (5 items):
11. val_obey - Value of obedience
12. val_trad - Value of tradition
13. val_opinion - Value of expressing opinions
14. val_performance - Value of performance
15. val_justice - Value of justice

================================================================================
END OF DOCUMENTATION
================================================================================
Last Updated: October 1, 2025