================================================================================
EOHI2 DATA PROCESSING PIPELINE - VARIABLE CREATION DOCUMENTATION
================================================================================

This README documents the complete data processing pipeline for eohi2.csv.
All processing scripts should be run in the order listed below.

Source File: eohi2.csv
Processing Scripts: dataP 01 through dataP 06

================================================================================
SCRIPT 01: dataP 01 - recode and combine past & future vars.r
================================================================================

PURPOSE:
  Combines responses from two survey versions (01 and 02) and recodes Likert
  scale text responses to numeric values for past and future time periods.

VARIABLES CREATED: 60 total (15 items × 4 time periods)

SOURCE COLUMNS:
  - Set A: 01past5PrefItem_1 through 01fut10ValItem_5 (60 columns)
  - Set B: 02past5PrefItem_1 through 02fut10ValItem_5 (60 columns)

TARGET VARIABLES:
  Past 5 Years (15 variables):
    - past_5_pref_read, past_5_pref_music, past_5_pref_TV, past_5_pref_nap,
      past_5_pref_travel
    - past_5_pers_extravert, past_5_pers_critical, past_5_pers_dependable,
      past_5_pers_anxious, past_5_pers_complex
    - past_5_val_obey, past_5_val_trad, past_5_val_opinion,
      past_5_val_performance, past_5_val_justice

  Past 10 Years (15 variables):
    - past_10_pref_read, past_10_pref_music, past_10_pref_TV, past_10_pref_nap,
      past_10_pref_travel
    - past_10_pers_extravert, past_10_pers_critical, past_10_pers_dependable,
      past_10_pers_anxious, past_10_pers_complex
    - past_10_val_obey, past_10_val_trad, past_10_val_opinion,
      past_10_val_performance, past_10_val_justice

  Future 5 Years (15 variables):
    - fut_5_pref_read, fut_5_pref_music, fut_5_pref_TV, fut_5_pref_nap,
      fut_5_pref_travel
    - fut_5_pers_extravert, fut_5_pers_critical, fut_5_pers_dependable,
      fut_5_pers_anxious, fut_5_pers_complex
    - fut_5_val_obey, fut_5_val_trad, fut_5_val_opinion,
      fut_5_val_performance, fut_5_val_justice

  Future 10 Years (15 variables):
    - fut_10_pref_read, fut_10_pref_music, fut_10_pref_TV, fut_10_pref_nap,
      fut_10_pref_travel
    - fut_10_pers_extravert, fut_10_pers_critical, fut_10_pers_dependable,
      fut_10_pers_anxious, fut_10_pers_complex
    - fut_10_val_obey, fut_10_val_trad, fut_10_val_opinion,
      fut_10_val_performance, fut_10_val_justice

TRANSFORMATION LOGIC:
  Step 1: Combine responses from Set A (01) and Set B (02)
          - If Set A has a value, use Set A
          - If Set A is empty, use Set B

  Step 2: Recode text responses to numeric values:
          "Strongly Disagree"            → -3
          "Disagree"                     → -2
          "Somewhat Disagree"            → -1
          "Neither Agree nor Disagree"   →  0
          "Somewhat Agree"               →  1
          "Agree"                        →  2
          "Strongly Agree"               →  3
          Empty/Missing                  → NA
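
EXAMPLE (illustrative sketch, not the actual script):
  The two steps above can be sketched in base R as follows. The helper names
  combine_sets() and recode_likert() and the toy vectors are hypothetical;
  dataP 01 may implement the same logic differently.

```r
# Label-to-number lookup taken from the recode table above.
likert_map <- c(
  "Strongly Disagree"          = -3,
  "Disagree"                   = -2,
  "Somewhat Disagree"          = -1,
  "Neither Agree nor Disagree" =  0,
  "Somewhat Agree"             =  1,
  "Agree"                      =  2,
  "Strongly Agree"             =  3
)

# Step 1: prefer Set A; fall back to Set B when Set A is empty or NA.
# (Hypothetical helper name.)
combine_sets <- function(a, b) {
  a_empty <- is.na(a) | trimws(a) == ""
  ifelse(a_empty, b, a)
}

# Step 2: look up each label; empty or unmatched text becomes NA.
# (Hypothetical helper name.)
recode_likert <- function(x) {
  unname(likert_map[match(x, names(likert_map))])
}

# Toy responses for one item from the two survey versions.
set_a <- c("Agree", "", NA, "Strongly Disagree")
set_b <- c("Disagree", "Somewhat Agree", "", "Agree")

combined <- combine_sets(set_a, set_b)
past_5_pref_read <- recode_likert(combined)
print(past_5_pref_read)
```

  In this sketch the lookup is an exact match, so a label with different
  capitalization or stray whitespace would also be recoded to NA.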

ITEM DOMAINS:
  - Preferences (pref): Reading, Music, TV, Nap, Travel
  - Personality (pers): Extravert, Critical, Dependable, Anxious, Complex
  - Values (val): Obey, Tradition, Opinion, Performance, Justice


================================================================================
SCRIPT 02: dataP 02 - recode present VARS.r
================================================================================

PURPOSE:
  Recodes present-time Likert scale text responses to numeric values.

VARIABLES CREATED: 15 total

SOURCE COLUMNS:
  - prePrefItem_1 through prePrefItem_5 (5 columns)
  - prePersItem_1 through prePersItem_5 (5 columns)
  - preValItem_1 through preValItem_5 (5 columns)

TARGET VARIABLES:
  Present Time (15 variables):
    - present_pref_read, present_pref_music, present_pref_tv, present_pref_nap,
      present_pref_travel
    - present_pers_extravert, present_pers_critical, present_pers_dependable,
      present_pers_anxious, present_pers_complex
    - present_val_obey, present_val_trad, present_val_opinion,
      present_val_performance, present_val_justice

TRANSFORMATION LOGIC:
  Recode text responses to numeric values:
    "Strongly Disagree"            → -3
    "Disagree"                     → -2
    "Somewhat Disagree"            → -1
    "Neither Agree nor Disagree"   →  0
    "Somewhat Agree"               →  1
    "Agree"                        →  2
    "Strongly Agree"               →  3
    Empty/Missing                  → NA

SPECIAL NOTE:
  Present time uses "present_pref_tv" (lowercase) while past/future use
  "past_5_pref_TV" (uppercase). This is intentional and preserved from the
  original data structure.


================================================================================
SCRIPT 03: dataP 03 - recode DGEN vars.r
================================================================================

PURPOSE:
  Combines DGEN (domain general) responses from two survey versions (01 and 02).
  These are single-item measures for each domain/time combination.
  NO RECODING - just copies numeric values as-is.

VARIABLES CREATED: 12 total (3 domains × 4 time periods)

SOURCE COLUMNS:
  - Set A: 01past5PrefDGEN_1, 01past5PersDGEN_1, 01past5ValDGEN_1, etc.
  - Set B: 02past5PrefDGEN_1, 02past5PersDGEN_1, 02past5ValDGEN_1, etc.

TARGET VARIABLES:
  - DGEN_past_5_Pref, DGEN_past_5_Pers, DGEN_past_5_Val
  - DGEN_past_10_Pref, DGEN_past_10_Pers, DGEN_past_10_Val
  - DGEN_fut_5_Pref, DGEN_fut_5_Pers, DGEN_fut_5_Val
  - DGEN_fut_10_Pref, DGEN_fut_10_Pers, DGEN_fut_10_Val

TRANSFORMATION LOGIC:
  - If Set A (01) has a value, use Set A
  - If Set A is empty, use Set B (02)
  - NO RECODING: Values are copied directly as numeric

SPECIAL NOTES:
  - Future columns in raw data use "_8" suffix for Pref/Pers items
  - Future Val columns use "ValuesDGEN" spelling in Set A, "ValDGEN" in Set B
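
EXAMPLE (illustrative sketch, not the actual script):
  Because the raw column names are irregular, one option is an explicit
  target-to-source lookup table rather than names generated by pattern. The
  table name dgen_map and the toy data are hypothetical, and only the past-5
  source names quoted above are listed; the future names would need to be
  copied from the raw file.

```r
# Explicit mapping from target variable to its Set A / Set B source column.
dgen_map <- data.frame(
  target = c("DGEN_past_5_Pref", "DGEN_past_5_Pers", "DGEN_past_5_Val"),
  set_a  = c("01past5PrefDGEN_1", "01past5PersDGEN_1", "01past5ValDGEN_1"),
  set_b  = c("02past5PrefDGEN_1", "02past5PersDGEN_1", "02past5ValDGEN_1"),
  stringsAsFactors = FALSE
)

# Toy data: each respondent answered only one survey version.
# check.names = FALSE keeps the digit-leading column names unchanged.
df <- data.frame(
  "01past5PrefDGEN_1" = c(4, NA), "02past5PrefDGEN_1" = c(NA, 2),
  "01past5PersDGEN_1" = c(5, NA), "02past5PersDGEN_1" = c(NA, 3),
  "01past5ValDGEN_1"  = c(6, NA), "02past5ValDGEN_1"  = c(NA, 7),
  check.names = FALSE
)

for (i in seq_len(nrow(dgen_map))) {
  a <- df[[dgen_map$set_a[i]]]
  b <- df[[dgen_map$set_b[i]]]
  # Use Set A where present, otherwise Set B; values are copied unchanged.
  df[[dgen_map$target[i]]] <- ifelse(is.na(a), b, a)
}

print(df[, dgen_map$target])
```

  Note that read.csv() with its default check.names = TRUE would rename
  columns that start with a digit, so the sketch assumes the raw names were
  kept as-is when the file was read.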


================================================================================
SCRIPT 04: dataP 04 - DGEN means.r
================================================================================

PURPOSE:
  Calculates mean DGEN scores by averaging the three domain scores (Preferences,
  Personality, Values) for each time period.

VARIABLES CREATED: 4 total (1 per time period)

SOURCE COLUMNS:
  - DGEN_past_5_Pref, DGEN_past_5_Pers, DGEN_past_5_Val
  - DGEN_past_10_Pref, DGEN_past_10_Pers, DGEN_past_10_Val
  - DGEN_fut_5_Pref, DGEN_fut_5_Pers, DGEN_fut_5_Val
  - DGEN_fut_10_Pref, DGEN_fut_10_Pers, DGEN_fut_10_Val

TARGET VARIABLES:
  - DGEN_past_5_mean
  - DGEN_past_10_mean
  - DGEN_fut_5_mean
  - DGEN_fut_10_mean

TRANSFORMATION LOGIC:
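  Each mean = (sum of non-missing Pref, Pers, Val scores) / (number of
  non-missing scores)
  - NA values are excluded from the calculation (na.rm = TRUE), so the divisor
    is 3 only when all three domain scores are present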
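
EXAMPLE (illustrative sketch, not the actual script):
  The effect of na.rm = TRUE on the mean can be seen with rowMeans() on toy
  data. The values below are invented; only the column names come from this
  README.

```r
# Toy DGEN scores for four respondents, with varying amounts of missing data.
df <- data.frame(
  DGEN_past_5_Pref = c(2, 4, NA, NA),
  DGEN_past_5_Pers = c(3, NA, NA, NA),
  DGEN_past_5_Val  = c(4, 6, 5, NA)
)

cols <- c("DGEN_past_5_Pref", "DGEN_past_5_Pers", "DGEN_past_5_Val")

# Row-wise mean over the non-missing domain scores.
df$DGEN_past_5_mean <- rowMeans(df[, cols], na.rm = TRUE)

print(df$DGEN_past_5_mean)
# Row 1 averages three scores, row 2 two scores, row 3 a single score.
# Row 4 has no scores at all, so base R returns NaN for it.
```

  Rows with all three domain scores missing yield NaN rather than NA in base
  R; is.na() treats both as missing. Whether dataP 04 converts NaN to NA is
  not documented here.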


================================================================================
SCRIPT 05: dataP 05 - recode scales VARS.r
================================================================================

PURPOSE:
  Processes two cognitive scales:
  1. AOT (Actively Open-minded Thinking): 8-item scale with reverse coding
  2. CRT (Cognitive Reflection Test): 3-item test with correct/intuitive scoring

VARIABLES CREATED: 3 total

SOURCE COLUMNS:
  AOT Scale:
    - aot_1, aot_2, aot_3, aot_4, aot_5, aot_6, aot_7, aot_8

  CRT Test:
    - crt_1, crt_2, crt_3

TARGET VARIABLES:
  - aot_total     (mean of 8 items with reverse coding)
  - crt_correct   (proportion of correct answers)
  - crt_int       (proportion of intuitive/incorrect answers)

TRANSFORMATION LOGIC:

  AOT Scale (aot_total):
    1. Items 4, 5, 6, 7 are reverse coded by multiplying by -1
    2. Calculate mean of all 8 items (with reverse coding applied)
    3. Original source values are NOT modified in the dataframe
    4. NA values excluded from calculation (na.rm = TRUE)
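
  Example sketch for aot_total (illustrative, not the actual script):
    The steps above could look like this in base R. The toy responses are
    invented, and the sketch assumes the AOT items are coded on a scale
    centered on 0, which is what makes multiplying by -1 a valid reversal.

```r
# Toy AOT responses for two respondents (assumed scale centered on 0).
df <- data.frame(
  aot_1 = c(2, 1),   aot_2 = c(1, NA), aot_3 = c(3, 2), aot_4 = c(-2, -1),
  aot_5 = c(-1, 0),  aot_6 = c(-3, NA), aot_7 = c(-2, 1), aot_8 = c(2, 3)
)

aot_cols      <- paste0("aot_", 1:8)
reverse_items <- paste0("aot_", 4:7)

# Work on a copy so the source columns in df stay untouched.
aot_scored <- df[, aot_cols]
aot_scored[, reverse_items] <- -1 * aot_scored[, reverse_items]

# Mean over the non-missing items, with reverse coding applied.
df$aot_total <- rowMeans(aot_scored, na.rm = TRUE)

print(df$aot_total)
```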

  CRT Correct (crt_correct):
    Correct answers:
      - crt_1: "5 cents"
      - crt_2: "5 minutes"
      - crt_3: "47 days"
    Calculation: (Number of correct answers) / (Number of non-missing answers)

  CRT Intuitive (crt_int):
    Intuitive (common incorrect) answers:
      - crt_1: "10 cents"
      - crt_2: "100 minutes"
      - crt_3: "24 days"
    Calculation: (Number of intuitive answers) / (Number of non-missing answers)

SPECIAL NOTES:
  - CRT scoring is case-insensitive and trims whitespace
  - Both CRT scores are proportions (0.00 to 1.00)
  - Empty/missing CRT responses are excluded from denominator
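
EXAMPLE (illustrative sketch, not the actual script):
  The CRT scoring rules above can be sketched with one helper applied to both
  answer keys. The function name crt_proportion() and the toy responses are
  hypothetical; the answer keys are the ones listed in this section.

```r
# Toy CRT responses: mixed case, trailing whitespace, empty text, and NA.
df <- data.frame(
  crt_1 = c("5 cents", "10 Cents ", ""),
  crt_2 = c("5 minutes", "100 minutes", NA),
  crt_3 = c("47 days", "47 days", "24 days"),
  stringsAsFactors = FALSE
)

correct_key   <- c(crt_1 = "5 cents",  crt_2 = "5 minutes",   crt_3 = "47 days")
intuitive_key <- c(crt_1 = "10 cents", crt_2 = "100 minutes", crt_3 = "24 days")

# Proportion of non-missing answers that match the key, ignoring case and
# surrounding whitespace. (Hypothetical helper name.)
crt_proportion <- function(data, key) {
  matches <- do.call(cbind, lapply(names(key), function(col) {
    answer <- tolower(trimws(data[[col]]))
    answer[!is.na(answer) & answer == ""] <- NA  # empty text counts as missing
    answer == tolower(key[[col]])                 # NA answers stay NA
  }))
  rowMeans(matches, na.rm = TRUE)
}

df$crt_correct <- crt_proportion(df, correct_key)
df$crt_int     <- crt_proportion(df, intuitive_key)

print(df[, c("crt_correct", "crt_int")])
```

  In this sketch a respondent with no CRT answers at all would receive NaN,
  because the denominator is zero.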


================================================================================
SCRIPT 06: dataP 06 - time interval differences.r
================================================================================

PURPOSE:
  Calculates absolute differences between responses at different time periods
  to measure perceived change for each of the 15 items.

VARIABLES CREATED: 90 total (6 difference types × 15 items)

SOURCE COLUMNS:
  - present_pref_read through present_val_justice (15 columns)
  - past_5_pref_read through past_5_val_justice (15 columns)
  - past_10_pref_read through past_10_val_justice (15 columns)
  - fut_5_pref_read through fut_5_val_justice (15 columns)
  - fut_10_pref_read through fut_10_val_justice (15 columns)

TARGET VARIABLES (by difference type):

  NPast_5 (Present vs Past 5 years) - 15 variables:
    Formula: |present - past_5|
    - NPast_5_pref_read, NPast_5_pref_music, NPast_5_pref_TV, NPast_5_pref_nap,
      NPast_5_pref_travel
    - NPast_5_pers_extravert, NPast_5_pers_critical, NPast_5_pers_dependable,
      NPast_5_pers_anxious, NPast_5_pers_complex
    - NPast_5_val_obey, NPast_5_val_trad, NPast_5_val_opinion,
      NPast_5_val_performance, NPast_5_val_justice

  NPast_10 (Present vs Past 10 years) - 15 variables:
    Formula: |present - past_10|
    - NPast_10_pref_read, NPast_10_pref_music, NPast_10_pref_TV,
      NPast_10_pref_nap, NPast_10_pref_travel
    - NPast_10_pers_extravert, NPast_10_pers_critical, NPast_10_pers_dependable,
      NPast_10_pers_anxious, NPast_10_pers_complex
    - NPast_10_val_obey, NPast_10_val_trad, NPast_10_val_opinion,
      NPast_10_val_performance, NPast_10_val_justice

  NFut_5 (Present vs Future 5 years) - 15 variables:
    Formula: |present - fut_5|
    - NFut_5_pref_read, NFut_5_pref_music, NFut_5_pref_TV, NFut_5_pref_nap,
      NFut_5_pref_travel
    - NFut_5_pers_extravert, NFut_5_pers_critical, NFut_5_pers_dependable,
      NFut_5_pers_anxious, NFut_5_pers_complex
    - NFut_5_val_obey, NFut_5_val_trad, NFut_5_val_opinion,
      NFut_5_val_performance, NFut_5_val_justice

  NFut_10 (Present vs Future 10 years) - 15 variables:
    Formula: |present - fut_10|
    - NFut_10_pref_read, NFut_10_pref_music, NFut_10_pref_TV, NFut_10_pref_nap,
      NFut_10_pref_travel
    - NFut_10_pers_extravert, NFut_10_pers_critical, NFut_10_pers_dependable,
      NFut_10_pers_anxious, NFut_10_pers_complex
    - NFut_10_val_obey, NFut_10_val_trad, NFut_10_val_opinion,
      NFut_10_val_performance, NFut_10_val_justice

  5.10past (Past 5 vs Past 10 years) - 15 variables:
    Formula: |past_5 - past_10|
    - 5.10past_pref_read, 5.10past_pref_music, 5.10past_pref_TV,
      5.10past_pref_nap, 5.10past_pref_travel
    - 5.10past_pers_extravert, 5.10past_pers_critical, 5.10past_pers_dependable,
      5.10past_pers_anxious, 5.10past_pers_complex
    - 5.10past_val_obey, 5.10past_val_trad, 5.10past_val_opinion,
      5.10past_val_performance, 5.10past_val_justice

  5.10fut (Future 5 vs Future 10 years) - 15 variables:
    Formula: |fut_5 - fut_10|
    - 5.10fut_pref_read, 5.10fut_pref_music, 5.10fut_pref_TV, 5.10fut_pref_nap,
      5.10fut_pref_travel
    - 5.10fut_pers_extravert, 5.10fut_pers_critical, 5.10fut_pers_dependable,
      5.10fut_pers_anxious, 5.10fut_pers_complex
    - 5.10fut_val_obey, 5.10fut_val_trad, 5.10fut_val_opinion,
      5.10fut_val_performance, 5.10fut_val_justice

TRANSFORMATION LOGIC:
  All calculations use absolute differences:
    - NPast_5: |present_[item] - past_5_[item]|
    - NPast_10: |present_[item] - past_10_[item]|
    - NFut_5: |present_[item] - fut_5_[item]|
    - NFut_10: |present_[item] - fut_10_[item]|
    - 5.10past: |past_5_[item] - past_10_[item]|
    - 5.10fut: |fut_5_[item] - fut_10_[item]|

  Result: Non-negative values representing magnitude of change (0 = no change)
  Missing values in either source column result in NA

SPECIAL NOTES:
  - Present time uses "pref_tv" (lowercase) while past/future use "pref_TV"
    (uppercase), so script handles this naming inconsistency
  - All values are absolute differences (non-negative)
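
EXAMPLE (illustrative sketch, not the actual script):
  The difference logic and the tv/TV naming quirk can be sketched for two
  items and three of the six difference types. The helper present_name() and
  the toy values are hypothetical; dataP 06 may handle the case mismatch
  differently.

```r
items <- c("pref_read", "pref_TV")  # two of the 15 items, for brevity

df <- data.frame(
  present_pref_read = c(2, 0, NA), present_pref_tv = c(1, -1, 3),
  past_5_pref_read  = c(-1, 0, 2), past_5_pref_TV  = c(1, 2, 3),
  past_10_pref_read = c(-3, 1, 2), past_10_pref_TV = c(0, NA, 1)
)

# Present columns use lowercase "tv"; map the item name accordingly.
# (Hypothetical helper name.)
present_name <- function(item) {
  paste0("present_", sub("pref_TV", "pref_tv", item, fixed = TRUE))
}

for (item in items) {
  present <- df[[present_name(item)]]
  past_5  <- df[[paste0("past_5_", item)]]
  past_10 <- df[[paste0("past_10_", item)]]

  # abs() of a difference is NA whenever either source value is NA.
  df[[paste0("NPast_5_", item)]]  <- abs(present - past_5)
  df[[paste0("NPast_10_", item)]] <- abs(present - past_10)
  df[[paste0("5.10past_", item)]] <- abs(past_5 - past_10)
}

# Names starting with a digit are non-syntactic in R, so they need
# [[ ]] or backticks when accessed.
print(df[["5.10past_pref_read"]])
print(df$`5.10past_pref_TV`)
```

  The same loop would extend to the future columns (NFut_5, NFut_10, 5.10fut)
  by swapping in the fut_5_ and fut_10_ source prefixes.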


================================================================================
SUMMARY OF ALL CREATED VARIABLES
================================================================================

Total Variables Created: 184

By Script:
  - Script 01: 60 variables (past/future recoded items)
  - Script 02: 15 variables (present recoded items)
  - Script 03: 12 variables (DGEN domain scores)
  - Script 04:  4 variables (DGEN means)
  - Script 05:  3 variables (AOT & CRT scales)
  - Script 06: 90 variables (time interval differences)

By Category:
  - Time Period Items (75 total):
      * Present: 15 items
      * Past 5: 15 items
      * Past 10: 15 items
      * Future 5: 15 items
      * Future 10: 15 items

  - DGEN Variables (16 total):
      * Domain scores: 12 (3 domains × 4 time periods)
      * Mean scores: 4 (1 per time period)

  - Cognitive Scales (3 total):
      * AOT total
      * CRT correct
      * CRT intuitive

  - Time Differences (90 total):
      * NPast_5: 15 differences
      * NPast_10: 15 differences
      * NFut_5: 15 differences
      * NFut_10: 15 differences
      * 5.10past: 15 differences
      * 5.10fut: 15 differences


================================================================================
DATA PROCESSING NOTES
================================================================================

1. PROCESSING ORDER:
   Scripts MUST be run in numerical order (01 → 06) as later scripts depend
   on variables created by earlier scripts.

2. SURVEY VERSION HANDLING:
   - Two survey versions (01 and 02) were used
   - Scripts 01 and 03 combine these versions
   - Preference given to version 01 when both exist

3. MISSING DATA:
   - Empty cells and NA values are preserved throughout processing
   - Calculations use na.rm=TRUE to exclude missing values from means
   - Difference calculations result in NA if either source value is missing

4. QUALITY ASSURANCE:
   - Each script includes QA checks with random row verification
   - Manual calculation checks confirm proper transformations
   - Column existence checks prevent errors from missing source data

5. FILE SAVING:
   - Most scripts save directly to eohi2.csv
   - Scripts 04 and 06 have commented-out write commands for review
   - Each script overwrites existing target columns if present

6. SPECIAL NAMING CONVENTIONS:
   - "pref_tv" vs "pref_TV" inconsistency maintained from source data
   - DGEN variables use underscores (DGEN_past_5_Pref)
   - Difference variables use descriptive prefixes (NPast_5_, 5.10past_)
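
EXAMPLE (illustrative sketch, not the actual QA code):
  The kind of checks described in note 4 could look like the following: a
  column existence check before processing, then a manual recalculation on
  randomly chosen rows. The toy data, the seed, and the number of rows checked
  are assumptions made for illustration.

```r
df <- data.frame(
  DGEN_past_5_Pref = c(2, 4, NA),
  DGEN_past_5_Pers = c(3, NA, 1),
  DGEN_past_5_Val  = c(4, 6, 5)
)

required <- c("DGEN_past_5_Pref", "DGEN_past_5_Pers", "DGEN_past_5_Val")

# Column existence check: stop early if any source column is absent.
missing_cols <- setdiff(required, names(df))
if (length(missing_cols) > 0) {
  stop("Missing source columns: ", paste(missing_cols, collapse = ", "))
}

df$DGEN_past_5_mean <- rowMeans(df[, required], na.rm = TRUE)

# Random row verification: recompute the mean by hand for sampled rows.
set.seed(1)  # fixed seed so the spot check is reproducible
check_rows <- sample(nrow(df), size = 2)
for (r in check_rows) {
  manual <- mean(unlist(df[r, required]), na.rm = TRUE)
  stopifnot(isTRUE(all.equal(manual, df$DGEN_past_5_mean[r])))
}
```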


================================================================================
ITEM REFERENCE GUIDE
================================================================================

15 Core Items (Used across all time periods):

PREFERENCES (5 items):
  1. pref_read    - Reading preferences
  2. pref_music   - Music preferences
  3. pref_TV/tv   - TV watching preferences (note case variation)
  4. pref_nap     - Napping preferences
  5. pref_travel  - Travel preferences

PERSONALITY (5 items):
  6. pers_extravert   - Extraverted personality
  7. pers_critical    - Critical thinking personality
  8. pers_dependable  - Dependable personality
  9. pers_anxious     - Anxious personality
 10. pers_complex     - Complex personality

VALUES (5 items):
 11. val_obey         - Value of obedience
 12. val_trad         - Value of tradition
 13. val_opinion      - Value of expressing opinions
 14. val_performance  - Value of performance
 15. val_justice      - Value of justice


================================================================================
END OF DOCUMENTATION
================================================================================
Last Updated: October 1, 2025