================================================================================
EOHI2 DATA PROCESSING PIPELINE - VARIABLE CREATION DOCUMENTATION
================================================================================
This README documents the complete data processing pipeline for eohi2.csv.
All processing scripts should be run in the order listed below.
Source File: eohi2.csv
Processing Scripts: dataP 01 through dataP 07
================================================================================
SCRIPT 01: dataP 01 - recode and combine past & future vars.r
================================================================================
PURPOSE:
Combines responses from two survey versions (01 and 02) and recodes Likert
scale text responses to numeric values for past and future time periods.
VARIABLES CREATED: 60 total (15 items × 4 time periods)
SOURCE COLUMNS:
- Set A: 01past5PrefItem_1 through 01fut10ValItem_5 (60 columns)
- Set B: 02past5PrefItem_1 through 02fut10ValItem_5 (60 columns)
TARGET VARIABLES:
Past 5 Years (15 variables):
- past_5_pref_read, past_5_pref_music, past_5_pref_TV, past_5_pref_nap,
past_5_pref_travel
- past_5_pers_extravert, past_5_pers_critical, past_5_pers_dependable,
past_5_pers_anxious, past_5_pers_complex
- past_5_val_obey, past_5_val_trad, past_5_val_opinion,
past_5_val_performance, past_5_val_justice
Past 10 Years (15 variables):
- past_10_pref_read, past_10_pref_music, past_10_pref_TV, past_10_pref_nap,
past_10_pref_travel
- past_10_pers_extravert, past_10_pers_critical, past_10_pers_dependable,
past_10_pers_anxious, past_10_pers_complex
- past_10_val_obey, past_10_val_trad, past_10_val_opinion,
past_10_val_performance, past_10_val_justice
Future 5 Years (15 variables):
- fut_5_pref_read, fut_5_pref_music, fut_5_pref_TV, fut_5_pref_nap,
fut_5_pref_travel
- fut_5_pers_extravert, fut_5_pers_critical, fut_5_pers_dependable,
fut_5_pers_anxious, fut_5_pers_complex
- fut_5_val_obey, fut_5_val_trad, fut_5_val_opinion,
fut_5_val_performance, fut_5_val_justice
Future 10 Years (15 variables):
- fut_10_pref_read, fut_10_pref_music, fut_10_pref_TV, fut_10_pref_nap,
fut_10_pref_travel
- fut_10_pers_extravert, fut_10_pers_critical, fut_10_pers_dependable,
fut_10_pers_anxious, fut_10_pers_complex
- fut_10_val_obey, fut_10_val_trad, fut_10_val_opinion,
fut_10_val_performance, fut_10_val_justice
TRANSFORMATION LOGIC:
Step 1: Combine responses from Set A (01) and Set B (02)
- If Set A has a value, use Set A
- If Set A is empty, use Set B
Step 2: Recode text responses to numeric values:
"Strongly Disagree" → -3
"Disagree" → -2
"Somewhat Disagree" → -1
"Neither Agree nor Disagree" → 0
"Somewhat Agree" → 1
"Agree" → 2
"Strongly Agree" → 3
Empty/Missing → NA
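The two steps above can be sketched in R as follows. This is an illustrative sketch, not the code from the script itself: the helper name combine_and_recode and the sample vectors are hypothetical, and it assumes empty responses arrive as "" or NA.

```r
# Likert labels mapped to the numeric codes listed above.
likert_map <- c(
  "Strongly Disagree"          = -3,
  "Disagree"                   = -2,
  "Somewhat Disagree"          = -1,
  "Neither Agree nor Disagree" =  0,
  "Somewhat Agree"             =  1,
  "Agree"                      =  2,
  "Strongly Agree"             =  3
)

# Hypothetical helper: prefer Set A, fall back to Set B, then recode.
combine_and_recode <- function(set_a, set_b) {
  a_empty  <- is.na(set_a) | trimws(set_a) == ""
  combined <- ifelse(a_empty, set_b, set_a)
  # Labels with no match (including "" and NA) become NA.
  unname(likert_map[trimws(combined)])
}

# Made-up responses standing in for one Set A / Set B column pair.
set_a <- c("Agree", "", NA, "Strongly Disagree")
set_b <- c("", "Somewhat Agree", "", "Agree")

past_5_pref_read <- combine_and_recode(set_a, set_b)
print(past_5_pref_read)
```

In the last element both sets hold a value, so the Set A response is the one that gets recoded.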
ITEM DOMAINS:
- Preferences (pref): Reading, Music, TV, Nap, Travel
- Personality (pers): Extravert, Critical, Dependable, Anxious, Complex
- Values (val): Obey, Tradition, Opinion, Performance, Justice
================================================================================
SCRIPT 02: dataP 02 - recode present VARS.r
================================================================================
PURPOSE:
Recodes present-time Likert scale text responses to numeric values.
VARIABLES CREATED: 15 total
SOURCE COLUMNS:
- prePrefItem_1 through prePrefItem_5 (5 columns)
- prePersItem_1 through prePersItem_5 (5 columns)
- preValItem_1 through preValItem_5 (5 columns)
TARGET VARIABLES:
Present Time (15 variables):
- present_pref_read, present_pref_music, present_pref_tv, present_pref_nap,
present_pref_travel
- present_pers_extravert, present_pers_critical, present_pers_dependable,
present_pers_anxious, present_pers_complex
- present_val_obey, present_val_trad, present_val_opinion,
present_val_performance, present_val_justice
TRANSFORMATION LOGIC:
Recode text responses to numeric values:
"Strongly Disagree" → -3
"Disagree" → -2
"Somewhat Disagree" → -1
"Neither Agree nor Disagree" → 0
"Somewhat Agree" → 1
"Agree" → 2
"Strongly Agree" → 3
Empty/Missing → NA
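A minimal sketch of applying the recode across several present-time columns at once. The pairing of source columns to target names (prePrefItem_1 to present_pref_read, and so on) is an assumption based on the order of the lists above, and the data frame is made up for illustration.

```r
likert_labels <- c("Strongly Disagree", "Disagree", "Somewhat Disagree",
                   "Neither Agree nor Disagree", "Somewhat Agree",
                   "Agree", "Strongly Agree")
likert_map <- setNames(-3:3, likert_labels)

# Assumed source-to-target pairing (item order is an assumption).
rename_map <- c(prePrefItem_1 = "present_pref_read",
                prePrefItem_2 = "present_pref_music",
                prePrefItem_3 = "present_pref_tv")

# Made-up responses for two participants.
df <- data.frame(
  prePrefItem_1 = c("Agree", "Strongly Disagree"),
  prePrefItem_2 = c("", "Somewhat Agree"),
  prePrefItem_3 = c("Neither Agree nor Disagree", NA),
  stringsAsFactors = FALSE
)

# Write each recoded column under its target name; sources stay untouched.
for (src in names(rename_map)) {
  df[[rename_map[[src]]]] <- unname(likert_map[trimws(df[[src]])])
}
print(df[, unname(rename_map)])
```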
SPECIAL NOTE:
Present time uses "present_pref_tv" (lowercase) while past/future use
"past_5_pref_TV" (uppercase). This is intentional and preserved from the
original data structure.
================================================================================
SCRIPT 03: dataP 03 - recode DGEN vars.r
================================================================================
PURPOSE:
Combines DGEN (domain-general) responses from two survey versions (01 and 02).
These are single-item measures for each domain/time combination.
NO RECODING: numeric values are copied as-is.
VARIABLES CREATED: 12 total (3 domains × 4 time periods)
SOURCE COLUMNS:
- Set A: 01past5PrefDGEN_1, 01past5PersDGEN_1, 01past5ValDGEN_1, etc.
- Set B: 02past5PrefDGEN_1, 02past5PersDGEN_1, 02past5ValDGEN_1, etc.
TARGET VARIABLES:
- DGEN_past_5_Pref, DGEN_past_5_Pers, DGEN_past_5_Val
- DGEN_past_10_Pref, DGEN_past_10_Pers, DGEN_past_10_Val
- DGEN_fut_5_Pref, DGEN_fut_5_Pers, DGEN_fut_5_Val
- DGEN_fut_10_Pref, DGEN_fut_10_Pers, DGEN_fut_10_Val
TRANSFORMATION LOGIC:
- If Set A (01) has a value, use Set A
- If Set A is empty, use Set B (02)
- NO RECODING: Values are copied directly as numeric
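The fallback rule above, sketched for numeric columns. The helper name coalesce_sets and the short column names a and b are hypothetical stand-ins, and the sketch assumes empty cells have been read in as NA.

```r
# Hypothetical helper: take Set A where present, otherwise Set B.
coalesce_sets <- function(set_a, set_b) {
  ifelse(is.na(set_a), set_b, set_a)
}

df <- data.frame(
  a = c(4, NA, NA),  # stands in for a Set A column such as 01past5PrefDGEN_1
  b = c(NA, 7, NA)   # stands in for the matching Set B column
)

# Values are copied as-is; no recoding is applied.
df$DGEN_past_5_Pref <- coalesce_sets(df$a, df$b)
print(df$DGEN_past_5_Pref)
```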
SPECIAL NOTES:
- Future columns in raw data use "_8" suffix for Pref/Pers items
- Future Val columns use "ValuesDGEN" spelling in Set A, "ValDGEN" in Set B
================================================================================
SCRIPT 04: dataP 04 - DGEN means.r
================================================================================
PURPOSE:
Calculates mean DGEN scores by averaging the three domain scores (Preferences,
Personality, Values) for each time period.
VARIABLES CREATED: 4 total (1 per time period)
SOURCE COLUMNS:
- DGEN_past_5_Pref, DGEN_past_5_Pers, DGEN_past_5_Val
- DGEN_past_10_Pref, DGEN_past_10_Pers, DGEN_past_10_Val
- DGEN_fut_5_Pref, DGEN_fut_5_Pers, DGEN_fut_5_Val
- DGEN_fut_10_Pref, DGEN_fut_10_Pers, DGEN_fut_10_Val
TARGET VARIABLES:
- DGEN_past_5_mean
- DGEN_past_10_mean
- DGEN_fut_5_mean
- DGEN_fut_10_mean
TRANSFORMATION LOGIC:
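Each mean = (Pref + Pers + Val) / 3 when all three domain scores are present
- NA values are excluded from calculation (na.rm = TRUE), so a row with a
missing domain score is averaged over its non-missing scores only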
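A small sketch of the row-wise mean with na.rm = TRUE, using made-up values. It uses base R rowMeans, which may or may not be the function the script calls.

```r
df <- data.frame(
  DGEN_past_5_Pref = c(6, 4, NA),
  DGEN_past_5_Pers = c(3, NA, NA),
  DGEN_past_5_Val  = c(9, 8, NA)
)
dgen_cols <- c("DGEN_past_5_Pref", "DGEN_past_5_Pers", "DGEN_past_5_Val")

# Row 1 averages three scores, row 2 averages the two that are present.
df$DGEN_past_5_mean <- rowMeans(df[, dgen_cols], na.rm = TRUE)
print(df$DGEN_past_5_mean)
```

With rowMeans, a row in which all three scores are missing comes out as NaN rather than NA, which is worth keeping in mind when filtering on missing values afterwards.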
================================================================================
SCRIPT 05: dataP 05 - recode scales VARS.r
================================================================================
PURPOSE:
Processes two cognitive scales:
1. AOT (Actively Open-minded Thinking): 8-item scale with reverse coding
2. CRT (Cognitive Reflection Test): 3-item test with correct/intuitive scoring
VARIABLES CREATED: 3 total
SOURCE COLUMNS:
AOT Scale:
- aot_1, aot_2, aot_3, aot_4, aot_5, aot_6, aot_7, aot_8
CRT Test:
- crt_1, crt_2, crt_3
TARGET VARIABLES:
- aot_total (mean of 8 items with reverse coding)
- crt_correct (proportion of correct answers)
- crt_int (proportion of intuitive, i.e. common incorrect, answers)
TRANSFORMATION LOGIC:
AOT Scale (aot_total):
1. Items 4, 5, 6, 7 are reverse coded by multiplying by -1
2. Calculate mean of all 8 items (with reverse coding applied)
3. Original source values are NOT modified in the dataframe
4. NA values excluded from calculation (na.rm = TRUE)
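The AOT steps above might look like this in R. This is a sketch under assumptions: the responses are taken to be numeric already and centred on zero (so that multiplying by -1 reverses an item), and the values are made up.

```r
df <- data.frame(
  aot_1 = c(2, 1),   aot_2 = c(1, NA),  aot_3 = c(3, 2),  aot_4 = c(-2, -1),
  aot_5 = c(-1, 0),  aot_6 = c(-3, -2), aot_7 = c(0, 1),  aot_8 = c(2, 3)
)
aot_items    <- paste0("aot_", 1:8)
reverse_cols <- paste0("aot_", 4:7)

# Work on a copy so the source columns in df are not modified.
aot_scored <- df[, aot_items]
aot_scored[, reverse_cols] <- -1 * aot_scored[, reverse_cols]

# Mean over the items that were answered.
df$aot_total <- rowMeans(aot_scored, na.rm = TRUE)
print(df$aot_total)
```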
CRT Correct (crt_correct):
Correct answers:
- crt_1: "5 cents"
- crt_2: "5 minutes"
- crt_3: "47 days"
Calculation: (Number of correct answers) / (Number of non-missing answers)
CRT Intuitive (crt_int):
Intuitive (common incorrect) answers:
- crt_1: "10 cents"
- crt_2: "100 minutes"
- crt_3: "24 days"
Calculation: (Number of intuitive answers) / (Number of non-missing answers)
SPECIAL NOTES:
- CRT scoring is case-insensitive and trims whitespace
- Both CRT scores are proportions (0.00 to 1.00)
- Empty/missing CRT responses are excluded from denominator
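A sketch of the CRT scoring for a single participant, following the rules above (case-insensitive, whitespace trimmed, missing answers dropped from the denominator). The helper name crt_proportion and the sample responses are hypothetical.

```r
# Hypothetical helper: share of answered items that match a key.
crt_proportion <- function(answers, key) {
  cleaned  <- tolower(trimws(answers))
  answered <- !is.na(cleaned) & cleaned != ""
  sum(cleaned[answered] == key[answered]) / sum(answered)
}

correct_key   <- c("5 cents", "5 minutes", "47 days")
intuitive_key <- c("10 cents", "100 minutes", "24 days")

# One participant's crt_1, crt_2, crt_3 responses (third left empty).
responses <- c(" 5 Cents", "100 minutes", "")

crt_correct <- crt_proportion(responses, correct_key)
crt_int     <- crt_proportion(responses, intuitive_key)
print(c(crt_correct = crt_correct, crt_int = crt_int))
```

In this sketch a participant with no answered items yields 0 / 0, which R evaluates to NaN.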
================================================================================
SCRIPT 06: dataP 06 - time interval differences.r
================================================================================
PURPOSE:
Calculates absolute differences between time intervals to measure perceived
change across time periods for all 15 items.
VARIABLES CREATED: 90 total (6 difference types × 15 items)
SOURCE COLUMNS:
- present_pref_read through present_val_justice (15 columns)
- past_5_pref_read through past_5_val_justice (15 columns)
- past_10_pref_read through past_10_val_justice (15 columns)
- fut_5_pref_read through fut_5_val_justice (15 columns)
- fut_10_pref_read through fut_10_val_justice (15 columns)
TARGET VARIABLES (by difference type):
NPast_5 (Present vs Past 5 years) - 15 variables:
Formula: |present - past_5|
- NPast_5_pref_read, NPast_5_pref_music, NPast_5_pref_TV, NPast_5_pref_nap,
NPast_5_pref_travel
- NPast_5_pers_extravert, NPast_5_pers_critical, NPast_5_pers_dependable,
NPast_5_pers_anxious, NPast_5_pers_complex
- NPast_5_val_obey, NPast_5_val_trad, NPast_5_val_opinion,
NPast_5_val_performance, NPast_5_val_justice
NPast_10 (Present vs Past 10 years) - 15 variables:
Formula: |present - past_10|
- NPast_10_pref_read, NPast_10_pref_music, NPast_10_pref_TV,
NPast_10_pref_nap, NPast_10_pref_travel
- NPast_10_pers_extravert, NPast_10_pers_critical, NPast_10_pers_dependable,
NPast_10_pers_anxious, NPast_10_pers_complex
- NPast_10_val_obey, NPast_10_val_trad, NPast_10_val_opinion,
NPast_10_val_performance, NPast_10_val_justice
NFut_5 (Present vs Future 5 years) - 15 variables:
Formula: |present - fut_5|
- NFut_5_pref_read, NFut_5_pref_music, NFut_5_pref_TV, NFut_5_pref_nap,
NFut_5_pref_travel
- NFut_5_pers_extravert, NFut_5_pers_critical, NFut_5_pers_dependable,
NFut_5_pers_anxious, NFut_5_pers_complex
- NFut_5_val_obey, NFut_5_val_trad, NFut_5_val_opinion,
NFut_5_val_performance, NFut_5_val_justice
NFut_10 (Present vs Future 10 years) - 15 variables:
Formula: |present - fut_10|
- NFut_10_pref_read, NFut_10_pref_music, NFut_10_pref_TV, NFut_10_pref_nap,
NFut_10_pref_travel
- NFut_10_pers_extravert, NFut_10_pers_critical, NFut_10_pers_dependable,
NFut_10_pers_anxious, NFut_10_pers_complex
- NFut_10_val_obey, NFut_10_val_trad, NFut_10_val_opinion,
NFut_10_val_performance, NFut_10_val_justice
5.10past (Past 5 vs Past 10 years) - 15 variables:
Formula: |past_5 - past_10|
- 5.10past_pref_read, 5.10past_pref_music, 5.10past_pref_TV,
5.10past_pref_nap, 5.10past_pref_travel
- 5.10past_pers_extravert, 5.10past_pers_critical, 5.10past_pers_dependable,
5.10past_pers_anxious, 5.10past_pers_complex
- 5.10past_val_obey, 5.10past_val_trad, 5.10past_val_opinion,
5.10past_val_performance, 5.10past_val_justice
5.10fut (Future 5 vs Future 10 years) - 15 variables:
Formula: |fut_5 - fut_10|
- 5.10fut_pref_read, 5.10fut_pref_music, 5.10fut_pref_TV, 5.10fut_pref_nap,
5.10fut_pref_travel
- 5.10fut_pers_extravert, 5.10fut_pers_critical, 5.10fut_pers_dependable,
5.10fut_pers_anxious, 5.10fut_pers_complex
- 5.10fut_val_obey, 5.10fut_val_trad, 5.10fut_val_opinion,
5.10fut_val_performance, 5.10fut_val_justice
TRANSFORMATION LOGIC:
All calculations use absolute differences:
- NPast_5: |present_[item] - past_5_[item]|
- NPast_10: |present_[item] - past_10_[item]|
- NFut_5: |present_[item] - fut_5_[item]|
- NFut_10: |present_[item] - fut_10_[item]|
- 5.10past: |past_5_[item] - past_10_[item]|
- 5.10fut: |fut_5_[item] - fut_10_[item]|
Result: Always non-negative values representing magnitude of change
Missing values in either source column result in NA
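The absolute-difference formulas can be sketched as below, using made-up values for the TV item. Two details are illustrated: the present column uses lowercase "tv", and names beginning with a digit (such as 5.10past_pref_TV) are non-syntactic in R, so they need [[ ]] or backticks. How the actual script addresses its columns is not shown here.

```r
df <- data.frame(
  present_pref_tv = c(2, -1, NA),
  past_5_pref_TV  = c(-1, -1, 3),
  past_10_pref_TV = c(-3, 0, 1)
)

# Present vs past 5: note the lowercase "tv" on the present column.
df$NPast_5_pref_TV <- abs(df$present_pref_tv - df$past_5_pref_TV)

# Past 5 vs past 10: the target name starts with a digit, so use [[ ]].
df[["5.10past_pref_TV"]] <- abs(df$past_5_pref_TV - df$past_10_pref_TV)

print(df[, c("NPast_5_pref_TV", "5.10past_pref_TV")])
```

The third row has no present-time value, so its NPast_5 difference is NA, while its 5.10past difference is still computed.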
SPECIAL NOTES:
- Present time uses "pref_tv" (lowercase) while past/future use "pref_TV"
(uppercase); the script handles this naming inconsistency, and the
resulting difference variables use the uppercase "pref_TV" form
- All values are absolute differences (non-negative)
================================================================================
SCRIPT 07: dataP 07 - domain means.r
================================================================================
PURPOSE:
Calculates domain-level means by averaging the 5 items within each domain
(Preferences, Personality, Values) for each of the 6 time interval difference
types.
VARIABLES CREATED: 18 total (6 time intervals × 3 domains)
SOURCE COLUMNS:
- NPast_5_pref_read through NPast_5_val_justice (15 columns)
- NPast_10_pref_read through NPast_10_val_justice (15 columns)
- NFut_5_pref_read through NFut_5_val_justice (15 columns)
- NFut_10_pref_read through NFut_10_val_justice (15 columns)
- 5.10past_pref_read through 5.10past_val_justice (15 columns)
- 5.10fut_pref_read through 5.10fut_val_justice (15 columns)
Total: 90 difference columns (created in Script 06)
TARGET VARIABLES:
NPast_5 Domain Means (3 variables):
- NPast_5_pref_MEAN (mean of 5 preference items)
- NPast_5_pers_MEAN (mean of 5 personality items)
- NPast_5_val_MEAN (mean of 5 values items)
NPast_10 Domain Means (3 variables):
- NPast_10_pref_MEAN
- NPast_10_pers_MEAN
- NPast_10_val_MEAN
NFut_5 Domain Means (3 variables):
- NFut_5_pref_MEAN
- NFut_5_pers_MEAN
- NFut_5_val_MEAN
NFut_10 Domain Means (3 variables):
- NFut_10_pref_MEAN
- NFut_10_pers_MEAN
- NFut_10_val_MEAN
5.10past Domain Means (3 variables):
- 5.10past_pref_MEAN
- 5.10past_pers_MEAN
- 5.10past_val_MEAN
5.10fut Domain Means (3 variables):
- 5.10fut_pref_MEAN
- 5.10fut_pers_MEAN
- 5.10fut_val_MEAN
TRANSFORMATION LOGIC:
Each domain mean = average of 5 items within that domain
Example for NPast_5_pref_MEAN:
= mean(NPast_5_pref_read, NPast_5_pref_music, NPast_5_pref_TV,
NPast_5_pref_nap, NPast_5_pref_travel)
Example for NFut_10_pers_MEAN:
= mean(NFut_10_pers_extravert, NFut_10_pers_critical,
NFut_10_pers_dependable, NFut_10_pers_anxious,
NFut_10_pers_complex)
NA values excluded from calculation (na.rm = TRUE)
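The mean(...) expressions above are shorthand for a row-wise average; in R, mean(a, b, c) does not average its arguments, so a row-wise function is needed. A sketch using rowMeans follows. The helper name domain_mean, the item lists held in domain_items, and the data are illustrative assumptions.

```r
prefixes <- c("NPast_5", "NPast_10", "NFut_5", "NFut_10", "5.10past", "5.10fut")
domain_items <- list(
  pref = c("read", "music", "TV", "nap", "travel"),
  pers = c("extravert", "critical", "dependable", "anxious", "complex"),
  val  = c("obey", "trad", "opinion", "performance", "justice")
)

# Hypothetical helper: row-wise mean of the 5 items in one domain.
domain_mean <- function(df, prefix, domain) {
  cols <- paste0(prefix, "_", domain, "_", domain_items[[domain]])
  rowMeans(df[, cols], na.rm = TRUE)
}

# Made-up difference scores for two participants.
df <- data.frame(
  NPast_5_pref_read   = c(1, 0),
  NPast_5_pref_music  = c(2, NA),
  NPast_5_pref_TV     = c(0, 3),
  NPast_5_pref_nap    = c(3, 1),
  NPast_5_pref_travel = c(4, NA)
)
df[["NPast_5_pref_MEAN"]] <- domain_mean(df, "NPast_5", "pref")

# All 18 target names: 6 interval prefixes x 3 domains.
target_names <- as.vector(outer(prefixes, names(domain_items),
                                function(p, d) paste0(p, "_", d, "_MEAN")))
print(length(target_names))
```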
PURPOSE OF DOMAIN MEANS:
- Provides higher-level summary of perceived change by domain
- Reduces item-level noise by aggregating across related items
- Enables domain-level comparisons across time intervals
- Parallel to Script 04 (DGEN means) but for difference scores instead of
raw DGEN ratings
SPECIAL NOTES:
- This script depends on Script 06 being run first
- Creates domain-level aggregates of absolute difference scores
- All means are averages of non-negative values (absolute differences)
================================================================================
SUMMARY OF ALL CREATED VARIABLES
================================================================================
Total Variables Created: 202
By Script:
- Script 01: 60 variables (past/future recoded items)
- Script 02: 15 variables (present recoded items)
- Script 03: 12 variables (DGEN domain scores)
- Script 04: 4 variables (DGEN means)
- Script 05: 3 variables (AOT & CRT scales)
- Script 06: 90 variables (time interval differences)
- Script 07: 18 variables (domain means for differences)
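As a quick arithmetic check, the per-script counts above add up to the stated total. The vector name created is just for illustration.

```r
# Counts per script, written out as the products given in each section.
created <- c(
  script_01 = 15 * 4,  # 15 items x 4 time periods
  script_02 = 15,      # present-time items
  script_03 = 3 * 4,   # 3 domains x 4 time periods
  script_04 = 4,       # 1 DGEN mean per time period
  script_05 = 3,       # aot_total, crt_correct, crt_int
  script_06 = 6 * 15,  # 6 difference types x 15 items
  script_07 = 6 * 3    # 6 difference types x 3 domains
)
total_created <- sum(created)
print(total_created)
```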
By Category:
- Time Period Items (75 total):
* Present: 15 items
* Past 5: 15 items
* Past 10: 15 items
* Future 5: 15 items
* Future 10: 15 items
- DGEN Variables (16 total):
* Domain scores: 12 (3 domains × 4 time periods)
* Mean scores: 4 (1 per time period)
- Cognitive Scales (3 total):
* AOT total
* CRT correct
* CRT intuitive
- Time Differences (90 total):
* NPast_5: 15 differences
* NPast_10: 15 differences
* NFut_5: 15 differences
* NFut_10: 15 differences
* 5.10past: 15 differences
* 5.10fut: 15 differences
- Domain Means for Differences (18 total):
* NPast_5: 3 domain means
* NPast_10: 3 domain means
* NFut_5: 3 domain means
* NFut_10: 3 domain means
* 5.10past: 3 domain means
* 5.10fut: 3 domain means
================================================================================
DATA PROCESSING NOTES
================================================================================
1. PROCESSING ORDER:
Scripts MUST be run in numerical order (01 → 07) as later scripts depend
on variables created by earlier scripts.
2. SURVEY VERSION HANDLING:
- Two survey versions (01 and 02) were used
- Scripts 01 and 03 combine these versions
- Version 01 takes precedence when both versions have a value
3. MISSING DATA:
- Empty cells and NA values are preserved throughout processing
- Calculations use na.rm = TRUE to exclude missing values from means
- Difference calculations result in NA if either source value is missing
4. QUALITY ASSURANCE:
- Each script includes QA checks with random row verification
- Manual calculation checks confirm proper transformations
- Column existence checks prevent errors from missing source data
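The kinds of checks listed above could be sketched as follows. This is not the QA code from the scripts: the helper name check_columns, the seed, and the sample data are assumptions made for illustration.

```r
# Hypothetical helper: stop early when required source columns are absent.
check_columns <- function(df, required) {
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing source columns: ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

df <- data.frame(
  present_pref_read = c(2, -1, 0, 3),
  past_5_pref_read  = c(0, -1, NA, -2)
)
check_columns(df, c("present_pref_read", "past_5_pref_read"))
df$NPast_5_pref_read <- abs(df$present_pref_read - df$past_5_pref_read)

# Spot-check a random sample of rows against a manual recomputation.
set.seed(1)  # arbitrary seed so the sample is repeatable
rows   <- sample(nrow(df), 2)
manual <- abs(df$present_pref_read[rows] - df$past_5_pref_read[rows])
qa_ok  <- isTRUE(all.equal(manual, df$NPast_5_pref_read[rows]))
print(qa_ok)
```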
5. FILE SAVING:
- Most scripts save directly to eohi2.csv
- Scripts 04, 06, and 07 have commented-out write commands for review
- Each script overwrites existing target columns if present
6. SPECIAL NAMING CONVENTIONS:
- "pref_tv" vs "pref_TV" inconsistency maintained from source data
- DGEN variables use underscores (DGEN_past_5_Pref)
- Difference variables use descriptive prefixes (NPast_5_, 5.10past_)
================================================================================
ITEM REFERENCE GUIDE
================================================================================
15 Core Items (Used across all time periods):
PREFERENCES (5 items):
1. pref_read - Reading preferences
2. pref_music - Music preferences
3. pref_TV/tv - TV watching preferences (note case variation)
4. pref_nap - Napping preferences
5. pref_travel - Travel preferences
PERSONALITY (5 items):
6. pers_extravert - Extraverted personality
7. pers_critical - Critical thinking personality
8. pers_dependable - Dependable personality
9. pers_anxious - Anxious personality
10. pers_complex - Complex personality
VALUES (5 items):
11. val_obey - Value of obedience
12. val_trad - Value of tradition
13. val_opinion - Value of expressing opinions
14. val_performance - Value of performance
15. val_justice - Value of justice
================================================================================
END OF DOCUMENTATION
================================================================================
Last Updated: October 1, 2025