================================================================================
EOHI2 DATA PROCESSING PIPELINE - VARIABLE CREATION DOCUMENTATION
================================================================================

This README documents the complete data processing pipeline for eohi2.csv.
All processing scripts should be run in the order listed below.

Source File: eohi2.csv
Processing Scripts: dataP 01 through dataP 06

================================================================================
SCRIPT 01: dataP 01 - recode and combine past & future vars.r
================================================================================

PURPOSE:
Combines responses from two survey versions (01 and 02) and recodes Likert
scale text responses to numeric values for past and future time periods.

VARIABLES CREATED: 60 total (15 items × 4 time periods)

SOURCE COLUMNS:
- Set A: 01past5PrefItem_1 through 01fut10ValItem_5 (60 columns)
- Set B: 02past5PrefItem_1 through 02fut10ValItem_5 (60 columns)

TARGET VARIABLES:
Past 5 Years (15 variables):
- past_5_pref_read, past_5_pref_music, past_5_pref_TV, past_5_pref_nap,
  past_5_pref_travel
- past_5_pers_extravert, past_5_pers_critical, past_5_pers_dependable,
  past_5_pers_anxious, past_5_pers_complex
- past_5_val_obey, past_5_val_trad, past_5_val_opinion,
  past_5_val_performance, past_5_val_justice

Past 10 Years (15 variables):
- past_10_pref_read, past_10_pref_music, past_10_pref_TV, past_10_pref_nap,
  past_10_pref_travel
- past_10_pers_extravert, past_10_pers_critical, past_10_pers_dependable,
  past_10_pers_anxious, past_10_pers_complex
- past_10_val_obey, past_10_val_trad, past_10_val_opinion,
  past_10_val_performance, past_10_val_justice

Future 5 Years (15 variables):
- fut_5_pref_read, fut_5_pref_music, fut_5_pref_TV, fut_5_pref_nap,
  fut_5_pref_travel
- fut_5_pers_extravert, fut_5_pers_critical, fut_5_pers_dependable,
  fut_5_pers_anxious, fut_5_pers_complex
- fut_5_val_obey, fut_5_val_trad, fut_5_val_opinion,
  fut_5_val_performance, fut_5_val_justice

Future 10 Years (15 variables):
- fut_10_pref_read, fut_10_pref_music, fut_10_pref_TV, fut_10_pref_nap,
  fut_10_pref_travel
- fut_10_pers_extravert, fut_10_pers_critical, fut_10_pers_dependable,
  fut_10_pers_anxious, fut_10_pers_complex
- fut_10_val_obey, fut_10_val_trad, fut_10_val_opinion,
  fut_10_val_performance, fut_10_val_justice

TRANSFORMATION LOGIC:
Step 1: Combine responses from Set A (01) and Set B (02)
- If Set A has a value, use Set A
- If Set A is empty, use Set B

Step 2: Recode text responses to numeric values:
  "Strongly Disagree"          → -3
  "Disagree"                   → -2
  "Somewhat Disagree"          → -1
  "Neither Agree nor Disagree" → 0
  "Somewhat Agree"             → 1
  "Agree"                      → 2
  "Strongly Agree"             → 3
  Empty/Missing                → NA
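
The two steps above can be sketched in base R as follows. This is only an
illustration, not the code in dataP 01: the helper name combine_and_recode,
the toy data, and the pairing of PrefItem_1 with the "read" item are
assumptions made for the example.

```r
# Likert labels mapped to the numeric codes listed above.
likert_map <- c(
  "Strongly Disagree"          = -3,
  "Disagree"                   = -2,
  "Somewhat Disagree"          = -1,
  "Neither Agree nor Disagree" =  0,
  "Somewhat Agree"             =  1,
  "Agree"                      =  2,
  "Strongly Agree"             =  3
)

# Hypothetical helper: prefer Set A, fall back to Set B, then recode.
combine_and_recode <- function(set_a, set_b) {
  a <- trimws(as.character(set_a))
  b <- trimws(as.character(set_b))
  a_empty <- is.na(a) | a == ""
  combined <- ifelse(a_empty, b, a)
  # Labels not found in the map (including "" and NA) come back as NA.
  unname(likert_map[combined])
}

# Toy data standing in for one Set A / Set B column pair.
df <- data.frame(
  `01past5PrefItem_1` = c("Agree", "", NA, "Strongly Disagree"),
  `02past5PrefItem_1` = c("", "Somewhat Agree", "", ""),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Assumed mapping of item 1 to the "read" target variable.
df$past_5_pref_read <- combine_and_recode(df[["01past5PrefItem_1"]],
                                          df[["02past5PrefItem_1"]])
print(df$past_5_pref_read)
```

Looking labels up in a named vector means any unexpected spelling also becomes
NA, so it is worth tabulating the raw text before recoding.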

ITEM DOMAINS:
- Preferences (pref): Reading, Music, TV, Nap, Travel
- Personality (pers): Extravert, Critical, Dependable, Anxious, Complex
- Values (val): Obey, Tradition, Opinion, Performance, Justice


================================================================================
SCRIPT 02: dataP 02 - recode present VARS.r
================================================================================

PURPOSE:
Recodes present-time Likert scale text responses to numeric values.

VARIABLES CREATED: 15 total

SOURCE COLUMNS:
- prePrefItem_1 through prePrefItem_5 (5 columns)
- prePersItem_1 through prePersItem_5 (5 columns)
- preValItem_1 through preValItem_5 (5 columns)

TARGET VARIABLES:
Present Time (15 variables):
- present_pref_read, present_pref_music, present_pref_tv, present_pref_nap,
  present_pref_travel
- present_pers_extravert, present_pers_critical, present_pers_dependable,
  present_pers_anxious, present_pers_complex
- present_val_obey, present_val_trad, present_val_opinion,
  present_val_performance, present_val_justice

TRANSFORMATION LOGIC:
Recode text responses to numeric values:
  "Strongly Disagree"          → -3
  "Disagree"                   → -2
  "Somewhat Disagree"          → -1
  "Neither Agree nor Disagree" → 0
  "Somewhat Agree"             → 1
  "Agree"                      → 2
  "Strongly Agree"             → 3
  Empty/Missing                → NA

SPECIAL NOTE:
Present time uses "present_pref_tv" (lowercase) while past/future use
"past_5_pref_TV" (uppercase). This is intentional and preserved from the
original data structure.


================================================================================
SCRIPT 03: dataP 03 - recode DGEN vars.r
================================================================================

PURPOSE:
Combines DGEN (domain general) responses from two survey versions (01 and 02).
These are single-item measures for each domain/time combination.
NO RECODING - just copies numeric values as-is.

VARIABLES CREATED: 12 total (3 domains × 4 time periods)

SOURCE COLUMNS:
- Set A: 01past5PrefDGEN_1, 01past5PersDGEN_1, 01past5ValDGEN_1, etc.
- Set B: 02past5PrefDGEN_1, 02past5PersDGEN_1, 02past5ValDGEN_1, etc.

TARGET VARIABLES:
- DGEN_past_5_Pref, DGEN_past_5_Pers, DGEN_past_5_Val
- DGEN_past_10_Pref, DGEN_past_10_Pers, DGEN_past_10_Val
- DGEN_fut_5_Pref, DGEN_fut_5_Pers, DGEN_fut_5_Val
- DGEN_fut_10_Pref, DGEN_fut_10_Pers, DGEN_fut_10_Val

TRANSFORMATION LOGIC:
- If Set A (01) has a value, use Set A
- If Set A is empty, use Set B (02)
- NO RECODING: Values are copied directly as numeric

SPECIAL NOTES:
- Future columns in raw data use "_8" suffix for Pref/Pers items
- Future Val columns use "ValuesDGEN" spelling in Set A, "ValDGEN" in Set B
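
Because the raw column names are irregular, one way to keep the combining step
readable is an explicit lookup table from target name to raw names. The sketch
below is an assumption about how this could be done, not the code in dataP 03.
It lists only the past-5 rows, whose raw names appear above; the future rows
would carry their own raw names in the same table. The helper combine_dgen
and the toy values are hypothetical, and "empty" is assumed to mean NA after
the CSV is read.

```r
# Lookup of target name -> raw Set A / Set B column names (past 5 only).
dgen_map <- data.frame(
  target = c("DGEN_past_5_Pref", "DGEN_past_5_Pers", "DGEN_past_5_Val"),
  set_a  = c("01past5PrefDGEN_1", "01past5PersDGEN_1", "01past5ValDGEN_1"),
  set_b  = c("02past5PrefDGEN_1", "02past5PersDGEN_1", "02past5ValDGEN_1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: copy Set A where present, otherwise Set B. No recoding.
combine_dgen <- function(df, map) {
  for (i in seq_len(nrow(map))) {
    a <- df[[map$set_a[i]]]
    b <- df[[map$set_b[i]]]
    df[[map$target[i]]] <- as.numeric(ifelse(is.na(a), b, a))
  }
  df
}

# Toy data: each respondent answered at most one survey version per item.
df <- data.frame(
  `01past5PrefDGEN_1` = c(4, NA, NA), `02past5PrefDGEN_1` = c(NA, 7, NA),
  `01past5PersDGEN_1` = c(2, NA, 5),  `02past5PersDGEN_1` = c(NA, 3, NA),
  `01past5ValDGEN_1`  = c(NA, NA, 1), `02past5ValDGEN_1`  = c(6, NA, NA),
  check.names = FALSE
)

df <- combine_dgen(df, dgen_map)
print(df[, dgen_map$target])
```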


================================================================================
SCRIPT 04: dataP 04 - DGEN means.r
================================================================================

PURPOSE:
Calculates mean DGEN scores by averaging the three domain scores (Preferences,
Personality, Values) for each time period.

VARIABLES CREATED: 4 total (1 per time period)

SOURCE COLUMNS:
- DGEN_past_5_Pref, DGEN_past_5_Pers, DGEN_past_5_Val
- DGEN_past_10_Pref, DGEN_past_10_Pers, DGEN_past_10_Val
- DGEN_fut_5_Pref, DGEN_fut_5_Pers, DGEN_fut_5_Val
- DGEN_fut_10_Pref, DGEN_fut_10_Pers, DGEN_fut_10_Val

TARGET VARIABLES:
- DGEN_past_5_mean
- DGEN_past_10_mean
- DGEN_fut_5_mean
- DGEN_fut_10_mean
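
TRANSFORMATION LOGIC:
Each mean = mean of the available domain scores (Pref, Pers, Val)
- With all three scores present, this is (Pref + Pers + Val) / 3
- NA values are excluded from the calculation (na.rm = TRUE), so the divisor
  is the number of non-missing domain scores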
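
A minimal sketch of this calculation with rowMeans() is shown below. The
helper name add_dgen_means and the toy values are assumptions. Note that
rowMeans(..., na.rm = TRUE) returns NaN for a row where all three scores are
missing; the sketch converts that to NA, which may or may not match what
dataP 04 does.

```r
# Hypothetical helper: one mean column per time period.
add_dgen_means <- function(df,
                           periods = c("past_5", "past_10", "fut_5", "fut_10")) {
  for (p in periods) {
    cols <- paste0("DGEN_", p, "_", c("Pref", "Pers", "Val"))
    m <- rowMeans(df[, cols], na.rm = TRUE)
    m[is.nan(m)] <- NA  # assumption: all-missing rows should read NA, not NaN
    df[[paste0("DGEN_", p, "_mean")]] <- m
  }
  df
}

# Toy data: complete row, one missing, two missing, all missing.
df <- data.frame(
  DGEN_past_5_Pref = c(4, 6, NA, NA),
  DGEN_past_5_Pers = c(5, NA, NA, NA),
  DGEN_past_5_Val  = c(6, 2, 3, NA)
)

df <- add_dgen_means(df, periods = "past_5")
print(df$DGEN_past_5_mean)
```

The second row averages two scores (divisor 2) and the third row is simply its
single available score, which is the practical effect of na.rm = TRUE.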


================================================================================
SCRIPT 05: dataP 05 - recode scales VARS.r
================================================================================

PURPOSE:
Processes two cognitive scales:
1. AOT (Actively Open-minded Thinking): 8-item scale with reverse coding
2. CRT (Cognitive Reflection Test): 3-item test with correct/intuitive scoring

VARIABLES CREATED: 3 total

SOURCE COLUMNS:
AOT Scale:
- aot_1, aot_2, aot_3, aot_4, aot_5, aot_6, aot_7, aot_8

CRT:
- crt_1, crt_2, crt_3

TARGET VARIABLES:
- aot_total (mean of 8 items with reverse coding)
- crt_correct (proportion of correct answers)
- crt_int (proportion of intuitive, i.e. common incorrect, answers)

TRANSFORMATION LOGIC:

AOT Scale (aot_total):
1. Items 4, 5, 6, 7 are reverse coded by multiplying by -1
2. Calculate mean of all 8 items (with reverse coding applied)
3. Original source values are NOT modified in the dataframe
4. NA values excluded from calculation (na.rm = TRUE)
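
The four AOT steps could look like the sketch below. The function name
score_aot and the toy responses are assumptions. Multiplying by -1 only
reverses an item when the response scale is centered on zero, which is
assumed here.

```r
aot_items   <- paste0("aot_", 1:8)
aot_reverse <- paste0("aot_", 4:7)

# Hypothetical helper: returns the scale mean without touching the source columns.
score_aot <- function(df) {
  tmp <- df[, aot_items]                      # work on a copy
  tmp[, aot_reverse] <- -1 * tmp[, aot_reverse]
  rowMeans(tmp, na.rm = TRUE)
}

# Toy responses, one row per respondent.
df <- as.data.frame(matrix(c(
  2,  2, 2,  2,  2,  2,  2, 2,
  3,  3, 3, -3, -3, -3, -3, 3,
  1, NA, 1, -1, NA, -1, -1, 1
), nrow = 3, byrow = TRUE))
names(df) <- aot_items

df$aot_total <- score_aot(df)
print(df$aot_total)
print(df$aot_4)  # source item still holds its original values
```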

CRT Correct (crt_correct):
Correct answers:
- crt_1: "5 cents"
- crt_2: "5 minutes"
- crt_3: "47 days"
Calculation: (Number of correct answers) / (Number of non-missing answers)

CRT Intuitive (crt_int):
Intuitive (common incorrect) answers:
- crt_1: "10 cents"
- crt_2: "100 minutes"
- crt_3: "24 days"
Calculation: (Number of intuitive answers) / (Number of non-missing answers)

SPECIAL NOTES:
- CRT scoring is case-insensitive and trims whitespace
- Both CRT scores are proportions (0.00 to 1.00)
- Empty/missing CRT responses are excluded from denominator
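
The CRT scoring rules above can be sketched as below. The names crt_key,
normalize_crt and score_crt are hypothetical, and exact string matching
against the listed answers is an assumption about how dataP 05 compares
responses. A respondent with no answers at all gets 0/0, which R reports as
NaN.

```r
# Answer key taken from the lists above.
crt_key <- list(
  crt_1 = c(correct = "5 cents",   intuitive = "10 cents"),
  crt_2 = c(correct = "5 minutes", intuitive = "100 minutes"),
  crt_3 = c(correct = "47 days",   intuitive = "24 days")
)

# Lowercase, trim, and turn empty strings into NA.
normalize_crt <- function(x) {
  x <- tolower(trimws(as.character(x)))
  x[!is.na(x) & x == ""] <- NA
  x
}

# Hypothetical helper: proportion of answers matching the key, per respondent.
score_crt <- function(df, type = c("correct", "intuitive")) {
  type <- match.arg(type)
  hits <- do.call(cbind, lapply(names(crt_key), function(item) {
    normalize_crt(df[[item]]) == crt_key[[item]][[type]]
  }))
  # hits is a logical matrix (rows = respondents); NA marks missing answers.
  rowSums(hits, na.rm = TRUE) / rowSums(!is.na(hits))
}

# Toy responses, including case and whitespace variation and missing answers.
df <- data.frame(
  crt_1 = c("5 cents",   "10 Cents ",   "",          NA),
  crt_2 = c("5 minutes", "100 minutes", "5 Minutes", NA),
  crt_3 = c("47 days",   "30 days",     "",          NA),
  stringsAsFactors = FALSE
)

df$crt_correct <- score_crt(df, "correct")
df$crt_int     <- score_crt(df, "intuitive")
print(df[, c("crt_correct", "crt_int")])
```

Since an answer can be neither correct nor intuitive (the "30 days" response
above), crt_correct and crt_int need not sum to 1.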


================================================================================
SCRIPT 06: dataP 06 - time interval differences.r
================================================================================

PURPOSE:
Calculates absolute differences between time intervals to measure perceived
change across time periods for all 15 items.

VARIABLES CREATED: 90 total (6 difference types × 15 items)

SOURCE COLUMNS:
- present_pref_read through present_val_justice (15 columns)
- past_5_pref_read through past_5_val_justice (15 columns)
- past_10_pref_read through past_10_val_justice (15 columns)
- fut_5_pref_read through fut_5_val_justice (15 columns)
- fut_10_pref_read through fut_10_val_justice (15 columns)

TARGET VARIABLES (by difference type):

NPast_5 (Present vs Past 5 years) - 15 variables:
Formula: |present - past_5|
- NPast_5_pref_read, NPast_5_pref_music, NPast_5_pref_TV, NPast_5_pref_nap,
  NPast_5_pref_travel
- NPast_5_pers_extravert, NPast_5_pers_critical, NPast_5_pers_dependable,
  NPast_5_pers_anxious, NPast_5_pers_complex
- NPast_5_val_obey, NPast_5_val_trad, NPast_5_val_opinion,
  NPast_5_val_performance, NPast_5_val_justice

NPast_10 (Present vs Past 10 years) - 15 variables:
Formula: |present - past_10|
- NPast_10_pref_read, NPast_10_pref_music, NPast_10_pref_TV,
  NPast_10_pref_nap, NPast_10_pref_travel
- NPast_10_pers_extravert, NPast_10_pers_critical, NPast_10_pers_dependable,
  NPast_10_pers_anxious, NPast_10_pers_complex
- NPast_10_val_obey, NPast_10_val_trad, NPast_10_val_opinion,
  NPast_10_val_performance, NPast_10_val_justice

NFut_5 (Present vs Future 5 years) - 15 variables:
Formula: |present - fut_5|
- NFut_5_pref_read, NFut_5_pref_music, NFut_5_pref_TV, NFut_5_pref_nap,
  NFut_5_pref_travel
- NFut_5_pers_extravert, NFut_5_pers_critical, NFut_5_pers_dependable,
  NFut_5_pers_anxious, NFut_5_pers_complex
- NFut_5_val_obey, NFut_5_val_trad, NFut_5_val_opinion,
  NFut_5_val_performance, NFut_5_val_justice

NFut_10 (Present vs Future 10 years) - 15 variables:
Formula: |present - fut_10|
- NFut_10_pref_read, NFut_10_pref_music, NFut_10_pref_TV, NFut_10_pref_nap,
  NFut_10_pref_travel
- NFut_10_pers_extravert, NFut_10_pers_critical, NFut_10_pers_dependable,
  NFut_10_pers_anxious, NFut_10_pers_complex
- NFut_10_val_obey, NFut_10_val_trad, NFut_10_val_opinion,
  NFut_10_val_performance, NFut_10_val_justice

5.10past (Past 5 vs Past 10 years) - 15 variables:
Formula: |past_5 - past_10|
- 5.10past_pref_read, 5.10past_pref_music, 5.10past_pref_TV,
  5.10past_pref_nap, 5.10past_pref_travel
- 5.10past_pers_extravert, 5.10past_pers_critical, 5.10past_pers_dependable,
  5.10past_pers_anxious, 5.10past_pers_complex
- 5.10past_val_obey, 5.10past_val_trad, 5.10past_val_opinion,
  5.10past_val_performance, 5.10past_val_justice

5.10fut (Future 5 vs Future 10 years) - 15 variables:
Formula: |fut_5 - fut_10|
- 5.10fut_pref_read, 5.10fut_pref_music, 5.10fut_pref_TV, 5.10fut_pref_nap,
  5.10fut_pref_travel
- 5.10fut_pers_extravert, 5.10fut_pers_critical, 5.10fut_pers_dependable,
  5.10fut_pers_anxious, 5.10fut_pers_complex
- 5.10fut_val_obey, 5.10fut_val_trad, 5.10fut_val_opinion,
  5.10fut_val_performance, 5.10fut_val_justice

TRANSFORMATION LOGIC:
All calculations use absolute differences:
- NPast_5:  |present_[item] - past_5_[item]|
- NPast_10: |present_[item] - past_10_[item]|
- NFut_5:   |present_[item] - fut_5_[item]|
- NFut_10:  |present_[item] - fut_10_[item]|
- 5.10past: |past_5_[item] - past_10_[item]|
- 5.10fut:  |fut_5_[item] - fut_10_[item]|

Result: Non-negative values (0 = no change) representing magnitude of change
Missing values in either source column result in NA

SPECIAL NOTES:
- Present time uses "pref_tv" (lowercase) while past/future use "pref_TV"
  (uppercase), so the script handles this naming inconsistency; the
  difference variables use the uppercase form (e.g., NPast_5_pref_TV)
- All values are absolute differences (non-negative)
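
One way to generate all 90 difference columns, including the tv/TV special
case, is sketched below. The names diff_types, source_col and
add_differences are hypothetical, and the random toy data exists only to make
the example runnable. It is not the code in dataP 06.

```r
items <- c(
  paste0("pref_", c("read", "music", "TV", "nap", "travel")),
  paste0("pers_", c("extravert", "critical", "dependable", "anxious", "complex")),
  paste0("val_",  c("obey", "trad", "opinion", "performance", "justice"))
)

# Output prefix -> the two source prefixes that are compared.
diff_types <- list(
  NPast_5    = c("present", "past_5"),
  NPast_10   = c("present", "past_10"),
  NFut_5     = c("present", "fut_5"),
  NFut_10    = c("present", "fut_10"),
  `5.10past` = c("past_5", "past_10"),
  `5.10fut`  = c("fut_5", "fut_10")
)

# Build a source column name; present-time columns spell the TV item lowercase.
source_col <- function(prefix, item) {
  if (prefix == "present") item <- sub("pref_TV", "pref_tv", item, fixed = TRUE)
  paste(prefix, item, sep = "_")
}

# Hypothetical helper: adds 6 x 15 absolute-difference columns.
add_differences <- function(df) {
  for (out in names(diff_types)) {
    for (item in items) {
      a <- df[[source_col(diff_types[[out]][1], item)]]
      b <- df[[source_col(diff_types[[out]][2], item)]]
      df[[paste(out, item, sep = "_")]] <- abs(a - b)  # NA if either side is NA
    }
  }
  df
}

# Toy data: 3 respondents, 75 recoded source columns with values in -3..3.
set.seed(1)
df <- data.frame(id = 1:3)
for (p in c("present", "past_5", "past_10", "fut_5", "fut_10")) {
  for (item in items) {
    df[[source_col(p, item)]] <- sample(-3:3, 3, replace = TRUE)
  }
}
df$past_5_pref_read[2] <- NA

df <- add_differences(df)
print(ncol(df))  # 1 id column + 75 source columns + 90 difference columns
```

Names such as 5.10past_pref_read start with a digit, so in R they must be
accessed with backticks or [[ ]]. Reading the CSV with read.csv() and its
default check.names = TRUE would rename them (an "X" is prepended), so
check.names = FALSE is needed to keep the names as documented.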


================================================================================
SUMMARY OF ALL CREATED VARIABLES
================================================================================

Total Variables Created: 184

By Script:
- Script 01: 60 variables (past/future recoded items)
- Script 02: 15 variables (present recoded items)
- Script 03: 12 variables (DGEN domain scores)
- Script 04: 4 variables (DGEN means)
- Script 05: 3 variables (AOT & CRT scales)
- Script 06: 90 variables (time interval differences)

By Category:
- Time Period Items (75 total):
  * Present: 15 items
  * Past 5: 15 items
  * Past 10: 15 items
  * Future 5: 15 items
  * Future 10: 15 items

- DGEN Variables (16 total):
  * Domain scores: 12 (3 domains × 4 time periods)
  * Mean scores: 4 (1 per time period)

- Cognitive Scales (3 total):
  * AOT total
  * CRT correct
  * CRT intuitive

- Time Differences (90 total):
  * NPast_5: 15 differences
  * NPast_10: 15 differences
  * NFut_5: 15 differences
  * NFut_10: 15 differences
  * 5.10past: 15 differences
  * 5.10fut: 15 differences


================================================================================
DATA PROCESSING NOTES
================================================================================

1. PROCESSING ORDER:
   Scripts MUST be run in numerical order (01 → 06) as later scripts depend
   on variables created by earlier scripts.

2. SURVEY VERSION HANDLING:
   - Two survey versions (01 and 02) were used
   - Scripts 01 and 03 combine these versions
   - Preference given to version 01 when both exist

3. MISSING DATA:
   - Empty cells and NA values are preserved throughout processing
   - Calculations use na.rm=TRUE to exclude missing values from means
   - Difference calculations result in NA if either source value is missing

4. QUALITY ASSURANCE:
   - Each script includes QA checks with random row verification
   - Manual calculation checks confirm proper transformations
   - Column existence checks prevent errors from missing source data
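
The kinds of QA checks described above might look like the sketch below. The
scripts' actual checks are not reproduced here: the function names
check_columns and verify_random_rows, their arguments, and the toy data are
all assumptions.

```r
# Stop early with a clear message if any required source column is absent.
check_columns <- function(df, required) {
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing source columns: ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

# Recompute a target column for a random sample of rows and compare.
verify_random_rows <- function(df, target, recompute, n = 5, seed = 123) {
  set.seed(seed)
  rows <- sample(nrow(df), min(n, nrow(df)))
  expected <- recompute(df[rows, , drop = FALSE])
  actual   <- df[[target]][rows]
  # Two NAs count as a match; any other disagreement is a mismatch.
  both_na <- is.na(expected) & is.na(actual)
  equal   <- !is.na(expected) & !is.na(actual) & expected == actual
  ok <- both_na | equal
  if (!all(ok)) message("Mismatch in rows: ", paste(rows[!ok], collapse = ", "))
  all(ok)
}

# Toy example using one difference variable.
df <- data.frame(
  present_pref_read = c(3, -1, NA, 0),
  past_5_pref_read  = c(1,  2,  2, 0)
)
df$NPast_5_pref_read <- abs(df$present_pref_read - df$past_5_pref_read)

check_columns(df, c("present_pref_read", "past_5_pref_read"))
recompute <- function(d) abs(d$present_pref_read - d$past_5_pref_read)
print(verify_random_rows(df, "NPast_5_pref_read", recompute, n = 4))
```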

5. FILE SAVING:
   - Most scripts save directly to eohi2.csv
   - Scripts 04 and 06 have commented-out write commands for review
   - Each script overwrites existing target columns if present

6. SPECIAL NAMING CONVENTIONS:
   - "pref_tv" vs "pref_TV" inconsistency maintained from source data
   - DGEN variables use underscores (DGEN_past_5_Pref)
   - Difference variables use descriptive prefixes (NPast_5_, 5.10past_)


================================================================================
ITEM REFERENCE GUIDE
================================================================================

15 Core Items (Used across all time periods):

PREFERENCES (5 items):
1. pref_read - Reading preferences
2. pref_music - Music preferences
3. pref_TV/tv - TV watching preferences (note case variation)
4. pref_nap - Napping preferences
5. pref_travel - Travel preferences

PERSONALITY (5 items):
6. pers_extravert - Extraverted personality
7. pers_critical - Critical thinking personality
8. pers_dependable - Dependable personality
9. pers_anxious - Anxious personality
10. pers_complex - Complex personality

VALUES (5 items):
11. val_obey - Value of obedience
12. val_trad - Value of tradition
13. val_opinion - Value of expressing opinions
14. val_performance - Value of performance
15. val_justice - Value of justice


================================================================================
END OF DOCUMENTATION
================================================================================
Last Updated: October 1, 2025