Data Flow Documentation
This file documents the transformation of a dataset from raw collection to final analysis-ready form.

Dataset Overview

| Field | Description |
| --- | --- |
| Dataset Name | we_employment_survey.csv |
| Source | XLSForm via Kobo Toolbox |
| Format | CSV |
| Location | Jharkhand, India |
| Collection Date | Jan 2025 |
| Uploaded By | field_team_1 |

| Step No. | Date | Action | Tool Used | Notes |
| --- | --- | --- | --- | --- |
| 1 | 2025-01-15 | Removed duplicates on respondent_id | Python | Used check_duplicates.py |
| 2 | 2025-01-16 | Recoded education_level to numeric | SPSS | Added ordinal labels |
| 3 | 2025-01-17 | Merged with wage_2021.csv | Stata | Merge on respondent_id |
| 4 | 2025-01-18 | Converted wide to long format | SPSS | Used VARSTOCASES |
| 5 | 2025-01-19 | Created formal_sector from employment | Python | New binary derived variable |
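
Steps 1 and 3 of the log above could look roughly like this in pandas. This is only a sketch: the contents of check_duplicates.py are not shown in this file and the merge was actually run in Stata, so the sample rows, the wage_2021 column name, the keep-first rule, and the left join are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical stand-in rows; the real survey columns are not listed in this file.
survey = pd.DataFrame({
    "respondent_id": [101, 102, 102, 103],
    "education_level": ["Primary", "Secondary", "Secondary", "Graduate"],
})
# Hypothetical stand-in for wage_2021.csv; the column name wage_2021 is assumed.
wages = pd.DataFrame({
    "respondent_id": [101, 102, 104],
    "wage_2021": [5200, 7400, 6100],
})

# Step 1: keep the first record for each respondent_id (assumed rule).
deduped = survey.drop_duplicates(subset="respondent_id", keep="first")

# Step 3: left-join wages onto the survey (assumed join type).
# validate raises pandas.errors.MergeError if respondent_id repeats on either side;
# indicator adds a _merge column showing where each row came from.
merged = deduped.merge(
    wages,
    on="respondent_id",
    how="left",
    validate="one_to_one",
    indicator=True,
)

# Respondents with no wage record, worth noting in the log before analysis.
unmatched = merged.loc[merged["_merge"] == "left_only", "respondent_id"].tolist()
```

The validate argument makes the merge fail loudly when the key is not unique, which guards against silently multiplying rows if the duplicate removal in step 1 was skipped or incomplete.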

Derived Variables

| Variable Name | Description | Formula/Source |
| --- | --- | --- |
| formal_sector | Whether respondent is in formal employment | employment_type recode |
| age_group | Age bins: 18–25, 26–35, 36+ | Binned from age |
| edu_level_num | Ordinal numeric version of education_level | Recoded via SPSS |
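
As a rough illustration of how the three derived variables above might be computed, here is a pandas sketch. The category labels for employment_type and education_level, the set of categories treated as formal, and the ordering of education levels are all hypothetical, since the codebook is not part of this file; edu_level_num was actually recoded in SPSS.

```python
import pandas as pd

# Hypothetical rows and category labels; the real codebook is not part of this file.
df = pd.DataFrame({
    "respondent_id": [1, 2, 3, 4],
    "age": [18, 25, 26, 52],
    "employment_type": [
        "salaried_formal", "casual_labour", "self_employed", "salaried_formal",
    ],
    "education_level": ["Primary", "Graduate", "No schooling", "Secondary"],
})

# formal_sector: 1 if employment_type is in the assumed set of formal categories.
FORMAL_TYPES = {"salaried_formal"}
df["formal_sector"] = df["employment_type"].isin(FORMAL_TYPES).astype(int)

# age_group: right-closed bins (17, 25], (25, 35], (35, inf).
# Ages below 18 fall outside every bin and become NaN.
df["age_group"] = pd.cut(
    df["age"],
    bins=[17, 25, 35, float("inf")],
    labels=["18–25", "26–35", "36+"],
)

# edu_level_num: position of each label in an assumed low-to-high ordering.
EDU_ORDER = ["No schooling", "Primary", "Secondary", "Graduate"]
edu_codes = {label: code for code, label in enumerate(EDU_ORDER)}
df["edu_level_num"] = df["education_level"].map(edu_codes)
```

Labels missing from the mapping come out as NaN rather than raising an error, so it is worth checking edu_level_num for missing values after the recode.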

Final Output

| File Name | Description |
| --- | --- |
| we_employment_survey_clean.csv | Cleaned dataset with derived vars |
| we_employment_survey_long.csv | Long-form reshaped dataset |
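
The long-form file was produced with VARSTOCASES in SPSS. For readers working in Python, a comparable reshape can be sketched with pandas melt. Which columns were actually stacked is not recorded in this file, so the wage_2021 and wage_2025 columns and the year and wage output names below are hypothetical.

```python
import pandas as pd

# Hypothetical wide layout: one row per respondent, one column per year.
wide = pd.DataFrame({
    "respondent_id": [101, 102],
    "wage_2021": [5200, 7400],
    "wage_2025": [6000, 8100],
})

# Stack the per-year columns into one row per respondent and year.
long_df = wide.melt(
    id_vars="respondent_id",
    value_vars=["wage_2021", "wage_2025"],
    var_name="year",
    value_name="wage",
)

# Turn column names such as "wage_2021" into the integer year 2021.
long_df["year"] = long_df["year"].str.replace("wage_", "", regex=False).astype(int)
long_df = long_df.sort_values(["respondent_id", "year"]).reset_index(drop=True)
```

A quick sanity check after any such reshape is that the row count equals the number of respondents times the number of stacked columns.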

Notes
- This dataset is now ready for regression and stratified analysis.
- See the data_validation/ and spss_tools/ directories for the scripts that were applied.