Data Flow Documentation

This file documents the transformation of a dataset from raw collection to final analysis-ready form.


📌 Dataset Overview

Field            Description
Dataset Name     we_employment_survey.csv
Source           XLSForm via Kobo Toolbox
Format           CSV
Location         Jharkhand, India
Collection Date  Jan 2025
Uploaded By      field_team_1

🧹 Data Cleaning & Transformation Log

Step No.  Date        Action                                 Tool Used  Notes
1         2025-01-15  Removed duplicates on respondent_id    Python     Used check_duplicates.py
2         2025-01-16  Recoded education_level to numeric     SPSS       Added ordinal labels
3         2025-01-17  Merged with wage_2021.csv              Stata      Merge on respondent_id
4         2025-01-18  Converted wide to long format          SPSS       Used VARSTOCASES
5         2025-01-19  Created formal_sector from employment  Python     New binary derived variable
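
For readers who want to trace steps 1, 3 and 4 in a single tool, the following is a minimal pandas sketch of the same operations. It is not the content of check_duplicates.py or of the Stata and SPSS scripts, which are not reproduced in this file. The left join, the hours_wk1/hours_wk2 columns, the wage column, and the toy rows are all assumptions made for illustration; only respondent_id comes from the log above.

```python
import pandas as pd


def drop_duplicate_respondents(df: pd.DataFrame) -> pd.DataFrame:
    """Step 1 (sketch): keep the first row seen for each respondent_id."""
    return df.drop_duplicates(subset="respondent_id", keep="first").reset_index(drop=True)


def merge_wages(survey: pd.DataFrame, wages: pd.DataFrame) -> pd.DataFrame:
    """Step 3 (sketch): merge on respondent_id.

    Assumption: a left join that keeps every survey respondent. The log does
    not state the join type. validate= raises if either side has duplicate keys.
    """
    return survey.merge(wages, on="respondent_id", how="left", validate="one_to_one")


def to_long(df: pd.DataFrame, id_vars: list, value_vars: list,
            var_name: str, value_name: str) -> pd.DataFrame:
    """Step 4 (sketch): reshape wide columns into one row per respondent and variable."""
    return df.melt(id_vars=id_vars, value_vars=value_vars,
                   var_name=var_name, value_name=value_name)


# Toy data with hypothetical column names (not taken from the real dataset).
survey = pd.DataFrame({
    "respondent_id": [101, 102, 102, 103],
    "hours_wk1": [40, 35, 35, 20],
    "hours_wk2": [38, 30, 30, 25],
})
wages = pd.DataFrame({"respondent_id": [101, 103], "wage": [9000, 7000]})

deduped = drop_duplicate_respondents(survey)
merged = merge_wages(deduped, wages)
long_df = to_long(merged, id_vars=["respondent_id", "wage"],
                  value_vars=["hours_wk1", "hours_wk2"],
                  var_name="week", value_name="hours")
```

Running the merge only after deduplication matters here: with validate="one_to_one", pandas raises a MergeError if respondent_id is still duplicated on either side, which makes a skipped step 1 visible immediately.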

🧮 Derived Variables

Variable Name  Description                                 Formula/Source
formal_sector  Whether respondent is in formal employment  employment_type recode
age_group      Age bins: 18–25, 26–35, 36+                 Binned from age
edu_level_num  Ordinal numeric version of education_level  Recoded via SPSS
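
The three derived variables can be illustrated with a short pandas sketch. The category labels in FORMAL_TYPES and EDU_ORDER are hypothetical, since this file does not list the actual codes used in employment_type or education_level, and edu_level_num was produced in SPSS rather than Python. The age bin edges follow the table above, with labels written using ASCII hyphens.

```python
import pandas as pd

# Hypothetical category labels; replace with the codes used in the real data.
FORMAL_TYPES = {"salaried_formal", "government"}
EDU_ORDER = ["none", "primary", "secondary", "higher"]


def add_derived(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with formal_sector, age_group and edu_level_num added."""
    out = df.copy()

    # Binary flag: 1 if employment_type is one of the formal categories, else 0.
    # Note: a missing employment_type also yields 0 under this sketch.
    out["formal_sector"] = out["employment_type"].isin(FORMAL_TYPES).astype(int)

    # Left-closed bins [18, 26), [26, 36), [36, inf); ages under 18 become NaN.
    out["age_group"] = pd.cut(
        out["age"],
        bins=[18, 26, 36, float("inf")],
        right=False,
        labels=["18-25", "26-35", "36+"],
    )

    # Ordinal codes 1..n in the order given by EDU_ORDER; unknown labels become NaN.
    edu_codes = {level: i for i, level in enumerate(EDU_ORDER, start=1)}
    out["edu_level_num"] = out["education_level"].map(edu_codes)
    return out


# Toy rows chosen to hit each bin boundary.
sample = pd.DataFrame({
    "respondent_id": [1, 2, 3],
    "age": [25, 26, 40],
    "employment_type": ["government", "casual_labour", "salaried_formal"],
    "education_level": ["none", "secondary", "higher"],
})
derived = add_derived(sample)
```

Using right=False keeps the boundary ages where the table puts them: 25 falls in 18-25 and 26 falls in 26-35.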

📁 Final Output

File Name                       Description
we_employment_survey_clean.csv  Cleaned dataset with derived variables
we_employment_survey_long.csv   Long-form reshaped dataset

🧾 Notes

  • This dataset is now ready for regression and stratified analysis.
  • See data_validation/ and spss_tools/ for the scripts applied in the steps above.