DevData Practice — Realistic Datasets for Development Economics

Household Survey (LSMS-style)

~75k rows

Structure

One row per household member. ~15,000 households with 2-8 members each, expanded to ~75k individual-level rows. Household-level variables are repeated across members.

Key Columns

individual_id, household_id — Unique identifiers
country, district, urban — Geography
relationship, age, female, education_years — Demographics
monthly_pce_usd — Monthly per-capita expenditure (USD PPP, log-normal)
food_share — Engel curve: food expenditure as share of total
food_insecurity_score — FIES-like 0-8 scale
owns_radio through owns_improved_stove — 8 binary asset indicators
wall_material, rooms, water_source, toilet_type — Housing
life_satisfaction — Subjective well-being (1-10)

Realistic Features

Latent wealth factor drives correlated consumption, assets, and housing
Engel curve: food share declines with wealth (0.70 base, -0.08 per SD)
Urban premium on wealth (+0.6 SD)
MNAR missingness: richer households less likely to report income
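
The Engel-curve feature above can be read as a linear rule in the latent wealth factor. The following is a minimal sketch, assuming wealth is expressed in standard deviations around zero; the function name and the clipping to the [0, 1] range are illustrative assumptions, not the generator's actual code.

```python
def expected_food_share(wealth_sd: float) -> float:
    """Expected food share at `wealth_sd` standard deviations of latent wealth."""
    base_share = 0.70      # food share at average wealth (from the text)
    slope_per_sd = -0.08   # change in share per SD of wealth (from the text)
    share = base_share + slope_per_sd * wealth_sd
    # Assumption: a share cannot leave the unit interval, so clip it.
    return min(1.0, max(0.0, share))


# A household one SD above average wealth versus one SD below.
richer = expected_food_share(1.0)
poorer = expected_food_share(-1.0)
```

Individual households in the data will scatter around this line, so a regression of food_share on a wealth proxy should recover a slope near the stated value rather than match it exactly.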

RCT Experiment

25k rows

Structure

One row per participant. 4 arms: control, cash transfer, cash + training, training only. Stratified by district × gender.

Key Columns

treatment_arm — Randomized assignment
actually_treated — Compliance indicator (65-85% take-up)
baseline_consumption_usd, endline_consumption_usd — Primary outcomes
baseline_food_insecurity, endline_food_insecurity — HFIAS 0-27
attrited — Endline attrition (differential by arm)
spillover_risk — Flag for control units in treated villages

Embedded Effects

Cash: +15%, Cash+Training: +22%, Training: +8% consumption increase
Heterogeneous effects: larger for women and poorer baseline
Attrition: 8% base rate, 3 percentage points higher in control, plus a rural premium
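
One way to check the embedded consumption effects is to compare mean endline consumption by assigned arm. The sketch below assumes rows are dictionaries keyed by the column names listed above and that the control arm is labelled "control"; that label and the function name are assumptions, since the exact category strings are not given here.

```python
from statistics import mean


def pct_effects_by_arm(rows, outcome="endline_consumption_usd",
                       control_label="control"):
    """Percent difference in mean outcome between each assigned arm and control."""
    by_arm = {}
    for row in rows:
        if row["attrited"]:
            continue  # the endline outcome is not observed for attriters
        by_arm.setdefault(row["treatment_arm"], []).append(row[outcome])
    control_mean = mean(by_arm[control_label])
    return {arm: mean(values) / control_mean - 1.0
            for arm, values in by_arm.items() if arm != control_label}
```

Because the comparison uses treatment_arm rather than actually_treated, it is in the spirit of an intention-to-treat estimate. Dropping attriters is a simplification: with attrition that differs by arm, the complete-case comparison can be biased.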

Panel Data (WDI-style)

625 rows

Structure

Balanced panel: 25 developing countries × 25 years (2000-2024). 20+ indicators per country-year.

Key Columns

gdp_per_capita_usd, log_gdp_per_capita — Income
life_expectancy, infant_mortality_per_1000, under5_mortality_per_1000 — Health
primary_enrollment_pct, secondary_enrollment_pct, adult_literacy_pct — Education
fertility_rate, electricity_access_pct, sanitation_access_pct — Development
poverty_headcount_215, gini_coefficient — Welfare (sparse)

Realistic Features

AR(1) shock persistence + country-specific growth trends
COVID-2020 GDP shock (-8%) and 2021 partial recovery
WDI-like missingness: poverty and Gini ~30% missing, GDP near-complete

Agricultural Survey

~31k rows

Structure

One row per plot. ~15k farm households with 1-4 plots each. 7 crops, full input-output accounting.

Key Columns

plot_size_acres, crop, soil_quality — Plot characteristics
improved_seed, fertilizer_kg, irrigation, pesticide_used — Inputs
rainfall_deviation_sd — Weather shock (SD from normal)
harvest_kg, price_per_kg_usd, revenue_usd, profit_usd — Outputs
extension_contact, distance_to_market_km — Access

Production Function

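Harvest follows a Cobb-Douglas production function: Y = A · L^0.50 · Lab^0.25 · F^0.15, where L is land, Lab is labor, and F is fertilizer. The exponents sum to 0.90, so returns to scale are decreasing. The TFP term A carries shifters for improved seed (+15%), irrigation (+10%), and soil quality. Rainfall has an inverted-U effect.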
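
The production function can be sketched as follows. This is an illustration under stated assumptions: the seed and irrigation shifters are applied multiplicatively, soil quality and rainfall are left out because their functional forms are not given, and the function name and input units are hypothetical.

```python
def expected_harvest(land, labor, fertilizer, tfp=1.0,
                     improved_seed=False, irrigation=False):
    """Cobb-Douglas output with the exponents and TFP shifters stated in the text."""
    shifter = tfp
    if improved_seed:
        shifter *= 1.15  # +15% for improved seed
    if irrigation:
        shifter *= 1.10  # +10% for irrigation
    # Exponents: land 0.50, labor 0.25, fertilizer 0.15 (sum 0.90).
    return shifter * land ** 0.50 * labor ** 0.25 * fertilizer ** 0.15


# Doubling every input raises output by a factor of 2 ** 0.90, less than double.
scale_factor = expected_harvest(2, 2, 2) / expected_harvest(1, 1, 1)
```

A pure Cobb-Douglas form returns zero output when any input is zero, so plots with no fertilizer would need separate handling in any real implementation.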

Health & Nutrition (DHS-style)

35k rows

Structure

One row per child under 5 with linked mother characteristics. Wealth quintile drives most health gradients.

Key Columns

height_for_age_z, weight_for_age_z, weight_for_height_z — WHO z-scores
stunted, underweight, wasted — Binary flags (z < -2)
bcg_vaccine, dpt1_vaccine, dpt3_vaccine, measles_vaccine, fully_vaccinated
anc_visits, facility_delivery, skilled_birth_attendant — Maternal health
diarrhea_2wk, fever_2wk, cough_2wk, sought_treatment — Morbidity
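
The binary flags follow from the z-scores by the z < -2 rule given above. A minimal sketch, assuming the usual pairing of stunting with height-for-age, underweight with weight-for-age, and wasting with weight-for-height; the function name and the treatment of missing values are assumptions.

```python
def anthropometric_flags(height_for_age_z, weight_for_age_z,
                         weight_for_height_z, cutoff=-2.0):
    """Derive stunted / underweight / wasted flags from WHO z-scores."""
    def below(z):
        # A missing z-score gives a missing flag rather than False.
        return None if z is None else z < cutoff

    return {
        "stunted": below(height_for_age_z),
        "underweight": below(weight_for_age_z),
        "wasted": below(weight_for_height_z),
    }


flags = anthropometric_flags(-2.5, -1.0, None)
```

The comparison is strict, so a child with a z-score of exactly -2 is not flagged.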

Realistic Features

Growth faltering after 6 months (HAZ declines with age)
Age-appropriate vaccination (can't get measles before 9 months)
Age heaping at 6, 12, 24, 36, 48 months (interviewer rounding)
Birth weight MAR: heavier babies more likely to be weighed

Education Outcomes

~30k rows

Structure

Students nested in 500 schools (grades 3-8). Multi-level data for HLM analysis.

Key Columns

math_score, reading_score — 0-100 test scores
attendance_rate, distance_to_school_km — Student access
ses_score, school_type, school_meal_program — SES & resources
pupil_teacher_ratio, pct_trained_teachers, has_library — School quality
repeated_grade, dropped_out — Persistence

Embedded Effects

School random effect (ICC ~0.20): 20% of score variance is between-school
Girls outperform in reading (+2 pts), boys in math (+2 pts)
Students who dropped out have missing endline scores (structural missingness)
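
The ICC statement can be made concrete with the variance-components definition and the standard design-effect formula for equal-sized clusters. This is a sketch: the variance values in the example are made up, and the figure of roughly 60 students per school is only a back-of-envelope reading of about 30k rows over 500 schools.

```python
def intraclass_correlation(between_var, within_var):
    """Share of total variance that lies between clusters (schools)."""
    return between_var / (between_var + within_var)


def design_effect(cluster_size, icc):
    """Variance inflation from cluster sampling with equal cluster sizes."""
    return 1.0 + (cluster_size - 1) * icc


# Hypothetical variance components chosen to give an ICC of 0.20.
icc = intraclass_correlation(between_var=20.0, within_var=80.0)
deff = design_effect(cluster_size=60, icc=icc)
```

A design effect well above 1 is the reason standard errors that ignore the school level will be too small in this dataset.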

Labor Market

40k rows

Structure

Working-age adults (15-64) with employment status, wages, sector, and social protection.

Wage Equation (Mincer)

ln(wage) = 1.8 + 0.08·educ + 0.04·exp - 0.0006·exp² - 0.18·female + 0.25·urban + 0.35·formal + ε

About 8% return per year of education, concave experience profile
0.18 log-point gender wage gap, about 16% lower wages for women (suitable for Oaxaca-Blinder decomposition)
0.35 log-point formality premium, about 42% higher wages
Employment: wage, self-employed, unpaid family, unemployed
Migration and remittances with realistic amounts
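
The wage equation can be evaluated directly. The sketch below copies the coefficients from the equation above; the function names are illustrative, and the error term is omitted, so the result is a conditional mean of log wages rather than a simulated draw.

```python
import math

# Coefficients copied from the Mincer equation in the text.
COEF = {"const": 1.8, "educ": 0.08, "exp": 0.04, "exp_sq": -0.0006,
        "female": -0.18, "urban": 0.25, "formal": 0.35}


def predicted_log_wage(educ_years, experience_years, female, urban, formal):
    """Expected ln(wage) for given characteristics (no error term)."""
    return (COEF["const"]
            + COEF["educ"] * educ_years
            + COEF["exp"] * experience_years
            + COEF["exp_sq"] * experience_years ** 2
            + COEF["female"] * int(female)
            + COEF["urban"] * int(urban)
            + COEF["formal"] * int(formal))


def pct_effect(log_point_coef):
    """Convert a log-point coefficient into an exact proportional change."""
    return math.exp(log_point_coef) - 1.0


def peak_experience_years():
    """Experience level at which the concave profile reaches its maximum."""
    return -COEF["exp"] / (2 * COEF["exp_sq"])
```

Coefficients on dummy variables are log points, so pct_effect gives the exact proportional difference; for small coefficients such as the education term the two are close.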

Microfinance

30k rows

Structure

Loan-level records from a microfinance institution. Group and individual lending products.

Key Columns

loan_amount_usd, interest_rate_annual_pct, term_months — Terms
loan_product, loan_purpose — Product classification
cycle_number — Repeat borrower indicator (graduation)
defaulted, days_past_due, repayment_rate — Performance
internal_credit_score — 300-850 score

Default Model

Default probability via logistic: higher for larger loans, consumption purpose, first-cycle borrowers, uncollateralized. Women default less. Repeat borrowers graduate to larger amounts.

Program Targeting (PMT)

20k rows

Structure

Household-level data with both true consumption (survey) and PMT-predicted consumption. Built for targeting accuracy analysis.

Key Columns

true_monthly_pce_usd — Actual per-capita consumption (with residual noise)
pmt_predicted_pce_usd — Fitted values from proxy means formula
truly_poor, pmt_classified_poor — Binary poverty flags
exclusion_error, inclusion_error — Targeting mistakes
community_selected — Community-based targeting comparison
monthly_benefit_usd — Calculated transfer amount

Why It's Useful

The gap between true and predicted consumption creates realistic targeting errors. Students can compare PMT, community, and categorical targeting; compute leakage and undercoverage; and simulate benefit reforms.
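
Leakage and undercoverage can be computed from the two poverty flags. The sketch below uses one common convention (undercoverage as the share of truly poor households that are not selected, leakage as the share of selected households that are not truly poor); the function name is hypothetical, and the dataset's own exclusion_error and inclusion_error columns may be defined per household rather than as rates.

```python
def targeting_rates(rows, selected_flag="pmt_classified_poor"):
    """Undercoverage and leakage rates for a given selection flag."""
    poor = [r for r in rows if r["truly_poor"]]
    selected = [r for r in rows if r[selected_flag]]
    missed_poor = sum(1 for r in poor if not r[selected_flag])
    selected_nonpoor = sum(1 for r in selected if not r["truly_poor"])
    return {
        # Share of the truly poor who are left out (exclusion errors).
        "undercoverage": missed_poor / len(poor) if poor else float("nan"),
        # Share of selected households that are not poor (inclusion errors).
        "leakage": selected_nonpoor / len(selected) if selected else float("nan"),
    }
```

Passing selected_flag="community_selected" gives the same two rates for community-based targeting, which makes the methods directly comparable.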

Trade & Market Prices

~67k rows

Structure

80 markets × 104 weeks × 8 staple commodities. 2 years of weekly price data across 16 countries.

Key Columns

market, country, date, commodity — Identifiers
price_per_kg_usd — Commodity price with seasonality
volume_traded_kg — Market volume
transport_cost_pct — Distance-based cost wedge

Price Dynamics

Seasonal cycles: lean-season highs, post-harvest lows
AR(1) persistence in price levels
Spatial correlation: nearby markets move together
Border effect: +8% premium for cross-country market pairs
Suitable for cointegration / market integration analysis
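
The combination of seasonality and AR(1) persistence can be illustrated with a single-market simulation. Every parameter value below is a hypothetical placeholder, and the sketch applies the AR(1) process to log deviations (to keep prices positive) rather than to price levels as described above; spatial correlation and the border effect are not modelled.

```python
import math
import random


def simulate_weekly_prices(n_weeks=104, base_price=0.50, rho=0.8,
                           seasonal_amplitude=0.15, shock_sd=0.03,
                           peak_week=30, seed=0):
    """Simulate one market's weekly price with a yearly cycle and AR(1) shocks."""
    rng = random.Random(seed)
    deviation = 0.0  # persistent log deviation from the seasonal path
    prices = []
    for week in range(n_weeks):
        # AR(1): this week's deviation carries over a share rho of last week's.
        deviation = rho * deviation + rng.gauss(0.0, shock_sd)
        # 52-week cosine cycle peaking at the assumed lean-season week.
        season = seasonal_amplitude * math.cos(2 * math.pi * (week - peak_week) / 52)
        prices.append(base_price * math.exp(season + deviation))
    return prices


prices = simulate_weekly_prices()
```

Setting shock_sd to zero isolates the seasonal path, which is a quick way to see how much of the variation in a series comes from the cycle and how much from persistent shocks.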

Gender Programme

25k rows

Structure

One row per woman/girl. 25,000 individuals across programme and non-programme areas with empowerment, GBV, and SRH indicators.

Key Columns

individual_id, household_id, country, district — Identifiers
programme_participant, programme_type — Cash transfers, skills training, savings groups, awareness, legal aid
decides_own_healthcare through decides_own_earnings — 5 WEAI decision-making domains
decision_making_score, empowerment_index — Composite empowerment scores
owns_land, owns_house, has_bank_account, has_mobile_money — Economic empowerment
care_work_hours_day, productive_work_hours_day, leisure_hours_day — Time use diary
reported_physical_gbv, reported_emotional_gbv, reported_economic_gbv — GBV with underreporting
using_modern_contraception, unmet_need_family_planning — SRH indicators

Realistic Features

WEAI-like empowerment domains with intra-household bargaining
GBV prevalence modelled with 40–60% underreporting of true cases
Time-use data summing to realistic daily totals
Programme effects vary by type (cash vs. training vs. awareness)

Girls’ Education

20k rows

Structure

One row per girl (grades 1–12). Tracks enrollment, learning outcomes, safety, menstrual hygiene, and dropout barriers.

Key Columns

girl_id, grade, age, wealth_quintile — Identifiers & demographics
enrolled, attendance_rate, dropped_out — Enrollment status
math_score, literacy_score — Learning outcomes (0–100)
barrier_marriage, barrier_pregnancy, barrier_cost, barrier_distance — Dropout barriers
has_menstruated, mhm_knowledge, has_sanitary_products, missed_school_menstruation — MHM
srgbv_experienced, feels_safe_route_to_school — Safety indicators
receives_scholarship, receives_school_meals — Programme support
at_primary_secondary_transition, transitioned_to_secondary — Transition tracking

Realistic Features

Gendered dropout barriers: early marriage, pregnancy, household chores, cost, distance
MHM affects attendance — girls without sanitary products miss more school
School-related GBV (SRGBV) linked to dropout
Primary–secondary transition rates driven by scholarships and parental attitudes

Climate & Resilience

20k rows

Structure

One row per household. Climate shock exposure, coping strategies, adaptation practices, and RIMA-like resilience indices.

Key Columns

agroecological_zone — Arid, semi-arid, sub-humid, humid, highland
experienced_drought, experienced_flood, experienced_cyclone, experienced_pest_outbreak — Shock exposure
crop_loss_pct, livestock_loss_pct, income_loss_pct — Shock losses
cs_reduced_meals, cs_sold_assets, cs_borrowed_money, cs_migration — Coping strategies
adopted_drought_resistant_crop, adopted_irrigation, has_crop_insurance — Adaptation
absorptive_capacity, adaptive_capacity, transformative_capacity — RIMA pillars
resilience_index — Composite resilience score
carbon_footprint_tco2_yr — Household emissions proxy

Realistic Features

Shock exposure varies by agro-ecological zone (drought in arid, floods in humid)
RIMA-like resilience measurement: absorptive, adaptive, transformative pillars
Coping strategy severity ladder from consumption smoothing to asset depletion
Early warning system access improves preparedness outcomes

Agriculture & Value Chain

25k rows

Structure

One row per transaction across a 4-node value chain: producer → aggregator → processor → retailer. 8 commodities.

Key Columns

chain_node — Position in value chain (producer/aggregator/processor/retailer)
commodity — Maize, coffee, dairy, poultry, horticulture, rice, groundnuts, cassava
quality_grade — A/B/C affecting price premiums
volume_kg, price_per_kg_usd, revenue_usd — Transaction values
total_cost_usd, margin_usd, margin_pct — Profitability
post_harvest_loss_pct — Loss rates by chain node
in_cooperative, has_contract, has_certification — Market linkage
buyer_type, season — Market context

Realistic Features

Value addition markups: 15% aggregator, 45% processor, 80% retailer
Quality grading premiums: A = +25%, C = −25%
Post-harvest losses decline along the chain (30% farm → 8% retail)
Contract farming and cooperative membership yield price premiums

Animal Welfare

15k rows

Structure

One row per animal/household. Covers livestock, working animals, and companion animals with Five Freedoms welfare assessment.

Key Columns

animal_type — Cattle, goats, sheep, poultry, donkey, horse, pig, camel, dog, cat
freedom_hunger_thirst through freedom_fear_distress — Five Freedoms (1–5 each)
welfare_score_avg, body_condition_score, shelter_score — Composite welfare
distance_to_vet_km, accessed_vet_last_year, vaccinated, dewormed — Veterinary access
working_hours_daily, working_has_wounds, working_proper_harness — Working animal indicators
companion_rabies_vaccinated, companion_sterilized — Companion animal health
hh_consumes_animal_source_food — Nutrition linkage

Realistic Features

Five Freedoms framework with inter-correlated domain scores
Body Condition Score (1–5) driven by wealth and training
Working animal data (donkeys, horses, camels) with wound prevalence and harness quality
Rabies vaccination coverage for companion animals

Public Health & Epidemiology

20k rows

Structure

One row per individual. Disease surveillance, health facility use, insurance, NCDs, mental health (PHQ-9), and COVID-19 vaccination.

Key Columns

malaria_tested, malaria_rdt_positive — Malaria RDT cascade
tb_ever_diagnosed, tb_on_treatment — TB cascade
hiv_tested_ever, hiv_positive, hiv_on_art — HIV cascade
facility_visits_12m, oop_health_spending_usd, catastrophic_health_expenditure — Utilization
chw_contact_6m, chw_referred, chw_referral_completed — CHW referral cascade
health_insurance_type — None, CBHI, NHIF, private, employer
hypertension_diagnosed, hypertension_controlled, diabetes_diagnosed — NCD screening
phq9_score, depression_moderate, depression_severe — Mental health
covid_vaccine_doses — 0/1/2/booster doses with wealth gradient

Realistic Features

Disease cascades (tested → diagnosed → treated) with realistic drop-offs
Catastrophic health expenditure flag (>10% of household consumption)
PHQ-9 depression score (0–27) correlated with poverty and gender
CHW referral completion rates driven by distance and wealth
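
The catastrophic health expenditure flag listed above is a simple threshold rule. A minimal sketch, assuming out-of-pocket spending and household consumption are measured over the same period; the household_consumption_usd argument is a hypothetical name, since the consumption column used for the denominator is not listed here.

```python
def is_catastrophic(oop_health_spending_usd, household_consumption_usd,
                    threshold=0.10):
    """Flag out-of-pocket health spending above 10% of household consumption."""
    if household_consumption_usd is None or household_consumption_usd <= 0:
        return None  # the share is undefined without positive consumption
    return oop_health_spending_usd / household_consumption_usd > threshold
```

The comparison is strict, matching the ">10%" wording, so a household spending exactly 10% is not flagged.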

Livelihoods & Economic Strengthening

20k rows

Structure

One row per household. Three study arms (control, livelihoods only, livelihoods + savings). Covers income, VSLA, training, assets, and food security.

Key Columns

treatment_arm — Control, livelihoods_only, livelihoods_plus_savings
primary_income_source, n_income_sources, income_diversification_index — Income
owns_enterprise, enterprise_type, enterprise_monthly_revenue_usd — Enterprise
vsla_member, vsla_savings_usd, vsla_shareout_usd, vsla_loan_usd — Savings groups
received_vocational_training, training_type, completed_apprenticeship — Skills
baseline_asset_index, endline_asset_index — 7-asset accumulation
food_consumption_score, fcs_category, reduced_coping_strategies_index — Food security
has_mobile_money, has_bank_account, accessed_credit_12m — Financial inclusion

Realistic Features

Shannon-like income diversification index
VSLA cycle: savings → share-out → borrowing with realistic interest
Baseline–endline asset change driven by treatment arm
Youth (15–35) employment, NEET, and training indicators
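
A Shannon-type diversification index can be computed from income shares by source. The sketch below uses the plain Shannon entropy with natural logarithms; the text only says "Shannon-like", so the exact normalisation behind income_diversification_index is an assumption, as are the function name and the example source labels.

```python
import math


def shannon_diversification(income_by_source):
    """Shannon entropy of income shares; 0 means a single income source."""
    amounts = [v for v in income_by_source.values() if v > 0]
    total = sum(amounts)
    if total <= 0:
        return 0.0
    shares = [v / total for v in amounts]
    return sum(-s * math.log(s) for s in shares)


# Hypothetical household with two equal income sources.
index = shannon_diversification({"farming": 50.0, "casual_labor": 50.0})
```

With n equally sized sources the index equals ln(n), so dividing by the log of the number of sources is one way to rescale it to the unit interval.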

Advocacy, Rights & Legal Empowerment

15k rows

Structure

One row per individual. Legal identity, land tenure, dispute resolution, rights awareness, civic participation, and advocacy campaigns.

Key Columns

has_birth_certificate, has_national_id — Legal identity documentation
owns_land, has_land_title, land_dispute_experienced — Land tenure security
rights_awareness_score, knows_cedaw, knows_child_rights — Rights knowledge
experienced_dispute, dispute_type, resolution_mechanism, dispute_resolved — Justice
barrier_cost, barrier_distance, barrier_fear, barrier_distrust — Justice barriers
voted_last_election, attended_community_meeting, feels_can_influence_decisions — Civic participation
exposed_to_advocacy_campaign, campaign_channel, changed_behavior_post_campaign — Campaigns

Realistic Features

Legal identity gaps driven by wealth and rural status
Dispute resolution pathways: formal courts, customary leaders, legal aid, mediation
Justice barriers — cost, distance, fear, distrust — vary by gender and wealth
Advocacy campaign reach and self-reported behavior change

WASH (Water, Sanitation & Hygiene)

18k rows

Structure

One row per household. JMP service ladders for water, sanitation, and hygiene. Water quality testing, CLTS, MHM, and school WASH.

Key Columns

water_source, water_improved, jmp_water_service_level — JMP water ladder
ecoli_cfu_100ml, ecoli_risk_category, turbidity_ntu — Water quality testing
liters_per_person_day, sufficient_water_15lpd — Water quantity (Sphere)
sanitation_facility, jmp_sanitation_service_level, open_defecation — JMP sanitation
clts_triggered, community_odf_declared, odf_slippage — CLTS programme
hw_water_and_soap, hygiene_service_level — Observed handwashing
mhm_private_space, mhm_materials_available — Menstrual hygiene
child_diarrhea_2wk, diarrhea_ors_used, diarrhea_zinc_used — Child health
school_separate_toilets_girls, school_has_mhm_facility, school_pupil_toilet_ratio — School WASH

Realistic Features

JMP service ladders: safely managed → basic → limited → unimproved → surface water / open defecation
E. coli and turbidity correlated with source type
Discrepancy between observed and self-reported handwashing (social desirability bias)
Child diarrhoea prevalence linked to WASH conditions via logistic model
CLTS triggering → ODF declaration → verification → slippage cascade

Humanitarian & Disaster Response

18k rows

Structure

One row per individual with SADD (Sex and Age Disaggregated Data). Displacement, multi-sector needs, aid distribution, protection, and accountability.

Key Columns

displacement_status — IDP, refugee, returnee, host community
crisis_type, months_displaced, times_displaced — Displacement profile
need_food through need_livelihoods — 7 sector needs scored 0–5 (JIAF-like)
overall_severity, people_in_need — Composite need assessment
meets_sphere_water, meets_sphere_shelter, meets_sphere_food — Sphere standards
received_aid, aid_modality, aid_amount_usd — Aid distribution
gbv_risk_reported, child_protection_concern, mine_uxo_awareness — Protection
knows_feedback_mechanism, filed_complaint, complaint_resolved — Accountability
movement_intention — Stay, return, relocate, seek asylum, undecided

Realistic Features

SADD throughout: 5 standard age groups with sex disaggregation
Multi-sector severity scoring following JIAF methodology
Sphere minimum standards compliance (water 15 L/p/d, shelter 3.5 m²/p, food 2100 kcal/p/d)
Communication with Communities (CwC): information access, preferred channels, feedback loops
Vulnerability markers: unaccompanied minors, pregnant/lactating, disability, elderly alone
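
The Sphere compliance flags can be derived from the three minimums quoted above. A minimal sketch: the input argument names are hypothetical, since the underlying quantity columns are not listed for this dataset, and "meets the standard" is read as greater than or equal to the minimum.

```python
# Minimums quoted in the text.
SPHERE_MINIMUMS = {
    "water_l_per_person_day": 15.0,
    "shelter_m2_per_person": 3.5,
    "food_kcal_per_person_day": 2100.0,
}


def sphere_flags(water_l_per_person_day, shelter_m2_per_person,
                 food_kcal_per_person_day):
    """Return the three Sphere compliance flags for one record."""
    return {
        "meets_sphere_water":
            water_l_per_person_day >= SPHERE_MINIMUMS["water_l_per_person_day"],
        "meets_sphere_shelter":
            shelter_m2_per_person >= SPHERE_MINIMUMS["shelter_m2_per_person"],
        "meets_sphere_food":
            food_kcal_per_person_day >= SPHERE_MINIMUMS["food_kcal_per_person_day"],
    }


flags = sphere_flags(15.0, 3.0, 2100.0)
```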

Social Protection & Cash Transfers

20k rows

Structure

One row per beneficiary household. Programme registry with transfer tracking, conditionality compliance, and graduation model.

Key Columns

programme_type — Unconditional cash, conditional cash, public works, cash plus, food vouchers, school feeding
transfer_modality — Mobile money, cash-in-hand, bank transfer, voucher, in-kind
monthly_transfer_usd, total_received_usd, pct_payments_received — Transfer tracking
has_conditionality, conditionality_compliant — Compliance monitoring
baseline_consumption_usd, endline_consumption_usd — Impact measurement
fcs_baseline, fcs_endline, rcsi_baseline, rcsi_endline — Food security change
graduation_score, graduated, would_cope_without_transfer — Graduation model

Realistic Features

Graduation model with thresholds on food security, assets, and savings
Payment regularity metrics (% received, delays) affecting outcomes
Baseline → endline change in consumption, food security, and asset accumulation
Dependency indicator: “would cope without transfer”

Governance & Accountability

15k rows

Structure

One row per citizen. Service delivery satisfaction, trust in institutions, corruption experience, budget transparency, and social accountability.

Key Columns

satisfaction_health through satisfaction_police, overall_service_satisfaction — Service delivery (1–5)
trust_local_govt through trust_ngos — Institutional trust (1–5)
bribery_experience, bribery_context, bribe_amount_usd, reported_bribery — Corruption
aware_of_local_budget, budget_literacy_score — Budget transparency
in_social_accountability_prog, attended_scorecard_session — Programme participation
scorecard_health, scorecard_education, scorecard_water — Community scorecards (0–100)
knows_rti_law, gets_info_radio, gets_info_social_media — Information access

Realistic Features

Bribery reporting rate very low (8–13%) — realistic underreporting
Social accountability programme effects on trust and budget awareness
Community scorecard scores varying across service sectors
MNAR missingness: bribe amounts more likely missing for larger bribes

Behaviour Change (KAP)

20k rows

Structure

One row per individual. Knowledge-Attitude-Practice survey for health, WASH, and nutrition behaviours with campaign exposure tracking.

Key Columns

exposed_to_campaign, campaign_type, campaign_doses — Campaign exposure (radio, community drama, peer education, SMS, poster, social media)
knowledge_score, knows_handwashing_times, knows_ors_for_diarrhea, knows_exclusive_breastfeeding — Knowledge (0–10)
attitude_score, approves_family_planning, gender_equitable_attitude, stigma_hiv — Attitudes (0–10)
practice_score, practices_handwashing, uses_treated_water, uses_mosquito_net — Practice (0–10)
knowledge_practice_gap, attitude_practice_gap — KAP cascade gaps
self_efficacy_score, perceives_community_support, discussed_with_peers — Social norms

Realistic Features

KAP cascade: campaigns improve knowledge > attitudes > practice (realistic drop-off)
Campaign dose-response: more exposures yield stronger effects
Self-efficacy mediates the attitude–practice gap

Cost-Effectiveness Analysis

15k rows

Structure

One row per programme. Cost breakdowns, beneficiary data, outcomes, and CEA metrics across health, education, nutrition, WASH, livelihoods, and social protection sectors.

Key Columns

sector, programme_type, implementer_type — Programme classification
total_cost_usd, personnel_cost_usd, materials_cost_usd, overhead_cost_usd — Cost breakdown
cost_per_beneficiary_usd, cost_per_outcome_usd, overhead_ratio, personnel_ratio — Cost ratios
effect_size, icer, daly_averted, qaly_gained — CEA metrics
is_pilot — Pilot vs. at-scale with economies of scale

Realistic Features

Lognormal cost distributions; personnel 40–65%, overhead 8–25%
DALYs and QALYs only for health sector; ICER for all
Economies of scale: at-scale programmes have lower unit costs
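
The cost ratios and the ICER follow standard definitions. The sketch below is an illustration: n_beneficiaries is a hypothetical input name, and the comparator used for the dataset's icer column is not stated, so the two-programme form here is an assumption.

```python
def cost_ratios(total_cost_usd, personnel_cost_usd, overhead_cost_usd,
                n_beneficiaries):
    """Unit cost and cost-structure ratios for one programme."""
    return {
        "cost_per_beneficiary_usd": total_cost_usd / n_beneficiaries,
        "personnel_ratio": personnel_cost_usd / total_cost_usd,
        "overhead_ratio": overhead_cost_usd / total_cost_usd,
    }


def icer(cost_new, cost_comparator, effect_new, effect_comparator):
    """Incremental cost per extra unit of effect relative to a comparator."""
    delta_effect = effect_new - effect_comparator
    if delta_effect == 0:
        return None  # undefined when the two options have the same effect
    return (cost_new - cost_comparator) / delta_effect


ratios = cost_ratios(1000.0, 500.0, 200.0, 50)
extra_cost_per_unit = icer(1500.0, 1000.0, 30.0, 20.0)
```

A negative ICER can mean either a cheaper and more effective option or a costlier and less effective one, so the signs of the two differences should be inspected alongside the ratio.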

Decent Work (ILO)

25k rows

Structure

One row per worker. ILO decent work framework covering formal/informal employment, earnings, social protection, working conditions, freedom of association, and the informal economy.

Key Columns

employment_status — Formal wage, informal wage, self-employed, casual daily, unpaid family
monthly_earnings_usd, hourly_wage_usd, below_minimum_wage — Earnings with embedded gender gap (~0.82 ratio)
has_written_contract, has_social_security, has_health_insurance, has_pension — Social protection
occupational_safety_training, experienced_injury_12m, workplace_harassment — Conditions
member_of_union, freedom_of_association, collective_bargaining_covered — Labour rights
operates_without_registration, no_bookkeeping — Informality indicators

Realistic Features

Gender pay gap embedded in earnings (female/male ratio ~0.82)
Social protection strongly linked to formal employment status
Occupational segregation: different sector distributions by gender

Care Economy & Time Use

20k rows

Structure

One row per individual (mixed gender). Time use diary data with care breakdown, care infrastructure, opportunity cost, and time poverty indicators.

Key Columns

sleep_hours, paid_work_hours, unpaid_care_hours, domestic_work_hours, leisure_hours — Time diary (~24h/day)
childcare_hours, eldercare_hours, cooking_hours, water_collection_hours — Care breakdown
has_childcare_access, has_electricity, has_improved_cookstove — Care infrastructure
forgone_earnings_usd, reduced_labor_participation — Opportunity cost
time_poor — Flag for >10.5 hours/day on paid + unpaid work

Realistic Features

Women do about 3.3x as much unpaid care as men (4.3h vs. 1.3h daily)
Care infrastructure reduces care burden (water access, cookstoves, childcare)
Time poverty at ~19.5% of population, higher for women
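
The time_poor flag can be reproduced from the diary columns. A minimal sketch, assuming "unpaid work" is the sum of unpaid_care_hours and domestic_work_hours; that reading of the definition and the function name are assumptions.

```python
def is_time_poor(paid_work_hours, unpaid_care_hours, domestic_work_hours,
                 threshold_hours=10.5):
    """Flag more than 10.5 hours per day of paid plus unpaid work."""
    total_work = paid_work_hours + unpaid_care_hours + domestic_work_hours
    return total_work > threshold_hours
```

If the care breakdown columns (childcare, eldercare, cooking, water collection) are components of the two unpaid totals, they should not be added again, or the work total will be double-counted.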

Intersectional Inequality

25k rows

Structure

One row per individual. Multiple identity dimensions (caste, religion, disability, sexuality) with socioeconomic outcomes showing multiplicative intersectional disadvantage.

Key Columns

caste_category (general/OBC/SC/ST), religion, has_disability, disability_type, sexual_minority, indigenous
monthly_income_usd, employed, housing_quality_score, food_security_score — Outcomes
experienced_discrimination, discrimination_basis, discrimination_context — Discrimination
accessed_education through accessed_justice — Service access

Realistic Features

Multiplicative disadvantage: SC + female + disability worse than sum of individual effects
Intersectional penalty increases with each additional axis of marginalisation
Discrimination basis and context vary by identity combination

Environmental Justice

20k rows

Structure

One row per household. Pollution exposure, environmental hazards, health outcomes, cooking fuel, green space, climate vulnerability, and environmental governance.

Key Columns

air_quality_pm25, indoor_air_pollution, water_contamination_score, noise_pollution_level — Pollution
proximity_to_industrial_site_km, proximity_to_waste_dump_km, flood_risk_zone — Hazards
respiratory_illness_12m, waterborne_illness_12m, child_blood_lead_elevated — Health outcomes
cooking_fuel_type, cooking_location — Indoor air quality determinant
carbon_footprint_tco2, climate_vulnerability_score — Climate justice

Realistic Features

Environmental racism/classism: poorer communities have higher pollution exposure
Health outcomes linked to pollution load via logistic model
Indoor air pollution driven by cooking fuel type (firewood > LPG)

Community Development

18k rows

Structure

One row per individual. Social capital measurement with bonding vs. bridging ties, collective action, participatory governance, and community-driven development.

Key Columns

n_group_memberships, primary_group_type — Group membership
bonding_social_capital_score, bridging_social_capital_score — Bonding vs. bridging
trust_neighbors, trust_strangers, trust_local_leaders — Trust (1–5)
participated_in_collective_action, collective_action_type — Collective action
attended_village_assembly, voiced_opinion_in_meeting — Participatory governance
in_cdd_programme, contributed_to_project, satisfied_with_project — CDD

Realistic Features

Bonding social capital higher in rural areas; bridging higher for educated
Free-rider perception inversely related to trust
CDD programme effects on participation and community asset satisfaction

Digital Access & Literacy

20k rows

Structure

One row per individual. Device ownership, connectivity, 7-level digital skills hierarchy, usage patterns, misinformation, privacy, and barriers to access.

Key Columns

owns_smartphone, owns_computer, shared_device_only — Device access
has_internet_access, internet_type, monthly_data_cost_usd — Connectivity
can_make_call through can_use_govt_services_online — 7 hierarchical skills
digital_literacy_score — Composite (0–10)
encountered_misinformation, can_identify_misinformation, shared_unverified_info
barrier_cost, barrier_literacy, barrier_language, barrier_infrastructure

Realistic Features

Gender digital divide: women have lower access and literacy scores
Age divide: youth more digitally literate, elderly less connected
Digital literacy is hierarchical: basic skills prerequisite for advanced ones

Social-Emotional Learning

15k rows

Structure

One row per student (ages 6–18). CASEL framework with 5 SEL domains, academic outcomes, wellbeing, bullying, prosocial behaviour, and teacher/parent ratings.

Key Columns

self_awareness, self_management, social_awareness, relationship_skills, responsible_decision_making — CASEL domains (1–5)
sel_composite_score — Overall SEL (1–5)
math_score, reading_score, attendance_rate — Academic outcomes
life_satisfaction, bullying_experienced, bullying_perpetrated — Wellbeing
in_sel_programme, programme_duration_months, teacher_trained_in_sel — Programme

Realistic Features

SEL programme improves all 5 domains; bigger effects with longer duration
Girls score higher on social awareness and relationship skills
SEL composite correlates with academic performance and lower bullying

NGO Programme Finance

10k rows

Structure

One row per programme. Organisation characteristics, funding, budget breakdowns, financial health, compliance, effectiveness, and partnerships.

Key Columns

org_type — Local NGO, INGO, CBO, faith-based, social enterprise
annual_budget_usd, n_donors, primary_donor_type, funding_gap_usd — Funding
personnel_pct, programme_activities_pct, admin_pct, indirect_cost_pct — Budget
burn_rate_pct, months_of_reserves, sustainability_score — Financial health
audit_completed, audit_qualified, vfm_score — Compliance & VfM

Realistic Features

Admin costs 10–25%, with ~25% of orgs under-reporting overhead (real-world pressure)
Larger orgs have better compliance and financial health scores
Funding secured percentage varies by donor type and org capacity

Aid Effectiveness (ODA)

12k rows

Structure

One row per aid flow (2010–2025). Donor-recipient pairs with Paris Declaration indicators, fragmentation, conditionality, and coordination.

Key Columns

donor_country, recipient_country, year — Flow identifiers
oda_amount_usd, disbursement_type, channel — Aid modality
country_ownership_score, alignment_with_national_plan, uses_country_systems — Paris principles
donor_concentration_index, is_tied_aid — Fragmentation
has_conditionality, conditionality_type, conditionality_met — Conditionality

Realistic Features

Aid flows more to poorer, larger countries (with geopolitical weighting)
Paris indicators improve slightly over time (2010–2025 trend)
Budget support improves Paris scores; humanitarian aid bypasses systems

Media & Information Ecosystems

18k rows

Structure

One row per individual. Media access, consumption, source trust, media literacy, development communication, misinformation, and press freedom perceptions.

Key Columns

has_radio, has_tv, has_smartphone, has_internet — Media access
radio_hours, tv_hours, social_media_hours — Consumption (hours/week)
primary_news_source, trusts_primary_source — Information sources
media_literacy_score, can_identify_fake_news — Media literacy (0–10)
exposed_to_dev_content, dev_content_topic — Development communication
encountered_health_misinformation, shared_misinformation — Misinformation

Realistic Features

Urban–rural divide: urban = more digital, rural = more radio
Youth lean to social media; elderly to radio and TV
Media literacy reduces misinformation sharing

IRT Psychometric Assessment

20k rows

Structure

One row per respondent with 30 item responses. 3-Parameter Logistic (3PL) IRT model with item difficulty, discrimination, guessing, and response times.

Key Columns

theta — Latent ability (standard normal, driven by education/wealth/urban)
item_1 through item_30 — Binary responses (0/1 correct)
Item parameters: difficulty (b, −3 to 3), discrimination (a, 0.5–2.5), guessing (c, ~0.25)
rt_item_1 through rt_item_30 — Response times (seconds, lognormal)
total_score, pct_correct — Test-level summaries

Realistic Features

3PL model: P = c + (1−c) / (1 + exp(−a × (theta − b)))
Differential Item Functioning: items 5, 12, 18 favor females; items 8, 22, 27 favor males
Response times: slower for harder items, faster for higher-ability respondents
Theta–total_score correlation ~0.87
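
The 3PL formula above translates directly into code. A minimal sketch with a hypothetical function name; the default guessing value of 0.25 comes from the item parameter description, and the example item parameters are illustrative.

```python
import math


def p_correct_3pl(theta, a, b, c=0.25):
    """Probability of a correct response under the 3PL model."""
    # c is the guessing floor; the logistic term scales the remaining 1 - c.
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# When ability equals difficulty, the probability is halfway between c and 1.
p_at_difficulty = p_correct_3pl(theta=0.5, a=1.2, b=0.5)
```

Because of the guessing parameter, even very low-ability respondents answer correctly with probability close to c, which is why total scores rarely fall to zero on a 30-item test.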

Field Survey Quality (Paradata)

20k rows

Structure

One row per survey interview. 200 enumerators with timing, GPS, response pattern, back-check, and enumerator-level quality data. ~5% of enumerators are fabricators.

Key Columns

interview_duration_min, travel_time_min, too_short, too_long, outside_working_hours — Timing
gps_latitude, gps_longitude, gps_accuracy_m, gps_suspicious — GPS validation
straightlining_score, acquiescence_score, digit_preference_score — Response patterns
back_checked, back_check_match_rate — Verification
enumerator_experience_months, surveys_completed_today, fatigue_flag — Enumerator
missing_rate, dont_know_rate, refused_rate, outlier_count — Data quality

Realistic Features

~5% fabricating enumerators: short duration + straightlining + GPS issues
Fatigue effect: quality degrades after 6+ surveys per day
Suitable for training data quality auditors and building fraud detection models
