Secondary Data


B.1 Six Data Sets With Rich Information on the First 1000 Days of Life and Childhood for Recent Periods, Primarily in the 21st Century:

(1-4) Young Lives is a unique international household survey of childhood poverty following the changing lives of about 8,000 children who were born in 2001-02 in Ethiopia, India, Peru and Vietnam – with survey rounds when the children were 1, 5, and 8 years of age (this is the “Younger Cohort” for Young Lives). Information was collected on all four risk factors emphasized in the GCC RFP in 2001-02 and on indicators of child cognitive, socioemotional, executive functioning and physical development by ages 5 and 8, as well as extensive household and community information (Barnett et al. (2012)). These are public use data for which access instructions are at

(5) The Chilean Encuesta Longitudinal de la Primera Infancia (ELPI), a random sample of 15,000 children aged 0-5 years in 2010 with follow-up starting in May 2012, including an additional 3,000 children randomly selected from the birth records since the first survey round, will provide additional rich data on a broad spectrum of early childhood investments and on both cognitive and non-cognitive outcomes. Four Team1000+ members are involved with ELPI. These are public use data for which access instructions are at the website noted above.

(6) The NCAER HDPI Survey (1993-94) and the India Human Development Survey (2004-05) together are a longitudinal socioeconomic survey of approximately 13,000 representative Indian households, with a variety of information on the socioeconomic characteristics of the household such as location, caste, religion, household income, and asset and land ownership. For individual household members, data were collected on demographic variables, education, labor market outcomes, short-term and long-term morbidity, and access to healthcare. In addition, information on early childhood health interventions (e.g. immunization, breastfeeding) and child anthropometric measures were collected in both rounds, and standardized test scores for language and mathematics were collected for 8-11 year old children in the 2004-05 survey. The project will exploit the longitudinal nature of the data to analyze the medium-term effects of early childhood shocks or interventions on cognitive and educational outcomes. Since the interventions recorded in the 1993-94 data were not randomized, quasi-experimental methods will be used that exploit the fact that different groups of children were exposed to the shocks or interventions differently. Furthermore, the association between height (as a measure of past nutrition) and cognitive outcomes (test scores) in the 2004-05 dataset will be investigated. These datasets are publicly available from

B.2 Nine Data Sets with Information on the First 1000 Days of Life and 14+ Years Thereafter:

(1-5) COHORTS (Consortium of Health-Orientated Research in Transitioning Societies) (Richter et al. (2011b)) includes the five largest prospective birth cohort data sets from low- and middle-income regions that in 2005 had at least 15 years of follow-up and an initial sample size of 2000 or more newborns: (1) the Pelotas (Brazil) Birth Cohort Study originally included all 5914 children born in the city’s hospitals during the 1982 calendar year, with participants having an average age of 23 years in the last round; (2) the Institute of Nutrition of Central America and Panama (INCAP) Nutrition Trial Cohort (Guatemala) originally included all children <7 years in 1969, and all children born between 1969 and 1977 (N = 2392), with participants having an average age of 33 years in the last round; (3) the New Delhi Birth Cohort (India) recruited all married women living in a defined area of the city from 1969 to 1972; pregnancies were identified and the newborns were enrolled (N = 8181) and followed up, with participants having an average age of 36 years in the last round; (4) the Cebu Longitudinal Health and Nutrition Survey (CLHNS; Philippines) recruited pregnant women living in 33 randomly selected neighborhoods of metropolitan Cebu in 1983-4 (N = 3080), with participants having an average age of 22 years in the last round; and (5) the Birth-to-Twenty (Bt20; Soweto-Johannesburg, South Africa) cohort identified pregnant women with gestational age of 26–40 weeks in 1990 (N = 3273), with participants having an average age of 19 years in the last round. All five of these data sets are population-based and started recruitment during gestation or at delivery; their study populations experienced high rates of maternal and/or child undernutrition, and all are currently undergoing rapid demographic, nutritional and epidemiological transitions. Contact information and websites for each of these data sets are available in Table 3 in Richter et al. (2011a), which is appended to the end of this document.

(6) The Turkish Early Enrichment Project (TEEP), an intervention carried out in 1983–1985 with 4–6 year old children from deprived backgrounds (originally 255 participants), with a 19-year follow-up when the sample members were young adults, will be available to Team1000+ (Kagitcibasi et al. (2001), Kagitcibasi et al. (2009)). For questions about data access, the contact is "Kagitcibasi, Cigem".

(7) The Andhra Pradesh Children and Parents Study (APCAPS) is a cohort re-enrollment study that initially provided nutritional supplements through a cluster-randomized trial in Andhra Pradesh, India. The original intervention trial was conducted among more than 2,000 women and their children in 15 treatment and 14 control villages in 1987-90. Mothers and children in the treatment group were provided supplements from pregnancy through the first five years after childbirth, and the controls did not receive any supplements. In 2003-05, more than 1,100 of the children born during the original study were re-enrolled for a second survey round (age group 13-18 years) that collected information on their health, education and labor market outcomes. In the final phase of the study, these children and their siblings (age group 18-21) were again surveyed during 2008-10. The longitudinal nature of the data will allow identification of the longer-run effects of the nutritional supplement interventions. However, due to the lack of child identifier information from the original 1987-90 survey, the subsequent rounds of the survey followed an “intent-to-treat” approach in which all children born in the treatment and control areas were surveyed (irrespective of their enrollment in the original trial). Therefore, a cluster-level “intent-to-treat” analysis of the effects of the intervention on educational outcomes will be conducted in the project. More information on the APCAPS data is available from The data are publicly available from the PHFI, and information about access to these data can be obtained from "Laxminarayan, Prof. Ramanan PHFI".

(8) The Indonesian Family Life Survey (IFLS) is a large-scale longitudinal socioeconomic and health survey. The first wave of the IFLS was administered in 1993 to 7,224 households representative of 83% of the Indonesian population. The full sample of individuals and households from IFLS1 was subsequently targeted for re-interviews during the 1997, 2000 and 2007 waves of the survey. The panel includes extensive information at the individual, household and community levels. The survey modules include various measures of chronic malnutrition (weight, height, birth weight), infectious diseases (incidence of worm infestation, measles and other infections), and pregnancy and birth complications (miscarriages, history of pregnancies, breastfeeding patterns), allowing the investigation of associations between exposure to these risk factors during early childhood and later life human capital accumulation, with children followed from their First 1000 Days in 1993 into adolescence in the year 2007.  For information about access to these public use data see

(9) The Chinese Health and Nutrition Survey (CHNS) is a longitudinal sample of about 4,400 households and 19,000 individuals initiated in 1989 with follow-up surveys in 1991, 1993, 1997, 2000, 2004, 2006, 2009 and 2011 (the 2011 round is not yet available for public use). The survey covers nine  provinces that vary substantially in geography, economic development, public resources, and health indicators. The combination of geographic and temporal variation in economic development makes this an unusually rich data set with which to explore how poverty relates to the four risk factors of interest and their impacts. A multistage, random cluster process was used to draw the samples surveyed in each of the provinces. Since the 1993 survey, all new households formed from sample households have been added. Since 1997, new households in original communities also have been added to replace households no longer participating in the study. The CHNS includes a wide range of variables pertaining to ECD inputs and outcomes both at the household and the community level, with particular emphasis on health and nutrition but also other important indicators, such as the time spent on child care and subsequent child schooling and post-schooling activities. Children who were 0-2 years of age in the 1989 CHNS were 20-22 years of age in 2009, so there is coverage over two decades of their lives.  For information about access to these public use data see
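The cluster-level “intent-to-treat” comparison described for APCAPS in item (7) above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's estimation code: the record layout, the village identifiers and the outcome values are hypothetical, and the estimate is simply the difference between the average of village-level means in treatment villages and in control villages, with every surveyed child classified by the arm of the village rather than by actual enrollment in the trial.

```python
from statistics import mean

def cluster_means(records):
    """Average the outcome within each village (the unit of randomization)."""
    by_village = {}
    for village, arm, outcome in records:
        by_village.setdefault((village, arm), []).append(outcome)
    return {key: mean(values) for key, values in by_village.items()}

def itt_effect(records):
    """Mean of village-level means in treatment villages minus control villages."""
    means = cluster_means(records)
    treated = [m for (_, arm), m in means.items() if arm == "treatment"]
    control = [m for (_, arm), m in means.items() if arm == "control"]
    return mean(treated) - mean(control)

# Hypothetical records: (village id, arm assigned to the village, years of schooling).
# Children are classified by village arm, not by whether they took supplements.
records = [
    ("T1", "treatment", 9.0), ("T1", "treatment", 11.0),
    ("T2", "treatment", 10.0),
    ("C1", "control", 8.0), ("C1", "control", 10.0),
    ("C2", "control", 9.0),
]

effect = itt_effect(records)
print(f"Cluster-level ITT estimate: {effect:.2f} years of schooling")
```

Averaging within villages first keeps the village as the unit of analysis, which matches the level at which treatment was assigned; a full analysis would also need standard errors that account for the small number of clusters.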

B.3 Multiple Cross-Sectional Data Sets that Permit Exploration of “Natural Experiments”: Team1000+ will use 11 combinations of data that fall into this category, each of which is for a particular developing country and permits exploring specific interventions:

(1) Evaluation of the impacts on cognitive development and later life economic performance of a Clean Water reform introduced by the Mexican government in 1991, due to which the share of the population with access to chlorinated water rose from 53% to 90% in one year but with differential increases across states, with linkage to Programme for International Student Assessment (PISA) test scores for 15 year olds in 2003, 2006 and 2009 (the first two cohorts born before 1991 and the third born after) and with schooling attainment for cohorts born before and after this reform as reported in the 2010 census, the ENSANUT 2012 survey and the Mexican Family Life Survey (2002, 2005). State level time series data on diarrhea and a vector of control diseases will be obtained from the Mexican Secretary of Health. Potential state-year varying confounders will be controlled for, including GDP (from German-Soto 2005), rainfall (from the National Meteorological Service) and state-led schooling programs. For further information about the availability of these data, the contact is "Bhalotra, Sonia".

(2) Evaluation of the effects of a Chilean 2003 secondary school subsidy on teenage pregnancy, child birth weight and cognitive development using a combination of the Chilean national socioeconomic survey (CASEN) and birth weight census records. For further information about the availability of these data, the contact is "Bhalotra, Sonia".

(3) The impact of the Universal Primary Education program of 1976-81 in Nigeria on women’s education, fertility and the schooling attainment of the next generation, using recent Demographic and Health Surveys, which are public access household surveys, and administrative records on variation of the reform by state. For further information about the availability of these data, the contact is "Bhalotra, Sonia".

(4) A linked GCC-funded project (PIs Sikander and Maselko) will collect survey data as a follow-up of a trial conducted in rural Pakistan in 2006. Those data will include questions to be analysed in this project. The data access policy for these data will be the same as that of the Sikander-Maselko project (on which Bhalotra is a co-applicant). For further information, contact Jill Ahs or Joanna Maselko at Duke.

(5) Health and educational outcomes of all public primary school children in the Indian state of Haryana in 2010, which will be linked with retrospective intra-state variation in childhood health programs. The dataset from Haryana is further discussed later. For more information about the availability of these data, the contact is "Laxminarayan, Prof. Ramanan PHFI".

(6) The APCAPS cohort re-enrollment study in the state of Andhra Pradesh in India linked to District Level Household Surveys (DLHS). The DLHS dataset is discussed in more detail later. For further information about the availability of these data, the contact is "Laxminarayan, Prof. Ramanan PHFI".

(7)-(10) Comparisons of the Younger Cohort and the Older Cohort for the Young Lives data that are described above. In addition to the 8,000 children in the Younger Cohort born in 2001-02 in Ethiopia, India, Peru and Vietnam, the Young Lives data include similar information on an Older Cohort of about 4,000 children over the three rounds, but with children about 8 years old in 2002 and therefore born in 1994-95. This means that policy changes affecting the four risk factors that occurred between 1997 and 2000 affected the Younger Cohort when they were in their First 1000 Days, but not the Older Cohort when they were in their First 1000 Days. It also means that the Younger and the Older Cohorts (and their mothers and other family members) experienced different economic, disease and weather shocks when they, respectively, were in their First 1000 Days. In addition, both the Younger and the Older Cohorts can be linked to other data for at least some of the countries, such as rainfall data around the time of child birth (in poor societies, particularly in rural areas, rainfall fluctuations often have strong effects on economic options and welfare), so it is possible to examine the impact of rainfall fluctuations during the First 1000 Days on cognitive achievement at ages 5, 8, 12 and 15 years.

(11) Haryana IBSY Data: In 2010, the state government of Haryana, India, launched an integrated child health scheme called the Indira Bal Swasthya Yojana (IBSY). Under this scheme, 0-18 year old children in schools and child welfare centers (Anganwadis) are regularly examined for various disease symptoms, and their anthropometric and hemoglobin level measures are taken. Children requiring treatment for certain diseases are treated either at the school following the health examination, or are referred to the nearest health facility for further follow up. The first phase of the scheme was launched in all public primary schools in 2010, resulting in a census dataset of more than 570,000 children in the age group 6-12 years. The project will use this health dataset in conjunction with educational outcome data for these children from the state Department of Education. Data on covariates will come from large household surveys such as the district level household surveys (2007-08), and intra-state variation in shocks and health policy interventions will be used to create natural experiment type settings. The IBSY scheme plans to cover each child once every year, and design child health policies tailored for the areas of need. If data from additional annual rounds become available during our study period, we will attempt to exploit the longitudinal nature of the data. These data are the property of the government of Haryana. They will be made available to the public pending approval from the government. Information on IBSY can be obtained from For further information about the availability of these data, contact is "Laxminarayan, Prof. Ramanan PHFI"

(12) The Indian District Level Household Survey (2007-08) is modeled after the well-known Demographic and Health Surveys but is much larger in size (approximately 1,000 households per district, resulting in a sample size of more than 600,000 households) and collects very detailed information on maternal and child health, including child birth and death, pregnancy, pre-natal and post-natal care, family planning, child immunization, and anthropometric measures. Data on household socioeconomic characteristics, and demographic and educational indicators for individuals are also collected. These data will be used along with the state level variation in maternal and child health policies in India. Team1000+ already has identified more than 25 such state-specific schemes from the last three decades, which will allow difference-in-differences analysis of the long-term impact of these policies. These data are publicly available from the survey agency – the International Institute of Population Sciences in India. Data access instructions are available at For further information about the availability of these data, the contact is "Laxminarayan, Prof. Ramanan PHFI".
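The difference-in-differences logic mentioned in item (12) can be illustrated with a simple two-group, two-period sketch. Everything in it is an assumption made for illustration: the rows are hypothetical child-level outcomes (for example, an immunization indicator averaged over small groups), "treated" marks a state that introduced a scheme, and "post" marks cohorts born after its introduction. The project's actual analysis would involve many states, staggered schemes and control variables.

```python
from statistics import mean

def did_estimate(rows):
    """Two-by-two difference-in-differences on cell means.

    rows: iterable of (treated_state, post_period, outcome) tuples.
    Returns (treated post - treated pre) - (control post - control pre).
    """
    cells = {}
    for treated, post, outcome in rows:
        cells.setdefault((treated, post), []).append(outcome)
    m = {key: mean(values) for key, values in cells.items()}
    change_treated = m[(True, True)] - m[(True, False)]
    change_control = m[(False, True)] - m[(False, False)]
    return change_treated - change_control

# Hypothetical outcomes (e.g. share of children fully immunized).
rows = [
    (True, False, 0.40), (True, False, 0.50),    # scheme state, born before
    (True, True, 0.70), (True, True, 0.80),      # scheme state, born after
    (False, False, 0.45), (False, False, 0.55),  # comparison state, born before
    (False, True, 0.55), (False, True, 0.65),    # comparison state, born after
]

estimate = did_estimate(rows)
print(f"Difference-in-differences estimate: {estimate:.2f}")
```

The comparison state's change over time stands in for what would have happened in the scheme state without the policy, so the estimate is only credible if the two groups would otherwise have followed parallel trends.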

B.4 GCC Saving Brains Re-Enrollment Data Set Nominated for Funding: Maternal depression, cognitive stimulation and child development at the age of school entry -- follow-up of the Thinking Healthy Programme RCT of 2005/6 in Pakistan. The re-enrollment study, for which one of the Team1000+ members is an investigator, will assess long-term effects of a community-based mental health intervention. The original intervention established first-order impacts in lowering the incidence of depression at 6 and 12 months post-partum. The children whose mothers were randomized into treatment are 6-7 years old in 2012. The current proposal builds upon this project to look at a wider set of outcomes and model their economic implications. Specifically, we will (a) model the impact of the intervention on the employment and economic status of the mother and her family, and (b) simulate the returns to child-level cognitive and developmental gains from the treatment of maternal depression in terms of future economic outcomes. For further information about the availability of these data, the contact is Joanna Maselko.

B.5 Detailed Data on Public and Private Resource Costs of Interventions to Mitigate the Four Risk Factors: Team1000+ has considerable experience in identifying relevant resource cost data that are essential for evaluating the relative attractiveness of alternative interventions to ameliorate the four risk factors of interest during the First 1000 Days of life.

(1) Carol Levin, formerly at Program for Appropriate Technology in Health (PATH) and now at the University of Washington, has considerable experience in this area internationally and will contribute to the project a literature review and new estimates for leading interventions directed towards each of the four risk factors in the four Young Lives countries – Ethiopia, India, Peru and Vietnam – which include a wide range of relatively low-income developing countries and which will permit more integrated analysis with the Young Lives data described above. For questions about access to these data, contact "Levin, Carol".

(2) Two members of Team1000+, Pia Britto and Jan van Ravens, will bring to the project their considerable expertise and experience with UNICEF in costing pre-school programs, and in particular day-care programs (directed towards children less than 3 years old), in developing countries. For questions about access to these data, contact "Pia Britto" or "van Ravens, Jan".