PROTOCOL: Protocol for a Systematic Review: Teach For America (TFA) for Improving Math, Language Arts, and Science Achievement of Primary and Secondary Students in the United States: A Systematic Review

Campbell Systematic Reviews (2016)

Abstract
Research shows a shortage of effective teachers in many rural and urban K-12 public schools serving the highest proportions of high-poverty students across the US (Clotfelter, Ladd & Vigdor, 2006; Peske & Haycock, 2006; Monk, 2007). This shortage has persisted for decades (Darling-Hammond, 1984; Ingersoll, 2001; Ingersoll & Perda, 2010). In the past decade, alternative route teacher preparation programs aimed at addressing this shortage proliferated across the United States (Kane, Rockoff, & Staiger, 2007). Alternative route teacher preparation programs seek to increase the supply of teachers more rapidly than traditional teacher preparation programs (Hess, 2002; Raymond & Fletcher, 2002; Blazer, 2012). Although their requirements vary widely, most of these programs are shorter, less expensive, and more practically oriented than traditional teacher preparation programs (Blazer, 2012). These programs also vary widely in their selection criteria for teacher candidates, approach to training these candidates, notoriety among education stakeholders, and evidence of their effectiveness (Hess, 2002; Kane, Rockoff, & Staiger, 2007; Constantine et al., 2009).
Teach For America (TFA) is nationally recognized as an alternative route teacher preparation program that has sought to address the shortage of effective teachers specifically in high-poverty rural and urban schools across the US (Teach For America, 2010). TFA stands out among its peer preparation programs for several reasons. TFA is the largest source of new teachers and is the largest recipient of philanthropic funding for teacher recruitment for K-12 education (Blazer, 2012; Mead, 2015). Since 1990, TFA has recruited, selected, trained, placed, and supported approximately 25,000 new public school teachers (corps members) in the highest-poverty school districts in rural and urban areas. As of 2010, TFA corps members represented between 10% and 15% of all new hires in high-poverty schools in the 35 regions served by TFA.
In the 2013-14 school year alone, 11,000 TFA corps members reached more than 750,000 students in high-poverty K-12 rural and urban schools. TFA is also the most publicly visible and widely debated alternative route teacher preparation program, as illustrated by the Conner P. Williams (2014) article entitled “Stop Scapegoating Teach for America.” Finally, TFA is the most evaluated program of its kind. Multiple quasi-experimental and experimental studies have examined the effectiveness of TFA in improving student outcomes. However, this body of primary studies has not yet been systematically reviewed and meta-analyzed (Raegan Miller, personal communication, 16 May 2014).
TFA is a selective alternative route teacher preparation program that recruits college graduates (many from top colleges) and professionals to teach in low-income schools (Clark, Isenberg, Liu, Makowsky & Zukiewicz, 2015). Ensuring that corps members become effective teachers “who lead their students to significant academic achievement” is embedded in TFA's mission of eliminating educational inequity in US public schools (Teach For America, 2010). To fulfill this mission, TFA developed a data-driven program model that includes (1) a rigorous selection process, (2) intensive pre-service training for selected corps members, (3) two years of ongoing professional development for corps members, and (4) programming that fosters alumni leadership after TFA corps members have completed their two-year commitment (Teach For America, 2010).
Rigorous Selection Process. TFA's selection process includes a writing activity, a telephone interview, a sample teaching lesson plan, a group discussion, and an in-person interview. TFA selects potential corps members who demonstrate competency in areas such as academics, leadership, critical thinking, ability to influence and motivate others, organizational ability, respect for students and families in low-income communities, and perseverance (Teach For America, 2010).
Selected corps members receive a five-week intensive summer training and agree to teach in their assigned school for at least two years. Those who complete the two-year commitment become alumni and remain part of the TFA community, with continued access to resources and support for alumni (see Programming for Alumni section below).
Intensive Pre-Service Training. The five-week intensive pre-service summer training covers (1) instructional and pedagogical philosophies and practices, (2) classroom management skills, (3) attitudes towards teaching, and (4) academic ability. These skills and attitudes are hypothesized to have a positive and meaningful effect on students' academic achievement. This effect is hypothesized to be larger than the effect the same students would have experienced had the TFA corps member not been placed in their classroom.
Ongoing Professional Development. TFA corps members continue to receive training and support throughout their two-year teaching commitment to help them further develop the skills and attitudes introduced during the pre-service training. This ongoing professional development includes observation and coaching from program directors, and access to online classroom resources, advice, community support, and self-directed online learning on a private secure website for corps members and alumni.
Programming for Alumni. At the end of their two-year assignment, TFA alumni are encouraged to continue to engage in meaningful ways to advance the mission of TFA and to become influential education leaders and advocates for children. TFA alumni have access to teaching resources as well as the support of the TFA community as they continue their professional careers.
[Figure: TFA Theory of Change. Source: Teach For America Business Plan 2010-15]
No systematic reviews and meta-analysis on TFA: The effects of TFA corps members and alumni on student academic outcomes have been investigated by educational researchers and economists alike using correlational, quasi-experimental, and randomized controlled trial designs. There have been vigorous debates about TFA's effectiveness based on these primary studies. When the results are compared by research design, the quasi-experimental studies send mixed signals, but the experimental studies have consistently found a positive and statistically significant effect in math, but not in reading (Raymond, Fletcher, and Luque, 2001; Laczko-Kerr & Berliner, 2002; Seftor & Mayer, 2003; Decker, Glazerman, and Mayer, 2004; Clark et al., 2013). However, until a systematic review of these studies is conducted, we do not know the average effect of TFA across these experiments. Furthermore, this average effect may vary according to academic outcome, grade level, teacher experience, and teacher certification status, and this variation can only be investigated through a meta-analysis. Also, the methodological quality of the quasi-experimental studies (e.g., establishment of baseline equivalence between groups in the analysis sample) and experimental studies (e.g., high attrition disrupting random assignment) has not been systematically and rigorously evaluated using C2 systematic review methods. By systematically reviewing the primary studies on TFA, we can apply methods designed to limit bias in the retrieval, appraisal, and statistical synthesis of the primary study findings (Petticrew and Roberts, 2006; Cooper, 2010). Using C2 systematic review and meta-analysis methods, we can empirically investigate whether effect sizes reported in primary studies are consistent and can be generalized across populations and settings. We can also investigate whether findings vary by subsets of primary studies.
The use of meta-analysis, after primary studies have been systematically reviewed, will allow us to statistically synthesize study findings, potentially increase the power and precision of effect sizes reported in primary studies, and potentially enhance their generalization (Chalmers and Altman, 1995; Petticrew and Roberts, 2006; Cooper, 2010). Narrative reviews are not a substitute for systematic reviews: The narrative reviews of quasi-experimental studies and experimental studies that have been conducted on the effects of TFA on K-12 students' academic outcomes are helpful to gain an approximate idea of the amount of agreement or disagreement of treatment effects across studies. They also help us understand what treatment effects look like across different samples. However, the primary limitation of a narrative review is the lack of coding of study characteristics and effect sizes and a statistical synthesis of these effect sizes. Without this, it is difficult, if not impossible for narrative reviews to cognitively and systematically manage and control for the many sources of variation in primary study characteristics and effect sizes (Chalmers and Altman, 1995; Petticrew and Roberts, 2006; Cooper, 2010). These variations arise from the different time periods, within study sampling, sample characteristics, group comparisons, outcomes measures, and designs. Reporting of such studies, if not handled systematically, could produce the appearance of conflicting results, or produce consistent results without empirical information across studies to understand why. In contrast, a systematic review transparently and systematically combs through the evidence, controls for study quality, and, when appropriate, statistically synthesizes the results with a view to presenting findings with greater clarity, and less potential bias, than narrative (or literature) reviews (Chalmers and Altman, 1995; Petticrew and Roberts, 2006; Cooper, 2010). 
Reliable and valid systematic evidence to address future scale up of TFA: With the continued shortage of effective teachers in high-poverty rural and urban schools, it is reasonable to predict that the demand for alternative route teacher preparation programs, like TFA, will increase, not decrease. TFA used the i3 scale up funds to more than double its corps members from 7,300 to 15,000 teachers and increase its presence from 46 to 60 urban and rural regions across the country. By the end of 2015, TFA teachers would reach nearly one million students in some of our country's highest-need communities. There were enough internally valid randomized controlled trials and matched comparison studies with substantially positive and statistically significant findings to motivate the US Department of Education's Office of Innovation and Improvement (i3) to award TFA, in fall 2010, a $50 million grant to scale up nationally at the elementary, middle, and high school levels. However, the randomized controlled trials and matched comparison studies were presented in a narrative review, by an independent evaluator, to make the case for the i3 scale up funding. The i3 award and the narrative review are not a substitute for using C2 systematic review methods to objectively review the quality of the randomized controlled trials and matched comparison studies that report the effects of TFA on student academic outcomes, and to synthesize their effect sizes to estimate the average effect of TFA.
A TFA systematic review and meta-analysis, at this time, as an important benchmark: The Investing in Innovation Fund (i3) scale up impact evaluation of TFA is presently being conducted. The effect of TFA on student academic outcomes will be evaluated with an RCT at the elementary grades, and with matched comparison QEDs at the middle and high school grades because of the challenges in randomly assigning students beyond the elementary grades.
The findings from these studies will be released by the US Department of Education in spring 2016. Appropriately synthesized effect sizes in a C2 systematic review prior to TFA scale up could serve as the “maintenance” benchmark for comparative purposes when the i3 scale up results are released, and allow us to evaluate if the effects are maintained or changed during and at the end of TFA scaling up. The comparison of pre-scale up average effect sizes (from a C2 systematic review) to scale up effect sizes (from the i3 evaluation) can make an important contribution to knowledge because one of the primary challenges associated with scaling up interventions is maintaining the effectiveness of the intervention as the intervention goes to scale (Klingner, Boardman, and McMaster, 2013). The systematic review will also create an empirical database of TFA effect sizes (and corresponding study characteristics) that can be used to summarize the empirical landscape of the highest quality research on TFA, based on C2 systematic review standards, prior to release of the scale up findings. In addition, the systematic review can be used to compare effect sizes from the scale up study (across treatments, samples and settings) to average effect sizes (across studies, treatments, samples, and settings) from the systematic review. The answers to the first and second questions will provide education policymakers and stakeholders with a systematic profile that presents the author(s), date of publication (or reporting), sample characteristics, TFA and comparison groups, design, and outcomes. This profile will help education policymakers understand the potential sources of variability in TFA studies and how the reporting of such studies, if not handled systematically, could produce the appearance of conflicting results. The answers to the third, fourth, and fifth questions are on the main effect of TFA and are considered confirmatory. 
The main effect is defined as the effect of TFA corps members on a particular student academic outcome, such as math, relative to non-TFA teachers, without controlling for certification status and years of experience (these controls are implemented when addressing question seven). We focus on TFA corps members because TFA debates and the popular press focus more on corps members and less on alumni. This is partly because only a small percentage of corps members transition to alumni status and continue to teach for five years (Noell & Gansle, 2009). The answer to the sixth question addresses the methodological issue of whether to combine the results of RCTs and QEDs. This decision should be based on the methodological quality of the studies and on how similar the average effect sizes are. For studies that pass the methodological quality screening and are included in the meta-analysis, we will evaluate whether the effect size difference between the RCTs and QEDs exceeds 0.05 standard deviation units. We focus on the magnitude of the difference rather than on a significance test, because the total number of RCTs and QEDs available to test for this difference may result in low statistical power. For this reason, we define a “substantial” difference in the weighted average effect for RCTs and QEDs using the WWC baseline equivalence standards (WWC Handbook Version 3.0). If the difference between the average effect sizes for the RCTs and QEDs exceeds 0.05 standard deviations, we will not combine the RCTs and QEDs into a single meta-analysis to produce an overall, weighted average effect size across the two design types. The answer to the seventh question is based on an exploratory analysis but will provide evidence to inform future research on TFA that speaks directly to the debates between TFA critics and TFA proponents.
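The decision rule for the sixth question can be illustrated with a minimal sketch. It assumes fixed-effect, inverse-variance weights for the weighted average within each design type; the protocol text here does not specify the weighting scheme, so that choice, along with the `Study` record and the function names, is an assumption made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Study:
    design: str      # "RCT" or "QED"
    effect: float    # standardized mean difference (SD units)
    variance: float  # sampling variance of the effect size


def weighted_mean(studies):
    """Inverse-variance weighted average effect (assumed weighting scheme)."""
    if not studies:
        raise ValueError("at least one study is required")
    weights = [1.0 / s.variance for s in studies]
    return sum(w * s.effect for w, s in zip(weights, studies)) / sum(weights)


def can_pool(studies, threshold=0.05):
    """Return (pool?, absolute difference) for RCT vs. QED average effects.

    Designs are pooled only if the difference does not exceed the threshold.
    """
    rct = [s for s in studies if s.design == "RCT"]
    qed = [s for s in studies if s.design == "QED"]
    diff = abs(weighted_mean(rct) - weighted_mean(qed))
    return diff <= threshold, diff
```

With made-up inputs, two RCTs averaging 0.15 and two QEDs averaging 0.14 would be pooled, while a gap of 0.15 between design types would keep the two meta-analyses separate.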
This will be accomplished by estimating whether the main effects of TFA are moderated by TFA status (corps members and alumni), certification status, or years of teaching experience, through a series of ANOVA analyses for the categorical moderators and bivariate meta-regression analyses for the continuous moderator. Similarly, the answer to the eighth question examines whether the main effect of TFA differs by level of fidelity of TFA implementation as reported in the primary study. Teacher turnover is an important issue in TFA studies. If teacher turnover is reported as an outcome in both the TFA and comparison groups in primary studies, the answer to the ninth question will address whether TFA has a main effect on this outcome. The answer to the eleventh question is descriptive, relies on what the research reports, and is designed to provide contextual information for study findings by reporting what authors found regarding the cost effectiveness of TFA.
This review will include primary studies with research designs that, when implemented well, are capable of generating data that can be used to make generalizable causal inferences about the effects of TFA on student academic outcomes. Designs that meet these criteria are randomized controlled trials (RCTs), regression discontinuity designs (RDDs), single-case designs (SCDs), and quasi-experimental designs (QEDs). However, we limit the eligible designs for this review to RCTs, where random assignment is used to form intervention and comparison groups, and QEDs, where non-random methods such as matching or other statistical methods are used to form a counterfactual group that is comparable to the intervention group on measured characteristics. RDDs and SCDs will be excluded from this review since statistical methods for incorporating RDD and SCD data into meta-analyses are, to the best of our knowledge, not well established.
For example, the Campbell Collaboration Methods Policy Brief is silent on the statistical synthesis of the RDD and SCD. Furthermore, the nature of TFA interventions and the results of our cursory literature search indicate that RDDs and SCDs are rare. Research designs that lack a comparison group, such as single-group “pretest/posttest” designs, will be excluded from the review. It is well established in the methodological literature and in the practice of educational research that designs without a comparison group cannot rule out a competing explanation for observed differences between intervention and comparison groups on an outcome (Shadish, Cook, and Campbell, 2002). We will include studies with participants who are K-12 students with TFA corps members, TFA alumni, and non-TFA teachers in rural and urban public schools in the United States. At the time of the intervention, the teachers in the treatment condition must be TFA corps members or TFA alumni; the control condition must include non-TFA teachers who have never participated in TFA. Non-TFA teachers may vary in their years of teaching experience and certification status. During the time frame of the study, all students must have a teacher who meets the eligibility criteria for TFA teachers or for non-TFA teachers. We will exclude studies that focus on Teach for America's Early Childhood Initiative, defined as initiatives that start prior to kindergarten, since early childhood studies are outside the policy relevant scope of this review. The TFA intervention condition will include TFA corps members, who are serving their two year commitment, TFA alumni who have completed the two year program but continue to teach, or both. The non-TFA comparison condition will include teachers who have never participated in TFA. These teachers must not have received preparation or training in programs associated with TFA. 
Teachers in the non-TFA comparison condition can vary in their certification status: traditional, alternative, emergency, and uncertified. To be included in the review, the study must include a TFA condition (as described) and a non-TFA condition (as described). Studies that create an intervention group by bundling TFA corps members or alumni with teachers trained in other alternative teacher preparation programs, such as the New York Teaching Fellows Program, will be excluded from the review. The reason is that when TFA is bundled in this way it is impossible to disentangle the effect of TFA from the effect of other alternatively prepared teachers in the intervention group.
The review will include studies with at least one academic student outcome in the math, English language arts, or science domains. Student outcomes in non-academic (or non-cognitive) domains will be documented in the coding guide but will not be reported in the review. Multiple types of outcome measures will be included, although our experience with reviewing the TFA literature indicates that the primary types of outcome measures we will encounter will be state assessments, end-of-course assessments, and other standardized assessments. These assessments are eligible for inclusion in the review provided that they were administered as intended. Non-standardized assessments, such as researcher-developed assessments, are also eligible for inclusion; however, the study must provide evidence that the measure (1) has face validity and reliability, (2) is not over-aligned with the intervention, and (3) was administered in the same way for both intervention and comparison groups. The first criterion is that the measure has face validity and sufficient reliability. A description that shows that the measure is clearly defined and measures the construct it is supposed to measure can serve as evidence for face validity in this review.
Reliability evidence may come in the form of internal consistency, test-retest reliability, or inter-rater reliability. The second criterion is that a study must provide evidence that the measure does not closely resemble aspects of the intervention. For example, the measure should not have items or materials that intervention teachers have access to through their TFA training materials but that comparison teachers do not. The third criterion is that eligible outcome measures must have been used the same way in the treatment and comparison conditions.
The review will include studies that meet the outcome inclusion criteria for students and have at least one teacher outcome on teacher leadership, which is a key mediator between TFA teacher training and student achievement in the TFA theory of change. Additional teacher outcomes that will be included in the review are content knowledge, years of teaching experience, and overall academic ability.
Teacher Leadership. There is no single definition of teacher leadership. The one that comes closest to aligning with the TFA framework describes teachers who take on leadership roles and additional professional responsibilities, such that leadership roles and decision-making responsibilities extend beyond the school or district administrative team to the teacher. Reliable and valid measures designed to tap into this construct will be eligible for the review.
Content knowledge. To be eligible for the review, an outcome should measure a teacher having a solid background in a subject or content area, as exhibited by a college minor or major in the subject or content area such as math, reading, or science.
Teaching experience. To be eligible for the review, an outcome should measure the total number of years of classroom teaching experience in the field.
Academic ability.
To be eligible for the review, an outcome should tap into the construct of academic skills as measured by SAT scores, ACT scores, grade point average, or selectivity of the college attended.
When reviewing outcome measures according to the three criteria, we will apply the definitions in the WWC Procedures and Standards Handbook Version 3.0, page 16, section 4. For example, thresholds for the psychometric properties that determine the reliability of an outcome measure will be based on the WWC Evidence Standards, which require (a) internal consistency of 0.50 or higher; (b) temporal stability/test-retest reliability of 0.40 or higher; or (c) inter-rater reliability of 0.50 or higher. Teacher outcomes that do not have psychometric properties, such as years of teaching experience, must show evidence of being collected consistently across groups in the study.
One school year is the minimum dosage that study participants must have had in order for the study to be included in the review. All durations of follow-up equal to or above the minimum dosage will be included in the review. In the meta-analysis we will control for study-to-study differences in the follow-up period by meta-analyzing studies with the same follow-up periods together (studies with a one-year follow-up outcome will be meta-analyzed together on that outcome, studies with a two-year follow-up outcome will be meta-analyzed together on that outcome, and so on). Studies will be excluded if the minimum dosage is less than one school year or if the treatment and comparison groups do not have a comparable dosage.
The review will include studies that take place in K-12 public schools, including charter schools, in the United States. Limiting the setting to K-12 public schools helps ensure that the review will generate evidence that informs the TFA policy debate.
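The reliability thresholds and the follow-up grouping described above can be expressed as a short sketch. The threshold values are the ones quoted in the text; the dictionary keys, function names, and input shapes are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Minimum coefficients quoted above from the WWC Evidence Standards.
# The key names are illustrative, not WWC terminology.
WWC_MINIMUMS = {
    "internal_consistency": 0.50,
    "test_retest": 0.40,
    "inter_rater": 0.50,
}


def meets_reliability(reported):
    """True if any one reported coefficient reaches its minimum.

    `reported` maps a reliability type to the coefficient a study reports;
    types the study does not report are simply absent.
    """
    return any(
        kind in reported and reported[kind] >= minimum
        for kind, minimum in WWC_MINIMUMS.items()
    )


def group_by_follow_up(studies):
    """Group (study_id, follow_up_years) pairs so that each follow-up
    period can be meta-analyzed separately."""
    groups = defaultdict(list)
    for study_id, years in studies:
        groups[years].append(study_id)
    return dict(groups)
```

Because the three thresholds are joined by "or" in the text, the check passes as soon as one type of reliability evidence is sufficient.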
Privately funded schools, early childhood education programs, higher education programs, adult education programs, and alternative schools, such as correctional programs, will not be included in the review.
The goal of the literature search, consistent with the C2 Information Retrieval Policy Brief, is to identify all eligible studies on the effectiveness of TFA that are formally published (peer-reviewed literature) or informally published (grey literature). This involves developing search strategies that are efficient, capture the relevant studies while minimizing the amount of irrelevant material, and minimize bias. With this goal in mind, the final search strategy will be developed in consultation with a C2 Trials Search Co-ordinator and an academic librarian at the University of Pennsylvania. The literature search will be implemented by (1) searching electronic databases, (2) searching the grey literature in which studies are published informally, (3) soliciting previous authors of TFA studies, and (4) manually scanning the tables of contents of the most current issues of those journals in which TFA effectiveness studies are published. To ensure study relevance, electronic searches will limit retrieved articles to those published (formally or informally) between 1994 and 2015. Our experience with the TFA effectiveness literature leads us to predict that this twenty-year window is wide enough to include all of the effectiveness studies that have a comparison group and, therefore, would be eligible for the review. We also plan to search additional databases not covered by ProQuest. These are JSTOR, Academic Search Premier, and Education Next/Full Text. Based on these domains, we will employ search terms that connect the domains with the Boolean “AND” operator. Within domains, we will use the Boolean “OR” operator in order to search multiple keywords.
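The AND-across-domains, OR-within-domain structure can be sketched as a small query builder. This is only an illustration: the helper names are hypothetical, the quoting rule (quote any term containing a space or hyphen) is a simplification, and real databases differ in syntax and controlled vocabulary.

```python
def or_group(terms):
    """Join the keywords of one domain with OR, quoting multi-word
    or hyphenated terms so they are searched as phrases."""
    quoted = [f'"{t}"' if (" " in t or "-" in t) else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(domains):
    """Join the per-domain OR groups with AND."""
    return " AND ".join(or_group(terms) for terms in domains)


# Example domains. The intervention terms are an assumption made for
# illustration; the design terms are a subset of the design keywords
# listed in the protocol's search string.
example = build_query([
    ["Teach For America", "TFA"],
    ["random assignment", "experiment*", "quasi-experimental"],
])
print(example)
```

Each database would still need its own tailored version of the resulting string, since thesaurus descriptors and filters vary.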
The search terms will be similar to the following:
AND AND AND (“random assignment” OR “randomized experiment” OR experiment* OR “experimental design” OR “control group” OR “non-experiment” OR “non-experimental” OR “quasi-experiment” OR “quasi-experimental” OR “comparison group” OR “matched comparison group” OR “matched comparison” OR “matched groups” OR “statistical matching” OR “propensity score matching” OR “systematic review” OR “review” OR “meta-analysis” OR “research synthesis” OR “research review”)
Publication Date = 1 Jan 1994 – 2015
Due to controlled vocabulary differences across databases, the main search term will change for each database. We will consult the database thesauruses in order to build the final search terms for each database. We will search each database separately and tailor our strategy for each. For example, we will look up search words in each database thesaurus to see which descriptors are available, and make use of grade level and publication type filters. Results from our literature searches and the other searches described next will be stored and managed using RefWorks online bibliographic software.
In addition to the main database searches, there will be a five-step grey literature search that involves (1) searching grey literature databases, (2) manually searching targeted websites, (3) searching conference presentation databases, (4) searching existing reviews, and (5) searching Google. The database searches will use PolicyFile, PsycExtra, and OpenGrey.eu. The manual grey literature searches will include general websites for organizations that conduct research across many areas of education (Table 1) as well as targeted websites for organizations that have a focus on teacher education or TFA research (Table 2).
In order to make sure that relevant conference presentations are included in the meta-analysis, we will search the EditLib and Index of Conference Proceedings databases for conference abstracts using search criteria similar to those used for the main database search. We will also search existing reviews in order to refine the search strategy and check references for studies that should be included. Existing reviews will be identified through the main database searches and the grey literature searches as well as through searching the Campbell Library. Lastly, we will do an advanced Google search using criteria similar to the main database search and screen the first twenty pages of results.
Based on studies we retrieved from our cursory/background searches, we will develop an email list of all researchers who authored an effectiveness study. We will also develop an email template that briefly describes the C2 systematic review on TFA, provides a bibliography of all effectiveness studies identified from our literature search, whether or not these studies are eligible for our review, and requests that study authors refer us to (1) any effectiveness studies not in the bibliography, (2) any authors not in the bibliography who may be aware of TFA studies that have not been formally published, or (3) both.
Limited resources and personnel prevent us from conducting a comprehensive hand search of social science journals where TFA studies may be published. Moreover, our search of bibliographic databases that indexed grey literature, grey literature dat
Keywords
systematic review,secondary students,teach,science achievement