The National Airworthiness Council artificial intelligence working group (NACAIWG) summit proceedings 2022

Systems Engineering (2023)

Abstract
In 2021, the United States Department of Defense (DoD) airworthiness community joined together to form the National Airworthiness Council Artificial Intelligence Working Group (NACAIWG). In June 2021, the NACAIWG held its first summit and examined the use case of an uncrewed aircraft (UA), operating under the guidance of a United States Navy (USN) permanent flight clearance (PFC), performing automated air-to-air refueling (A3R) as the probed receiver of a drogue-configured aircraft,2 a mission standardized by the North Atlantic Treaty Organization (NATO) three months after the 2021 summit.1 In June 2022, a second summit was held to examine potential artifacts, collected from academia, that airworthiness authorities could use in a technical assessment, leveraging defined standards, criteria, and methods of compliance within specific relevant technical domains, of a learning enabled component (LEC) of a learning enabled system (LES) performing the object detection portion of the A3R task covered in the 2021 summit. This communication article summarizes the findings of the 2022 summit. During the 2021 summit, a baseline overview of literature pertaining to the field of computer vision (CV) was provided. For the 2022 summit, a sample of literature published since the 2021 summit was provided to introduce participants to updated work in the fields of test and evaluation (T&E), CV, algorithmic assurance, and frameworks for providing assurance of algorithms. All papers were collected from the Purdue University Library3 and the Dissertations and Theses Database4 and are cited in references 5-14. Building on the standards survey conducted for the 2021 summit, a follow-on survey of standards for all modes of transportation (e.g., rail, sea, road) was conducted in support of the 2022 summit to determine whether any notable improvements or advancements had occurred.
While gradual, evolutionary improvements have occurred within the field, no standard governing ML applications that could be considered final existed at the time this paper was submitted. Key documents developed over the past few years and not covered at the 2021 summit were the Safety Assurance Objectives for Autonomous Systems Version 3.015 and SAE AS-6983.16 The greatest potential addition to the airworthiness certification community for the certification of an LES since the 2021 summit was the draft document AS-6983 from SAE G-34.16 AS-6983 is targeted at filling the void, identified in AFE-87 and AIR-6988, left by existing civilian aviation standards, which address the certification of traditional systems but not of an LES. Many of the members of G-34 are from the European Aviation Safety Agency (EASA), and thus also contributed to the construction of the EASA Roadmap and EASA Level 1 guidance.17 Universally, the 2022 survey found that key areas associated with flight-safety-critical applications remain unaddressed, including ML tool qualification, ML hardware concerns for graphics processing units (GPUs) and tensor processing units (TPUs), ML object-oriented software language concerns, ML reuse concerns, and reinforcement learning. The overall certification of the human/air system combination is divided into two parts: certifying the human and certifying the air system. The naval airworthiness process for certifying a system is a technical assessment process, leveraging defined standards, criteria, and methods of compliance within specific relevant technical domains. An airworthiness assessment identifies areas of technical compliance and, potentially, non-compliance. Areas of non-compliance are examined to identify and characterize resultant hazards, possible mitigations, or even areas potentially requiring re-design. Resultant residual risks are adjudicated through an appropriate risk acceptance process.
Once the risk has been identified and mitigated to an acceptable level, the Airworthiness and Cybersafe Office (ACO) certifies the air system to be operated by a human operator, either in the loop (HITL) or on the loop (HOTL), within a defined clearance envelope. The 2022 summit's goal was to reach a consensus on the required artifacts, and a test plan for acquiring those artifacts, that would provide assurance for an LES to perform a task traditionally reserved for a human operator. As current certification processes assume that a fielded system would be operated by a human, the distinction in this use case that the LES would not be monitored by a human presented a difficult certification problem. Before an LES installed on a platform can be certified to operate without a HITL/HOTL, standards and methods of compliance need to be proposed, evaluated, and approved. A neural network is a function f(d, θ) which maps an input data point d (such as an image) from a dataset Ɗ to a vector in ℝⁿ using calculations involving parameters θ. This output vector can be interpreted in a variety of ways; for example, in object detection, where the user wishes to use a deep neural network (DNN) to find an object in an image, this vector may be four-dimensional, with the elements representing the row, column, height, and width of a box intended to bound the searched-for object. The task of "learning" in the context of DNNs is the calculation of θ such that the outputs of the DNN are as accurate as possible on the given dataset; briefly (and roughly) speaking, we can hope that if the values of θ cause the DNN to be accurate on our given examples, we can have justified confidence in the DNN's accuracy on new examples as well. To allow for learning, the user of the DNN creates a loss function ℒ(Ɗ, θ) which is small when f is close to being correct on Ɗ, and large otherwise.
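To make the notation above concrete, the following is a minimal sketch, not any model from the summit's test program: here f(d, θ) is reduced to a single linear layer mapping a flattened image to a four-vector [row, column, height, width], and ℒ(Ɗ, θ) is the summed absolute error against human-drawn boxes. The array shapes and the synthetic image are illustrative assumptions only.

```python
import numpy as np

def f(d, theta):
    """Toy 'network': map a flattened image d to a 4-vector
    [row, col, height, width] via one linear layer. A real DNN
    composes many such layers with nonlinearities."""
    W, b = theta                # parameters theta: weights and bias
    return W @ d.ravel() + b

def loss(D, labels, theta):
    """Loss over dataset D: summed absolute difference between
    predicted boxes and human-drawn boxes; small when f agrees
    with the labels, large otherwise."""
    return sum(np.abs(f(d, theta) - y).sum() for d, y in zip(D, labels))

# One synthetic 8x8 "image" with a hand-labeled bounding box
rng = np.random.default_rng(0)
d = rng.random((8, 8))
y = np.array([2.0, 3.0, 4.0, 4.0])   # row, col, height, width
theta = (rng.standard_normal((4, 64)) * 0.01, np.zeros(4))
print(loss([d], [y], theta))          # a nonnegative scalar
```

With randomly initialized θ the loss is large; learning is then the search for a θ that drives it down.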
For example, in object detection, the distances between the row, column, height, and width predicted by f and those drawn by people can be summed, resulting in a small value when the human and the DNN agree. Starting with random values of θ, this prediction can be iteratively improved through an algorithm such as stochastic gradient descent, which allows θ to gradually become more and more appropriate for the dataset. For this to be effective, the function f must be very complex, Ɗ must be very large, and the amount of computational resources available to perform learning must be significant. However, these negatives come with a powerful tradeoff: it is not necessary for a person to describe how to find the object searched for. Computer vision (CV) has historically been a very difficult problem, in large part because humans are bad at this step; our recognition of objects is intuitive, not algorithmic. DNNs allow us to instead merely indicate correct and incorrect outputs from the network, and let mathematics take care of the rest. The test plan presented at the summit was designed around identifying metrics and tools used in academia to provide a level of assurance of a neural network trained to classify, localize, and detect objects within the field of view of a camera. The objects of interest for this test program will be the KC-130 drogue, the coupler unit within the drogue, a 3-D printed probe tip, and the appearance of contact between the probe tip and the drogue. To appropriately scope the test program, limitations were set to ensure that the program could be completed in FY23 (Table 1), and only the lab portion of a test program was deemed within scope. In flight test, the domain to be tested would be defined by the mission set for which the capability was designed.
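The iterative improvement described above can be sketched in a few lines. This is a hedged illustration under the same toy assumptions as before (a single linear layer standing in for a DNN, one synthetic image, an L1 box loss); it shows the shape of a gradient-descent loop, not the training actually performed in the test program, which used a far larger model and dataset.

```python
import numpy as np

# Toy setup: a linear "network" predicts a bounding box
# [row, col, height, width] from a flattened 8x8 image; the loss
# is the summed absolute error against a human-drawn box.
rng = np.random.default_rng(1)
d = rng.random(64)                        # flattened image
y = np.array([2.0, 3.0, 4.0, 4.0])       # labeled box
W = rng.standard_normal((4, 64)) * 0.01   # theta, randomly initialized
b = np.zeros(4)

lr = 0.01                                 # step size
for step in range(500):
    err = (W @ d + b) - y                 # prediction minus label
    g = np.sign(err)                      # gradient of the L1 loss
    W -= lr * np.outer(g, d)              # gradient step on weights
    b -= lr * g                           # gradient step on bias

print(np.abs(W @ d + b - y).sum())        # loss is now far smaller
```

In true stochastic gradient descent each step uses a different random batch drawn from Ɗ; with one example the loop degenerates to plain gradient descent, but the update rule is the same.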
To standardize the design of the operational domain, the National Highway Traffic Safety Administration (NHTSA) framework for designing testable cases and scenarios for automated driving systems was adopted.21 Its six top-level categories were adapted to the A3R mission after collecting inputs from multiple carrier-qualified aircrew familiar with the air-to-air refueling mission set during carrier operations (Figure 1). This classification framework will serve as the design document for not only the test matrix but also the collection matrix, and each component of the classification framework will include many sublevels. Once the model has been trained with Roboflow, the T&E portion will consist of algorithmic assurance and "developmental" T&E metrics. To provide airworthiness authorities additional assurance of the algorithm, academic research in the field of machine learning algorithm assurance was leveraged, and proven verifiers were selected for integration into the test program.22-27 To collect "developmental" test points to evaluate the computer vision model, metrics were selected from academic research published since 2015, collected from the Purdue University Library.3 The selected metrics are provided in Table 2, with references for individual metrics provided below the table. Not documented in the performance metrics were the different types of misses that can occur when identifying objects, but additional information can be collected on each miss, and those six types of errors were documented.34 While descoped to a 1-year effort, this test program will still provide critical gap analysis, education opportunities, and risk reduction for follow-on efforts integrating additional systems while preparing for surrogate flight test.
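Detection metrics of the kind tabulated for the test program typically build on intersection-over-union (IoU), which scores how well a predicted box overlaps a labeled one. As an illustrative assumption, the sketch below uses the same (row, column, height, width) box convention as the earlier notation; it is a generic definition of the metric, not a metric taken from Table 2.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes given as
    (row, col, height, width). 1.0 means perfect overlap; 0.0 means
    the boxes are disjoint (a complete miss)."""
    ra, ca, ha, wa = box_a
    rb, cb, hb, wb = box_b
    # Overlap extent along each axis, clamped at zero
    inter_h = max(0.0, min(ra + ha, rb + hb) - max(ra, rb))
    inter_w = max(0.0, min(ca + wa, cb + wb) - max(ca, cb))
    inter = inter_h * inter_w
    union = ha * wa + hb * wb - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 4, 4), (0, 0, 4, 4)))   # perfect overlap: 1.0
print(iou((0, 0, 4, 4), (2, 2, 4, 4)))   # partial overlap
print(iou((0, 0, 4, 4), (8, 8, 4, 4)))   # disjoint: 0.0
```

Thresholding IoU (commonly at 0.5) is how a prediction is scored as a hit or a miss, which in turn feeds precision- and recall-style metrics and the categorization of the six error types noted above.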
In conclusion, the second NACAIWG summit was held in June 2022, with the overall goal of reaching a consensus on a test plan for collecting artifacts to support a risk assessment of an LES by naval airworthiness authorities. This paper provided a review of the summit findings and a discussion of the path forward for this research program.

The authors would like to thank the following people for their contributions not only to these proceedings and the summit, but also to furthering the United States Navy's progress toward a certification framework for machine learning enabled systems: Mr. Marshall "Steve" Hynes, Mr. Robert O. Jacob, Dr. Anthony "Tony" Page, Mr. Jon Rice, and Ms. Kristin Swift. This work was supported in part by ONR grant N00014-22-S-B001. The authors do not have any conflicts of interest.

A native of Indiana, Jonathon Parry attended Purdue University in West Lafayette, Indiana from 2007 to 2011, where he earned a Bachelor of Science in Applied Mathematics. From 2018 to 2020, he earned a Master's in Applied Data Science from Syracuse University in Syracuse, New York. He is currently pursuing a Doctor of Technology degree from Purdue Polytechnic Institute in West Lafayette. His current research interests include test and evaluation of learning enabled systems. In 2008, he joined the United States Navy, where he continues to serve. During his time in the Navy, he completed pilot training from 2011 to 2015, graduating on the "Commodore's List with Distinction" at both primary and advanced flight school. From 2015 to 2017, he completed training in the EA-18G and served in Carrier Air Wing 11 in support of Operation Inherent Resolve. In 2017, he was selected for the United States Naval Test Pilot School (USNTPS) and graduated in 2018 prior to serving as the Aeromechanical Project Officer for the Next Generation Jammer Mid Band at Development Test Squadron Two Three.
He currently serves as the Advanced Development Deputy Program Manager at the Airborne Electronic Attack Program Office. Jonathon is a member of the Society of Experimental Test Pilots, the SAE G-34 committee, the ASTM F38 committee, the International Test and Evaluation Association, and numerous other autonomy-based academic organizations.

Donald Costello received the B.S. degree in systems engineering from the United States Naval Academy, Annapolis, MD, USA in 2000, the M.A.S. in aeronautical science from Embry-Riddle Aeronautical University, Daytona Beach, FL, USA in 2005, the M.S. in aeronautical engineering from the Air Force Institute of Technology, Dayton, OH, USA in 2009, the M.S. in systems engineering from the Naval Postgraduate School, Monterey, CA, USA in 2011, and the Ph.D. in mechanical engineering from the University of Maryland, College Park, MD, USA in 2020. He is a Permanent Military Professor in the Weapons, Robotics, and Control Engineering department at the United States Naval Academy, Annapolis, MD, USA. His work focuses on the certification and development of unmanned autonomous systems for practical use.

Jason Rupert received Bachelor of Science degrees in Physics and Mathematics from the University of Alabama in Huntsville (UAH) in 1998, and a Master of Science in Mechanical/Aerospace Engineering from UAH in 2001. In 2001, Mr. Rupert began his career performing Unmanned Aviation Systems (UAS) flight test support at Fort Huachuca. That initial exposure was not enough, so he carried on by providing UAS six-degree-of-freedom (6-DOF) development and analysis, as well as supporting exploratory research, development, test, and evaluation (RDT&E) efforts. In 2006 he was lucky enough to support an RDT&E effort that examined the effectiveness of intelligent agents applied to existing UAS.
He had a small career detour in 2007 that allowed him to run 6-DOFs and perform statistical analysis on volumes of Hellfire missile stockpile reliability data to determine hit and kill effectiveness, but he quickly returned to UAS in 2011. His return to UAS was in an assurance role, where he served as a software airworthiness functional for a decade on Army UAS. A year ago, Mr. Rupert began his work on AI/ML certification, specifically assessing the possibility of certifying AI/ML for use in manned and unmanned flight-safety-critical applications. In that role he has collaborated with colleagues from all branches of the US military and various communities of practice, for example, the SAE G-34 and safety communities.

Gavin Taylor is an Associate Professor in the USNA Computer Science Department. His area of expertise is AI and machine learning. He holds a Ph.D. in Computer Science from Duke University and was the 2022 winner of USNA's Civilian Faculty Teaching Excellence Award. Dr. Taylor is co-director of USNA's Center for High Performance Computing Education and Research and is chairing the development of USNA's new Data Science major.

Data not publicly available due to legal considerations.
Keywords
opportunity management, SEE08 verification and validation, SEE13 risk, SEE21 safety and security, SEE23 acquisition and supply