
PAGE – 1 ============
Workshop on Advancing Computer Architecture Research (ACAR-1)
Failure is not an Option: Popular Parallel Programming

Organizers: Josep Torrellas (University of Illinois) and Mark Oskin (University of Washington).

Steering Committee: Chita Das (NSF and Pennsylvania State University), William Harrod (DARPA), Mark Hill (University of Wisconsin), James Larus (Microsoft Research), Margaret Martonosi (Princeton University), Jose Moreira (IBM Research), and Kunle Olukotun (Stanford University).

Written by: Josep Torrellas, Mark Oskin, Sarita Adve, George Almasi, Luis Ceze, Almadena Chtchelkanova, Chita Das, Bill Feiereisen, William Harrod, Mark Hill, Jon Hiller, Sampath Kannan, Krishna Kant, Christos Kozyrakis, James Larus, Richard Murphy, Onur Mutlu, Satish Narayanasamy, Kunle Olukotun, Yale Patt, Anand Sivasubramaniam, Kevin Skadron, Karin Strauss, Steven Swanson, and Dean Tullsen.

Funded by the Computing Research Association's (CRA) Computing Community Consortium (CCC) as a "visioning exercise" meant to promote forward thinking in computing research and then bring these ideas to a funded program.

Held on February 21-23, 2010 in San Diego, California

Contact:; Websites: .php;
August 2010

PAGE – 2 ============
Contents

1 Executive Summary .... 4
2 Introduction .... 6
  2.1 Background .... 6
  2.2 Strategic Importance .... 6
  2.3 The Opportunity .... 6
  2.4 Limited Industry Experience and Deficient Educational Systems .... 6
  2.5 Talent Abundance and Funding Scarcity .... 7
3 Workshop Objectives .... 7
4 Recommendations for Computer Architecture Research Thrusts .... 8
  4.1 Data Centers and Large Scale Systems .... 8
    4.1.1 Problem Statement and Goals .... 8
    4.1.2 Research Thrust Description .... 9
    4.1.3 Why is this Research Transformative .... 11
    4.1.4 What Are the Deliverables .... 11
    4.1.5 Research Disciplines Involved .... 11
    4.1.6 Risk to Industry and Society If Not Pursued .... 11
    4.1.7 Benefits of Success to Industry and Society .... 11
    4.1.8 Why Not Let Industry Do It .... 11
    4.1.9 Likelihood of Success .... 12
  4.2 Architectures to Enhance Programmability .... 12
    4.2.1 Problem Statement and Goals .... 12
    4.2.2 Research Thrust Description .... 12
    4.2.3 Why Is This Research Transformative .... 14
    4.2.4 What Are the Deliverables .... 14
    4.2.5 Research Disciplines Involved .... 14
    4.2.6 Risk to Industry and Society if Not Pursued .... 15
    4.2.7 Benefits of Success to Industry and Society .... 15
    4.2.8 Why Not Let Industry Do It .... 15
    4.2.9 Likelihood of Success .... 15
  4.3 Hardware-Software Co-Design and Asymmetry .... 15
    4.3.1 Problem Statement and Goals .... 15
    4.3.2 Research Thrust Description .... 16
    4.3.3 Why is this Research Transformative .... 17
    4.3.4 What Are the Deliverables .... 17
    4.3.5 Research Disciplines Involved .... 18
    4.3.6 Risk to Industry and Society if Not Pursued .... 18
    4.3.7 Benefits of Success to Industry and Society .... 18
    4.3.8 Why Not Let Industry Do It .... 18
    4.3.9 Likelihood of Success .... 18
  4.4 Domain Specific Languages .... 19
    4.4.1 Problem Statement and Goals .... 19
    4.4.2 Research Thrust Description .... 19
    4.4.3 Why is this Research Transformative .... 21
    4.4.4 What Are the Deliverables .... 21
    4.4.5 Research Disciplines Involved .... 21
    4.4.6 Risk to Industry and Society If Not Pursued .... 21
    4.4.7 Benefits of Success to Industry and Society .... 21
    4.4.8 Why Not Let Industry Do It .... 21
    4.4.9 Likelihood of Success .... 22
5 Educational Perspective .... 22
  5.1 Vision .... 22
  5.2 The Challenges to Making this Happen .... 22
  5.3 Approaches .... 22
  5.4 Recommendations .... 23

PAGE – 3 ============
6 Industry Collaboration .... 23
  6.1 How Academics Can Contribute .... 23
  6.2 How Industry Can Contribute .... 23
7 The Funding Landscape for Computer Architecture Research .... 24
  7.1 National Science Foundation (NSF) .... 24
  7.2 Defense Advanced Research Projects Administration (DARPA) .... 25
8 Next Steps .... 25
9 Acknowledgments .... 25
10 Appendix A: Call for Position Papers .... 26
11 Appendix B: Workshop Attendees .... 28
12 Appendix C: Workshop Schedule .... 29
13 Appendix D: Slides from the Working Groups .... 30

PAGE – 4 ============
1 Executive Summary

The arrival of the ubiquitous multi-core is a game-changing event for the computing industry. Much of the industry today is relying on parallel processing becoming main-stream, although most software is still single-threaded and past experience consistently shows that parallelization is a difficult task. For the industry to make progress at historical rates, the next few years will need to witness significant changes in the software and hardware of our computing platforms. In particular, from a computer architecture perspective, multi/many-cores will have to evolve to enable and support high-productivity parallel software development and execution.

This workshop brought together computer architecture researchers from academia, industry, and national laboratories, and program managers from funding agencies to examine how computer architecture can help enable ubiquitous parallel computing. The main goal was to identify the key computer architecture research challenges in devising the programmable parallel computing platforms of years 2020-2025, and to articulate an agenda and roadmap to address these challenges. The resulting research directions had to have broad community support, be key for funding agencies to fund, be forward looking rather than incremental, and lead to a deep understanding of our field.

The attendees identified four main computer architecture research thrusts. They are (1) data centers and large-scale systems, (2) architectures to enhance programmability, (3) hardware-software co-design and asymmetry, and (4) domain specific languages. For each thrust, the report outlines the research efforts recommended.

1. Data centers are emerging as a critical component of our IT infrastructure. In particular, they form the backbone of cloud computing, which will likely provide the utility computing necessary to revolutionize experimental computing techniques. Data centers will be used by billions of cost-conscious users and run applications written by many developers. In this research thrust, our ambitious long-term goals are to reduce the cost of data-center infrastructure to 1 Watt and $1 per month for the typical user, and to enable individual programs to efficiently scale from a single-node system with tens of users to a full data-center deployment with millions of nodes and billions of users.

2. The continued performance scaling of computer systems now requires extensive software and hardware changes to exploit parallelism. Attaining a high-performance, correct 1000-core chip that is also highly programmable is a major challenge. In this research thrust of architectures for programmability, our long-term goal is three-fold: (1) to ensure that programming for parallel architectures is as easy as it is now for sequential architectures; (2) to maintain Moore's Law for performance, namely, to double the speed-up every 2 years; and (3) to eliminate concurrency bugs.

3. Specialization of the hardware and system-software layers eliminates inefficiencies and overheads that come with the flexibility of general-purpose systems. It is possible to obtain orders of magnitude improvements in performance, performance per Watt, and performance per dollar. In the past, specialization has usually been limited to a small number of high-volume consumer applications. In this research thrust, we want to develop technologies necessary to deliver “turn-key” specialized computing systems quickly and economically. Specifically, our long-term goal is to design specialized architectures that deliver up to 10,000x speed-up for particular applications for less than $10,000. The ultimate goal is a fully-automated generation of application-specific hardware for each program.

4. Domain specific languages (DSLs) are designed to express a particular class of applications efficiently. In the context of parallel computing, a DSL provides a set of semantics that are expressive and natural for developers of such a class of applications, yet describe the computation at a sufficiently high level that it is easy to exploit parallelism. In this research thrust, our long-term goals are to create DSL infrastructure that makes it easy to develop new DSLs, and to attain 5-10 commercially-available DSLs in the next 10-15 years. These DSLs should have widespread use. They should be as successful as SQL for their particular application domains.

To make progress in all of these challenges, our universities must offer a strong educational program in parallel computing. We suggest making low-resistance changes to the curriculum to embed parallel thinking, including augmenting existing courses with simple additions where they make sense. We should work with NSF to impart urgency in the need to teach parallel practices throughout the curriculum.

PAGE – 5 ============
It is crucial to initiate a broad discussion between industry and academia on programming models, programming languages, programmer productivity, applications, and the computer architectures to support them all. To make the academia/industry relationship robust, academics should invite industry participants to panels, workshops, and other discussion forums on parallel computing. Academics should also prepare short courses on parallel programming and computing for professionals in industry. Industry, in turn, should provide direction and input into the critical problems it faces, provide access to experimental data, and provide funding for academic research.

Government funding for computer architecture research should be organized along larger, more ambitious projects that cover multiple layers of the computing stack. It should also involve a concerted, complementary effort by multiple funding agencies, focusing on the research thrusts identified here.

The next steps involve working with our professional colleagues to publicize this report among funding agencies, industry, academic circles, and the broad computer science and engineering community.

PAGE – 6 ============
2 Introduction

2.1 Background

The arrival of the ubiquitous multi-core has caused an upheaval in the computing industry. Hardware vendors can no longer produce microprocessors that make yesterday's software exponentially faster. Instead, the industry is betting on radical changes in software and hardware for its success. The fate of much of the IT industry rests on the success of main-streaming parallel (or concurrent) computing.

The switch to parallel hardware began, from the consumer's perspective, around 2004. Since then, the number of cores per chip has kept increasing. Similar to the switch from 32-bit to 64-bit computing, hardware and software changes are uncoordinated. Consumer-oriented computers with eight cores are easily available for less than $1,000. Yet, just as a vast array of systems still execute 32-bit operating systems on 64-bit capable hardware, most software today remains single-threaded. Those applications that do take advantage of multi-core architectures do so only tepidly, with limited threads and little hope of scaling to the kind of parallel resources that hardware vendors will be capable of providing in the near future. Against this disappointing background, domain-specific systems such as GPUs, and novel concurrency models such as cloud computing flourish, but they do not use many of the bread-and-butter multi-core designs currently planned.

2.2 Strategic Importance

The Information Technology (IT) sector has been a leading driver of economic growth in the modern world. As recent downturns have shown, a significant drop in the IT sector has widespread ramifications for all areas of our economy. At the center of the IT growth and innovation is the ability for hardware to provide the core building block of the industry, the processor, in a form that is exponentially more efficient (either faster or lower power) each year. Historically, the computing industry has used this exponential efficiency gain to provide the world with ever richer software environments, more powerful portable devices, and faster communication links.

Multi-core changes this. These efficiencies are no longer applied to products automatically. Instead, software developers are expected to be an integral component in the exponential efficiency gains we hope to see going forward. It is for this reason that popularizing parallel programming is of strategic importance to the IT industry and nation at large.

2.3 The Opportunity

As with all great challenges, there comes a great opportunity: if software can be successfully adapted to execute on multi-core devices, then new levels of performance and efficiencies can be obtained. If one task can be parallelized across multiple cores, those cores can finish that task sooner or, for the same total execution time, with significantly less energy.

In practice, many applications can deliver new capabilities if one can provide orders of magnitude improvement in performance, performance per unit of power, or performance per unit of cost. This is true in all types of computer systems, from embedded and mobile systems to supercomputers. Examples in the mobile space include speech recognition, language translation, data analysis or situational awareness. For example, these could benefit firefighters, police, doctors, or soldiers in real-time field work. In the desktop, notebook, or workstation space, some examples of applications include video processing, rich user interfaces or virtual reality interfaces, and interactive problem solving and data analysis. As local storage capacities grow, the types of important problems that can be solved in a personal computing environment also grow. For example, the entire human genome fits in approximately 4 GB, allowing a single workstation to perform significant computations. Finally, at data-center or super-computer scale, problems that can benefit from the efficiencies of multi-cores include computational fluid dynamics, drug discovery, simulation and modeling (e.g., weather, geology, and tsunami prediction), data mining, machine learning, and graph analytics.

2.4 Limited Industry Experience and Deficient Educational Systems

The fact that parallel programming has never been widespread in the IT industry complicates the ability to solve this challenge. Parallel programming has traditionally been a niche, successful in a few important domains such as

PAGE – 8 ============
The workshop had to be inclusive of enough voices in the research community that the results are legitimately viewed as coming from the community, speaking with a unified voice. Moreover, the workshop should include as many junior researchers in our community as possible to ensure continuous leadership.

A call for position papers was issued and publicized widely. Appendix A shows the Call for Position Papers. The co-organizers and the Steering Committee selected a total of 16 position papers among them. The workshop took place on February 21-23, 2010 in San Diego, California, and 25 individuals attended. Appendix B lists the attendees and Appendix C shows the schedule of the workshop.

The workshop had keynotes from George Almasi (IBM Research) and James Larus (Microsoft Research). It also had a panel of funding agency directors chaired by Sampath Kannan (NSF). In the morning of the first day, the authors of the selected position papers presented them. The rest of the workshop consisted of breakout sessions (where the attendees broke into small working groups) and plenary sessions (where the findings of the groups were presented and critiqued).

The workshop converged into and fleshed out four research thrusts, namely (1) data centers and large scale systems, (2) architectures to enhance programmability, (3) hardware-software co-design and asymmetry, and (4) domain specific languages. In addition, there were discussions on educational issues, industry collaboration, and funding collaboration. The rest of this report documents the findings. Appendix D shows the slides of the working groups, on which this report is based.

4 Recommendations for Computer Architecture Research Thrusts

We identify four key computer architecture research thrusts. They are (1) data centers and large scale systems, (2) architectures to enhance programmability, (3) hardware-software co-design and asymmetry, and (4) domain specific languages. We present each in turn.

4.1 Data Centers and Large Scale Systems

4.1.1 Problem Statement and Goals

Data Centers (DCs) are emerging as a critical component of our IT infrastructure. These large-scale systems provide the computation power and storage necessary for online services ranging from webmail and search, to social networking and customer-relationship management tools. Moreover, DCs form the backbone of cloud computing, which is viewed as the prominent way to provide the utility computing necessary to revolutionize experimental techniques for biology, neuroscience, drug discovery, and other scientific disciplines.

DCs are in many ways similar to traditional supercomputers, which support large-scale scientific experiments. Both include tens of thousands of nodes, implement distributed memory and storage schemes, and involve hierarchical networks. However, as both the enterprise and the scientific community strive toward utility computing, a new set of requirements is emerging. DCs will now be used by billions of users, either directly through services like social networks or indirectly through services like bioinformatics. The number of data center application developers will also be significantly higher, including developers of online services and scientists using analytical techniques. Moreover, the vision of utility computing requires low cost, for both capital and operational expenses.

We offer two grand challenges or goals for research in enterprise DCs and next generation supercomputers: (1) Reduce the cost of DC infrastructure to 1 Watt and $1 per month for the typical user; and (2) Enable individual programs to efficiently scale from a single-node system with tens of users to a full DC deployment with millions of nodes and billions of users.

The first challenge comes from the need for low cost, utility computing. It is an aggressive goal that requires cooperative research in hardware architecture, system software, and resource management. If we put all aspects of our lives and all scientific experiments on DCs using current approaches, we will probably be off by factors of 100x to 1,000x from the goal of 1 Watt and $1 per month per user.
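
To give a sense of the scale behind the first challenge, the arithmetic can be sketched as follows. This is only a back-of-the-envelope illustration: the 1 Watt and $1 targets and the 100x to 1,000x gap come from the text above, while the user count of one billion, the 30-day month, and all function names are assumptions introduced here for illustration.

```python
"""Back-of-the-envelope arithmetic for the per-user data-center goal."""

TARGET_WATTS_PER_USER = 1.0          # from the report: 1 Watt per typical user
TARGET_DOLLARS_PER_USER_MONTH = 1.0  # from the report: $1 per month per user
GAP_FACTORS = (100, 1_000)           # from the report: current approaches are 100x-1,000x off
ASSUMED_USERS = 1_000_000_000        # assumption: "billions of users" taken as one billion
HOURS_PER_MONTH = 30 * 24            # assumption: a 30-day month


def fleet_power_gw(watts_per_user, users=ASSUMED_USERS):
    """Total continuous power, in gigawatts, for the whole user population."""
    return watts_per_user * users / 1e9


def monthly_energy_kwh(watts_per_user, hours=HOURS_PER_MONTH):
    """Energy that one user's share draws in a month, in kilowatt-hours."""
    return watts_per_user * hours / 1000.0


def implied_current_range(target, gap_factors=GAP_FACTORS):
    """Per-user figure implied today if we are `gap_factors` away from `target`."""
    return tuple(target * factor for factor in gap_factors)


if __name__ == "__main__":
    low_w, high_w = implied_current_range(TARGET_WATTS_PER_USER)
    low_d, high_d = implied_current_range(TARGET_DOLLARS_PER_USER_MONTH)
    print(f"Goal: {fleet_power_gw(TARGET_WATTS_PER_USER):.0f} GW for the assumed fleet")
    print(f"Implied today: {fleet_power_gw(low_w):.0f} to {fleet_power_gw(high_w):.0f} GW")
    print(f"Goal energy per user: {monthly_energy_kwh(TARGET_WATTS_PER_USER):.2f} kWh per month")
    print(f"Implied cost today: ${low_d:.0f} to ${high_d:.0f} per user per month")
```

Under these assumptions, the goal corresponds to roughly one gigawatt for a billion users, whereas the 100x to 1,000x gap would imply hundreds of gigawatts, which is why the text calls for cooperative research across hardware, system software, and resource management rather than incremental tuning.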

PAGE – 9 ============
The second challenge stems from the need for scalable capabilities and ease-of-use. It requires significant improvements to the hardware infrastructure, programming models, and system software used in DCs.

4.1.2 Research Thrust Description

To address these challenges, we propose several research vectors that systematically revisit all aspects of a DC. The key is to view and optimize the DC in the same way we optimize an individual computer, with energy efficiency as the crosscutting theme. The five interacting research vectors are: (1) Chip architecture: what new features are needed for chips used in DCs? (2) Node architecture: what is the most efficient node organization for DCs? (3) Memory and storage: what are the hardware and software mechanisms for DC-scale memory and storage hierarchies? (4) Operating and runtime systems: what is the management layer for the whole data center? and (5) Energy-efficient DC design: a crosscutting theme across all layers.

Chip Architecture for Data Centers

Our vision is to enable many-core chips that are efficient building blocks for DCs. The key research question we aim to answer is how do we design a many-core chip from the ground up to enable a mini supercomputer/DC on a chip?

There are several key topics to research to enable this vision. First, we need to enable mechanisms for flexible on-chip resource management. This includes on-chip support for isolation and privacy, to eliminate interference between multiple tasks or virtual machines. It also includes prioritization and partitioning mechanisms to enable flexible allocation of on-chip resources (e.g., cores, caches, interconnects, memory, and bandwidth) among threads or applications. Second, we need to enable mechanisms for flexible scaling down of on-chip and memory resources. This is essential for the scalability and energy proportionality of the data center. It can be attained with flexible modes for energy/performance levels, possibly with configurable or heterogeneous designs, and for different on-chip resources. Third, we need to enable fast messaging and synchronization support for both on- and off-chip communication, as well as mechanisms for safe and efficient remote memory and storage access. Fourth, we need to provide mechanisms for end-to-end monitoring of application performance. This is essential for understanding, tuning, and optimizing applications running on the large-scale system. Fifth, we need to enable efficient execution of dynamic languages. Finally, we need to provide chip-level mechanisms for manageability, reliability, and upgradability to enable easier management of the entire DC. For each of these topics, we need to consider the hardware/software interface, virtualization mechanisms, and practical hardware implementations.

Node Architecture for Data Centers

Current DC nodes are a direct evolution of personal computers. As such, they are not optimal for the parameters of large-scale DCs. Therefore, a rethinking of the node is necessary to build an optimal DC.

A number of questions need to be researched to enable efficient node architectures. One is how to compose the chips and memory into a flexible node. The node can be homogeneous with simple cores, homogeneous with complex cores, or heterogeneous with a mix of cores of varying degrees of complexity and energy/performance levels. The node can include accelerators or, instead, acceleration can be performed at the chip level. The memory system and interconnect should be designed to enable trade-offs between throughput and quality of service. Different design choices have implications for the programming languages and the operating and runtime systems of the DC. It is therefore important to investigate the node design in conjunction with the design of such software.

Another key research question is memory and communication system organization at the node level. We need to understand how to organize node-level memory and storage among different chips (e.g., shared or partitioned). We also need to research what are the best mechanisms to enable efficient sharing and resource management. Most existing systems solely use commodity DRAM, which has high idle power consumption. The role of solid state disks and emerging non-volatile technologies needs to be researched to enable large improvements in energy efficiency. It is also important to re-think the communication substrate at the node level to enable efficient and high-bandwidth communication and synchronization. An important avenue of investigation is using new technologies for communication, such as photonics.

PAGE – 10 ============

Memory and Storage Hierarchy

The memory and storage hierarchy for DCs must provide software developers with fast, high-bandwidth access to peta-bytes of data, hiding its distributed implementation nature and all its consequences (i.e., latency and bandwidth bottlenecks, synchronization, and consistency issues).

There are several key topics to research to enable this vision, including hardware and software issues at the node and global level. To reduce latency, a significant percentage of DC services store data in DRAM. This creates a need to revisit the balance of processing and memory within each node. Current and new solid-state memory technologies must also be utilized to alleviate the challenges due to the cost or volatility of DRAM.

At the system level, we need to consider locality optimizations that will hide latency. Depending on the application characteristics, DC-level prefetching and caching mechanisms may be possible. Alternatively, we can employ runtime techniques that move computation to data. In any case, locality optimizations must interact well with isolation, availability, and energy management considerations.

Operating and Runtime Systems

The operating and runtime systems of the DC play a critical role in managing the distributed resources to ensure the right provisioning to the right activity at the right time. This software stack necessarily has to scale to thousands of nodes, handling multiple levels of parallelism (from tightly knit cores on a chip to multiple sockets in a node and to the hundreds of nodes connected by a loosely coupled network) and ensuring that the computational, communication and storage resources are allocated to meet the different needs. These needs include (1) response time and/or throughput requirements of the applications, (2) availability requirements (e.g., five 9's of availability), (3) security and privacy assurances to insulate the set of running applications, and (4) compliance for regulation and auditability.

There are several dimensions to the research that is needed for the development of this software stack, namely scalability, end-to-end monitoring, and multi-tenancy. Scalability is of paramount concern to accommodate millions of nodes under performance, availability and power vagaries. The assumptions about parallelism possible at a node level do not hold under the loosely-coupled distributed environment of a DC, mandating dynamic adaptation to errors, performance, and resource availability. To build a software stack that makes dynamic adaptation decisions, we need end-to-end monitoring of resource usage, performance, and availability. The software stack should also accommodate graceful degradation and scale-down based on resource availability. Finally, multi-tenancy is an inherent feature of these systems, and accommodating the different applications and users temporally and spatially mandates extensive support for isolation from the performance, security and failure perspectives. This is not just a software problem, but includes the entire stack from the silicon to the runtime system.

Energy Efficient Data Center Design

Energy efficiency is a crosscutting theme across all the layers of a DC. By optimizing hardware and software within and across nodes, we can improve energy efficiency in terms of requests/sec/Watt by a factor of 100–1,000x or more over what conventional technology scaling can provide.

The research highlighted above applied to commodity chips for compute, memory, and storage can improve energy efficiency by at least a factor of 10–100x. The improvements come from optimizing the balance and organization of chips in the system, the type of chip used for each task, and how tasks are moved close to data or vice versa. They also come from static and dynamic compilation techniques that reduce the software bloat of online services, which directly contributes to energy overheads.

If we use chips optimized for data centers, we can expect another factor of 10–100x in energy efficiency. These improvements stem primarily from eliminating the overhead of complex software protocols for messaging, remote access, or isolation, reducing the overhead of checks for dynamic languages, and providing lower-power active modes for storage and network components that can be exploited by the runtime to achieve energy proportionality.

Energy proportionality is important in order to avoid waste during periods of low utilization. The active portion of the DC should be scaled down to match the lower utilization. While such scale-down approaches are already popular

PAGE – 11 ============
for stateless computations, they are currently impossible for most services because each node stores gigabytes to terabytes of state. To achieve state-aware scale-down, we need new low-power active modes for all DC resources (including on-chip resources, chips, and memory/storage resources) and runtime techniques that can manage scale-down without compromising quality of service or availability.

Beyond energy efficiency, environmental sustainability is also a noteworthy goal for DCs. Current DC hardware is replaced every three years. We should build data center chips and memory/storage systems such that the hardware continues to be useful for a long time, even in the presence of high failure and wear-out rates. We should (re)use materials that are environmentally friendly.

4.1.3 Why is this Research Transformative

Application requirements are growing at a very rapid pace, both in terms of problem scale and resolution. Evolutionary technology scaling is woefully inadequate to meet these requirements, since (1) voltages are not likely to be reduced much further, (2) as a result, it is no longer possible to power all the transistors on the chip, and (3) wires and interconnect do not scale with feature size reduction. Instead, these requirements can only be met with disruptive innovations at all levels of hardware and software.

The research proposed here will enable DCs and supercomputers that are 1,000x more energy efficient and higher performance than what is possible with evolutionary technology scaling and known energy-efficient design techniques. This will enable new applications that require major improvements in energy efficiency and compute power.

The growing size of DCs introduces several new challenges related to large scale infrastructure management and operational costs associated with power delivery and distribution systems, cooling, space, and asset management. Only innovative architectural solutions can help address these issues, reducing power consumption and real estate footprint, doing more with less, and extending the lifetime of the infrastructure. In addition, integrated management at all levels can significantly lower the management and administration costs of DCs, enhance their availability and lower the bar on enabling widespread usage across a diverse user base.

4.1.4 What Are the Deliverables

The deliverables are DCs and supercomputers that, for the same level of performance, are 1,000x more energy efficient than what evolutionary technology scaling can provide, measured in requests/sec/Watt.

4.1.5 Research Disciplines Involved

The research proposed requires comprehensive, whole-system research. It includes the areas of computer architecture (chip design, platform architecture, and memory and storage systems), systems software, parallel programming, and new technologies.

4.1.6 Risk to Industry and Society If Not Pursued

The inability to solve the more complex and larger problems that are critical to the well-being of mankind, such as drug discovery and climate change adaptation, will have a large negative impact on society. Even if one could solve larger problems by relying on evolutionary technology scaling, it will not be a cost-effective and scalable option, since the cost will increase significantly faster than economically desirable.

4.1.7 Benefits of Success to Industry and Society

The well-being of mankind critically depends on being able to solve key problems in the medical and physical sciences. The research proposed, by delivering more cost-effective DCs and supercomputers, can help us solve these problems.

4.1.8 Why Not Let Industry Do It

Industry is typically bound by concerns for backward compatibility. To make faster progress, we need research that breaks backward compatibility barriers and is not bound by possible short-term product concerns. Even if industry
