Jean-Loup Baer (S'66-M'69) received the Di-& plome d'lngenieur degree in electrical engineeringand the Doctorat 3' Cycle in computer sciencefrom the University of Grenoble, France, and thePh.D. degree from the University of California,Los Angeles (UCLA), in 1968.Prior to joining the University of Washington,Seattle, in 1969, where he is currently a Professorof Computer Science, he was a Research Engineer with the Laboratoire de Calcul, University of Gre-noble, France, from 1961-1963, and a memberof the Digital Technology Group at UCLA from 1965-1969. His present in-terests are in parallel and distributed processing. He is the author of ComputerSystems Architecture (Computer Science Press). He served as an IEEEComputer Society Distinguished Visitor from 1973-1975 and was an Asso-ciation for Computing Machinery National Lecturer from 1977-1978. Heis a Guggenheim Fellow (1979-1980), an Editor of the Journal of VLSI andComputerSystems, an Editor of IEEE TRANSACTIONS ON COMPUTERS,and an Associate Editor of the Journal of Computer Languages. Hung-Chang Du received the B.S. degree in math-ematics from National Tsing-Hua University, Tai-wan, Republic of China, in 1974, and the M.S. andPh.D. degrees in computer science from the Uni-versity of Washington, Seattle, in 1980 and 1981,respectively.Since September 1981, he has been with the De-partment of Computer Science, University ofMinnesota, Minneapolis. Presently, he is an Assis-tant Professor and his current research interestsinclude computer architecture, parallel/distribut-ed processing, and physical database design.Richard E. Ladner was born in Berkeley, CA, onAugust 22, 1943. He received the B.S. degreefrom St. Mary's College, Moraga, CA, in 1965and the Ph.D. degree from the University of Cali-fornia, 33657

Berkeley, in 1971, both in mathematics.From 1971 until the present he has been on thefaculty at the University of Washington, where heis currently Professor of Computer Science. Hisresearch interests include computational complex-ity, algorithms, and computer-aided communica-tion for the disabled. Abstract-A classification method for parallel sorting architecturesis presented for comparison of existing designs. Some commondrawbacks of these parallel sorters are noted: in general, they arelimited by their data accessing mechanism, their hardware utilizationis low, and their sorting speed is in some sense independent of thenumber of sorting elements used in the architecture. In this paper wepresent a detailed design and analysis of distributor based sorters(parallel balanced tree sorters) which can sort lists whose size is pro-portional to the memory space available and whose speed is propor-tional to the number of sorting elements used.Index Terms-Computer architectures, distributed system, parallelsorting, performance analysis algorithm.Manuscript received August 8, 1981; revised November 8, 1982.L. E. Winslow is with the Department of Computer Science, Universityof Dayton, Dayton, OH 45469.Y.-C. Chow is with the Department of Computer and Information Sciences,University of Florida, Gainesville, FL 32611.1. INTRODUCTIONSINCE sorting is an essential operation which utilizes asignificant portion of the system resources, faster sortingmethods are desirable. In general, sorting methods can beclassified into two major categories: sorting by distribution andsorting by comparison. In either case, 源`自*六)维[论*文'网www.lwfree.cn

there is a theoreticallower limit to the time required if the sorting process is im-plemented by using a single distributor or comparator. To pushbeyond this limit, parallel sorting elements must be used. Todesign and analyze a parallel sorting machine, a number offactors should be considered; among these factors are. methods of accessing the data elements* size of data elements* number of sorting elements* structure of the sorting element* architecture of the sorting machine. A number of parallel sorting machine designs have appearedin the literature; for example, the ladder [2], the network [ 1],and the tree [8] sorters. All of these designs utilize multiplesorting elements to gain speed improvements over traditionalsingle-element sorter approaches. In comparing these parallelsorting machines, however, we find that increasing the numberof sorting elements only increases the size of the list which canbe sorted; it has no effect on the sorting time required. In otherwords, the sorting time is in some sense independent of thenumber of sorting elements.In this paper, we first survey the existing designs of parallelsorting machines. A classification technique is introduced tocompare various sorting architectures and their complexities.A balanced tree (BT) sorter designed by using distributorsrather than comparators is developed and analyzed in SectionsIII and IV. Section V presents a parallel balanced tree (PBT)sorter, an extended BT sorting which incorporates multiple BTsorters working in parallel. Both sorting designs are shown tobe superior to the existing sorting architectures. Significantimprovement in the sorting speed and the size of the list canbe achieved.11. A CLASSIFICATION AND COMPARISON OFPARALLEL SORTING MACHINESThere are sorting algorithms, such as Quicksort, which re-quire sorting times of O(N * log2 N) on a single processorsystem. Information theoretic arguments show that sorting alist of N items is equivalent to computing N * logb N digits(where the base b is arbitrary), so Quicksort is in some sensean optimal base 2 method. If P processors are working inparallel for time T, then they should produce PT units of work.Thus, using parallel processors should ideally reduce thesorting time by a factor equal to the number of processors used.To measure how close a design approaches this ideal, we in-troduce an efficiency measure E, defined as the ratio of thetheoretical work to the work actually produced; that isN * logbN --. Number of elements * Time usedThis measure of efficiency is useful in comparing differentdesigns of parallel sorters.When parallel processing elements are introduced into theprocess of sorting, it is essential to consider the method of ac-cessing data. The speed of a parallel sorter will be limited if thedata can only be accessed sequentially. On the other hand, itis not always practical to assume that all the data to be sortedcan be fed to the parallel sorter simultaneously. To have a fairand precise comparison of parallel sorting designs, we classifythe sorters into the following categories:I') sequential input/sequential output2) sequential input/parallel output3) parallel input/sequential output4) parallel input/parallel output5) hybrid input/hybrid output.Fig. 1 illustrates these five categories. The boxes and trianglesin Fig. I represent the collection of comparators and the ar-chitecture of the sorters. Dots and lines represent the parallelor serial nature of the data input and output. The last category, hybrid input/hybrid output, includes processing parallel se-quences of serial data; for example, when the data are storedin a secondary storage device, such as a disk, and can be ac-cessed in parallel only to some small degree.The ladder sorter, introduced by Chen [2], is a typical se-quential input/sequential output sorter. It consists of a chainof shift registers with comparators between links in the chain.The data are loaded sequentially, and, as the items are loaded,they are interchanged among the shift registers so that as soonas the last item is loaded, the sorted data start emerging se-quentially from the ladder. The size of the ladder and thesorting time, which includes the time for loading, are both of0(N).A typical example of the parallel input/sequential outputsorter is the tree sorter developed by Song [8]. Parallel dataare fed to the leaves of a binary tree; pairs of leaves are com-pared simultaneously and the larger (or smaller) of the twosons is propagated to the father node; continuing in this fash-ion, after log2 N time units, the sorted sequence starts emergingfrom the root of the tree. The size and time complexities are0(N) and 0(N + log2 N), respectively.Examples of the parallel input/parallel output sorters arethe network sorters proposed by Batcher [I1]. These networksinclude the merge sorter and the bitonic sorter. Both sortersrequire 0(N * (log2 N)2) comparators and 0((10g2 N)2) timeunits. Muller and Preparata [5] have shown that sorting timesof the order 0(0og2 N) are possible for parallel input/paralleloutput sorters if 0(N2) comparators are used. It is interestingthat the fastest sorter is also the most efficient.Since practical data can be stored in either primary or sec-ondary storage, additional time may be necessary to put the data into suitable form for input to the sorter and/or to takeoutput from the sorter and place it in storage. 分拣机英文文献及中文翻译:http://www.lwfree.cn/fanyi/20190308/30880.html