Advanced Computer Architecture Kai Hwang Notes

This document provides an overview of advanced computer architecture topics related to parallelism, scalability, and programmability. It covers parallel computer models including shared-memory multiprocessors, distributed-memory multicomputers, vector supercomputers, and theoretical models. It also discusses program and network properties, principles of scalable performance, and hardware technologies to support parallelism.

Instructor: Ali Akoglu


Office: ECE 356B


Office Hours: Tuesdays 11:00 AM - 12:00 PM, Thursdays 3:00 PM - 4:00 PM, or by appointment


Scope: ECE 569 stresses the need for and the design of high-performance computing (HPC) systems. HPC is about more than just achieving high performance: it is a compelling vision for how computation can seamlessly scale from a single processor to virtually limitless computing power. The single enabling force for HPC is the use of parallelism. The market demands general-purpose processors that deliver high single-threaded performance as well as multi-core throughput for a wide variety of workloads on client, server, and HPC systems. This pressure has given us almost three decades of progress toward higher complexity and higher clock rates. This progress hasn't always been steady. Intel cancelled its "Tejas" processor, which was rumored to have a 40-stage pipeline, and later killed off the entire Pentium 4 "NetBurst" product family because of its relative inefficiency. The Pentium 4 ultimately reached a clock rate of 3.8 GHz in the 2004 "Prescott" model, a speed that Intel has been unable to match since. In the more recent Core 2 (Conroe/Penryn) and Core i7 (Nehalem) processors, Intel uses increased complexity to deliver substantial performance improvements over the Pentium 4 line, but the pace of these improvements is slowing. Each new generation of process technology requires ever more heroic measures to improve transistor characteristics; each new core microarchitecture must work disproportionately harder to find and exploit instruction-level parallelism (ILP). As these challenges became more apparent in the 1990s, CPU architects began referring to the "power wall," the "memory wall," and the "ILP wall" as obstacles to the kind of rapid progress seen up until that time.

New commodity parallel computing devices bring formerly elite high-performance computing within reach of the general public. To program and accelerate applications on these devices, we must understand both the computational architecture and the principles of program optimization. Throughout the course we will study state-of-the-art processor architectures such as the IBM CELL BE, the Nvidia Tesla GPU, the Intel Larrabee microarchitecture, and the Intel Nehalem microarchitecture. We will then discuss the trends in cluster computing and cluster-based systems. We will study parallel algorithm design and programming issues for such systems. We will evaluate power, memory, and ILP challenges from the perspectives of the Programming Model, Computational Model, Processor Architecture Model, Threading Model, Memory Model, and Power Model.

This course will therefore provide students with an in-depth analysis of current issues in HPC systems, including: (1) Parallel Computing, (2) New Processor Architectures, (3) Power-Aware Computing and Communication, and (4) Advanced Topics on Petascale Computing and Optical Systems. In addition, we will also study parallel models of computation such as dataflow and demand-driven computation. While there are no specific prerequisites for the course, students are expected to be well versed in the basics of uniprocessor computer architecture and digital logic.

Textbook: There is no required textbook. The following books are recommended and will help with your projects.

[1] "Highly Parallel Computing", by George S. Almasi and Alan Gottlieb

[2] "Advanced Computer Architecture: Parallelism, Scalability, Programmability", by Kai Hwang, McGraw Hill 1993

[2] "Parallel Computer Architecture: A hardware/Software Approach", by David Culler Jaswinder Pal Singh, Morgan Kaufmann, 1999.

[3] "Scalable Parallel Computing", by Kai Hwang, McGraw Hill 1998.

[4] "Principles and Practices on Interconnection Networks", by William James Dally and Brian

Towles, Morgan Kauffman 2004.

[5] GPU Gems 3 --- by Hubert Nguyen (Chapter 29 to Chapter 41)

[6] Introduction to Parallel Computing, Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar, 2nd edition, Addison-Welsey, 2003.

[7] Petascale Computing: Algorithms and Applications, David A. Bader (Ed.), Chapman & Hall/CRC Computational Science Series, 2007.


2. Parallel Programming with CUDA

a) Processor Architecture, Interconnect, Communication, Memory Organization, and Programming Models in high-performance computing architectures (examples: IBM CELL BE, Nvidia Tesla GPU, Intel Larrabee microarchitecture, and Intel Nehalem microarchitecture)

b) Memory hierarchy and transaction-specific memory design

c) Thread Organization (see the kernel sketch below)
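
To make the thread organization and memory hierarchy topics above concrete, here is a minimal CUDA sketch of my own (not course material): it launches a grid of 256-thread blocks, computes a global thread index, and stages data through per-block shared memory before writing results back to global memory. The kernel name scale_copy, the scaling operation, and the problem size are all hypothetical choices for illustration; the sketch assumes a CUDA-capable GPU and the nvcc toolchain.

#include <cstdio>
#include <cuda_runtime.h>

// Kernel: each thread scales one element, staging it in shared memory first.
__global__ void scale_copy(const float *in, float *out, int n, float alpha)
{
    __shared__ float tile[256];                       // per-block shared memory (memory hierarchy)
    int gid = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index (thread organization)

    if (gid < n)
        tile[threadIdx.x] = alpha * in[gid];          // global -> shared
    __syncthreads();                                  // all threads in the block synchronize here

    if (gid < n)
        out[gid] = tile[threadIdx.x];                 // shared -> global
}

int main()
{
    const int n = 1 << 20;                            // 1M elements (hypothetical size)
    const size_t bytes = n * sizeof(float);

    float *h_in  = (float *)malloc(bytes);
    float *h_out = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) h_in[i] = (float)i;

    float *d_in, *d_out;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);

    // Thread organization: 256 threads per block, enough blocks to cover n elements.
    const int threadsPerBlock = 256;
    const int blocksPerGrid = (n + threadsPerBlock - 1) / threadsPerBlock;
    scale_copy<<<blocksPerGrid, threadsPerBlock>>>(d_in, d_out, n, 2.0f);
    cudaDeviceSynchronize();

    cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);
    printf("out[42] = %f (expected %f)\n", h_out[42], 2.0f * 42);

    cudaFree(d_in);
    cudaFree(d_out);
    free(h_in);
    free(h_out);
    return 0;
}

Compiled with something like nvcc sketch.cu -o sketch, this should print one element of the result. The block size of 256 is an arbitrary but common choice that maps whole 32-thread warps onto each block; the course will examine how such choices interact with the memory hierarchy of the target architecture.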


Assignments

Students will gain experience with leading-edge performance analysis tools, cycle-accurate hardware simulators, and dynamic program instrumentation systems to examine the operation of next-generation applications on modern hardware. Students will complete programming assignments to evaluate and compare the architectural features of state-of-the-art, high-performance commodity hardware platforms.


The semester project will involve two phases:

o During the first half of the course, students will:

o Propose a project on a selected topic taught in class,

o Document their survey by reporting existing solutions,

o Tackle a problem and propose their solution,

o Present their initial findings and solution strategy

o During the second half of the course, students will:

o Implement their proposed approach,

o Put together a paper quality document with experimental results,

o Present project findings


o The course will have 2-4 assignments, 1 mid-term examination, and a semester project

o No late assignments will be accepted, except under extreme non-academic circumstances discussed with the instructor at least one week before the assignment is due.

o Make-ups for assignments and exam may be arranged if a student's absence is caused by documented illness or personal emergency. A written explanation (including supporting documentation) must be submitted to your instructor; if the explanation is acceptable, an alternative to the graded activity will be arranged. When possible, make-up arrangements must be completed prior to the scheduled activity.

Any extenuating circumstances that have an impact on your participation in the course should be discussed with your instructor as soon as those circumstances are known.

o Inquiries about graded material must be submitted within 3 days of receiving a grade.

o Approximate weight of each assignment will be specified when the assignment is handed out. Assignments will be due in class on the due date.

o The instructor reserves the right to modify course policies, course calendar, course content, assignment values and due dates, as circumstances require.

o Students are strongly encouraged to attend the class. Lecture notes are intended to serve as a supplement and not as a substitute for attending class.

o You are encouraged to discuss the assignment specifications with your instructor and your fellow students. However, anything you submit for grading must be unique and should NOT be a duplicate of another source. The Department of Electrical and Computer Engineering expects all students to adhere to UofA's policies and procedures on the Code of Academic Integrity (studpubs/policies/cacaint.htm).


This course introduces students to computer architecture and covers topics in computer organization, microprocessors, caches and memory hierarchies, I/O, and storage. The course gives an in-depth study of microprocessor issues such as pipelining, out-of-order processors, branch prediction, instruction-level parallelism, thread-level parallelism, and cache coherency. Issues of performance, energy, and security are raised, along with an introduction to processor benchmarking. Select readings from the current academic literature augment the course textbook and lecture notes. The course also includes projects that introduce students to architectural simulators and to the process of evaluating different designs with benchmarking programs. The final semester project uses simulation (or an FPGA, for students with more hardware interest and background) and focuses on the design, implementation, and evaluation of processor architecture modifications that students propose.


Collaboration policy: Collaboration can be a great learning tool, so students are encouraged to study together and help each other out. However, unless otherwise stated, all work is individual. Do not copy others' homework or code. Violations of this policy will not be tolerated and will be referred to the Dean.


Attendance: You are responsible for all the material covered in class. The course covers material that may not all be in the textbook or printed handouts, so attendance is crucial for good performance in the course. If you miss a class, please get a Dean's excuse and make an appointment with the instructor to go over the material covered in the missed lecture.


Special Accommodations: If you require any special accommodations, please notify the instructor as soon as possible. This includes any religious practice that may interfere with completion of a scheduled examination, project, or homework. Please contact the instructor early in the course to arrange a meeting where we can plan for any needed accommodations.


Academic integrity is a core institutional value at Yale. It means, among other things, truth in presentation, diligence and precision in citing works and ideas we have used, and acknowledging our collaborations with others. In view of our commitment to maintaining the highest standards of academic integrity, the Graduate School Code of Conduct specifically prohibits the following forms of behavior: cheating on examinations, problem sets and all other forms of assessment; falsification and/or fabrication of data; plagiarism, that is, the failure in a dissertation, essay or other written exercise to acknowledge ideas, research, or language taken from others; and multiple submission of the same work without obtaining explicit written permission from both instructors before the material is submitted. Students found guilty of violations of academic integrity are subject to one or more of the following penalties: written reprimand, probation, suspension (noted on a student's transcript) or dismissal (noted on a student's transcript).
