High Performance Computing
September 11-12
Faster and more efficient computing grows increasingly important as we confront the "big data" problem. Generations of supercomputing experts have used innovative designs and parallelism to push peak computational performance ever higher. This track features speakers ranging from the creator of the Condor system to a hardware designer behind the architecture of the K computer, the fastest computer in the world, along with presentations on the smart algorithms and software that enable data-intensive computing in the life sciences.
Tuesday, September 11
7:30 am Registration and Morning Coffee
8:15 Chairperson’s Opening Remarks
Kevin Davies, Ph.D., Editor-in-Chief, Bio-IT World
» Keynote Presentation
8:30 High-Throughput Computing with Cloud Resources
Miron Livny, Ph.D., Professor, Computer Sciences Department, University of Wisconsin
Since the mid-1980s, the Condor project has supported the high-throughput computing (HTC) needs of scientific and commercial applications. Universities, research laboratories, and enterprises have adopted the Condor distributed resource management system to harness the full capacity of large, dynamic, and heterogeneous collections of computing resources. The computing capabilities of clouds are a natural fit for the HTC model and are therefore used by a growing number of our users to increase their computational throughput.
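As a rough illustration of the HTC model described above (not drawn from the talk itself), the sketch below generates a Condor submit description for a batch of independent jobs and hands it to the condor_submit command-line tool; the job script name "analyze.sh" and the per-chunk argument scheme are hypothetical.

# Illustrative sketch: submit a bag of independent jobs to a Condor/HTCondor pool.
# Assumes the condor_submit tool is installed and that "analyze.sh"
# (a hypothetical per-chunk analysis script) exists in the working directory.
import subprocess
import tempfile

NUM_JOBS = 100  # one independent task per input chunk

submit_description = """\
universe       = vanilla
executable     = analyze.sh
arguments      = chunk_$(Process).dat
output         = logs/job_$(Process).out
error          = logs/job_$(Process).err
log            = logs/cluster.log
request_cpus   = 1
request_memory = 1GB
queue {n}
""".format(n=NUM_JOBS)

with tempfile.NamedTemporaryFile("w", suffix=".sub", delete=False) as f:
    f.write(submit_description)
    submit_file = f.name

# condor_submit queues NUM_JOBS jobs; Condor then matches them to idle machines
# in the pool (or to cloud-provisioned nodes) as capacity becomes available.
subprocess.run(["condor_submit", submit_file], check=True)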
9:00 K Computer and Its Application to Life Sciences
Makoto Taiji, Ph.D., Director, Computational Biology Core, RIKEN Quantitative Biology Center; Team Leader, Processor Research Team, RIKEN Advanced Institute for Computational Sciences
The K computer is the fastest high-performance computer in the world, with a nominal peak performance of 10 PFLOPS. We are developing software for computational life science, including molecular simulation, drug development, cellular simulation, brain simulation, and bioinformatics. Planned applications in the life sciences will be discussed.
9:30 Gordon - A Flash-based Supercomputer for Data Intensive Computing
Robert Sinkovits, Ph.D., Gordon Applications Lead, San Diego Supercomputer Center, University of California San Diego
The Gordon system at the San Diego Supercomputer Center was designed from the ground up to solve data- and memory-intensive problems. Each of Gordon's 1,024 compute nodes contains two Intel Sandy Bridge octo-core processors and 64 GB of memory. The nodes are connected via a dual-rail 3D torus network based on Mellanox QDR InfiniBand hardware and can access a 4 PB Lustre-based parallel file system capable of delivering up to 100 GB/s of sequential bandwidth. Two novel features, however, make Gordon particularly well suited to data-intensive problems. To bridge the large latency gap between remote memory and spinning disk, Gordon contains 300 TB of high-performance Intel 710 series solid-state storage. Gordon also deploys a number of "supernodes", based on ScaleMP's vSMP Foundation software, which can provide users with up to 2 TB of virtual shared memory. This talk will cover the Gordon architecture, our motivation for building the system, and a summary of recent success stories on Gordon spanning a number of domains.
10:00 Coffee Break in the Exhibit Hall with Poster Viewing
10:30 Introduction of Massively Parallel Computing Applications on TH-1A System
Nan Li, Ph.D., Professor, Vice Dean, School of Computing, National University of Defense Technology, China
TH-1A is China's first petaflops supercomputer, now installed at the National Supercomputing Center in Tianjin; it was ranked No. 1 on the TOP500 list released in November 2010. The TH-1A system adopts a hybrid architecture that combines heterogeneous CPU+GPU computing with an independently developed high-speed interconnect, and it has demonstrated strong system usability, performance stability, and application scalability across a wide range of high-performance computing areas, providing an important platform for scientific research and technological innovation. This presentation introduces several large-scale application tests on TH-1A, including oil seismic data processing, aircraft flow-field simulation, biomolecular dynamics simulation, magnetic confinement fusion numerical simulation, turbulent flow simulation, crystal silicon molecular dynamics simulation, fully implicit simulation of global atmospheric shallow-water waves, and heat flow simulation of the Earth's outer core. These results show that TH-1A achieves good parallel efficiency and scalability in practical applications.
11:00 Panel Discussion: Applications of Supercomputers in Life Sciences
Moderator: Kevin Davies, Ph.D., Editor-in-Chief, Bio-IT World
Panelists:
Miron Livny, Ph.D., Professor, Computer Sciences Department, University of Wisconsin
Makoto Taiji, Ph.D., Director, Computational Biology Core, RIKEN Quantitative Biology Center; Team Leader, Processor Research Team, RIKEN Advanced Institute for Computational Sciences
Nan Li, Ph.D., Professor, Vice Dean, School of Computing, National University of Defense Technology, China
Robert Sinkovits, Ph.D., Gordon Applications Lead, San Diego Supercomputer Center, University of California San Diego
12:00 pm Close of Session
12:15 Luncheon Presentation (Sponsorship Opportunity Available) or Lunch on Your Own
2:00 Chairperson’s Remarks
D. Akira Robinson, Ph.D., Consulting Computer Scientist, Neuro-Epigenomics.Com
2:05 RDMA -- A Concept for Low-Latency, High-Throughput Data Movement
Robert D. Russell, Ph.D., Associate Professor, University of New Hampshire InterOperability Laboratory
Remote Direct Memory Access (RDMA) transfers data without operating system intervention, directly between the virtual memories of processes on different network nodes. RDMA protocols avoid the extra data copying of traditional TCP/IP and UDP/IP, resulting in low latency for short messages, high bandwidth for large messages, and low CPU utilization for both. This talk gives a brief introduction to the three current RDMA technologies (InfiniBand, RoCE, iWARP), presents some performance measurements, and discusses examples of how and where RDMA is being used in HPC today.
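Not part of the talk, but as a point of reference for the latency discussion above, the sketch below measures round-trip time for short messages over ordinary TCP sockets; the kernel copies and protocol processing it exercises are precisely what RDMA transports are designed to bypass. The host, port, message size, and the assumption of a simple echo server on the other end are all placeholders.

# Minimal TCP round-trip latency microbenchmark (illustrative only).
# RDMA transports (InfiniBand, RoCE, iWARP) avoid the kernel copies and
# protocol overhead measured here, which is why they achieve much lower
# small-message latency and lower CPU utilization.
import socket
import time

HOST, PORT = "127.0.0.1", 5000   # placeholder endpoint; run an echo server here
MSG = b"x" * 64                  # short message, so the test is latency-bound
ITERATIONS = 10000

with socket.create_connection((HOST, PORT)) as sock:
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    start = time.perf_counter()
    for _ in range(ITERATIONS):
        sock.sendall(MSG)
        # assumes the peer echoes exactly len(MSG) bytes back
        received = 0
        while received < len(MSG):
            received += len(sock.recv(len(MSG) - received))
    elapsed = time.perf_counter() - start

print("mean round-trip time: %.1f microseconds" % (elapsed / ITERATIONS * 1e6))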
2:35 Understanding Cancer: Trillions of Data Points, Petabytes of Storage
Gary Stiehr, Group Leader, Information Systems, The Genome Institute, Washington University
Recent advances in DNA sequencing technologies have dramatically changed the scale at which we can analyze and understand individuals at a genetic level, and that is making a huge impact on cancer research. To enable these discoveries, however, it has been essential to leverage high-performance computing technologies. These projects involve trillions of data points moving through sophisticated bioinformatics pipelines and require petabytes of high-performance storage and thousands of CPU cores. We will discuss the challenges faced in such an environment, along with a few approaches to handling them.
3:05 Refreshment Break in the Exhibit Hall with Poster Viewing
3:45 Supporting Large Scale Data Access and Analysis through Shared Cloud Resources
Weijia Xu, Ph.D., Center for Computational Biology & Bioinformatics, The University of Texas at Austin
Data-intensive computing tasks present different challenges and requirements than compute-intensive tasks. In this talk, I will give an introduction to the high-performance computing resources at the Texas Advanced Computing Center. Drawing on a couple of ongoing projects, I will show how data-intensive computations can be supported by provisioning a dynamic cloud environment on top of existing high-performance computing clusters.
4:15 From an Algorithm to the Spreadsheet into the Cloud
Hans-Henning Gabriel, Data Scientist, Datameer
The Apache Hadoop project has become an important tool for data analytics. Utilizing the MapReduce paradigm, it enables scientists to parallelize their computations on a large cluster of inexpensive machines and scale on demand. This talk will explain what the MapReduce paradigm is and how it can directly be applied to tackle a biological challenge. We will show a simple way to execute a computationally intense application in the cloud on demand, through a generic spreadsheet approach that hides the complexity of parallelizing algorithms. This allows researchers to concentrate on their research, rather than on the IT and analytics infrastructure traditionally associated with data analysis.
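As a hedged illustration of the MapReduce paradigm mentioned above (the spreadsheet approach discussed in the talk hides these details from the user), below is a minimal mapper/reducer pair for Hadoop Streaming that counts fixed-length k-mers in lines of DNA sequence; the file names, jar path, and k value are arbitrary assumptions for the sketch.

# kmer_mr.py -- minimal Hadoop Streaming mapper/reducer pair (illustrative sketch).
# Example invocation (paths are assumptions):
#   hadoop jar hadoop-streaming.jar \
#     -input sequences.txt -output kmer_counts \
#     -mapper "python kmer_mr.py map" -reducer "python kmer_mr.py reduce" \
#     -file kmer_mr.py
import sys

K = 8  # arbitrary k-mer length chosen for illustration

def mapper():
    # Each input line is assumed to hold one DNA sequence; emit "kmer<TAB>1".
    for line in sys.stdin:
        seq = line.strip().upper()
        for i in range(len(seq) - K + 1):
            print("%s\t1" % seq[i:i + K])

def reducer():
    # Hadoop sorts mapper output by key, so identical k-mers arrive contiguously.
    current, count = None, 0
    for line in sys.stdin:
        kmer, _, value = line.rstrip("\n").partition("\t")
        if kmer != current:
            if current is not None:
                print("%s\t%d" % (current, count))
            current, count = kmer, 0
        count += int(value)
    if current is not None:
        print("%s\t%d" % (current, count))

if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()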
4:45 Rapid False-Discovery Rate Estimation for Bioinformatics Data
Mark Seligman, Principal Investigator, Insilicos LLC
The problem of multiple comparisons poses significant challenges to bioinformatics practitioners. Efforts to compensate can be so conservative as to severely constrain large studies. False-discovery rate estimation is one alternative, but the more powerful versions of this approach impose heavy computational costs. For the case of linear regression, we demonstrate significant acceleration of these methods using GPUs.
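For readers unfamiliar with the idea, the sketch below shows the classic Benjamini-Hochberg false-discovery rate procedure applied to a list of p-values; it is a plain illustration of FDR control, not the permutation-based, GPU-accelerated method presented in this talk.

# Benjamini-Hochberg FDR procedure (illustrative only; not the speaker's method).
# Given per-hypothesis p-values, report which hypotheses are declared
# discoveries while controlling the expected false-discovery rate at level q.
def benjamini_hochberg(p_values, q=0.05):
    m = len(p_values)
    # Sort p-values ascending, remembering original indices.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= (k/m) * q; reject all hypotheses
    # ranked at or below k.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= float(rank) / m * q:
            k_max = rank
    rejected = set(order[:k_max])
    return [i in rejected for i in range(m)]

if __name__ == "__main__":
    pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.5, 0.9]
    flags = benjamini_hochberg(pvals, q=0.05)
    for p, keep in zip(pvals, flags):
        print("p = %.3f -> %s" % (p, "discovery" if keep else "not significant"))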
5:15 Welcome Reception in the Exhibit Hall with Poster Viewing
6:15 Close of Day