Cloud-Optimized Networks

 

Moving big data to the cloud gives researchers access to high-end platforms, often at a lower cost than buying infrastructure. But when putting data in the cloud, one must consider where the data originates and how it is transported, stored, managed, and utilized. Bio-IT World is proud to launch the inaugural Cloud-Optimized Networks meeting, where researchers, bioinformaticists, architects, engineers, software developers, IT managers, and systems administrators share unique perspectives and solutions. The speakers will present cases that create common ground for exploring how best to transport, store, manage, and utilize big data in the cloud, with the goal of turning data into knowledge.

Day 1 | Day 2 

Tuesday, September 11

7:30 am Registration and Morning Coffee


Big Data in Industry 

8:15 Chairperson’s Opening Remarks

Joe Mambretti, Director, International Center for Advanced Internet Research, Northwestern University


» Kick-Off Keynote Presentation 

8:30 Extreme Data at eBay

Scott Prevost, Ph.D., VP, Search, Data, and Buyer Experience, eBay, Inc.

This presentation will take you on a tour of eBay’s extreme data, and explain how data drives product, engineering, and customer insights at eBay. Dr. Prevost will also offer insights into the steps an organization goes through as it adopts big data systems, and explain how to ready your organization for storing, processing, and driving its business with data.

9:00 Challenges in Petabyte Scale Genomics Networks

Dan Maltbie, CTO, Annai Systems

Dan Maltbie will present an overview of the challenges posed by emerging petabyte-scale networks that support genomic research and new clinical requirements. He will discuss Annai's very recent experiences in the deployment, management, transport, storage, and collaborative analysis of large-scale data in genomics research consortia.

9:30 Moving Big Data to and from the Cloud at High Speeds

Jay Migliaccio, Director, Cloud Platforms & Services, Aspera on Demand

While cloud computing platforms provide an environment for productive and innovative scientific research, they also create a few challenges. By design the cloud is far from everyone, so one of the biggest challenges is actually moving big data into and out of the cloud. Come to this talk to learn how Aspera solutions are helping the scientific community transform their research by moving large volumes of next-gen sequencing and research data at high speed over vast distances, as well as into and out of the cloud.

10:00 Coffee Break in the Exhibit Hall with Poster Viewing


Supporting Data Intensive Life Science Research 

10:30 21st Century High-Performance Environments for Data Intensive Knowledge Discovery

Joe Mambretti, Director, International Center for Advanced Internet Research, Northwestern University

Increasingly, scientific research requires gathering, analyzing, and transporting extremely large volumes of data among multiple locations worldwide, with requirements far exceeding the capacity of commonly implemented facilities. In response, researchers are designing and implementing advanced knowledge discovery environments that can be extended to the petascale level. These advanced facilities are being developed exclusively to support large-scale, data-intensive scientific research, including macro bioinformatics. Beyond providing significant capacity, these facilities support the dynamic and complex workflows required by a variety of scientific communities, with highly flexible control capabilities, including those related to high performance, advanced programmability, dynamic provisioning, support for specialized protocols, and extensions to computational cloud facilities designed specifically for data-intensive science.

11:00 Deploying a Hybrid High-Performance Computing and Cloud Infrastructure for Life Sciences Research

Ron Hawkins, Director, Industry Relations, San Diego Supercomputer Center, University of California, San Diego

The increasingly data-intensive nature of life sciences research demands access to both high-performance computing (HPC) and storage resources in order to achieve scientific breakthroughs. This presentation will describe the San Diego Supercomputer Center’s experience in leveraging technologies from the HPC and Cloud domains to support life sciences research at the University of California and beyond. The benefits of combining cloud storage with other technologies to create a high performance storage hierarchy will be discussed, as well as issues associated with data transport, management, storage, and utilization in the hybrid environment.

11:30 Practical Supercomputing in the Cloud for Genentech Research

Hubert Pun, Senior Systems Engineer, Genentech, Inc.

Hear the story of how we successfully completed a proof-of-concept project that used cloud resources as burst capacity for high-performance computing. Our HPC workloads included molecular dynamics simulations and protein engineering modeling.

12:00 pm Close of Session

12:15 Luncheon Presentation (Sponsorship Opportunity Available) or Lunch on Your Own


Big Data Transport 

2:00 Chairperson’s Remarks

Chandra Krintz, Ph.D., Professor, Department of Computer Science, University of California, Santa Barbara

2:05 Strategies and Tools for Big Data Transport across Wide Area Networks at Hundred-Gigabit Speeds

Michael Sullivan, M.D., Associate Director, Health Sciences, Internet2

Life science researchers can leverage work from the physics and networking communities to move Big Data across wide area networks at gigabits per second. While cloud computing may reduce the need for data transport, many scenarios still require moving sequence data to the cloud or downloading large genomics databases. Genomics centers can use performance monitoring, troubleshooting tools, a dedicated data transfer node, segregated security policies, and advanced networking to enable ultra high-speed data transport. The Internet2 Network backbone is being upgraded to 100 Gbps, and new networking technologies like software-defined networking (e.g. OpenFlow) and advanced layer 2 services will support highly reliable and flexible connectivity between researchers around the world.

Life in the Fast Lane: Genomes Race to the Cloud featuring Michael Sullivan and Steven Simms 


2:35 Empowering Bioinformatics Workflows Using the Lustre Wide Area File System across a 100 Gigabit Network

Kurt Seiffert, Manager, Research Storage, Indiana University

Managing the profusion and accumulated volumes of life-science data is cumbersome; transferring them can require anything from shipping a hard drive to paying a graduate student to babysit transfers. Indiana University’s Data Capacitor solves this problem by exporting a high-performance Lustre file system across wide area networks to multiple locations. A mounted file system lets researchers run simple and familiar commands without having to contend with special tools for data transfer. Moreover, multiple mounts let researchers compute against their data from anywhere. To meet the insatiable bandwidth demands of life scientists, network infrastructure providers are increasingly offering 100 Gigabit circuits. IU recently used Lustre across a 100 Gigabit network spanning 2,300 miles to demonstrate application performance across a great distance. This presentation will describe the Data Capacitor cyberinfrastructure and associated work, explore future use cases applicable to bioinformatics, and explain how the National Center for Genome Analysis Support (NCGAS) at Indiana University intends to integrate the Data Capacitor into its workflows.

3:05 Refreshment Break in the Exhibit Hall with Poster Viewing


Big Data Processing and Analytics 

3:45 AppScale Platform-as-a-Service (PaaS): Simplifying the Use of Cloud Fabrics

Chandra Krintz, Ph.D., Professor, Department of Computer Science, University of California, Santa Barbara

In this talk, we will give an overview of the challenges of using extant cloud infrastructures and services for high-performance computing and large-scale data analytics. We will then describe how our open-source platform-as-a-service (PaaS) offering, AppScale, addresses these challenges to simplify cloud use and ease the transition to the cloud for application developers. We will also show how our PaaS system facilitates portability across cloud systems to avoid “lock-in” to any single public cloud vendor while enabling use of different cloud fabrics and services with little or no application modification.

4:15 Hadoop, HBase, and Big Data Analytics in Bioinformatics

Ronald Taylor, Ph.D., Research Scientist, Computational Biology & Bioinformatics Group, Pacific Northwest National Laboratory (U.S. Dept. of Energy/Battelle)

Basic concepts behind Apache Hadoop and HBase, as well as associated open source software projects in the Hadoop ecosystem, will be described. An overview of current usage within the bioinformatics community will be given, focusing on next-generation sequencing. Also, a case study of a biological data warehouse in HBase on a large cluster using the Lustre file system (with loss of data locality) and the current status of Hadoop-based cloud applications in the new Dept. of Energy sysbio knowledgebase will be presented.

4:45 Experimentation on Cloud Databases to Handle Genomic Big Data

Abraham Gomez, Researcher in Computer Science and Java Developer, École de Technologie Supérieure (ÉTS), University of Quebec

We are now confronted with a twofold problem: the current rate of information growth and the processing of this information. Cloud computing databases seem to be the solution, and developments in open source software, e.g. the Hadoop project and HBase, provide a complete platform for dealing with petabyte-scale data. On this path, the first task will probably be data migration; therefore, this presentation explores a novel approach to migrating from relational databases to cloud computing databases.

5:15 Welcome Reception in the Exhibit Hall with Poster Viewing

6:15 Close of Day







Premier Sponsors

Annai Systems

Aspera

Cycle Computing

DNAnexus

IBM

Official Media Partner

Bio-IT World 


* IBM and the IBM logo are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide.