Data-Focused Cloud Applications

September 12-13

By leveraging distributed, on-demand, virtually limitless cloud computing resources, scientists are able to bring big data applications to everyone's fingertips. Mobile applications are being integrated into drug discovery and healthcare processes like never before. We are witnessing a shift that truly democratizes big data analytics while enabling the capture of ever-increasing amounts of data. This track showcases cloud-based applications that facilitate collaboration, translational research, and big data analysis in the life sciences while maintaining data security.


Wednesday, September 12

12:30 pm Conference Registration

Collaborations for Data-Intensive Applications

2:00 Chairperson's Opening Remarks

Chris Smith, Technical Director, Distributed Bio LLC


»Keynote Presentation

2:05 Using the Cloud for Rapid Prediction of Protein Thermodynamics and Kinetics

Vijay Pande, Ph.D., Professor, Chemistry, Structural Biology, Computer Science; Director, Biophysics Program; Director, Folding@home Distributed Computing Project

I will present two new sampling schemes that, especially when coupled with cloud computing resources, can dramatically speed up calculations. First, I'll talk about our approaches to conformational search, which are especially relevant for predicting peptide dynamics and thermodynamics as well as small-scale conformational changes in proteins. Next, I will talk about our approaches to rapid search of chemical space.


Podcast: Folding in the Cloud, featuring Vijay Pande

2:35 Doing Genomics in a Traditional HPC Environment

Cornelis Victor Jongeneel, Ph.D., Director, Bioinformatics & Biomedical Informatics, University of Illinois Urbana

The National Center for Supercomputing Applications (NCSA) is an NSF-funded facility that has deployed several large-scale computers, including Blue Waters, which will soon be the largest academic computational facility in the country. NCSA computers have been designed primarily to serve traditional NSF users, who run large, computationally intensive jobs. I will discuss our experiences, both good and bad, in using such an environment for computational genomics and for serving a diverse community of biomedical scientists.

3:05 Building a Centralized Cloud Solution to Advance the Use of Whole Human Genome Sequencing Information

Mirko Buholzer, Director, Cloud Applications, Complete Genomics

As a provider of whole human genome sequencing services, Complete Genomics is leveraging cloud infrastructure to enable researchers to store, share, transfer, analyze and manage terabytes of genomic data. We will focus on the major issues of controlling access to data, efficiently moving large data sets and easily managing large collaborative projects over time.

3:20 Refreshment Break in the Exhibit Hall with Poster Viewing

4:00 Construction of a Knowledge-Base Index System for Human Embryonic Stem Cells and Their Derivatives via High-Throughput Computing

Victor M. Ruotti, Computational Biologist, Morgridge Institute for Research

The ability to create a meta index (knowledge base) of tissues and cell types from the human body will allow biologists to better understand how transcription factors unlock the full potential of stem cells.

4:30 R in Big-Data Environments

Nachum Shacham, Ph.D., Data Scientist, eBay

Modern massively parallel processing (MPP) platforms, such as Hadoop and RDBMSs, make petabytes of structured and semi-structured records readily available for analysis by hundreds or thousands of processors. Running R on such platforms enables the processing of much larger datasets than the traditional single-server, RAM-based operation allows. We'll describe our experience in using R to process big data on a Teradata Enterprise Data Warehouse and on a large Hadoop cluster. We'll review the challenges of running R in conjunction with these platforms and describe methods for accomplishing this task. We'll describe the teradataR package, which enables a PC-based R session to use the warehouse resources to run statistical functions (e.g., correlation and parametric and nonparametric tests) on warehouse-resident tables while minimizing data transfer. We'll also review our experience in using R for map-reduce processing of unstructured text as well as tabular data on a 1000-node Hadoop cluster. Finally, we'll review a case study, implemented in R, that compares the cost of running identical jobs on Teradata and Hadoop.

5:00 Building a World-Wide Data Sharing Platform for Neuroinformatics Research

Chris Smith, Technical Director, Distributed Bio LLC

The International Neuroinformatics Coordinating Facility (INCF) develops collaborative neuroinformatics infrastructure and promotes the sharing of data and computing resources across the international research community. The INCF Dataspace (IDS) is a world-wide data federation based on iRODS and designed to enable data sharing for the individuals, research groups, and organizations that make up the membership of the INCF, whether a collaboration is between individuals or INCF-wide. A uniform set of policies is intended to enable the easy sharing of data while allowing individual sites to exercise control over their own data resources. This talk will describe the design of the IDS and how iRODS enables the desired features for both dissemination and control.

5:30 Close of Session


Premier Sponsors

Annai Systems


Cycle Computing



Official Media Partner

Bio-IT World 

