Poster Track

TeraGrid'10 science and technology posters present new results or promising works in progress dealing with the use of the TeraGrid for scientific research and/or the development of new technologies for scientific computing.

Robyn Evans and JerNettie Burney. A Comparative Analysis of Localized Command Line Execution, Remote Execution through Command Line, and TORQUE Submissions of MATLAB Scripts for the Charting of CReSIS Flight Path Data
Abstract: The Polar Grid team was tasked with providing the Center for the Remote Sensing of Ice Sheets (CReSIS) with data that would allow signal processing through the CReSIS Synthetic Aperture RADAR Processor (CSARP) to utilize clustered computing resources without the need of MATLAB’s® proprietary Distributed Computing Environment. This research centered on the use of MATLAB® through command line, and scripted distribution through TORQUE high performance computing scheduling.
The team used flight path information from the Greenland 2007 field deployment. These data were imported into MATLAB® and converted from text files into MATLAB® script files. With these files, the team created a script within MATLAB® that plots the flight path data as a graph, with latitude on the x-axis and longitude on the y-axis.
The team took the master script for the creation of the chart and ran jobs through the MATLAB® command line on Madogo (Elizabeth City State University's cluster) and Quarry (Indiana University's cluster). The team then compared execution times for jobs on Madogo versus Quarry. A second comparison tested TORQUE job submission against direct MATLAB® submission to see which performed with greater efficiency. Lastly, the average execution times of all three data sets were compared statistically at a 5% significance level to determine whether there was a statistically significant difference between command line jobs and TORQUE submissions. The paper focuses on the procedure used to complete the research along with the conclusions reached.
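As a sketch of the TORQUE route described above (our illustration; the resource limits and MATLAB flags are assumptions, not details from the poster), a submission script for one headless MATLAB job can be generated like this:

```python
# Hypothetical sketch of generating a TORQUE (PBS) batch script that runs a
# MATLAB plotting script non-interactively on a cluster node. The script
# name, walltime, and node counts are invented for illustration.

def build_pbs_script(matlab_script, walltime="00:30:00", nodes=1, ppn=1):
    """Return the text of a TORQUE submission script for one MATLAB job."""
    return "\n".join([
        "#!/bin/bash",
        f"#PBS -l nodes={nodes}:ppn={ppn}",
        f"#PBS -l walltime={walltime}",
        "#PBS -j oe",  # merge stdout and stderr into one log
        "cd $PBS_O_WORKDIR",
        # -nodisplay/-nosplash let MATLAB run headless on a compute node
        f"matlab -nodisplay -nosplash -r \"run('{matlab_script}'); exit\"",
    ])

script = build_pbs_script("plot_flight_paths.m")
print(script)
```

Timing such a submission (e.g., wrapping `qsub` in `time`) against a direct MATLAB command-line run is one way to obtain the execution-time comparison the abstract describes.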

Margaret Wismer. Simulation of bone-conducted sound pathways to the outer and middle ear

Download the poster (PDF)
Abstract: A project to determine bone-conducted sound pathways to the inner ear has been sponsored by the Air Force in an effort to reduce hearing loss in personnel working in high-noise environments. The study includes a 3D computer simulation of an acoustic pulse wave propagating through and around a human skull. The numerical 3D simulations are benchmarked against known analytic results in order to verify accuracy. The pulse center frequency can be varied in order to identify how frequency affects the vibrations of the skull. The program is used to validate an experimental mapping of the skull in which the response in the inner ear, with and without hearing protection, is measured as a function of transducer input location at different external head locations and center frequencies. A finite-element time-domain (FETD) algorithm, written with MPI and executed on the Ranger HPC system at TACC, is used to achieve these results efficiently. The mesh on which the code operates is a uniform grid of 8-node brick elements. Thus the program can operate directly on any digitized image (in 2D) or a list of digitized images (in 3D), in which each image is a single slice of a 3D volume.
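The uniform-brick-mesh observation above can be made concrete with a small sketch (ours, not the authors' code): on a structured grid, element connectivity needs no stored mesh at all, only lattice arithmetic, which is why a stack of digitized image slices can serve directly as the mesh.

```python
# Minimal sketch: element (i, j, k) is the voxel between image slices k and
# k+1, and its 8 node indices follow from lattice arithmetic alone. Node
# numbering convention (lexicographic) is an assumption for illustration.

def brick_nodes(i, j, k, nx, ny):
    """Global node indices of the 8-node brick at voxel (i, j, k).

    Nodes live on the (nx+1) x (ny+1) x (nz+1) lattice and are numbered
    node(a, b, c) = a + b*(nx+1) + c*(nx+1)*(ny+1).
    """
    def node(a, b, c):
        return a + b * (nx + 1) + c * (nx + 1) * (ny + 1)
    return [node(i + di, j + dj, k + dk)
            for dk in (0, 1) for dj in (0, 1) for di in (0, 1)]

# The first voxel of a 2x2x2-element grid:
print(brick_nodes(0, 0, 0, 2, 2))  # [0, 1, 3, 4, 9, 10, 12, 13]
```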

Michael Braden. Membrane-Solvated Molecular Dynamics on Neurotransmitter Transporter Homology Models Using the TeraGrid

Download the poster (PDF)
Abstract: Neurotransmitter transporters are considered an important target for the detection and/or treatment of several neuropathologies, including neurodegenerative diseases, depression, epilepsy, and ADHD, among other conditions. No empirically derived structure of a mammalian transporter from this protein family has been determined. The X-ray crystal structure of a related bacterial amino acid transporter was recently elucidated, and numerous groups, including ours, have used this template to thread homology models. While these proteins are all expected to share a significant amount of 3D structural similarity, their sequence identity is not always high. Energy minimization and molecular dynamics simulations may allow the homology model structure to settle into a lower-energy, and hopefully more accurate, reflection of a single state of the protein being studied. As these proteins are membrane bound, it is most relevant to run these simulations in a simulated membrane environment. Similar work has been done for the serotonin transporter by another lab. The use of TeraGrid facilities for this project has allowed energy minimization and molecular dynamics simulations of a human norepinephrine neurotransmitter transporter homology model in a simulated explicit membrane environment. This startup allocation allowed a single production molecular dynamics run of 15 simulated nanoseconds on a system of 170,317 atoms. Protein stability was monitored by protein backbone RMSD and by the ionic/hydrogen bond lengths of residues believed to be involved in coordinating one of the bound sodium atoms. Apparent equilibrium was reached after ca. 7.5 ns of simulated time. Sodium coordination remained stable over this time. Experience gained through this TeraGrid Pathways Fellowship in how to utilize the TeraGrid has been, and will continue to be, passed on to support the burgeoning research computing at the University of Montana.
Future TeraGrid Research Allocation requests for this project are anticipated as well.

Dana Rowland and David Toth. A Computational Approach to Knotting in Complete Graphs

Download the poster (PPTX)
Abstract: Take a piece of string, stretch it, tangle and twist it up, and then attach the ends together. The result is a mathematical knot. Two such knots are considered to be equivalent if one can be deformed and stretched into the same position as the other, without cutting the string. If the “knot” is just an unknotted circle, we say it is the trivial knot. The simplest non-trivial knot is a trefoil knot, shown in Figure 1. Determining when two given knots are equivalent is one of the central problems in knot theory.
A graph is a set of vertices, where some of the vertices are connected to each other by edges. A complete graph is a graph where every pair of vertices is connected by an edge. An embedding of a graph is a particular way to connect the vertices in three-dimensional space. A Hamiltonian cycle is a closed loop in the graph that passes through each vertex exactly once and returns to the starting vertex. Each Hamiltonian cycle in an embedding gives us a mathematical knot. Figure 2 illustrates such a three-dimensional graph; the gaps, or crossings, denote where one edge passes "over" another one. Note that changing which edge is on top at a crossing gives a different embedding of the graph, and can change the knot type of a Hamiltonian cycle that contains that crossing.
Figure 1 - A Trefoil Knot
Figure 2 - An Embedding of the Complete Graph on 9 Vertices
There are 20,160 Hamiltonian cycles in Figure 2. Some of them, such as the roundtrip path which follows the vertices in clockwise order, can be deformed into an unknotted circle. Other paths, however, cannot be untangled. In fact, work of Conway and Gordon [1] implies that no matter how one initially connects the pairs of points in a complete graph on nine vertices, at least eight of the resulting Hamiltonian cycles are knotted. However, no known example of an embedding realizes this lower bound.
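The count of 20,160 cycles can be checked directly; the following short script (ours, added for illustration) enumerates the Hamiltonian cycles of the complete graph K_n by fixing a start vertex and quotienting by direction, which yields (n-1)!/2 distinct cycles.

```python
# Brute-force check that K_9 has (9-1)!/2 = 20,160 Hamiltonian cycles.
# In a complete graph every cyclic ordering of the vertices is a
# Hamiltonian cycle; fixing vertex 0 as the start removes rotations, and
# keeping the lexicographically smaller of the two directions removes
# reflections.
from itertools import permutations

def count_hamiltonian_cycles(n):
    """Count distinct Hamiltonian cycles of the complete graph K_n."""
    seen = set()
    for perm in permutations(range(1, n)):       # fix vertex 0 as the start
        cycle = (0,) + perm
        # a cycle equals its reversal, so keep one canonical direction
        canonical = min(cycle, cycle[0:1] + cycle[:0:-1])
        seen.add(canonical)
    return len(seen)

print(count_hamiltonian_cycles(9))  # 20160, matching the number quoted above
```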
We are interested in finding an embedding of the complete graph on nine vertices that is minimally knotted. We are also searching for the minimum number of knotted cycles with particular properties; for example, we conjecture that every embedding of the complete graph on nine vertices contains at least one knot that is more complicated than the trefoil. Checking all of the embeddings obtained by changing crossings requires computing power far beyond what is available on our campus. We have been using the TeraGrid to test embeddings and have checked 236 of them, all of which contained a non-trivial, non-trefoil knot, supporting our conjecture. This was accomplished in a fraction of the time it would have taken with the compute power on our campus. We will continue to use the TeraGrid to explore the minimum number of knotted cycles.
References
[1] John H. Conway and Cameron Gordon. Knots and links in spatial graphs. J. Graph Theory, 7(4):445-453, 1983.

Sophia White and Peter P. Gaspar. Computational Investigation of Four Valence Electron Silicon Reactive Intermediates

Download the poster (PPT)
Abstract: Silyliumylidenes (R-Si:+), four-valence-electron reactive intermediates, are investigated. These species have two nearly degenerate LUMOs in addition to their HOMO, which gives them the potential to form one, two, or three bonds, stepwise or concertedly.
One route to the generation of these low-valence-electron reactive intermediates utilizes the stable 2,3-benzo-1,4,5,6,7-pentaphenyl-7-silanorbornadiene, first generated by Müller and coworkers, as a precursor. Upon fragmentation of the silanorbornadiene, induced by hydride abstraction from the bridging silicon, 1,2,3,4-tetraphenylnaphthalene (TPN) and "Ph-Si:+" may be formed. Borates containing one or more fluorine atoms are considered as potential trapping agents because of the strong Si-F bonds that ultimately result.
Stabilities, mechanisms, and reactivities are examined computationally at the B3LYP/6-31G(d) level of density functional theory. The trapping reaction of Ph-Si:+ with BF4- is predicted to be feasible, proceeding stepwise with a modest (17.4 kcal/mol) activation barrier. Thus it is predicted that silyliumylidenes may become valuable synthetic reagents.
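For context on why a 17.4 kcal/mol barrier counts as modest, a transition-state-theory estimate (our back-of-the-envelope addition, not part of the poster) puts the corresponding room-temperature rate on the order of one event per second:

```python
# Eyring equation sketch: k = (kB*T/h) * exp(-dG/(R*T)). The barrier value
# is from the abstract; treating it as a free energy of activation is our
# simplifying assumption.
import math

KB_OVER_H = 2.084e10   # Boltzmann/Planck constant ratio, 1/(s*K)
R = 1.987e-3           # gas constant, kcal/(mol*K)

def eyring_rate(barrier_kcal, T=298.15):
    """First-order rate constant (1/s) for a given activation free energy."""
    return KB_OVER_H * T * math.exp(-barrier_kcal / (R * T))

k = eyring_rate(17.4)
print(f"{k:.2g} 1/s")  # roughly order 1 per second at room temperature
```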

Wen Duan and Shanzhong Duan. Parallel Implementation and Code Development of Efficient Multibody Algorithm for Motion Simulation of Molecular Structures on TeraGrid Supercomputers
Abstract: There are two major computational costs associated with computer simulation of atomistic molecular dynamics: calculation of the interaction forces and formation/solution of the equations of motion. In this research, an efficient parallelizable algorithm is introduced and implemented on TeraGrid supercomputers for the formation/solution of the equations of motion and the calculation of interaction forces in multibody molecular systems. The algorithm is based on placing hard constraints on bonds with high-frequency vibration modes in a molecular structure, as shown in Fig. 1(a), so that a multibody model of the molecular system can be formed, as shown in Fig. 1(b). Motion simulation methods and joint-cut techniques for macro-level multibody dynamical systems can then be applied directly to the micro-scale molecular structures shown in Fig. 1(b). After joint cutting, constraint forces appear between adjacent subsets, as shown in Fig. 1(c), which makes the reduced molecular system behave like the original system. Each subset in Fig. 1(c) can then be assigned to a processor for concurrent simulation, as shown in Fig. 1(d). Thus the sequential O(N) method is performed within each processor and parallel computing techniques are applied between processors to obtain high computing efficiency (N: total number of subsets).
Figure 1: Development of Model & Parallel Computing Model
Based on the algorithm, MATLAB code has been developed and implemented sequentially for small systems on a single processor. Building on the MATLAB code, the authors will develop C and MPI code and implement it on TeraGrid supercomputers. The C and MPI code contains four modules: basic operations, kinematical calculation, force and kinetic term calculation, and the O(N) method. Message passing techniques such as buffered message passing will be integrated with each module for parallel implementation of the code on TeraGrid. This research is supported by the TeraGrid Pathways Fellowship program. The authors gratefully acknowledge this financial support, and sincerely thank the TeraGrid scientists and experts for all consulting services and system support.

Abstract: Molecular dynamics (MD) simulation was carried out for the antimicrobial peptide Cecropin P1 C in solution, for Cecropin P1 C adsorbed onto a silica surface, and for Cecropin P1 C tethered to a silica surface with a polyethylene oxide (PEO) linker. The simulation results show an equilibrium structure consisting of two α-helix regions with a sharp bend for Cecropin P1 C in solution, consistent with the available structures of other antimicrobial peptides. The percent α-helix of the equilibrium structure is higher (40%) for Cecropin P1 in the presence of 0.12 M salt, as a result of shielding of electrostatic interactions, compared to Cecropin P1 C (30%) and Cecropin P1 (25%), both in the absence of salt. At higher temperatures there was a loss of α-helical content resulting in unfolding of the hinge. The conformation of adsorbed Cecropin P1 C on the silica surface indicated an α-helix content of around 10%, as opposed to a value of around 30% in solution, with the end-to-end distance for the former being around 4 nm compared to a value of 1.5 to 2 nm for the latter. The equilibrium conformation of Cecropin P1 C tethered to the silica surface with a PEO linker is found to have even lower α-helical content (4.5%) compared to the adsorbed polypeptide (7.8%), though the former is more compact (end-to-end distance of 2.0 nm) than the latter (end-to-end distance of 4.1 nm). Tethered Cecropin P1 C is found to have two α-helical regions (residues 2 to 8 and 24 to 28), as opposed to adsorbed Cecropin P1 C, which exhibits only one α-helical region (residues 17 to 25).

Mikhail Sekachev and Kwai Wong. Benchmark solutions for the Incompressible Navier-Stokes Equations Using a Parallel Consistent Splitting Scheme
Abstract: A large-scale numerical simulation of three-dimensional, non-steady, incompressible fluid flow is performed using one of the schemes in the general class of projection methods: the consistent splitting scheme (CSS). This class of projection schemes for incompressible flows was first introduced by Guermond and Shen [1]. Like pressure-correction and velocity-correction schemes, CSSs are based on a weak form of the pressure Poisson equation, which is decoupled from the momentum equations. However, the main advantage of consistent splitting schemes over pressure-correction and velocity-correction schemes is that the second-order consistent splitting scheme enjoys many desirable properties, such as decoupling, unconditional stability, truly second-order accuracy (for the velocity and the pressure in both the L2 and H1 norms), and, in most cases, freedom from the inf-sup condition [1]. A parallel implementation has been done as described in [2]. Computations have been carried out on the Cray XT5 petascale computing system (Kraken) at the National Institute for Computational Sciences. Two benchmark problems have been used to demonstrate the versatility of the code and perform the scaling analysis. The first is the lid-driven cavity flow benchmark, which describes flow in a 3-D enclosure with a moving top wall. The second is the buoyancy-driven cavity flow, which represents a thermally driven flow in a differentially heated closed reservoir. A CFD code based on collaborative work, the Parallel Interoperable Computational Mechanics Simulation System (PICMSS), has been used to perform the calculations [3]. PICMSS is a fully parallel finite-element computational platform for solving CFD problems. It employs the Trilinos iterative library for solving the linear systems of equations generated by the FEM and is capable of admitting various formulations of fluid flow simulations, written directly in partial differential equation (PDE) form.
The verification results based on a lid-driven cavity flow analysis using PICMSS are shown in the figure below...
[1] Guermond JL, Shen J. A new class of truly consistent splitting schemes for incompressible flows. J. Comp. Phys. 2003;192(1):262–276.
[2] Kuo YH, Wong KL, Chak-Fu J. Investigation of Taylor-Görtler-like Vortices Using the Parallel Consistent Splitting Scheme. Adv. Appl. Math. Mech. 2009;1(6):799-815.
[3] Wong KL, Baker AJ. A Modular Collaborative Parallel CFD Workbench. The Journal of Supercomputing. 2002;22:45-53.

Shawn Duan and Abdul Muqtadir Mohammed. Efficient Algorithm for Virtual Prototyping of Large-Sized Multibody Dynamical Systems on TeraGrid
Abstract: Despite the great growth in the capability of computer hardware, the system sizes, structural complexity, and time scales present in virtual prototyping of multibody dynamical systems will continue to challenge the field of computational multibody dynamics for the foreseeable future. In this project, funded by a TeraGrid Pathways Fellowship, the scientific problems in virtual prototyping of multibody dynamical systems are articulated. An efficient parallelizable algorithm is then introduced to address these problems. The procedure has been developed through hybridization between the sequential order-N, or O(N), procedure (N: total number of bodies in a multibody system) and parallel computing techniques, and between direct and iterative methods. The implementation of the algorithm on TeraGrid computing systems for large multibody systems is further discussed. The algorithm is coded in C and MPI. MPI message passing techniques such as a buffered message passing strategy have been used to reduce communication overhead. Correctness of the code has been verified against baseline benchmark cases. Various simulation cases and computing results are presented to demonstrate the impact of the TeraGrid on the performance of the algorithm. TeraGrid provides computing facilities and resources for implementing the algorithm for virtual prototyping of large multibody dynamical systems, which is not possible with the cluster-based computers at the authors' institution, and offers higher computing efficiency than those cluster computing systems.
Specifically, simulation cases and results are presented in detail to demonstrate the scaling, computing efficiency, and flexibility of the algorithm on TeraGrid computers. An N-body tether has been used as a case study, with the number of bodies ranging from 16 to 512. The bodies and their computational loads were distributed among the processors of a TeraGrid computing machine. In one extreme case, all bodies were assigned to one processor, achieving the sequential O(N) performance. In the other extreme, each body was assigned to its own processor for concurrent computing, achieving parallel O(log2 N) performance. In general, the algorithm gives a computational performance between O(N) and O(log2 N).
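The two extremes described above can be illustrated with a trivial sketch (ours; the step counts are idealized and ignore communication overhead): a sequential sweep visits the N bodies one by one, while a fully concurrent binary-tree assembly of the same N bodies needs only ceil(log2 N) combining steps.

```python
# Idealized step counts for the two extreme processor assignments: all
# bodies on one processor (sequential O(N)) versus one body per processor
# combined pairwise in a tree (parallel O(log2 N)).
import math

def sequential_steps(n_bodies):
    return n_bodies                        # one body handled per step: O(N)

def parallel_steps(n_bodies):
    return math.ceil(math.log2(n_bodies))  # pairwise tree combine: O(log2 N)

for n in (16, 512):
    print(n, sequential_steps(n), parallel_steps(n))
```

For the 512-body tether of the case study, the idealized gap is 512 steps versus 9, which is the motivation for distributing subsets across processors.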
The authors gratefully acknowledge financial support from the TeraGrid Pathways Fellowship program, and sincerely thank the TeraGrid scientists and experts for all consulting services and system support.

Linda Hayden, Jeaime Powell, Felicia Doswell and Kaiem Frink. Mentoring Minority Undergraduates in their efforts to Implement a LAMP Documentation Server for a Condor-based Grid
Abstract: This paper describes efforts to lead underrepresented undergraduate students to set up a documentation platform for a Condor-based Grid to be established at Elizabeth City State University. This documentation platform consists of a Linux-based web server that utilizes Web 2.0 standards to create a virtual documentation web portal. Grid computing itself is the creation of a "virtual supercomputer" from a network of geographically dispersed computers. In order to produce such a network infrastructure, documentation is critical for communication among the users and administrators of the systems.
The documentation distribution center developed in this research incorporates an Ubuntu Linux system with an Apache web server, a MySQL database, a PHP scripting package, and a MediaWiki web interface. This particular configuration is called a LAMP server; LAMP is an acronym for Linux, Apache, MySQL, and PHP, all of which are open source applications. The combination of these LAMP applications allows MediaWiki to function as a collaborative editing tool.
This paper details successful schemes for engaging and retaining underrepresented communities in high-performance computing. Included in the paper are the demographics of the minority students involved in the project, the structure of the team and the organization of the research training activities.

Gaurang Mehta, Ewa Deelman, Karan Vahi, Gideon Juve, Philip Maechling, Thomas Jordan, Scott Callaghan, Miron Livny and Kent Wenger. Pegasus WMS - Bridging the National CyberInfrastructure Divide to Run Large Scale Scientific Workflows

Download the poster (PDF)
Abstract: Pegasus WMS is a Workflow Management System that can manage large-scale scientific workflows across Grid, local and Cloud resources. This poster will introduce the capabilities of managing these workflows on diverse national cyberinfrastructure like OSG (Open Science Grid) and TeraGrid in an efficient, reliable and automated fashion. Pegasus WMS is developed at the University of Southern California, Information Sciences Institute in collaboration with the Condor project at the University of Wisconsin Madison.
Our different national cyberinfrastructures, developed over the past decade, offer different styles of high performance computing. Although much work has been done to support running applications on a specific resource, not much effort has been devoted to bridging the divide by running a single application across multiple resources in an easy and efficient manner. There is still a gap between the needs of scientific applications and the capabilities provided by the resources. Leadership-class systems, such as many TeraGrid resources, are optimized for highly parallel, tightly coupled applications, whereas collaborative systems like OSG cater to high-throughput, loosely coupled applications. Some scientific applications, however, are composed of a few long-running, tightly coupled components performing data generation or mapping, followed by a large number of loosely coupled individual components, many with data and control dependencies. Managing such complex, many-step workflows easily and reliably still poses difficulties on today's cyberinfrastructures.
Pegasus WMS was initially developed as part of the GriPhyN project to support large-scale high-energy physics and astrophysics experiments. Direct funding from the NSF enabled support for a wide variety of applications from diverse domains, including earthquake simulation, epigenomics, chemistry, and ocean modeling. Pegasus WMS provides a means for representing the workflow of an application in an abstract form, agnostic of the resources available to run it and of the location of data and executables. It then compiles these abstract workflows into executable workflows by querying catalogs and farms the computations across the different resources using Condor DAGMan and Condor-G as a dispatcher.
Pegasus WMS was recently used in a large-scale production run in May 2009 by the Southern California Earthquake Center to run 190 million loosely coupled tasks and about 2,000 tightly coupled MPI-style tasks on TeraGrid, generating a probabilistic seismic hazard map of the Southern California region. In early January 2010 we ran several of these large-scale SCEC workflows in an automated fashion, running the tightly coupled jobs on TeraGrid and then automatically moving the required data to OSG to run the loosely coupled post-processing part of the workflow, thus making optimal use of the strengths of these two resources.
The aim of this poster is to highlight the capabilities of running workflows across local and distributed computing resources, the major national cyberinfrastructure providers OSG and TeraGrid, as well as emerging commercial and community cloud environments.

Mats Rynge, Ewa Deelman, Gideon Juve, Burt Holzman, Krista Larson, Frank Wuerthwein and Igor Sfiligoi. CorralWMS - Integrated Resource Provisioning Across the National Cyberinfrastructure in Support of Scientific Workloads

Download the poster (PDF)
Abstract: This poster will introduce CorralWMS, a project to integrate two existing resource provisioning systems: Corral and glideinWMS. The project is a collaboration between Fermi National Laboratory, UC San Diego, and USC Information Sciences Institute.
Although much work has been done in developing the national cyberinfrastructure in support of science, there is still a gap between the needs of the scientific applications and the capabilities provided by the resources. Leadership-class systems are optimized for highly-parallel, tightly coupled applications. Many scientific applications, however, are composed of a large number of loosely-coupled individual components, many with data and control dependencies. Running these complex, many-step workflows robustly and easily still poses difficulties on today’s cyberinfrastructure. One effective solution that allows applications to efficiently use the current cyberinfrastructure is resource provisioning using Condor Glideins.
Corral and glideinWMS currently operate as standalone resource provisioning systems. GlideinWMS was initially developed to meet the needs of the CMS (Compact Muon Solenoid) experiment at the Large Hadron Collider (LHC) at CERN. It generalizes a Condor glidein system developed for CDF (the Collider Detector at Fermilab) and first deployed for production in 2003. It has been in production across the Worldwide LHC Computing Grid (WLCG), with major contributions from the Open Science Grid (OSG), in support of CMS for the past six months, and is being adopted for user analysis in time for data acquisition in October 2009. GlideinWMS has also been adopted by the CDF and MINOS experiments, and is being evaluated by the DZero experiment. GlideinWMS has been used in production with 8,000 concurrently running jobs, totaling more than six million jobs executed over the last year and consuming more than 2,000 CPU years.
Corral, a tool developed to complement the Pegasus Workflow Management System, was recently built to meet the needs of workflow-based applications running on the TeraGrid. It is being used today by the Southern California Earthquake Center (SCEC) CyberShake application. In a period of 10 days in May 2009, SCEC used Corral to provision a total of 33,600 cores and used them to execute 50 workflows, each containing approximately 800,000 application tasks, which corresponded to 852,120 individual jobs executed on the TeraGrid Ranger system. The roughly 50-fold reduction from the number of workflow tasks to the number of jobs is due to job-clustering features within Pegasus designed to improve overall performance for workflows with short-duration tasks.
The integrated CorralWMS system will provide robust and scalable resource provisioning services that support a broad set of domain application workflow and workload execution environments. The aim is to integrate and enable these services across local and distributed computing resources, the major national cyberinfrastructure providers Open Science Grid and TeraGrid, as well as emerging commercial and community cloud environments.

Igor Sfiligoi, Frank Wuerthwein and Christopher Theissen. GlideTester - A framework for distributed testing of network-facing services using Condor glideins on Grid resources

Download the poster (PDF)
Abstract: This poster will introduce the glideTester framework, a Condor glidein based system aimed at providing an easy-to-use infrastructure for distributed testing of network-facing services using Grid resources.
Many compute environments rely on centralized, network-facing services in order to work properly. In the commercial world, an obvious example is the Web server; in the Grid world, there are Compute Elements (CEs), Storage Elements (SEs), and information systems. All of them receive requests from a large number of geographically distributed clients. In order to test the scalability and reliability of such services, the test environment must mimic the expected access pattern; this means the test clients must be distributed over a large geographical area, all concurrently talking to the tested service.
The reason for scalability and reliability testing is to assess how these services scale as clients are added. Does the total transaction rate scale linearly with the number of clients? Is there a limit where the service gets saturated? How is this limit approached: linearly, exponentially, or something else? What happens when we reach it: does the service stabilize, degrade, fail completely, or something in between? Answering such questions helps in choosing the right product for the situation and the right amount of hardware for a specific product, and gives feedback to the software developers so they can improve their products.
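A toy throughput model (purely illustrative; the rates are invented, not measurements from glideTester) captures the shape these questions probe: linear scaling up to a capacity limit, then a plateau once the service saturates.

```python
# Hypothetical saturation model: with per-client rate r and service
# capacity C, the aggregate transaction rate grows as r*clients until it
# hits C. Real services may instead degrade past saturation, which is
# exactly what distributed testing is meant to reveal.

def total_rate(clients, per_client_rate=10.0, capacity=500.0):
    """Aggregate transactions/sec for a given number of concurrent clients."""
    return min(clients * per_client_rate, capacity)

for c in (1, 10, 50, 100, 1000):
    print(c, total_rate(c))
```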
In order to get a meaningful data set, the tests must be spread over a wide, geographically distributed area; many services behave differently when accessed over a local, reliable, low-latency network than when accessed by clients connected over a high-latency, possibly unreliable wide area network. Moreover, the tests should ideally be performed at many different concurrency points and using various access patterns, and at each point the same kind of tests should be executed every time.
The glideTester framework provides an automated environment for performing the above-mentioned tests. It does this on Grid resources using the Condor glidein paradigm. Implementation-wise, it is an extension of glideinWMS, a glidein-based system currently in use on the Open Science Grid (OSG). The glideTester framework is currently used primarily by the OSG scalability area group to test various OSG services, but it is generic enough to be used by the TeraGrid community as well.

Adam Sullivan, Kwai Wong, Chunlei Su and Xiaopeng Zhao. A Parallel Agent Based Model to Describe Host-Pathogen Interaction for Toxoplasma Gondii
Abstract: Toxoplasma gondii (T. gondii) is considered one of the most successful parasites for its unusual ability to infect a wide range of intermediate hosts, including mammals and birds. Up to 20% of the human population in the US and 30% worldwide are chronically infected. Toxoplasma infection can cause life-threatening encephalitis in immunocompromised persons such as AIDS patients and recipients of organ transplants and cancer chemotherapy. T. gondii has a complex life cycle that involves many dynamic processes. In this work, we develop an agent-based model to describe the replication cycle of Toxoplasma within a mouse. The model is implemented on the Kraken supercomputer using the Flexible Large-scale Agent-Based Modeling Environment. Numerical analyses of the model shed light on the significance of various regulatory biological parameters.
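As a deliberately tiny illustration of the agent-based approach (ours; far simpler than the actual model, and with invented probabilities), each parasite agent can replicate or be cleared by the immune response at every step, and the population trajectory emerges from those per-agent rules:

```python
# Toy agent-based sketch of a replication cycle: every agent replicates
# with probability p_rep and is cleared with probability p_clear per step.
# All parameters are hypothetical placeholders, not values from the study.
import random

def simulate(steps, n0=10, p_rep=0.3, p_clear=0.2, seed=42):
    """Return the population size at each step of a stochastic simulation."""
    rng = random.Random(seed)
    population = n0
    history = [population]
    for _ in range(steps):
        births = sum(rng.random() < p_rep for _ in range(population))
        deaths = sum(rng.random() < p_clear for _ in range(population))
        population = max(population + births - deaths, 0)
        history.append(population)
    return history

print(simulate(20))
```

Sweeping parameters such as `p_clear` in a model like this is the serial analogue of the parameter analyses the poster describes at scale on Kraken.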

Liwen Shih. Space Radiation Estimation and Prediction on TeraGrid

Abstract: Space radiation is likely to be the ultimate limiting factor for future deep space exploration, posing great challenges even at the very first step onto the radiation-harsh Moon base, which lacks the protection of an Earth-like magnetic field and atmosphere. Understanding and predicting the space radiation environment is essential for risk assessment in orbit/crew selection and mission schedule/duration, and provides the scientific basis for countermeasures such as shielding materials (affecting spaceflight weight), radioprotectants, and pharmaceuticals. Our goal is to help NASA provide better and faster solar radiation event monitoring, remediation, and reaction, and more accurate space weather analysis, through radiation transport computation optimization and solar storm prediction. By applying artificial neural network prediction and other data mining technologies, in a manner similar to tropical storm, earthquake, and commodity price prediction, we are working with NASA to collect, organize, and analyze past solar radiation flux data and images for future solar particle event forecasting, to help NASA achieve better radiation monitoring to protect our astronauts and space missions. NASA's radiation dosage/flux estimation software HZETRN (High Z Charge & Energy Transport) currently underestimates by 15-30%, and still runs about 8+ hours for one solar radiation particle event, which is inadequate for real-time space weather monitoring and the protection of astronauts and missions to the Moon and Mars. Although our previous prototype of the HZETRN bottleneck routine has shown 325- to 600-fold speedups with FPGA boards alone, a new computer cluster with built-in FPGAs would provide the necessary fast communication interface between the main program running on the cluster and the frequently used core routines running on FPGA co-processors.
Thanks to our newly awarded petaFLOPS computation resource allocations, we are also able to adapt HZETRN to run on the new UT TACC Ranger Sun Constellation Cluster, on TeraGrid roaming allocations across 11 supercomputing centers, on Purdue's Brutus (an SGI cluster with 4 FPGAs), and on Rice's Cray XD-1 with FPGAs. With newly emerging technology in space weather prediction, large-scale parallel computer clusters and reconfigurable FPGAs, it is highly promising that a high-performance implementation of the highly complex real-time space weather analysis, monitoring and prediction can be developed that will bring us to the Moon and Mars sooner, safer, lighter, and cheaper.

Li Sun and Liwen Shih. Multi-scaled Properties Simulation of Carbon Nanofiber Reinforced Space Polymer Composites
Abstract: Continuous fiber reinforced polymer matrix composites (PMCs) have been extensively used in space vehicles and aerospace structures due to their light weight, high stiffness, high strength and toughness, and corrosion/fatigue resistance. We recently developed polymer nanocomposites reinforced by low-cost pre-assembled interconnected carbon nanofibers (CNFs), which can deliver additional dynamic damping and electrical/thermal conductivity improvements. These properties can lead to the development of new space materials with significantly improved vibration/noise reduction, Lightning Strike Protection (LSP) and Electro-Magnetic Immunity (EMI) performance. For a multi-component complex system such as a nanocomposite, computer simulation is the only effective approach to reveal structure-property correlations. In our simulation approach, a three-dimensional Monte Carlo method is used to describe the nanomaterial distribution with periodic boundary conditions. To quantify composite electrical conductivity, a resistor network representation formed by CNF segments and fiber-matrix-fiber junctions has been established. A dramatic improvement in computational efficiency in calculating the electrical behavior of the resistor network can be realized when the fiber resistances are much smaller than the junction resistances, so that their contribution can be neglected. We also evaluated the contact pressure at the fiber-matrix interfaces by solving the radial displacement problem of two concentric cylinders under lateral pressure. This helps us quantify the interfacial friction energy dissipation during dynamic loading, revealing the strain, strain rate and frequency dependences of energy dissipation in polymer nanocomposites.
Based on the two UHCL-UH teams' expertise in nanomechanics, nanomaterial synthesis, high performance computing and machine intelligence, this collaborative research can provide a fundamental understanding of nanocomposite properties and help guide the design and synthesis of the next generation of high performance, multi-functional nanocomposites for space applications.

Robert Budden, Josephine Palencia and Paul Nowoczynski. Konfuse - Kerberos Over the Network via FUSE
Abstract: The growing popularity of wide area network (WAN) file systems has sparked numerous projects throughout the grid computing industry. Grid computing sites are deploying WAN-based file systems for backend storage and computational resources.
One of the major concerns with deploying a WAN-based file system is security. Lustre 2.0 provides a Kerberos infrastructure to validate servers, clients, and users.
The addition of Kerberos adds a layer of security but also a layer of complexity. Batch and daemon operations that ordinarily required little file system authentication beyond UID/GID lookups must now present Kerberos credentials. Data transfer services such as GridFTP and SCP now require server-side credential caches to function properly.
Konfuse acts as the delegator between users and remote services. It monitors local credentials and propagates Kerberos credentials to server-side daemons. Using these credentials, data transfer daemons can perform authenticated reads and writes to the Lustre file system.

Pierre Riteau, Mauricio Tsugawa, Andrea Matsunaga, José Fortes and Kate Keahey. Sky Computing on FutureGrid and Grid'5000

Download the poster (PPT)
Abstract: Sky computing is an emerging computing model in which resources from multiple cloud providers are leveraged to create large-scale distributed infrastructures. These infrastructures provide resources to execute computations requiring large computational power, such as scientific software. Establishing a sky computing system is challenging due to differences among providers in hardware, resource management, and connectivity. Furthermore, scalability, balanced distribution of computation and measures to recover from faults are essential for applications to achieve good performance. This work shows how resources from two experimental projects, the FutureGrid experimental testbed in the United States and Grid'5000, an infrastructure for large-scale parallel and distributed computing research composed of 9 sites in France, can be combined and used to support large-scale, distributed experiments. This showcases not only the capabilities of the experimental platforms, but also their emerging collaboration.
Several open source technologies are integrated to address these challenges. Xen machine virtualization is used to minimize platform (hardware and operating system stack) differences. Nimbus, which offers VM provisioning and contextualization services, is used for resource and VM management. Nimbus allows turning a cluster into an Infrastructure-as-a-Service cloud. By deploying Nimbus on the FutureGrid and Grid'5000 platforms, we provide an identical interface for requesting resources on these different testbeds, effectively making interoperability possible. We also leverage the contextualization services offered by Nimbus to automatically configure the provisioned virtual machines. In our context, contextualization enables new resources to join the virtual cluster without any manual intervention. Commercial clouds and scientific testbeds limit the network connectivity of virtual machines, effectively rendering all-to-all communication, which is required by many scientific applications, impossible. ViNe, a virtual network based on an IP overlay, allows us to enable all-to-all communication between virtual machines in a virtual cluster spread across multiple clouds. In the context of scientific testbeds such as FutureGrid and Grid'5000, it allows us to connect the two testbeds with minimal intrusion into their security policies. Additionally, we use Hadoop for parallel, fault-tolerant execution of a popular embarrassingly parallel bioinformatics application (BLAST). In particular, we leverage the dynamic cluster extension feature of Hadoop to enable resources from Grid'5000 to merge with resources from FutureGrid in a single virtual cluster while computation is in progress. After extension, Map and Reduce tasks are distributed among all resources, speeding up the computation.
Finally, to accelerate the provisioning of additional Hadoop workers (deployed as VMs), an extension to Nimbus taking advantage of Xen copy-on-write image capabilities has been developed. The extension decreases the VM instantiation time from minutes to just a few seconds.
The elasticity of this approach has been showcased as a demo presented at OGF-29. It includes elements from the Scaling-out CloudBLAST: Combining Technologies to BLAST on the Sky demo performed at CCGrid2010 and from the entry that won the Grid'5000 Large Scale Deployment Challenge at the 2010 Grid’5000 Spring School: Deployment of Nimbus Clouds on Grid’5000.

Rion Dooley, Maytal Dahan, Matthew Hanlon, Stephen Mock and Praveen Nuthulapati. Comprehensive File Management in the TeraGrid
Abstract: Data management continues to be a challenge for new and existing TeraGrid users. In 2009, the TeraGrid User Portal (TGUP) team released the File Manager application. Since then, the TGUP team has continued to expand the suite of file management services available to the user community. Today, users have three ways to access their data and one way to share it with their colleagues and the world. In this poster, we summarize each of these services and illustrate their ideal use cases.
The TGUP File Manager is a Java applet available through the TGUP. It enables access to desktop files, remote TeraGrid systems, cloud storage systems (specifically Eucalyptus and Amazon's S3), and external systems defined by users. It supports third-party transfers, bookmarking, and remote searching. The TGUP File Manager was designed to be a common, intuitive interface for both novice and expert users. As such, it is ideal for efficiently moving all types of data into and out of the TeraGrid on a high-speed, optimized network.
The Mobile File Manager is part of the TGUP Mobile web application. Like the TGUP File Manager, the Mobile File Manager allows browsing of all TeraGrid systems, file downloading, and third-party transfers. In addition, it provides several convenience features for mobile users such as one touch file publishing to the user’s public folder, simple creation of shared groups for any file/folder, and one click permission management. The Mobile File Manager is ideal for collaboration and quick, mobile access to data on the go.
TeraGrid Virtual Home Space (VHS) is a virtual file space for users. It brokers access to the TeraGrid's HPC file systems by abstracting the discovery of and communication with specific resources behind an FTP interface. Because TeraGrid VHS is implemented as an FTP service, users can mount it as a network drive on any modern operating system and access their virtual home space using the native file browser of their local operating system (e.g., Windows Explorer, OS X Finder, Konqueror). TeraGrid VHS is ideal for moving data between the user's desktop and TeraGrid systems as well as into and out of shared folders.
TeraGrid Share is a new collaborative file service that provides every TeraGrid user with at least 2GB of personal web space and gives them the ability to share data with other TeraGrid users and the world. The service is available through the TGUP File Manager, Mobile File Manager, and TeraGrid VHS by default. There are also droplets available for Mac, Windows, and Linux operating systems to enable one-touch publishing of any TeraGrid user file/folder to their public web space. TeraGrid Share is ideal for sharing your work with the world.

Beth Plale, Craig Mattocks, Keith Brewster, Eran Chinthaka, Jeff Cox, Chathura Herath, Scott Jensen, Yuan Lao, Yiming Sun, Felix Terkhorn, Ashish Bhangale, Kavitha Chandrasekar, Prashant Sabhnani, and Robert Ping. LEAD II: Hybrid Workflows in Atmospheric Science
Abstract: With the introduction of a solid-performing Windows cluster operating system, Windows HPC Server 2008, we undertook a study to execute parts of a Linux-based scientific workflow on a 16-core (dual-core, quad-processor) HPC Server machine. At Supercomputing 2009 we demonstrated workflow execution of the Weather Research and Forecasting (WRF) model, version 3.1, through the Trident Scientific Workflow Workbench, a Windows desktop workflow tool. WRF results were passed to a set of NCAR NCL scripts that were executed through Cygwin. LEAD II takes this one step further in its spring support for the NSF-funded Vortex2 field experiment to study tornadoes. Over six weeks in late Spring 2010, the LEAD II system generated 252 short-term forecasts and created 9000+ visualization products. In support of this effort we added the ability to delegate a sub-workflow to a Linux workflow orchestrator, Apache ODE, which executes the sub-workflow on TeraGrid.

Student Posters:

Jernettie Burney and Michael Austin. "A Comparison of Job Duration Utilizing High Performance Computing on a Distributed Grid"
Abstract: The Polar Grid team was tasked with testing the central manager system at Elizabeth City State University to ensure that it was prepared for grid computing. This was achieved by installing the Condor 7.4.0 client on iMac workstations located in Dixon Hall, Lane Hall, and E.V. Wilkins on the campus of Elizabeth City State University. Condor allowed jobs to be submitted to the central manager and distributed to one or more nodes. The job that the team submitted to Condor was a compiled C++ implementation of the Sieve of Eratosthenes. This code generated the prime numbers from 0 to 500,000 and was essential in testing the job submission process. The compiled code referenced in the script files was submitted to the central manager through Condor. These jobs were then distributed to available nodes for processing.
After each successful job submission, log files were created to record statistical data, namely the elapsed time it took to process each individual job. These data were imported into Minitab, a statistical analysis software package. An analysis of variance (ANOVA) was then performed to determine whether the elapsed times of the submissions varied at a 5 percent level of significance. The ANOVA provided statistical evidence that increasing the number of nodes decreased the elapsed time, demonstrating a performance increase.
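As an illustration of the benchmark workload, here is a minimal Python sketch of a Sieve of Eratosthenes over the same range (the team's actual job was a compiled C++ program submitted through Condor; this sketch only mirrors the computation):

```python
def sieve(limit):
    """Sieve of Eratosthenes: return all primes up to `limit` inclusive."""
    if limit < 2:
        return []
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            # Mark every multiple of p, starting at p*p, as composite.
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
    return [n for n, prime in enumerate(is_prime) if prime]

primes = sieve(500_000)
print(len(primes))  # number of primes up to 500,000
```

In the study, many instances of this kind of job were wrapped in Condor submit scripts and timed as they ran on the distributed nodes.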

Joshua Tepper. Interactive Visualization of Multi-Terabyte Cosmological Data Sets
Abstract: While highly scalable code exists for running cosmological simulations, much of the analysis of the results has been done in serial on considerably less powerful computers. As the size of the data sets grows, this approach quickly becomes impractical. One thing we would like to do is visualize snapshots of the simulation, which in recent runs have grown to about 3 TB per snapshot. We are currently developing tools that enable interactive visualization of such particulate, SPH, cosmological data. This is accomplished through two components: (1) a client application running on the researcher's desktop which communicates with (2) an MPI-based application running on an HPC system. Here we present a summary of this work.

Maxwell Hutchinson. Entropy of octagonal tiling representations of quasicrystals

Download the poster (PDF)
Abstract: Quasicrystals can be modeled as rhombic tilings of octagonal spaces in two dimensions. The size of the state space of tilings is related to the entropy of the quasicrystal, making the enumeration of valid tilings as relevant as it is challenging. Various computational techniques yield improvements over pre-existing algorithms, exposing previously intractable problem sizes. The entropy density converges in the limit of large tilings of constant shape, allowing the practical application of these techniques to large systems. The relationship of the entropy density to the shape is found to be approximately Gaussian with respect to the logarithm of the ratio of edge lengths.

Andrew Pfeifer. A Dynamic Agent-Based Data Distribution Framework
Abstract: In all computation-based fields, it is becoming increasingly important to be able to distribute large amounts of data among multiple locations. However, many of the processes developed to perform these tasks are overly complicated and unable to adapt to changing environments. This project concerned the creation of an efficient and adaptable method for disseminating data among many physical or software locations or entities. The method was based on interaction among semi-autonomous software agents, one of which was programmed with a centralized fuzzy logic controller. Although no explicit social hierarchy was used, the system was designed so that data distribution is well regulated by one of the dedicated agents. Overall, this framework proved highly efficient in distributing large amounts of data relative to comparable procedures, greatly reduced wasted bandwidth during data distribution, and enhanced the overall environmental autonomy of the distribution system.

Lu Liu. Analyzing Projected Changes and Trends of Temperature and Precipitation in SCIPP Area from Sixteen Downscaled Global Climate Models under Different Emission Scenarios

Download the poster (PDF)
Abstract: This study examines how future climate, specifically temperature and precipitation, will change under the A2, A1B and B1 scenarios over the six states designated by SCIPP. Statistically downscaled data from 16 GCMs were applied for SCIPP climate change prediction. Temperature is projected to increase, varying from 1.95 K to 2.71 K under the different scenarios. The temperature rise is spatially uniform, suggesting an overall increase across the SCIPP area. Precipitation is also projected to increase in the 21st century, with the second half of the century (2050-2099) expected to receive more precipitation than the first half (2000-2049). The coastal region and the state of Tennessee are likely to experience a large increase in rainfall in the second half of the century.

Libin Sun, Cyrus Stoller and Tia Newhall. A Hybrid MPI and GPU Approach to Efficiently Solving Large-sized kNN Problems
Abstract: We present a novel technique for solving large-sized k-Nearest Neighbor (kNN) problems. Our solution scales to problems involving large, multi-dimensional data sets and large numbers of queries by using a hybrid MPI and GPU approach to parallelization. Tests of our solution running on NCSA's Lincoln cluster on the TeraGrid show speedups of up to 90 times over a pure GPU approach to parallelizing kNN. We can solve kNN problems on 76 gigabytes of data in 17 minutes, and on 20 gigabytes in just a few minutes. Our scalability studies indicate that our solution should continue to scale to larger clusters and larger data sets.
k-Nearest Neighbors is an instance-based classification algorithm for pattern recognition problems and is often used as a benchmark in machine learning. kNN is commonly used in computer vision, information retrieval, natural language processing, and speech recognition. The algorithm takes a training set of classified data and a test set of data objects to classify, and finds each object's classification based on its k nearest neighbors in the training set. The nearest neighbors are found using some distance metric (e.g., Euclidean distance or KL divergence).
A common problem with instance-based machine learning algorithms is that they are very computationally and memory intensive. Improving the accuracy of the results requires larger training sets of finer granularity and higher-dimensional feature vectors. However, these result in an explosion in the amount of data and the amount of computation necessary to classify objects. Techniques using data structures such as kd-trees and quad-trees can improve computation time, but the scalability of these techniques is limited by their sequential nature. Recent work using CUDA on Graphics Processing Units (GPUs) allows for a parallel kNN solution that is scalable up to the limits of GPU memory size. There has also been work using MPI to parallelize kNN computation, but we are aware of no work that combines both CUDA and MPI.
We use two layers of parallelization in our solution. Training data are distributed across a cluster of M nodes using MPI. Each node locally uses CUDA to compute the set of k nearest neighbors for each query on its portion of the training data. The top k results for each query on each node are then combined using a reduction merge sort across the MPI nodes to obtain the overall k nearest neighbors for each query.
Our approach is particularly powerful because it can handle much larger data sets than a single computer running leading CUDA implementations of kNN. Moreover, our implementation is faster than leading MPI implementations because each node in our cluster uses the parallelism of the GPU to significantly outperform the sequential local computation of other approaches. As a result, our algorithm scales to larger data sets and can run much more efficiently than previous solutions.
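The two-layer scheme described above, local top-k per node followed by a cluster-wide merge, can be sketched serially in Python (the real system uses CUDA for the per-node step and MPI for the reduction; the chunk layout and names here are illustrative only):

```python
import heapq

def local_topk(train_chunk, query, k):
    """Per-node step (done on the GPU in the real system): squared
    Euclidean distance from `query` to each training point, keeping
    the k smallest (distance, label) pairs, sorted by distance."""
    dists = [(sum((a - b) ** 2 for a, b in zip(point, query)), label)
             for point, label in train_chunk]
    return heapq.nsmallest(k, dists)

def hybrid_knn(chunks, query, k):
    """Cluster-level step (an MPI reduction in the real system):
    merge every node's sorted local top-k into the global top-k."""
    partials = [local_topk(chunk, query, k) for chunk in chunks]
    best = heapq.nsmallest(k, heapq.merge(*partials))
    return [label for _, label in best]

# Two "nodes", each holding part of the training data.
chunks = [[((0.0, 0.0), "a"), ((5.0, 5.0), "b")],
          [((0.5, 0.5), "a"), ((4.0, 4.0), "b")]]
print(hybrid_knn(chunks, (0.2, 0.2), 3))  # ['a', 'a', 'b']
```

Because each partial list is already sorted, the merge step only examines M*k candidates per query rather than the full training set, which is what makes the reduction cheap relative to the per-node distance computations.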

Abdul Muqtadir Mohammed and Shawn Duan. Parallel Simulation and Virtual Prototyping of Multibody Dynamical Systems on TeraGrid Supercomputers

Download the poster (PDF)
Abstract: In this research, a new hybrid algorithm is introduced for computer simulation and virtual prototyping of the motion behavior of multibody dynamical systems on TeraGrid supercomputers. It is easy to carry out simulations for simple structures at small scale, but for large and complex multibody structures, computational efficiency is a challenge. The new algorithm, called the Hybrid Direct-Iterative Algorithm (HDIA), is one solution to this challenge. The algorithm is based on cutting interbody joints so that a system of largely independent multibody subchains is formed. Increased parallelism is obtained by cutting joints, assigning each subchain to one processor, and explicitly determining the associated constraint forces, combined with a sequential O(N) procedure for the formation and solution of the equations of motion and constraint equations (N is the total number of bodies in a multibody system). The algorithm has been coded in C and MPI, and implemented on TeraGrid supercomputing machines such as Bigben and Big Red. Simulation case studies and performance comparisons are presented to demonstrate the impact of the TeraGrid on the performance of the algorithm. TeraGrid provides the computing facilities and resources that make implementation of the HDIA for virtual prototyping of large-scale multibody dynamic systems possible, and it offers higher computing efficiency than the IBM cluster, IBM SP2, and SGI Onyx available at the authors' institution. The graduate students are exposed to state-of-the-art TeraGrid computing systems, which truly enhances their research skills and abilities. This research is supported by the TeraGrid Pathways Fellowship program, whose financial support the authors gratefully acknowledge. The authors also sincerely thank the TeraGrid scientists and experts for all consulting services and system support.

Laurentiu Marinovici. Stability of Networked Feedback Control Systems under Quantization

Download the poster (PDF)
Abstract: From modeling the behavior of physical systems, to simulating them, and then analyzing the results through tables, charts or graphs, computational science has always been an indispensable field for researchers and engineers in the control systems area. Nowadays, due to advances in wireless communication, Networked Control Systems (NCSs) are becoming more and more interesting and efficient. It is important to note that NCS research is at the crossroads of three research areas: control systems, communication networks and information theory, and computer science. Applications like mobile telephony, sensor networks, micro-electromechanical systems, and industrial control networks must simultaneously control a series of dynamical systems using multiple actuators and sensors. For such distributed systems, mathematical models turn out to be of high order. Implementing and analyzing the behavior of such complex systems requires a large amount of calculation. Hence, all the advantages offered by computational science can be employed to study NCSs.
This poster aims to show how some of the key issues in NCSs can be modeled, simulated, and analyzed using one of the most common computational science software packages, MATLAB. NCSs are highly influenced by signal sampling and delay, which need to be handled adequately to avoid degradation in performance and stability. Also, due to band-limited communication channels, control signals are quantized to ensure fluent traffic through the communication network. Sometimes, the low resolution of the transmitted data impairs the stability of the feedback system, and can even lead to limit cycles and chaotic behavior. Therefore, finding the coarsest quantization levels that reduce network congestion while still providing sufficient information for reliable control is of great importance. In an attempt to improve results, quantization of both measurement and control signals is modeled as structured uncertainty, and the classical output feedback law is augmented to include the quantized control signal as well. Supplying more information about the quantization error to the controller thus empirically shows that much coarser quantizers can be used. To accomplish this, robust control approaches, such as H-infinity and mu-synthesis, are applied, and MathWorks MATLAB is employed to implement the algorithms and visualize the results.
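As a toy illustration of the quantization trade-off discussed above (a generic uniform quantizer sketched in Python, not the authors' MATLAB model; the sample values are hypothetical):

```python
def uniform_quantize(signal, step):
    """Mid-tread uniform quantizer: snap each sample to the nearest
    multiple of `step`. A coarser step needs fewer bits per sample,
    but injects larger quantization error into the feedback loop."""
    return [step * round(s / step) for s in signal]

u = [0.93, -0.41, 0.07]            # hypothetical control samples
coarse = uniform_quantize(u, 0.5)  # [1.0, -0.5, 0.0]
fine = uniform_quantize(u, 0.05)   # per-sample error at most 0.025
```

Finding the coarsest usable `step` while keeping the closed loop stable is the essence of the trade-off the poster studies.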

Parastou Sadatmousavi and Ross Walker. Calculating Activation Pathways of Adenovirus Protease Enzymes using the AMBER Molecular Dynamics Package on TeraGrid Resources

Download the poster (PPT)
Abstract: A detailed understanding of the reaction and activation pathways of enzymes is critical to the development of next-generation pharmaceuticals. The ability to target specific locations on the activation pathway of an enzyme provides promising targets for novel drugs. The development of low-energy pathway sampling methods, such as the nudged elastic band algorithm, has provided a mechanism by which constrained molecular dynamics simulations can be used to determine and then study enzyme activation pathways at an atomistic level. This work focuses on developing these methods, which require substantial computational resources and tightly coupled parallel HPC machines, and applying them to the determination of the activation pathway of the adenovirus protease enzyme. The adenovirus protease (AVP) is essential for adenovirus replication and thus is a target for antiviral drugs aimed at treating infections such as avian and swine flu. The enzyme is activated upon the binding of a small peptide via a 53-amino-acid signal transduction pathway. Recently obtained crystal structures of both the inactive and active forms of AVP provide the two end points of this pathway. This poster will highlight continuing efforts to determine this pathway computationally and then to identify potential drug binding locations along it. The high computational complexity of these calculations has made the use of the TeraGrid vital.

Rijan Karkee. Waste Water Utilization in Transportation
Abstract: This poster presents a method for utilizing waste water or other waste liquids for transportation. It is an environmentally friendly process.
Utilizing geothermal energy from a river would be best for this process, but it is more costly.

Stephan Krach, David Toth and Michael Bradley. A Computational Approach to Ramsey Theory

Download the poster (PPT)
Abstract: The Party Problem is a problem in Ramsey Theory where the goal is to find the number of people that must be in a room so that there must be n people who all know one another or n people who do not know any of those other n-1 people [1]. This situation can be modeled as a complete graph with vertices representing people. The edges connecting people who know each other are colored blue while edges connecting people who do not know each other are colored red. Using this model, if the graph contains a subgraph that is a complete graph on n vertices (Kn) such that all the edges in this subgraph are red or all the edges in this subgraph are blue, then the situation where n people all know each other or n people do not know any of the other n-1 people exists. The notation for the party problem where we want to find the number of guests that must be present for 5 to all know each other or 5 to not know any of the other 4 is R(5, 5). R(5, 5) is an open problem, although it is bounded between 43 and 49 [2, 1].
To make progress in solving the R(5, 5) instance of the Party Problem, one must demonstrate that every graph with x vertices has either a blue K5 or a red K5, where 43 < x < 49. Finding an x where the aforementioned conditions hold or do not hold will tighten the bound on R(5, 5). Finding such an x where the conditions hold and an x-1 where the conditions do not hold will solve the R(5, 5) problem. A Kn contains n(n-1)/2 edges. Thus, K43 contains 903 edges. Since every edge in this problem is colored blue or red, ignoring symmetry, there are 2^903 different graphs to test in order to demonstrate that every possible coloring of the edges of a K43 contains either a blue K5 subgraph or a red K5 subgraph. By using mathematical techniques to eliminate the need to test many graphs, we hope to bring the number of graphs that need to be tested down to a number that a supercomputer could test, and to use the available TeraGrid resources to make progress on this problem.
[1] Ramsey Number – from Wolfram MathWorld. /RamseyNumber.html. Accessed 5/11/10.
[2] S. P. Radziszowski. (Originally published July 3, 1994. Last updated August 4, 2009). Small Ramsey Numbers. The Electronic Journal of Combinatorics. DS1.10. [Online]. Available: Accessed 5/11/10.
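The counting argument in the abstract is easy to verify with a short Python sketch (illustrative only; the function names are ours):

```python
from math import comb

def edge_count(n):
    """A complete graph K_n has C(n, 2) = n*(n-1)/2 edges."""
    return comb(n, 2)

def coloring_count(n):
    """Each edge is blue or red, so ignoring symmetry there are
    2**edge_count(n) colorings of K_n to check."""
    return 2 ** edge_count(n)

print(edge_count(43))                # 903, as stated in the abstract
print(len(str(coloring_count(43))))  # 2**903 is a 272-digit number
```

The 272-digit size of the naive search space is why symmetry arguments and other mathematical pruning are essential before any supercomputer search becomes feasible.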

Wen Duan and Shanzhong Duan. Parallel Implementation and Code Development of Efficient Multibody Algorithm for Motion Simulation of Molecular Structures on TeraGrid Supercomputers
Abstract: There are two major computational costs associated with computer simulation of atomistic molecular dynamics: calculation of the interaction forces, and formation and solution of the equations of motion. In this research, an efficient parallelizable algorithm is introduced and implemented for the formation/solution of the equations of motion and the calculation of interaction forces associated with multibody molecular systems on TeraGrid supercomputers. The algorithm is based on placing hard constraints on bonds with high-frequency vibration modes in a molecular structure, as shown in Fig. 1 (a), so that a multibody model of the molecular system can be formed, as shown in Fig. 1 (b). Motion simulation methods and joint-cut techniques for multibody dynamical systems at the macro level can then be applied directly to micro molecular structures, as shown in Fig. 1 (b). After joint cutting, constraint forces appear between adjacent subsets, as shown in Fig. 1 (c), which makes the reduced molecular system behave like the original system. Each subset in Fig. 1 (c) can then be assigned to a processor for concurrent simulation, as shown in Fig. 1 (d). Thus the sequential O(N) method is performed within each processor and parallel computing techniques are applied between processors to obtain high computing efficiency (N: total number of subsets).
Figure 1: Development of Model & Parallel Computing Model
Based on the algorithm, Matlab codes have been developed and implemented sequentially for small systems on a single processor. Building on these Matlab codes, the authors will develop C and MPI codes and implement them on TeraGrid supercomputers. The C and MPI codes contain four modules: basic operations, kinematical calculation, force and kinetic term calculation, and the O(N) method. Message passing techniques such as buffered message passing will be integrated with each module for parallel implementation of the codes on TeraGrid. This research is supported by the TeraGrid Pathways Fellowship program, whose financial support the authors gratefully acknowledge. The authors also sincerely thank the TeraGrid scientists and experts for all consulting services and system support.

Hooman Hemmati, Duber Gomez Fonseca and Sarah Jennisca. Tool for Visual Comparison of Massive Data Sets Through Aggregation Techniques
Abstract: This project is focused on developing a tool to aid in quickly analyzing massive time series and other data sets. TeraGrid HPC resources are capable of generating massive data sets in short periods of time, and a visualization tool can help compare and contrast different data sets. Due to the nature of massive data sets, aggregation methods can highlight specific trends and allow for effective analysis. The tool developed in this research provides a synchronization-friendly, multi-windowed interface for interactively aggregating and visualizing data sets.
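One simple aggregation of the kind such a tool might apply before plotting, a fixed-width bucket mean, can be sketched in Python (the abstract does not specify the tool's actual aggregation methods; this is an assumed example):

```python
def bucket_mean(series, width):
    """Downsample a time series by averaging consecutive fixed-width
    buckets, shrinking the data roughly `width`-fold for plotting."""
    return [sum(series[i:i + width]) / len(series[i:i + width])
            for i in range(0, len(series), width)]

print(bucket_mean([1, 2, 3, 4, 5, 6], 2))  # [1.5, 3.5, 5.5]
```

Aggregations like this let a desktop viewer render a trend line for a data set far larger than what could be drawn point by point.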

Robert Dunn Jr. and Hongmei Chi. Practical Investigations for Task-Parallel Programming
Abstract: As the capacity for digital information increases, methods to search, retrieve, manipulate and simulate data need to be timely. For instance, analysts, chemists and scientists use clusters and multi-core processors for computation- or calculation-intensive computing, including climate research, weather forecasting and vehicular accident modeling. The concept behind computer clustering is to harness the efforts of numerous smaller, less expensive processors to meet or exceed the capabilities of a larger, more expensive processor. With the advent of the multi-core processor, application programmers are under pressure to become extremely knowledgeable about how to parallelize their existing applications. This has brought about many innovations in the high performance computing field, one of which is the design of task-based parallelism. Task-based parallelism abstracts away low-level multi-core platform details and threading mechanisms to enhance parallel scalability and performance. This poster explores task-based parallelism using the Intel Threading Building Blocks implementation. We investigate, via a set of benchmarks such as video games and other applications, whether this approach truly yields linear scalability as the number of processor cores increases.

Steven Baker, B. Ramu and P. Derosa. Molecular modeling scheme to efficiently determine the selectivity of various calix-crown molecules with Cs, K, and Na ions.
Abstract: Nuclear energy is an attractive alternative fuel; however, the disposal of nuclear waste is an issue. A large quantity of inert waste, including radioisotopes with short half-lives, is unnecessarily stored with the waste. One process proposed to reduce the amount of waste being stored is nano-filtration; this process requires complexation of a large molecule, a calix-[n]-crown, with the ion that needs to be separated. While nuclear waste contains many radioactive ions, only 137Cs has a half-life long enough to require a long-term storage option. Experimental studies have already shown that certain calix-[n]-crown species bind effectively with 137Cs. Further studies could lead to more effective species or applications, such as sensing. The calix-[n]-crown species have been optimized using the hybrid DFT method B3PW91 with the LANL2DZ basis set. The solvation and binding energies were calculated with the same method in water and CDCl3. From these energies, a selectivity coefficient was determined for each calix-[n]-crown species. The molecular modeling scheme described here not only showed selectivity trends similar to known experimental data, but also allowed prediction of the selectivity of untested calix-[n]-crown species. The low cost of producing additional data also makes our method desirable.

Wenjing Jia, Don Wuebbles and Xin-Zhong Liang. Using a Regional Climate Model to Simulate Heavy Precipitation Events in Central Illinois
Abstract: Mesoscale regional climate models (RCMs) are recognized as an increasingly important tool for addressing scientific issues associated with climate and climate change at local-to-regional scales. With the development of high-speed supercomputers, the next-generation Weather Research and Forecasting (WRF) model has been built. The state-of-the-art regional climate model we apply here is a climate extension of WRF (CWRF) that implements numerous crucial improvements, including surface-atmosphere interaction, convection-cloud-radiation interaction, and system consistency throughout all process modules, to enhance its capability for climate applications. Precipitation is the single most difficult, and often most erroneously modeled, parameter in numerical weather/climate models. Here we choose precipitation as the key variable to examine our regional climate model's ability to capture previous extreme events ("hindcast") in long-term regional climate simulations. We compare our model outputs with the best available observed rainfall data from previous years for central Illinois. The results show significant improvements in the simulation of heavy precipitation events using our regional climate model. Further studies that "futurecast" long-term climate and climate change are needed. As supercomputing technology improves, we can run more simulation experiments and better understand climate and climate change.

Swathi Laxmi Dubbaka, Ram Sri Harsha Bhagawaty, Lei Jiang, Kelin Hu, Sreekanth Pothanis, Nathan Brener, Erik Schnetter, Gabrielle Allen, S. Sitharama Iyengar and Tevfik Kosar. Automated System to Construct a Simulated Hurricane Database
Abstract: 1. Introduction
ADCIRC hurricane simulations are very large numerical simulations run on different machines. To create a simulation, the input files have to be modified depending on the type of hurricane and the machine it is running on. These modifications are usually done manually. This is a tedious low-level task that can also result in errors. To avoid these complications we have developed an infrastructure that can automate all of this, directly create a simulation, and archive the output files in the Petashare data storage system.
2. ADCIRC
ADCIRC (ADvanced CIRCulation model) is a system of computer programs for solving time dependent, free surface circulation and transport problems that run on flexible and irregularly spaced grids. It is a multi-dimensional, finite-element-based hydrodynamic circulation code [1]. It implements the Generalized Wave-Continuity Equation (GWCE) formulation [2]. It is currently used in the Coastal Emergency Risk Assessment (CERA) group at Louisiana State University (LSU).
3. Tool for automating simulation runs and archiving
We have developed an infrastructure that automates the creation, submission and archiving of ADCIRC hurricane simulations. For a new hurricane simulation, our infrastructure copies all the input files to a new directory, makes the necessary modifications to them and submits the simulation to the machine. After the simulation run is complete, it archives its output files including metadata into the Petashare data storage system.
4. Conclusions and Future Work
We have tested our infrastructure on LONI and two TeraGrid machines (Queen Bee and Ranger). We are using this infrastructure to run ADCIRC hurricane simulations. Our infrastructure is well equipped to run on different machines with different architectures. We plan to enhance it to create and submit simulations to different machines at the same time.
[1] Veri-tech, inc.: adcirc/sms.php#content.
[2] Adcirc:

Zhenhua Guo and Marlon Pierce. Lightweight OGCE Gadget Portal for Science Gateways

Download the poster (PPT)
Abstract: Science gateways are frontends through which researchers interact with backend computation and storage infrastructure. The purpose of gateways is to simplify end users' access to these resources, hiding the complexities of the environment behind Web-based graphical user interfaces. Furthermore, it is desirable to extend general-purpose Grid and Cloud middleware to meet the needs of specific scientific communities. Many traditional science gateways (e.g. the TeraGrid portal and the LEAD gateway) are component-based and built on top of the Java Portlet standard, which supports server-side integration of different applications. However, portlets require Java web programming expertise and place most integration control in the hands of the portal administrator rather than the end user. Portlets are also an aging standard; they lack features (e.g. social networking) that modern science gateways need.
We revisited the traditional mechanism and investigated new models and standards. Two important, complementary candidates are the Google Gadget component model and the REST service style. First, they make science gateways much easier to develop and use. Users can develop their own gadgets and deploy them to any Gadget-compliant container, so gadgets can be easily reused across containers such as iGoogle, Orkut and Hi5. Thousands of gadgets exist in the gadget directory and can be added to our portal easily. Addition, removal and configuration of gadgets are completely under the control of users rather than the portal administrator, which brings great flexibility. Apache Shindig is the open source reference implementation of the related Gadget and OpenSocial standards. We have developed a gadget container using this approach and are applying it to the development of science gateways.
Our OGCE gadget portal manages applications deployed by users and renders them in browsers. Different layouts and views are provided to help researchers work more efficiently. Inter-gadget and gadget-container communication makes it possible for related gadgets to work collaboratively instead of separately. Themes can be customized to satisfy different gateway tastes. In addition to the user interface, the gadget container has a REST-based service interface: users can use a command-line tool such as curl to manipulate their layout data and add or delete an application.
We also investigated and compared security mechanisms including OpenID, OAuth, Shibboleth, SAML and GSI. The OGCE gadget container supports OpenID, making cross-domain authentication less cumbersome. Interaction with a MyProxy server enables our gadget portal to authenticate against backend Grid systems. We have further integrated widely used third-party services as gadgets, including Google Calendar, Facebook and Twitter. We use the Google Friend Connect API to build social applications. The OGCE Gadget Container is open source, is packaged with Apache Maven, and can be downloaded and easily installed.

Eric Seidel, Gabrielle Allen, Steven Brandt, Frank Löffler and Erik Schnetter. Simplifying Complex Software Assembly: The Component Retrieval Language and Implementation

Download the poster (PDF)
Abstract: This poster describes the Component Retrieval Language, and the associated GetComponents tool. The language and implementation are designed to abstract and automate the complex process of retrieving and assembling distributed software frameworks. The language uses simple, clear directives to establish a list of components to be retrieved, which is then processed by the GetComponents tool with minimal user input.

Michael Thomas, Erik Schnetter and Gabrielle Allen. Simulation Factory: Simplified Simulation Management

Download the poster (PDF)
Abstract: This poster describes the motivations for and implementation of Simulation Factory. Through the use of an abstraction called the Machine Database, which allows Simulation Factory to be resource agnostic, Simulation Factory addresses three main goals: (i) managing access to remote systems, (ii) deploying and building source code, and (iii) submitting and managing simulations using the built source code.

Eric Shook, Shaowen Wang and Wenwu Tang. Toward A Parallel Spatially-Explicit Agent-Based Modeling Framework
Abstract: Spatially-explicit Agent-based Models (SE-ABM) often play an important role in the study of dynamic geospatial phenomena in fields ranging from biology and ecology to the social sciences and geography, where software entities (i.e. agents) representing acting individuals in a digital landscape are developed to re-create dynamic phenomena in a simulated environment. Although the computational capabilities of desktop computers continue to increase exponentially, they cannot always meet the increasing demands of large-scale models. Therefore, modelers have begun to leverage parallel and high-performance computing. We aim to develop a parallel SE-ABM framework to support the simulation of billions of agents and environment cells using thousands of processing cores in parallel.
The proposed SE-ABM framework integrates three communication-oriented methods to facilitate SE-ABM development in a parallel and high-performance computing system. The first method, referred to as groups, logically organizes agents and interaction messages to allow the framework to operate at higher-levels of abstraction while simultaneously supporting functionality such as explicit message packing. The second method is a communication-aware diffusive load-balancing strategy for rectilinear domain decomposition, which aims to balance both computation and communication workloads to increase the efficient use of parallel resources. The third method relies on entity (e.g. agent) proxies which reside in ghost zone regions and act as proxies to remote information. Entity proxies enable the framework to automatically identify and manage inter-processor communication produced by the interaction of entities that reside on separate processing cores.
The framework was developed in C and MPI and has been deployed on two TeraGrid resources, Abe and Ranger. Through a series of experiments we demonstrate that the implemented framework meets our goals of simulating billions of agents while effectively using thousands of processing cores. For example, we implemented a popular agent-based model called Sugarscape with straightforward interaction rules. Using this model we successfully simulated 2^34, or 17,179,869,184, agents with an environment of 131,072 x 131,072 cells on 4096 processing cores using 1.20 petabytes of memory. Based on the prototype framework we discuss potential challenges that will guide future research directions, including improved data handling and investigating asynchronous communication.

Michelle Chen, Amit Chourasia, Emily Roxworthy and Leoneil Lopez. Recreating Drama in the Delta Using 3D Modelling and Animation
Abstract: This study combines technology with history, arts and culture to explore and communicate the events at two World War II internment campsites. We seek to recreate the segregated society and cultural activities at two Japanese American internment camps during World War II in an interactive gaming environment. During the initial phase of this study we have developed 3D models of the Jerome and Rohwer camp sites in southeast Arkansas using existing documentation and dramaturgical documentation and design. We have also developed character models that will be used in the game. This interdisciplinary research involves students from Computer Science, Communication, and Theatre and Dance, engaging participants ranging from high school students to doctoral candidates. The crosscutting nature of this project is enabling the students and mentors to learn and enrich their existing skill sets. This study serves as an example of bridging the technology gap between the Humanities disciplines and Computing.

Oleg Korobkin, Eloisa Bentivegna and Erik Schnetter. Application-Level Debugging and Runtime Visualization Tools for Parallel Simulations within the Cactus/Carpet Framework
Abstract: Development of parallel applications on massively parallel machines requires new approaches to the problem of identifying algorithmic errors. In particular, efficient development of such applications needs debugging tools that allow developers to concentrate on the principal algorithmic steps rather than low-level details. Such tools should be able to directly access the application's data structures and functionality.
In this poster, we present the Alpaca toolkit, designed for application-level profiling, debugging and runtime remote visualization. Alpaca was developed for the Cactus/Carpet computational framework, which is used for parallel computing in several research areas, such as numerical relativity, astrophysics, computational fluid dynamics, petroleum engineering and coastal modeling.
In the Alpaca toolkit, remote debugging tools and runtime visualization tools are implemented in the form of various interfaces (web, CLI, VisIt, VTK). The toolkit is integrated into the application, which allows it to seamlessly access data structures and functionality of the application. The toolkit provides control over the execution flow, so that the simulation can be paused, checkpointed or single-stepped. In the latter case the Alpaca debugging interface provides coarse-grained control of the execution at the level of individual algorithmic steps, corresponding to scheduled function calls. The debugging toolkit also provides the capability to "steer" the simulation through the use of global parameters at runtime, and visualize specific grid functions.
The poster presents a diagram of the control and data flow between the local user machine and the remote application, along with screenshots of a few sample sessions. We also include an on-site interactive demonstration using an example of a large-scale simulation running in parallel on one of the TeraGrid machines.

Ashley Zebrowski, Gabrielle Allen and Peter Diener. Intelligent Application-Level Task Migration
Abstract: Computational infrastructure for research across scientific disciplines is growing and diversifying. National resources such as the NSF TeraGrid are providing unprecedented capacity in terms of computing power, network throughput, and data storage. At the same time, there is an emergence of specialized systems with heterogeneous computing resources, data-intensive systems, and advanced services.
This research is enabling intelligent distributed resource utilization for advanced scientific applications within the framework of emerging heterogeneous computing resources. Our aim is to demonstrate that application frameworks, such as Cactus, can be extended to automatically identify and move appropriate subtasks to alternative resources, e.g. to improve efficiency, reduce power consumption or reduce cost. We have developed modules for Cactus which control the execution of the framework's internal task scheduling and allow asynchronous analytical tasks to be recognized, packaged and executed on remote hardware.
The task migration infrastructure has been developed and tested with use cases provided from the Einstein Toolkit Consortium, whose members use Cactus to develop models for astrophysical objects. Initial results indicate an improvement in simulation execution times when running asynchronous analysis tasks on remote machines provided that the data to be analyzed does not require a large amount of time to package. The overall success of breaking tasks down and executing them remotely lends itself to additional research and investigation, particularly in terms of algorithmic development for real time decision making.

Peter Shannon, Ryan Houlihan and Axel Kohlmeyer. An OpenMP Overdrive for LAMMPS - Improving MD Performance on the TeraGrid Through Hybrid OpenMP/MPI Parallelization

Download the poster (PDF)
Abstract: Parallel efficiency of the LAMMPS Molecular Dynamics Package suffers when run on multi-core nodes due to communication bottlenecks resulting from MPI-only parallelization. With a hybrid OpenMP/MPI parallelization scheme we can offset communication penalties by replacing multiple intra-node MPI tasks with OpenMP threads.
We have implemented this approach and found over 2x speedup and improved scaling.
In the poster we outline our threading strategy, which improves parallel efficiency, and present benchmark results for a selection of typical problems on several TeraGrid sites, documenting the substantial overall performance increase.

Sairam Tangirala and David P. Landau. Role of Diffusion in Scaling of Polymer Chain Aggregates Found in Chemical Vapor Deposition Growth Model

Download the poster (PDF)
Abstract: Linear polymer chains grown by 1+1D Monte Carlo simulations of vapor deposition polymerization (VDP) are studied. The behavior of the polymer chain length distribution function n(s,t) as a function of chain length s and deposition time t has been studied for relevant model parameters. The dynamic scaling approach is found to be sensitive to the ratio G = D/F of the free monomer diffusion rate (D) to the deposition rate (F). A systematic approach is presented to isolate the dependence of the scaling behavior of n(s,t) on t, s, and G. We report a power-law increase of n(s,t) for all s with increasing t, with the exponent invariant under changes in G. For increasing t and small s, the n(s,t) distribution was found to decrease with s according to a power law, with the exponent again independent of G. Scaling functions explain the data collapse of n(s,t) with t and s. However, a strong influence of G is evident in the rescaled plots of n(s,t), which prevents the data collapse of rescaled n(s,t) for varying G. The dependence of the scaling function on G is shown to be characteristic of VDP, and our findings establish the need to include G in the scaling theory of polymer chains formed during the VDP process.

Contact Posters Chair Honggao Liu (LSU), or Student Competition Chair Laura McGinnis (PSC).