ECSS staff share technical solutions to scientific computing challenges monthly in this open forum.
The ECSS Symposium allows the more than 70 ECSS staff members to exchange information each month about successful techniques used to address challenging science problems. Tutorials on new technologies may be featured. Two 30-minute, technically focused talks are presented each month, each including a brief question-and-answer period. This series is open to everyone.
Day and Time: Third Tuesdays @ 1 pm Eastern / 12 pm Central / 10 am Pacific
Add this event to your calendar.
Webinar (PC, Mac, Linux, iOS, Android): Launch Zoom webinar
iPhone one-tap (US Toll): +16468769923,,114343187# (or) +16699006833,,114343187#
Telephone (US Toll): Dial (for higher quality, dial a number based on your current location):
US: +1 646 876 9923 (or) +1 669 900 6833 (or) +1 408 638 0968
Meeting ID: 114 343 187
Upcoming events are also posted to the Training category of XSEDE News.
Due to the large number of attendees, only the presenters and host broadcast audio. Attendees may submit chat questions to the presenters through a moderator.
April 10, 2012
SCALing the knowledge base for the Never-Ending Language Learner (NELL): A step toward large-scale computing for automated learning
Presenter: Joel Welling (PSC)
The Never-Ending Language Learner is a system of Java programs that continuously learns new beliefs from the World Wide Web; see the Twitter feed for 'cmunell' to follow its discoveries. This is a project of Tom Mitchell and William Cohen at the CMU CS department. NELL learns by repeatedly cycling through a corpus of language statistics using information from its knowledge base. The sizes of the corpus and of the knowledge base make each cycle a time-consuming process. The ultimate goal of our interaction with this group is to bring this kind of work to high performance computing, allowing larger datasets and speeding the cycle time by a factor of hundreds.
Part of this project is just bringing up the necessary Java environment on Blacklight; I will discuss lessons learned in that process. The current ECSS project involves migrating the NELL ontology software from Tokyo Cabinet, a simple record-based database, to a graph database such as Neo4j. This should provide a more natural representation for the ontology and will move the software from an obsolete base to a well-supported one. I will discuss the structure of the ontology and the representations of that structure under the new and old databases.
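The difference between the two representations can be illustrated with a minimal Python sketch. It does not use Tokyo Cabinet, Neo4j, or NELL's actual data: the concept names, the "generalizations" relation, the key layout, and the Graph class are all hypothetical, chosen only to show how a hierarchy looks as flat records versus as nodes and typed edges.

```python
from collections import defaultdict

# Hypothetical mini-ontology; names and key layout are illustrative, not NELL data.
# Record-style storage: one flat string key per (concept, relation) pair.
records = {
    "concept:pittsburgh|generalizations": "concept:city",
    "concept:city|generalizations": "concept:location",
    "concept:location|generalizations": "concept:everything",
}


def ancestors_from_records(store, concept):
    """Walk 'generalizations' links by building and looking up string keys."""
    chain = []
    key = f"{concept}|generalizations"
    while key in store:
        concept = store[key]
        chain.append(concept)
        key = f"{concept}|generalizations"
    return chain


class Graph:
    """Graph-style storage: nodes joined by typed, directed edges."""

    def __init__(self):
        self.edges = defaultdict(list)  # node -> list of (relation, target)

    def add_edge(self, source, relation, target):
        self.edges[source].append((relation, target))

    def follow(self, node, relation):
        """Return every node reachable from `node` through `relation` edges."""
        found = []
        frontier = [node]
        while frontier:
            current = frontier.pop()
            for rel, target in self.edges[current]:
                if rel == relation and target not in found:
                    found.append(target)
                    frontier.append(target)
        return found


# Load the same facts into the graph form.
graph = Graph()
for key, target in records.items():
    source, relation = key.split("|")
    graph.add_edge(source, relation, target)
```

In the record form the traversal logic lives in string handling on keys; in the graph form the relation is a first-class edge, which is the sense in which a graph database may be a more natural fit for an ontology.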
Joel Welling can be reached via this website
Use of Global Federated File System (GFFS)
Presenter: Andrew Grimshaw (University of Virginia)
The GFFS was born out of a need to access and manipulate remote resources such as file systems in a federated, secure, standardized, scalable, and transparent manner without requiring either data owners or applications developers and users to change how they store and access data in any way.
The GFFS accomplishes this by employing a global path-based namespace, e.g., /data/bio/file1. Data in existing file systems, whether they are Windows, MacOS, AFS, Linux, or Lustre file systems, can then be exported, or linked, into the global namespace. For example, a user could export a local rooted directory structure on their "C" drive, C:\work\collaboration-with-Bob, into the global namespace at /data/bio/project-Bob. Files and directories on the user's "C" drive in \work\collaboration-with-Bob would then, subject to access control, be accessible to users in the GFFS via the /data/bio/project-Bob path. Transparent access to data (and resources more generally) is realized by using OS-specific file system drivers (e.g., FUSE) that understand the underlying standard security, directory, and file access protocols employed by the GFFS. These file system drivers map the GFFS global namespace onto a local file system mount. Data and other resources in the GFFS can then be accessed exactly the same way local files and directories are accessed; applications cannot tell the difference.
Three examples illustrate typical GFFS use cases: accessing data at an NSF center from a home or campus, accessing data on a campus machine from an NSF center, and directly sharing data with a collaborator at another institution. In all three cases, client access to data will be via the GFFS-aware FUSE driver.
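The export-and-resolve idea described in this abstract can be sketched in a few lines of Python. This is a toy model only: the Namespace class and its methods are hypothetical, it does no security or network access, and it is not the GFFS implementation or its API. It simply shows a global path prefix being mapped back to the exported local directory.

```python
from pathlib import PurePosixPath, PureWindowsPath


class Namespace:
    """Toy global namespace: maps global path prefixes to exported local roots."""

    def __init__(self):
        self.exports = {}  # global prefix -> exported local directory

    def export(self, local_root, global_prefix):
        """Link a local directory into the global namespace at global_prefix."""
        self.exports[PurePosixPath(global_prefix)] = PureWindowsPath(local_root)

    def resolve(self, global_path):
        """Translate a global path into the local path that backs it."""
        path = PurePosixPath(global_path)
        for prefix, local_root in self.exports.items():
            try:
                relative = path.relative_to(prefix)
            except ValueError:
                continue  # this export does not cover the requested path
            return local_root.joinpath(*relative.parts)
        raise KeyError(f"no export covers {global_path}")


ns = Namespace()
ns.export(r"C:\work\collaboration-with-Bob", "/data/bio/project-Bob")
local = ns.resolve("/data/bio/project-Bob/results/run1.dat")
```

Here the file name results/run1.dat is made up for illustration. A real driver would perform this translation behind a file system mount so that applications only ever see the global path.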
March 13, 2012
GPU-based Radiation Treatment Planning Applications
Presenter: Dong Ju (DJ) Choi
Graphics processing units (GPUs) have been actively adopted in the development of various applications in the radiation treatment planning process. Recent progress in GPU-based applications promises the ability to deliver an optimal treatment in response to daily variation in patient anatomy. In this talk we will show some of the applications developed for research in the radiation treatment planning process. We will introduce the applications along with their thread parallelism schemes and computational performance improvements.
Dong Ju (DJ) Choi can be reached via this website
Presenter: Thomas Uram
GPSI (pronounced "gypsy") provides computational scientists with a general purpose workbench for developing, testing, and using complex workflows for simulation and analysis. A key aspect that differentiates GPSI from other environments that support large parallel workflows is that it's not tied to any particular science domain. GPSI provides tools that have emerged as common in many of our past science gateway efforts, doing the heavy lifting that is often required in starting a new science gateway, while facilitating customization to a particular domain or collaboration. The GPSI environment integrates support for job execution and management; data management and browsing; and application development and reuse. We are working with research groups in power grid simulation and analysis and in phylogenetics to improve productivity via GPSI.
Thomas Uram can be reached via this website
February 14, 2012
Modeling Studies of Nano and Biomolecular Systems
Presenter: Ross Walker, SDSC
Principal Investigator: Adrian Roitberg, University of Florida
Investigating large scale protein domain motions with GPU Accelerated AMD simulations: Sampling for the 99%.
This talk will cover recent developments in the acceleration of molecular dynamics simulations using NVIDIA graphics processing units with the AMBER software package. In particular, it will focus on recent algorithmic improvements aimed at accelerating the rate at which phase space is sampled. A recent success has been the reproduction of key results from the D. E. Shaw 1 millisecond Anton MD simulation of BPTI (Science, Vol. 330 no. 6002 pp. 341-346) with just 2.5 days of dihedral boosted AMD sampling on a single GPU workstation. These results show that with careful algorithm design it is possible to obtain key long timescale sampling data of enzymes using just a single $500 GTX580 graphics card.
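The abstract does not state the boost function used. As a hedged illustration, the sketch below implements the boost form commonly given in the accelerated MD literature, where energy is added only when the potential lies below a threshold. The function name and the parameter names (threshold, alpha) are assumptions for this example and are not taken from the talk or from AMBER.

```python
def amd_boost(potential, threshold, alpha):
    """Boost energy added to the potential in accelerated MD (commonly cited form).

    Returns 0 when the potential is at or above the threshold; otherwise
    returns (threshold - potential)**2 / (alpha + threshold - potential).
    """
    if potential >= threshold:
        return 0.0
    gap = threshold - potential
    return gap * gap / (alpha + gap)


def boosted_potential(potential, threshold, alpha):
    """Potential actually sampled: the original value plus the boost."""
    return potential + amd_boost(potential, threshold, alpha)
```

A larger alpha gives a smaller boost and keeps the modified surface closer to the original, while a smaller alpha flattens the wells more aggressively. In a dihedral boost, the potential passed in would be the dihedral energy term rather than the total energy.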
Ross Walker can be reached via this website
Experiencing Developing Tools for Scientific Communities in the Apache Software Foundation: Beyond Open Source
Presenter: Marlon Pierce, Indiana University
Science Gateways provide Web-based environments for scientists and students to perform computational experiments online via Web interfaces using Web services and computational workflows. Gateways rely on open source software, and many gateway developers have taken the extra step of making their own software open source, using tools like SourceForge, GitHub, and Google Code to make their codes available, easy to find, and open licensed. However, we believe there are important steps that should be taken to go beyond basic open source to address requirements for building open software communities. In addition to licensing and support tools, open communities must have open processes for making design decisions, accepting code contributions, adding new project members, reporting and resolving problems, and making well-packaged and properly licensed software releases. The Apache Software Foundation provides the infrastructure and mentoring experience to help open source communities address these project governance issues. Additionally, Apache has interesting requirements (such as developer diversity) that are designed to emphasize the neutrality of the code base (giving competitors a safe place to cooperate) and to help sustain projects through leadership turnover and funding cycles. In this talk I present our group's efforts to convert two major pieces of the Open Gateway Computing Environments project, the Gadget Container and the Workflow Suite, into Apache Rave and Apache Airavata incubators, respectively. I discuss the implications of the Apache model, both positive and negative, on the science gateway community and cyberinfrastructure generally.
Marlon Pierce can be reached via this website
January 10, 2012 Symposium
Presenter: Ben Cotton
Condor is a distributed batch scheduler specializing in high throughput and resource scavenging. Condor supports running thousands of jobs simultaneously across heterogeneous compute platforms. This talk will discuss general Condor usage, including job creation and submission. Specifics of Purdue's Condor resource will also be presented. Sample computation types and workflows that best fit the resource will be shown.
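As a rough illustration of job creation, the Python sketch below writes out the text of a basic Condor submit description for a batch of independent jobs. The helper name, the executable name simulate.sh, and the output file names are hypothetical, and site-specific settings for Purdue's resource are not covered here.

```python
def make_submit_description(executable, n_jobs, arguments=""):
    """Build the text of a Condor submit description for n_jobs independent runs.

    $(Process) is a Condor macro that expands to the job's index within the
    cluster (0, 1, 2, ...), so each job gets its own output and error file.
    """
    lines = [
        "universe   = vanilla",
        f"executable = {executable}",
        f"arguments  = {arguments}",
        "output     = job.$(Process).out",
        "error      = job.$(Process).err",
        "log        = job.log",
        f"queue {n_jobs}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical example: 100 runs, each passed its own index as an argument.
text = make_submit_description("simulate.sh", 100, "$(Process)")
```

Saving the text to a file and passing that file to condor_submit would queue the jobs. Generating the description from a script is a convenience when the number of jobs or their arguments change from run to run.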
Ben Cotton can be reached via this website
Visualization and analysis with Nautilus: From standard tools to unusual challenges
Presenter: Amy Szczepanski
Nautilus gives researchers great flexibility in visualizing and analyzing their data. One feature that enables these possibilities is Nautilus' 4 TB of global shared memory. In addition to the opportunities for keeping large data sets in memory, this architecture enables other types of analyses that are not well-suited for distributed memory systems. NICS/RDAV has developed an open-source tool, named Eden, that allows researchers to easily and efficiently perform parameter sweeps and other analyses that require independent runs of non-MPI code. We will start with a quick overview of Nautilus, its capabilities, and some of the standard visualization and analysis tools that are available on the system and then talk in more detail about Eden and how researchers have used this tool in their computational work.
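The kind of parameter sweep described here, many independent runs of non-MPI code, can be sketched in Python as follows. This is not Eden and does not reflect its interface; run_case is a hypothetical stand-in for launching one run, and the parameter names are made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def run_case(params):
    """Stand-in for one independent run; a real sweep would launch a program here."""
    rate, steps = params
    value = 1.0
    for _ in range(steps):
        value *= 1.0 + rate
    return value


def sweep(rates, step_counts, workers=4):
    """Run every (rate, steps) combination independently and collect the results."""
    cases = list(product(rates, step_counts))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(run_case, cases))
    return dict(zip(cases, results))


out = sweep([0.0, 1.0], [1, 2])
```

Because no case depends on another, the cases can run in any order and on any available core. The thread pool here only stands in for whatever mechanism dispatches the separate runs on a real system.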
Amy Szczepanski can be reached via this website
November 1, 2011 Symposium Talks
Large Synoptic Survey Telescope Data Challenge
Presenter: Darren Adams (NCSA)
PI: Tim Axelrod (University of Arizona)
The ECSS team has collaborated with the Large Synoptic Survey Telescope (LSST) project in evaluating the Research and Education Data Depot network (REDDnet) project. LSST researchers are interested in ways to make very large datasets available to researchers in geographically distant locations. The REDDnet project has devised a system of storage "depots" that are geographically distributed. Using the Logistical Storage (LStore) system, also developed at Vanderbilt, files can be stored, replicated, and striped based on user-defined policies. Policies can be created to emphasize fault tolerance, performance, and/or geographic availability. The ECSS collaboration included integrating the LStore server with the NCSA mass storage system and loading data sets from recent LSST data challenges. New features, including FUSE filesystem interface capabilities, may make REDDnet a viable choice for the LSST team.
Darren Adams can be reached via this website
Supporting Distributed and Loosely Coupled Parallel Molecular Simulations using SAGA
Presenters: Yaakoub El Kharma (TACC), Matt McKenzie (NICS)
PI: Ronald Levy (Rutgers)
This ECSS project is supporting an intense effort to understand important aspects of the physics of protein-ligand recognition by multidimensional replica exchange (RE) computer simulations. These are compute-intensive calculations that are currently not well supported on XSEDE because they require both large numbers (10^3-10^4) of loosely coupled replicas and long simulation times (days to weeks). Our effort is focused on resolving architectural and scalability issues associated with these large-scale, high-throughput simulations. The framework we are using is SAGA (Simple API for Grid Applications) and its associated pilot job implementation, BigJob. The technical difficulties involved include job coordination across multiple resources, file and data movement, dynamic coupling of replicas, and dynamic scheduling of resources. We will present the progress in implementing workflow managers, system monitors, and data exchange mechanisms.
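For readers unfamiliar with replica exchange, the sketch below shows the standard Metropolis test for swapping two replicas. It assumes the simplest case, exchange along a single temperature dimension, whereas the project concerns multidimensional RE. The function names are hypothetical, and nothing here uses SAGA or BigJob.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def exchange_probability(energy_i, temp_i, energy_j, temp_j):
    """Metropolis acceptance probability for swapping two temperature replicas.

    Uses min(1, exp((beta_i - beta_j) * (E_i - E_j))) with beta = 1 / (k_B * T).
    """
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_exchange(energy_i, temp_i, energy_j, temp_j, rng=random):
    """Return True if the swap is accepted for this attempt."""
    probability = exchange_probability(energy_i, temp_i, energy_j, temp_j)
    return rng.random() < probability
```

Each exchange attempt needs only the two current energies, which is why the replicas are loosely coupled: they run independently between attempts and synchronize briefly to decide on a swap.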
Yaakoub El Kharma can be reached via this website
December 13, 2011 Symposium
XSEDE Data Movement with Globus Online
Presenter: Steve Tuecke, Deputy Director, Computation Institute at University of Chicago and Argonne National Lab (and Globus Online Project Lead)
In this session we will present Globus Online, the hosted service that underpins XSEDE's current User Access Services providing secure, reliable file transfer and user authentication. Both XSEDE users and Campus Champions will benefit from attending this session which covers the basics of how to use Globus Online and how to enable resources for file transfer using the service. We will include a brief demo covering use of the GUI and command line interface, as well as our newest tools for setting up a multi-use server faster and more easily than has been possible before. We will also leave time for Q&A so attendees can get their questions answered about how to get started.
Steve Tuecke can be reached via this website
ASTA Project: Patient-Specific Modeling of Abdominal Aortic Aneurysms
Presenter: Anirban Jana (PSC)
Principal Investigator: Professor Ender Finol (Carnegie Mellon)
Cardiovascular diseases are a major cause of fatalities in the world. One kind is an aneurysm, which is a local dilation and resulting weakening of an artery, creating the possibility of rupture and a speedy death. This work is on the Abdominal Aortic Aneurysm (AAA), which is a dilation of the abdominal aorta just above the iliac bifurcation and below the renal arteries. AAA rupture is currently the 10th leading cause of death in the US. Computational modeling of AAAs shows promise for making accurate predictions of rupture risk, and hence for guiding proper intervention strategies. The current state of the art is patient-specific computational modeling of AAAs based on medical images of patient AAAs, such as those obtained by CT or MRI techniques. This is clearly a complex task. Challenges include proper extraction of the AAA geometry from the medical images, application of proper boundary conditions, appropriate material modeling of the diseased artery, and efficient computational methodologies to correctly capture the fluid-structure interaction between the pulsatile blood flow and the flexible wall, amongst others. In this talk, I'll present some of my contributions in these research areas in collaboration with Prof. Ender Finol's group. We have used primarily PSC's Pople and Blacklight for this work, as well as some local machines. The main simulations have been performed using a commercially available multiphysics package called ADINA, while preprocessing (creating the finite element models from medical imaging data) is primarily done using MATLAB codes. Several publications have already resulted from this effort, and many more are in the pipeline.
Anirban Jana can be reached via this website