Wilson, B.D., Tang, B., Manipon, G., Yunck, T., Fetzer, E., Braverman, A., and Dobinson, E. (2004). GENESIS SciFlo: Enabling Multi-Instrument Atmospheric Science Using Grid Workflows. Eos Trans. AGU, 85(47), Fall Meet. Suppl. 2004, Abstract # SF31A-0716
The General Earth Science Investigation Suite (GENESIS) project is a NASA-sponsored partnership between the Jet Propulsion Laboratory, academia, and NASA data centers to develop a new suite of web services tools to facilitate multi-sensor investigations in Earth System Science. The goal of GENESIS is to enable large-scale, multi-instrument atmospheric science using combined datasets from the AIRS, MODIS, MISR, and GPS sensors. Investigations will include cross-comparison of spaceborne climate sensors, cloud spectral analysis, study of upper troposphere-stratosphere water transport, study of the aerosol indirect cloud effect, and global climate model validation. The challenges are to bring together very large datasets, reformat and understand the individual instrument retrievals, co-register or re-grid the retrieved physical parameters, perform computationally intensive data fusion and data mining operations, and accumulate complex statistics over months to years of data. To meet these challenges, we are developing a Grid computing and dataflow framework, named SciFlo, in which we are deploying a set of versatile and reusable operators for data access, subsetting, registration, mining, fusion, compression, and advanced statistical analysis. SciFlo is a system for Scientific Knowledge Creation on the Grid using a Semantically-Enabled Dataflow Execution Environment. SciFlo leverages Simple Object Access Protocol (SOAP) Web Services and the Grid Computing standards (Globus Alliance toolkits), and enables scientists to do multi-instrument Earth Science by assembling reusable web services and executable operators into a distributed computing flow (operator tree). The SciFlo client & server engines optimize the execution of such distributed data flows and allow the user to transparently find and use datasets and operators without worrying about the actual location of the Grid resources.
The scientist injects a distributed computation into the Grid by simply filling out an HTML form or directly authoring the underlying XML dataflow document, and results are returned directly to the scientist's desktop. Once an analysis has been specified for a chunk or day of data, it can be easily repeated with different control parameters or over months of data. We will discuss the design issues and solutions used in the implementation of SciFlo, including XML dataflow documents, heavy use of XML datatyping & semantic web concepts, parallel dataflow execution engines, data access simply by naming, and catalog lookup of operator bundles. To illustrate the SciFlo concepts, an example dataflow will be demonstrated in which atmospheric temperature and water vapor profiles from the AIRS, GPS, and MODIS instruments are retrieved using SOAP (data access) services, co-registered, and visually & statistically compared on demand.
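As a rough illustration of the operator-tree idea described above, the following Python sketch evaluates a small dataflow in which two data-access operators feed a co-registration step and then a statistical comparison. It is not SciFlo's implementation or API: the names Operator, execute, fetch_airs_profile, fetch_gps_profile, coregister, and mean_difference are hypothetical, the "data access" functions return made-up values in place of real SOAP calls, and matching on shared pressure levels is only a stand-in for true co-registration.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Operator:
    """One node of a hypothetical operator tree: a function and its input nodes."""
    name: str
    func: Callable[..., object]
    inputs: List["Operator"] = field(default_factory=list)


def execute(node: Operator) -> object:
    """Evaluate a node; its input subtrees are evaluated concurrently."""
    if not node.inputs:
        return node.func()
    # One worker per input, so sibling subtrees never wait on each other.
    with ThreadPoolExecutor(max_workers=len(node.inputs)) as pool:
        args = list(pool.map(execute, node.inputs))  # map preserves input order
    return node.func(*args)


# Stand-ins for data-access services. The values are synthetic placeholders
# (pressure level in hPa -> temperature in K), not real retrievals.
def fetch_airs_profile() -> Dict[int, float]:
    return {850: 280.0, 500: 255.0, 300: 230.0}


def fetch_gps_profile() -> Dict[int, float]:
    return {500: 254.0, 300: 229.0, 200: 218.0}


def coregister(a: Dict[int, float], b: Dict[int, float]) -> List[Tuple[int, float, float]]:
    """Keep only the pressure levels present in both profiles."""
    common = sorted(set(a) & set(b), reverse=True)
    return [(level, a[level], b[level]) for level in common]


def mean_difference(pairs: List[Tuple[int, float, float]]) -> float:
    """Average of (first - second) over the matched levels."""
    return sum(x - y for _, x, y in pairs) / len(pairs)


# Assemble the flow: compare(coregister(airs, gps)).
airs = Operator("airs", fetch_airs_profile)
gps = Operator("gps", fetch_gps_profile)
matched = Operator("coregister", coregister, [airs, gps])
flow = Operator("compare", mean_difference, [matched])

result = execute(flow)
print(result)
```

Because the flow is plain data (a tree of named operators), the same analysis could in principle be rerun with different leaf operators or parameters, which mirrors the abstract's point about repeating a specified analysis over other days or months of data.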