NASA’s Office of Earth Science Awards Six Grants for Advanced Information Systems Technology

The National Aeronautics and Space Administration (NASA) has awarded funding for six new investigations for information systems technology development, under the Advanced Information Systems Technology (AIST) Program, which supports NASA’s mission to understand and protect our home planet. The proposals, selected from a field of 30 submitted proposals, focus on high-priority information technology areas: tools for warehousing, data mining, and knowledge discovery; technologies to facilitate queries/access of multi-disciplinary data; and techniques to facilitate customized data services. The data mining technologies sought address two challenge areas: ocean biology and biogeochemistry data mining, and data mining for climate and weather models. The total funding for these investigations, over a period of two years, is approximately $1.9 million; investigators hail from 7 states.

The main purpose of AIST is to invest in research and development of new and innovative information technologies to support and enhance the Earth science capability. AIST focuses on creating mature technologies leading to smaller, less resource-intensive and less expensive flight systems that can be built quickly and efficiently, and on more-efficient ground-based processing and modeling systems that improve the use of Earth science data.

The technologies selected include a statistical data mining and machine learning toolkit whose development will enable scaling to global data sets and integration of heterogeneous data sources to evaluate and predict the effects of varying weather patterns on agricultural crop yields. A spatiotemporal data mining tool will enable monitoring and modeling of multiple oceanographic objects, such as river-based plumes and harmful algal blooms.

Technologies to improve the utilization of large heterogeneous data sets will also be developed. These include the modification of data compression techniques for use as a data reduction method to create small summary data sets that are substantially reduced in volume and complexity, and the wavelet analysis of local information content in a data scene to intelligently select the density of observations to use for weather and climate modeling.

Climate modeling and prediction techniques will be further enhanced through the development of data mining and knowledge discovery tools. A suite of data mining tools based on new information-theoretic techniques will enable rapid identification, characterization and quantification of causal interactions among relevant climate variables in large distributed data sets, allowing evaluation and prediction of climate and climate subsystem changes over time in response to natural and human-induced changes. Data mining and knowledge discovery techniques will facilitate analysis, visualization, and modeling of land-surface variables obtained from the TERRA and AQUA platforms in support of climate and weather applications to enable better parameterization of the relevant processes in forecast models for weather and inter-annual climate prediction.

The investigations selected by NASA's Earth Science Technology Office are:

Braverman, Amy (Jet Propulsion Laboratory (JPL), Pasadena, CA):
Mining Massive Earth Science Data Sets for Climate and Weather Forecast Models
Cai, Yang (Carnegie Mellon University, Pittsburgh, PA):
Spatiotemporal Data Mining System for Tracking and Modeling Ocean Object Movement
Hoffman, Ross (Atmospheric and Environmental Research (AER) Incorporated, Lexington, MA):
Selection Technique for Thinning Satellite Data for Numerical Weather Prediction
Knuth, Kevin (NASA Ames Research Center, Moffett Field, CA):
Rapid Characterization of Causal Interactions among Climate/Weather System Variables: An Advanced Information-Theoretic Approach
Kumar, Praveen (University of Illinois, Urbana, IL):
Data Mining for Understanding the Dynamic Evolution of Land-Surface Variables: Technology Demonstration Using the D2K Platform
Wagstaff, Kiri (JPL, Pasadena, CA):
Interactive Analysis of Heterogeneous Data to Determine the Impact of Weather on Crop Yield

 

Title

Mining Massive Earth Science Data Sets for Climate and Weather Forecast Models

Full Name

Amy Braverman

Institution Name

JPL

Proposal #

AIST-QRS-04-3014

In this proposal we address the technology objectives specified in Section I.1. of the Mini-AIST NRA announced May 5, 2004. Specifically, we will provide tools and support for data warehousing, data mining, and knowledge discovery for the ESE science challenge posed in Section I.2.2.b: Data Preparation for Medium Range Weather Forecasts. The sheer volume of Earth science data precludes the interactive, real-time scientific exploration required to characterize and understand features that can inform and improve physical models. We propose to solve this problem by creating small summary data sets of reduced volume and complexity that can be used in place of the original data as input to models, or for comparisons to model output. We propose using data compression techniques, modified for use as data reduction methods, to create summary data sets of small size and high accuracy for observational data from AIRS, MISR, and ISCCP (International Satellite Cloud Climatology Project), together with model data from NCAR’s CAM3 and GFDL’s AM2 atmospheric models. These summary data sets can be thought of as "thinned" in the sense of retaining representative observations which, taken together, preserve the statistical and distributional character of the original data. The summary data can therefore also be used to create customized data products that estimate features of the modeler's choosing. Our technology is currently at TRL 4, and we expect to achieve TRL 6 in the 24-month performance period.
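As a rough illustration of what a compression-style summary data set might look like, the Python sketch below reduces a list of values to a handful of weighted representatives using Lloyd's algorithm, a standard quantization technique. This is an assumption made for illustration, not the proposers' method: the abstract does not name the compression technique, and the function name `summarize`, the sample temperatures, and the choice of group means as representatives are all hypothetical.

```python
def summarize(values, k, iterations=20):
    """Reduce `values` to at most k (representative, count) pairs.

    Uses Lloyd's algorithm in one dimension (an illustrative stand-in,
    not the technique from the proposal).
    """
    data = sorted(values)
    # Deterministic start: k evenly spaced order statistics.
    centers = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in data:
            nearest = min(range(len(centers)), key=lambda j: abs(v - centers[j]))
            groups[nearest].append(v)
        # Move each representative to the mean of the values assigned to it.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return [(c, len(g)) for c, g in zip(centers, groups) if g]


# Hypothetical brightness temperatures (kelvin) falling into three regimes.
temps = [270.1, 270.3, 270.2, 285.0, 285.4, 299.8, 300.1, 300.0]
summary = summarize(temps, k=3)
```

Each pair holds a representative value and the number of original values it stands for, so the weighted summary reproduces the mean of the original data while its weights approximate the original distribution.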

 

Title

Spatiotemporal Data Mining System for Tracking and Modeling Ocean Object Movement

Full Name

Yang Cai

Institution Name

Carnegie Mellon University

Proposal #

AIST-QRS-04-3031

Tracking and modeling the spatiotemporal dynamics of ocean objects is essential to ESE missions in oceanographic studies, such as monitoring and predicting harmful algal blooms along the coastline or river-based plumes discharged to the open ocean.

In this project, we propose a spatiotemporal data mining system with the following objectives: 1) tracking the movement of ocean objects that have been identified; 2) discovering the correlations between the object attributes and satellite readings from multiple databases; 3) predicting the movement of identified objects.
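The tracking and prediction objectives can be illustrated with a minimal Python sketch. It is not the proposed system: it assumes each object has already been reduced to a centroid per image, associates objects across two frames by greedy nearest-centroid matching, and predicts the next position by constant-velocity extrapolation. The function names, object labels, and coordinates are hypothetical.

```python
from math import hypot


def match_objects(previous, current, max_distance):
    """Greedy nearest-centroid association between two frames.

    `previous` maps object id -> (x, y); `current` is a list of (x, y)
    centroids. Returns a dict mapping object id -> index into `current`.
    """
    candidates = sorted(
        (hypot(cx - px, cy - py), obj_id, idx)
        for obj_id, (px, py) in previous.items()
        for idx, (cx, cy) in enumerate(current)
    )
    matches, used = {}, set()
    for dist, obj_id, idx in candidates:
        if dist <= max_distance and obj_id not in matches and idx not in used:
            matches[obj_id] = idx
            used.add(idx)
    return matches


def predict_next(track):
    """Constant-velocity extrapolation from the last two positions of a track."""
    (x0, y0), (x1, y1) = track[-2], track[-1]
    return (2 * x1 - x0, 2 * y1 - y0)


# Hypothetical centroids in arbitrary grid units.
previous = {"bloom": (10.0, 20.0), "plume": (40.0, 5.0)}
current = [(41.0, 5.5), (10.5, 21.0)]
matches = match_objects(previous, current, max_distance=5.0)
bloom_forecast = predict_next([previous["bloom"], current[matches["bloom"]]])
```

A real system would also have to handle objects that split, merge, appear, or disappear between images, which this sketch ignores.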

This generalized spatiotemporal data mining tool enables monitoring and modeling of multiple oceanographic objects, such as plumes and harmful algal blooms. It may also be applied to other spatiotemporal problems, such as monitoring dust storms.

We will use the SeaWiFS database as our main source. We will also explore the use of other remote sensing databases, such as MODIS.

The technology will build on our lab prototypes of a multi-sensor data mining framework, with an entry Technology Readiness Level (TRL) of 4. The project deliverable is expected to reach TRL 5 to 6. The total project duration is two years.

The Co-PI, Dr. Richard P. Stumpf, an oceanographer at NOAA, will specify the requirements for the data mining tool and validate the product with field data. Dr. Han-Shou Liu, a geophysicist at GSFC, will support computational models for data mining.

 

Title

Selection Technique for Thinning Satellite Data for Numerical Weather Prediction

Full Name

Ross Hoffman

Institution Name

Atmospheric and Environmental Research, Inc.

Proposal #

AIST-QRS-04-3019

Operational weather prediction centers use only a fraction of the observations of the atmosphere and the Earth's surface that are made by satellite, in situ, and ground-based instruments. In many cases satellite data are selected by regular decimation, i.e., every nth observation. The objective of this proposed project is to develop a more intelligent selection method that uses the local information content in a data scene to determine the density of observations to use. The method will be based on a wavelet analysis of the satellite data. Tests using QuikSCAT scatterometer wind observations in analysis and forecast systems will compare results based on ALL of the data to results from the REGULAR and WAVELET selections. A two-year level of effort is proposed to advance the TRL of the method from 4 to 6.
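The contrast between regular decimation and information-based selection can be sketched in Python. The sketch is illustrative only and is not the proposed selection technique: it assumes a one-dimensional series of observations, uses a single-level Haar wavelet detail coefficient as a crude measure of local information content, and keeps every observation in a block whose largest coefficient exceeds a threshold. The function names, block size, threshold, and wind speeds are hypothetical.

```python
def regular_thinning(values, n):
    """Regular decimation: keep every nth observation (indices 0, n, 2n, ...)."""
    return list(range(0, len(values), n))


def haar_detail(values):
    """Magnitude of the one-level Haar detail coefficient for each pair (2i, 2i+1)."""
    return [abs(values[2 * i] - values[2 * i + 1]) / 2 ** 0.5
            for i in range(len(values) // 2)]


def wavelet_thinning(values, block, threshold):
    """Keep one observation per smooth block and all observations in blocks
    whose largest Haar detail coefficient exceeds `threshold`."""
    kept = []
    for start in range(0, len(values), block):
        chunk = values[start:start + block]
        details = haar_detail(chunk)
        if details and max(details) > threshold:
            kept.extend(range(start, start + len(chunk)))
        else:
            kept.append(start)
    return kept


# Hypothetical wind speeds (m/s): smooth except for one sharp feature.
winds = [5.0] * 8 + [5.0, 9.0, 14.0, 8.0] + [6.0] * 4
regular = regular_thinning(winds, 4)
adaptive = wavelet_thinning(winds, block=4, threshold=1.0)
```

Here regular decimation samples the sharp feature no more densely than the smooth regions, while the wavelet-based selection retains every observation inside it. A single Haar level only compares values within each pair, so a fuller treatment would use several decomposition levels.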

 

Title

Rapid Characterization of Causal Interactions among Climate/Weather System Variables: An Advanced Information-Theoretic Approach

Full Name

Kevin Knuth

Institution Name

NASA Ames Research Center

Proposal #

AIST-QRS-04-3010

The NASA Earth Science Enterprise is focused on obtaining a better understanding of our home planet. While it is clear that the Earth’s climate changes over time, it is not known how this change occurs, what the primary causes of change are, or how climate subsystems respond to natural and human-induced changes. Vast amounts of data are being collected on the Earth’s climate system, and it is increasingly important to rapidly discover relevant climate variables and to qualify and quantify their causal interactions. We will develop a suite of data-mining tools based on new information-theoretic techniques to rapidly identify, characterize, and quantify causal interactions among relevant climate variables in large distributed datasets. This information-theoretic approach relies on established quantities such as mutual information, on novel quantities called co-informations, and on derived quantities such as transfer entropy, which enable us to quantify complex causal interactions over different spatiotemporal scales. We have demonstrated these techniques at TRL 4, and during the period of performance from 10/01/04 through 9/30/06 the development of these tools will take them to TRL 6. In addition to quantifying causal interactions, these tools will also quantify the errors in the estimates, thus quantifying the inherent uncertainties in the results. These uncertainties are crucial to accurately evaluating our state of knowledge about the climate system, which is an element of key interest to the US Climate Change Science Program. These measures will be demonstrated using important climate datasets including MODIS, TRMM, and the International Satellite Cloud Climatology Project (ISCCP) dataset.
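Two of the quantities named above, mutual information and transfer entropy, have standard textbook definitions that can be sketched in Python. The sketch below is not the proposers' toolset: it uses simple plug-in (frequency-count) estimates on discrete sequences, a history length of one step for transfer entropy, and no error estimation. The function names and the toy sequences are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from two equal-length discrete sequences."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def transfer_entropy(source, target):
    """Plug-in estimate of transfer entropy source -> target in bits, history length 1.

    TE = sum over (y1, y0, x0) of p(y1, y0, x0) * log2(p(y1 | y0, x0) / p(y1 | y0))
    """
    triples = list(zip(target[1:], target[:-1], source[:-1]))
    n = len(triples)
    c_yyx = Counter(triples)
    c_yx = Counter((y0, x0) for _, y0, x0 in triples)
    c_yy = Counter((y1, y0) for y1, y0, _ in triples)
    c_y = Counter(y0 for _, y0, _ in triples)
    return sum((c / n) * log2((c / c_yx[(y0, x0)]) / (c_yy[(y1, y0)] / c_y[y0]))
               for (y1, y0, x0), c in c_yyx.items())


# Toy example: the response copies the driver with a one-step delay.
driver = [0, 1, 1, 0, 1, 0, 0, 1]
response = [0] + driver[:-1]
forward = transfer_entropy(driver, response)
backward = transfer_entropy(response, driver)
```

Because the response is a delayed copy of the driver, the forward estimate exceeds the backward one, which is the kind of asymmetry that separates a directed interaction from a mere correlation. Plug-in estimates are biased for short series, which is one reason uncertainty estimates matter in practice.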

 

Title

Data Mining for Understanding the Dynamic Evolution of Land-Surface Variables: Technology Demonstration using the D2K Platform

Full Name

Praveen Kumar

Institution Name

University of Illinois

Proposal #

AIST-QRS-04-3015

The objective of this research proposal is to develop data mining and knowledge discovery in databases (KDD) techniques, using the “Data to Knowledge” (D2K) platform developed by the National Center for Supercomputing Applications (NCSA), to facilitate analysis, visualization, and modeling of land-surface variables obtained from the TERRA and AQUA platforms in support of climate and weather applications. The specific technology objective addressed is: Tools and support for data mining and knowledge discovery. The ESE science challenge addressed is: Data mining for climate and weather applications. The specific science questions that this project will focus on are:
1. How are evolving surface variables such as vegetation indices, temperature, and emissivity, as obtained from the TERRA and AQUA platforms, dynamically linked?
2. How do they evolve in response to climate variability such as ENSO (El Niño Southern Oscillation)? and
3. How are they dependent on temporally invariant factors such as topography (and derived variables such as slope, aspect, nearness to streams), soil characteristics, land cover classification, etc.?
Answers to these questions, at the continental to global scales, using data mining, will enable us to develop better parameterization of the relevant processes in forecast models for weather, and inter-seasonal to inter-annual climate prediction.

The entry Technology Readiness Level (TRL) for the project is 4, and we expect that after the two-year project duration the exit TRL will be 6.

 

Title

Interactive Analysis of Heterogeneous Data to Determine the Impact of Weather on Crop Yield

Full Name

Kiri Wagstaff

Institution Name

JPL

Proposal #

AIST-QRS-04-3004

We will develop a versatile toolkit for statistical data mining and machine learning that will enable (1) rapid, in-depth analysis of subtle relationships between multiple, different science data products, and (2) efficient testing of competing scientific hypotheses. The toolkit will feature advanced methods that are optimized for the analysis of data with spatial dependencies. We will include technologies for classification (support vector machines), clustering (using spatial constraints), and prediction (multivariate spatial models) that currently exist only as standalone methods (TRL 4). These techniques will be refined and demonstrated in a critical scientific investigation: a study of the fine-scale effects of varying weather patterns on agricultural crop yields. The final system will be an integrated toolkit with an easy-to-use graphical interface, demonstrated on full-scale science data (TRL 6). As a result of this work, scientists will be able to easily answer questions such as, "What is the impact on corn yield if Kansas receives only 75% of its normal rainfall?" Benefits over the state of the art include a) analysis methods that scale to global data sets and b) the ability to integrate heterogeneous data sources for improved prediction accuracy. The anticipated period of performance is October 2004 to September 2006.