SBIR/STTR Award attributes
One of the key challenges in predicting, understanding, and assessing the causes of global population migration is finding the right data, cleaning that data, and having the right methods to access it. As with most data science problems, scientists across academia and the private sector are addressing this challenge by creating new datasets and by mining existing open-source data to build models that predict patterns of population migration and explain their causes and consequences. This process requires extensive effort to search for and assess diverse data from different sources, and because these data are structured in different ways, the effort grows further. Even after suitable data are found, additional work is needed to access and aggregate them, because population migration has both temporal and spatial dimensions. This program seeks to address the following challenges associated with the diverse data required for analyzing population migration:
• Finding datasets that support the analysis of population migration
• Cleaning the data so that it is ready for analysis
• Indexing, querying, and aggregating data in a way that supports spatio-temporal analysis
• Dealing with the varying structure of the data sources (e.g., qualitative vs. quantitative)
In this proposal, we lay out a plan to prototype the Framework for Analysis of Diverse Data (FADD), a tool for collecting, cleaning, and analyzing data in support of predicting and understanding global population migration.
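The spatio-temporal indexing and aggregation challenge listed above can be made concrete with a minimal Python sketch. It assumes a hypothetical `Observation` record (latitude, longitude, timestamp, head count), buckets observations into fixed-size latitude/longitude grid cells so that a bounding-box query only visits overlapping cells, and sums counts per cell and calendar month. The grid scheme and every name here (`Observation`, `GridIndex`, `aggregate_monthly`) are illustrative assumptions, not part of the FADD design; a production system would more likely rely on a spatial database or a hierarchical spatial index.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Observation:
    """Hypothetical migration record: location, time, and number of people."""
    lat: float
    lon: float
    when: datetime
    count: int


class GridIndex:
    """Buckets observations into fixed-size lat/lon cells for box queries."""

    def __init__(self, cell_deg: float = 1.0) -> None:
        self.cell_deg = cell_deg
        self.cells: Dict[Tuple[int, int], List[Observation]] = defaultdict(list)

    def cell_of(self, lat: float, lon: float) -> Tuple[int, int]:
        # Integer grid coordinates of the cell containing (lat, lon).
        return (math.floor(lat / self.cell_deg), math.floor(lon / self.cell_deg))

    def add(self, obs: Observation) -> None:
        self.cells[self.cell_of(obs.lat, obs.lon)].append(obs)

    def query(self, lat_min: float, lat_max: float, lon_min: float, lon_max: float,
              start: datetime, end: datetime) -> List[Observation]:
        """Observations inside the box (inclusive) and the window [start, end)."""
        r0, c0 = self.cell_of(lat_min, lon_min)
        r1, c1 = self.cell_of(lat_max, lon_max)
        hits: List[Observation] = []
        # Visit only the cells that overlap the box, then filter each record exactly.
        for row in range(r0, r1 + 1):
            for col in range(c0, c1 + 1):
                for obs in self.cells.get((row, col), []):
                    if (lat_min <= obs.lat <= lat_max and lon_min <= obs.lon <= lon_max
                            and start <= obs.when < end):
                        hits.append(obs)
        return hits


def aggregate_monthly(observations: Iterable[Observation],
                      cell_deg: float = 1.0) -> Dict[Tuple[int, int, str], int]:
    """Sum head counts per (grid row, grid column, 'YYYY-MM') bucket."""
    grid = GridIndex(cell_deg)
    totals: Dict[Tuple[int, int, str], int] = defaultdict(int)
    for obs in observations:
        row, col = grid.cell_of(obs.lat, obs.lon)
        totals[(row, col, obs.when.strftime("%Y-%m"))] += obs.count
    return dict(totals)
```

Choosing the cell size trades query precision for index size; monthly buckets are likewise an arbitrary illustrative choice of temporal resolution.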
The FADD system will provide the following functionality:
• A data ingest service for adding data to FADD
• A data cleaning component that automates data preprocessing
• A data access service that provides an interface for analytics running outside of FADD
• A set of internal services that provide spatio-temporal indexing, querying, and aggregation
• A model/analytic access service for using models and analytics within FADD
• A means to add models and analytics to FADD in a “plug and play” fashion
In addition, models and analytics within FADD can produce data that is added back into the system through the data ingest service. The primary objective is to develop and demonstrate a prototype system that can ingest a small number of datasets (at least one structured and one unstructured) and show the results of an analytic operating on those datasets. The results of the analytic will be displayed using an off-the-shelf visualization tool.
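To illustrate how the services listed above could fit together, here is a toy, in-memory Python sketch. It assumes a hypothetical `FaddPrototype` class in which `ingest` runs a simple cleaning step (trimming strings and dropping records that lack required fields), `get` plays the role of the data access service, a decorator-based registry provides "plug and play" analytics, and `run` feeds an analytic's output back through `ingest`. None of these names or behaviors come from the proposal; they are assumptions chosen only to show the data flow.

```python
from typing import Any, Callable, Dict, Iterable, List

Record = Dict[str, Any]
Analytic = Callable[[List[Record]], List[Record]]


def clean(records: Iterable[Record], required: Iterable[str]) -> List[Record]:
    """Hypothetical cleaning step: trim strings and drop incomplete records."""
    required = list(required)
    cleaned: List[Record] = []
    for rec in records:
        trimmed = {k: v.strip() if isinstance(v, str) else v for k, v in rec.items()}
        # Keep the record only if every required field is present and non-empty.
        if all(trimmed.get(f) not in (None, "") for f in required):
            cleaned.append(trimmed)
    return cleaned


class FaddPrototype:
    """Toy stand-in for ingest, access, and plug-in analytic services."""

    def __init__(self) -> None:
        self.datasets: Dict[str, List[Record]] = {}
        self.analytics: Dict[str, Analytic] = {}

    def ingest(self, name: str, records: Iterable[Record],
               required: Iterable[str] = ()) -> int:
        """Ingest service: clean records, store them, return how many were kept."""
        self.datasets[name] = clean(records, required)
        return len(self.datasets[name])

    def get(self, name: str) -> List[Record]:
        """Access service: hand a copy of a dataset to an external analytic."""
        return [dict(rec) for rec in self.datasets[name]]

    def register(self, name: str) -> Callable[[Analytic], Analytic]:
        """Decorator that plugs an analytic into the system under `name`."""
        def wrap(fn: Analytic) -> Analytic:
            self.analytics[name] = fn
            return fn
        return wrap

    def run(self, analytic: str, source: str, output: str) -> List[Record]:
        """Run a registered analytic and re-ingest its results as a new dataset."""
        results = self.analytics[analytic](self.get(source))
        self.ingest(output, results)
        return results
```

Re-ingesting analytic output through `ingest` mirrors the idea that models and analytics can produce data that becomes available to other analytics; a decorator registry is just one common Python idiom for plug-in behavior, and a real system would likely use separate services and persistent storage instead of in-memory dictionaries.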

