Sponsor: National Science Foundation

The need for collaborative data analysis grows significantly when researchers confront the challenges of big data.  Although workflow tools offer a formal way to define, automate, and repeat multi-step computational procedures, designing complex data-processing workflows requires collaboration among multiple people with complementary expertise.  Existing tools are not well suited to supporting the collaborative design of comprehensive workflows.  To address this challenge, this project aims to design and develop a software infrastructure that supports collaborative composition and management of data-oriented workflows.  This infrastructure adds a key component to existing NSF cyberinfrastructure and will support big data collaboration over the Internet.  Reproducibility and scalability are its two major targets.  The project extends an existing open-source workflow tool, VisTrails, with system-level facilities for the human interaction and cooperation that are essential to effective and efficient scientific collaboration.

This project will produce five outcomes:

  • A collaborative provenance data model equipped with a graph-level provenance querying formalism;
  • A type-theoretic approach to handling data format transformations;
  • Hypergraph theory-based algorithms for provenance management and mining;
  • A software tool supporting (a)synchronous collaborative scientific workflow design, composition, reproduction, and visualization; and
  • Principles, methodologies, experiences, and lessons learned that support the development of a generally applicable collaborative scientific workflow composition tool.

Through the resulting tools, the project will explore the potential of scientific workflows to accelerate scientific discoveries that depend on collaborative big data analytics.  Although the tools are designed around use cases in civil engineering, they have the potential to broadly impact other areas of science and engineering.  A partnership with VisTrails allows the techniques to be used and evaluated by the VisTrails end-user community.