Past Projects

eXtreme OpenMP

The eXtreme OpenMP project is an effort to enhance OpenMP for productive, portable, and efficient extreme-scale programming: a single programming model spanning heterogeneous multicore processors through large-scale distributed systems, a minimal yet powerful set of language extensions, and sophisticated implementation technology. The language research covers the expression of parallelism, program locality, thread synchronization, parallel I/O, and program adaptivity. For expressing parallelism, we are investigating long-lived nested regions, the naming of parallel regions, and the mapping of parallel regions to hardware. Locality will be addressed through affinity of data and threads, thread subteams, and data mapping hints. We intend to provide point-to-point synchronization, transactions, and synchronization attributes; parallel I/O via thread-collective I/O interfaces and hints; and program adaptivity through asynchronous tasks and dynamically adjustable thread teams.

This work is supported by the National Science Foundation under grant CCF-0833201.
For more information, please visit the project site.

DARWIN - Dynamic Adaptive Runtime Infrastructure

In this project, we exploit existing compiler technology for automatic parallelization and OpenMP translation in order to facilitate application development for multicore systems. To do so, we are extending our robust, open source OpenUH compiler for Fortran 95/C/C++ and OpenMP to enable it to combine automatic parallelization and conventional OpenMP translation strategies. We are implementing the new tasking features of OpenMP to increase the usefulness of this programming model and are considering new static optimization strategies.

The Dragon Analysis Tool

The Dragon Analysis Tool, which supports OpenMP application development, is built on top of the OpenUH compiler. In addition to collecting and displaying the results of traditional static program analyses (e.g., the call graph, control-flow graph, and dependence graph), Dragon can instrument a program to gather and display dynamic execution details. A module is being added to generate OpenMP directives automatically. Other ongoing work includes gathering precise information on thread-specific access to shared data at run time, and integrating several tools with the compiler to provide a complete environment for the application development life cycle.

Dragon tool

PModels and PModels2 Project

The Center for Programming Models for Scalable Parallel Computing is focused on research and development in the area of programming models for scalable parallel computing. Work carried out in this Center advances the state of the art in the understanding, definition, implementation, and use of models expressed in libraries, languages, and annotations.

Apart from our group at the University of Houston, this project involves participants from Argonne National Laboratory, Ohio State University, Pacific Northwest National Laboratory, Rice University, UC Berkeley, and the University of Illinois.

The HPCTools research group has joined PModels2, the continuation of the successful PModels project.
This work is supported by the U.S. Department of Energy under grant DE-FC02-06ER25759.

OpenMP Language for Multi-Core Architecture

OpenMP was designed for flat systems, but even some current SMPs do not provide uniform memory access costs. Multicore platforms may be hierarchical: they may exploit simultaneous or interleaved multithreading, and subsets of threads may share substantial resources. As a result, the way computation is mapped to the hardware can have a major performance impact. OpenMP provides features for assigning work to user-level threads, but not for subgrouping those threads, mapping them to the hardware, or placing data. It also has no point-to-point synchronization.


We are currently working within the OpenMP ARB to explore potential new features for OpenMP. There remains a tension between the need to enable highest performance for those programmers who require it, and the desire to keep OpenMP as simple and straightforward as possible. We have proposed language extensions that define, shape and exploit sub-teams of threads and permit a finer degree of thread synchronization and data locality. These ideas can be used to parallelize multi-dimensional loop nests for large thread counts, to describe a variety of execution scenarios including pipelining, as well as to assign work flexibly across a system with non-uniform resource sharing. We have successfully implemented and tested the subteam concept in the OpenUH compiler.

Cluster-Enabled OpenMP

Given the importance of clusters, we are evaluating approaches to providing OpenMP on clusters. The traditional approach relies on software distributed shared memory, which incurs high overheads and (unless it is integrated with a compiler) is not amenable to important code optimizations. An alternative solution involving translation to MPI is hard to implement. We chose instead to explore a translation using Global Arrays. Our solution is simpler and permits a variety of compile-time improvements. This translation has been specified, and an implementation is under way outside the current OpenUH compiler release, since optimization work is ongoing.

OpenMP to GA

Modeling of Hybrid MPI+OpenMP Code

We have also worked to create a framework for performance modeling of hybrid OpenMP and MPI applications. The OpenUH compiler determines an application signature statically, while a parallelization overhead measurement benchmark, realized with Sphinx and PerfSuite, collects system profiles. Based on these, we have proposed a performance evaluation measurement system to identify communication and computation efficiency.

Hybrid MPI and OpenMP

Our approach has the advantage that it does not need to execute the program. Our methodology not only identifies parallelization efficiency but can also predict application performance, and it can be extended to support other programming models such as UPC and Global Arrays.
The work is funded by NSF under contract CCF-0444468 and by the U.S. Department of Energy under contract DE-FC03-01ER25502.

Embedded High-Level Programming Model and Medical Imaging Project

The goal of this project is to implement medical ultrasound imaging on the BeagleBoard. The USB-powered BeagleBoard delivers laptop-like performance and expansion.

We are collaborating with Texas Instruments (TI) on this project. TI is a global analog and digital semiconductor IC design and manufacturing company. In addition to analog technologies, digital signal processing (DSP) and microcontroller (MCU) semiconductors, TI designs and manufactures semiconductor solutions for analog and digital embedded and application processing. A demo of the Medical Ultrasound Project is available.

OpenMP Validation Suite

This validation suite is the result of a collaborative effort between our group and the High Performance Computing Center (HLRS) in Stuttgart, Germany. The test suite covers OpenMP specifications 1.0 and 2.0, is complete for both Fortran and C, and permits tests to be executed individually or collectively. The test suite can be downloaded at:

OpenMP Validation Suite

Portable High-Level Programming Model for Heterogeneous Computing Based on OpenMP

The goal of this NSF project is to simplify the process of programming and deploying code on heterogeneous platforms by providing a single, high-level programming interface that may be used across, and within, multicore processors and a broad variety of accelerators. The objective is to design and implement portable, high-level directives that will enable the application developer to specify code regions for acceleration, along with the necessary synchronization and data motion, as well as to translate the regions themselves for execution on different kinds of accelerator boards. Our intent was to define suitable enhancements to the industry-standard OpenMP API for shared-memory parallel programming. The work is funded by NSF under contract CCF-0917285.

The project outcomes are as follows:

  • Developed a unified high-level portable programming model, libEOMP, based on OpenMP that provides both the necessary control over program execution and a high level of program abstraction on heterogeneous systems.
  • Explored compiler techniques for generating code and delivering satisfying performance for host and various devices.
  • Explored energy profiles of some commonly used kernels in HPC and quantified the important relationship between energy consumption and computation-communication factors of certain application kernels.
  • Evaluated OpenMP using a real-world based ultrasonic imaging application and a variety of benchmarks.
  • Created extensions to the OpenMP task model to describe data access relationship between asynchronous tasks.
  • Explored an emerging programming model for accelerators, OpenACC. Created an OpenACC 1.0 validation and verification suite.
  • Created an OpenACC NAS Parallel Benchmarks (NPB) suite.

    Links related to this project include: OpenMP, OpenMP References/Books, Multicore Association (MCA)

OpenACC SPEC Benchmark Suite

In this project, we collaborated with NVIDIA (fellow members of SPEC HPG) to create an OpenACC benchmark suite. Our main aim was to port the SPEC OMP2012 benchmark suite to OpenACC and to create OpenACC versions of several other real-world applications from domains such as data mining, bioinformatics, and image processing. We performed research on benchmarking methodologies and developed OpenACC-based standard application scenarios and workloads for potential use with emerging architectures.

© 2013 HPCTools. All rights reserved. | Website design by HPCTools Group