http://www.tokyo.rist.or.jp/jahpf/
JAHPF started in 1996. Its major goals are:
(1) Proposing a set of HPF extensions (HPF/JA) to improve the performance
and applicability of High Performance Fortran,
(2) Accumulating experience in parallelizing with HPF and feeding it back
into the HPF specification and compiler implementations,
(3) Developing HPF benchmark programs, and
(4) Preparing documents for user-friendly parallel programming.
This June, we published the Japanese translation of the HPF 2.0 specification
through Springer-Verlag Tokyo (ISBN 4-431-70822-7). It includes the newly designed
HPF extensions, HPF/JA 1.0, as an appendix. The English version of the HPF/JA
specification is available at http://www.tokyo.rist.or.jp/jahpf/spec/jahpf-e.html.
The extensions are designed to give users more control over sophisticated
parallelizations and communication optimizations. They include parallelization
of loops with complicated reductions, asynchronous communication, user
controllable SHADOW, and communication pattern reuse for irregular remote
data accesses.
JAHPF has also been working to evaluate HPF's applicability
to real-world HPC applications. Five applications have been successfully parallelized
so far through close cooperation between user and vendor members, and four
more are in progress. Compilers capable of handling the HPF/JA extensions
are under development by the vendor members and will be available next
year.
Benchmarking
We have so far developed and tested four benchmark programs drawn from
real-world scientific computation. These programs are trimmed
to be relatively compact (a few thousand lines on average) but still contain the
core part of the calculation.
(1) CARPARRI: Molecular dynamics (Car-Parrinello)
(2) ESPAC2: 2D electro-static plasma (PIC method)
(3) IMPACT3D: 3D CFD code with TVD scheme
(4) CIP2D: 2D CFD code with CIP method
We have also developed an experimental HPF code, called NJR, for
analyzing global climate change on the high-performance computer
'Earth Simulator.' We have evaluated the performance of this code
on NEC Cenju-4, Hitachi SR2201, and Fujitsu VPP5000.
      z=0.0
!HPFJ INDEPENDENT,REDUCTION(MAX:z)
      DO i=1,n
        IF(z<A(i)) z=A(i)
      END DO
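To illustrate why the loop above can be run in parallel as a MAX reduction, the following serial Fortran sketch splits the iteration range into blocks, computes one partial maximum per block, and then combines the partial results. The block count nblk and the names part and zpar are assumptions made for illustration only; nothing here describes how an actual HPF/JA compiler implements the clause.

```fortran
program max_reduction_sketch
  implicit none
  integer, parameter :: n = 16, nblk = 4   ! nblk: hypothetical processor count
  real :: a(n), part(nblk), z, zpar
  integer :: i, p, lo, hi

  ! Fill A with arbitrary non-negative test data.
  do i = 1, n
     a(i) = real(mod(7*i, 11))
  end do

  ! Serial form, as in the example loop.
  z = 0.0
  do i = 1, n
     if (z < a(i)) z = a(i)
  end do

  ! Blocked form: each block stands in for one processor.
  do p = 1, nblk
     lo = (p - 1)*(n/nblk) + 1
     hi = p*(n/nblk)
     part(p) = 0.0
     do i = lo, hi
        if (part(p) < a(i)) part(p) = a(i)
     end do
  end do

  ! Combine step: the maximum of the partial maxima.
  zpar = maxval(part)

  if (zpar /= z) error stop "partial maxima disagree with serial result"
  print *, "max =", zpar
end program max_reduction_sketch
```

Because MAX is associative and commutative, the order in which the partial results are combined does not affect the answer, which is what allows the iterations to be treated as independent.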
Asynchronous communication
The ASYNCHRONOUS directive specifies that communication may be overlapped
with computation, in a way similar to asynchronous I/O in
the HPF 2.0 approved extensions.
Example: The communication overhead of redistribution is hidden.
!HPFJ ASYNCHRONOUS(ID=q1) BEGIN
!HPF$ REDISTRIBUTE A(*,CYCLIC)
!HPFJ END ASYNCHRONOUS
      ...communication behind the computation
!HPFJ ASYNCWAIT(ID=q1)
Explicit access to SHADOW and local data
SHADOW is a set of additional array elements that extends the array
section on each processor. Compilers use it as a buffer for the corresponding
elements on the neighboring processors, but in the original HPF users cannot
access it. HPF/JA allows users to access the shadow explicitly.
The REFLECT directive specifies that each shadow element should be
updated with the value of the corresponding data object stored
on its owner. (Fig. 1)
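As a rough illustration of the shadow update that REFLECT requests, the serial Fortran sketch below simulates a one-dimensional block distribution with a shadow of width 1 on each side. The processor count np, the block size m, and the array loc are hypothetical names chosen for this sketch; it mimics only the data movement and uses no HPF/JA directive syntax.

```fortran
program reflect_sketch
  implicit none
  integer, parameter :: np = 2, m = 4   ! hypothetical: 2 processors, 4 elements each
  real :: loc(0:m+1, np)                ! elements 0 and m+1 are the shadow
  integer :: p, i

  ! Each simulated processor p owns global elements (p-1)*m+1 .. p*m.
  loc = -1.0                            ! shadow elements start out stale
  do p = 1, np
     do i = 1, m
        loc(i, p) = real((p - 1)*m + i)
     end do
  end do

  ! The update REFLECT asks for: copy each owner's boundary value
  ! into the neighboring processor's shadow element.
  do p = 1, np
     if (p > 1)  loc(0,   p) = loc(m, p - 1)
     if (p < np) loc(m+1, p) = loc(1, p + 1)
  end do

  ! After the update, a nearest-neighbor stencil over the owned
  ! elements can read everything it needs from local storage.
  if (loc(m+1, 1) /= 5.0) error stop "right shadow of processor 1 not updated"
  if (loc(0, 2)   /= 4.0) error stop "left shadow of processor 2 not updated"
  print *, loc(:, 1)
  print *, loc(:, 2)
end program reflect_sketch
```

Shadow elements at the outer ends of the array have no neighbor and stay at their initial value in this sketch.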
When the EXT_HOME clause is specified in an ON directive instead of the
HOME clause, the set of owner processors is extended to include the processors
that own the SHADOW areas. The LOCAL clause guarantees that the access
requires no interprocessor communication. (Fig. 2)
The LOCAL directive/clause eliminates redundant communication for
the shadow and other data objects. Whereas the RESIDENT directive/clause
(in the original HPF) does not rule out communication inside the
active processor set, the LOCAL directive/clause asserts that no communication
is needed at all.
Communication schedule reuse for irregular array accesses
The INDEX_REUSE directive asserts that access patterns for indirectly
accessed arrays remain the same through consecutive invocations of loops.
The directive can be placed before independent DO or FORALL loops to assert
that indices of the specified array have the same values as those for the
previous loop invocation.
Example: The access pattern for the specified arrays is saved
on the first execution and may be reused later.
!HPFJ INDEX_REUSE(frag) A,B
!HPF$ INDEPENDENT
      DO i=1,n
        . . . =A(lx(i))+B(ly(i))
      END DO
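One common way to exploit an assertion like INDEX_REUSE is an inspector/executor scheme: the access pattern is analyzed once and the resulting schedule is reused on later sweeps. The source does not state that HPF/JA compilers work this way, so the serial Fortran sketch below is an assumption-laden illustration. The names sched, have_sched, buf and inspections are hypothetical, and only one indirection array (lx) is used to keep it short.

```fortran
program schedule_reuse_sketch
  implicit none
  integer, parameter :: n = 6, m = 10
  real :: a(m), buf(n)
  integer :: lx(n), sched(n)
  logical :: have_sched
  integer :: i, step, inspections

  lx = [3, 9, 1, 9, 4, 7]     ! indirection array, unchanged between sweeps
  have_sched = .false.
  inspections = 0

  do step = 1, 3
     ! The data change on every sweep, but the indices do not.
     do i = 1, m
        a(i) = real(step*100 + i)
     end do

     ! Inspector: runs only on the first sweep. It stands in for
     ! working out which (possibly remote) elements must be fetched.
     if (.not. have_sched) then
        sched = lx
        have_sched = .true.
        inspections = inspections + 1
     end if

     ! Executor: gather through the saved schedule.
     do i = 1, n
        buf(i) = a(sched(i))
     end do

     ! The gathered values must match direct indirect access.
     if (any(buf /= a(lx))) error stop "schedule gave wrong elements"
  end do

  if (inspections /= 1) error stop "schedule was rebuilt"
  print *, "sweeps: 3, inspections:", inspections
end program schedule_reuse_sketch
```

The saving comes from skipping the inspector on the second and third sweeps; this is only valid because lx keeps the same values, which is exactly what the directive asserts.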