Single-core optimization: vectorization, effective use of pipelines and instruction-level parallelism, and accessing and using hardware performance counters.
Advanced features and techniques of both MPI and OpenMP will be covered. MPI topics include the effective use of MPI on large-scale systems with non-blocking communication and complex topologies, remote memory access between nodes, shared memory within a node, and massively parallel I/O. OpenMP topics include advanced affinity control, task decomposition, vectorization, and elements of heterogeneous acceleration.