Abstract:
Numerical models are used extensively for simulating complex physical
systems including fluid flows, astronomical events, weather, and
climate.  Many researchers struggle to bring their model developments
from single-computer, interpreted languages to parallel high-performance
computing (HPC) systems.  There are initiatives to make interpreted
languages such as MATLAB, Python, and Julia feasible for HPC
programming.  In this talk I argue that the computational overhead
is far costlier than any potential development time saved.  Instead,
doing model development in C and unix tools from the start minimizes
porting headaches between platforms, reduces energy use on all
systems, and ensures reproducibility of results.


## brcon2020 - 2020-05-02

        title: Energy efficient programming in science

       author: Anders Damsgaard (adc)

      contact: anders@adamsgaard.dk
               gopher://adamsgaard.dk
               https://adamsgaard.dk


## About me

* 33 y/o Dane
* #bitreich-en since 2019-12-16

present:

* postdoctoral scholar at Stanford University (US)
* lecturer at Aarhus University (DK)

previous:

* Danish Environmental Protection Agency (DK)
* Scripps Institution of Oceanography (US)
* National Oceanic and Atmospheric Administration (NOAA, US)
* Princeton University (US)

#pause
academic interests:

* ice sheets, glaciers, and climate
* earthquake and landslide physics
* modeling of fluid flows and granular materials


## Numerical modeling

* numerical models used for simulating complex physical systems

  * n-body simulations: planetary formation, icebergs, soil/rock mechanics

  * fluid flows (CFD): aerodynamics, weather, climate


* domains and physical processes split up into small, manageable chunks


## From idea to application


    1. Construct system of equations

      |
      v

    2. Derivation of numerical algorithm

      |
      v

    3. Prototype in high-level language

      |
      v

    4. Re-implementation in low-level language


## From idea to application

 ,-----------------------------------------------.
 |  1. Construct system of equations             |
 |                                               |
 |    |                                          |
 |    v                                          |          _
 |                                               |     ___ | | __
 |  2. Derivation of numerical algorithm         |    / _ \| |/ /
 |                                               |   | (_) |   <
 |    |                                          |    \___/|_|\_\
 |    v                                          |
 |                                               |
 |  3. Prototype in high-level language          |
 `-----------------------------------------------'
      |                                              _       _
      v                                             | | ___ | | __
                                                    | |/ _ \| |/ /
    4. Re-implementation in low-level language      |_| (_) |   <
                                                    (_)\___/|_|\_\


## Numerical modeling

      task: Solve partial differential equations (PDEs) by stepping through time
            PDEs: conservation laws; mass, momentum, enthalpy

   example: Heat diffusion through homogenous medium

            ∂T
            -- = -k ∇²(T)
            ∂t

    domain:

       .---------------------------------------------------------------------.
       |                                                                     |
       |                                  T                                  |
       |                                                                     |
       '---------------------------------------------------------------------'

## Numerical modeling

    domain: discritize into n=7 cells

       .---------+---------+---------+---------+---------+---------+---------.
       |         |         |         |         |         |         |         |
       |    T₁   |    T₂   |    T₃   |    T₄   |    T₅   |    T₆   |    T₇   |
       |         |         |         |         |         |         |         |
       '---------+---------+---------+---------+---------+---------+---------'

#pause
* Numerical solution with high-level programming:

    MATLAB: sol = pdepe(0, @heat_pde, @heat_initial, @heat_bc, x, t)

    Python: fenics.solve(lhs==rhs, heat_pde, heat_bc)

     Julia: sol = solve(heat_pde, CVODE_BPF(linear_solver=:Diagonal); rel_tol, abs_tol)

        (the above are not entirely equivalent, but you get the point...)


## Numerical solution: Low-level programming

    example BC: outer boundaries constant temperature (T₁ & T₇)

* computing ∇²(T)

       .---------+---------+---------+---------+---------+---------+---------.
       |         |         |         |         |         |         |         |
  t    |    T₁   |    T₂   |    T₃   |    T₄   |    T₅   |    T₆   |    T₇   |
       |         |         |         |         |         |         |         |
       '----|--\-+----|--\-+-/--|--\-+-/--|--\-+-/--|--\-+-/--|----+-/--|----'
            |   \     |   \ /   |   \ /   |   \ /   |   \ /   |     /   |  
            |    \    |    /    |    /    |    /    |    /    |    /    |   
            |     \   |   / \   |   / \   |   / \   |   / \   |   /     |    
       .----|----+-\--|--/-+-\--|--/-+-\--|--/-+-\--|--/-+-\--|--/-+----|----.
       |         |         |         |         |         |         |         |
t + dt |    T₁   |    T₂   |    T₃   |    T₄   |    T₅   |    T₆   |    T₇   |
       |         |         |         |         |         |         |         |
       '---------+---------+---------+---------+---------+---------+---------'
       |<- dx  ->|


## Numerical solution: Low-level programming

* explicit solution with central finite differences:

        for (t=0.0; t<t_end; t+=dt) {
            for (i=1; i<n-1; i++)
                T_new[i] = T[i] - k*(T[i+1] - 2.0*T[i] + T[i-1])/(dx*dx) * dt;
            tmp = T;
            T = T_new;
            T_new = tmp;
        }
#pause

* implicit, iterative solution with central finite differences:

        for (t=0.0; t<t_end; t+=dt) {
            do {
                for (i=1; i<n-1; i++) {
                    T_new[i] = T[i] - k*(T[i+1] - 2.0*T[i] + T[i-1])/(dx*dx) * dt;
                r_norm_max = 0.0;
                for (i=1; i<n-1; i++)
                    if (fabs((T_new[i] - T[i])/T[i]) > r_norm_max)
                        r_norm_max = fabs((T_new[i] - T[i])/T[i]);
                tmp = T;
                T = T_new;
                T_new = tmp;
            } while (r_norm_max < RTOL);
        }


## HPC platforms

* Stagnation of CPU clock frequency

* Performance through massively parallel deployment (MPI, GPGPU)

    * NOAA/DOE NCRC Gaea cluster
        * 2x Cray XC40, "Cray Linux Environment"
        * 4160 nodes, each 32 to 36 cores, 64 GB memory
        * infiniband
        * total: 200 TB memory, 32 PB SSD, 5.25 petaflops (peak)

## A (non-)solution

* high-level, interpreted code with extensive solver library -> low-level, compiled, parallel code

* suggested workaround: port interpreted high-level languages to HPC platforms

#pause

NO!

* high computational overhead 
* many machines
* reduced performance and energy efficiency


## A better way

    1. Construct system of equations

      |
      v

    2. Derivation of numerical algorithm

      |
      v

    3. Prototype in low-level language

      |
      v

    4. Add parallelization for HPC


## Example: Ice-sheet flow with sediment/fluid modeling


    --------------------------._____                   ATMOSPHERE
                ----->              ```--..
        ICE                                 `-._________________      __
                ----->                             ------>      |vvvv|  |vvv
                                               _________________|    |__|
                ----->                      ,'
                                          ,'    <><      OCEAN
                ---->                    /                        ><>
    ____________________________________/___________________________________
      SEDIMENT  -->
    ________________________________________________________________________

* example: granular dynamics and fluid flow simulation for glacier flow

* 90% of Antarctic ice sheet mass driven by ice flow over sediment

* need to understand ice-basal sliding in order to project sea-level rise


## Algorithm matters

                sphere: git://src.adamsgaard.dk/sphere
                        C++, Nvidia C, cmake, Python, Paraview
                        massively parallel, GPGPU
                        detailed physics
                        20,191 LOC
#pause
                        3 month computing time on nvidia tesla k40 (2880 cores)

#pause
* gained understanding of the mechanics (what matters and what doesn't)
* simplify the physics, algorithm, and numerics

#pause
    1d_fd_simple_shear: git://src.adamsgaard.dk/1d_fd_simple_shear
                        C99, makefiles, gnuplot
                        single threaded
                        simple physics
                        2,348 LOC
#pause
                        real: 0m00.07 s on laptop from 2012

#pause
                        ...guess which one is more portable?

## Summary

for numerical simulation:

* high-level languages
    * easy
    * produces results quickly
    * does not develop low-level programming skills
    * no insight into numerical algorithm
    * realistically speaking: no direct way to HPC

* low-level languages
    * require low-level skills
    * saves electrical energy
    * directly to HPC, just sprinkle some MPI on top


## Thanks

    20h && /names #bitreich-en