Abstract: Numerical models are used extensively for simulating complex
physical systems, including fluid flows, astronomical events, weather,
and climate.  Many researchers struggle to bring their model
developments from single-computer, interpreted languages to parallel
high-performance computing (HPC) systems.  There are initiatives to
make interpreted languages such as MATLAB, Python, and Julia feasible
for HPC programming.  In this talk I argue that the computational
overhead is far costlier than any development time saved.  Instead,
doing model development in C and unix tools from the start minimizes
porting headaches between platforms, reduces energy use on all
systems, and ensures reproducibility of results.

## brcon2020 - 2020-05-02

title:   Energy efficient programming in science
author:  Anders Damsgaard (adc)
contact: anders@adamsgaard.dk
         gopher://adamsgaard.dk
         https://adamsgaard.dk

## About me

* 33 y/o Dane
* #bitreich-en since 2019-12-16

present:
* postdoctoral scholar at Stanford University (US)
* lecturer at Aarhus University (DK)

previous:
* Danish Environmental Protection Agency (DK)
* Scripps Institution of Oceanography (US)
* National Oceanic and Atmospheric Administration (NOAA, US)
* Princeton University (US)

#pause

academic interests:
* ice sheets, glaciers, and climate
* earthquake and landslide physics
* modeling of fluid flows and granular materials

## Numerical modeling

* numerical models are used for simulating complex physical systems
* n-body simulations: planetary formation, icebergs, soil/rock mechanics
* fluid flows (CFD): aerodynamics, weather, climate
* domains and physical processes are split up into small, manageable chunks

## From idea to application

1. Construct system of equations
         |
         v
2. Derive numerical algorithm
         |
         v
3. Prototype in high-level language
         |
         v
4. Re-implement in low-level language

## From idea to application

[figure: the same four steps, with steps 1-3 boxed and marked "ok",
 and step 4 marked "ok!"]

## Numerical modeling

task: solve partial differential equations (PDEs) by stepping through time

PDEs: conservation laws; mass, momentum, enthalpy

example: heat diffusion through a homogeneous medium

    ∂T
    -- = k ∇²(T)
    ∂t

domain:

.---------------------------------------------------------------------.
|                                                                     |
|                                  T                                  |
|                                                                     |
'---------------------------------------------------------------------'

## Numerical modeling

domain: discretize into n=7 cells

.---------+---------+---------+---------+---------+---------+---------.
|         |         |         |         |         |         |         |
|   T₁    |   T₂    |   T₃    |   T₄    |   T₅    |   T₆    |   T₇    |
|         |         |         |         |         |         |         |
'---------+---------+---------+---------+---------+---------+---------'

#pause

* Numerical solution with high-level programming:

  MATLAB: sol = pdepe(0, @heat_pde, @heat_initial, @heat_bc, x, t)

  Python: fenics.solve(lhs==rhs, heat_pde, heat_bc)

  Julia:  sol = solve(heat_pde, CVODE_BDF(linear_solver=:Diagonal);
                      rel_tol, abs_tol)

  (the above are not entirely equivalent, but you get the point...)

## Numerical solution: Low-level programming

example BC: outer boundaries at constant temperature (T₁ & T₇)

* computing ∇²(T)

        .---------+---------+---------+---------+---------+---------+---------.
        |         |         |         |         |         |         |         |
  t     |   T₁    |   T₂    |   T₃    |   T₄    |   T₅    |   T₆    |   T₇    |
        |         |         |         |         |         |         |         |
        '----|--\-+----|--\-+-/--|--\-+-/--|--\-+-/--|--\-+-/--|----+-/--|----'
             |   \     |   \ /   |   \ /   |   \ /   |   \ /   |     /   |
             |    \    |    X    |    X    |    X    |    X    |    /    |
             |     \   |   / \   |   / \   |   / \   |   / \   |   /     |
        .----|----+-\--|--/-+-\--|--/-+-\--|--/-+-\--|--/-+-\--|--/-+----|----.
        |         |         |         |         |         |         |         |
  t+dt  |   T₁    |   T₂    |   T₃    |   T₄    |   T₅    |   T₆    |   T₇    |
        |         |         |         |         |         |         |         |
        '---------+---------+---------+---------+---------+---------+---------'
                                                                    |<- dx ->|

## Numerical solution: Low-level programming

* explicit solution with central finite differences:

    for (t=0.0; t<t_end; t+=dt) {
        for (i=1; i<n-1; i++)
            T_new[i] = T[i] + k*(T[i+1] - 2.0*T[i] + T[i-1])/(dx*dx) * dt;
        tmp = T;
        T = T_new;
        T_new = tmp;
    }

#pause

* implicit, iterative solution with central finite differences:

    for (t=0.0; t<t_end; t+=dt) {
        do {
            for (i=1; i<n-1; i++)
                T_new[i] = T[i] + k*(T[i+1] - 2.0*T[i] + T[i-1])/(dx*dx) * dt;
            r_norm_max = 0.0;
            for (i=1; i<n-1; i++)
                if (fabs((T_new[i] - T[i])/T[i]) > r_norm_max)
                    r_norm_max = fabs((T_new[i] - T[i])/T[i]);
            tmp = T;
            T = T_new;
            T_new = tmp;
        } while (r_norm_max > RTOL);
    }

## HPC platforms

* stagnation of CPU clock frequency
* performance through massively parallel deployment (MPI, GPGPU)

* NOAA/DOE NCRC Gaea cluster
    * 2x Cray XC40, "Cray Linux Environment"
    * 4160 nodes, each 32 to 36 cores, 64 GB memory
    * InfiniBand interconnect
    * total: 200 TB memory, 32 PB SSD, 5.25 petaflops (peak)

## A (non-)solution

* high-level, interpreted code with extensive solver library
      -> low-level, compiled, parallel code

* suggested workaround: port interpreted high-level languages to HPC platforms

#pause

NO!

* high computational overhead
* multiplied across many machines
* reduced performance and energy efficiency

## A better way

1. Construct system of equations
         |
         v
2. Derive numerical algorithm
         |
         v
3. Prototype in low-level language
         |
         v
4. Add parallelization for HPC

## Example: Ice-sheet flow with sediment/fluid modeling

[figure: cross-section of an ice sheet flowing (----->) beneath the
 ATMOSPHERE and over SEDIMENT, its margin ending in the OCEAN (<>< ><>)]

* example: granular dynamics and fluid flow simulation for glacier flow
* 90% of Antarctic ice-sheet mass is driven by ice flow over sediment
* need to understand basal sliding of ice in order to project sea-level rise

## Algorithm matters

sphere: git://src.adamsgaard.dk/sphere
    C++, Nvidia CUDA C, cmake, Python, Paraview
    massively parallel, GPGPU
    detailed physics
    20,191 LOC

#pause

    3 months of computing time on an Nvidia Tesla K40 (2880 cores)

#pause

* gained understanding of the mechanics (what matters and what doesn't)
* simplified the physics, algorithm, and numerics

#pause

1d_fd_simple_shear: git://src.adamsgaard.dk/1d_fd_simple_shear
    C99, makefiles, gnuplot
    single threaded
    simple physics
    2,348 LOC

#pause

    real: 0m00.07 s on a laptop from 2012

#pause

...guess which one is more portable?

## Summary

for numerical simulation:

* high-level languages
    * easy
    * produce results quickly
    * do not develop low-level programming skills
    * no insight into the numerical algorithm
    * realistically speaking: no direct way to HPC

* low-level languages
    * require low-level skills
    * save electrical energy
    * go directly to HPC: just sprinkle some MPI on top

## Thanks

20h && /names #bitreich-en