diff --git a/docs/source/accelerators.rst b/docs/source/accelerators.rst new file mode 100644 index 00000000..e60bfff2 --- /dev/null +++ b/docs/source/accelerators.rst @@ -0,0 +1,109 @@ +.. _accelerators: + +************ +Accelerators +************ + +The most computationally intensive parts of gprMax, which are the FDTD solver loops, have been parallelised using: + +1. `OpenMP `_, which supports multi-platform shared memory multiprocessing. +2. `NVIDIA CUDA `_ for NVIDIA GPUs. +3. `OpenCL `_ for a wider range of CPU and GPU hardware. + +Additionally, the Message Passing Interface (MPI) has been utilised to implement a simple task farm that can be used to distribute a series of models as independent tasks. This can be useful in many GPR simulations where a B-scan (composed of multiple A-scans) is required. Each A-scan can be task-farmed as an independent model. Within each independent model, the OpenMP or CUDA accelerators described above can be used for parallelism. Overall this creates what is known as a mixed-mode OpenMP/MPI or CUDA/MPI job. + +Some of these accelerators and frameworks require additional software to be installed. The guidance below explains how to do that and gives examples of usage. + +OpenMP +====== + +No additional software is required to use OpenMP, as it is part of the standard installation of gprMax. + +By default, gprMax will try to determine and use the maximum number of OpenMP threads (usually the number of physical CPU cores) available on your machine. You can override this behaviour in two ways: firstly, gprMax will check to see if the ``#cpu_threads`` command is present in your input file; if not, gprMax will check to see if the environment variable ``OMP_NUM_THREADS`` is set. This can be useful if you are running gprMax in a High-Performance Computing (HPC) environment where you might not want to use all of the available CPU cores. + +MPI +=== + +By default, the MPI task farm functionality is turned off. It can be used with the ``-mpi`` command line option, which specifies the total number of MPI tasks, i.e. master + workers, for the MPI task farm. This option is most usefully combined with ``-n`` to allow individual models to be farmed out using an MPI task farm, e.g. to create a B-scan with 60 traces and use MPI to farm out each trace: ``(gprMax)$ python -m gprMax examples/cylinder_Bscan_2D.in -n 60 -mpi 61``. + +Software required +----------------- + +The following steps provide guidance on how to install the extra components to allow the MPI task farm functionality with gprMax: + +1. Install MPI on your system. + +Linux/macOS +^^^^^^^^^^^ +It is recommended to use `OpenMPI `_. + +Microsoft Windows +^^^^^^^^^^^^^^^^^ +It is recommended to use `Microsoft MPI `_. Download and install both the .exe and .msi files. + +2. Install the ``mpi4py`` Python module. Open a Terminal (Linux/macOS) or Command Prompt (Windows), navigate into the top-level gprMax directory, and if it is not already active, activate the gprMax conda environment :code:`conda activate gprMax`. Run :code:`pip install mpi4py`. + + +CUDA +==== + +Software required +----------------- + +The following steps provide guidance on how to install the extra components to allow gprMax to run on your NVIDIA GPU: + +1. Install the `NVIDIA CUDA Toolkit `_. You can follow the Installation Guides in the `NVIDIA CUDA Toolkit Documentation `_. You must ensure the version of CUDA you install is compatible with the compiler you are using.
This information can usually be found in a table in the CUDA Installation Guide under System Requirements. +2. You may need to add the location of the CUDA compiler (:code:`nvcc`) to your user path environment variable, e.g. for Windows :code:`C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin` or Linux/macOS :code:`/Developer/NVIDIA/CUDA-10.0/bin`. +3. Install the ``pycuda`` Python module. Open a Terminal (Linux/macOS) or Command Prompt (Windows), navigate into the top-level gprMax directory, and if it is not already active, activate the gprMax conda environment :code:`conda activate gprMax`. Run :code:`pip install pycuda`. + +Example +------- + +Open a Terminal (Linux/macOS) or Command Prompt (Windows), navigate into the top-level gprMax directory, and if it is not already active, activate the gprMax conda environment :code:`conda activate gprMax`. + +Run one of the test models: + +.. code-block:: none + + (gprMax)$ python -m gprMax examples/cylinder_Ascan_2D.in -gpu + +.. note:: + + If you want to select a specific GPU card on your system, you can specify an integer after the :code:`-gpu` flag. The integer should be the NVIDIA CUDA device ID for a specific GPU card. If it is not specified, it defaults to device ID 0. + + +OpenCL +====== + +Software required +----------------- + +The following steps provide guidance on how to install the extra components to allow gprMax to use OpenCL: + +TODO: Add OpenCL instructions +***************************** + +Example +------- + + + + + +CUDA/MPI +======== + +The Message Passing Interface (MPI) has been utilised to implement a simple task farm that can be used to distribute a series of models as independent tasks. This is described in more detail in the :ref:`HPC section `. MPI can be combined with the GPU functionality to allow a series of models to be distributed to multiple GPUs on the same machine (node). + +Example +------- + +For example, to run a B-scan that contains 60 A-scans (traces) on a system with 4 GPUs: + +.. code-block:: none + + (gprMax)$ python -m gprMax examples/cylinder_Bscan_2D.in -n 60 -mpi 5 -gpu 0 1 2 3 + +.. note:: + + The argument given with ``-mpi`` is the number of MPI tasks, i.e. master + workers, for the MPI task farm. So in this case: 1 master (CPU) and 4 workers (GPU cards). The integers given with the ``-gpu`` argument are the NVIDIA CUDA device IDs for the specific GPU cards to be used. diff --git a/docs/source/gpu.rst b/docs/source/gpu.rst deleted file mode 100644 index 0d321e49..00000000 --- a/docs/source/gpu.rst +++ /dev/null @@ -1,55 +0,0 @@ -.. _gpu: - -************ -Accelerators -************ - -The most computationally intensive parts of gprMax, which are the FDTD solver loops, can be accelarated using General-purpose computing on graphics processing units (GPGPU). There are two different frameworks that can be used depending on the hardware you have available: - -1. For `NVIDIA CUDA-Enabled GPUs `_, the NVIDIA CUDA programming environment can be utilised. -2. For a wider range of CPU and GPU hardware, `OpenCL `_ can be utilised. - -Both frameworks require some additional software components to be installed. - -CUDA -==== - -The following steps provide guidance on how to install the extra components to allow gprMax to run on your NVIDIA GPU: - -1. Install the `NVIDIA CUDA Toolkit `_. You can follow the Installation Guides in the `NVIDIA CUDA Toolkit Documentation `_ You must ensure the version of CUDA you install is compatible with the compiler you are using.
This information can usually be found in a table in the CUDA Installation Guide under System Requirements. -2. You may need to add the location of the CUDA compiler (:code:`nvcc`) to your user path environment variable, e.g. for Windows :code:`C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin` or Linux/macOS :code:`/Developer/NVIDIA/CUDA-10.0/bin`. -3. Install the pycuda Python module. Open a Terminal (Linux/macOS) or Command Prompt (Windows), navigate into the top-level gprMax directory, and if it is not already active, activate the gprMax conda environment :code:`conda activate gprMax`. Run :code:`pip install pycuda` - -OpenCL -====== - -TODO - -Running gprMax using GPU(s) -=========================== - -Open a Terminal (Linux/macOS) or Command Prompt (Windows), navigate into the top-level gprMax directory, and if it is not already active, activate the gprMax conda environment :code:`conda activate gprMax` - -Run one of the test models: - -.. code-block:: none - - (gprMax)$ python -m gprMax examples/cylinder_Ascan_2D.in -gpu - -.. note:: - - If you want to select a specific GPU card on your system, you can specify an integer after the :code:`-gpu` flag. The integer should be the NVIDIA CUDA device ID for a specific GPU card. If it is not specified it defaults to device ID 0. - - -Combining MPI and CUDA usage ----------------------------- - -Message Passing Interface (MPI) has been utilised to implement a simple task farm that can be used to distribute a series of models as independent tasks. This is described in more detail in the :ref:`OpenMP, MPI, HPC section `. MPI can be combined with the GPU functionality to allow a series models to be distributed to multiple GPUs on the same machine (node). For example, to run a B-scan that contains 60 A-scans (traces) on a system with 4 GPUs: - -.. code-block:: none - - (gprMax)$ python -m gprMax examples/cylinder_Bscan_2D.in -n 60 -mpi 5 -gpu 0 1 2 3 - -.. note:: - - The argument given with `-mpi` is number of MPI tasks, i.e. master + workers, for MPI task farm. So in this case, 1 master (CPU) and 4 workers (GPU cards). The integers given with the `-gpu` argument are the NVIDIA CUDA device IDs for the specific GPU cards to be used. diff --git a/docs/source/openmp_mpi.rst b/docs/source/hpc.rst similarity index 56% rename from docs/source/openmp_mpi.rst rename to docs/source/hpc.rst index b3a43342..fe6716fa 100644 --- a/docs/source/openmp_mpi.rst +++ b/docs/source/hpc.rst @@ -1,50 +1,14 @@ -.. _openmp-mpi: +.. _hpc: -******************** -OpenMP, MPI, and HPC -******************** - -OpenMP -====== - -The most computationally intensive parts of gprMax, which are the FDTD solver loops, have been parallelised using `OpenMP `_ which supports multi-platform shared memory multiprocessing. - -By default gprMax will try to determine and use the maximum number of OpenMP threads (usually the number of physical CPU cores) available on your machine. You can override this behaviour in two ways: firstly, gprMax will check to see if the ``#cpu_threads`` command is present in your input file; if not, gprMax will check to see if the environment variable ``OMP_NUM_THREADS`` is set. This can be useful if you are running gprMax in a High-Performance Computing (HPC) environment where you might not want to use all of the available CPU cores. - -MPI -=== - -The Message Passing Interface (MPI) has been utilised to implement a simple task farm that can be used to distribute a series of models as independent tasks. 
This can be useful in many GPR simulations where a B-scan (composed of multiple A-scans) is required. Each A-scan can be task-farmed as a independent model. Within each independent model OpenMP threading will continue to be used (as described above). Overall this creates what is know as a mixed mode OpenMP/MPI job. - -By default the MPI task farm functionality is turned off. It can be used with the ``-mpi`` command line option, which specifies the total number of MPI tasks, i.e. master + workers, for the MPI task farm. This option is most usefully combined with ``-n`` to allow individual models to be farmed out using a MPI task farm, e.g. to create a B-scan with 60 traces and use MPI to farm out each trace: ``(gprMax)$ python -m gprMax examples/cylinder_Bscan_2D.in -n 60 -mpi 61``. - -Extra installation steps for MPI task farm usage ------------------------------------------------- - -The following steps provide guidance on how to install the extra components to allow the MPI task farm functionality with gprMax: - -1. Install MPI on your system. - -Linux/macOS -^^^^^^^^^^^ -It is recommended to use `OpenMPI `_. - -Microsoft Windows -^^^^^^^^^^^^^^^^^ -It is recommended to use `Microsoft MPI `_. Download and install both the .exe and .msi files. - -2. Install the ``mpi4py`` Python module. Open a Terminal (Linux/macOS) or Command Prompt (Windows), navigate into the top-level gprMax directory, and if it is not already active, activate the gprMax conda environment :code:`conda activate gprMax`. Run :code:`pip install mpi4py` - -.. _hpc_script_examples: - -HPC job script examples -======================= +******************************** +High-performance computing (HPC) +******************************** HPC environments usually require jobs to be submitted to a queue using a job script. The following are examples of job scripts for a HPC environment that uses `Open Grid Scheduler/Grid Engine `_, and are intended as general guidance to help you get started. Using gprMax in an HPC environment is heavily dependent on the configuration of your specific HPC/cluster, e.g. the names of parallel environments (``-pe``) and compiler modules will depend on how they were defined by your system administrator. OpenMP example --------------- +============== :download:`gprmax_omp.sh <../../toolboxes/Utilities/HPC/gprmax_omp.sh>` @@ -58,7 +22,7 @@ In this example 10 models will be run one after another on a single node of the OpenMP/MPI example ------------------- +================== :download:`gprmax_omp_mpi.sh <../../toolboxes/Utilities/HPC/gprmax_omp_mpi.sh>` @@ -76,7 +40,7 @@ The ``NSLOTS`` variable which is required to set the total number of slots/cores Job array example ------------------ +================= :download:`gprmax_omp_jobarray.sh <../../toolboxes/Utilities/HPC/gprmax_omp_jobarray.sh>` @@ -88,4 +52,4 @@ Here is an example of a job script for running models, e.g. A-scans to make a B- The ``-t`` tells Grid Engine that we are using a job array followed by a range of integers which will be the IDs for each individual task (model). Task IDs must start from 1, and the total number of tasks in the range should correspond to the number of models you want to run, i.e. the integer with the ``-n`` flag passed to gprMax. The ``-task`` flag is passed to gprMax to tell it we are using a job array, along with the specific number of the task (model) with the environment variable ``$SGE_TASK_ID``. 
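+As a rough illustration only, a job array submit script might take the shape sketched below. The downloadable ``gprmax_omp_jobarray.sh`` script above is the authoritative version; the parallel environment name (``sharedmem``), the slot count, and the use of ``conda activate`` and ``NSLOTS`` inside the script are placeholders that depend entirely on how your cluster is configured:
+
+.. code-block:: none
+
+    #!/bin/bash
+    # Rough sketch of a Grid Engine job array script for a 60-trace B-scan:
+    # run the job from the current working directory.
+    #$ -cwd
+    # Request a shared-memory parallel environment for the OpenMP threads of each
+    # task (the PE name and slot count here are cluster-specific placeholders).
+    #$ -pe sharedmem 16
+    # Job array with task IDs 1-60, matching the -n value passed to gprMax.
+    #$ -t 1-60
+
+    # Activate the gprMax conda environment and limit OpenMP to the allocated slots.
+    conda activate gprMax
+    export OMP_NUM_THREADS=$NSLOTS
+
+    # $SGE_TASK_ID selects which of the 60 models this particular task runs.
+    python -m gprMax examples/cylinder_Bscan_2D.in -n 60 -task $SGE_TASK_ID
+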
-A job array means that exactly the same submit script is going to be run multiple times, the only difference between each run is the environment variable ``$SGE_TASK_ID``. +A job array means that exactly the same submit script is going to be run multiple times; the only difference between each run is the environment variable ``$SGE_TASK_ID``. \ No newline at end of file diff --git a/docs/source/index.rst b/docs/source/index.rst index aee62539..6f260495 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -46,8 +46,8 @@ gprMax User Guide :maxdepth: 2 :caption: Performance - openmp_mpi - gpu + accelerators + hpc benchmarking .. toctree:: diff --git a/docs/source/output.rst b/docs/source/output.rst index 60eaa9bd..14710b7a 100644 --- a/docs/source/output.rst +++ b/docs/source/output.rst @@ -1,29 +1,29 @@ .. _output: -*********** -Output data -*********** +************* +Model outputs +************* Field(s) output =============== -gprMax produces an output file that has the same name as the input file but with ``.h5`` appended. The output file uses the widely-supported `HDF5 `_ format which was designed to store and organize large amounts of numerical data. There are a number of free tools available to read HDF5 files. Also MATLAB has high- and low-level functions for reading and writing HDF5 files, i.e. ``h5info`` and ``h5disp`` are useful for returning information and displaying the contents of HDF5 files respectively. gprMax includes some Python modules (in the ``tools`` package) to help you view output data. These are documented in the :ref:`tools section `. +gprMax produces an output file that primarily contains time history data for electromagnetic field outputs (receivers) in the model. The output file has the same name as the input file but with ``.h5`` appended, and therefore uses the widely-supported `HDF5 `_ format, which was designed to store and organize large amounts of numerical data. There are a number of free tools available to read HDF5 files. MATLAB also has high- and low-level functions for reading and writing HDF5 files, e.g. ``h5info`` and ``h5disp`` are useful for returning information about and displaying the contents of HDF5 files, respectively. gprMax includes some Python modules (in the ``toolboxes/plotting`` package) to help you view output data. These are documented in the :ref:`plotting toolbox section `. File structure -------------- The output file has the following HDF5 attributes at the root (``/``): -* ``gprMax`` is the version number of gprMax used to create the output -* ``Title`` is the title of the model -* ``Iterations`` is the number of iterations for the time window of the model -* ``nx_ny_nz`` is a tuple containing the number of cells in each direction of the model -* ``dx_dy_dz`` is a tuple containing the spatial discretisation, i.e. :math:`\Delta x`, :math:`\Delta y`, :math:`\Delta z` -* ``dt`` is the time step of the model, i.e. :math:`\Delta t` -* ``srcsteps`` is the spatial increment used to move all sources between model runs. -* ``rxsteps`` is the spatial increment used to move all receivers between model runs. -* ``nsrc`` is the total number of sources in the model. -* ``nrx`` is the total number of receievers in the model.
+- ``gprMax`` is the version number of gprMax used to create the output +- ``Title`` is the title of the model +- ``Iterations`` is the number of iterations for the time window of the model +- ``nx_ny_nz`` is a tuple containing the number of cells in each direction of the model +- ``dx_dy_dz`` is a tuple containing the spatial discretisation, i.e. :math:`\Delta x`, :math:`\Delta y`, :math:`\Delta z` +- ``dt`` is the time step of the model, i.e. :math:`\Delta t` +- ``srcsteps`` is the spatial increment used to move all sources between model runs. +- ``rxsteps`` is the spatial increment used to move all receivers between model runs. +- ``nsrc`` is the total number of sources in the model. +- ``nrx`` is the total number of receivers in the model. The output file contains HDF5 groups for sources (``srcs``), transmission lines (``tls``), and receivers (``rxs``). Within each group are further groups that correspond to individual sources/transmission lines/receivers, e.g. ``src1``, ``src2`` etc... @@ -103,7 +103,10 @@ Within each individual ``tl`` group are the following datasets: Snapshots --------- -Snapshot files use the open source `Visualization ToolKit (VTK) `_ format which can be viewed in many free readers, such as `Paraview `_. Paraview is an open-source, multi-platform data analysis and visualization application. It is available for Linux, macOS, and Windows. The ``#snapshot:`` command produces an ImageData (.vti) snapshot file containing electric and magnetic field data and current data for each time instance requested. +Snapshot files contain the electromagnetic field values within a specified volume of the model domain at a specified point in time during the simulation. By default, snapshot files use the open source `Visualization ToolKit (VTK) `_ format, which can be viewed in many free readers, such as `Paraview `_. Paraview is an open-source, multi-platform data analysis and visualization application. It is available for Linux, macOS, and Windows. You can optionally output snapshot files in HDF5 format. + +TODO: UPDATE Example +******************** .. tip:: You can take advantage of Python scripting to easily create a series of snapshots. For example, to create 30 snapshots starting at time 0.1ns until 3ns in intervals of 0.1ns, use the following code snippet in your input file. Replace ``xs, ys, zs, xf, yf, zf, dx, dy, dz`` accordingly. @@ -137,6 +140,9 @@ Geometry output Geometry files use the open source `Visualization ToolKit (VTK) `_ format which can be viewed in many free readers, such as `Paraview `_. Paraview is an open-source, multi-platform data analysis and visualization application. It is available for Linux, Mac OS X, and Windows. +TODO: UPDATE file formats +************************* + The ``#geometry_view:`` command produces either ImageData (.vti) for a per-cell geometry view, or PolygonalData (.vtp) for a per-cell-edge geometry view. The per-cell geometry views also show the location of the PML regions and any sources and receivers in the model. The following are steps to get started with viewing geometry files in Paraview: .. _pv_toolbar: