Docs work

This commit is contained in:
craig-warren
2023-03-11 11:13:06 -07:00
Parent d90a872298
Commit 2dd5eda28d
6 files changed, with 50 additions and 60 deletions

View file

@@ -1,13 +1,18 @@
.. _gpu:
*****
GPGPU
*****
************
Accelerators
************
The most computationally intensive parts of gprMax, which are the FDTD solver loops, can optionally be executed using General-purpose computing on graphics processing units (GPGPU). This has been achieved through use of the NVIDIA CUDA programming environment, therefore a `NVIDIA CUDA-Enabled GPU <https://developer.nvidia.com/cuda-gpus>`_ is required to take advantage of the GPU-based solver.
The most computationally intensive parts of gprMax, which are the FDTD solver loops, can be accelerated using General-purpose computing on graphics processing units (GPGPU). There are two different frameworks that can be used depending on the hardware you have available:
Extra installation steps for GPU usage
======================================
1. For `NVIDIA CUDA-Enabled GPUs <https://developer.nvidia.com/cuda-gpus>`_, the NVIDIA CUDA programming environment can be utilised.
2. For a wider range of CPU and GPU hardware, `OpenCL <https://www.khronos.org/api/opencl>`_ can be utilised.
Both frameworks require some additional software components to be installed.
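If you are unsure which framework your hardware supports, a quick check (a suggestion only, these utilities are not part of gprMax) is to list the available devices from a terminal:

.. code-block:: none

    $ nvidia-smi    # lists NVIDIA GPUs if the NVIDIA driver is installed
    $ clinfo        # lists OpenCL platforms/devices if an OpenCL runtime is installed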
CUDA
====
The following steps provide guidance on how to install the extra components to allow gprMax to run on your NVIDIA GPU:
@@ -15,6 +20,11 @@ The following steps provide guidance on how to install the extra components to a
2. You may need to add the location of the CUDA compiler (:code:`nvcc`) to your user path environment variable, e.g. for Windows :code:`C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin` or Linux/macOS :code:`/Developer/NVIDIA/CUDA-10.0/bin`.
3. Install the pycuda Python module. Open a Terminal (Linux/macOS) or Command Prompt (Windows), navigate into the top-level gprMax directory, and if it is not already active, activate the gprMax conda environment :code:`conda activate gprMax`. Run :code:`pip install pycuda` (a combined sketch of steps 2 and 3 is shown below).
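As a rough, combined sketch of steps 2 and 3 on Linux/macOS (the CUDA path is the example location from step 2 and will vary between installations):

.. code-block:: none

    $ export PATH=$PATH:/Developer/NVIDIA/CUDA-10.0/bin
    $ cd gprMax
    $ conda activate gprMax
    (gprMax)$ pip install pycuda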
OpenCL
======
TODO
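Detailed instructions are still to be written (TODO above). As an assumption only, and by analogy with the CUDA steps, OpenCL support from Python is commonly provided via the pyopencl module, so the extra installation step would likely be:

.. code-block:: none

    (gprMax)$ pip install pyopencl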
Running gprMax using GPU(s)
===========================
@@ -24,21 +34,21 @@ Run one of the test models:
.. code-block:: none
(gprMax)$ python -m gprMax user_models/cylinder_Ascan_2D.in -gpu
(gprMax)$ python -m gprMax examples/cylinder_Ascan_2D.in -gpu
.. note::
If you want to select a specific GPU card on your system, you can specify an integer after the :code:`-gpu` flag. The integer should be the NVIDIA CUDA device ID for a specific GPU card. If it is not specified, it defaults to device ID 0.
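For example, to run one of the test models on the second CUDA device (device ID 1):

.. code-block:: none

    (gprMax)$ python -m gprMax examples/cylinder_Ascan_2D.in -gpu 1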
Combining MPI and GPU usage
---------------------------
Combining MPI and CUDA usage
----------------------------
Message Passing Interface (MPI) has been utilised to implement a simple task farm that can be used to distribute a series of models as independent tasks. This is described in more detail in the :ref:`OpenMP, MPI, HPC section <openmp-mpi>`. MPI can be combined with the GPU functionality to allow a series of models to be distributed to multiple GPUs on the same machine (node). For example, to run a B-scan that contains 60 A-scans (traces) on a system with 4 GPUs:
.. code-block:: none
(gprMax)$ python -m gprMax user_models/cylinder_Bscan_2D.in -n 60 -mpi 5 -gpu 0 1 2 3
(gprMax)$ python -m gprMax examples/cylinder_Bscan_2D.in -n 60 -mpi 5 -gpu 0 1 2 3
.. note::

View file

@@ -14,7 +14,7 @@ gprMax User Guide
.. toctree::
:maxdepth: 2
:caption: Using gprMax
:caption: Usage
input_hash_cmds
input_api
@@ -44,19 +44,18 @@ gprMax User Guide
.. toctree::
:maxdepth: 2
:caption: Advanced topics
:caption: Performance
python_scripting
openmp_mpi
gpu
benchmarking
.. toctree::
:maxdepth: 2
:caption: Accuracy and performance
:caption: Accuracy
comparisons_analytical
comparisons_numerical
benchmarking
.. toctree::
:maxdepth: 2

View file

@@ -1,8 +1,8 @@
.. _api:
******************************
Model building with Python API
******************************
*******************************
Model building using Python API
*******************************
Introduction
============

View file

@@ -1,8 +1,8 @@
.. _commands:
*********************************
Model building with hash commands
*********************************
**********************************
Model building using hash commands
**********************************
An input file, which should contain all the necessary information to run a GPR model, has to be supplied to gprMax. The input file is an ASCII text file which can be prepared with any text editor or word-processing program. In the input file the hash character (``#``) is reserved and is used to denote the beginning of a command which will be passed to gprMax. The general syntax of commands is:

View file

@@ -9,16 +9,14 @@ OpenMP
The most computationally intensive parts of gprMax, which are the FDTD solver loops, have been parallelised using `OpenMP <http://openmp.org>`_ which supports multi-platform shared memory multiprocessing.
By default gprMax will try to determine and use the maximum number of OpenMP threads (usually the number of physical CPU cores) available on your machine. You can override this behaviour in two ways: firstly, gprMax will check to see if the ``#num_threads`` command is present in your input file; if not, gprMax will check to see if the environment variable ``OMP_NUM_THREADS`` is set. This can be useful if you are running gprMax in a High-Performance Computing (HPC) environment where you might not want to use all of the available CPU cores.
By default gprMax will try to determine and use the maximum number of OpenMP threads (usually the number of physical CPU cores) available on your machine. You can override this behaviour in two ways: firstly, gprMax will check to see if the ``#cpu_threads`` command is present in your input file; if not, gprMax will check to see if the environment variable ``OMP_NUM_THREADS`` is set. This can be useful if you are running gprMax in a High-Performance Computing (HPC) environment where you might not want to use all of the available CPU cores.
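As a brief illustration (assuming the usual ``#command: value`` hash-command syntax), adding the following line to an input file limits a model to 4 OpenMP threads:

.. code-block:: none

    #cpu_threads: 4

Alternatively, the environment variable can be set in the shell before running gprMax:

.. code-block:: none

    $ export OMP_NUM_THREADS=4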
MPI
===
The Message Passing Interface (MPI) has been utilised to implement a simple task farm that can be used to distribute a series of models as independent tasks. This can be useful in many GPR simulations where a B-scan (composed of multiple A-scans) is required. Each A-scan can be task-farmed as an independent model. Within each independent model OpenMP threading will continue to be used (as described above). Overall this creates what is known as a mixed-mode OpenMP/MPI job.
By default the MPI task farm functionality is turned off. It can be used with the ``-mpi`` command line option, which specifies the total number of MPI tasks, i.e. master + workers, for the MPI task farm. This option is most usefully combined with ``-n`` to allow individual models to be farmed out using an MPI task farm, e.g. to create a B-scan with 60 traces and use MPI to farm out each trace: ``(gprMax)$ python -m gprMax user_models/cylinder_Bscan_2D.in -n 60 -mpi 61``.
Our default MPI task farm implementation (activated using the ``-mpi`` command line option) makes use of the `MPI spawn mechanism <https://www.open-mpi.org/doc/current/man3/MPI_Comm_spawn.3.php>`_. This is sometimes not supported or properly configured on HPC systems. There is therefore an alternate MPI task farm implementation that does not use the MPI spawn mechanism, and is activated using the ``--mpi-no-spawn`` command line option. See :ref:`examples for usage <hpc_script_examples>`.
By default the MPI task farm functionality is turned off. It can be used with the ``-mpi`` command line option, which specifies the total number of MPI tasks, i.e. master + workers, for the MPI task farm. This option is most usefully combined with ``-n`` to allow individual models to be farmed out using an MPI task farm, e.g. to create a B-scan with 60 traces and use MPI to farm out each trace: ``(gprMax)$ python -m gprMax examples/cylinder_Bscan_2D.in -n 60 -mpi 61``.
Extra installation steps for MPI task farm usage
------------------------------------------------
@@ -48,11 +46,11 @@ HPC environments usually require jobs to be submitted to a queue using a job scr
OpenMP example
--------------
:download:`gprmax_omp.sh <../../tools/HPC_scripts/gprmax_omp.sh>`
:download:`gprmax_omp.sh <../../toolboxes/Utilities/HPC/gprmax_omp.sh>`
Here is an example of a job script for running models, e.g. A-scans to make a B-scan, one after another on a single cluster node. This is not as beneficial as the OpenMP/MPI example, but it can be a helpful starting point when getting the software running in your HPC environment. The behaviour of most of the variables is explained in the comments in the script.
.. literalinclude:: ../../tools/HPC_scripts/gprmax_omp.sh
.. literalinclude:: ../../toolboxes/Utilities/HPC/gprmax_omp.sh
:language: bash
:linenos:
@@ -62,11 +60,11 @@ In this example 10 models will be run one after another on a single node of the
OpenMP/MPI example
------------------
:download:`gprmax_omp_mpi.sh <../../tools/HPC_scripts/gprmax_omp_mpi.sh>`
:download:`gprmax_omp_mpi.sh <../../toolboxes/Utilities/HPC/gprmax_omp_mpi.sh>`
Here is an example of a job script for running models, e.g. A-scans to make a B-scan, distributed as independent tasks in an HPC environment using MPI. The behaviour of most of the variables is explained in the comments in the script.
.. literalinclude:: ../../tools/HPC_scripts/gprmax_omp_mpi.sh
.. literalinclude:: ../../toolboxes/Utilities/HPC/gprmax_omp_mpi.sh
:language: bash
:linenos:
@@ -76,32 +74,15 @@ The ``-mpi`` argument is passed to gprMax which takes the number of MPI tasks to
The ``NSLOTS`` variable, which is required to set the total number of slots/cores for the parallel environment ``-pe mpi``, is usually the number of MPI tasks multiplied by the number of OpenMP threads per task. In this example the number of MPI tasks is 11 and the number of OpenMP threads per task is 16, so 176 slots are required.
OpenMP/MPI example - no spawn
-----------------------------
:download:`gprmax_omp_mpi_no_spawn.sh <../../tools/HPC_scripts/gprmax_omp_mpi_no_spawn.sh>`
Here is an example of a job script for running models, e.g. A-scans to make a B-scan, distributed as independent tasks in an HPC environment using the MPI implementation without the MPI spawn mechanism. The behaviour of most of the variables is explained in the comments in the script.
.. literalinclude:: ../../tools/HPC_scripts/gprmax_omp_mpi_no_spawn.sh
:language: bash
:linenos:
In this example, 10 models will be distributed as independent tasks in an HPC environment using the MPI implementation without the MPI spawn mechanism.
The ``--mpi-no-spawn`` flag is passed to gprMax which ensures the MPI implementation without the MPI spawn mechanism is used. The number of MPI tasks, i.e. number of models (worker tasks) plus one extra for the master task, should be passed as an argument (``-n``) to the ``mpiexec`` or ``mpirun`` command.
The ``NSLOTS`` variable, which is required to set the total number of slots/cores for the parallel environment ``-pe mpi``, is usually the number of MPI tasks multiplied by the number of OpenMP threads per task. In this example the number of MPI tasks is 11 and the number of OpenMP threads per task is 16, so 176 slots are required.
Job array example
-----------------
:download:`gprmax_omp_jobarray.sh <../../tools/HPC_scripts/gprmax_omp_jobarray.sh>`
:download:`gprmax_omp_jobarray.sh <../../toolboxes/Utilities/HPC/gprmax_omp_jobarray.sh>`
Here is an example of a job script for running models, e.g. A-scans to make a B-scan, using the job array functionality of Open Grid Scheduler/Grid Engine. A job array is a single submit script that is run multiple times. It has similar functionality, for gprMax, to using the aforementioned MPI task farm. The behaviour of most of the variables is explained in the comments in the script.
.. literalinclude:: ../../tools/HPC_scripts/gprmax_omp_jobarray.sh
.. literalinclude:: ../../toolboxes/Utilities/HPC/gprmax_omp_jobarray.sh
:language: bash
:linenos: