Updates for version 3.1.0 - Big Smoke. These updates are predominantly to enable gprMax to be run on GPUs using NVIDIA CUDA.

This commit is contained in:
Craig Warren
2017-06-19 16:24:08 +01:00
Parent 939a6322a1
Commit 3c8e5bab1b
41 files changed, 3282 insertions(+), 73 deletions(-)


@@ -1,10 +0,0 @@
# Copyright (C) 2015-2017: The University of Edinburgh
# Authors: Craig Warren and Antonis Giannopoulos
#
# This is the official list of people who have contributed to the gprMax repository.
#
# Please keep the list sorted by surname.
Oystein Bjorndal
John Hartley
Rajath Kumar

CREDITS (new file, +22)

@@ -0,0 +1,22 @@
# Copyright (C) 2015-2017: The University of Edinburgh
# Authors: Craig Warren and Antonis Giannopoulos
#
# This is the official list of entities and people who have contributed to gprMax.
#
# Please keep the lists sorted alphabetically (by surname for individuals).
gprMax has been supported through research projects funded by:
The Defence Science and Technology Laboratory (Dstl)
Google
gprMax is a contribution to COST Action TU1208 'Civil Engineering Applications of Ground Penetrating Radar'
As well as the aforementioned authors, the following individuals have contributed to gprMax:
Oystein Bjorndal
John Hartley
Rajath Kumar


@@ -168,6 +168,7 @@ Optional command line arguments
Argument name Type Description
====================== ======= ===========
``-n`` integer number of times to run the input file. This option can be used to run a series of models, e.g. to create a B-scan with 60 traces: ``(gprMax)$ python -m gprMax user_models/cylinder_Bscan_2D.in -n 60``
``-gpu`` integer NVIDIA CUDA device ID for a specific GPU card. If not specified, it defaults to device ID 0.
``-restart`` integer model number to start/restart simulation from. It would typically be used to restart a series of models from a specific model number, with the ``-n`` argument, e.g. to restart from A-scan 45 when creating a B-scan with 60 traces: ``(gprMax)$ python -m gprMax user_models/cylinder_Bscan_2D.in -n 15 -restart 45``
``-task`` integer task identifier (model number) when running simulation as a job array on `Open Grid Scheduler/Grid Engine <http://gridscheduler.sourceforge.net/index.html>`_. For further details see the `parallel performance section of the User Guide <http://docs.gprmax.com/en/latest/openmp_mpi.html>`_
``-mpi`` integer number of Message Passing Interface (MPI) tasks, i.e. master + workers, for MPI task farm. This option is most usefully combined with ``-n`` to allow individual models to be farmed out using a MPI task farm, e.g. to create a B-scan with 60 traces and use MPI to farm out each trace: ``(gprMax)$ python -m gprMax user_models/cylinder_Bscan_2D.in -n 60 -mpi 61``. For further details see the `parallel performance section of the User Guide <http://docs.gprmax.com/en/latest/openmp_mpi.html>`_
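These options are mirrored by the ``api()`` function added in this release, so the same runs can be scripted from Python. A minimal sketch, assuming ``api()`` exposes keywords matching the arguments above (its signature appears later in this changeset):

.. code-block:: python

    # Minimal sketch: scripted equivalent of
    #   python -m gprMax user_models/cylinder_Bscan_2D.in -n 60 -gpu
    # Assumes api() keywords mirror the command line arguments above.
    from gprMax.gprMax import api

    # gpu takes a list of device IDs, mirroring the append action of -gpu
    api('user_models/cylinder_Bscan_2D.in', n=60, gpu=[0])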


@@ -1,4 +1,4 @@
# This file may be used to create an environment using:
# This file may be used to create a conda environment using:
# $ conda create --name gprMax --file <this file>
name: gprMax
@@ -9,12 +9,28 @@ dependencies:
- h5py
- jupyter
- matplotlib
- mpi4py
- numpy
- psutil
- pyqt=4.11.4
- scipy
- pip:
- mpi4py
- terminaltables
- tqdm
#name: gprMax-essential
#dependencies:
#- python>=3.4
#- colorama
#- cython
#- h5py
#- numpy
#- psutil
#- scipy
#
#- pip:
# - pycuda
# - terminaltables
# - tqdm

docs/source/gpu.rst (new file, +27)

@@ -0,0 +1,27 @@
.. _gpu:
**********
Using GPUs
**********
The most computationally intensive parts of gprMax, which are the FDTD solver loops, can optionally be executed using general-purpose computing on graphics processing units (GPGPU). This has been achieved through use of the NVIDIA CUDA programming environment, therefore an `NVIDIA CUDA-Enabled GPU <https://developer.nvidia.com/cuda-gpus>`_ is required to take advantage of the GPU-based solver.
Extra installation steps for GPU usage
======================================
The following steps provide guidance on how to install the extra components to allow gprMax to run on your GPU:
1. Install the `NVIDIA CUDA Toolkit <https://developer.nvidia.com/cuda-toolkit>`_. You can follow the Installation Guides in the `NVIDIA CUDA Toolkit Documentation <http://docs.nvidia.com/cuda/index.html#installation-guides>`_.
2. Install the pycuda Python module. Open a Terminal (Linux/macOS) or Command Prompt (Windows), navigate into the top-level gprMax directory, and if it is not already active, activate the gprMax conda environment :code:`source activate gprMax` (Linux/macOS) or :code:`activate gprMax` (Windows). Run :code:`conda install pycuda`.
Run a test model
================
Open a Terminal (Linux/macOS) or Command Prompt (Windows), navigate into the top-level gprMax directory, and if it is not already active, activate the gprMax conda environment :code:`source activate gprMax` (Linux/macOS) or :code:`activate gprMax` (Windows).
Run one of the test models:
.. code-block:: none

    (gprMax)$ python -m gprMax user_models/cylinder_Ascan_2D.in -gpu
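As a quick check that the CUDA Toolkit and pycuda are correctly installed, the following sketch (plain pycuda, not part of gprMax) lists the CUDA devices that can be seen:

.. code-block:: python

    # Sanity check (not part of gprMax): list visible CUDA devices
    import pycuda.driver as drv

    drv.init()
    print('Detected {} CUDA device(s)'.format(drv.Device.count()))
    for devnum in range(drv.Device.count()):
        dev = drv.Device(devnum)
        print('Device ID {}: {}, {} bytes of memory'.format(
              devnum, dev.name(), dev.total_memory()))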


@@ -31,6 +31,7 @@ gprMax User Guide
python_scripting
openmp_mpi
gpu
.. toctree::
:maxdepth: 2


@@ -1,8 +1,8 @@
.. _openmp-mpi:
******************
Parallel execution
******************
*********************
OpenMP, MPI, and HPCs
*********************
OpenMP
======


@@ -1,4 +1,4 @@
# This is where the version number is set and read by setup.py and conf.py (for the docs)
__version__ = '3.0.20'
codename = 'Bowmore'
__version__ = '3.1.0'
codename = 'Big Smoke'


@@ -30,5 +30,5 @@ ctypedef np.float32_t floattype_t
ctypedef np.complex64_t complextype_t
# Double precision
#ctypedef np.float64_t floattype_t
#ctypedef np.complex128_t complextype_t
# ctypedef np.float64_t floattype_t
# ctypedef np.complex128_t complextype_t


@@ -37,8 +37,8 @@ floattype = np.float32
complextype = np.complex64
# For C (CUDA) arrays
cfloattype = 'float'
ccomplextype = 'pycuda::complex<float>'
cudafloattype = 'float'
cudacomplextype = 'pycuda::complex<float>'
# Double precision
# For numpy arrays
@@ -46,5 +46,5 @@ ccomplextype = 'pycuda::complex<float>'
# complextype = np.complex128
# For C (CUDA) arrays
# cfloattype = 'double'
# ccomplextype = 'pycuda::complex<double>'
# cudafloattype = 'double'
# cudacomplextype = 'pycuda::complex<double>'


@@ -23,7 +23,7 @@ from colorama import Fore
init()
sys.tracebacklimit = None
#sys.tracebacklimit = None
class GeneralError(ValueError):


@@ -16,6 +16,8 @@
# You should have received a copy of the GNU General Public License
# along with gprMax. If not, see <http://www.gnu.org/licenses/>.
from string import Template
import h5py
from gprMax._version import __version__
@@ -46,6 +48,47 @@ def store_outputs(iteration, Ex, Ey, Ez, Hx, Hy, Hz, G):
tl.Itotal[iteration] = tl.current[tl.antpos]
kernel_template_store_outputs = Template("""
// Macros for converting subscripts to linear index:
#define INDEX2D_RXCOORDS(m, n) (m)*($NY_RXCOORDS)+(n)
#define INDEX3D_RXS(i, j, k) (i)*($NY_RXS)*($NZ_RXS)+(j)*($NZ_RXS)+(k)
#define INDEX3D_FIELDS(i, j, k) (i)*($NY_FIELDS)*($NZ_FIELDS)+(j)*($NZ_FIELDS)+(k)
//////////////////////////////////////////////////////
// Stores field component values for every receiver //
//////////////////////////////////////////////////////
__global__ void store_outputs(int NRX, int iteration, const int* __restrict__ rxcoords, $REAL *rxs, const $REAL* __restrict__ Ex, const $REAL* __restrict__ Ey, const $REAL* __restrict__ Ez, const $REAL* __restrict__ Hx, const $REAL* __restrict__ Hy, const $REAL* __restrict__ Hz) {
// This function stores field component values for every receiver in the model.
//
// Args:
// NRX: Total number of receivers in the model
// rxs: Array to store field components for receivers - rows are field components; columns are iterations; pages are receivers
// E, H: Access to field component arrays
// Obtain the linear index corresponding to the current thread and use for each receiver
int rx = blockIdx.x * blockDim.x + threadIdx.x;
int i, j, k;
if (rx < NRX) {
i = rxcoords[INDEX2D_RXCOORDS(rx,0)];
j = rxcoords[INDEX2D_RXCOORDS(rx,1)];
k = rxcoords[INDEX2D_RXCOORDS(rx,2)];
rxs[INDEX3D_RXS(0,iteration,rx)] = Ex[INDEX3D_FIELDS(i,j,k)];
rxs[INDEX3D_RXS(1,iteration,rx)] = Ey[INDEX3D_FIELDS(i,j,k)];
rxs[INDEX3D_RXS(2,iteration,rx)] = Ez[INDEX3D_FIELDS(i,j,k)];
rxs[INDEX3D_RXS(3,iteration,rx)] = Hx[INDEX3D_FIELDS(i,j,k)];
rxs[INDEX3D_RXS(4,iteration,rx)] = Hy[INDEX3D_FIELDS(i,j,k)];
rxs[INDEX3D_RXS(5,iteration,rx)] = Hz[INDEX3D_FIELDS(i,j,k)];
}
}
""")
def write_hdf5_outputfile(outputfile, Ex, Ey, Ez, Hx, Hy, Hz, G):
"""Write an output file in HDF5 format.

gprMax/fields_updates_gpu.py (new file, +242)

@@ -0,0 +1,242 @@
# Copyright (C) 2015-2017: The University of Edinburgh
# Authors: Craig Warren and Antonis Giannopoulos
#
# This file is part of gprMax.
#
# gprMax is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# gprMax is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with gprMax. If not, see <http://www.gnu.org/licenses/>.
from string import Template
kernels_template_fields = Template("""
#include <pycuda-complex.hpp>
// Macros for converting subscripts to linear index:
#define INDEX2D_MAT(m, n) (m)*($NY_MATCOEFFS)+(n)
#define INDEX2D_MATDISP(m, n) (m)*($NY_MATDISPCOEFFS)+(n)
#define INDEX3D_FIELDS(i, j, k) (i)*($NY_FIELDS)*($NZ_FIELDS)+(j)*($NZ_FIELDS)+(k)
#define INDEX4D_ID(p, i, j, k) (p)*($NX_ID)*($NY_ID)*($NZ_ID)+(i)*($NY_ID)*($NZ_ID)+(j)*($NZ_ID)+(k)
#define INDEX4D_T(p, i, j, k) (p)*($NX_T)*($NY_T)*($NZ_T)+(i)*($NY_T)*($NZ_T)+(j)*($NZ_T)+(k)
// Material coefficients (read-only) in constant memory (64KB)
__device__ __constant__ $REAL updatecoeffsE[$N_updatecoeffsE];
__device__ __constant__ $REAL updatecoeffsH[$N_updatecoeffsH];
/////////////////////////////////////////////////
// Electric field updates - standard materials //
/////////////////////////////////////////////////
__global__ void update_e(int NX, int NY, int NZ, const unsigned int* __restrict__ ID, $REAL *Ex, $REAL *Ey, $REAL *Ez, const $REAL* __restrict__ Hx, const $REAL* __restrict__ Hy, const $REAL* __restrict__ Hz) {
// This function updates electric field values.
//
// Args:
// NX, NY, NZ: Number of cells of the model domain
// ID, E, H: Access to ID and field component arrays
// Obtain the linear index corresponding to the current thread
int idx = blockIdx.x * blockDim.x + threadIdx.x;
// Convert the linear index to subscripts for 3D field arrays
int i = idx / ($NY_FIELDS * $NZ_FIELDS);
int j = (idx % ($NY_FIELDS * $NZ_FIELDS)) / $NZ_FIELDS;
int k = (idx % ($NY_FIELDS * $NZ_FIELDS)) % $NZ_FIELDS;
// Convert the linear index to subscripts for 4D material ID array
int i_ID = (idx % ($NX_ID * $NY_ID * $NZ_ID)) / ($NY_ID * $NZ_ID);
int j_ID = ((idx % ($NX_ID * $NY_ID * $NZ_ID)) % ($NY_ID * $NZ_ID)) / $NZ_ID;
int k_ID = ((idx % ($NX_ID * $NY_ID * $NZ_ID)) % ($NY_ID * $NZ_ID)) % $NZ_ID;
// Ex component
if ((NY != 1 || NZ != 1) && i >= 0 && i < NX && j > 0 && j < NY && k > 0 && k < NZ) {
int materialEx = ID[INDEX4D_ID(0,i_ID,j_ID,k_ID)];
Ex[INDEX3D_FIELDS(i,j,k)] = updatecoeffsE[INDEX2D_MAT(materialEx,0)] * Ex[INDEX3D_FIELDS(i,j,k)] + updatecoeffsE[INDEX2D_MAT(materialEx,2)] * (Hz[INDEX3D_FIELDS(i,j,k)] - Hz[INDEX3D_FIELDS(i,j-1,k)]) - updatecoeffsE[INDEX2D_MAT(materialEx,3)] * (Hy[INDEX3D_FIELDS(i,j,k)] - Hy[INDEX3D_FIELDS(i,j,k-1)]);
}
// Ey component
if ((NX != 1 || NZ != 1) && i > 0 && i < NX && j >= 0 && j < NY && k > 0 && k < NZ) {
int materialEy = ID[INDEX4D_ID(1,i_ID,j_ID,k_ID)];
Ey[INDEX3D_FIELDS(i,j,k)] = updatecoeffsE[INDEX2D_MAT(materialEy,0)] * Ey[INDEX3D_FIELDS(i,j,k)] + updatecoeffsE[INDEX2D_MAT(materialEy,3)] * (Hx[INDEX3D_FIELDS(i,j,k)] - Hx[INDEX3D_FIELDS(i,j,k-1)]) - updatecoeffsE[INDEX2D_MAT(materialEy,1)] * (Hz[INDEX3D_FIELDS(i,j,k)] - Hz[INDEX3D_FIELDS(i-1,j,k)]);
}
// Ez component
if ((NX != 1 || NY != 1) && i > 0 && i < NX && j > 0 && j < NY && k >= 0 && k < NZ) {
int materialEz = ID[INDEX4D_ID(2,i_ID,j_ID,k_ID)];
Ez[INDEX3D_FIELDS(i,j,k)] = updatecoeffsE[INDEX2D_MAT(materialEz,0)] * Ez[INDEX3D_FIELDS(i,j,k)] + updatecoeffsE[INDEX2D_MAT(materialEz,1)] * (Hy[INDEX3D_FIELDS(i,j,k)] - Hy[INDEX3D_FIELDS(i-1,j,k)]) - updatecoeffsE[INDEX2D_MAT(materialEz,2)] * (Hx[INDEX3D_FIELDS(i,j,k)] - Hx[INDEX3D_FIELDS(i,j-1,k)]);
}
}
///////////////////////////////////////////////////
// Electric field updates - dispersive materials //
///////////////////////////////////////////////////
__global__ void update_e_dispersive_A(int NX, int NY, int NZ, int MAXPOLES, const $COMPLEX* __restrict__ updatecoeffsdispersive, $COMPLEX *Tx, $COMPLEX *Ty, $COMPLEX *Tz, const unsigned int* __restrict__ ID, $REAL *Ex, $REAL *Ey, $REAL *Ez, const $REAL* __restrict__ Hx, const $REAL* __restrict__ Hy, const $REAL* __restrict__ Hz) {
// This function is part A of updates to electric field values when dispersive materials (with multiple poles) are present.
//
// Args:
// NX, NY, NZ: Number of cells of the model domain
// MAXPOLES: Maximum number of dispersive material poles present in model
// updatecoeffsdispersive, T, ID, E, H: Access to update coefficients, dispersive, ID and field component arrays
// Obtain the linear index corresponding to the current thread
int idx = blockIdx.x * blockDim.x + threadIdx.x;
// Convert the linear index to subscripts for 3D field arrays
int i = idx / ($NY_FIELDS * $NZ_FIELDS);
int j = (idx % ($NY_FIELDS * $NZ_FIELDS)) / $NZ_FIELDS;
int k = (idx % ($NY_FIELDS * $NZ_FIELDS)) % $NZ_FIELDS;
// Convert the linear index to subscripts for 4D material ID array
int i_ID = (idx % ($NX_ID * $NY_ID * $NZ_ID)) / ($NY_ID * $NZ_ID);
int j_ID = ((idx % ($NX_ID * $NY_ID * $NZ_ID)) % ($NY_ID * $NZ_ID)) / $NZ_ID;
int k_ID = ((idx % ($NX_ID * $NY_ID * $NZ_ID)) % ($NY_ID * $NZ_ID)) % $NZ_ID;
// Convert the linear index to subscripts for 4D dispersive array
int i_T = (idx % ($NX_T * $NY_T * $NZ_T)) / ($NY_T * $NZ_T);
int j_T = ((idx % ($NX_T * $NY_T * $NZ_T)) % ($NY_T * $NZ_T)) / $NZ_T;
int k_T = ((idx % ($NX_T * $NY_T * $NZ_T)) % ($NY_T * $NZ_T)) % $NZ_T;
// Ex component
if ((NY != 1 || NZ != 1) && i >= 0 && i < NX && j > 0 && j < NY && k > 0 && k < NZ) {
int materialEx = ID[INDEX4D_ID(0,i_ID,j_ID,k_ID)];
$REAL phi = 0;
for (int pole = 0; pole < MAXPOLES; pole++) {
phi = phi + updatecoeffsdispersive[INDEX2D_MATDISP(materialEx,pole*3)].real() * Tx[INDEX4D_T(pole,i_T,j_T,k_T)].real();
Tx[INDEX4D_T(pole,i_T,j_T,k_T)] = updatecoeffsdispersive[INDEX2D_MATDISP(materialEx,1+(pole*3))] * Tx[INDEX4D_T(pole,i_T,j_T,k_T)] + updatecoeffsdispersive[INDEX2D_MATDISP(materialEx,2+(pole*3))] * Ex[INDEX3D_FIELDS(i,j,k)];
}
Ex[INDEX3D_FIELDS(i,j,k)] = updatecoeffsE[INDEX2D_MAT(materialEx,0)] * Ex[INDEX3D_FIELDS(i,j,k)] + updatecoeffsE[INDEX2D_MAT(materialEx,2)] * (Hz[INDEX3D_FIELDS(i,j,k)] - Hz[INDEX3D_FIELDS(i,j-1,k)]) - updatecoeffsE[INDEX2D_MAT(materialEx,3)] * (Hy[INDEX3D_FIELDS(i,j,k)] - Hy[INDEX3D_FIELDS(i,j,k-1)]) - updatecoeffsE[INDEX2D_MAT(materialEx,4)] * phi;
}
// Ey component
if ((NX != 1 || NZ != 1) && i > 0 && i < NX && j >= 0 && j < NY && k > 0 && k < NZ) {
int materialEy = ID[INDEX4D_ID(1,i_ID,j_ID,k_ID)];
$REAL phi = 0;
for (int pole = 0; pole < MAXPOLES; pole++) {
phi = phi + updatecoeffsdispersive[INDEX2D_MATDISP(materialEy,pole*3)].real() * Ty[INDEX4D_T(pole,i_T,j_T,k_T)].real();
Ty[INDEX4D_T(pole,i_T,j_T,k_T)] = updatecoeffsdispersive[INDEX2D_MATDISP(materialEy,1+(pole*3))] * Ty[INDEX4D_T(pole,i_T,j_T,k_T)] + updatecoeffsdispersive[INDEX2D_MATDISP(materialEy,2+(pole*3))] * Ey[INDEX3D_FIELDS(i,j,k)];
}
Ey[INDEX3D_FIELDS(i,j,k)] = updatecoeffsE[INDEX2D_MAT(materialEy,0)] * Ey[INDEX3D_FIELDS(i,j,k)] + updatecoeffsE[INDEX2D_MAT(materialEy,3)] * (Hx[INDEX3D_FIELDS(i,j,k)] - Hx[INDEX3D_FIELDS(i,j,k-1)]) - updatecoeffsE[INDEX2D_MAT(materialEy,1)] * (Hz[INDEX3D_FIELDS(i,j,k)] - Hz[INDEX3D_FIELDS(i-1,j,k)]) - updatecoeffsE[INDEX2D_MAT(materialEy,4)] * phi;
}
// Ez component
if ((NX != 1 || NY != 1) && i > 0 && i < NX && j > 0 && j < NY && k >= 0 && k < NZ) {
int materialEz = ID[INDEX4D_ID(2,i_ID,j_ID,k_ID)];
$REAL phi = 0;
for (int pole = 0; pole < MAXPOLES; pole++) {
phi = phi + updatecoeffsdispersive[INDEX2D_MATDISP(materialEz,pole*3)].real() * Tz[INDEX4D_T(pole,i_T,j_T,k_T)].real();
Tz[INDEX4D_T(pole,i_T,j_T,k_T)] = updatecoeffsdispersive[INDEX2D_MATDISP(materialEz,1+(pole*3))] * Tz[INDEX4D_T(pole,i_T,j_T,k_T)] + updatecoeffsdispersive[INDEX2D_MATDISP(materialEz,2+(pole*3))] * Ez[INDEX3D_FIELDS(i,j,k)];
}
Ez[INDEX3D_FIELDS(i,j,k)] = updatecoeffsE[INDEX2D_MAT(materialEz,0)] * Ez[INDEX3D_FIELDS(i,j,k)] + updatecoeffsE[INDEX2D_MAT(materialEz,1)] * (Hy[INDEX3D_FIELDS(i,j,k)] - Hy[INDEX3D_FIELDS(i-1,j,k)]) - updatecoeffsE[INDEX2D_MAT(materialEz,2)] * (Hx[INDEX3D_FIELDS(i,j,k)] - Hx[INDEX3D_FIELDS(i,j-1,k)]) - updatecoeffsE[INDEX2D_MAT(materialEz,4)] * phi;
}
}
__global__ void update_e_dispersive_B(int NX, int NY, int NZ, int MAXPOLES, const $COMPLEX* __restrict__ updatecoeffsdispersive, $COMPLEX *Tx, $COMPLEX *Ty, $COMPLEX *Tz, const unsigned int* __restrict__ ID, const $REAL* __restrict__ Ex, const $REAL* __restrict__ Ey, const $REAL* __restrict__ Ez) {
// This function is part B which updates the dispersive field arrays when dispersive materials (with multiple poles) are present.
//
// Args:
// NX, NY, NZ: Number of cells of the model domain
// MAXPOLES: Maximum number of dispersive material poles present in model
// updatecoeffsdispersive, T, ID, E, H: Access to update coefficients, dispersive, ID and field component arrays
// Obtain the linear index corresponding to the current thread
int idx = blockIdx.x * blockDim.x + threadIdx.x;
// Convert the linear index to subscripts for 3D field arrays
int i = idx / ($NY_FIELDS * $NZ_FIELDS);
int j = (idx % ($NY_FIELDS * $NZ_FIELDS)) / $NZ_FIELDS;
int k = (idx % ($NY_FIELDS * $NZ_FIELDS)) % $NZ_FIELDS;
// Convert the linear index to subscripts for 4D material ID array
int i_ID = (idx % ($NX_ID * $NY_ID * $NZ_ID)) / ($NY_ID * $NZ_ID);
int j_ID = ((idx % ($NX_ID * $NY_ID * $NZ_ID)) % ($NY_ID * $NZ_ID)) / $NZ_ID;
int k_ID = ((idx % ($NX_ID * $NY_ID * $NZ_ID)) % ($NY_ID * $NZ_ID)) % $NZ_ID;
// Convert the linear index to subscripts for 4D dispersive array
int i_T = (idx % ($NX_T * $NY_T * $NZ_T)) / ($NY_T * $NZ_T);
int j_T = ((idx % ($NX_T * $NY_T * $NZ_T)) % ($NY_T * $NZ_T)) / $NZ_T;
int k_T = ((idx % ($NX_T * $NY_T * $NZ_T)) % ($NY_T * $NZ_T)) % $NZ_T;
// Ex component
if ((NY != 1 || NZ != 1) && i >= 0 && i < NX && j > 0 && j < NY && k > 0 && k < NZ) {
int materialEx = ID[INDEX4D_ID(0,i_ID,j_ID,k_ID)];
for (int pole = 0; pole < MAXPOLES; pole++) {
Tx[INDEX4D_T(pole,i_T,j_T,k_T)] = Tx[INDEX4D_T(pole,i_T,j_T,k_T)] - updatecoeffsdispersive[INDEX2D_MATDISP(materialEx,2+(pole*3))] * Ex[INDEX3D_FIELDS(i,j,k)];
}
}
// Ey component
if ((NX != 1 || NZ != 1) && i > 0 && i < NX && j >= 0 && j < NY && k > 0 && k < NZ) {
int materialEy = ID[INDEX4D_ID(1,i_ID,j_ID,k_ID)];
for (int pole = 0; pole < MAXPOLES; pole++) {
Ty[INDEX4D_T(pole,i_T,j_T,k_T)] = Ty[INDEX4D_T(pole,i_T,j_T,k_T)] - updatecoeffsdispersive[INDEX2D_MATDISP(materialEy,2+(pole*3))] * Ey[INDEX3D_FIELDS(i,j,k)];
}
}
// Ez component
if ((NX != 1 || NY != 1) && i > 0 && i < NX && j > 0 && j < NY && k >= 0 && k < NZ) {
int materialEz = ID[INDEX4D_ID(2,i_ID,j_ID,k_ID)];
for (int pole = 0; pole < MAXPOLES; pole++) {
Tz[INDEX4D_T(pole,i_T,j_T,k_T)] = Tz[INDEX4D_T(pole,i_T,j_T,k_T)] - updatecoeffsdispersive[INDEX2D_MATDISP(materialEz,2+(pole*3))] * Ez[INDEX3D_FIELDS(i,j,k)];
}
}
}
////////////////////////////
// Magnetic field updates //
////////////////////////////
__global__ void update_h(int NX, int NY, int NZ, const unsigned int* __restrict__ ID, $REAL *Hx, $REAL *Hy, $REAL *Hz, const $REAL* __restrict__ Ex, const $REAL* __restrict__ Ey, const $REAL* __restrict__ Ez) {
// This function updates magnetic field values.
//
// Args:
// NX, NY, NZ: Number of cells of the model domain
// ID, E, H: Access to ID and field component arrays
// Obtain the linear index corresponding to the current thread
int idx = blockIdx.x * blockDim.x + threadIdx.x;
// Convert the linear index to subscripts for 3D field arrays
int i = idx / ($NY_FIELDS * $NZ_FIELDS);
int j = (idx % ($NY_FIELDS * $NZ_FIELDS)) / $NZ_FIELDS;
int k = (idx % ($NY_FIELDS * $NZ_FIELDS)) % $NZ_FIELDS;
// Convert the linear index to subscripts for 4D material ID array
int i_ID = (idx % ($NX_ID * $NY_ID * $NZ_ID)) / ($NY_ID * $NZ_ID);
int j_ID = ((idx % ($NX_ID * $NY_ID * $NZ_ID)) % ($NY_ID * $NZ_ID)) / $NZ_ID;
int k_ID = ((idx % ($NX_ID * $NY_ID * $NZ_ID)) % ($NY_ID * $NZ_ID)) % $NZ_ID;
// Hx component
if (NX != 1 && i > 0 && i < NX && j >= 0 && j < NY && k >= 0 && k < NZ) {
int materialHx = ID[INDEX4D_ID(3,i_ID,j_ID,k_ID)];
Hx[INDEX3D_FIELDS(i,j,k)] = updatecoeffsH[INDEX2D_MAT(materialHx,0)] * Hx[INDEX3D_FIELDS(i,j,k)] - updatecoeffsH[INDEX2D_MAT(materialHx,2)] * (Ez[INDEX3D_FIELDS(i,j+1,k)] - Ez[INDEX3D_FIELDS(i,j,k)]) + updatecoeffsH[INDEX2D_MAT(materialHx,3)] * (Ey[INDEX3D_FIELDS(i,j,k+1)] - Ey[INDEX3D_FIELDS(i,j,k)]);
}
// Hy component
if (NY != 1 && i >= 0 && i < NX && j > 0 && j < NY && k >= 0 && k < NZ) {
int materialHy = ID[INDEX4D_ID(4,i_ID,j_ID,k_ID)];
Hy[INDEX3D_FIELDS(i,j,k)] = updatecoeffsH[INDEX2D_MAT(materialHy,0)] * Hy[INDEX3D_FIELDS(i,j,k)] - updatecoeffsH[INDEX2D_MAT(materialHy,3)] * (Ex[INDEX3D_FIELDS(i,j,k+1)] - Ex[INDEX3D_FIELDS(i,j,k)]) + updatecoeffsH[INDEX2D_MAT(materialHy,1)] * (Ez[INDEX3D_FIELDS(i+1,j,k)] - Ez[INDEX3D_FIELDS(i,j,k)]);
}
// Hz component
if (NZ != 1 && i >= 0 && i < NX && j >= 0 && j < NY && k > 0 && k < NZ) {
int materialHz = ID[INDEX4D_ID(5,i_ID,j_ID,k_ID)];
Hz[INDEX3D_FIELDS(i,j,k)] = updatecoeffsH[INDEX2D_MAT(materialHz,0)] * Hz[INDEX3D_FIELDS(i,j,k)] - updatecoeffsH[INDEX2D_MAT(materialHz,1)] * (Ey[INDEX3D_FIELDS(i+1,j,k)] - Ey[INDEX3D_FIELDS(i,j,k)]) + updatecoeffsH[INDEX2D_MAT(materialHz,2)] * (Ex[INDEX3D_FIELDS(i,j+1,k)] - Ex[INDEX3D_FIELDS(i,j,k)]);
}
}
""")


@@ -20,6 +20,8 @@ import numpy as np
from gprMax.constants import floattype
from gprMax.constants import complextype
from gprMax.fractals_generate import generate_fractal2D
from gprMax.fractals_generate import generate_fractal3D
from gprMax.utilities import round_value
np.seterr(divide='raise')
@@ -58,8 +60,12 @@ class FractalSurface(object):
self.filldepth = 0
self.grass = []
def generate_fractal_surface(self):
"""Generate a 2D array with a fractal distribution."""
def generate_fractal_surface(self, G):
"""Generate a 2D array with a fractal distribution.
Args:
G (class): Grid class instance - holds essential parameters describing the model.
"""
if self.xs == self.xf:
surfacedims = (self.ny, self.nz)
@@ -82,16 +88,8 @@ class FractalSurface(object):
# Shift the zero frequency component to the centre of the array
A = np.fft.fftshift(A)
for i in range(surfacedims[0]):
for j in range(surfacedims[1]):
# Positional vector for current position
v2 = np.array([self.weighting[0] * i, self.weighting[1] * j])
rr = np.linalg.norm(v2 - v1)
try:
self.fractalsurface[i, j] = A[i, j] * 1 / (rr**self.b)
except FloatingPointError:
rr = 0.9
self.fractalsurface[i, j] = A[i, j] * 1 / (rr**self.b)
# Generate fractal
generate_fractal2D(surfacedims[0], surfacedims[1], G.nthreads, self.b, self.weighting, v1, A, self.fractalsurface)
# Shift the zero frequency component to start of the array
self.fractalsurface = np.fft.ifftshift(self.fractalsurface)
@@ -135,8 +133,12 @@ class FractalVolume(object):
self.nbins = 0
self.fractalsurfaces = []
def generate_fractal_volume(self):
"""Generate a 3D volume with a fractal distribution."""
def generate_fractal_volume(self, G):
"""Generate a 3D volume with a fractal distribution.
Args:
G (class): Grid class instance - holds essential parameters describing the model.
"""
# Scale filter according to size of fractal volume
if self.nx == 1:
@@ -168,17 +170,8 @@ class FractalVolume(object):
# Shift the zero frequency component to the centre of the array
A = np.fft.fftshift(A)
for i in range(self.nx):
for j in range(self.ny):
for k in range(self.nz):
# Positional vector for current position
v2 = np.array([self.weighting[0] * i, self.weighting[1] * j, self.weighting[2] * k])
rr = np.linalg.norm(v2 - v1)
try:
self.fractalvolume[i, j, k] = A[i, j, k] * 1 / (rr**self.b)
except FloatingPointError:
rr = 0.9
self.fractalvolume[i, j, k] = A[i, j, k] * 1 / (rr**self.b)
# Generate fractal
generate_fractal3D(self.nx, self.ny, self.nz, G.nthreads, self.b, self.weighting, v1, A, self.fractalvolume)
# Shift the zero frequency component to the start of the array
self.fractalvolume = np.fft.ifftshift(self.fractalvolume)


@@ -0,0 +1,89 @@
# Copyright (C) 2015-2017: The University of Edinburgh
# Authors: Craig Warren and Antonis Giannopoulos
#
# This file is part of gprMax.
#
# gprMax is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# gprMax is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with gprMax. If not, see <http://www.gnu.org/licenses/>.
import numpy as np
cimport numpy as np
from cython.parallel import prange
from gprMax.constants cimport complextype_t
cpdef void generate_fractal2D(int nx, int ny, int nthreads, int b, np.float64_t[:] weighting, np.float64_t[:] v1, np.complex128_t[:, ::1] A, complextype_t[:, ::1] fractalsurface):
"""This function generates a fractal surface for a 2D array.
Args:
nx, ny (int): Fractal surface size in cells
nthreads (int): Number of threads to use
b (int): Constant related to fractal dimension
weighting (memoryview): Access to weighting vector
v1 (memoryview): Access to positional vector at centre of array, scaled by weighting
A (memoryview): Access to array containing random numbers (to be convolved with fractal function)
fractalsurface (memoryview): Access to array containing fractal surface data
"""
cdef Py_ssize_t i, j
cdef float v2x, v2y, rr
for i in prange(nx, nogil=True, schedule='static', num_threads=nthreads):
for j in range(ny):
# Positional vector for current position
v2x = weighting[0] * i
v2y = weighting[1] * j
# Calculate norm of v2 - v1
rr = ((v2x - v1[0])**2 + (v2y - v1[1])**2)**(1/2)
# Catch potential divide by zero
if rr == 0:
rr = 0.9
fractalsurface[i, j] = A[i, j] * 1 / (rr**b)
cpdef void generate_fractal3D(int nx, int ny, int nz, int nthreads, int b, np.float64_t[:] weighting, np.float64_t[:] v1, np.complex128_t[:, :, ::1] A, complextype_t[:, :, ::1] fractalvolume):
"""This function generates a fractal volume for a 3D array.
Args:
nx, ny, nz (int): Fractal volume size in cells
nthreads (int): Number of threads to use
b (int): Constant related to fractal dimension
weighting (memoryview): Access to weighting vector
v1 (memoryview): Access to positional vector at centre of array, scaled by weighting
A (memoryview): Access to array containing random numbers (to be convolved with fractal function)
fractalvolume (memoryview): Access to array containing fractal volume data
"""
cdef Py_ssize_t i, j, k
cdef float v2x, v2y, v2z, rr
for i in prange(nx, nogil=True, schedule='static', num_threads=nthreads):
for j in range(ny):
for k in range(nz):
# Positional vector for current position
v2x = weighting[0] * i
v2y = weighting[1] * j
v2z = weighting[2] * k
# Calculate norm of v2 - v1
rr = ((v2x - v1[0])**2 + (v2y - v1[1])**2 + (v2z - v1[2])**2)**(1/2)
# Catch potential divide by zero
if rr == 0:
rr = 0.9
fractalvolume[i, j, k] = A[i, j, k] * 1 / (rr**b)
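For reference, the same computation in vectorised NumPy (a sketch only; the committed implementation is the parallel Cython kernel above):

import numpy as np

def fractal2d_numpy(nx, ny, b, weighting, v1, A):
    # Positional vectors for every (i, j), scaled by the weighting
    v2x = weighting[0] * np.arange(nx)[:, None]
    v2y = weighting[1] * np.arange(ny)[None, :]
    rr = np.sqrt((v2x - v1[0])**2 + (v2y - v1[1])**2)
    rr[rr == 0] = 0.9  # catch potential divide by zero
    return A * (1 / rr**b)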


@@ -36,6 +36,7 @@ from gprMax.constants import m0
from gprMax.constants import z0
from gprMax.exceptions import GeneralError
from gprMax.model_build_run import run_model
from gprMax.utilities import detect_gpus
from gprMax.utilities import get_host_info
from gprMax.utilities import get_terminal_width
from gprMax.utilities import human_size
@@ -57,6 +58,7 @@ def main():
parser.add_argument('-restart', type=int, help='model number to restart from, e.g. when creating B-scan')
parser.add_argument('-mpi', type=int, help='number of MPI tasks, i.e. master + workers')
parser.add_argument('--mpi-worker', action='store_true', default=False, help=argparse.SUPPRESS)
parser.add_argument('-gpu', type=int, action='append', nargs='?', const=True, help='flag to use Nvidia GPU (option to give device ID)')
parser.add_argument('-benchmark', action='store_true', default=False, help='flag to switch on benchmarking mode')
parser.add_argument('--geometry-only', action='store_true', default=False, help='flag to only build model and produce geometry file(s)')
parser.add_argument('--geometry-fixed', action='store_true', default=False, help='flag to not reprocess model geometry, e.g. for B-scans where the geometry is fixed')
@@ -73,6 +75,7 @@ def api(
task=None,
restart=None,
mpi=False,
gpu=None,
benchmark=False,
geometry_only=False,
geometry_fixed=False,
@@ -94,6 +97,7 @@ def api(
args.task = task
args.restart = restart
args.mpi = mpi
args.gpu = gpu
args.benchmark = benchmark
args.geometry_only = geometry_only
args.geometry_fixed = geometry_fixed
@@ -117,6 +121,34 @@ def run_main(args):
hostinfo = get_host_info()
hyperthreading = ', {} cores with Hyper-Threading'.format(hostinfo['logicalcores']) if hostinfo['hyperthreading'] else ''
print('\nHost: {}; {} x {} ({} cores{}); {} RAM; {}'.format(hostinfo['machineID'], hostinfo['sockets'], hostinfo['cpuID'], hostinfo['physicalcores'], hyperthreading, human_size(hostinfo['ram'], a_kilobyte_is_1024_bytes=True), hostinfo['osversion']))
# Get information/setup Nvidia GPU(s)
if args.gpu is not None:
# Extract first item of list, either True to automatically determine device ID,
# or an integer to manually specify device ID
args.gpu = args.gpu[0]
gpus = detect_gpus()
# If a device ID is specified check it is valid
if not isinstance(args.gpu, bool):
if args.gpu > len(gpus) - 1:
raise GeneralError('GPU with device ID {} does not exist'.format(args.gpu))
# Set args.gpu to GPU object to access elsewhere
args.gpu = next(gpu for gpu in gpus if gpu.deviceID == args.gpu)
# If no device ID is specified
else:
# If in MPI mode then set args.gpu to list of available GPUs
if args.mpi:
if args.mpi - 1 > len(gpus):
raise GeneralError('Too many MPI tasks requested ({}). The number of MPI tasks requested can only be a maximum of the number of GPU(s) detected plus one, i.e. {} GPU worker tasks + 1 CPU master task'.format(args.mpi, len(gpus)))
args.gpu = gpus
# If benchmarking mode then set args.gpu to list of available GPUs
elif args.benchmark:
args.gpu = gpus
# Otherwise set args.gpu to GPU object with default device ID (0) to access elsewhere
else:
args.gpu = next(gpu for gpu in gpus if gpu.deviceID == 0)
# Create a separate namespace that users can access in any Python code blocks in the input file
usernamespace = {'c': c, 'e0': e0, 'm0': m0, 'z0': z0, 'number_model_runs': args.n, 'inputfile': os.path.abspath(inputfile.name)}
@@ -239,13 +271,41 @@ def run_benchmark_sim(args, inputfile, usernamespace):
cputimes = np.zeros(len(cputhreads))
numbermodelruns = len(cputhreads)
modelend = numbermodelruns + 1
# Both CPU and GPU benchmarking
gpus = None
gpuIDs = []
gputimes = np.array([])
if args.gpu is not None:
# Set size of array to store GPU runtimes and number of runs of model required
if isinstance(args.gpu, list):
for gpu in args.gpu:
gpuIDs.append(gpu.name)
gputimes = np.zeros(len(args.gpu))
numbermodelruns += len(args.gpu)
else:
gpuIDs.append(args.gpu.name)
gputimes = np.zeros(1)
numbermodelruns += 1
# Store GPU information in a temp variable and set args.gpu to None to do CPU benchmarking first
gpus = args.gpu
args.gpu = None
usernamespace['number_model_runs'] = numbermodelruns
modelend = numbermodelruns + 1
for currentmodelrun in range(1, modelend):
os.environ['OMP_NUM_THREADS'] = str(cputhreads[currentmodelrun - 1])
cputimes[currentmodelrun - 1] = run_model(args, currentmodelrun, modelend - 1, numbermodelruns, inputfile, usernamespace)
# Set args.gpu if doing GPU benchmark
if currentmodelrun > len(cputhreads):
if isinstance(gpus, list):
args.gpu = gpus[(currentmodelrun - 1) - len(cputhreads)]
else:
args.gpu = gpus
# del os.environ['OMP_NUM_THREADS']
gputimes[(currentmodelrun - 1) - len(cputhreads)] = run_model(args, currentmodelrun, modelend - 1, numbermodelruns, inputfile, usernamespace)
else:
os.environ['OMP_NUM_THREADS'] = str(cputhreads[currentmodelrun - 1])
cputimes[currentmodelrun - 1] = run_model(args, currentmodelrun, modelend - 1, numbermodelruns, inputfile, usernamespace)
# Get model size (in cells) and number of iterations
if currentmodelrun == 1:
@@ -340,7 +400,15 @@ def run_mpi_sim(args, inputfile, usernamespace, optparams=None):
for work in iter(lambda: comm.sendrecv(0, dest=0), StopIteration):
currentmodelrun = work['currentmodelrun']
# Get info and setup device ID for GPU(s)
gpuinfo = ''
if args.gpu is not None:
# Set device ID for multiple GPUs
if isinstance(args.gpu, list):
deviceID = (rank - 1) % len(args.gpu)
args.gpu = next(gpu for gpu in args.gpu if gpu.deviceID == deviceID)
gpuinfo = ' using {} - {}, {} RAM '.format(args.gpu.deviceID, args.gpu.name, human_size(args.gpu.totalmem, a_kilobyte_is_1024_bytes=True))
print('MPI worker rank {} (PID {}) starting model {}/{}{} on {}'.format(rank, os.getpid(), currentmodelrun, numbermodelruns, gpuinfo, name))
# If Taguchi optimisation, add specific value for each parameter to
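The device assignment above reduces to a round-robin of worker ranks over the detected GPUs; a toy illustration (for exposition only):

# Round-robin mapping of MPI worker ranks to GPU device IDs,
# mirroring deviceID = (rank - 1) % len(args.gpu) above
ngpus = 2
for rank in range(1, 6):  # worker ranks; rank 0 is the master
    print('MPI worker rank {} -> GPU device ID {}'.format(rank, (rank - 1) % ngpus))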


@@ -87,8 +87,18 @@ class FDTDGrid(Grid):
self.messages = True
self.tqdmdisable = False
# Get information about host machine
self.hostinfo = None
# CPU - OpenMP threads
self.nthreads = 0
# GPU
# Threads per block
self.tpb = (256, 1, 1)
# GPU object
self.gpu = None
# Threshold (dB) down from maximum power (0dB) of main frequency used
# to calculate highest frequency for numerical dispersion analysis
@@ -168,6 +178,33 @@ class FDTDGrid(Grid):
self.Tz = np.zeros((Material.maxpoles, self.nx + 1, self.ny + 1, self.nz + 1), dtype=complextype)
self.updatecoeffsdispersive = np.zeros((len(self.materials), 3 * Material.maxpoles), dtype=complextype)
def gpu_set_blocks_per_grid(self):
"""Set the blocks per grid size used for updating the electric and magnetic field arrays on a GPU."""
self.bpg = (int(np.ceil(((self.nx + 1) * (self.ny + 1) * (self.nz + 1)) / self.tpb[0])), 1, 1)
def gpu_initialise_arrays(self):
"""Initialise standard field arrays on GPU."""
import pycuda.gpuarray as gpuarray
self.ID_gpu = gpuarray.to_gpu(self.ID)
self.Ex_gpu = gpuarray.to_gpu(self.Ex)
self.Ey_gpu = gpuarray.to_gpu(self.Ey)
self.Ez_gpu = gpuarray.to_gpu(self.Ez)
self.Hx_gpu = gpuarray.to_gpu(self.Hx)
self.Hy_gpu = gpuarray.to_gpu(self.Hy)
self.Hz_gpu = gpuarray.to_gpu(self.Hz)
def gpu_initialise_dispersive_arrays(self):
"""Initialise dispersive material coefficient arrays on GPU."""
import pycuda.gpuarray as gpuarray
self.Tx_gpu = gpuarray.to_gpu(self.Tx)
self.Ty_gpu = gpuarray.to_gpu(self.Ty)
self.Tz_gpu = gpuarray.to_gpu(self.Tz)
self.updatecoeffsdispersive_gpu = gpuarray.to_gpu(self.updatecoeffsdispersive)
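A worked example of the blocks-per-grid calculation in gpu_set_blocks_per_grid above, with illustrative numbers:

import numpy as np

# 100 x 100 x 100 cell model with the default 256 threads per block
nx = ny = nz = 100
tpb = (256, 1, 1)
bpg = (int(np.ceil(((nx + 1) * (ny + 1) * (nz + 1)) / tpb[0])), 1, 1)
print(bpg)  # (4025, 1, 1), since ceil(101**3 / 256) = 4025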
def dispersion_analysis(G):
"""


@@ -876,7 +876,7 @@ def process_geometrycmds(geometry, G):
volume.operatingonID = tmp[12]
volume.nbins = nbins
volume.seed = seed
volume.weighting = (float(tmp[8]), float(tmp[9]), float(tmp[10]))
volume.weighting = np.array([float(tmp[8]), float(tmp[9]), float(tmp[10])])
try:
volume.averaging = averagefractalbox
except:
@@ -995,14 +995,14 @@ def process_geometrycmds(geometry, G):
surface.fractalrange = fractalrange
surface.operatingonID = volume.ID
surface.seed = seed
surface.weighting = (float(tmp[8]), float(tmp[9]))
surface.weighting = np.array([float(tmp[8]), float(tmp[9])])
# List of existing surfaces IDs
existingsurfaceIDs = [x.surfaceID for x in volume.fractalsurfaces]
if surface.surfaceID in existingsurfaceIDs:
raise CmdInputError("'" + ' '.join(tmp) + "'" + ' has already been used on the {} surface'.format(surface.surfaceID))
surface.generate_fractal_surface()
surface.generate_fractal_surface(G)
volume.fractalsurfaces.append(surface)
if G.messages:
@@ -1212,7 +1212,7 @@ def process_geometrycmds(geometry, G):
# Set the fractal range to scale the fractal distribution between zero and one
surface.fractalrange = (0, 1)
surface.operatingonID = volume.ID
surface.generate_fractal_surface()
surface.generate_fractal_surface(G)
if numblades > surface.fractalsurface.shape[0] * surface.fractalsurface.shape[1]:
raise CmdInputError("'" + ' '.join(tmp) + "'" + ' the specified surface is not large enough for the number of grass blades/roots specified')
@@ -1308,7 +1308,7 @@ def process_geometrycmds(geometry, G):
materialnumID = next(x.numID for x in G.materials if x.ID == volume.operatingonID)
volume.fractalvolume *= materialnumID
else:
volume.generate_fractal_volume()
volume.generate_fractal_volume(G)
volume.fractalvolume += mixingmodel.startmaterialnum
volume.generate_volume_mask()
@@ -1521,7 +1521,7 @@ def process_geometrycmds(geometry, G):
if volume.nbins == 1:
raise CmdInputError("'" + ' '.join(tmp) + "'" + ' is being used with a single material and no modifications, therefore please use a #box command instead.')
else:
volume.generate_fractal_volume()
volume.generate_fractal_volume(G)
volume.fractalvolume += mixingmodel.startmaterialnum
data = volume.fractalvolume.astype('int16', order='C')


@@ -295,6 +295,10 @@ def process_multicmds(multicmds, G):
tmp = cmdinstance.split()
if len(tmp) < 6:
raise CmdInputError("'" + cmdname + ': ' + ' '.join(tmp) + "'" + ' requires at least six parameters')
# Raise an error if a transmission line is used with GPU solving (not currently supported)
if G.gpu is not None:
raise CmdInputError("'" + cmdname + ': ' + ' '.join(tmp) + "'" + ' A #transmission_line cannot currently be used with GPU solving. Consider using a #voltage_source instead.')
# Check polarity & position parameters
polarisation = tmp[0].lower()
@@ -386,7 +390,10 @@ def process_multicmds(multicmds, G):
else:
r.ID = tmp[3]
# Get allowable outputs
allowableoutputs = Rx.allowableoutputs
if G.gpu is not None:
allowableoutputs = Rx.gpu_allowableoutputs
else:
allowableoutputs = Rx.allowableoutputs
# Check and add field output names
for field in tmp[4::]:
if field in allowableoutputs:
@@ -428,17 +435,17 @@ def process_multicmds(multicmds, G):
raise CmdInputError("'" + cmdname + ': ' + ' '.join(tmp) + "'" + ' the lower coordinates should be less than the upper coordinates')
if dx < 0 or dy < 0 or dz < 0:
raise CmdInputError("'" + cmdname + ': ' + ' '.join(tmp) + "'" + ' the step size should not be less than zero')
if dx < G.dx:
if dx < 1:
if dx == 0:
dx = 1
else:
raise CmdInputError("'" + cmdname + ': ' + ' '.join(tmp) + "'" + ' the step size should not be less than the spatial discretisation')
if dy < G.dy:
if dy < 1:
if dy == 0:
dy = 1
else:
raise CmdInputError("'" + cmdname + ': ' + ' '.join(tmp) + "'" + ' the step size should not be less than the spatial discretisation')
if dz < G.dz:
if dz < 1:
if dz == 0:
dz = 1
else:
@@ -471,6 +478,10 @@ def process_multicmds(multicmds, G):
tmp = cmdinstance.split()
if len(tmp) != 11:
raise CmdInputError("'" + cmdname + ': ' + ' '.join(tmp) + "'" + ' requires exactly eleven parameters')
# Raise an error if snapshots are used with GPU solving (not currently supported)
if G.gpu is not None:
raise CmdInputError("'" + cmdname + ': ' + ' '.join(tmp) + "'" + ' The #snapshot command cannot currently be used with GPU solving.')
xs = G.calculate_coord('x', tmp[0])
ys = G.calculate_coord('y', tmp[1])
@@ -502,7 +513,7 @@ def process_multicmds(multicmds, G):
raise CmdInputError("'" + cmdname + ': ' + ' '.join(tmp) + "'" + ' the lower coordinates should be less than the upper coordinates')
if dx < 0 or dy < 0 or dz < 0:
raise CmdInputError("'" + cmdname + ': ' + ' '.join(tmp) + "'" + ' the step size should not be less than zero')
if dx < G.dx or dy < G.dy or dz < G.dz:
if dx < 1 or dy < 1 or dz < 1:
raise CmdInputError("'" + cmdname + ': ' + ' '.join(tmp) + "'" + ' the step size should not be less than the spatial discretisation')
if time <= 0 or time > G.iterations:
raise CmdInputError("'" + cmdname + ': ' + ' '.join(tmp) + "'" + ' time value is not valid')
@@ -724,7 +735,7 @@ def process_multicmds(multicmds, G):
raise CmdInputError("'" + cmdname + ': ' + ' '.join(tmp) + "'" + ' the step size should not be less than zero')
if dx > G.nx or dy > G.ny or dz > G.nz:
raise CmdInputError("'" + cmdname + ': ' + ' '.join(tmp) + "'" + ' the step size should be less than the domain size')
if dx < G.dx or dy < G.dy or dz < G.dz:
if dx < 1 or dy < 1 or dz < 1:
raise CmdInputError("'" + cmdname + ': ' + ' '.join(tmp) + "'" + ' the step size should not be less than the spatial discretisation')
if tmp[10].lower() != 'n' and tmp[10].lower() != 'f':
raise CmdInputError("'" + cmdname + ': ' + ' '.join(tmp) + "'" + ' requires type to be either n (normal) or f (fine)')
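The revised output-step checks above treat steps in cells rather than metres; restated compactly (a sketch for exposition, not the committed code):

def check_step(d):
    # Steps are in cells: 0 is promoted to one cell, anything below one
    # cell (other than 0) is rejected
    if d < 0:
        raise ValueError('the step size should not be less than zero')
    if d == 0:
        return 1
    if d < 1:
        raise ValueError('the step size should not be less than the spatial discretisation')
    return d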


@@ -77,7 +77,7 @@ def process_singlecmds(singlecmds, G):
# os.environ['OMP_DISPLAY_ENV'] = 'TRUE' # Prints OMP version and environment variables (useful for debug)
# Catch bug with Windows Subsystem for Linux (https://github.com/Microsoft/BashOnWindows/issues/785)
if 'Microsoft' in hostinfo['osversion']:
if 'Microsoft' in G.hostinfo['osversion']:
os.environ['KMP_AFFINITY'] = 'disabled'
del os.environ['OMP_PLACES']
del os.environ['OMP_PROC_BIND']
@@ -99,9 +99,14 @@ def process_singlecmds(singlecmds, G):
if G.messages:
print('Number of CPU (OpenMP) threads: {}'.format(G.nthreads))
if G.nthreads > hostinfo['physicalcores']:
if G.nthreads > G.hostinfo['physicalcores']:
print(Fore.RED + 'WARNING: You have specified more threads ({}) than available physical CPU cores ({}). This may lead to degraded performance.'.format(G.nthreads, hostinfo['physicalcores']) + Style.RESET_ALL)
# Print information about any GPU in use
if G.messages:
if G.gpu is not None:
print('GPU solving using: {} - {}'.format(G.gpu.deviceID, G.gpu.name))
# Spatial discretisation
cmd = '#dx_dy_dz'
tmp = [float(x) for x in singlecmds[cmd].split()]
@@ -135,8 +140,13 @@ def process_singlecmds(singlecmds, G):
# Estimate memory (RAM) usage
memestimate = memory_usage(G)
# Check if model can be built and/or run on host
if memestimate > hostinfo['ram']:
if memestimate > G.hostinfo['ram']:
raise GeneralError('Estimated memory (RAM) required ~{} exceeds {} detected!\n'.format(human_size(memestimate), human_size(hostinfo['ram'], a_kilobyte_is_1024_bytes=True)))
# Check if model can be run on specified GPU if required
if G.gpu is not None:
if memestimate > G.gpu.totalmem:
raise GeneralError('Estimated memory (RAM) required ~{} exceeds {} detected on specified {} - {} GPU!\n'.format(human_size(memestimate), human_size(G.gpu.totalmem, a_kilobyte_is_1024_bytes=True), G.gpu.deviceID, G.gpu.name))
if G.messages:
print('Estimated memory (RAM) required: ~{}'.format(human_size(memestimate)))


@@ -23,16 +23,19 @@ import psutil
import sys
from time import perf_counter
from colorama import init, Fore, Style
from colorama import init
from colorama import Fore
from colorama import Style
init()
import numpy as np
from terminaltables import AsciiTable
from tqdm import tqdm
from gprMax.constants import floattype, cfloattype, ccomplextype
from gprMax.constants import floattype, cudafloattype, cudacomplextype
from gprMax.exceptions import GeneralError
from gprMax.fields_outputs import store_outputs
from gprMax.fields_outputs import kernel_template_store_outputs
from gprMax.fields_outputs import write_hdf5_outputfile
from gprMax.fields_updates import update_electric
@@ -41,6 +44,7 @@ from gprMax.fields_updates import update_electric_dispersive_multipole_A
from gprMax.fields_updates import update_electric_dispersive_multipole_B
from gprMax.fields_updates import update_electric_dispersive_1pole_A
from gprMax.fields_updates import update_electric_dispersive_1pole_B
from gprMax.fields_updates_gpu import kernels_template_fields
from gprMax.grid import FDTDGrid
from gprMax.grid import dispersion_analysis
@@ -53,9 +57,17 @@ from gprMax.input_cmds_singleuse import process_singlecmds
from gprMax.materials import Material, process_materials
from gprMax.pml import PML
from gprMax.pml import build_pmls
from gprMax.pml_updates_gpu import kernels_template_pml
from gprMax.receivers import gpu_initialise_rx_arrays
from gprMax.receivers import gpu_get_rx_array
from gprMax.sources import gpu_initialise_src_arrays
from gprMax.source_updates_gpu import kernels_template_sources
from gprMax.utilities import get_host_info
from gprMax.utilities import get_terminal_width
from gprMax.utilities import human_size
from gprMax.utilities import memory_usage
from gprMax.utilities import open_path_file
from gprMax.utilities import round32
from gprMax.yee_cell_build import build_electric_components
from gprMax.yee_cell_build import build_magnetic_components
@@ -90,6 +102,13 @@ def run_model(args, currentmodelrun, modelend, numbermodelruns, inputfile, usern
# Initialise an instance of the FDTDGrid class
G = FDTDGrid()
# Get information about host machine
G.hostinfo = get_host_info()
# Single GPU object
if args.gpu:
G.gpu = args.gpu
G.inputfilename = os.path.split(inputfile.name)[1]
G.inputdirectory = os.path.dirname(os.path.abspath(inputfile.name))
@@ -186,6 +205,19 @@ def run_model(args, currentmodelrun, modelend, numbermodelruns, inputfile, usern
# Initialise arrays of update coefficients and temporary values if
# there are any dispersive materials
if Material.maxpoles != 0:
# Update estimated memory (RAM) usage
memestimate = memory_usage(G)
# Check if model can be built and/or run on host
if memestimate > G.hostinfo['ram']:
raise GeneralError('Estimated memory (RAM) required ~{} exceeds {} detected!\n'.format(human_size(memestimate), human_size(G.hostinfo['ram'], a_kilobyte_is_1024_bytes=True)))
# Check if model can be run on specified GPU if required
if G.gpu is not None:
if memestimate > G.gpu.totalmem:
raise GeneralError('Estimated memory (RAM) required ~{} exceeds {} detected on specified {} - {} GPU!\n'.format(human_size(memestimate), human_size(G.gpu.totalmem, a_kilobyte_is_1024_bytes=True), G.gpu.deviceID, G.gpu.name))
if G.messages:
print('Estimated memory (RAM) required: ~{}'.format(human_size(memestimate)))
G.initialise_dispersive_arrays()
# Process complete list of materials - calculate update coefficients,
@@ -271,7 +303,10 @@ def run_model(args, currentmodelrun, modelend, numbermodelruns, inputfile, usern
print('\nOutput file: {}\n'.format(outputfile))
# Main FDTD solving functions for either CPU or GPU
tsolve = solve_cpu(currentmodelrun, modelend, G)
if G.gpu is None:
tsolve = solve_cpu(currentmodelrun, modelend, G)
else:
tsolve = solve_gpu(currentmodelrun, modelend, G)
# Write an output file in HDF5 format
write_hdf5_outputfile(outputfile, G.Ex, G.Ey, G.Ez, G.Hx, G.Hy, G.Hz, G)
@@ -358,3 +393,149 @@ def solve_cpu(currentmodelrun, modelend, G):
tsolve = perf_counter() - tsolvestart
return tsolve
def solve_gpu(currentmodelrun, modelend, G):
"""Solving using FDTD method on GPU. Implemented using Nvidia CUDA.
Args:
currentmodelrun (int): Current model run number.
modelend (int): Number of last model to run.
G (class): Grid class instance - holds essential parameters describing the model.
Returns:
tsolve (float): Time taken to execute solving
"""
import pycuda.driver as drv
from pycuda.compiler import SourceModule
drv.init()
# Create device handle and context on specific GPU device (and make it current context)
dev = drv.Device(G.gpu.deviceID)
ctx = dev.make_context()
# Electric and magnetic field updates - prepare kernels, and get kernel functions
if Material.maxpoles > 0:
kernels_fields = SourceModule(kernels_template_fields.substitute(REAL=cudafloattype, COMPLEX=cudacomplextype, N_updatecoeffsE=G.updatecoeffsE.size, N_updatecoeffsH=G.updatecoeffsH.size, NY_MATCOEFFS=G.updatecoeffsE.shape[1], NY_MATDISPCOEFFS=G.updatecoeffsdispersive.shape[1], NX_FIELDS=G.Ex.shape[0], NY_FIELDS=G.Ex.shape[1], NZ_FIELDS=G.Ex.shape[2], NX_ID=G.ID.shape[1], NY_ID=G.ID.shape[2], NZ_ID=G.ID.shape[3], NX_T=G.Tx.shape[1], NY_T=G.Tx.shape[2], NZ_T=G.Tx.shape[3]))
else: # Set to one any substitutions for dispersive materials
kernels_fields = SourceModule(kernels_template_fields.substitute(REAL=cudafloattype, COMPLEX=cudacomplextype, N_updatecoeffsE=G.updatecoeffsE.size, N_updatecoeffsH=G.updatecoeffsH.size, NY_MATCOEFFS=G.updatecoeffsE.shape[1], NY_MATDISPCOEFFS=1, NX_FIELDS=G.Ex.shape[0], NY_FIELDS=G.Ex.shape[1], NZ_FIELDS=G.Ex.shape[2], NX_ID=G.ID.shape[1], NY_ID=G.ID.shape[2], NZ_ID=G.ID.shape[3], NX_T=1, NY_T=1, NZ_T=1))
update_e_gpu = kernels_fields.get_function("update_e")
update_h_gpu = kernels_fields.get_function("update_h")
# Copy material coefficient arrays to constant memory of GPU (must be <64KB) for fields kernels
updatecoeffsE = kernels_fields.get_global('updatecoeffsE')[0]
updatecoeffsH = kernels_fields.get_global('updatecoeffsH')[0]
if G.updatecoeffsE.nbytes + G.updatecoeffsH.nbytes > G.gpu.constmem:
raise GeneralError('Too many materials in the model to fit onto constant memory of size {} on {} - {} GPU'.format(human_size(G.gpu.constmem), G.gpu.deviceID, G.gpu.name))
else:
drv.memcpy_htod(updatecoeffsE, G.updatecoeffsE)
drv.memcpy_htod(updatecoeffsH, G.updatecoeffsH)
# Electric and magnetic field updates - dispersive materials - get kernel functions
if Material.maxpoles > 0: # If there are any dispersive materials (updates are split into two parts as they require present and updated electric field values).
update_e_dispersive_A_gpu = kernels_fields.get_function("update_e_dispersive_A")
update_e_dispersive_B_gpu = kernels_fields.get_function("update_e_dispersive_B")
G.gpu_initialise_dispersive_arrays()
# Electric and magnetic field updates - set blocks per grid and initialise field arrays on GPU
G.gpu_set_blocks_per_grid()
G.gpu_initialise_arrays()
# PML updates
if G.pmls:
# Prepare kernels
kernels_pml = SourceModule(kernels_template_pml.substitute(REAL=cudafloattype, N_updatecoeffsE=G.updatecoeffsE.size, N_updatecoeffsH=G.updatecoeffsH.size, NY_MATCOEFFS=G.updatecoeffsE.shape[1], NY_R=G.pmls[0].ERA.shape[1], NX_FIELDS=G.Ex.shape[0], NY_FIELDS=G.Ex.shape[1], NZ_FIELDS=G.Ex.shape[2], NX_ID=G.ID.shape[1], NY_ID=G.ID.shape[2], NZ_ID=G.ID.shape[3]))
# Copy material coefficient arrays to constant memory of GPU (must be <64KB) for PML kernels
updatecoeffsE = kernels_pml.get_global('updatecoeffsE')[0]
updatecoeffsH = kernels_pml.get_global('updatecoeffsH')[0]
drv.memcpy_htod(updatecoeffsE, G.updatecoeffsE)
drv.memcpy_htod(updatecoeffsH, G.updatecoeffsH)
# Set block per grid, initialise arrays on GPU, and get kernel functions
for pml in G.pmls:
pml.gpu_set_blocks_per_grid(G)
pml.gpu_initialise_arrays()
pml.gpu_get_update_funcs(kernels_pml)
# Receivers
if G.rxs:
# Initialise arrays on GPU
rxcoords_gpu, rxs_gpu = gpu_initialise_rx_arrays(G)
# Prepare kernel and get kernel function
kernel_store_outputs = SourceModule(kernel_template_store_outputs.substitute(REAL=cudafloattype, NY_RXCOORDS=3, NX_RXS=6, NY_RXS=G.iterations, NZ_RXS=len(G.rxs), NX_FIELDS=G.Ex.shape[0], NY_FIELDS=G.Ex.shape[1], NZ_FIELDS=G.Ex.shape[2]))
store_outputs_gpu = kernel_store_outputs.get_function("store_outputs")
# Sources - initialise arrays on GPU, prepare kernel and get kernel functions
if G.voltagesources + G.hertziandipoles + G.magneticdipoles:
kernels_sources = SourceModule(kernels_template_sources.substitute(REAL=cudafloattype, N_updatecoeffsE=G.updatecoeffsE.size, N_updatecoeffsH=G.updatecoeffsH.size, NY_MATCOEFFS=G.updatecoeffsE.shape[1], NY_SRCINFO=4, NY_SRCWAVES=G.iterations, NX_FIELDS=G.Ex.shape[0], NY_FIELDS=G.Ex.shape[1], NZ_FIELDS=G.Ex.shape[2], NX_ID=G.ID.shape[1], NY_ID=G.ID.shape[2], NZ_ID=G.ID.shape[3]))
# Copy material coefficient arrays to constant memory of GPU (must be <64KB) for source kernels
updatecoeffsE = kernels_sources.get_global('updatecoeffsE')[0]
updatecoeffsH = kernels_sources.get_global('updatecoeffsH')[0]
drv.memcpy_htod(updatecoeffsE, G.updatecoeffsE)
drv.memcpy_htod(updatecoeffsH, G.updatecoeffsH)
if G.hertziandipoles:
srcinfo1_hertzian_gpu, srcinfo2_hertzian_gpu, srcwaves_hertzian_gpu = gpu_initialise_src_arrays(G.hertziandipoles, G)
update_hertzian_dipole_gpu = kernels_sources.get_function("update_hertzian_dipole")
if G.magneticdipoles:
srcinfo1_magnetic_gpu, srcinfo2_magnetic_gpu, srcwaves_magnetic_gpu = gpu_initialise_src_arrays(G.magneticdipoles, G)
update_magnetic_dipole_gpu = kernels_sources.get_function("update_magnetic_dipole")
if G.voltagesources:
srcinfo1_voltage_gpu, srcinfo2_voltage_gpu, srcwaves_voltage_gpu = gpu_initialise_src_arrays(G.voltagesources, G)
update_voltage_source_gpu = kernels_sources.get_function("update_voltage_source")
# Iteration loop timer
iterstart = drv.Event()
iterend = drv.Event()
iterstart.record()
for iteration in tqdm(range(G.iterations), desc='Running simulation, model ' + str(currentmodelrun) + '/' + str(modelend), ncols=get_terminal_width() - 1, file=sys.stdout, disable=G.tqdmdisable):
# Store field component values for every receiver
if G.rxs:
store_outputs_gpu(np.int32(len(G.rxs)), np.int32(iteration), rxcoords_gpu.gpudata, rxs_gpu.gpudata, G.Ex_gpu.gpudata, G.Ey_gpu.gpudata, G.Ez_gpu.gpudata, G.Hx_gpu.gpudata, G.Hy_gpu.gpudata, G.Hz_gpu.gpudata, block=(1, 1, 1), grid=(round32(len(G.rxs)), 1, 1))
# Update magnetic field components
update_h_gpu(np.int32(G.nx), np.int32(G.ny), np.int32(G.nz), G.ID_gpu.gpudata, G.Hx_gpu.gpudata, G.Hy_gpu.gpudata, G.Hz_gpu.gpudata, G.Ex_gpu.gpudata, G.Ey_gpu.gpudata, G.Ez_gpu.gpudata, block=G.tpb, grid=G.bpg)
# Update magnetic field components with the PML correction
for pml in G.pmls:
pml.gpu_update_magnetic(G)
# Update magnetic field components for magnetic dipole sources
if G.magneticdipoles:
update_magnetic_dipole_gpu(np.int32(len(G.magneticdipoles)), np.int32(iteration), floattype(G.dx), floattype(G.dy), floattype(G.dz), srcinfo1_magnetic_gpu.gpudata, srcinfo2_magnetic_gpu.gpudata, srcwaves_magnetic_gpu.gpudata, G.ID_gpu.gpudata, G.Hx_gpu.gpudata, G.Hy_gpu.gpudata, G.Hz_gpu.gpudata, block=(1, 1, 1), grid=(round32(len(G.magneticdipoles)), 1, 1))
# Update electric field components
if Material.maxpoles == 0: # If all materials are non-dispersive do standard update
update_e_gpu(np.int32(G.nx), np.int32(G.ny), np.int32(G.nz), G.ID_gpu.gpudata, G.Ex_gpu.gpudata, G.Ey_gpu.gpudata, G.Ez_gpu.gpudata, G.Hx_gpu.gpudata, G.Hy_gpu.gpudata, G.Hz_gpu.gpudata, block=G.tpb, grid=G.bpg)
else: # If there are any dispersive materials do 1st part of dispersive update (it is split into two parts as it requires present and updated electric field values).
update_e_dispersive_A_gpu(np.int32(G.nx), np.int32(G.ny), np.int32(G.nz), np.int32(Material.maxpoles), G.updatecoeffsdispersive_gpu.gpudata, G.Tx_gpu.gpudata, G.Ty_gpu.gpudata, G.Tz_gpu.gpudata, G.ID_gpu.gpudata, G.Ex_gpu.gpudata, G.Ey_gpu.gpudata, G.Ez_gpu.gpudata, G.Hx_gpu.gpudata, G.Hy_gpu.gpudata, G.Hz_gpu.gpudata, block=G.tpb, grid=G.bpg)
# Update electric field components with the PML correction
for pml in G.pmls:
pml.gpu_update_electric(G)
# Update electric field components for voltage sources
if G.voltagesources:
update_voltage_source_gpu(np.int32(len(G.voltagesources)), np.int32(iteration), floattype(G.dx), floattype(G.dy), floattype(G.dz), srcinfo1_voltage_gpu.gpudata, srcinfo2_voltage_gpu.gpudata, srcwaves_voltage_gpu.gpudata, G.ID_gpu.gpudata, G.Ex_gpu.gpudata, G.Ey_gpu.gpudata, G.Ez_gpu.gpudata, block=(1, 1, 1), grid=(round32(len(G.voltagesources)), 1, 1))
# Update electric field components for Hertzian dipole sources (update any Hertzian dipole sources last)
if G.hertziandipoles:
update_hertzian_dipole_gpu(np.int32(len(G.hertziandipoles)), np.int32(iteration), floattype(G.dx), floattype(G.dy), floattype(G.dz), srcinfo1_hertzian_gpu.gpudata, srcinfo2_hertzian_gpu.gpudata, srcwaves_hertzian_gpu.gpudata, G.ID_gpu.gpudata, G.Ex_gpu.gpudata, G.Ey_gpu.gpudata, G.Ez_gpu.gpudata, block=(1, 1, 1), grid=(round32(len(G.hertziandipoles)), 1, 1))
# If there are any dispersive materials do 2nd part of dispersive update (it is split into two parts as it requires present and updated electric field values). Therefore it can only be completely updated after the electric field has been updated by the PML and source updates.
if Material.maxpoles > 0:
update_e_dispersive_B_gpu(np.int32(G.nx), np.int32(G.ny), np.int32(G.nz), np.int32(Material.maxpoles), G.updatecoeffsdispersive_gpu.gpudata, G.Tx_gpu.gpudata, G.Ty_gpu.gpudata, G.Tz_gpu.gpudata, G.ID_gpu.gpudata, G.Ex_gpu.gpudata, G.Ey_gpu.gpudata, G.Ez_gpu.gpudata, block=G.tpb, grid=G.bpg)
# Copy output from receivers array back to correct receiver objects
gpu_get_rx_array(rxs_gpu.get(), rxcoords_gpu.get(), G)
iterend.record()
iterend.synchronize()
tsolve = iterstart.time_till(iterend) * 1e-3
# Remove context from top of stack and delete
ctx.pop()
del ctx
return tsolve
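
The per-model timing above uses PyCUDA events rather than host timers, so the measurement covers exactly the work queued on the GPU, and time_till() reports milliseconds. A minimal self-contained sketch of the same pattern, with illustrative array work standing in for the FDTD update kernels:

import numpy as np
import pycuda.autoinit  # creates a context on the default device
import pycuda.driver as drv
import pycuda.gpuarray as gpuarray

iterstart = drv.Event()
iterend = drv.Event()

iterstart.record()
a_gpu = gpuarray.to_gpu(np.random.randn(256, 256).astype(np.float32))
b_gpu = 2 * a_gpu  # stand-in for the solver's field update kernels
iterend.record()
iterend.synchronize()  # block until all queued GPU work has finished

tsolve = iterstart.time_till(iterend) * 1e-3  # milliseconds -> seconds
print('GPU work took {:.6f} s'.format(tsolve))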


@@ -266,6 +266,63 @@ class PML(object):
func = getattr(import_module('gprMax.pml_updates'), 'update_pml_' + str(len(self.CFS)) + 'order_magnetic_' + self.direction)
func(self.xs, self.xf, self.ys, self.yf, self.zs, self.zf, G.nthreads, G.updatecoeffsH, G.ID, G.Ex, G.Ey, G.Ez, G.Hx, G.Hy, G.Hz, self.HPhi1, self.HPhi2, self.HRA, self.HRB, self.HRE, self.HRF, self.d)
def gpu_set_blocks_per_grid(self, G):
"""Set the blocks per grid size used for updating the PML field arrays on a GPU.
Args:
G (class): Grid class instance - holds essential parameters describing the model.
"""
self.bpg = (int(np.ceil(((self.EPhi1.shape[1] + 1) * (self.EPhi1.shape[2] + 1) * (self.EPhi1.shape[3] + 1)) / G.tpb[0])), 1, 1)
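The blocks-per-grid arithmetic above launches one thread per cell of the padded EPhi1 array, leaving the kernels to bounds-check the overshoot. A hedged sketch of that calculation with made-up dimensions:

import numpy as np

tpb = (256, 1, 1)                        # threads per block, as held in G.tpb
nx, ny, nz = 10, 100, 100                # hypothetical EPhi1.shape[1:]
ncells = (nx + 1) * (ny + 1) * (nz + 1)
bpg = (int(np.ceil(ncells / tpb[0])), 1, 1)
assert bpg[0] * tpb[0] >= ncells         # grid covers every PML cell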
def gpu_initialise_arrays(self):
"""Initialise PML field and coefficient arrays on GPU."""
import pycuda.gpuarray as gpuarray
self.EPhi1_gpu = gpuarray.to_gpu(self.EPhi1)
self.EPhi2_gpu = gpuarray.to_gpu(self.EPhi2)
self.ERA_gpu = gpuarray.to_gpu(self.ERA)
self.ERB_gpu = gpuarray.to_gpu(self.ERB)
self.ERE_gpu = gpuarray.to_gpu(self.ERE)
self.ERF_gpu = gpuarray.to_gpu(self.ERF)
self.HPhi1_gpu = gpuarray.to_gpu(self.HPhi1)
self.HPhi2_gpu = gpuarray.to_gpu(self.HPhi2)
self.HRA_gpu = gpuarray.to_gpu(self.HRA)
self.HRB_gpu = gpuarray.to_gpu(self.HRB)
self.HRE_gpu = gpuarray.to_gpu(self.HRE)
self.HRF_gpu = gpuarray.to_gpu(self.HRF)
def gpu_get_update_funcs(self, kernels):
"""Get update functions from PML kernels.
Args:
kernels: PyCuda SourceModule containing PML kernels.
"""
from pycuda.compiler import SourceModule
self.update_electric_gpu = kernels.get_function('update_pml_' + str(len(self.CFS)) + 'order_electric_' + self.direction)
self.update_magnetic_gpu = kernels.get_function('update_pml_' + str(len(self.CFS)) + 'order_magnetic_' + self.direction)
def gpu_update_electric(self, G):
"""This functions updates electric field components with the PML correction on the GPU.
Args:
G (class): Grid class instance - holds essential parameters describing the model.
"""
self.update_electric_gpu(np.int32(self.xs), np.int32(self.xf), np.int32(self.ys), np.int32(self.yf), np.int32(self.zs), np.int32(self.zf), np.int32(self.EPhi1.shape[1]), np.int32(self.EPhi1.shape[2]), np.int32(self.EPhi1.shape[3]), np.int32(self.EPhi2.shape[1]), np.int32(self.EPhi2.shape[2]), np.int32(self.EPhi2.shape[3]), G.ID_gpu.gpudata, G.Ex_gpu.gpudata, G.Ey_gpu.gpudata, G.Ez_gpu.gpudata, G.Hx_gpu.gpudata, G.Hy_gpu.gpudata, G.Hz_gpu.gpudata, self.EPhi1_gpu.gpudata, self.EPhi2_gpu.gpudata, self.ERA_gpu.gpudata, self.ERB_gpu.gpudata, self.ERE_gpu.gpudata, self.ERF_gpu.gpudata, floattype(self.d), block=G.tpb, grid=self.bpg)
def gpu_update_magnetic(self, G):
"""This functions updates magnetic field components with the PML correction on the GPU.
Args:
G (class): Grid class instance - holds essential parameters describing the model.
"""
self.update_magnetic_gpu(np.int32(self.xs), np.int32(self.xf), np.int32(self.ys), np.int32(self.yf), np.int32(self.zs), np.int32(self.zf), np.int32(self.HPhi1.shape[1]), np.int32(self.HPhi1.shape[2]), np.int32(self.HPhi1.shape[3]), np.int32(self.HPhi2.shape[1]), np.int32(self.HPhi2.shape[2]), np.int32(self.HPhi2.shape[3]), G.ID_gpu.gpudata, G.Ex_gpu.gpudata, G.Ey_gpu.gpudata, G.Ez_gpu.gpudata, G.Hx_gpu.gpudata, G.Hy_gpu.gpudata, G.Hz_gpu.gpudata, self.HPhi1_gpu.gpudata, self.HPhi2_gpu.gpudata, self.HRA_gpu.gpudata, self.HRB_gpu.gpudata, self.HRE_gpu.gpudata, self.HRF_gpu.gpudata, floattype(self.d), block=G.tpb, grid=self.bpg)
def build_pmls(G, pbar):
"""

2027 additions: gprMax/pml_updates_gpu.py (new file)

File diff is too large to display.

@@ -27,6 +27,7 @@ class Rx(object):
"""Receiver output points."""
allowableoutputs = ['Ex', 'Ey', 'Ez', 'Hx', 'Hy', 'Hz', 'Ix', 'Iy', 'Iz']
gpu_allowableoutputs = allowableoutputs[:-3]
defaultoutputs = allowableoutputs[:-3]
def __init__(self):
@@ -39,3 +40,49 @@ class Rx(object):
self.xcoordorigin = None
self.ycoordorigin = None
self.zcoordorigin = None
def gpu_initialise_rx_arrays(G):
"""Initialise arrays on GPU for receiver coordinates and to store field components for receivers.
Args:
G (class): Grid class instance - holds essential parameters describing the model.
"""
import pycuda.gpuarray as gpuarray
# Array to store receiver coordinates on GPU
rxcoords = np.zeros((len(G.rxs), 3), dtype=np.int32)
for i, rx in enumerate(G.rxs):
rxcoords[i, 0] = rx.xcoord
rxcoords[i, 1] = rx.ycoord
rxcoords[i, 2] = rx.zcoord
# Array to store field components for receivers on GPU - rows are field components; columns are iterations; pages are receivers
rxs = np.zeros((len(Rx.gpu_allowableoutputs), G.iterations, len(G.rxs)), dtype=floattype)
# Copy arrays to GPU
rxcoords_gpu = gpuarray.to_gpu(rxcoords)
rxs_gpu = gpuarray.to_gpu(rxs)
return rxcoords_gpu, rxs_gpu
def gpu_get_rx_array(rxs_gpu, rxcoords_gpu, G):
"""Copy output from receivers array used on GPU back to receiver objects.
Args:
        rxs_gpu (float): numpy array of receiver data from GPU - rows are field components; columns are iterations; pages are receivers.
        rxcoords_gpu (int): numpy array of receiver coordinates from GPU.
G (class): Grid class instance - holds essential parameters describing the model.
"""
for rx in G.rxs:
for rxgpu in range(len(G.rxs)):
if rx.xcoord == rxcoords_gpu[rxgpu, 0] and rx.ycoord == rxcoords_gpu[rxgpu, 1] and rx.zcoord == rxcoords_gpu[rxgpu, 2]:
rx.outputs['Ex'] = rxs_gpu[0, :, rxgpu]
rx.outputs['Ey'] = rxs_gpu[1, :, rxgpu]
rx.outputs['Ez'] = rxs_gpu[2, :, rxgpu]
rx.outputs['Hx'] = rxs_gpu[3, :, rxgpu]
rx.outputs['Hy'] = rxs_gpu[4, :, rxgpu]
rx.outputs['Hz'] = rxs_gpu[5, :, rxgpu]
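
The receiver store is laid out as (field component, iteration, receiver), which is the ordering the loop above indexes into. A small illustration with hypothetical sizes:

import numpy as np

iterations, nrx = 500, 3
rxs = np.zeros((6, iterations, nrx), dtype=np.float32)  # rows: Ex, Ey, Ez, Hx, Hy, Hz

ey_first_rx = rxs[1, :, 0]  # complete Ey time history of the first receiver
hz_all = rxs[5, :, :]       # Hz histories for every receiver at once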

203 additions: gprMax/source_updates_gpu.py (new file)

@@ -0,0 +1,203 @@
# Copyright (C) 2015-2017: The University of Edinburgh
# Authors: Craig Warren and Antonis Giannopoulos
#
# This file is part of gprMax.
#
# gprMax is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# gprMax is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with gprMax. If not, see <http://www.gnu.org/licenses/>.
from string import Template
kernels_template_sources = Template("""
// Macros for converting subscripts to linear index:
#define INDEX2D_MAT(m, n) (m)*($NY_MATCOEFFS)+(n)
#define INDEX2D_SRCINFO(m, n) (m)*$NY_SRCINFO+(n)
#define INDEX2D_SRCWAVES(m, n) (m)*($NY_SRCWAVES)+(n)
#define INDEX3D_FIELDS(i, j, k) (i)*($NY_FIELDS)*($NZ_FIELDS)+(j)*($NZ_FIELDS)+(k)
#define INDEX4D_ID(p, i, j, k) (p)*($NX_ID)*($NY_ID)*($NZ_ID)+(i)*($NY_ID)*($NZ_ID)+(j)*($NZ_ID)+(k)
// Material coefficients (read-only) in constant memory (64KB)
__device__ __constant__ $REAL updatecoeffsE[$N_updatecoeffsE];
__device__ __constant__ $REAL updatecoeffsH[$N_updatecoeffsH];
///////////////////////////////////////////
// Hertzian dipole electric field update //
///////////////////////////////////////////
__global__ void update_hertzian_dipole(int NHERTZDIPOLE, int iteration, $REAL dx, $REAL dy, $REAL dz, const int* __restrict__ srcinfo1, const $REAL* __restrict__ srcinfo2, const $REAL* __restrict__ srcwaveforms, const unsigned int* __restrict__ ID, $REAL *Ex, $REAL *Ey, $REAL *Ez) {
// This function updates electric field values for Hertzian dipole sources.
//
// Args:
// NHERTZDIPOLE: Total number of Hertzian dipoles in the model
// iteration: Iteration number of simulation
// dx, dy, dz: Spatial discretisations
// srcinfo1: Source cell coordinates and polarisation information
// srcinfo2: Other source information, e.g. length, resistance etc...
// srcwaveforms: Source waveform values
// ID, E: Access to ID and field component arrays
    //  Obtain the linear index corresponding to the current thread and use it for each source
int src = blockIdx.x * blockDim.x + threadIdx.x;
if (src < NHERTZDIPOLE) {
$REAL dl;
int i, j, k, polarisation;
i = srcinfo1[INDEX2D_SRCINFO(src,0)];
j = srcinfo1[INDEX2D_SRCINFO(src,1)];
k = srcinfo1[INDEX2D_SRCINFO(src,2)];
polarisation = srcinfo1[INDEX2D_SRCINFO(src,3)];
dl = srcinfo2[src];
// 'x' polarised source
if (polarisation == 0) {
int materialEx = ID[INDEX4D_ID(0,i,j,k)];
Ex[INDEX3D_FIELDS(i,j,k)] = Ex[INDEX3D_FIELDS(i,j,k)] - updatecoeffsE[INDEX2D_MAT(materialEx,4)] * srcwaveforms[INDEX2D_SRCWAVES(src,iteration)] * dl * (1 / (dx * dy * dz));
}
// 'y' polarised source
else if (polarisation == 1) {
int materialEy = ID[INDEX4D_ID(1,i,j,k)];
Ey[INDEX3D_FIELDS(i,j,k)] = Ey[INDEX3D_FIELDS(i,j,k)] - updatecoeffsE[INDEX2D_MAT(materialEy,4)] * srcwaveforms[INDEX2D_SRCWAVES(src,iteration)] * dl * (1 / (dx * dy * dz));
}
// 'z' polarised source
else if (polarisation == 2) {
int materialEz = ID[INDEX4D_ID(2,i,j,k)];
Ez[INDEX3D_FIELDS(i,j,k)] = Ez[INDEX3D_FIELDS(i,j,k)] - updatecoeffsE[INDEX2D_MAT(materialEz,4)] * srcwaveforms[INDEX2D_SRCWAVES(src,iteration)] * dl * (1 / (dx * dy * dz));
}
}
}
///////////////////////////////////////////
// Magnetic dipole magnetic field update //
///////////////////////////////////////////
__global__ void update_magnetic_dipole(int NMAGDIPOLE, int iteration, $REAL dx, $REAL dy, $REAL dz, const int* __restrict__ srcinfo1, const $REAL* __restrict__ srcinfo2, const $REAL* __restrict__ srcwaveforms, const unsigned int* __restrict__ ID, $REAL *Hx, $REAL *Hy, $REAL *Hz) {
// This function updates magnetic field values for magnetic dipole sources.
//
// Args:
// NMAGDIPOLE: Total number of magnetic dipoles in the model
// iteration: Iteration number of simulation
// dx, dy, dz: Spatial discretisations
// srcinfo1: Source cell coordinates and polarisation information
// srcinfo2: Other source information, e.g. length, resistance etc...
// srcwaveforms: Source waveform values
// ID, H: Access to ID and field component arrays
    //  Obtain the linear index corresponding to the current thread and use it for each source
int src = blockIdx.x * blockDim.x + threadIdx.x;
if (src < NMAGDIPOLE) {
int i, j, k, polarisation;
i = srcinfo1[INDEX2D_SRCINFO(src,0)];
j = srcinfo1[INDEX2D_SRCINFO(src,1)];
k = srcinfo1[INDEX2D_SRCINFO(src,2)];
polarisation = srcinfo1[INDEX2D_SRCINFO(src,3)];
// 'x' polarised source
if (polarisation == 0) {
int materialHx = ID[INDEX4D_ID(3,i,j,k)];
Hx[INDEX3D_FIELDS(i,j,k)] = Hx[INDEX3D_FIELDS(i,j,k)] - updatecoeffsH[INDEX2D_MAT(materialHx,4)] * srcwaveforms[INDEX2D_SRCWAVES(src,iteration)] * (1 / (dx * dy * dz));
}
// 'y' polarised source
else if (polarisation == 1) {
int materialHy = ID[INDEX4D_ID(4,i,j,k)];
Hy[INDEX3D_FIELDS(i,j,k)] = Hy[INDEX3D_FIELDS(i,j,k)] - updatecoeffsH[INDEX2D_MAT(materialHy,4)] * srcwaveforms[INDEX2D_SRCWAVES(src,iteration)] * (1 / (dx * dy * dz));
}
// 'z' polarised source
else if (polarisation == 2) {
int materialHz = ID[INDEX4D_ID(5,i,j,k)];
Hz[INDEX3D_FIELDS(i,j,k)] = Hz[INDEX3D_FIELDS(i,j,k)] - updatecoeffsH[INDEX2D_MAT(materialHz,4)] * srcwaveforms[INDEX2D_SRCWAVES(src,iteration)] * (1 / (dx * dy * dz));
}
}
}
//////////////////////////////////////////
// Voltage source electric field update //
//////////////////////////////////////////
__global__ void update_voltage_source(int NVOLTSRC, int iteration, $REAL dx, $REAL dy, $REAL dz, const int* __restrict__ srcinfo1, const $REAL* __restrict__ srcinfo2, const $REAL* __restrict__ srcwaveforms, const unsigned int* __restrict__ ID, $REAL *Ex, $REAL *Ey, $REAL *Ez) {
// This function updates electric field values for voltage sources.
//
// Args:
// NVOLTSRC: Total number of voltage sources in the model
// iteration: Iteration number of simulation
// dx, dy, dz: Spatial discretisations
// srcinfo1: Source cell coordinates and polarisation information
// srcinfo2: Other source information, e.g. length, resistance etc...
// srcwaveforms: Source waveform values
// ID, E: Access to ID and field component arrays
    //  Obtain the linear index corresponding to the current thread and use it for each source
int src = blockIdx.x * blockDim.x + threadIdx.x;
if (src < NVOLTSRC) {
$REAL resistance;
int i, j, k, polarisation;
i = srcinfo1[INDEX2D_SRCINFO(src,0)];
j = srcinfo1[INDEX2D_SRCINFO(src,1)];
k = srcinfo1[INDEX2D_SRCINFO(src,2)];
polarisation = srcinfo1[INDEX2D_SRCINFO(src,3)];
resistance = srcinfo2[src];
// 'x' polarised source
if (polarisation == 0) {
if (resistance != 0) {
int materialEx = ID[INDEX4D_ID(0,i,j,k)];
Ex[INDEX3D_FIELDS(i,j,k)] = Ex[INDEX3D_FIELDS(i,j,k)] - updatecoeffsE[INDEX2D_MAT(materialEx,4)] * srcwaveforms[INDEX2D_SRCWAVES(src,iteration)] * (1 / (resistance * dy * dz));
}
else {
Ex[INDEX3D_FIELDS(i,j,k)] = -1 * srcwaveforms[INDEX2D_SRCWAVES(src,iteration)] / dx;
}
}
// 'y' polarised source
else if (polarisation == 1) {
if (resistance != 0) {
int materialEy = ID[INDEX4D_ID(1,i,j,k)];
Ey[INDEX3D_FIELDS(i,j,k)] = Ey[INDEX3D_FIELDS(i,j,k)] - updatecoeffsE[INDEX2D_MAT(materialEy,4)] * srcwaveforms[INDEX2D_SRCWAVES(src,iteration)] * (1 / (resistance * dx * dz));
}
else {
Ey[INDEX3D_FIELDS(i,j,k)] = -1 * srcwaveforms[INDEX2D_SRCWAVES(src,iteration)] / dy;
}
}
// 'z' polarised source
else if (polarisation == 2) {
if (resistance != 0) {
int materialEz = ID[INDEX4D_ID(2,i,j,k)];
Ez[INDEX3D_FIELDS(i,j,k)] = Ez[INDEX3D_FIELDS(i,j,k)] - updatecoeffsE[INDEX2D_MAT(materialEz,4)] * srcwaveforms[INDEX2D_SRCWAVES(src,iteration)] * (1 / (resistance * dx * dy));
}
else {
Ez[INDEX3D_FIELDS(i,j,k)] = -1 * srcwaveforms[INDEX2D_SRCWAVES(src,iteration)] / dz;
}
}
}
}
""")


@@ -215,6 +215,52 @@ class MagneticDipole(Source):
Hz[i, j, k] -= updatecoeffsH[ID[G.IDlookup[componentID], i, j, k], 4] * self.waveformvaluesM[iteration] * (1 / (G.dx * G.dy * G.dz))
def gpu_initialise_src_arrays(sources, G):
"""Initialise arrays on GPU for source coordinates/polarisation, other source information, and source waveform values.
Args:
sources (list): List of sources of one class, e.g. HertzianDipoles.
G (class): Grid class instance - holds essential parameters describing the model.
Returns:
        srcinfo1_gpu (int): pycuda gpuarray of source cell coordinates and polarisation information.
        srcinfo2_gpu (float): pycuda gpuarray of other source information, e.g. length, resistance etc...
        srcwaves_gpu (float): pycuda gpuarray of source waveform values.
"""
import pycuda.gpuarray as gpuarray
srcinfo1 = np.zeros((len(sources), 4), dtype=np.int32)
srcinfo2 = np.zeros((len(sources)), dtype=floattype)
srcwaves = np.zeros((len(sources), G.iterations), dtype=floattype)
for i, src in enumerate(sources):
srcinfo1[i, 0] = src.xcoord
srcinfo1[i, 1] = src.ycoord
srcinfo1[i, 2] = src.zcoord
if src.polarisation == 'x':
srcinfo1[i, 3] = 0
elif src.polarisation == 'y':
srcinfo1[i, 3] = 1
elif src.polarisation == 'z':
srcinfo1[i, 3] = 2
if src.__class__.__name__ == 'HertzianDipole':
srcinfo2[i] = src.dl
srcwaves[i, :] = src.waveformvaluesJ
elif src.__class__.__name__ == 'VoltageSource':
srcinfo2[i] = src.resistance
srcwaves[i, :] = src.waveformvaluesJ
elif src.__class__.__name__ == 'MagneticDipole':
srcwaves[i, :] = src.waveformvaluesM
srcinfo1_gpu = gpuarray.to_gpu(srcinfo1)
srcinfo2_gpu = gpuarray.to_gpu(srcinfo2)
srcwaves_gpu = gpuarray.to_gpu(srcwaves)
return srcinfo1_gpu, srcinfo2_gpu, srcwaves_gpu
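
srcinfo1 packs cell coordinates plus an integer polarisation code (0, 1, 2 for x, y, z) that the CUDA kernels branch on; srcinfo2 carries the one per-source scalar each source type needs (dl for Hertzian dipoles, resistance for voltage sources, nothing for magnetic dipoles). A tiny sketch of building one row for a hypothetical source:

import numpy as np

polarisation_code = {'x': 0, 'y': 1, 'z': 2}
srcinfo1 = np.zeros((1, 4), dtype=np.int32)
srcinfo1[0, :3] = (10, 20, 30)           # hypothetical cell coordinates
srcinfo1[0, 3] = polarisation_code['z']  # must match the kernel branch values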
class TransmissionLine(Source):
"""
A transmission line source is a one-dimensional transmission


@@ -32,7 +32,10 @@ from colorama import Style
init()
import numpy as np
from gprMax.constants import complextype
from gprMax.constants import floattype
from gprMax.exceptions import GeneralError
from gprMax.materials import Material
def get_terminal_width():
@@ -130,6 +133,11 @@ def round_value(value, decimalplaces=0):
return rounded
def round32(value):
"""Rounds up to nearest multiple of 32."""
return int(32 * np.ceil(float(value) / 32))
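
round32 is used when sizing grids for the one-thread-per-source/receiver kernels launched with block=(1, 1, 1), e.g. grid=(round32(len(G.rxs)), 1, 1), so that launches are padded to whole warps of 32 threads. For example:

assert round32(1) == 32
assert round32(32) == 32
assert round32(33) == 64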
def human_size(size, a_kilobyte_is_1024_bytes=False):
"""Convert a file size to human-readable form.
@@ -287,6 +295,64 @@ def get_host_info():
return hostinfo
class GPU(object):
"""GPU information."""
def __init__(self, deviceID):
"""
Args:
deviceID (int): Device ID for GPU.
"""
self.deviceID = deviceID
self.name = None
self.pcibusID = None
self.constmem = None
self.totalmem = None
def get_gpu_info(self, drv):
"""Set information about GPU.
Args:
drv (object): PyCuda driver.
"""
self.name = drv.Device(self.deviceID).name()
self.pcibusID = drv.Device(self.deviceID).pci_bus_id()
self.constmem = drv.Device(self.deviceID).total_constant_memory
self.totalmem = drv.Device(self.deviceID).total_memory()
def detect_gpus():
"""Get information about Nvidia GPU(s).
Returns:
gpus (list): Detected GPU(s) object(s).
"""
try:
import pycuda.driver as drv
except ImportError:
        raise ImportError('To use gprMax in GPU mode the pycuda package must be installed, and you must have an NVIDIA CUDA-Enabled GPU (https://developer.nvidia.com/cuda-gpus).')
drv.init()
# Check if there are any CUDA-Enabled GPUs
if drv.Device.count() == 0:
raise GeneralError('No NVIDIA CUDA-Enabled GPUs detected (https://developer.nvidia.com/cuda-gpus)')
# Print information about all detected GPUs
gpus = []
gputext = []
for i in range(drv.Device.count()):
gpu = GPU(deviceID=i)
gpu.get_gpu_info(drv)
gpus.append(gpu)
gputext.append('{} - {}, {} RAM'.format(gpu.deviceID, gpu.name, human_size(gpu.totalmem, a_kilobyte_is_1024_bytes=True)))
print('GPU(s) detected: {}'.format('; '.join(gputext)))
return gpus
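
A hedged usage sketch, assuming the device ID requested via the -gpu argument (0 here) is matched against the detected list:

gpus = detect_gpus()  # prints, e.g.: GPU(s) detected: 0 - <device name>, <size> RAM
requested = 0         # value of the -gpu command line argument
gpu = next(g for g in gpus if g.deviceID == requested)
print(gpu.name, gpu.pcibusID)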
def memory_usage(G):
"""Estimate the amount of memory (RAM) required to run a model.
@@ -307,6 +373,7 @@ def memory_usage(G):
# 12 x rigidE array components + 6 x rigidH array components
rigidarrays = (12 + 6) * G.nx * G.ny * G.nz * np.dtype(np.int8).itemsize
# PML arrays
pmlarrays = 0
for (k, v) in G.pmlthickness.items():
if v > 0:
@@ -326,6 +393,11 @@ def memory_usage(G):
pmlarrays += ((G.nx + 1) * G.ny * v)
pmlarrays += (G.nx * (G.ny + 1) * v)
memestimate = int(stdoverhead + fieldarrays + solidarray + rigidarrays + pmlarrays)
# Any dispersive material coefficients
disparrays = 0
if Material.maxpoles != 0:
disparrays = 3 * Material.maxpoles * (G.nx + 1) * (G.ny + 1) * (G.nz + 1) * np.dtype(complextype).itemsize
memestimate = int(stdoverhead + fieldarrays + solidarray + rigidarrays + pmlarrays + disparrays)
return memestimate
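
As a worked example of the dispersive term just added, assuming complextype is complex64: one pole over a 100 x 100 x 100 cell model costs roughly 24 MB.

import numpy as np

maxpoles, nx, ny, nz = 1, 100, 100, 100
disparrays = 3 * maxpoles * (nx + 1) * (ny + 1) * (nz + 1) * np.dtype(np.complex64).itemsize
print('{:.1f} MB'.format(disparrays / 1024**2))  # ~23.6 MB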

26
tests/models_basic/pmls/pmls.in 可执行文件
查看文件

@@ -0,0 +1,26 @@
#title: PMLs
#domain: 0.051 0.126 0.026
#dx_dy_dz: 0.001 0.001 0.001
#time_window: 2100
#pml_cells: 0 0 0 0 0 10
## Built-in 1st order PML
pml_cfs: constant forward 0 0 constant forward 1 1 quartic forward 0 None
## PMLs from http://dx.doi.org/10.1109/TAP.2011.2180344
## Standard PML
pml_cfs: constant forward 0 0 quartic forward 1 11 quartic forward 0 7.427
## CFS PML
pml_cfs: constant forward 0.05 0.05 quartic forward 1 7 quartic forward 0 11.671
## O2 RIPML
#pml_cfs: constant forward 0 0 constant forward 1 1 sextic forward 0 0.5836
#pml_cfs: constant forward 0.05 0.05 cubic forward 1 8 quadratic forward 0 5.8357
#waveform: gaussiandotnorm 1 9.42e9 mypulse
#hertzian_dipole: z 0.013 0.013 0.015 mypulse
#rx: 0.037 0.112 0.015
#box: 0.013 0.013 0.013 0.038 0.113 0.014 pec

[12 binary image files added (316–342 KiB each); contents not shown in the diff.]


@@ -44,7 +44,7 @@ basepath += 'basic'
# basepath += 'advanced'
# List of available basic test models
testmodels = ['hertzian_dipole_fs_analytical', '2D_ExHyHz', '2D_EyHxHz', '2D_EzHxHy', 'cylinder_Ascan_2D', 'hertzian_dipole_fs', 'hertzian_dipole_hs', 'hertzian_dipole_dispersive', 'magnetic_dipole_fs']
testmodels = ['hertzian_dipole_fs_analytical', '2D_ExHyHz', '2D_EyHxHz', '2D_EzHxHy', 'cylinder_Ascan_2D', 'hertzian_dipole_fs', 'hertzian_dipole_hs', 'hertzian_dipole_dispersive', 'magnetic_dipole_fs', 'pmls']
# List of available advanced test models
# testmodels = ['antenna_GSSI_1500_fs', 'antenna_MALA_1200_fs']
@@ -63,7 +63,7 @@ for i, model in enumerate(testmodels):
testresults[model] = {}
# Run model
api(os.path.join(basepath, model + os.path.sep + model + '.in'))
api(os.path.join(basepath, model + os.path.sep + model + '.in'), gpu=None)
# Special case for analytical comparison
if model == 'hertzian_dipole_fs_analytical':
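
The test runner pins gpu=None so the reference results keep coming from the CPU solver. Assuming api() forwards this argument the same way as the -gpu command line flag, a GPU run of the same model would look like the following (device ID 0, hypothetical):

api(os.path.join(basepath, model + os.path.sep + model + '.in'), gpu=0)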