Updates for version 3.1.0 - Big Smoke. These updates are predominantly to enable gprMax to be run on GPUs using NVIDIA CUDA.

CONTRIBUTORS (deleted file, 10 lines)
@@ -1,10 +0,0 @@
-# Copyright (C) 2015-2017: The University of Edinburgh
-# Authors: Craig Warren and Antonis Giannopoulos
-#
-# This is the official list of people who have contributed to the gprMax repository.
-#
-# Please keep the list sorted by surname.
-
-Oystein Bjorndal
-John Hartley
-Rajath Kumar

CREDITS (new file, 22 lines)
@@ -0,0 +1,22 @@
# Copyright (C) 2015-2017: The University of Edinburgh
# Authors: Craig Warren and Antonis Giannopoulos
#
# This is the official list of entities and people who have contributed to gprMax.
#
# Please keep the lists sorted alphabetically (by surname for individuals).


gprMax has been supported through research projects funded by:

The Defence Science and Technology Laboratory (Dstl)
Google


gprMax is a contribution to COST Action TU1208 'Civil Engineering Applications of Ground Penetrating Radar'


As well as the aforementioned authors, the following individuals have contributed to gprMax:

Oystein Bjorndal
John Hartley
Rajath Kumar

@@ -168,6 +168,7 @@ Optional command line arguments
Argument name Type Description
====================== ======= ===========
``-n`` integer number of times to run the input file. This option can be used to run a series of models, e.g. to create a B-scan with 60 traces: ``(gprMax)$ python -m gprMax user_models/cylinder_Bscan_2D.in -n 60``
+``-gpu`` integer NVIDIA CUDA device ID for a specific GPU card. If not specified, it will default to device ID 0 (an illustrative example follows this table).
``-restart`` integer model number to start/restart the simulation from. It would typically be used, with the ``-n`` argument, to restart a series of models from a specific model number, e.g. to restart from A-scan 45 when creating a B-scan with 60 traces: ``(gprMax)$ python -m gprMax user_models/cylinder_Bscan_2D.in -n 15 -restart 45``
``-task`` integer task identifier (model number) when running a simulation as a job array on `Open Grid Scheduler/Grid Engine <http://gridscheduler.sourceforge.net/index.html>`_. For further details see the `parallel performance section of the User Guide <http://docs.gprmax.com/en/latest/openmp_mpi.html>`_
``-mpi`` integer number of Message Passing Interface (MPI) tasks, i.e. master + workers, for an MPI task farm. This option is most usefully combined with ``-n`` to allow individual models to be farmed out using an MPI task farm, e.g. to create a B-scan with 60 traces and use MPI to farm out each trace: ``(gprMax)$ python -m gprMax user_models/cylinder_Bscan_2D.in -n 60 -mpi 61``. For further details see the `parallel performance section of the User Guide <http://docs.gprmax.com/en/latest/openmp_mpi.html>`_
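
For example (an illustrative command, not from the changeset, assuming a machine with a second CUDA device), a model can be run on the GPU with device ID 1: ``(gprMax)$ python -m gprMax user_models/cylinder_Ascan_2D.in -gpu 1``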

@@ -1,4 +1,4 @@
-# This file may be used to create an environment using:
+# This file may be used to create a conda environment using:
# $ conda create --name gprMax --file <this file>

name: gprMax
@@ -9,12 +9,28 @@ dependencies:
  - h5py
  - jupyter
  - matplotlib
  - mpi4py
  - numpy
  - psutil
  - pyqt=4.11.4
  - scipy

  - pip:
    - mpi4py
    - terminaltables
    - tqdm


#name: gprMax-essential
#dependencies:
#- python>=3.4
#- colorama
#- cython
#- h5py
#- numpy
#- psutil
#- scipy
#
#- pip:
#  - pycuda
#  - terminaltables
#  - tqdm

docs/source/gpu.rst (new file, 27 lines)
@@ -0,0 +1,27 @@
.. _gpu:

**********
Using GPUs
**********

The most computationally intensive parts of gprMax, the FDTD solver loops, can optionally be executed using general-purpose computing on graphics processing units (GPGPU). This has been achieved through the NVIDIA CUDA programming environment, so an `NVIDIA CUDA-Enabled GPU <https://developer.nvidia.com/cuda-gpus>`_ is required to take advantage of the GPU-based solver.

Extra installation steps for GPU usage
======================================

The following steps provide guidance on how to install the extra components that allow gprMax to run on your GPU:

1. Install the `NVIDIA CUDA Toolkit <https://developer.nvidia.com/cuda-toolkit>`_. You can follow the Installation Guides in the `NVIDIA CUDA Toolkit Documentation <http://docs.nvidia.com/cuda/index.html#installation-guides>`_
2. Install the pycuda Python module. Open a Terminal (Linux/macOS) or Command Prompt (Windows), navigate into the top-level gprMax directory, and, if it is not already active, activate the gprMax conda environment with :code:`source activate gprMax` (Linux/macOS) or :code:`activate gprMax` (Windows). Then run :code:`conda install pycuda`. A quick way to check the installation is sketched after this list.
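
As a quick check (an illustrative sketch, not part of the official installation steps) that pycuda can see your CUDA device(s), you can run the following snippet in the activated environment:

.. code-block:: python

    # List the CUDA devices visible to pycuda
    import pycuda.driver as drv

    drv.init()
    for i in range(drv.Device.count()):
        print('Device {}: {}'.format(i, drv.Device(i).name()))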

Run a test model
================

Open a Terminal (Linux/macOS) or Command Prompt (Windows), navigate into the top-level gprMax directory, and, if it is not already active, activate the gprMax conda environment with :code:`source activate gprMax` (Linux/macOS) or :code:`activate gprMax` (Windows).

Run one of the test models:

.. code-block:: none

    (gprMax)$ python -m gprMax user_models/cylinder_Ascan_2D.in -gpu

@@ -31,6 +31,7 @@ gprMax User Guide

    python_scripting
    openmp_mpi
+   gpu

.. toctree::
    :maxdepth: 2

@@ -1,8 +1,8 @@
.. _openmp-mpi:

-******************
-Parallel execution
-******************
+*********************
+OpenMP, MPI, and HPCs
+*********************

OpenMP
======

@@ -1,4 +1,4 @@
# This is where the version number is set and read by setup.py and conf.py (for the docs)

-__version__ = '3.0.20'
-codename = 'Bowmore'
+__version__ = '3.1.0'
+codename = 'Big Smoke'

@@ -30,5 +30,5 @@ ctypedef np.float32_t floattype_t
ctypedef np.complex64_t complextype_t

# Double precision
-#ctypedef np.float64_t floattype_t
-#ctypedef np.complex128_t complextype_t
+# ctypedef np.float64_t floattype_t
+# ctypedef np.complex128_t complextype_t

@@ -37,8 +37,8 @@ floattype = np.float32
complextype = np.complex64

# For C (CUDA) arrays
-cfloattype = 'float'
-ccomplextype = 'pycuda::complex<float>'
+cudafloattype = 'float'
+cudacomplextype = 'pycuda::complex<float>'

# Double precision
# For numpy arrays
@@ -46,5 +46,5 @@ ccomplextype = 'pycuda::complex<float>'
# complextype = np.complex128

# For C (CUDA) arrays
-# cfloattype = 'double'
-# ccomplextype = 'pycuda::complex<double>'
+# cudafloattype = 'double'
+# cudacomplextype = 'pycuda::complex<double>'

@@ -23,7 +23,7 @@ from colorama import Fore

init()

-sys.tracebacklimit = None
+#sys.tracebacklimit = None


class GeneralError(ValueError):

@@ -16,6 +16,8 @@
# You should have received a copy of the GNU General Public License
# along with gprMax. If not, see <http://www.gnu.org/licenses/>.

+from string import Template
+
import h5py

from gprMax._version import __version__
@@ -46,6 +48,47 @@ def store_outputs(iteration, Ex, Ey, Ez, Hx, Hy, Hz, G):
        tl.Itotal[iteration] = tl.current[tl.antpos]


kernel_template_store_outputs = Template("""

// Macros for converting subscripts to linear index:
#define INDEX2D_RXCOORDS(m, n) (m)*($NY_RXCOORDS)+(n)
#define INDEX3D_RXS(i, j, k) (i)*($NY_RXS)*($NZ_RXS)+(j)*($NZ_RXS)+(k)
#define INDEX3D_FIELDS(i, j, k) (i)*($NY_FIELDS)*($NZ_FIELDS)+(j)*($NZ_FIELDS)+(k)

//////////////////////////////////////////////////////
// Stores field component values for every receiver //
//////////////////////////////////////////////////////

__global__ void store_outputs(int NRX, int iteration, const int* __restrict__ rxcoords, $REAL *rxs, const $REAL* __restrict__ Ex, const $REAL* __restrict__ Ey, const $REAL* __restrict__ Ez, const $REAL* __restrict__ Hx, const $REAL* __restrict__ Hy, const $REAL* __restrict__ Hz) {

    // This function stores field component values for every receiver in the model.
    //
    //  Args:
    //      NRX: Total number of receivers in the model
    //      rxs: Array to store field components for receivers - rows are field components; columns are iterations; pages are receivers
    //      E, H: Access to field component arrays

    // Obtain the linear index corresponding to the current thread and use for each receiver
    int rx = blockIdx.x * blockDim.x + threadIdx.x;

    int i, j, k;

    if (rx < NRX) {
        i = rxcoords[INDEX2D_RXCOORDS(rx,0)];
        j = rxcoords[INDEX2D_RXCOORDS(rx,1)];
        k = rxcoords[INDEX2D_RXCOORDS(rx,2)];
        rxs[INDEX3D_RXS(0,iteration,rx)] = Ex[INDEX3D_FIELDS(i,j,k)];
        rxs[INDEX3D_RXS(1,iteration,rx)] = Ey[INDEX3D_FIELDS(i,j,k)];
        rxs[INDEX3D_RXS(2,iteration,rx)] = Ez[INDEX3D_FIELDS(i,j,k)];
        rxs[INDEX3D_RXS(3,iteration,rx)] = Hx[INDEX3D_FIELDS(i,j,k)];
        rxs[INDEX3D_RXS(4,iteration,rx)] = Hy[INDEX3D_FIELDS(i,j,k)];
        rxs[INDEX3D_RXS(5,iteration,rx)] = Hz[INDEX3D_FIELDS(i,j,k)];
    }
}

""")


def write_hdf5_outputfile(outputfile, Ex, Ey, Ez, Hx, Hy, Hz, G):
    """Write an output file in HDF5 format.

gprMax/fields_updates_gpu.py (new file, 242 lines)
@@ -0,0 +1,242 @@
# Copyright (C) 2015-2017: The University of Edinburgh
# Authors: Craig Warren and Antonis Giannopoulos
#
# This file is part of gprMax.
#
# gprMax is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# gprMax is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with gprMax. If not, see <http://www.gnu.org/licenses/>.

from string import Template

kernels_template_fields = Template("""

#include <pycuda-complex.hpp>

// Macros for converting subscripts to linear index:
#define INDEX2D_MAT(m, n) (m)*($NY_MATCOEFFS)+(n)
#define INDEX2D_MATDISP(m, n) (m)*($NY_MATDISPCOEFFS)+(n)
#define INDEX3D_FIELDS(i, j, k) (i)*($NY_FIELDS)*($NZ_FIELDS)+(j)*($NZ_FIELDS)+(k)
#define INDEX4D_ID(p, i, j, k) (p)*($NX_ID)*($NY_ID)*($NZ_ID)+(i)*($NY_ID)*($NZ_ID)+(j)*($NZ_ID)+(k)
#define INDEX4D_T(p, i, j, k) (p)*($NX_T)*($NY_T)*($NZ_T)+(i)*($NY_T)*($NZ_T)+(j)*($NZ_T)+(k)

// Material coefficients (read-only) in constant memory (64KB)
__device__ __constant__ $REAL updatecoeffsE[$N_updatecoeffsE];
__device__ __constant__ $REAL updatecoeffsH[$N_updatecoeffsH];

/////////////////////////////////////////////////
// Electric field updates - standard materials //
/////////////////////////////////////////////////

__global__ void update_e(int NX, int NY, int NZ, const unsigned int* __restrict__ ID, $REAL *Ex, $REAL *Ey, $REAL *Ez, const $REAL* __restrict__ Hx, const $REAL* __restrict__ Hy, const $REAL* __restrict__ Hz) {

    // This function updates electric field values.
    //
    //  Args:
    //      NX, NY, NZ: Number of cells of the model domain
    //      ID, E, H: Access to ID and field component arrays

    // Obtain the linear index corresponding to the current thread
    int idx = blockIdx.x * blockDim.x + threadIdx.x;

    // Convert the linear index to subscripts for 3D field arrays
    int i = idx / ($NY_FIELDS * $NZ_FIELDS);
    int j = (idx % ($NY_FIELDS * $NZ_FIELDS)) / $NZ_FIELDS;
    int k = (idx % ($NY_FIELDS * $NZ_FIELDS)) % $NZ_FIELDS;

    // Convert the linear index to subscripts for 4D material ID array
    int i_ID = (idx % ($NX_ID * $NY_ID * $NZ_ID)) / ($NY_ID * $NZ_ID);
    int j_ID = ((idx % ($NX_ID * $NY_ID * $NZ_ID)) % ($NY_ID * $NZ_ID)) / $NZ_ID;
    int k_ID = ((idx % ($NX_ID * $NY_ID * $NZ_ID)) % ($NY_ID * $NZ_ID)) % $NZ_ID;

    // Ex component
    if ((NY != 1 || NZ != 1) && i >= 0 && i < NX && j > 0 && j < NY && k > 0 && k < NZ) {
        int materialEx = ID[INDEX4D_ID(0,i_ID,j_ID,k_ID)];
        Ex[INDEX3D_FIELDS(i,j,k)] = updatecoeffsE[INDEX2D_MAT(materialEx,0)] * Ex[INDEX3D_FIELDS(i,j,k)] + updatecoeffsE[INDEX2D_MAT(materialEx,2)] * (Hz[INDEX3D_FIELDS(i,j,k)] - Hz[INDEX3D_FIELDS(i,j-1,k)]) - updatecoeffsE[INDEX2D_MAT(materialEx,3)] * (Hy[INDEX3D_FIELDS(i,j,k)] - Hy[INDEX3D_FIELDS(i,j,k-1)]);
    }

    // Ey component
    if ((NX != 1 || NZ != 1) && i > 0 && i < NX && j >= 0 && j < NY && k > 0 && k < NZ) {
        int materialEy = ID[INDEX4D_ID(1,i_ID,j_ID,k_ID)];
        Ey[INDEX3D_FIELDS(i,j,k)] = updatecoeffsE[INDEX2D_MAT(materialEy,0)] * Ey[INDEX3D_FIELDS(i,j,k)] + updatecoeffsE[INDEX2D_MAT(materialEy,3)] * (Hx[INDEX3D_FIELDS(i,j,k)] - Hx[INDEX3D_FIELDS(i,j,k-1)]) - updatecoeffsE[INDEX2D_MAT(materialEy,1)] * (Hz[INDEX3D_FIELDS(i,j,k)] - Hz[INDEX3D_FIELDS(i-1,j,k)]);
    }

    // Ez component
    if ((NX != 1 || NY != 1) && i > 0 && i < NX && j > 0 && j < NY && k >= 0 && k < NZ) {
        int materialEz = ID[INDEX4D_ID(2,i_ID,j_ID,k_ID)];
        Ez[INDEX3D_FIELDS(i,j,k)] = updatecoeffsE[INDEX2D_MAT(materialEz,0)] * Ez[INDEX3D_FIELDS(i,j,k)] + updatecoeffsE[INDEX2D_MAT(materialEz,1)] * (Hy[INDEX3D_FIELDS(i,j,k)] - Hy[INDEX3D_FIELDS(i-1,j,k)]) - updatecoeffsE[INDEX2D_MAT(materialEz,2)] * (Hx[INDEX3D_FIELDS(i,j,k)] - Hx[INDEX3D_FIELDS(i,j-1,k)]);
    }
}

///////////////////////////////////////////////////
// Electric field updates - dispersive materials //
///////////////////////////////////////////////////

__global__ void update_e_dispersive_A(int NX, int NY, int NZ, int MAXPOLES, const $COMPLEX* __restrict__ updatecoeffsdispersive, $COMPLEX *Tx, $COMPLEX *Ty, $COMPLEX *Tz, const unsigned int* __restrict__ ID, $REAL *Ex, $REAL *Ey, $REAL *Ez, const $REAL* __restrict__ Hx, const $REAL* __restrict__ Hy, const $REAL* __restrict__ Hz) {

    // This function is part A of updates to electric field values when dispersive materials (with multiple poles) are present.
    //
    //  Args:
    //      NX, NY, NZ: Number of cells of the model domain
    //      MAXPOLES: Maximum number of dispersive material poles present in model
    //      updatecoeffsdispersive, T, ID, E, H: Access to update coefficients, dispersive, ID and field component arrays

    // Obtain the linear index corresponding to the current thread
    int idx = blockIdx.x * blockDim.x + threadIdx.x;

    // Convert the linear index to subscripts for 3D field arrays
    int i = idx / ($NY_FIELDS * $NZ_FIELDS);
    int j = (idx % ($NY_FIELDS * $NZ_FIELDS)) / $NZ_FIELDS;
    int k = (idx % ($NY_FIELDS * $NZ_FIELDS)) % $NZ_FIELDS;

    // Convert the linear index to subscripts for 4D material ID array
    int i_ID = (idx % ($NX_ID * $NY_ID * $NZ_ID)) / ($NY_ID * $NZ_ID);
    int j_ID = ((idx % ($NX_ID * $NY_ID * $NZ_ID)) % ($NY_ID * $NZ_ID)) / $NZ_ID;
    int k_ID = ((idx % ($NX_ID * $NY_ID * $NZ_ID)) % ($NY_ID * $NZ_ID)) % $NZ_ID;

    // Convert the linear index to subscripts for 4D dispersive array
    int i_T = (idx % ($NX_T * $NY_T * $NZ_T)) / ($NY_T * $NZ_T);
    int j_T = ((idx % ($NX_T * $NY_T * $NZ_T)) % ($NY_T * $NZ_T)) / $NZ_T;
    int k_T = ((idx % ($NX_T * $NY_T * $NZ_T)) % ($NY_T * $NZ_T)) % $NZ_T;

    // Ex component
    if ((NY != 1 || NZ != 1) && i >= 0 && i < NX && j > 0 && j < NY && k > 0 && k < NZ) {
        int materialEx = ID[INDEX4D_ID(0,i_ID,j_ID,k_ID)];
        $REAL phi = 0;
        for (int pole = 0; pole < MAXPOLES; pole++) {
            phi = phi + updatecoeffsdispersive[INDEX2D_MATDISP(materialEx,pole*3)].real() * Tx[INDEX4D_T(pole,i_T,j_T,k_T)].real();
            Tx[INDEX4D_T(pole,i_T,j_T,k_T)] = updatecoeffsdispersive[INDEX2D_MATDISP(materialEx,1+(pole*3))] * Tx[INDEX4D_T(pole,i_T,j_T,k_T)] + updatecoeffsdispersive[INDEX2D_MATDISP(materialEx,2+(pole*3))] * Ex[INDEX3D_FIELDS(i,j,k)];
        }
        Ex[INDEX3D_FIELDS(i,j,k)] = updatecoeffsE[INDEX2D_MAT(materialEx,0)] * Ex[INDEX3D_FIELDS(i,j,k)] + updatecoeffsE[INDEX2D_MAT(materialEx,2)] * (Hz[INDEX3D_FIELDS(i,j,k)] - Hz[INDEX3D_FIELDS(i,j-1,k)]) - updatecoeffsE[INDEX2D_MAT(materialEx,3)] * (Hy[INDEX3D_FIELDS(i,j,k)] - Hy[INDEX3D_FIELDS(i,j,k-1)]) - updatecoeffsE[INDEX2D_MAT(materialEx,4)] * phi;
    }

    // Ey component
    if ((NX != 1 || NZ != 1) && i > 0 && i < NX && j >= 0 && j < NY && k > 0 && k < NZ) {
        int materialEy = ID[INDEX4D_ID(1,i_ID,j_ID,k_ID)];
        $REAL phi = 0;
        for (int pole = 0; pole < MAXPOLES; pole++) {
            phi = phi + updatecoeffsdispersive[INDEX2D_MATDISP(materialEy,pole*3)].real() * Ty[INDEX4D_T(pole,i_T,j_T,k_T)].real();
            Ty[INDEX4D_T(pole,i_T,j_T,k_T)] = updatecoeffsdispersive[INDEX2D_MATDISP(materialEy,1+(pole*3))] * Ty[INDEX4D_T(pole,i_T,j_T,k_T)] + updatecoeffsdispersive[INDEX2D_MATDISP(materialEy,2+(pole*3))] * Ey[INDEX3D_FIELDS(i,j,k)];
        }
        Ey[INDEX3D_FIELDS(i,j,k)] = updatecoeffsE[INDEX2D_MAT(materialEy,0)] * Ey[INDEX3D_FIELDS(i,j,k)] + updatecoeffsE[INDEX2D_MAT(materialEy,3)] * (Hx[INDEX3D_FIELDS(i,j,k)] - Hx[INDEX3D_FIELDS(i,j,k-1)]) - updatecoeffsE[INDEX2D_MAT(materialEy,1)] * (Hz[INDEX3D_FIELDS(i,j,k)] - Hz[INDEX3D_FIELDS(i-1,j,k)]) - updatecoeffsE[INDEX2D_MAT(materialEy,4)] * phi;
    }

    // Ez component
    if ((NX != 1 || NY != 1) && i > 0 && i < NX && j > 0 && j < NY && k >= 0 && k < NZ) {
        int materialEz = ID[INDEX4D_ID(2,i_ID,j_ID,k_ID)];
        $REAL phi = 0;
        for (int pole = 0; pole < MAXPOLES; pole++) {
            phi = phi + updatecoeffsdispersive[INDEX2D_MATDISP(materialEz,pole*3)].real() * Tz[INDEX4D_T(pole,i_T,j_T,k_T)].real();
            Tz[INDEX4D_T(pole,i_T,j_T,k_T)] = updatecoeffsdispersive[INDEX2D_MATDISP(materialEz,1+(pole*3))] * Tz[INDEX4D_T(pole,i_T,j_T,k_T)] + updatecoeffsdispersive[INDEX2D_MATDISP(materialEz,2+(pole*3))] * Ez[INDEX3D_FIELDS(i,j,k)];
        }
        Ez[INDEX3D_FIELDS(i,j,k)] = updatecoeffsE[INDEX2D_MAT(materialEz,0)] * Ez[INDEX3D_FIELDS(i,j,k)] + updatecoeffsE[INDEX2D_MAT(materialEz,1)] * (Hy[INDEX3D_FIELDS(i,j,k)] - Hy[INDEX3D_FIELDS(i-1,j,k)]) - updatecoeffsE[INDEX2D_MAT(materialEz,2)] * (Hx[INDEX3D_FIELDS(i,j,k)] - Hx[INDEX3D_FIELDS(i,j-1,k)]) - updatecoeffsE[INDEX2D_MAT(materialEz,4)] * phi;
    }
}

__global__ void update_e_dispersive_B(int NX, int NY, int NZ, int MAXPOLES, const $COMPLEX* __restrict__ updatecoeffsdispersive, $COMPLEX *Tx, $COMPLEX *Ty, $COMPLEX *Tz, const unsigned int* __restrict__ ID, const $REAL* __restrict__ Ex, const $REAL* __restrict__ Ey, const $REAL* __restrict__ Ez) {

    // This function is part B which updates the dispersive field arrays when dispersive materials (with multiple poles) are present.
    //
    //  Args:
    //      NX, NY, NZ: Number of cells of the model domain
    //      MAXPOLES: Maximum number of dispersive material poles present in model
    //      updatecoeffsdispersive, T, ID, E: Access to update coefficients, dispersive, ID and field component arrays

    // Obtain the linear index corresponding to the current thread
    int idx = blockIdx.x * blockDim.x + threadIdx.x;

    // Convert the linear index to subscripts for 3D field arrays
    int i = idx / ($NY_FIELDS * $NZ_FIELDS);
    int j = (idx % ($NY_FIELDS * $NZ_FIELDS)) / $NZ_FIELDS;
    int k = (idx % ($NY_FIELDS * $NZ_FIELDS)) % $NZ_FIELDS;

    // Convert the linear index to subscripts for 4D material ID array
    int i_ID = (idx % ($NX_ID * $NY_ID * $NZ_ID)) / ($NY_ID * $NZ_ID);
    int j_ID = ((idx % ($NX_ID * $NY_ID * $NZ_ID)) % ($NY_ID * $NZ_ID)) / $NZ_ID;
    int k_ID = ((idx % ($NX_ID * $NY_ID * $NZ_ID)) % ($NY_ID * $NZ_ID)) % $NZ_ID;

    // Convert the linear index to subscripts for 4D dispersive array
    int i_T = (idx % ($NX_T * $NY_T * $NZ_T)) / ($NY_T * $NZ_T);
    int j_T = ((idx % ($NX_T * $NY_T * $NZ_T)) % ($NY_T * $NZ_T)) / $NZ_T;
    int k_T = ((idx % ($NX_T * $NY_T * $NZ_T)) % ($NY_T * $NZ_T)) % $NZ_T;

    // Ex component
    if ((NY != 1 || NZ != 1) && i >= 0 && i < NX && j > 0 && j < NY && k > 0 && k < NZ) {
        int materialEx = ID[INDEX4D_ID(0,i_ID,j_ID,k_ID)];
        for (int pole = 0; pole < MAXPOLES; pole++) {
            Tx[INDEX4D_T(pole,i_T,j_T,k_T)] = Tx[INDEX4D_T(pole,i_T,j_T,k_T)] - updatecoeffsdispersive[INDEX2D_MATDISP(materialEx,2+(pole*3))] * Ex[INDEX3D_FIELDS(i,j,k)];
        }
    }

    // Ey component
    if ((NX != 1 || NZ != 1) && i > 0 && i < NX && j >= 0 && j < NY && k > 0 && k < NZ) {
        int materialEy = ID[INDEX4D_ID(1,i_ID,j_ID,k_ID)];
        for (int pole = 0; pole < MAXPOLES; pole++) {
            Ty[INDEX4D_T(pole,i_T,j_T,k_T)] = Ty[INDEX4D_T(pole,i_T,j_T,k_T)] - updatecoeffsdispersive[INDEX2D_MATDISP(materialEy,2+(pole*3))] * Ey[INDEX3D_FIELDS(i,j,k)];
        }
    }

    // Ez component
    if ((NX != 1 || NY != 1) && i > 0 && i < NX && j > 0 && j < NY && k >= 0 && k < NZ) {
        int materialEz = ID[INDEX4D_ID(2,i_ID,j_ID,k_ID)];
        for (int pole = 0; pole < MAXPOLES; pole++) {
            Tz[INDEX4D_T(pole,i_T,j_T,k_T)] = Tz[INDEX4D_T(pole,i_T,j_T,k_T)] - updatecoeffsdispersive[INDEX2D_MATDISP(materialEz,2+(pole*3))] * Ez[INDEX3D_FIELDS(i,j,k)];
        }
    }
}


////////////////////////////
// Magnetic field updates //
////////////////////////////

__global__ void update_h(int NX, int NY, int NZ, const unsigned int* __restrict__ ID, $REAL *Hx, $REAL *Hy, $REAL *Hz, const $REAL* __restrict__ Ex, const $REAL* __restrict__ Ey, const $REAL* __restrict__ Ez) {

    // This function updates magnetic field values.
    //
    //  Args:
    //      NX, NY, NZ: Number of cells of the model domain
    //      ID, E, H: Access to ID and field component arrays

    // Obtain the linear index corresponding to the current thread
    int idx = blockIdx.x * blockDim.x + threadIdx.x;

    // Convert the linear index to subscripts for 3D field arrays
    int i = idx / ($NY_FIELDS * $NZ_FIELDS);
    int j = (idx % ($NY_FIELDS * $NZ_FIELDS)) / $NZ_FIELDS;
    int k = (idx % ($NY_FIELDS * $NZ_FIELDS)) % $NZ_FIELDS;

    // Convert the linear index to subscripts for 4D material ID array
    int i_ID = (idx % ($NX_ID * $NY_ID * $NZ_ID)) / ($NY_ID * $NZ_ID);
    int j_ID = ((idx % ($NX_ID * $NY_ID * $NZ_ID)) % ($NY_ID * $NZ_ID)) / $NZ_ID;
    int k_ID = ((idx % ($NX_ID * $NY_ID * $NZ_ID)) % ($NY_ID * $NZ_ID)) % $NZ_ID;

    // Hx component
    if (NX != 1 && i > 0 && i < NX && j >= 0 && j < NY && k >= 0 && k < NZ) {
        int materialHx = ID[INDEX4D_ID(3,i_ID,j_ID,k_ID)];
        Hx[INDEX3D_FIELDS(i,j,k)] = updatecoeffsH[INDEX2D_MAT(materialHx,0)] * Hx[INDEX3D_FIELDS(i,j,k)] - updatecoeffsH[INDEX2D_MAT(materialHx,2)] * (Ez[INDEX3D_FIELDS(i,j+1,k)] - Ez[INDEX3D_FIELDS(i,j,k)]) + updatecoeffsH[INDEX2D_MAT(materialHx,3)] * (Ey[INDEX3D_FIELDS(i,j,k+1)] - Ey[INDEX3D_FIELDS(i,j,k)]);
    }

    // Hy component
    if (NY != 1 && i >= 0 && i < NX && j > 0 && j < NY && k >= 0 && k < NZ) {
        int materialHy = ID[INDEX4D_ID(4,i_ID,j_ID,k_ID)];
        Hy[INDEX3D_FIELDS(i,j,k)] = updatecoeffsH[INDEX2D_MAT(materialHy,0)] * Hy[INDEX3D_FIELDS(i,j,k)] - updatecoeffsH[INDEX2D_MAT(materialHy,3)] * (Ex[INDEX3D_FIELDS(i,j,k+1)] - Ex[INDEX3D_FIELDS(i,j,k)]) + updatecoeffsH[INDEX2D_MAT(materialHy,1)] * (Ez[INDEX3D_FIELDS(i+1,j,k)] - Ez[INDEX3D_FIELDS(i,j,k)]);
    }

    // Hz component
    if (NZ != 1 && i >= 0 && i < NX && j >= 0 && j < NY && k > 0 && k < NZ) {
        int materialHz = ID[INDEX4D_ID(5,i_ID,j_ID,k_ID)];
        Hz[INDEX3D_FIELDS(i,j,k)] = updatecoeffsH[INDEX2D_MAT(materialHz,0)] * Hz[INDEX3D_FIELDS(i,j,k)] - updatecoeffsH[INDEX2D_MAT(materialHz,1)] * (Ey[INDEX3D_FIELDS(i+1,j,k)] - Ey[INDEX3D_FIELDS(i,j,k)]) + updatecoeffsH[INDEX2D_MAT(materialHz,2)] * (Ex[INDEX3D_FIELDS(i,j+1,k)] - Ex[INDEX3D_FIELDS(i,j,k)]);
    }
}

""")

@@ -20,6 +20,8 @@ import numpy as np

from gprMax.constants import floattype
from gprMax.constants import complextype
+from gprMax.fractals_generate import generate_fractal2D
+from gprMax.fractals_generate import generate_fractal3D
from gprMax.utilities import round_value

np.seterr(divide='raise')
@@ -58,8 +60,12 @@ class FractalSurface(object):
        self.filldepth = 0
        self.grass = []

-    def generate_fractal_surface(self):
-        """Generate a 2D array with a fractal distribution."""
+    def generate_fractal_surface(self, G):
+        """Generate a 2D array with a fractal distribution.
+
+        Args:
+            G (class): Grid class instance - holds essential parameters describing the model.
+        """

        if self.xs == self.xf:
            surfacedims = (self.ny, self.nz)
@@ -82,16 +88,8 @@ class FractalSurface(object):
        # Shift the zero frequency component to the centre of the array
        A = np.fft.fftshift(A)

-        for i in range(surfacedims[0]):
-            for j in range(surfacedims[1]):
-                # Positional vector for current position
-                v2 = np.array([self.weighting[0] * i, self.weighting[1] * j])
-                rr = np.linalg.norm(v2 - v1)
-                try:
-                    self.fractalsurface[i, j] = A[i, j] * 1 / (rr**self.b)
-                except FloatingPointError:
-                    rr = 0.9
-                    self.fractalsurface[i, j] = A[i, j] * 1 / (rr**self.b)
+        # Generate fractal
+        generate_fractal2D(surfacedims[0], surfacedims[1], G.nthreads, self.b, self.weighting, v1, A, self.fractalsurface)

        # Shift the zero frequency component to start of the array
        self.fractalsurface = np.fft.ifftshift(self.fractalsurface)
@@ -135,8 +133,12 @@ class FractalVolume(object):
        self.nbins = 0
        self.fractalsurfaces = []

-    def generate_fractal_volume(self):
-        """Generate a 3D volume with a fractal distribution."""
+    def generate_fractal_volume(self, G):
+        """Generate a 3D volume with a fractal distribution.
+
+        Args:
+            G (class): Grid class instance - holds essential parameters describing the model.
+        """

        # Scale filter according to size of fractal volume
        if self.nx == 1:
@@ -168,17 +170,8 @@ class FractalVolume(object):
        # Shift the zero frequency component to the centre of the array
        A = np.fft.fftshift(A)

-        for i in range(self.nx):
-            for j in range(self.ny):
-                for k in range(self.nz):
-                    # Positional vector for current position
-                    v2 = np.array([self.weighting[0] * i, self.weighting[1] * j, self.weighting[2] * k])
-                    rr = np.linalg.norm(v2 - v1)
-                    try:
-                        self.fractalvolume[i, j, k] = A[i, j, k] * 1 / (rr**self.b)
-                    except FloatingPointError:
-                        rr = 0.9
-                        self.fractalvolume[i, j, k] = A[i, j, k] * 1 / (rr**self.b)
+        # Generate fractal
+        generate_fractal3D(self.nx, self.ny, self.nz, G.nthreads, self.b, self.weighting, v1, A, self.fractalvolume)

        # Shift the zero frequency component to the start of the array
        self.fractalvolume = np.fft.ifftshift(self.fractalvolume)

gprMax/fractals_generate.pyx (new file, 89 lines)
@@ -0,0 +1,89 @@
# Copyright (C) 2015-2017: The University of Edinburgh
# Authors: Craig Warren and Antonis Giannopoulos
#
# This file is part of gprMax.
#
# gprMax is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# gprMax is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with gprMax. If not, see <http://www.gnu.org/licenses/>.

import numpy as np
cimport numpy as np
from cython.parallel import prange

from gprMax.constants cimport complextype_t


cpdef void generate_fractal2D(int nx, int ny, int nthreads, int b, np.float64_t[:] weighting, np.float64_t[:] v1, np.complex128_t[:, ::1] A, complextype_t[:, ::1] fractalsurface):
    """This function generates a fractal surface for a 2D array.

    Args:
        nx, ny (int): Fractal surface size in cells
        nthreads (int): Number of threads to use
        b (int): Constant related to fractal dimension
        weighting (memoryview): Access to weighting vector
        v1 (memoryview): Access to positional vector at centre of array, scaled by weighting
        A (memoryview): Access to array containing random numbers (to be convolved with fractal function)
        fractalsurface (memoryview): Access to array containing fractal surface data
    """

    cdef Py_ssize_t i, j
    cdef float v2x, v2y, rr

    for i in prange(nx, nogil=True, schedule='static', num_threads=nthreads):
        for j in range(ny):
            # Positional vector for current position
            v2x = weighting[0] * i
            v2y = weighting[1] * j

            # Calculate norm of v2 - v1
            rr = ((v2x - v1[0])**2 + (v2y - v1[1])**2)**(1/2)

            # Catch potential divide by zero
            if rr == 0:
                rr = 0.9

            fractalsurface[i, j] = A[i, j] * 1 / (rr**b)


cpdef void generate_fractal3D(int nx, int ny, int nz, int nthreads, int b, np.float64_t[:] weighting, np.float64_t[:] v1, np.complex128_t[:, :, ::1] A, complextype_t[:, :, ::1] fractalvolume):
    """This function generates a fractal volume for a 3D array.

    Args:
        nx, ny, nz (int): Fractal volume size in cells
        nthreads (int): Number of threads to use
        b (int): Constant related to fractal dimension
        weighting (memoryview): Access to weighting vector
        v1 (memoryview): Access to positional vector at centre of array, scaled by weighting
        A (memoryview): Access to array containing random numbers (to be convolved with fractal function)
        fractalvolume (memoryview): Access to array containing fractal volume data
    """

    cdef Py_ssize_t i, j, k
    cdef float v2x, v2y, v2z, rr

    for i in prange(nx, nogil=True, schedule='static', num_threads=nthreads):
        for j in range(ny):
            for k in range(nz):
                # Positional vector for current position
                v2x = weighting[0] * i
                v2y = weighting[1] * j
                v2z = weighting[2] * k

                # Calculate norm of v2 - v1
                rr = ((v2x - v1[0])**2 + (v2y - v1[1])**2 + (v2z - v1[2])**2)**(1/2)

                # Catch potential divide by zero
                if rr == 0:
                    rr = 0.9

                fractalvolume[i, j, k] = A[i, j, k] * 1 / (rr**b)
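
A minimal usage sketch for the new Cython routine (illustrative only; it assumes the extension has been compiled, e.g. with python setup.py build_ext --inplace, and mirrors the call made from fractals.py in single-precision builds where complextype is np.complex64):

    import numpy as np
    from gprMax.fractals_generate import generate_fractal2D

    nx, ny, nthreads, b = 64, 64, 4, 3
    weighting = np.array([1.0, 1.0])
    # Positional vector at the centre of the array, scaled by weighting
    v1 = np.array([weighting[0] * nx / 2, weighting[1] * ny / 2])
    # Random numbers to be convolved with the fractal function
    A = np.random.randn(nx, ny).astype(np.complex128)
    fractalsurface = np.zeros((nx, ny), dtype=np.complex64)

    generate_fractal2D(nx, ny, nthreads, b, weighting, v1, A, fractalsurface)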

@@ -36,6 +36,7 @@ from gprMax.constants import m0
from gprMax.constants import z0
from gprMax.exceptions import GeneralError
from gprMax.model_build_run import run_model
+from gprMax.utilities import detect_gpus
from gprMax.utilities import get_host_info
from gprMax.utilities import get_terminal_width
from gprMax.utilities import human_size
@@ -57,6 +58,7 @@ def main():
    parser.add_argument('-restart', type=int, help='model number to restart from, e.g. when creating B-scan')
    parser.add_argument('-mpi', type=int, help='number of MPI tasks, i.e. master + workers')
    parser.add_argument('--mpi-worker', action='store_true', default=False, help=argparse.SUPPRESS)
+    parser.add_argument('-gpu', type=int, action='append', nargs='?', const=True, help='flag to use Nvidia GPU (option to give device ID)')
    parser.add_argument('-benchmark', action='store_true', default=False, help='flag to switch on benchmarking mode')
    parser.add_argument('--geometry-only', action='store_true', default=False, help='flag to only build model and produce geometry file(s)')
    parser.add_argument('--geometry-fixed', action='store_true', default=False, help='flag to not reprocess model geometry, e.g. for B-scans where the geometry is fixed')
@@ -73,6 +75,7 @@ def api(
    task=None,
    restart=None,
    mpi=False,
+    gpu=None,
    benchmark=False,
    geometry_only=False,
    geometry_fixed=False,
@@ -94,6 +97,7 @@ def api(
    args.task = task
    args.restart = restart
    args.mpi = mpi
+    args.gpu = gpu
    args.benchmark = benchmark
    args.geometry_only = geometry_only
    args.geometry_fixed = geometry_fixed
@@ -117,6 +121,34 @@ def run_main(args):
    hostinfo = get_host_info()
    hyperthreading = ', {} cores with Hyper-Threading'.format(hostinfo['logicalcores']) if hostinfo['hyperthreading'] else ''
    print('\nHost: {}; {} x {} ({} cores{}); {} RAM; {}'.format(hostinfo['machineID'], hostinfo['sockets'], hostinfo['cpuID'], hostinfo['physicalcores'], hyperthreading, human_size(hostinfo['ram'], a_kilobyte_is_1024_bytes=True), hostinfo['osversion']))

+    # Get information/setup Nvidia GPU(s)
+    if args.gpu is not None:
+        # Extract first item of list, either True to automatically determine device ID,
+        # or an integer to manually specify device ID
+        args.gpu = args.gpu[0]
+        gpus = detect_gpus()
+
+        # If a device ID is specified check it is valid
+        if not isinstance(args.gpu, bool):
+            if args.gpu > len(gpus) - 1:
+                raise GeneralError('GPU with device ID {} does not exist'.format(args.gpu))
+            # Set args.gpu to GPU object to access elsewhere
+            args.gpu = next(gpu for gpu in gpus if gpu.deviceID == args.gpu)
+
+        # If no device ID is specified
+        else:
+            # If in MPI mode then set args.gpu to list of available GPUs
+            if args.mpi:
+                if args.mpi - 1 > len(gpus):
+                    raise GeneralError('Too many MPI tasks requested ({}). The number of MPI tasks requested can only be a maximum of the number of GPU(s) detected plus one, i.e. {} GPU worker tasks + 1 CPU master task'.format(args.mpi, len(gpus)))
+                args.gpu = gpus
+            # If benchmarking mode then set args.gpu to list of available GPUs
+            elif args.benchmark:
+                args.gpu = gpus
+            # Otherwise set args.gpu to GPU object with default device ID (0) to access elsewhere
+            else:
+                args.gpu = next(gpu for gpu in gpus if gpu.deviceID == 0)
+
    # Create a separate namespace that users can access in any Python code blocks in the input file
    usernamespace = {'c': c, 'e0': e0, 'm0': m0, 'z0': z0, 'number_model_runs': args.n, 'inputfile': os.path.abspath(inputfile.name)}
@@ -239,13 +271,41 @@ def run_benchmark_sim(args, inputfile, usernamespace):
    cputimes = np.zeros(len(cputhreads))

    numbermodelruns = len(cputhreads)
-    modelend = numbermodelruns + 1

+    # Both CPU and GPU benchmarking
+    gpus = None
+    gpuIDs = []
+    gputimes = np.array([])
+    if args.gpu is not None:
+        # Set size of array to store GPU runtimes and number of runs of model required
+        if isinstance(args.gpu, list):
+            for gpu in args.gpu:
+                gpuIDs.append(gpu.name)
+            gputimes = np.zeros(len(args.gpu))
+            numbermodelruns += len(args.gpu)
+        else:
+            gpuIDs.append(args.gpu.name)
+            gputimes = np.zeros(1)
+            numbermodelruns += 1
+        # Store GPU information in a temp variable and set args.gpu to None to do CPU benchmarking first
+        gpus = args.gpu
+        args.gpu = None
+
    usernamespace['number_model_runs'] = numbermodelruns
+    modelend = numbermodelruns + 1

    for currentmodelrun in range(1, modelend):
-        os.environ['OMP_NUM_THREADS'] = str(cputhreads[currentmodelrun - 1])
-        cputimes[currentmodelrun - 1] = run_model(args, currentmodelrun, modelend - 1, numbermodelruns, inputfile, usernamespace)
+        # Set args.gpu if doing GPU benchmark
+        if currentmodelrun > len(cputhreads):
+            if isinstance(gpus, list):
+                args.gpu = gpus[(currentmodelrun - 1) - len(cputhreads)]
+            else:
+                args.gpu = gpus
+            # del os.environ['OMP_NUM_THREADS']
+            gputimes[(currentmodelrun - 1) - len(cputhreads)] = run_model(args, currentmodelrun, modelend - 1, numbermodelruns, inputfile, usernamespace)
+        else:
+            os.environ['OMP_NUM_THREADS'] = str(cputhreads[currentmodelrun - 1])
+            cputimes[currentmodelrun - 1] = run_model(args, currentmodelrun, modelend - 1, numbermodelruns, inputfile, usernamespace)

    # Get model size (in cells) and number of iterations
    if currentmodelrun == 1:
@@ -340,7 +400,15 @@ def run_mpi_sim(args, inputfile, usernamespace, optparams=None):
        for work in iter(lambda: comm.sendrecv(0, dest=0), StopIteration):
            currentmodelrun = work['currentmodelrun']

+            # Get info and setup device ID for GPU(s)
+            gpuinfo = ''
+            if args.gpu is not None:
+                # Set device ID for multiple GPUs
+                if isinstance(args.gpu, list):
+                    deviceID = (rank - 1) % len(args.gpu)
+                    args.gpu = next(gpu for gpu in args.gpu if gpu.deviceID == deviceID)
+                gpuinfo = ' using {} - {}, {} RAM '.format(args.gpu.deviceID, args.gpu.name, human_size(args.gpu.totalmem, a_kilobyte_is_1024_bytes=True))
+
            print('MPI worker rank {} (PID {}) starting model {}/{}{} on {}'.format(rank, os.getpid(), currentmodelrun, numbermodelruns, gpuinfo, name))

            # If Taguchi optimisation, add specific value for each parameter to

@@ -87,8 +87,18 @@ class FDTDGrid(Grid):
        self.messages = True
        self.tqdmdisable = False

        # Get information about host machine
        self.hostinfo = None

        # CPU - OpenMP threads
        self.nthreads = 0

        # GPU
        # Threads per block
        self.tpb = (256, 1, 1)

        # GPU object
        self.gpu = None

        # Threshold (dB) down from maximum power (0dB) of main frequency used
        # to calculate highest frequency for numerical dispersion analysis
@@ -168,6 +178,33 @@ class FDTDGrid(Grid):
        self.Tz = np.zeros((Material.maxpoles, self.nx + 1, self.ny + 1, self.nz + 1), dtype=complextype)
        self.updatecoeffsdispersive = np.zeros((len(self.materials), 3 * Material.maxpoles), dtype=complextype)

    def gpu_set_blocks_per_grid(self):
        """Set the blocks per grid size used for updating the electric and magnetic field arrays on a GPU."""
        self.bpg = (int(np.ceil(((self.nx + 1) * (self.ny + 1) * (self.nz + 1)) / self.tpb[0])), 1, 1)
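
        # Illustrative note (not part of the changeset): for a 100 x 100 x 100
        # cell domain with the default tpb = (256, 1, 1), this gives
        # bpg = (ceil(101 * 101 * 101 / 256), 1, 1) = (4025, 1, 1), i.e. at
        # least one thread per cell of the (nx + 1) x (ny + 1) x (nz + 1) field arrays.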

    def gpu_initialise_arrays(self):
        """Initialise standard field arrays on GPU."""

        import pycuda.gpuarray as gpuarray

        self.ID_gpu = gpuarray.to_gpu(self.ID)
        self.Ex_gpu = gpuarray.to_gpu(self.Ex)
        self.Ey_gpu = gpuarray.to_gpu(self.Ey)
        self.Ez_gpu = gpuarray.to_gpu(self.Ez)
        self.Hx_gpu = gpuarray.to_gpu(self.Hx)
        self.Hy_gpu = gpuarray.to_gpu(self.Hy)
        self.Hz_gpu = gpuarray.to_gpu(self.Hz)

    def gpu_initialise_dispersive_arrays(self):
        """Initialise dispersive material coefficient arrays on GPU."""

        import pycuda.gpuarray as gpuarray

        self.Tx_gpu = gpuarray.to_gpu(self.Tx)
        self.Ty_gpu = gpuarray.to_gpu(self.Ty)
        self.Tz_gpu = gpuarray.to_gpu(self.Tz)
        self.updatecoeffsdispersive_gpu = gpuarray.to_gpu(self.updatecoeffsdispersive)


def dispersion_analysis(G):
    """

@@ -876,7 +876,7 @@ def process_geometrycmds(geometry, G):
            volume.operatingonID = tmp[12]
            volume.nbins = nbins
            volume.seed = seed
-            volume.weighting = (float(tmp[8]), float(tmp[9]), float(tmp[10]))
+            volume.weighting = np.array([float(tmp[8]), float(tmp[9]), float(tmp[10])])
            try:
                volume.averaging = averagefractalbox
            except:
@@ -995,14 +995,14 @@ def process_geometrycmds(geometry, G):
                surface.fractalrange = fractalrange
                surface.operatingonID = volume.ID
                surface.seed = seed
-                surface.weighting = (float(tmp[8]), float(tmp[9]))
+                surface.weighting = np.array([float(tmp[8]), float(tmp[9])])

                # List of existing surfaces IDs
                existingsurfaceIDs = [x.surfaceID for x in volume.fractalsurfaces]
                if surface.surfaceID in existingsurfaceIDs:
                    raise CmdInputError("'" + ' '.join(tmp) + "'" + ' has already been used on the {} surface'.format(surface.surfaceID))

-                surface.generate_fractal_surface()
+                surface.generate_fractal_surface(G)
                volume.fractalsurfaces.append(surface)

                if G.messages:
@@ -1212,7 +1212,7 @@ def process_geometrycmds(geometry, G):
                # Set the fractal range to scale the fractal distribution between zero and one
                surface.fractalrange = (0, 1)
                surface.operatingonID = volume.ID
-                surface.generate_fractal_surface()
+                surface.generate_fractal_surface(G)
                if numblades > surface.fractalsurface.shape[0] * surface.fractalsurface.shape[1]:
                    raise CmdInputError("'" + ' '.join(tmp) + "'" + ' the specified surface is not large enough for the number of grass blades/roots specified')

@@ -1308,7 +1308,7 @@ def process_geometrycmds(geometry, G):
                    materialnumID = next(x.numID for x in G.materials if x.ID == volume.operatingonID)
                    volume.fractalvolume *= materialnumID
                else:
-                    volume.generate_fractal_volume()
+                    volume.generate_fractal_volume(G)
                    volume.fractalvolume += mixingmodel.startmaterialnum

                volume.generate_volume_mask()
@@ -1521,7 +1521,7 @@ def process_geometrycmds(geometry, G):
                    if volume.nbins == 1:
                        raise CmdInputError("'" + ' '.join(tmp) + "'" + ' is being used with a single material and no modifications, therefore please use a #box command instead.')
                    else:
-                        volume.generate_fractal_volume()
+                        volume.generate_fractal_volume(G)
                        volume.fractalvolume += mixingmodel.startmaterialnum

                data = volume.fractalvolume.astype('int16', order='C')

@@ -295,6 +295,10 @@ def process_multicmds(multicmds, G):
        tmp = cmdinstance.split()
        if len(tmp) < 6:
            raise CmdInputError("'" + cmdname + ': ' + ' '.join(tmp) + "'" + ' requires at least six parameters')

+        # Warn about using a transmission line on GPU
+        if G.gpu is not None:
+            raise CmdInputError("'" + cmdname + ': ' + ' '.join(tmp) + "'" + ' A #transmission_line cannot currently be used with GPU solving. Consider using a #voltage_source instead.')
+
        # Check polarity & position parameters
        polarisation = tmp[0].lower()
@@ -386,7 +390,10 @@ def process_multicmds(multicmds, G):
        else:
            r.ID = tmp[3]
            # Get allowable outputs
-            allowableoutputs = Rx.allowableoutputs
+            if G.gpu is not None:
+                allowableoutputs = Rx.gpu_allowableoutputs
+            else:
+                allowableoutputs = Rx.allowableoutputs
            # Check and add field output names
            for field in tmp[4::]:
                if field in allowableoutputs:
@@ -428,17 +435,17 @@ def process_multicmds(multicmds, G):
            raise CmdInputError("'" + cmdname + ': ' + ' '.join(tmp) + "'" + ' the lower coordinates should be less than the upper coordinates')
        if dx < 0 or dy < 0 or dz < 0:
            raise CmdInputError("'" + cmdname + ': ' + ' '.join(tmp) + "'" + ' the step size should not be less than zero')
-        if dx < G.dx:
+        if dx < 1:
            if dx == 0:
                dx = 1
            else:
                raise CmdInputError("'" + cmdname + ': ' + ' '.join(tmp) + "'" + ' the step size should not be less than the spatial discretisation')
-        if dy < G.dy:
+        if dy < 1:
            if dy == 0:
                dy = 1
            else:
                raise CmdInputError("'" + cmdname + ': ' + ' '.join(tmp) + "'" + ' the step size should not be less than the spatial discretisation')
-        if dz < G.dz:
+        if dz < 1:
            if dz == 0:
                dz = 1
            else:
@@ -471,6 +478,10 @@ def process_multicmds(multicmds, G):
        tmp = cmdinstance.split()
        if len(tmp) != 11:
            raise CmdInputError("'" + cmdname + ': ' + ' '.join(tmp) + "'" + ' requires exactly eleven parameters')

+        # Warn about using snapshots on GPU
+        if G.gpu is not None:
+            raise CmdInputError("'" + cmdname + ': ' + ' '.join(tmp) + "'" + ' The #snapshot command cannot currently be used with GPU solving.')
+
        xs = G.calculate_coord('x', tmp[0])
        ys = G.calculate_coord('y', tmp[1])
@@ -502,7 +513,7 @@ def process_multicmds(multicmds, G):
            raise CmdInputError("'" + cmdname + ': ' + ' '.join(tmp) + "'" + ' the lower coordinates should be less than the upper coordinates')
        if dx < 0 or dy < 0 or dz < 0:
            raise CmdInputError("'" + cmdname + ': ' + ' '.join(tmp) + "'" + ' the step size should not be less than zero')
-        if dx < G.dx or dy < G.dy or dz < G.dz:
+        if dx < 1 or dy < 1 or dz < 1:
            raise CmdInputError("'" + cmdname + ': ' + ' '.join(tmp) + "'" + ' the step size should not be less than the spatial discretisation')
        if time <= 0 or time > G.iterations:
            raise CmdInputError("'" + cmdname + ': ' + ' '.join(tmp) + "'" + ' time value is not valid')
@@ -724,7 +735,7 @@ def process_multicmds(multicmds, G):
            raise CmdInputError("'" + cmdname + ': ' + ' '.join(tmp) + "'" + ' the step size should not be less than zero')
        if dx > G.nx or dy > G.ny or dz > G.nz:
            raise CmdInputError("'" + cmdname + ': ' + ' '.join(tmp) + "'" + ' the step size should be less than the domain size')
-        if dx < G.dx or dy < G.dy or dz < G.dz:
+        if dx < 1 or dy < 1 or dz < 1:
            raise CmdInputError("'" + cmdname + ': ' + ' '.join(tmp) + "'" + ' the step size should not be less than the spatial discretisation')
        if tmp[10].lower() != 'n' and tmp[10].lower() != 'f':
            raise CmdInputError("'" + cmdname + ': ' + ' '.join(tmp) + "'" + ' requires type to be either n (normal) or f (fine)')

@@ -77,7 +77,7 @@ def process_singlecmds(singlecmds, G):
    # os.environ['OMP_DISPLAY_ENV'] = 'TRUE' # Prints OMP version and environment variables (useful for debug)

    # Catch bug with Windows Subsystem for Linux (https://github.com/Microsoft/BashOnWindows/issues/785)
-    if 'Microsoft' in hostinfo['osversion']:
+    if 'Microsoft' in G.hostinfo['osversion']:
        os.environ['KMP_AFFINITY'] = 'disabled'
        del os.environ['OMP_PLACES']
        del os.environ['OMP_PROC_BIND']
@@ -99,9 +99,14 @@ def process_singlecmds(singlecmds, G):

    if G.messages:
        print('Number of CPU (OpenMP) threads: {}'.format(G.nthreads))
-    if G.nthreads > hostinfo['physicalcores']:
+    if G.nthreads > G.hostinfo['physicalcores']:
        print(Fore.RED + 'WARNING: You have specified more threads ({}) than available physical CPU cores ({}). This may lead to degraded performance.'.format(G.nthreads, hostinfo['physicalcores']) + Style.RESET_ALL)

+    # Print information about any GPU in use
+    if G.messages:
+        if G.gpu is not None:
+            print('GPU solving using: {} - {}'.format(G.gpu.deviceID, G.gpu.name))
+
    # Spatial discretisation
    cmd = '#dx_dy_dz'
    tmp = [float(x) for x in singlecmds[cmd].split()]
@@ -135,8 +140,13 @@ def process_singlecmds(singlecmds, G):
    # Estimate memory (RAM) usage
    memestimate = memory_usage(G)
    # Check if model can be built and/or run on host
-    if memestimate > hostinfo['ram']:
+    if memestimate > G.hostinfo['ram']:
        raise GeneralError('Estimated memory (RAM) required ~{} exceeds {} detected!\n'.format(human_size(memestimate), human_size(hostinfo['ram'], a_kilobyte_is_1024_bytes=True)))

+    # Check if model can be run on specified GPU if required
+    if G.gpu is not None:
+        if memestimate > G.gpu.totalmem:
+            raise GeneralError('Estimated memory (RAM) required ~{} exceeds {} detected on specified {} - {} GPU!\n'.format(human_size(memestimate), human_size(G.gpu.totalmem, a_kilobyte_is_1024_bytes=True), G.gpu.deviceID, G.gpu.name))
+
    if G.messages:
        print('Estimated memory (RAM) required: ~{}'.format(human_size(memestimate)))
@@ -23,16 +23,19 @@ import psutil
|
||||
import sys
|
||||
from time import perf_counter
|
||||
|
||||
from colorama import init, Fore, Style
|
||||
from colorama import init
|
||||
from colorama import Fore
|
||||
from colorama import Style
|
||||
init()
|
||||
import numpy as np
|
||||
from terminaltables import AsciiTable
|
||||
from tqdm import tqdm
|
||||
|
||||
from gprMax.constants import floattype, cfloattype, ccomplextype
|
||||
from gprMax.constants import floattype, cudafloattype, cudacomplextype
|
||||
from gprMax.exceptions import GeneralError
|
||||
|
||||
from gprMax.fields_outputs import store_outputs
|
||||
from gprMax.fields_outputs import kernel_template_store_outputs
|
||||
from gprMax.fields_outputs import write_hdf5_outputfile
|
||||
|
||||
from gprMax.fields_updates import update_electric
|
||||
@@ -41,6 +44,7 @@ from gprMax.fields_updates import update_electric_dispersive_multipole_A
|
||||
from gprMax.fields_updates import update_electric_dispersive_multipole_B
|
||||
from gprMax.fields_updates import update_electric_dispersive_1pole_A
|
||||
from gprMax.fields_updates import update_electric_dispersive_1pole_B
|
||||
from gprMax.fields_updates_gpu import kernels_template_fields
|
||||
|
||||
from gprMax.grid import FDTDGrid
|
||||
from gprMax.grid import dispersion_analysis
|
||||
@@ -53,9 +57,17 @@ from gprMax.input_cmds_singleuse import process_singlecmds
|
||||
from gprMax.materials import Material, process_materials
|
||||
from gprMax.pml import PML
|
||||
from gprMax.pml import build_pmls
|
||||
from gprMax.pml_updates_gpu import kernels_template_pml
|
||||
from gprMax.receivers import gpu_initialise_rx_arrays
|
||||
from gprMax.receivers import gpu_get_rx_array
|
||||
from gprMax.sources import gpu_initialise_src_arrays
|
||||
from gprMax.source_updates_gpu import kernels_template_sources
|
||||
from gprMax.utilities import get_host_info
|
||||
from gprMax.utilities import get_terminal_width
|
||||
from gprMax.utilities import human_size
|
||||
from gprMax.utilities import memory_usage
|
||||
from gprMax.utilities import open_path_file
|
||||
from gprMax.utilities import round32
|
||||
from gprMax.yee_cell_build import build_electric_components
|
||||
from gprMax.yee_cell_build import build_magnetic_components
|
||||
|
||||
@@ -90,6 +102,13 @@ def run_model(args, currentmodelrun, modelend, numbermodelruns, inputfile, usern
|
||||
|
||||
# Initialise an instance of the FDTDGrid class
|
||||
G = FDTDGrid()
|
||||
|
||||
# Get information about host machine
|
||||
G.hostinfo = get_host_info()
|
||||
|
||||
# Single GPU object
|
||||
if args.gpu:
|
||||
G.gpu = args.gpu
|
||||
|
||||
G.inputfilename = os.path.split(inputfile.name)[1]
|
||||
G.inputdirectory = os.path.dirname(os.path.abspath(inputfile.name))
|
||||
@@ -186,6 +205,19 @@ def run_model(args, currentmodelrun, modelend, numbermodelruns, inputfile, usern
|
||||
# Initialise arrays of update coefficients and temporary values if
|
||||
# there are any dispersive materials
|
||||
if Material.maxpoles != 0:
|
||||
# Update estimated memory (RAM) usage
|
||||
memestimate = memory_usage(G)
|
||||
# Check if model can be built and/or run on host
|
||||
if memestimate > G.hostinfo['ram']:
|
||||
raise GeneralError('Estimated memory (RAM) required ~{} exceeds {} detected!\n'.format(human_size(memestimate), human_size(G.hostinfo['ram'], a_kilobyte_is_1024_bytes=True)))
|
||||
|
||||
# Check if model can be run on specified GPU if required
|
||||
if G.gpu is not None:
|
||||
if memestimate > G.gpu.totalmem:
|
||||
raise GeneralError('Estimated memory (RAM) required ~{} exceeds {} detected on specified {} - {} GPU!\n'.format(human_size(memestimate), human_size(G.gpu.totalmem, a_kilobyte_is_1024_bytes=True), G.gpu.deviceID, G.gpu.name))
|
||||
if G.messages:
|
||||
print('Estimated memory (RAM) required: ~{}'.format(human_size(memestimate)))
|
||||
|
||||
G.initialise_dispersive_arrays()
|
||||
|
||||
# Process complete list of materials - calculate update coefficients,
|
||||
@@ -271,7 +303,10 @@ def run_model(args, currentmodelrun, modelend, numbermodelruns, inputfile, usern
|
||||
print('\nOutput file: {}\n'.format(outputfile))
|
||||
|
||||
# Main FDTD solving functions for either CPU or GPU
|
||||
tsolve = solve_cpu(currentmodelrun, modelend, G)
|
||||
if G.gpu is None:
|
||||
tsolve = solve_cpu(currentmodelrun, modelend, G)
|
||||
else:
|
||||
tsolve = solve_gpu(currentmodelrun, modelend, G)
|
||||
|
||||
# Write an output file in HDF5 format
|
||||
write_hdf5_outputfile(outputfile, G.Ex, G.Ey, G.Ez, G.Hx, G.Hy, G.Hz, G)
|
||||
@@ -358,3 +393,149 @@ def solve_cpu(currentmodelrun, modelend, G):
|
||||
tsolve = perf_counter() - tsolvestart
|
||||
|
||||
return tsolve


def solve_gpu(currentmodelrun, modelend, G):
    """Solving using FDTD method on GPU. Implemented using NVIDIA CUDA.

    Args:
        currentmodelrun (int): Current model run number.
        modelend (int): Number of last model to run.
        G (class): Grid class instance - holds essential parameters describing the model.

    Returns:
        tsolve (float): Time taken to execute solving.
    """

    import pycuda.driver as drv
    from pycuda.compiler import SourceModule
    drv.init()

    # Create device handle and context on specific GPU device (and make it the current context)
    dev = drv.Device(G.gpu.deviceID)
    ctx = dev.make_context()

    # Electric and magnetic field updates - prepare kernels, and get kernel functions
    if Material.maxpoles > 0:
        kernels_fields = SourceModule(kernels_template_fields.substitute(REAL=cudafloattype, COMPLEX=cudacomplextype, N_updatecoeffsE=G.updatecoeffsE.size, N_updatecoeffsH=G.updatecoeffsH.size, NY_MATCOEFFS=G.updatecoeffsE.shape[1], NY_MATDISPCOEFFS=G.updatecoeffsdispersive.shape[1], NX_FIELDS=G.Ex.shape[0], NY_FIELDS=G.Ex.shape[1], NZ_FIELDS=G.Ex.shape[2], NX_ID=G.ID.shape[1], NY_ID=G.ID.shape[2], NZ_ID=G.ID.shape[3], NX_T=G.Tx.shape[1], NY_T=G.Tx.shape[2], NZ_T=G.Tx.shape[3]))
    else:  # Set to one any substitutions for dispersive materials
        kernels_fields = SourceModule(kernels_template_fields.substitute(REAL=cudafloattype, COMPLEX=cudacomplextype, N_updatecoeffsE=G.updatecoeffsE.size, N_updatecoeffsH=G.updatecoeffsH.size, NY_MATCOEFFS=G.updatecoeffsE.shape[1], NY_MATDISPCOEFFS=1, NX_FIELDS=G.Ex.shape[0], NY_FIELDS=G.Ex.shape[1], NZ_FIELDS=G.Ex.shape[2], NX_ID=G.ID.shape[1], NY_ID=G.ID.shape[2], NZ_ID=G.ID.shape[3], NX_T=1, NY_T=1, NZ_T=1))
    update_e_gpu = kernels_fields.get_function("update_e")
    update_h_gpu = kernels_fields.get_function("update_h")

    # Copy material coefficient arrays to constant memory of GPU (must be <64KB) for fields kernels
    updatecoeffsE = kernels_fields.get_global('updatecoeffsE')[0]
    updatecoeffsH = kernels_fields.get_global('updatecoeffsH')[0]
    if G.updatecoeffsE.nbytes + G.updatecoeffsH.nbytes > G.gpu.constmem:
        raise GeneralError('Too many materials in the model to fit onto constant memory of size {} on {} - {} GPU'.format(human_size(G.gpu.constmem), G.gpu.deviceID, G.gpu.name))
    else:
        drv.memcpy_htod(updatecoeffsE, G.updatecoeffsE)
        drv.memcpy_htod(updatecoeffsH, G.updatecoeffsH)

    # Electric and magnetic field updates - dispersive materials - get kernel functions
    if Material.maxpoles > 0:  # If there are any dispersive materials (updates are split into two parts as they require present and updated electric field values).
        update_e_dispersive_A_gpu = kernels_fields.get_function("update_e_dispersive_A")
        update_e_dispersive_B_gpu = kernels_fields.get_function("update_e_dispersive_B")
        G.gpu_initialise_dispersive_arrays()

    # Electric and magnetic field updates - set blocks per grid and initialise field arrays on GPU
    G.gpu_set_blocks_per_grid()
    G.gpu_initialise_arrays()

    # PML updates
    if G.pmls:
        # Prepare kernels
        kernels_pml = SourceModule(kernels_template_pml.substitute(REAL=cudafloattype, N_updatecoeffsE=G.updatecoeffsE.size, N_updatecoeffsH=G.updatecoeffsH.size, NY_MATCOEFFS=G.updatecoeffsE.shape[1], NY_R=G.pmls[0].ERA.shape[1], NX_FIELDS=G.Ex.shape[0], NY_FIELDS=G.Ex.shape[1], NZ_FIELDS=G.Ex.shape[2], NX_ID=G.ID.shape[1], NY_ID=G.ID.shape[2], NZ_ID=G.ID.shape[3]))
        # Copy material coefficient arrays to constant memory of GPU (must be <64KB) for PML kernels
        updatecoeffsE = kernels_pml.get_global('updatecoeffsE')[0]
        updatecoeffsH = kernels_pml.get_global('updatecoeffsH')[0]
        drv.memcpy_htod(updatecoeffsE, G.updatecoeffsE)
        drv.memcpy_htod(updatecoeffsH, G.updatecoeffsH)
        # Set blocks per grid, initialise arrays on GPU, and get kernel functions
        for pml in G.pmls:
            pml.gpu_set_blocks_per_grid(G)
            pml.gpu_initialise_arrays()
            pml.gpu_get_update_funcs(kernels_pml)

    # Receivers
    if G.rxs:
        # Initialise arrays on GPU
        rxcoords_gpu, rxs_gpu = gpu_initialise_rx_arrays(G)
        # Prepare kernel and get kernel function
        kernel_store_outputs = SourceModule(kernel_template_store_outputs.substitute(REAL=cudafloattype, NY_RXCOORDS=3, NX_RXS=6, NY_RXS=G.iterations, NZ_RXS=len(G.rxs), NX_FIELDS=G.Ex.shape[0], NY_FIELDS=G.Ex.shape[1], NZ_FIELDS=G.Ex.shape[2]))
        store_outputs_gpu = kernel_store_outputs.get_function("store_outputs")

    # Sources - initialise arrays on GPU, prepare kernel and get kernel functions
    if G.voltagesources + G.hertziandipoles + G.magneticdipoles:
        kernels_sources = SourceModule(kernels_template_sources.substitute(REAL=cudafloattype, N_updatecoeffsE=G.updatecoeffsE.size, N_updatecoeffsH=G.updatecoeffsH.size, NY_MATCOEFFS=G.updatecoeffsE.shape[1], NY_SRCINFO=4, NY_SRCWAVES=G.iterations, NX_FIELDS=G.Ex.shape[0], NY_FIELDS=G.Ex.shape[1], NZ_FIELDS=G.Ex.shape[2], NX_ID=G.ID.shape[1], NY_ID=G.ID.shape[2], NZ_ID=G.ID.shape[3]))
        # Copy material coefficient arrays to constant memory of GPU (must be <64KB) for source kernels
        updatecoeffsE = kernels_sources.get_global('updatecoeffsE')[0]
        updatecoeffsH = kernels_sources.get_global('updatecoeffsH')[0]
        drv.memcpy_htod(updatecoeffsE, G.updatecoeffsE)
        drv.memcpy_htod(updatecoeffsH, G.updatecoeffsH)
        if G.hertziandipoles:
            srcinfo1_hertzian_gpu, srcinfo2_hertzian_gpu, srcwaves_hertzian_gpu = gpu_initialise_src_arrays(G.hertziandipoles, G)
            update_hertzian_dipole_gpu = kernels_sources.get_function("update_hertzian_dipole")
        if G.magneticdipoles:
            srcinfo1_magnetic_gpu, srcinfo2_magnetic_gpu, srcwaves_magnetic_gpu = gpu_initialise_src_arrays(G.magneticdipoles, G)
            update_magnetic_dipole_gpu = kernels_sources.get_function("update_magnetic_dipole")
        if G.voltagesources:
            srcinfo1_voltage_gpu, srcinfo2_voltage_gpu, srcwaves_voltage_gpu = gpu_initialise_src_arrays(G.voltagesources, G)
            update_voltage_source_gpu = kernels_sources.get_function("update_voltage_source")

    # Iteration loop timer
    iterstart = drv.Event()
    iterend = drv.Event()
    iterstart.record()

    for iteration in tqdm(range(G.iterations), desc='Running simulation, model ' + str(currentmodelrun) + '/' + str(modelend), ncols=get_terminal_width() - 1, file=sys.stdout, disable=G.tqdmdisable):

        # Store field component values for every receiver
        if G.rxs:
            store_outputs_gpu(np.int32(len(G.rxs)), np.int32(iteration), rxcoords_gpu.gpudata, rxs_gpu.gpudata, G.Ex_gpu.gpudata, G.Ey_gpu.gpudata, G.Ez_gpu.gpudata, G.Hx_gpu.gpudata, G.Hy_gpu.gpudata, G.Hz_gpu.gpudata, block=(1, 1, 1), grid=(round32(len(G.rxs)), 1, 1))

        # Update magnetic field components
        update_h_gpu(np.int32(G.nx), np.int32(G.ny), np.int32(G.nz), G.ID_gpu.gpudata, G.Hx_gpu.gpudata, G.Hy_gpu.gpudata, G.Hz_gpu.gpudata, G.Ex_gpu.gpudata, G.Ey_gpu.gpudata, G.Ez_gpu.gpudata, block=G.tpb, grid=G.bpg)

        # Update magnetic field components with the PML correction
        for pml in G.pmls:
            pml.gpu_update_magnetic(G)

        # Update magnetic field components for magnetic dipole sources
        if G.magneticdipoles:
            update_magnetic_dipole_gpu(np.int32(len(G.magneticdipoles)), np.int32(iteration), floattype(G.dx), floattype(G.dy), floattype(G.dz), srcinfo1_magnetic_gpu.gpudata, srcinfo2_magnetic_gpu.gpudata, srcwaves_magnetic_gpu.gpudata, G.ID_gpu.gpudata, G.Hx_gpu.gpudata, G.Hy_gpu.gpudata, G.Hz_gpu.gpudata, block=(1, 1, 1), grid=(round32(len(G.magneticdipoles)), 1, 1))

        # Update electric field components
        if Material.maxpoles == 0:  # If all materials are non-dispersive do standard update
            update_e_gpu(np.int32(G.nx), np.int32(G.ny), np.int32(G.nz), G.ID_gpu.gpudata, G.Ex_gpu.gpudata, G.Ey_gpu.gpudata, G.Ez_gpu.gpudata, G.Hx_gpu.gpudata, G.Hy_gpu.gpudata, G.Hz_gpu.gpudata, block=G.tpb, grid=G.bpg)
        else:  # If there are any dispersive materials do 1st part of dispersive update (it is split into two parts as it requires present and updated electric field values).
            update_e_dispersive_A_gpu(np.int32(G.nx), np.int32(G.ny), np.int32(G.nz), np.int32(Material.maxpoles), G.updatecoeffsdispersive_gpu.gpudata, G.Tx_gpu.gpudata, G.Ty_gpu.gpudata, G.Tz_gpu.gpudata, G.ID_gpu.gpudata, G.Ex_gpu.gpudata, G.Ey_gpu.gpudata, G.Ez_gpu.gpudata, G.Hx_gpu.gpudata, G.Hy_gpu.gpudata, G.Hz_gpu.gpudata, block=G.tpb, grid=G.bpg)

        # Update electric field components with the PML correction
        for pml in G.pmls:
            pml.gpu_update_electric(G)

        # Update electric field components for voltage sources
        if G.voltagesources:
            update_voltage_source_gpu(np.int32(len(G.voltagesources)), np.int32(iteration), floattype(G.dx), floattype(G.dy), floattype(G.dz), srcinfo1_voltage_gpu.gpudata, srcinfo2_voltage_gpu.gpudata, srcwaves_voltage_gpu.gpudata, G.ID_gpu.gpudata, G.Ex_gpu.gpudata, G.Ey_gpu.gpudata, G.Ez_gpu.gpudata, block=(1, 1, 1), grid=(round32(len(G.voltagesources)), 1, 1))

        # Update electric field components for Hertzian dipole sources (update any Hertzian dipole sources last)
        if G.hertziandipoles:
            update_hertzian_dipole_gpu(np.int32(len(G.hertziandipoles)), np.int32(iteration), floattype(G.dx), floattype(G.dy), floattype(G.dz), srcinfo1_hertzian_gpu.gpudata, srcinfo2_hertzian_gpu.gpudata, srcwaves_hertzian_gpu.gpudata, G.ID_gpu.gpudata, G.Ex_gpu.gpudata, G.Ey_gpu.gpudata, G.Ez_gpu.gpudata, block=(1, 1, 1), grid=(round32(len(G.hertziandipoles)), 1, 1))

        # If there are any dispersive materials do 2nd part of dispersive update (it is split into two parts as it requires present and updated electric field values). Therefore it can only be completely updated after the electric field has been updated by the PML and source updates.
        if Material.maxpoles > 0:
            update_e_dispersive_B_gpu(np.int32(G.nx), np.int32(G.ny), np.int32(G.nz), np.int32(Material.maxpoles), G.updatecoeffsdispersive_gpu.gpudata, G.Tx_gpu.gpudata, G.Ty_gpu.gpudata, G.Tz_gpu.gpudata, G.ID_gpu.gpudata, G.Ex_gpu.gpudata, G.Ey_gpu.gpudata, G.Ez_gpu.gpudata, block=G.tpb, grid=G.bpg)

    # Copy output from receivers array back to correct receiver objects
    gpu_get_rx_array(rxs_gpu.get(), rxcoords_gpu.get(), G)

    iterend.record()
    iterend.synchronize()
    tsolve = iterstart.time_till(iterend) * 1e-3

    # Remove context from top of stack and delete
    ctx.pop()
    del ctx

    return tsolve
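
The compile/copy/launch pattern above - build a SourceModule from a template, push coefficients into __constant__ memory with memcpy_htod, then launch with explicit block/grid sizes - can be exercised in isolation. The following is a minimal sketch of that pattern, assuming only that pycuda and a CUDA-enabled GPU are available; the kernel and sizes are illustrative, not part of gprMax:

    import numpy as np
    import pycuda.autoinit  # creates a context on the default device
    import pycuda.driver as drv
    from pycuda.compiler import SourceModule

    mod = SourceModule("""
    __device__ __constant__ float coeffs[4];  // analogous to updatecoeffsE/H above

    __global__ void scale(int n, float *a) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            a[i] *= coeffs[0];
        }
    }
    """)

    # Copy coefficients to constant memory, as solve_gpu does for the material coefficients
    drv.memcpy_htod(mod.get_global('coeffs')[0], np.array([2, 0, 0, 0], dtype=np.float32))

    scale = mod.get_function('scale')
    a = np.ones(1000, dtype=np.float32)
    a_gpu = drv.mem_alloc(a.nbytes)
    drv.memcpy_htod(a_gpu, a)
    scale(np.int32(a.size), a_gpu, block=(256, 1, 1), grid=((a.size + 255) // 256, 1, 1))
    drv.memcpy_dtoh(a, a_gpu)
    assert np.allclose(a, 2.0)

Note that solve_gpu manages its own context with dev.make_context()/ctx.pop() so a specific device ID can be targeted; pycuda.autoinit is used here only to keep the sketch short.
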
@@ -266,6 +266,63 @@ class PML(object):
        func = getattr(import_module('gprMax.pml_updates'), 'update_pml_' + str(len(self.CFS)) + 'order_magnetic_' + self.direction)
        func(self.xs, self.xf, self.ys, self.yf, self.zs, self.zf, G.nthreads, G.updatecoeffsH, G.ID, G.Ex, G.Ey, G.Ez, G.Hx, G.Hy, G.Hz, self.HPhi1, self.HPhi2, self.HRA, self.HRB, self.HRE, self.HRF, self.d)

    def gpu_set_blocks_per_grid(self, G):
        """Set the blocks per grid size used for updating the PML field arrays on a GPU.

        Args:
            G (class): Grid class instance - holds essential parameters describing the model.
        """

        self.bpg = (int(np.ceil(((self.EPhi1.shape[1] + 1) * (self.EPhi1.shape[2] + 1) * (self.EPhi1.shape[3] + 1)) / G.tpb[0])), 1, 1)

    def gpu_initialise_arrays(self):
        """Initialise PML field and coefficient arrays on GPU."""

        import pycuda.gpuarray as gpuarray

        self.EPhi1_gpu = gpuarray.to_gpu(self.EPhi1)
        self.EPhi2_gpu = gpuarray.to_gpu(self.EPhi2)
        self.ERA_gpu = gpuarray.to_gpu(self.ERA)
        self.ERB_gpu = gpuarray.to_gpu(self.ERB)
        self.ERE_gpu = gpuarray.to_gpu(self.ERE)
        self.ERF_gpu = gpuarray.to_gpu(self.ERF)
        self.HPhi1_gpu = gpuarray.to_gpu(self.HPhi1)
        self.HPhi2_gpu = gpuarray.to_gpu(self.HPhi2)
        self.HRA_gpu = gpuarray.to_gpu(self.HRA)
        self.HRB_gpu = gpuarray.to_gpu(self.HRB)
        self.HRE_gpu = gpuarray.to_gpu(self.HRE)
        self.HRF_gpu = gpuarray.to_gpu(self.HRF)

    def gpu_get_update_funcs(self, kernels):
        """Get update functions from PML kernels.

        Args:
            kernels: PyCuda SourceModule containing PML kernels.
        """

        from pycuda.compiler import SourceModule

        self.update_electric_gpu = kernels.get_function('update_pml_' + str(len(self.CFS)) + 'order_electric_' + self.direction)
        self.update_magnetic_gpu = kernels.get_function('update_pml_' + str(len(self.CFS)) + 'order_magnetic_' + self.direction)

    def gpu_update_electric(self, G):
        """This function updates electric field components with the PML correction on the GPU.

        Args:
            G (class): Grid class instance - holds essential parameters describing the model.
        """

        self.update_electric_gpu(np.int32(self.xs), np.int32(self.xf), np.int32(self.ys), np.int32(self.yf), np.int32(self.zs), np.int32(self.zf), np.int32(self.EPhi1.shape[1]), np.int32(self.EPhi1.shape[2]), np.int32(self.EPhi1.shape[3]), np.int32(self.EPhi2.shape[1]), np.int32(self.EPhi2.shape[2]), np.int32(self.EPhi2.shape[3]), G.ID_gpu.gpudata, G.Ex_gpu.gpudata, G.Ey_gpu.gpudata, G.Ez_gpu.gpudata, G.Hx_gpu.gpudata, G.Hy_gpu.gpudata, G.Hz_gpu.gpudata, self.EPhi1_gpu.gpudata, self.EPhi2_gpu.gpudata, self.ERA_gpu.gpudata, self.ERB_gpu.gpudata, self.ERE_gpu.gpudata, self.ERF_gpu.gpudata, floattype(self.d), block=G.tpb, grid=self.bpg)

    def gpu_update_magnetic(self, G):
        """This function updates magnetic field components with the PML correction on the GPU.

        Args:
            G (class): Grid class instance - holds essential parameters describing the model.
        """

        self.update_magnetic_gpu(np.int32(self.xs), np.int32(self.xf), np.int32(self.ys), np.int32(self.yf), np.int32(self.zs), np.int32(self.zf), np.int32(self.HPhi1.shape[1]), np.int32(self.HPhi1.shape[2]), np.int32(self.HPhi1.shape[3]), np.int32(self.HPhi2.shape[1]), np.int32(self.HPhi2.shape[2]), np.int32(self.HPhi2.shape[3]), G.ID_gpu.gpudata, G.Ex_gpu.gpudata, G.Ey_gpu.gpudata, G.Ez_gpu.gpudata, G.Hx_gpu.gpudata, G.Hy_gpu.gpudata, G.Hz_gpu.gpudata, self.HPhi1_gpu.gpudata, self.HPhi2_gpu.gpudata, self.HRA_gpu.gpudata, self.HRB_gpu.gpudata, self.HRE_gpu.gpudata, self.HRF_gpu.gpudata, floattype(self.d), block=G.tpb, grid=self.bpg)
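
gpu_set_blocks_per_grid flattens each PML slab to a one-dimensional launch: the padded number of cells in EPhi1, divided by the threads per block, rounded up. A quick numeric check with hypothetical sizes (the G.tpb value and EPhi1 shape below are made up for illustration):

    import numpy as np

    tpb = (256, 1, 1)               # hypothetical G.tpb (threads per block)
    ephi1_shape = (4, 11, 131, 31)  # hypothetical EPhi1 shape for one PML slab
    cells = (ephi1_shape[1] + 1) * (ephi1_shape[2] + 1) * (ephi1_shape[3] + 1)
    bpg = (int(np.ceil(cells / tpb[0])), 1, 1)
    print(cells, bpg)               # 50688 cells -> (198, 1, 1) blocks
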


def build_pmls(G, pbar):
    """

2027
gprMax/pml_updates_gpu.py
Normal file
@@ -27,6 +27,7 @@ class Rx(object):
    """Receiver output points."""

    allowableoutputs = ['Ex', 'Ey', 'Ez', 'Hx', 'Hy', 'Hz', 'Ix', 'Iy', 'Iz']
    gpu_allowableoutputs = allowableoutputs[:-3]
    defaultoutputs = allowableoutputs[:-3]

    def __init__(self):

@@ -39,3 +40,49 @@ class Rx(object):
        self.xcoordorigin = None
        self.ycoordorigin = None
        self.zcoordorigin = None

def gpu_initialise_rx_arrays(G):
    """Initialise arrays on GPU for receiver coordinates and to store field components for receivers.

    Args:
        G (class): Grid class instance - holds essential parameters describing the model.
    """

    import pycuda.gpuarray as gpuarray

    # Array to store receiver coordinates on GPU
    rxcoords = np.zeros((len(G.rxs), 3), dtype=np.int32)
    for i, rx in enumerate(G.rxs):
        rxcoords[i, 0] = rx.xcoord
        rxcoords[i, 1] = rx.ycoord
        rxcoords[i, 2] = rx.zcoord

    # Array to store field components for receivers on GPU - rows are field components; columns are iterations; pages are receivers
    rxs = np.zeros((len(Rx.gpu_allowableoutputs), G.iterations, len(G.rxs)), dtype=floattype)

    # Copy arrays to GPU
    rxcoords_gpu = gpuarray.to_gpu(rxcoords)
    rxs_gpu = gpuarray.to_gpu(rxs)

    return rxcoords_gpu, rxs_gpu


def gpu_get_rx_array(rxs_gpu, rxcoords_gpu, G):
    """Copy output from receivers array used on GPU back to receiver objects.

    Args:
        rxs_gpu (float): numpy array of receiver data from GPU - rows are field components; columns are iterations; pages are receivers.
        rxcoords_gpu (int): numpy array of receiver coordinates from GPU.
        G (class): Grid class instance - holds essential parameters describing the model.
    """

    for rx in G.rxs:
        for rxgpu in range(len(G.rxs)):
            if rx.xcoord == rxcoords_gpu[rxgpu, 0] and rx.ycoord == rxcoords_gpu[rxgpu, 1] and rx.zcoord == rxcoords_gpu[rxgpu, 2]:
                rx.outputs['Ex'] = rxs_gpu[0, :, rxgpu]
                rx.outputs['Ey'] = rxs_gpu[1, :, rxgpu]
                rx.outputs['Ez'] = rxs_gpu[2, :, rxgpu]
                rx.outputs['Hx'] = rxs_gpu[3, :, rxgpu]
                rx.outputs['Hy'] = rxs_gpu[4, :, rxgpu]
                rx.outputs['Hz'] = rxs_gpu[5, :, rxgpu]

203
gprMax/source_updates_gpu.py
Normal file
@@ -0,0 +1,203 @@
# Copyright (C) 2015-2017: The University of Edinburgh
# Authors: Craig Warren and Antonis Giannopoulos
#
# This file is part of gprMax.
#
# gprMax is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# gprMax is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with gprMax. If not, see <http://www.gnu.org/licenses/>.

from string import Template

kernels_template_sources = Template("""

// Macros for converting subscripts to linear index:
#define INDEX2D_MAT(m, n) (m)*($NY_MATCOEFFS)+(n)
#define INDEX2D_SRCINFO(m, n) (m)*$NY_SRCINFO+(n)
#define INDEX2D_SRCWAVES(m, n) (m)*($NY_SRCWAVES)+(n)
#define INDEX3D_FIELDS(i, j, k) (i)*($NY_FIELDS)*($NZ_FIELDS)+(j)*($NZ_FIELDS)+(k)
#define INDEX4D_ID(p, i, j, k) (p)*($NX_ID)*($NY_ID)*($NZ_ID)+(i)*($NY_ID)*($NZ_ID)+(j)*($NZ_ID)+(k)

// Material coefficients (read-only) in constant memory (64KB)
__device__ __constant__ $REAL updatecoeffsE[$N_updatecoeffsE];
__device__ __constant__ $REAL updatecoeffsH[$N_updatecoeffsH];

///////////////////////////////////////////
// Hertzian dipole electric field update //
///////////////////////////////////////////

__global__ void update_hertzian_dipole(int NHERTZDIPOLE, int iteration, $REAL dx, $REAL dy, $REAL dz, const int* __restrict__ srcinfo1, const $REAL* __restrict__ srcinfo2, const $REAL* __restrict__ srcwaveforms, const unsigned int* __restrict__ ID, $REAL *Ex, $REAL *Ey, $REAL *Ez) {

    // This function updates electric field values for Hertzian dipole sources.
    //
    // Args:
    //     NHERTZDIPOLE: Total number of Hertzian dipoles in the model
    //     iteration: Iteration number of simulation
    //     dx, dy, dz: Spatial discretisations
    //     srcinfo1: Source cell coordinates and polarisation information
    //     srcinfo2: Other source information, e.g. length, resistance etc...
    //     srcwaveforms: Source waveform values
    //     ID, E: Access to ID and field component arrays

    // Obtain the linear index corresponding to the current thread and use for each source
    int src = blockIdx.x * blockDim.x + threadIdx.x;

    if (src < NHERTZDIPOLE) {

        $REAL dl;
        int i, j, k, polarisation;

        i = srcinfo1[INDEX2D_SRCINFO(src,0)];
        j = srcinfo1[INDEX2D_SRCINFO(src,1)];
        k = srcinfo1[INDEX2D_SRCINFO(src,2)];
        polarisation = srcinfo1[INDEX2D_SRCINFO(src,3)];
        dl = srcinfo2[src];

        // 'x' polarised source
        if (polarisation == 0) {
            int materialEx = ID[INDEX4D_ID(0,i,j,k)];
            Ex[INDEX3D_FIELDS(i,j,k)] = Ex[INDEX3D_FIELDS(i,j,k)] - updatecoeffsE[INDEX2D_MAT(materialEx,4)] * srcwaveforms[INDEX2D_SRCWAVES(src,iteration)] * dl * (1 / (dx * dy * dz));
        }

        // 'y' polarised source
        else if (polarisation == 1) {
            int materialEy = ID[INDEX4D_ID(1,i,j,k)];
            Ey[INDEX3D_FIELDS(i,j,k)] = Ey[INDEX3D_FIELDS(i,j,k)] - updatecoeffsE[INDEX2D_MAT(materialEy,4)] * srcwaveforms[INDEX2D_SRCWAVES(src,iteration)] * dl * (1 / (dx * dy * dz));
        }

        // 'z' polarised source
        else if (polarisation == 2) {
            int materialEz = ID[INDEX4D_ID(2,i,j,k)];
            Ez[INDEX3D_FIELDS(i,j,k)] = Ez[INDEX3D_FIELDS(i,j,k)] - updatecoeffsE[INDEX2D_MAT(materialEz,4)] * srcwaveforms[INDEX2D_SRCWAVES(src,iteration)] * dl * (1 / (dx * dy * dz));
        }
    }
}


///////////////////////////////////////////
// Magnetic dipole magnetic field update //
///////////////////////////////////////////

__global__ void update_magnetic_dipole(int NMAGDIPOLE, int iteration, $REAL dx, $REAL dy, $REAL dz, const int* __restrict__ srcinfo1, const $REAL* __restrict__ srcinfo2, const $REAL* __restrict__ srcwaveforms, const unsigned int* __restrict__ ID, $REAL *Hx, $REAL *Hy, $REAL *Hz) {

    // This function updates magnetic field values for magnetic dipole sources.
    //
    // Args:
    //     NMAGDIPOLE: Total number of magnetic dipoles in the model
    //     iteration: Iteration number of simulation
    //     dx, dy, dz: Spatial discretisations
    //     srcinfo1: Source cell coordinates and polarisation information
    //     srcinfo2: Other source information, e.g. length, resistance etc...
    //     srcwaveforms: Source waveform values
    //     ID, H: Access to ID and field component arrays

    // Obtain the linear index corresponding to the current thread and use for each source
    int src = blockIdx.x * blockDim.x + threadIdx.x;

    if (src < NMAGDIPOLE) {

        int i, j, k, polarisation;

        i = srcinfo1[INDEX2D_SRCINFO(src,0)];
        j = srcinfo1[INDEX2D_SRCINFO(src,1)];
        k = srcinfo1[INDEX2D_SRCINFO(src,2)];
        polarisation = srcinfo1[INDEX2D_SRCINFO(src,3)];

        // 'x' polarised source
        if (polarisation == 0) {
            int materialHx = ID[INDEX4D_ID(3,i,j,k)];
            Hx[INDEX3D_FIELDS(i,j,k)] = Hx[INDEX3D_FIELDS(i,j,k)] - updatecoeffsH[INDEX2D_MAT(materialHx,4)] * srcwaveforms[INDEX2D_SRCWAVES(src,iteration)] * (1 / (dx * dy * dz));
        }

        // 'y' polarised source
        else if (polarisation == 1) {
            int materialHy = ID[INDEX4D_ID(4,i,j,k)];
            Hy[INDEX3D_FIELDS(i,j,k)] = Hy[INDEX3D_FIELDS(i,j,k)] - updatecoeffsH[INDEX2D_MAT(materialHy,4)] * srcwaveforms[INDEX2D_SRCWAVES(src,iteration)] * (1 / (dx * dy * dz));
        }

        // 'z' polarised source
        else if (polarisation == 2) {
            int materialHz = ID[INDEX4D_ID(5,i,j,k)];
            Hz[INDEX3D_FIELDS(i,j,k)] = Hz[INDEX3D_FIELDS(i,j,k)] - updatecoeffsH[INDEX2D_MAT(materialHz,4)] * srcwaveforms[INDEX2D_SRCWAVES(src,iteration)] * (1 / (dx * dy * dz));
        }
    }
}


//////////////////////////////////////////
// Voltage source electric field update //
//////////////////////////////////////////

__global__ void update_voltage_source(int NVOLTSRC, int iteration, $REAL dx, $REAL dy, $REAL dz, const int* __restrict__ srcinfo1, const $REAL* __restrict__ srcinfo2, const $REAL* __restrict__ srcwaveforms, const unsigned int* __restrict__ ID, $REAL *Ex, $REAL *Ey, $REAL *Ez) {

    // This function updates electric field values for voltage sources.
    //
    // Args:
    //     NVOLTSRC: Total number of voltage sources in the model
    //     iteration: Iteration number of simulation
    //     dx, dy, dz: Spatial discretisations
    //     srcinfo1: Source cell coordinates and polarisation information
    //     srcinfo2: Other source information, e.g. length, resistance etc...
    //     srcwaveforms: Source waveform values
    //     ID, E: Access to ID and field component arrays

    // Obtain the linear index corresponding to the current thread and use for each source
    int src = blockIdx.x * blockDim.x + threadIdx.x;

    if (src < NVOLTSRC) {

        $REAL resistance;
        int i, j, k, polarisation;

        i = srcinfo1[INDEX2D_SRCINFO(src,0)];
        j = srcinfo1[INDEX2D_SRCINFO(src,1)];
        k = srcinfo1[INDEX2D_SRCINFO(src,2)];
        polarisation = srcinfo1[INDEX2D_SRCINFO(src,3)];
        resistance = srcinfo2[src];

        // 'x' polarised source
        if (polarisation == 0) {
            if (resistance != 0) {
                int materialEx = ID[INDEX4D_ID(0,i,j,k)];
                Ex[INDEX3D_FIELDS(i,j,k)] = Ex[INDEX3D_FIELDS(i,j,k)] - updatecoeffsE[INDEX2D_MAT(materialEx,4)] * srcwaveforms[INDEX2D_SRCWAVES(src,iteration)] * (1 / (resistance * dy * dz));
            }
            else {
                Ex[INDEX3D_FIELDS(i,j,k)] = -1 * srcwaveforms[INDEX2D_SRCWAVES(src,iteration)] / dx;
            }
        }

        // 'y' polarised source
        else if (polarisation == 1) {
            if (resistance != 0) {
                int materialEy = ID[INDEX4D_ID(1,i,j,k)];
                Ey[INDEX3D_FIELDS(i,j,k)] = Ey[INDEX3D_FIELDS(i,j,k)] - updatecoeffsE[INDEX2D_MAT(materialEy,4)] * srcwaveforms[INDEX2D_SRCWAVES(src,iteration)] * (1 / (resistance * dx * dz));
            }
            else {
                Ey[INDEX3D_FIELDS(i,j,k)] = -1 * srcwaveforms[INDEX2D_SRCWAVES(src,iteration)] / dy;
            }
        }

        // 'z' polarised source
        else if (polarisation == 2) {
            if (resistance != 0) {
                int materialEz = ID[INDEX4D_ID(2,i,j,k)];
                Ez[INDEX3D_FIELDS(i,j,k)] = Ez[INDEX3D_FIELDS(i,j,k)] - updatecoeffsE[INDEX2D_MAT(materialEz,4)] * srcwaveforms[INDEX2D_SRCWAVES(src,iteration)] * (1 / (resistance * dx * dy));
            }
            else {
                Ez[INDEX3D_FIELDS(i,j,k)] = -1 * srcwaveforms[INDEX2D_SRCWAVES(src,iteration)] / dz;
            }
        }
    }
}

""")
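
The $-placeholders in these kernel templates are filled at run time with string.Template.substitute, as solve_gpu does with REAL=cudafloattype and the array shapes. A minimal sketch with illustrative values:

    from string import Template

    t = Template('__global__ void k($REAL *a) { /* $N_updatecoeffsE E coefficients */ }')
    src = t.substitute(REAL='float', N_updatecoeffsE=54)
    print(src)  # plain CUDA C, ready to hand to pycuda.compiler.SourceModule
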
@@ -215,6 +215,52 @@ class MagneticDipole(Source):
        Hz[i, j, k] -= updatecoeffsH[ID[G.IDlookup[componentID], i, j, k], 4] * self.waveformvaluesM[iteration] * (1 / (G.dx * G.dy * G.dz))


def gpu_initialise_src_arrays(sources, G):
    """Initialise arrays on GPU for source coordinates/polarisation, other source information, and source waveform values.

    Args:
        sources (list): List of sources of one class, e.g. HertzianDipoles.
        G (class): Grid class instance - holds essential parameters describing the model.

    Returns:
        srcinfo1_gpu (int): numpy array of source cell coordinates and polarisation information.
        srcinfo2_gpu (float): numpy array of other source information, e.g. length, resistance etc...
        srcwaves_gpu (float): numpy array of source waveform values.
    """

    import pycuda.gpuarray as gpuarray

    srcinfo1 = np.zeros((len(sources), 4), dtype=np.int32)
    srcinfo2 = np.zeros((len(sources)), dtype=floattype)
    srcwaves = np.zeros((len(sources), G.iterations), dtype=floattype)
    for i, src in enumerate(sources):
        srcinfo1[i, 0] = src.xcoord
        srcinfo1[i, 1] = src.ycoord
        srcinfo1[i, 2] = src.zcoord

        if src.polarisation == 'x':
            srcinfo1[i, 3] = 0
        elif src.polarisation == 'y':
            srcinfo1[i, 3] = 1
        elif src.polarisation == 'z':
            srcinfo1[i, 3] = 2

        if src.__class__.__name__ == 'HertzianDipole':
            srcinfo2[i] = src.dl
            srcwaves[i, :] = src.waveformvaluesJ
        elif src.__class__.__name__ == 'VoltageSource':
            srcinfo2[i] = src.resistance
            srcwaves[i, :] = src.waveformvaluesJ
        elif src.__class__.__name__ == 'MagneticDipole':
            srcwaves[i, :] = src.waveformvaluesM

    srcinfo1_gpu = gpuarray.to_gpu(srcinfo1)
    srcinfo2_gpu = gpuarray.to_gpu(srcinfo2)
    srcwaves_gpu = gpuarray.to_gpu(srcwaves)

    return srcinfo1_gpu, srcinfo2_gpu, srcwaves_gpu
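
The integer polarisation codes written into srcinfo1 (x -> 0, y -> 1, z -> 2) are what the update_* kernels above switch on. An illustrative packing of two hypothetical Hertzian dipoles, mirroring the loop above:

    import numpy as np

    iterations = 100
    srcinfo1 = np.array([[10, 10, 15, 2],    # z-polarised dipole in cell (10, 10, 15)
                         [20, 10, 15, 0]],   # x-polarised dipole in cell (20, 10, 15)
                        dtype=np.int32)
    srcinfo2 = np.array([0.001, 0.001], dtype=np.float32)   # dl for each dipole
    srcwaves = np.zeros((2, iterations), dtype=np.float32)  # one waveform row per source
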


class TransmissionLine(Source):
    """
    A transmission line source is a one-dimensional transmission
@@ -32,7 +32,10 @@ from colorama import Style
init()
import numpy as np

from gprMax.constants import complextype
from gprMax.constants import floattype
from gprMax.exceptions import GeneralError
from gprMax.materials import Material


def get_terminal_width():

@@ -130,6 +133,11 @@ def round_value(value, decimalplaces=0):
    return rounded


def round32(value):
    """Rounds up to nearest multiple of 32."""
    return int(32 * np.ceil(float(value) / 32))
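
round32 sizes the one-dimensional launches for the per-source and per-receiver kernels in whole multiples of 32 (the warp size), e.g.:

    assert round32(1) == 32    # a single receiver still gets a warp-sized grid
    assert round32(33) == 64
    assert round32(64) == 64
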


def human_size(size, a_kilobyte_is_1024_bytes=False):
    """Convert a file size to human-readable form.

@@ -287,6 +295,64 @@ def get_host_info():
    return hostinfo


class GPU(object):
    """GPU information."""

    def __init__(self, deviceID):
        """
        Args:
            deviceID (int): Device ID for GPU.
        """

        self.deviceID = deviceID
        self.name = None
        self.pcibusID = None
        self.constmem = None
        self.totalmem = None

    def get_gpu_info(self, drv):
        """Set information about GPU.

        Args:
            drv (object): PyCuda driver.
        """

        self.name = drv.Device(self.deviceID).name()
        self.pcibusID = drv.Device(self.deviceID).pci_bus_id()
        self.constmem = drv.Device(self.deviceID).total_constant_memory
        self.totalmem = drv.Device(self.deviceID).total_memory()


def detect_gpus():
    """Get information about NVIDIA GPU(s).

    Returns:
        gpus (list): Detected GPU(s) object(s).
    """

    try:
        import pycuda.driver as drv
    except ImportError:
        raise ImportError('To use gprMax in GPU mode the pycuda package must be installed, and you must have an NVIDIA CUDA-Enabled GPU (https://developer.nvidia.com/cuda-gpus).')
    drv.init()

    # Check if there are any CUDA-Enabled GPUs
    if drv.Device.count() == 0:
        raise GeneralError('No NVIDIA CUDA-Enabled GPUs detected (https://developer.nvidia.com/cuda-gpus)')

    # Print information about all detected GPUs
    gpus = []
    gputext = []
    for i in range(drv.Device.count()):
        gpu = GPU(deviceID=i)
        gpu.get_gpu_info(drv)
        gpus.append(gpu)
        gputext.append('{} - {}, {} RAM'.format(gpu.deviceID, gpu.name, human_size(gpu.totalmem, a_kilobyte_is_1024_bytes=True)))
    print('GPU(s) detected: {}'.format('; '.join(gputext)))

    return gpus
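
Assuming pycuda is installed and at least one CUDA-enabled card is present, the detection path can be exercised on its own:

    gpus = detect_gpus()   # prints 'GPU(s) detected: 0 - <name>, <RAM>; ...'
    gpu = gpus[0]          # device ID 0 is the default when -gpu is not given an ID
    print(gpu.name, human_size(gpu.totalmem, a_kilobyte_is_1024_bytes=True))
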


def memory_usage(G):
    """Estimate the amount of memory (RAM) required to run a model.

@@ -307,6 +373,7 @@ def memory_usage(G):
    # 12 x rigidE array components + 6 x rigidH array components
    rigidarrays = (12 + 6) * G.nx * G.ny * G.nz * np.dtype(np.int8).itemsize

    # PML arrays
    pmlarrays = 0
    for (k, v) in G.pmlthickness.items():
        if v > 0:

@@ -326,6 +393,11 @@ def memory_usage(G):
            pmlarrays += ((G.nx + 1) * G.ny * v)
            pmlarrays += (G.nx * (G.ny + 1) * v)

    memestimate = int(stdoverhead + fieldarrays + solidarray + rigidarrays + pmlarrays)
    # Any dispersive material coefficients
    disparrays = 0
    if Material.maxpoles != 0:
        disparrays = 3 * Material.maxpoles * (G.nx + 1) * (G.ny + 1) * (G.nz + 1) * np.dtype(complextype).itemsize

    memestimate = int(stdoverhead + fieldarrays + solidarray + rigidarrays + pmlarrays + disparrays)

    return memestimate
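
For a sense of scale, the dispersive term alone, for a hypothetical 100 x 100 x 100 cell model with a single pole (Material.maxpoles = 1) and assuming single-precision complex coefficients (complextype = np.complex64):

    import numpy as np

    nx = ny = nz = 100
    disparrays = 3 * 1 * (nx + 1) * (ny + 1) * (nz + 1) * np.dtype(np.complex64).itemsize
    print(disparrays / 1024**2)  # ~23.6 MiB on top of the field/PML estimate
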

26
tests/models_basic/pmls/pmls.in
Executable file
@@ -0,0 +1,26 @@
#title: PMLs
#domain: 0.051 0.126 0.026
#dx_dy_dz: 0.001 0.001 0.001
#time_window: 2100

#pml_cells: 0 0 0 0 0 10

## Built-in 1st order PML
pml_cfs: constant forward 0 0 constant forward 1 1 quartic forward 0 None

## PMLs from http://dx.doi.org/10.1109/TAP.2011.2180344
## Standard PML
pml_cfs: constant forward 0 0 quartic forward 1 11 quartic forward 0 7.427

## CFS PML
pml_cfs: constant forward 0.05 0.05 quartic forward 1 7 quartic forward 0 11.671

## O2 RIPML
#pml_cfs: constant forward 0 0 constant forward 1 1 sextic forward 0 0.5836
#pml_cfs: constant forward 0.05 0.05 cubic forward 1 8 quadratic forward 0 5.8357

#waveform: gaussiandotnorm 1 9.42e9 mypulse
#hertzian_dipole: z 0.013 0.013 0.015 mypulse
#rx: 0.037 0.112 0.015

#box: 0.013 0.013 0.013 0.038 0.113 0.014 pec
[12 binary image files added, ~316-342 KiB each]
@@ -44,7 +44,7 @@ basepath += 'basic'
# basepath += 'advanced'

# List of available basic test models
testmodels = ['hertzian_dipole_fs_analytical', '2D_ExHyHz', '2D_EyHxHz', '2D_EzHxHy', 'cylinder_Ascan_2D', 'hertzian_dipole_fs', 'hertzian_dipole_hs', 'hertzian_dipole_dispersive', 'magnetic_dipole_fs']
testmodels = ['hertzian_dipole_fs_analytical', '2D_ExHyHz', '2D_EyHxHz', '2D_EzHxHy', 'cylinder_Ascan_2D', 'hertzian_dipole_fs', 'hertzian_dipole_hs', 'hertzian_dipole_dispersive', 'magnetic_dipole_fs', 'pmls']

# List of available advanced test models
# testmodels = ['antenna_GSSI_1500_fs', 'antenna_MALA_1200_fs']

@@ -63,7 +63,7 @@ for i, model in enumerate(testmodels):
    testresults[model] = {}

    # Run model
    api(os.path.join(basepath, model + os.path.sep + model + '.in'))
    api(os.path.join(basepath, model + os.path.sep + model + '.in'), gpu=None)

    # Special case for analytical comparison
    if model == 'hertzian_dipole_fs_analytical':