Merge remote-tracking branch 'origin/master'
@@ -9,29 +9,51 @@ This section provides information and results from performance benchmarking of g
How to benchmark?
=================

The following simple model is an example (found in the ``tests/benchmarking`` sub-package) that can be used to benchmark gprMax on your own system. The model contains a simple source in free space.
The following simple models (found in the ``tests/benchmarking`` sub-package) can be used to benchmark gprMax on your own system. The models feature different domain sizes and contain a simple source in free space.

:download:`bench_100x100x100.in <../../tests/benchmarking/bench_100x100x100.in>`

.. literalinclude:: ../../tests/benchmarking/bench_100x100x100.in
    :language: none
    :linenos:

The ``#num_threads`` command should be adjusted from 1 up to the number of physical CPU cores on your machine, the model run, and the solving time recorded.
:download:`bench_150x150x150.in <../../tests/benchmarking/bench_150x150x150.in>`

.. literalinclude:: ../../tests/benchmarking/bench_150x150x150.in
    :language: none
    :linenos:

The ``#num_threads`` command can be adjusted to benchmark running the code with different numbers of OpenMP threads.

Use the following steps to collect and report benchmarking results (a minimal scripted sketch of the first two steps is given after the list):

1. Run each model with different ``#num_threads`` values - from 1 thread up to the number of physical CPU cores on your machine.
2. Note the ``Solving took ..`` time reported by the simulation for each model run.
3. Use the ``save_results.py`` script to enter and save your results in a Numpy archive. You will need to enter some machine identification information in the script.
4. Use the ``plot_time_speedup.py`` script to create plots of the execution time and speed-up.
5. Commit the Numpy archive and plot file using Git.

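Steps 1 and 2 can be scripted. The following is a minimal sketch, assuming gprMax (v3) can be invoked with ``python -m gprMax``, that the benchmark input files live in ``tests/benchmarking``, and that the exact wording printed after ``Solving took`` may vary (so the matching line is simply echoed rather than parsed); the temporary-file naming and regular expressions are illustrative only::

    import re
    import subprocess
    from pathlib import Path

    models = ['bench_100x100x100.in', 'bench_150x150x150.in']
    threads = [1, 2, 4, 8]  # adjust up to the number of physical CPU cores on your machine

    for model in models:
        original = Path('tests/benchmarking') / model
        text = original.read_text()
        for n in threads:
            # Step 1: rewrite the #num_threads command for this run
            tmp = original.with_name('tmp_' + model)
            tmp.write_text(re.sub(r'#num_threads:.*', '#num_threads: {}'.format(n), text))
            # Run the model and capture the console output
            out = subprocess.run(['python', '-m', 'gprMax', str(tmp)],
                                 capture_output=True, text=True).stdout
            # Step 2: note the 'Solving took ..' line reported by the simulation
            solve = re.search(r'Solving took.*', out)
            print('{}, {} threads: {}'.format(model, n, solve.group(0) if solve else 'solving time not found'))
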
Results
=======

Zero threads indicates that the code was compiled serially, i.e. without using OpenMP.

Mac OS X
--------

iMac (Retina 5K, 27-inch, Late 2014), Mac OS X 10.11.3
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
iMac15,1
^^^^^^^^

.. figure:: ../../tests/benchmarking/results/MacOSX/Darwin-15.3.0-x86_64-i386-64bit.png
.. figure:: ../../tests/benchmarking/results/MacOSX/iMac15,1+Ccode.png
    :width: 600px

    Execution time and speed-up factor plots for gprMax (v3b21) and GprMax (v2).
    Execution time and speed-up factor plots for Python/Cython-based gprMax and the previous C-based code.

The results demonstrate that the new (v3) code written in Python and Cython is faster, in these two benchmarks, than the old (v2) code which was written in C. They also show that the performance scaling with multiple OpenMP threads is better with the old (v2) code.
The results demonstrate that the Python/Cython-based code is faster, in these two benchmarks, than the previous version, which was written in C. They also show that the performance scaling with multiple OpenMP threads is better with the C-based code. Results from the C-based code show that when it is compiled serially the performance is approximately the same as when it is compiled with OpenMP and run with a single thread. With the Python/Cython-based code this is not the case. The overhead in setting up and tearing down the OpenMP threads means that for a single thread the performance is worse than the serially-compiled version.

Zero threads signifies that the code was compiled serially, i.e. without using OpenMP. Results from the old (v2) code show that when it is compiled serially the performance is approximately the same as when it is compiled with OpenMP and run with a single thread. With the new (v3) code this is not the case. The overhead in setting up and tearing down the OpenMP threads means that for a single thread the performance is worse than the serially-compiled version.

MacPro1,1
^^^^^^^^^

.. figure:: ../../tests/benchmarking/results/MacOSX/MacPro1,1.png
    :width: 600px
@@ -3,37 +3,31 @@ import numpy as np
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec

from gprMax._version import __version__

moduledirectory = os.path.dirname(os.path.abspath(__file__))

# Machine identifier
platformID = platform.platform()
platformlongID = 'iMac (Retina 5K, 27-inch, Late 2014); 4GHz Intel Core i7; Mac OS X 10.11.3'
#machineID = 'MacPro1,1'
#machineIDlong = machineID + ' (2006); 2 x 2.66 GHz Quad-Core Intel Xeon; Mac OS X 10.11.3'
machineID = 'iMac15,1'
machineIDlong = machineID + ' (Retina 5K, 27-inch, Late 2014); 4GHz Intel Core i7; Mac OS X 10.11.3'

# Number of physical CPU cores on machine
phycores = psutil.cpu_count(logical=False)

# Number of threads (0 signifies serial compiled code)
threads = np.array([0, 1, 2, 4])

# 100 x 100 x 100 cell model execution times (seconds)
bench1 = np.array([40, 48, 37, 32])
bench1c = np.array([76, 77, 46, 32])

# 150 x 150 x 150 cell model execution times (seconds)
bench2 = np.array([108, 133, 93, 75])
bench2c = np.array([220, 220, 132, 94])
# Load results
results = np.load(os.path.join(moduledirectory, machineID + '.npz'))

# Plot colours from http://tools.medialab.sciences-po.fr/iwanthue/index.php
colors = ['#5CB7C6', '#E60D30', '#A21797', '#A3B347']

fig, ax = plt.subplots(num=platformID, figsize=(20, 10), facecolor='w', edgecolor='w')
fig.suptitle(platformlongID)
fig, ax = plt.subplots(num=machineIDlong, figsize=(20, 10), facecolor='w', edgecolor='w')
fig.suptitle(machineIDlong)
gs = gridspec.GridSpec(1, 2, hspace=0.5)
ax = plt.subplot(gs[0, 0])
ax.plot(threads, bench1, color=colors[1], marker='.', ms=10, lw=2, label='1e6 cells (gprMax v3b21)')
ax.plot(threads, bench1c, color=colors[0], marker='.', ms=10, lw=2, label='1e6 cells (gprMax v2)')
ax.plot(threads, bench2, color=colors[1], marker='.', ms=10, lw=2, ls='--', label='3.375e6 cells (gprMax v3b21)')
ax.plot(threads, bench2c, color=colors[0], marker='.', ms=10, lw=2, ls='--', label='3.375e6 cells (gprMax v2)')
ax.plot(results['threads'], results['bench1'], color=colors[1], marker='.', ms=10, lw=2, label='1e6 cells (v' + __version__ + ')')
ax.plot(results['threads'], results['bench1c'], color=colors[0], marker='.', ms=10, lw=2, label='1e6 cells (v2)')
ax.plot(results['threads'], results['bench2'], color=colors[1], marker='.', ms=10, lw=2, ls='--', label='3.375e6 cells (v' + __version__ + ')')
ax.plot(results['threads'], results['bench2c'], color=colors[0], marker='.', ms=10, lw=2, ls='--', label='3.375e6 cells (v2)')

ax.set_xlabel('Number of threads')
ax.set_ylabel('Time [s]')
@@ -43,15 +37,15 @@ legend = ax.legend(loc=1)
frame = legend.get_frame()
frame.set_edgecolor('white')

ax.set_xlim([0, phycores])
ax.set_xticks(threads)
ax.set_ylim(top=ax.get_ylim()[1] * 1.1)
ax.set_xlim([0, results['threads'][-1] * 1.1])
ax.set_xticks(results['threads'])
ax.set_ylim(0, top=ax.get_ylim()[1] * 1.1)

ax = plt.subplot(gs[0, 1])
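# Speed-up factor: each series is its execution time at index 1 divided (element-wise) by the times for all thread counts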
ax.plot(threads, bench1[1] / bench1, color=colors[1], marker='.', ms=10, lw=2, label='1e6 cells (gprMax v3b21)')
ax.plot(threads, bench1c[1] / bench1c, color=colors[0], marker='.', ms=10, lw=2, label='1e6 cells (gprMax v2)')
ax.plot(threads, bench2[1] / bench2, color=colors[1], marker='.', ms=10, lw=2, ls='--', label='3.375e6 cells (gprMax v3b21)')
ax.plot(threads, bench2c[1] / bench2c, color=colors[0], marker='.', ms=10, lw=2, ls='--', label='3.375e6 cells (gprMax v2)')
ax.plot(results['threads'], results['bench1'][1] / results['bench1'], color=colors[1], marker='.', ms=10, lw=2, label='1e6 cells (v' + __version__ + ')')
ax.plot(results['threads'], results['bench1c'][1] / results['bench1c'], color=colors[0], marker='.', ms=10, lw=2, label='1e6 cells (v2)')
ax.plot(results['threads'], results['bench2'][1] / results['bench2'], color=colors[1], marker='.', ms=10, lw=2, ls='--', label='3.375e6 cells (v' + __version__ + ')')
ax.plot(results['threads'], results['bench2c'][1] / results['bench2c'], color=colors[0], marker='.', ms=10, lw=2, ls='--', label='3.375e6 cells (v2)')

ax.set_xlabel('Number of threads')
ax.set_ylabel('Speed-up factor')
@@ -61,12 +55,12 @@ legend = ax.legend(loc=1)
frame = legend.get_frame()
frame.set_edgecolor('white')

ax.set_xlim([0, phycores])
ax.set_xticks(threads)
ax.set_ylim(top=ax.get_ylim()[1] * 1.1)
ax.set_xlim([0, results['threads'][-1] * 1.1])
ax.set_xticks(results['threads'])
ax.set_ylim(bottom=1, top=ax.get_ylim()[1] * 1.1)

# Save a PNG of the plot
fig.savefig(os.path.join(moduledirectory, platformID + '.png'), dpi=150, format='png', bbox_inches='tight', pad_inches=0.1)
fig.savefig(os.path.join(moduledirectory, machineID + '.png'), dpi=150, format='png', bbox_inches='tight', pad_inches=0.1)

plt.show()
@@ -0,0 +1,25 @@
import os, platform
import numpy as np

moduledirectory = os.path.dirname(os.path.abspath(__file__))

# Machine identifier
platformID = platform.platform()
machineID = 'MacPro1,1'
machineIDlong = machineID + ' (2006); 2 x 2.66 GHz Quad-Core Intel Xeon; Mac OS X 10.11.3'
#machineID = 'iMac15,1'
#machineIDlong = machineID + ' (Retina 5K, 27-inch, Late 2014); 4GHz Intel Core i7; Mac OS X 10.11.3'

# Number of threads (0 signifies serial compiled code)
threads = np.array([1, 2, 4, 8])

# 100 x 100 x 100 cell model execution times (seconds)
bench1 = np.array([149, 115, 100, 107])

# 150 x 150 x 150 cell model execution times (seconds)
bench2 = np.array([393, 289, 243, 235])

# Save to file
np.savez(os.path.join(moduledirectory, machineID), threads=threads, bench1=bench1, bench2=bench2)
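As a usage sketch (not part of this commit), the archive written above can be read back and a speed-up factor computed. This assumes the ``MacPro1,1.npz`` file name that ``np.savez`` produces from ``machineID``, and treats the first entry of each array as the single-thread reference:

import numpy as np

# Load the archive written by the script above (np.savez appends '.npz')
results = np.load('MacPro1,1.npz')
for name in ['bench1', 'bench2']:
    times = results[name]
    # Speed-up relative to the single-thread time (the first entry in these arrays)
    speedup = times[0] / times
    for n, t, s in zip(results['threads'], times, speedup):
        print('{}: {} threads, {} s, speed-up x{:.2f}'.format(name, n, t, s))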