My Fortran Compiler Tests:

I've been running Fortran compiler tests since 2010. I use PROP_DESIGN_MAPS for this purpose. I chose this version of PROP_DESIGN because it's compute-intensive without causing excessive wait times. In reality, any version of PROP_DESIGN could be used.

 

The main reason I perform these tests is to optimize the performance of PROP_DESIGN across as many CPUs/APUs as possible. You can also use my 'Fortran Compiler Tests' download to find the best Fortran compiler and/or the best compiler options. You may want to edit the included batch files (*.bat), to optimize them for your CPU/APU.

 

The current version of my 'Fortran Compiler Tests' download pits Intel Fortran against gfortran and flang. In my testing, Intel Fortran has always produced the fastest executable files. Currently, I'm seeing Intel Fortran executables run more than 3x faster than those produced by gfortran and flang.

 

I created a slightly modified version of PROP_DESIGN_MAPS, to aid benchmarking. This program is called PROP_DESIGN_MAPS_BENCHMARK. Since modern CPUs/APUs don't run at constant speeds, there is a lot of run-to-run variability. PROP_DESIGN_MAPS_BENCHMARK runs PROP_DESIGN_MAPS five times in a row and reports the average runtime (in seconds). This makes it easier to determine how effective any given compiler or compiler option is. The new benchmarking program is included in the download below.
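
As a rough illustration of the averaging idea, the Python sketch below times a command five times from the outside and reports the mean. It is not how PROP_DESIGN_MAPS_BENCHMARK works internally (that program does its repetition inside the Fortran code). The function name `time_runs` and the placeholder command are assumptions made for this example.

```python
import statistics
import subprocess
import sys
import time


def time_runs(command, repeats=5):
    """Run `command` `repeats` times; return each wall-clock runtime in seconds."""
    runtimes = []
    for _ in range(repeats):
        start = time.perf_counter()
        # Discard the program's screen output so it doesn't clutter the timing report.
        subprocess.run(command, check=True, stdout=subprocess.DEVNULL)
        runtimes.append(time.perf_counter() - start)
    return runtimes


if __name__ == "__main__":
    # Placeholder workload (an assumption): replace with the path to a compiled executable.
    command = [sys.executable, "-c", "sum(i * i for i in range(10**6))"]
    runtimes = time_runs(command)
    print(f"average runtime: {statistics.mean(runtimes):.3f} s")
```

Timing from the outside includes process start-up time, so it is only a coarse stand-in for timing taken inside the program itself.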

 

Note: MSYS2 was used to download and install gfortran and flang.

 

Last Updated On: 04/26/24

I've tried keeping a changelog several times. However, due to the large number of updates, it quickly became way too long each time. Instead, I keep a list of the most recent updates below:

 

04/26/24:

  • Increased the significant digits shown for the performance metrics
  • Code refactoring

04/24/24:

  • Updated the spreadsheet, to indicate that I tested the latest versions of gfortran and flang. I didn't observe any improvement in runtimes, so I left the previous results in place
  • Added some additional info to the pacman text file

04/07/24 to 04/22/24:

  • Updated MAPS_BENCHMARK, to keep it consistent with updates to MAPS

04/04/24:

  • Recently, I activated text wrapping in all 'Command Prompt' and 'Intel Fortran' windows. This made it easier to see long compiler strings. Unfortunately, it also messed up the screen output for certain PROP_DESIGN codes. Thus, I deactivated text wrapping

03/29/24 Update 2:

  • Made some minor improvements to the results spreadsheet
  • Re-ran some of the ifx tests
  • The 'Command Prompt' and 'Intel Fortran' windows are now set to wrap text

03/29/24 Update 1:

  • Added statistical analysis for all test cases. This was done in a spreadsheet, using output from the benchmarking program
  • Updated the output of the benchmarking program, to make it easy to do statistical analysis
  • Added compiler options for ifort and ifx. These were recommended by an Intel employee. They helped to improve the speed of ifx generated code. It's still a bit slower than ifort generated code. However, they did bring the two compilers closer together

03/28/24:

  • Added a test case for the ifx compiler. There are now equivalent tests for the ifort and ifx compilers

03/27/24:

  • Updated the 'Auto-Parallelization Notes.txt' file, so that it matches the benchmarking program's line numbers. It was originally referencing MAPS line numbers
  • Fixed another typo in the spreadsheet

03/26/24 Update 3:

  • Added an additional test case for the ifort and ifx compilers

03/26/24 Update 2:

  • I thought of a better way to calculate total cpu runtime, for the benchmarking program. I introduced this output yesterday
  • Fixed a typo in the spreadsheet

03/26/24 Update 1:

  • Added the CPU model, that I used for testing, to the spreadsheet
  • Re-ran some of the Intel Fortran compiler tests
  • Improved the spreadsheet, so that it automatically formats certain cells (based on their values)
  • Improved the spreadsheet, so that the issues with the Intel Fortran compiler are easier for others to see

03/25/24 Update 2:

  • I added more test cases for the ifx compiler, since there is a regression showing up. I ran all the same tests as the ifort compiler. This gives a little more insight into the issue. It appears that the /O1 and /O3 level optimizations are not performing as well, when comparing ifx to ifort
  • Another interesting oddity is, unoptimized ifort and ifx code outperforms fully optimized gfortran and flang code
  • Also, unoptimized ifort and ifx code performs about the same. So the regression is purely with the optimization levels. In fact, ifx /O2 level optimizations outperform ifort /O2

03/25/24 Update 1:

  • Updated the compiler comparison, to include the latest version of flang. There was no significant change in performance
  • Added the compiler version numbers, that were tested, to the spreadsheet
  • Added an additional ifx compiler test
  • Added total cpu runtime, in minutes, to the output of the Fortran test code

03/18/24 Update 3:

  • I decided to go ahead and change the default compiler options used in the c.bat files. The effects of auto-vectorization and fused multiply-add (FMA) are both too small for me to accurately capture. That means there is no harm in removing those options. Removing fused multiply-add is particularly important, since a lot of processors do not support that feature
  • I updated the Intel test cases and spreadsheet

03/18/24 Update 2:

  • Added more test cases for the Intel Fortran compiler
  • Even with the added averaging, the run-to-run variability is still larger than some of the compiler option effects. Increasing the number of averaged runs should help. However, it would take way too long to run all the tests, so I don't want to do that. Also, some variability will always remain, because the OS interrupts things as well
  • Auto-vectorization and FMA seem to be within the run-to-run variability. I currently have them both on. However, you may find that turning them off is slightly better, for your particular processor. I may turn them off in the future. This would allow more processors to run the codes. That has always been my primary objective. Performance comes in second
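
One rough way to judge whether a compiler option's effect stands out from run-to-run variability is to compare the gap between two averages with the spread of the individual runs. The Python sketch below does that. The function names, the threshold rule and the runtimes are all made up for illustration; none of them come from the results spreadsheet, and the rule is a crude heuristic rather than a formal statistical test.

```python
import statistics


def summarize(runtimes):
    """Return (mean, sample standard deviation) of a list of runtimes in seconds."""
    return statistics.mean(runtimes), statistics.stdev(runtimes)


def difference_exceeds_noise(runs_a, runs_b):
    """True when the gap between the two means is larger than the combined spread."""
    mean_a, sd_a = summarize(runs_a)
    mean_b, sd_b = summarize(runs_b)
    return abs(mean_a - mean_b) > (sd_a + sd_b)


# Hypothetical runtimes (seconds), for illustration only; not measured results.
with_option = [10.2, 10.6, 10.1, 10.5, 10.3]
without_option = [10.4, 10.2, 10.6, 10.3, 10.5]

if difference_exceeds_noise(with_option, without_option):
    print("The option's effect is larger than the run-to-run spread.")
else:
    print("The option's effect is within the run-to-run spread.")
```

With only five runs per case, the standard deviation itself is a noisy estimate, so a result near the threshold should be treated as inconclusive.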

03/18/24 Update 1:

  • Added a slightly modified version of PROP_DESIGN_MAPS, to deal with run-to-run variability. The new program is called PROP_DESIGN_MAPS_BENCHMARK. It runs PROP_DESIGN_MAPS five times in a row and reports the average runtime (in seconds)
  • Updated the batch files
  • The spreadsheet has been updated. It now includes the results from PROP_DESIGN_MAPS_BENCHMARK
  • Updated the README file
  • Based on what I saw today, I made a slight change to the default compiler options used in all c.bat files. The change should cause a slight speedup

03/17/24:

  • Re-ran all benchmarks. This was to compare with the latest version of flang. Several compiler options that worked with the previous version of flang no longer work. They were removed from the batch files. Moreover, -mconsole -mthreads -static were removed. -static is the most important of these. Without -static, I couldn't distribute the codes. -static often gets broken for gcc and flang. It will most likely work again, at a later date
  • There was no significant change in compiler performance. This is what I usually observe. As always, Intel Fortran is the best compiler

02/24/24:

  • Minor improvements to the documentation

01/22/24:

  • Fixed a display glitch with one of the plots in the spreadsheet

01/09/24 Update 2:

  • Updated the 'README' LibreOffice Writer document, so that it has the same page style as the other PROP_DESIGN LibreOffice Writer documents

01/09/24 Update 1:

  • Updated the test results, since the flang compiler was updated. No significant differences were observed
  • Fixed a small bug. I had renamed all of the batch files a while ago. However, one of them didn't get the intended update. That is fixed now
  • Added auto-parallelization notes

01/05/24:

  • I removed PROP_DESIGN_MAPS from this download. This makes the download smaller and reduces redundancy. The download contains all the information and files needed, to run Fortran compiler tests for yourself. You simply copy the additional batch files to the PROP_DESIGN 'MAPS' folder
  • Reorganized the information in this download
  • Updated the README file
  • Provided *.pdf files, in case you don't have LibreOffice installed

12/12/23 Update 2:

  • There was a batch file entitled 'i6b.bat' that I used as a double check. However, I hadn't documented it. That's fixed now

12/12/23 Update 1:

  • Rerelease of my 'Fortran Compiler Tests' download. I haven't had it posted for many years

Polyhedron Fortran Benchmarks:

Around 2010, I was interested in using PROP_DESIGN for Fortran benchmarking and CPU burn-in. I wrote a special version of PROP_DESIGN_MAPS called MP_PROP_DESIGN and provided it to Polyhedron for use in their benchmarks. When the auto-parallelization feature came out, they started showing MP_PROP_DESIGN benefiting greatly from it. I was never able to duplicate their results, using Windows 10 and an AMD APU. I tried to contact them to determine how they arrived at their results, but they declined to answer. I also tried to get them to use PROP_DESIGN_MAPS, and they declined that as well.

 

Since auto-parallelization has problems with loop nests, I deleted MP_PROP_DESIGN and went back to using only PROP_DESIGN_MAPS. Recently, I created a program called PROP_DESIGN_MAPS_BENCHMARK. It runs PROP_DESIGN_MAPS five times in a row and reports the average runtime (in seconds). This was done to deal with run-to-run variability. Doing so is a tradeoff: it makes things even harder for auto-parallelization, but it makes benchmarking easier. Since auto-parallelization has been a useless feature in my testing, I think it's better to reduce run-to-run variability. Any version of MP_PROP_DESIGN that you find is extremely out of date. Over the years, I've made substantial improvements to the underlying propeller design theory.

 

In my own testing, I have never seen a runtime improvement from auto-parallelization. I've tested it with gfortran and Intel Fortran. Unfortunately, many websites use the Polyhedron results to 'sell' the performance benefits of various compilers. I would ignore any MP_PROP_DESIGN results in this instance. Instead, confirm compiler performance using the current version of PROP_DESIGN_MAPS or PROP_DESIGN_MAPS_BENCHMARK. This whole situation is very disappointing. I feel the Polyhedron results are misleading a lot of people, since they are referenced often.

 

The only way to make PROP_DESIGN a proper multi-threaded application would be to rewrite it. I'm not able to do that myself. I learned programming a long time ago, well before multi-core processors existed, so everything I write is serial Fortran 77 code. There are areas that could benefit from parallelization. However, the auto-parallelization feature is not able to exploit them. The codes already run very fast using just one core, so it's not really worth your time to rewrite them.