Linpack benchmark on T3...

by ejolson » Thu Sep 15, 2016 8:38 am

I just compiled version 2.2 of the High-Performance Linpack (HPL) benchmark and linked it against OpenBLAS version 0.2.19. Both are the current releases as of today. The results indicate that the NanoPi T3 reaches 12.49 Gflops when solving a dense system of linear equations in double precision (N = 8000). This is the standard benchmark used to rank the Top 500 supercomputers in the world. For reference, this makes the NanoPi T3 about 7.5 million times slower than the Sunway TaihuLight, which is currently the fastest computer in the world. Another point of reference is the Raspberry Pi 3B, which scores about 6.2 Gflops on the same benchmark.
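For anyone who wants to check the comparisons, here is a rough Python sketch of the arithmetic. The T3 and Pi 3B figures are the ones quoted in this post. The TaihuLight figure (about 93.01 Pflops Rmax) is an assumption taken from the June 2016 TOP500 list and is not something measured here.

```python
# Rough comparison of the Linpack rates mentioned in the post.
T3_GFLOPS = 12.49              # NanoPi T3, from the HPL run in this post
PI3B_GFLOPS = 6.2              # Raspberry Pi 3B, as quoted in this post
TAIHULIGHT_GFLOPS = 93.01e6    # assumption: ~93.01 Pflops Rmax (June 2016 TOP500)

# How many times slower the T3 is than TaihuLight.
slowdown_vs_taihulight = TAIHULIGHT_GFLOPS / T3_GFLOPS

# How many times faster the T3 is than the Pi 3B.
speedup_vs_pi3b = T3_GFLOPS / PI3B_GFLOPS

print(f"T3 is about {slowdown_vs_taihulight / 1e6:.1f} million times slower than TaihuLight")
print(f"T3 is about {speedup_vs_pi3b:.1f} times faster than the Pi 3B")
```

With these inputs the ratio to TaihuLight comes out a little under 7.5 million, and the T3 is roughly twice as fast as the Pi 3B.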

Note that to get the fastest timing consistently, I had to take the cover off the T3 and cool it with a fan, as shown below.

[Image: NanoPi T3 with the cover removed, cooled by a fan]

Other single-board computers require similar cooling. The output of the benchmark was:

Code:

$ ./xhpl 
================================================================================
HPLinpack 2.2  --  High-Performance Linpack benchmark  --   February 24, 2016
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      :    8000 
NB     :     256 
PMAP   : Row-major process mapping
P      :       1 
Q      :       1 
PFACT  :    Left 
NBMIN  :       2 
NDIV   :       2 
RFACT  :   Right 
BCAST  :   2ring 
DEPTH  :       0 
SWAP   : Mix (threshold = 64)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words

--------------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
      ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be               1.110223e-16
- Computational tests pass if scaled residuals are less than                16.0

================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR02R2L2        8000   256     1     1              27.33              1.249e+01
HPL_pdgesv() start time Thu Sep 15 16:07:17 2016

HPL_pdgesv() end time   Thu Sep 15 16:07:44 2016

--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0025941 ...... PASSED
================================================================================

Finished      1 tests with the following results:
              1 tests completed and passed residual checks,
              0 tests completed and failed residual checks,
              0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------

End of Tests.
================================================================================
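To make the result line easier to read, here is a small Python sketch that decodes the T/V string and recomputes the Gflops figure from N and the wall time. Two things are assumptions on my part rather than something the output states: the operation count 2/3 N^3 + 3/2 N^2 that HPL credits for a solve, and the character layout of the T/V string (W, process mapping, DEPTH, BCAST, RFACT, NDIV, PFACT, NBMIN, each a single character). Both agree with the parameter table printed above. The function and dictionary names are made up for this example.

```python
# Sketch: decode one HPL result line and recompute its Gflops figure.
# Assumes single-character fields in the T/V string (true for this run).

BCAST = {"0": "1ring", "1": "1ringM", "2": "2ring",
         "3": "2ringM", "4": "Blong", "5": "BlongM"}
FACT = {"L": "Left", "C": "Crout", "R": "Right"}
PMAP = {"R": "Row-major", "C": "Column-major"}


def hpl_flops(n):
    """Operation count credited for an order-n solve: 2/3 n^3 + 3/2 n^2."""
    return (2.0 / 3.0) * n**3 + 1.5 * n**2


def parse_result_line(line):
    """Split a 'WR02R2L2 8000 256 1 1 27.33 1.249e+01' style line into fields."""
    tv, n, nb, p, q, seconds, reported = line.split()
    n, seconds = int(n), float(seconds)
    return {
        "pmap": PMAP[tv[1]],       # tv[0] is 'W' for wall time
        "depth": int(tv[2]),
        "bcast": BCAST[tv[3]],
        "rfact": FACT[tv[4]],
        "ndiv": int(tv[5]),
        "pfact": FACT[tv[6]],
        "nbmin": int(tv[7]),
        "n": n,
        "nb": int(nb),
        "p": int(p),
        "q": int(q),
        "seconds": seconds,
        "reported_gflops": float(reported),
        # The printed time is rounded, so this only matches approximately.
        "recomputed_gflops": hpl_flops(n) / seconds / 1e9,
    }


line = "WR02R2L2        8000   256     1     1              27.33              1.249e+01"
result = parse_result_line(line)
for key, value in result.items():
    print(f"{key:18s} {value}")
```

For the line above, the recomputed rate agrees with the reported 12.49 Gflops to within rounding of the printed time. The same count also explains the memory footprint: an 8000 by 8000 matrix of doubles takes 8000 * 8000 * 8 bytes, which is 512 MB.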
