Power design error?
Posted: Fri Mar 17, 2017 2:40 pm
Gents,
I have been looking at the NanoPi M3 as an interesting module for a cluster that would run daily computations on large networks, because of:
- cost
- 8 cores
- Gigabit network
So I got hold of 6 of them and started to test their performance. Here are the test parameters:
- the M3 uses the dedicated heatsink; in addition I have used a high performance GELID thermal pad on the CPU and kept the original (trimmed) pad for the rest of the surface; to preempt some remarks I will make later: in this configuration I have not seen the temperature go over 65 degrees (but you will see there are other issues)
- I power the board through the GPIO pins using AWG20 cables to a bench power supply that can provide 30A
So until now there should be no problem with the power.
In terms of software testing I have installed OpenBLAS and let it build with the defaults (it builds for armv7l, since we don't yet have an armv8 kernel, with 8 threads). I ran update-alternatives and then started running the standard OpenBLAS benchmarks with numpy.
So here things started to get interesting.
When testing single precision matrix multiplication (sgemm.py) I have no problems and I can easily go to 5000 x 5000 matrices. The average throughput is 16-17 Gflops (reported by the benchmark) and the board doesn't even break a sweat: it barely reaches 62 degrees.
The issues start with double precision matrix multiplication (dgemm.py). If you run it straight as it comes, the board quickly hangs. Obviously I thought at first that there was a software problem, but then I started to investigate.
First I started reducing the number of cores that OpenBLAS uses (OPENBLAS_NUM_THREADS=x). At 1.4GHz I can safely run the double precision benchmark for a long time with 6 cores. That benchmark reports approx. 12 Gflops and the temperature stays below 75 degrees - there is no throttling. Running more than 6 threads at 1.4GHz quickly hangs the board.
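For anyone who wants to reproduce the thread-count test without the OpenBLAS benchmark scripts, here is a minimal sketch. It assumes numpy is linked against OpenBLAS (as set up with update-alternatives above); the function names and the default matrix size are my own choices for illustration, not taken from dgemm.py.

```python
import os
import time


def gemm_gflops(n, seconds):
    """Gflops for an n x n by n x n matrix multiply (about 2*n^3 operations)."""
    return 2.0 * n ** 3 / seconds / 1e9


def run_dgemm(n=2000, threads=6, repeats=3):
    """Time a double precision matrix multiply with a capped thread count."""
    # OpenBLAS reads OPENBLAS_NUM_THREADS when the library is loaded, so this
    # only takes effect if numpy has not been imported yet in this process.
    os.environ["OPENBLAS_NUM_THREADS"] = str(threads)
    import numpy as np

    a = np.random.rand(n, n)  # float64 by default, i.e. double precision
    b = np.random.rand(n, n)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        a.dot(b)
        best = min(best, time.perf_counter() - start)
    return gemm_gflops(n, best)
```

Calling run_dgemm(threads=6) and then, in a fresh process, run_dgemm(threads=8) should be enough to show the difference described above. Setting the variable on the command line before starting Python works just as well.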
The other option is to reduce the frequency but keep the 8 threads. This way I have managed to run safely at up to 1GHz. At 1.1GHz the behaviour is unpredictable - it might work for a while and then suddenly hang. In all cases neither the temperature nor the bench power supply is the problem.
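One way to cap the frequency is sketched below. It assumes the kernel exposes the standard Linux cpufreq sysfs files (scaling_max_freq, in kHz) and that the script runs as root; I have not checked which frequency steps this kernel accepts, and the helper name is just for illustration.

```python
import glob
import os


def set_max_freq_khz(khz, cpu_root="/sys/devices/system/cpu"):
    """Write khz to scaling_max_freq for every CPU found under cpu_root."""
    pattern = os.path.join(cpu_root, "cpu[0-9]*", "cpufreq", "scaling_max_freq")
    paths = sorted(glob.glob(pattern))
    for path in paths:
        with open(path, "w") as f:
            f.write(str(khz))
    # Return the files that were written so the caller can verify them.
    return paths
```

For the 1GHz test this would be set_max_freq_khz(1000000). The cpu_root argument is only there so the function can be tried against a scratch directory first.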
The conclusion I have reached is that the power provided by the AXP228 is insufficient for the 8 core CPU. Since the only diagrams of the board are for version 1604 (mine says 1605), I can only assume they are the same. I notice there that you only use DC3 to power the cores, while DC2 stays unused. The data sheet indicates that DC2 and DC3 can each provide up to 2.5A, so in this case you are providing max 2.5A to the Samsung cores. That design may work for the 4 core boards (e.g. M1, Tx, etc.), but I'm afraid that on the M3 the CPU is massively underpowered.
At rest the overall board consumption is 0.45A (measured at the input, on the bench power supply). The maximum consumption before the board hangs is 2.32A. Of course not all of that power goes to the cores, but most of it does. Let's do a little math: the difference in power consumption is 5V x (2.32A - 0.45A) = 9.35W. Let's say, conservatively, that only half of this goes to the CPU and that we lose approx. 15% in the DC-DC switching regulator => 3.97W. Since the cores are powered at 1V (1.25V max according to the scaling voltages), that is about 4A at 1V and still about 3.2A at 1.25V, so the current is clearly well over 2.5A.
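The same estimate as a few lines of Python, so the assumptions are easy to change. The 50% CPU share and the 85% regulator efficiency are the guesses from the paragraph above, not measurements, and the variable names are mine.

```python
SUPPLY_V = 5.0               # bench supply voltage
IDLE_A = 0.45                # board input current at rest
PEAK_A = 2.32                # board input current just before the hang
CPU_SHARE = 0.5              # assumed fraction of the extra power used by the cores
REGULATOR_EFFICIENCY = 0.85  # assumed, i.e. approx. 15% switching loss
DCDC_LIMIT_A = 2.5           # per-output limit quoted from the AXP228 data sheet

# Extra input power under load, and the part of it that reaches the core rail.
extra_input_w = SUPPLY_V * (PEAK_A - IDLE_A)
cpu_rail_w = extra_input_w * CPU_SHARE * REGULATOR_EFFICIENCY


def core_current_a(core_v):
    """Estimated extra core current at a given core voltage."""
    return cpu_rail_w / core_v


for core_v in (1.0, 1.25):
    amps = core_current_a(core_v)
    print("%.2fV: %.2fA (limit %.1fA)" % (core_v, amps, DCDC_LIMIT_A))
```

Even with these cautious numbers the estimate stays above the 2.5A limit at both voltages, and it only counts the increase over idle.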
I think you need to revisit the design of the board and provide additional core power from the DC2 output of the AXP228, otherwise the full power of the chip will never be usable.