FriendlyELEC-Forum

Posted: **Sat Aug 27, 2016 1:12 pm**

I learned a lot about the armv6 cycle counter register on the Raspberry Pi Zero in this thread:
https://www.raspberrypi.org/forums/viewtopic.php?f=63&t=155830

Since Nanopi Neo and M1 are armv7 and quad core, there are some differences in how to measure time very precisely.

In earlier postings in this forum I described how to compile loadable kernel modules for Neo/M1; here is the latest:
http://www.friendlyarm.com/Forum/viewtopic.php?f=47&t=240&p=810#p810

I found complete code for cycle counting on armv7 in this Stack Overflow answer:
http://stackoverflow.com/a/31649809

This is the corresponding armv7 spec section:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0464f/BIIBDHAF.html

In that code, "enable_ccnt_read()" does two things: it enables the (CPU clock) cycle counter, and it enables user land access to the cycle counter (by default the cycle counter can only be accessed from kernel space). The provided user space program then just reads the cycle counter. The important difference from the Pi Zero code is not the different assembler instructions used for armv7 instead of armv6 (Pi Zero). The difference is that everything needs to be enabled on each of the 4 CPU cores [by "on_each_cpu(enable_ccnt_read, NULL, 1)"], because normally you don't know which CPU your program will run on.

For ultra precise time measurements with the cycle counter register I took 3 CPUs offline, just to be sure where the action is:

Code: Select all

root@FriendlyARM:~# echo 0 > /sys/devices/system/cpu/cpu2/online 
root@FriendlyARM:~# echo 0 > /sys/devices/system/cpu/cpu3/online 
root@FriendlyARM:~# echo 0 > /sys/devices/system/cpu/cpu1/online 
root@FriendlyARM:~#
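
Taking cores offline is the heavy-handed way. For a user space program, a lighter alternative might be to pin the process to one CPU so that every counter read hits the same core. The following is only a minimal sketch under that assumption (Linux with glibc); the helper name pin_to_cpu() is made up for illustration, and it is not what was used for the measurements in this post.

```c
#define _GNU_SOURCE   /* needed for cpu_set_t and sched_setaffinity() */
#include <sched.h>

/* Pin the calling thread to a single CPU so that all cycle counter
 * reads come from the same core. Returns 0 on success, -1 on error. */
int pin_to_cpu(int cpu)
{
  cpu_set_t set;

  CPU_ZERO(&set);
  CPU_SET(cpu, &set);
  return sched_setaffinity(0, sizeof(set), &set);  /* pid 0 = caller */
}
```

Calling pin_to_cpu(0) at program start would keep the program on cpu0. Other tasks can still be scheduled on that core, so this gives less isolation than taking the other cores offline.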

This is the corresponding "dmesg" output:

Code: Select all

[  610.674308] CPU2: shutdown
[  610.674352] [hotplug]: cpu(3) try to kill cpu(2)
[  610.675445] [hotplug]: cpu2 is killed! .
[  613.290250] CPU3: shutdown
[  613.290287] [hotplug]: cpu(0) try to kill cpu(3)
[  613.291379] [hotplug]: cpu3 is killed! .
[  615.497689] CPU1: shutdown
[  615.497729] [hotplug]: cpu(0) try to kill cpu(1)
[  615.498826] [hotplug]: cpu1 is killed! .
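
Instead of reading "dmesg", a program could check how many cores are still online by reading /sys/devices/system/cpu/online, which holds a CPU list such as "0" or "0-3". A small sketch of parsing that list format follows; count_cpus() is a hypothetical helper name, and the input string is assumed to come from that file.

```c
#include <stdlib.h>

/* Count the CPUs named in a kernel "cpulist" string such as "0", "0-3"
 * or "0,2-3" (the format of /sys/devices/system/cpu/online).
 * Returns -1 if the string is malformed. */
int count_cpus(const char *list)
{
  int count = 0;

  while (*list && *list != '\n') {
    char *end;
    long first = strtol(list, &end, 10);
    long last = first;

    if (end == list)
      return -1;               /* no digits where a number was expected */
    if (*end == '-') {         /* a range "first-last" */
      const char *p = end + 1;
      last = strtol(p, &end, 10);
      if (end == p || last < first)
        return -1;
    }
    count += (int)(last - first + 1);
    if (*end == ',')           /* skip the separator between entries */
      end++;
    list = end;
  }
  return count;
}
```

After the three "echo 0" commands above, the file should contain just "0", for which count_cpus() returns 1.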

Next I fixed the cpu0 frequency at the minimum of 480MHz for an initial investigation:

Code: Select all

root@FriendlyARM:~# echo 480000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq 
root@FriendlyARM:~# echo 480000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq 
root@FriendlyARM:~# cpu_freq 
CPU0 online=1 temp=46 governor=interactive cur_freq=480000
DDR governor=userspace cur_freq=432000 max=432000 min=408000
root@FriendlyARM:~#
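
"cpu_freq" is a tool on the FriendlyARM image. Where it is not available, the current frequency can also be read back from the standard cpufreq sysfs file before measuring. A minimal sketch, with read_sysfs_long() as a made-up helper name:

```c
#include <stdio.h>

/* Read one integer from a sysfs-style text file.
 * Returns 0 on success, -1 if the file cannot be opened or parsed. */
int read_sysfs_long(const char *path, long *value)
{
  FILE *f = fopen(path, "r");
  int ok;

  if (!f)
    return -1;
  ok = (fscanf(f, "%ld", value) == 1);
  fclose(f);
  return ok ? 0 : -1;
}
```

Called with "/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq", this should yield the frequency in kHz, i.e. 480000 for the setting above.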

Then I used the kernel module listed at the end of this post for two measurements, set the min and max cpu0 frequency to 1.2GHz, and did two measurements again:

Code: Select all

root@FriendlyARM:~/ccnt-2# insmod ccnt-2.ko 
root@FriendlyARM:~/ccnt-2# rmmod -f ccnt-2.ko 
root@FriendlyARM:~/ccnt-2# insmod ccnt-2.ko 
root@FriendlyARM:~/ccnt-2# rmmod -f ccnt-2.ko
root@FriendlyARM:~/ccnt-2# echo 1200000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
root@FriendlyARM:~/ccnt-2# echo 1200000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
root@FriendlyARM:~/ccnt-2# cpu_freq 
CPU0 online=1 temp=47 governor=interactive cur_freq=1200000
DDR governor=userspace cur_freq=432000 max=432000 min=408000
root@FriendlyARM:~/ccnt-2# insmod ccnt-2.ko 
root@FriendlyARM:~/ccnt-2# rmmod -f ccnt-2.ko 
root@FriendlyARM:~/ccnt-2# insmod ccnt-2.ko 
root@FriendlyARM:~/ccnt-2# rmmod -f ccnt-2.ko 
root@FriendlyARM:~/ccnt-2#

This is what I was after, the corresponding "dmesg" output:

Code: Select all

[  705.247173] 130 135 5
[  705.247202] 135 481201 481066
[  705.247219] 481201 481206 5
[  717.199641] Disabling lock debugging due to kernel taint
[  720.169848] 967406470 967406475 5
[  720.169876] 967406475 967887431 480956
[  720.169894] 967887431 967887436 5
[  807.820957] 2970431661 2970431666 5
[  807.820972] 2970431666 2971632605 1200939
[  807.820979] 2971632605 2971632610 5
[  811.965180] 3322501780 3322501785 5
[  811.965197] 3322501785 3323702717 1200932
[  811.965205] 3323702717 3323702722 5

So what can we see from this?
First, the overhead of doing cycle counter measurements is always 5 clock ticks, regardless of whether the CPU runs at 480MHz or at 1.2GHz.
Second, we see roughly 481000 reported as the difference for "udelay(1000)", or 1ms, which fits 480MHz nearly perfectly.
Finally, we see roughly 1.2 million as the difference for the 1ms udelay(), which matches the 1.2GHz CPU frequency.
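
One thing to keep in mind when taking such differences: the armv7 cycle counter is a 32-bit register, so at 1.2GHz it wraps around roughly every 3.6 seconds (2^32 / 1.2e9). Unsigned subtraction still gives the right difference across a single wrap. A small sketch, with ccnt_delta() as a hypothetical helper name:

```c
#include <stdint.h>

/* Elapsed cycles between two 32-bit counter readings. Unsigned
 * subtraction is modulo 2^32, so one wrap between the readings is
 * handled; two or more wraps cannot be detected. */
uint32_t ccnt_delta(uint32_t start, uint32_t end)
{
  return end - start;
}
```

This is the same arithmetic the "cc3-cc2" expressions in the module perform, since "unsigned" is 32 bits wide on armv7. Intervals longer than one wrap period need a different time source.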

So these measurements confirm that cycle counter readings depend on the CPU frequency and are really precise. The overhead for reading the cycle counter register (the difference between two consecutive register readings) is 5 clock ticks, which is less than the overhead of 8 clock ticks on the armv6 Pi Zero. It is even less in absolute time, considering that a clock tick on the Pi Zero is 1ns while it is 0.83ns(!) when you set the minimal CPU frequency to 1200000.
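
To put numbers on that comparison, cycle counts can be converted to nanoseconds from the frequency in kHz, the unit cpufreq uses. A minimal sketch, with cycles_to_ns() as a made-up helper name and assuming the core really runs at the given fixed frequency:

```c
#include <stdint.h>

/* Convert a cycle count to nanoseconds for a core clocked at freq_khz,
 * the unit cpufreq reports (e.g. 1200000 for 1.2GHz). */
double cycles_to_ns(uint32_t cycles, uint32_t freq_khz)
{
  return (double)cycles * 1e6 / (double)freq_khz;
}
```

With this, the 5 tick overhead at 1.2GHz comes out at about 4.17ns, against 8ns for 8 ticks on the 1GHz Pi Zero.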

Last, but not least, the code:

Code: Select all

root@FriendlyARM:~/ccnt-2# cat Makefile 
obj-m += ccnt-2.o

all:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules

clean:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean

root@FriendlyARM:~/ccnt-2# 
root@FriendlyARM:~/ccnt-2# cat ccnt-2.c 
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/delay.h>
#include <linux/smp.h>   /* on_each_cpu() */

/* Runs on every online CPU: switch on the PMU and its cycle counter. */
static void enable_ccnt_read(void* data)
{
  // PMCR.E (bit 0) = 1
  asm volatile ("mcr p15, 0, %0, c9, c12, 0" :: "r"(1));

  // PMCNTENSET.C (bit 31) = 1 (1U avoids shifting into the sign bit)
  asm volatile ("mcr p15, 0, %0, c9, c12, 1" :: "r"(1U << 31));
}

int init_module(void)
{
  volatile unsigned cc1,cc2,cc3,cc4;
  on_each_cpu(enable_ccnt_read, NULL, 1);

  // two back-to-back reads of PMCCNTR give the read overhead
  asm volatile ("mrc p15, 0, %0, c9, c13, 0" : "=r" (cc1));
  asm volatile ("mrc p15, 0, %0, c9, c13, 0" : "=r" (cc2));
  udelay(1000);  // busy-wait 1ms between cc2 and cc3
  asm volatile ("mrc p15, 0, %0, c9, c13, 0" : "=r" (cc3));
  asm volatile ("mrc p15, 0, %0, c9, c13, 0" : "=r" (cc4));
  // each line: start, end, difference
  printk("%u %u %u\n",cc1,cc2,cc2-cc1);
  printk("%u %u %u\n",cc2,cc3,cc3-cc2);
  printk("%u %u %u\n",cc3,cc4,cc4-cc3);
  return 0;
}

void cleanup_module(void)
{
}

MODULE_LICENSE("GPL");
root@FriendlyARM:~/ccnt-2#

Hermann.

Posted: **Mon Aug 29, 2016 9:13 pm**

I compared high speed GPIO port reads of the Arduino Due and the Pi Zero in this posting:
http://forum.arduino.cc/index.php?topic=406165.msg2864084#msg2864084

While the 84MHz Arduino Due did it with 35.7ns between readings, the 1GHz Pi Zero needed more than 50ns between them.

Today I did the measurements (see attachment) for the Nanopi Neo. I disabled CPUs 1-3 and set the min and max frequency for CPU0 to 1.2GHz:

Code: Select all

root@FriendlyARM:~/ccnt-2# ./ccnt2b 
1099932594 1099932599 5
1099932599 1099933043 444
404
404
404
404
root@FriendlyARM:~/ccnt-2#

4 GPIO reads took 444 clock cycles. That is 444/4*1000/1200=92.5ns per read, nearly double the time of a Raspberry Pi Zero GPIO read.

Hermann.

Nanopi Neo/M1 0.83ns "cycle counter register"
