Hands-on Notes on the Sunway TaihuLight

CPU Endianness

Like x86, the SW26010 is little-endian. The test program below prints 1 on both.

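#include <stdio.h>

/* Store 0x0110 in a 16-bit short and inspect its lowest-addressed byte:
   a little-endian machine stores the low-order byte (0x10) first. */
int is_little_endian() {
    short s = 0x0110;
    char *p = (char *) &s;
    return (p[0] == 0x10);
}

int main() {
    printf("%d\n", is_little_endian());  /* 1 on a little-endian CPU */
    return 0;
}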

Compilation

Printing the header search paths (host and slave)

echo | sw5cc -host -E -Wp,-v -
echo | sw5cc -slave -E -Wp,-v -

Key header paths

  • /usr/sw-mpp/swcc/lib/gcc-lib/sw_64-swcc-linux/5.421-sw-500/include: SIMD and DMA functions
  • /usr/sw-mpp/swcc/sw5gcc-binary/include: LDM, FFT, and Athread functions

Debugging

Job system

Setting the log level

RMS_DEBUG=7

System

x86 nodes run Red Hat Enterprise Linux Server release 6.6; Sunway (sw) nodes run RaiseOS.

RaiseOS

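The common impression is that this system is roughly a BusyBox-built rootfs, which makes a whole Sunway node essentially an oversized development board. Below is the process list from a Sunway node at one point in time. Incidentally, the system does not even have bash.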

PID   USER     TIME   COMMAND
1 root 0:16 init
2 root 0:00 [kthreadd]
3 root 0:00 [ksoftirqd/0]
5 root 0:00 [kworker/0:0H]
6 root 0:00 [kworker/u:0]
7 root 0:00 [kworker/u:0H]
8 root 0:00 [migration/0]
9 root 0:00 [rcu_bh]
10 root 0:26 [rcu_sched]
11 root 0:00 [ksoftirqd/4]
12 root 0:02 [migration/4]
13 root 0:00 [kworker/4:0]
14 root 0:00 [kworker/4:0H]
15 root 0:00 [ksoftirqd/8]
16 root 0:00 [migration/8]
17 root 0:00 [kworker/8:0]
18 root 0:00 [kworker/8:0H]
19 root 0:00 [ksoftirqd/12]
20 root 0:00 [migration/12]
21 root 0:00 [kworker/12:0]
22 root 0:00 [kworker/12:0H]
23 root 0:00 [cpuset]
24 root 0:00 [khelper]
25 root 0:00 [netns]
26 root 0:00 [bdi-default]
27 root 0:00 [kblockd]
28 root 0:00 [rpciod]
29 root 0:06 [kworker/12:1]
30 root 0:01 [kswapd0]
31 root 0:00 [kswapd1]
32 root 0:00 [kswapd2]
33 root 0:00 [kswapd3]
34 root 0:00 [nfsiod]
35 root 0:00 [mlx4]
36 root 0:06 [kworker/8:1]
37 root 0:08 [kworker/4:1]
38 root 0:00 [kworker/0:1]
39 root 0:00 [ib_mcast]
40 root 0:00 [ib_cm]
41 root 0:00 [iw_cm_wq]
42 root 0:00 [ib_addr]
43 root 0:00 [rdma_cm]
44 root 0:00 [mthca_catas]
45 root 0:00 [mlx4_ib]
46 root 0:00 [mlx4_ib_mcg]
47 root 0:00 [ib_mad1]
48 root 0:00 [deferwq]
49 root 0:00 [kworker/u:1]
50 root 0:00 {rcS} /bin/sh /etc/init.d/rcS
120 root 0:00 /usr/sbin/telnetd
127 root 0:02 /usr/sw-mpp/sbin/rmsd_100p_std
130 root 0:00 /usr/sw-mpp/sbin/swres -c
135 root 0:00 sh /sbin/start_online1.sh
136 root 0:00 /sbin/ntpd -p *** -Nn
148 root 1:13 /usr/local/sbin/lwfs -f /etc/lwfs/lwfs.vol -l /dev/shm/lw
187 root 0:00 sh
197 root 0:41 /sbin/sotailf_brief -t *** -p *** /dev/shm/lwfs_onl
206 root 0:32 [kworker/0:2]
3406 root 0:00 sleep 600
3413 root 0:00 [flush-0:14]
3420 root 0:00 /usr/sw-mpp/sbin/taskstarter -jobid 49200448 -rh mn005 -r
3421 * 0:00 /bin/ps -ef

For reference, here is the cpuinfo and meminfo output as well.

cpu                     : SW_64
cpu model : SW5
cpu variation : 1
cpu revision : 0
cpu serial number :
system type : shenwei
system variation : 0
system revision : 0
system serial number :
cycle frequency [Hz] : 1450000000
timer frequency [Hz] : 0.24
page size [bytes] : 8192
phys. address bits : 44
max. addr. space # : 255
BogoMIPS : 0.81
kernel unaligned acc : 0 (pc=0,va=0)
user unaligned acc : 88465541 (pc=4ff0423298,va=5000281dcc)
platform string : N/A
cpus detected : 4
cpus active : 4
cpu active mask : 0000000000001111
cpus core_start : 000000000000000f
mem cycle freq : 500
L1 Icache : 64K, 2-way, 64b line
L1 Dcache : 64K, 2-way, 64b line
L2 cache : n/a
L3 cache : n/a
MemTotal:        2038472 kB
MemFree: 582544 kB
Buffers: 0 kB
Cached: 1199688 kB
SwapCached: 0 kB
Active: 830256 kB
Inactive: 429920 kB
Active(anon): 133568 kB
Inactive(anon): 1120 kB
Active(file): 696688 kB
Inactive(file): 428800 kB
Unevictable: 70912 kB
Mlocked: 2904 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages: 131376 kB
Mapped: 4224 kB
Shmem: 3888 kB
Slab: 22480 kB
SReclaimable: 6392 kB
SUnreclaim: 16088 kB
KernelStack: 1808 kB
PageTables: 352 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 1019232 kB
Committed_AS: 201832 kB
VmallocTotal: 8388608 kB
VmallocUsed: 12208 kB
VmallocChunk: 8376400 kB
AnonHugePages: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 8192 kB
======================= cg0 =====================
UserPages_Mem_size: 8192 MB
UserPages_Conti_Total: 7680 MB
UserPages_Conti_Free: 7680 MB
UserPages_Conti_Used: 0 MB
UserPages_Cross_Size: 0 MB
======================= cg1 =====================
UserPages_Mem_size: 8192 MB
UserPages_Conti_Total: 7680 MB
UserPages_Conti_Free: 7680 MB
UserPages_Conti_Used: 0 MB
UserPages_Cross_Size: 0 MB
======================= cg2 =====================
UserPages_Mem_size: 8192 MB
UserPages_Conti_Total: 7680 MB
UserPages_Conti_Free: 7680 MB
UserPages_Conti_Used: 0 MB
UserPages_Cross_Size: 0 MB
======================= cg3 =====================
UserPages_Mem_size: 8192 MB
UserPages_Conti_Total: 7680 MB
UserPages_Conti_Free: 7680 MB
UserPages_Conti_Used: 0 MB
UserPages_Cross_Size: 0 MB

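Runtime

Slave cores (CPEs)

Slave cores can call C library functions directly, a clear advantage over CUDA kernels. A simple test program showed that printf, rand, and memset all work normally.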

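// host.c
#include <stdio.h>
#include <athread.h>

/* Declare the CPE function; SLAVE_FUN(cpe_func) expands to slave_cpe_func. */
extern void SLAVE_FUN(cpe_func)();

int main() {
    printf("Hello world from MPE.\n");
    athread_init();                 /* initialize the CPE cluster */
    athread_spawn(cpe_func, NULL);  /* launch cpe_func on the CPEs */
    athread_join();                 /* wait for all CPEs to finish */
    athread_halt();                 /* shut down the CPE cluster */
    return 0;
}

// slave.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include "slave.h"

void cpe_func() {
    int thread_id = athread_get_id(-1);  /* id of the calling CPE (0-63) */
    int arr[10], i;
    srand(time(NULL));
    for (i = 0; i < 10; ++i) { arr[i] = rand(); }
    memset(&arr[5], 0, sizeof(int) * 3);  /* zero arr[5], arr[6], arr[7] */
    if (thread_id == 0) {                 /* print from one CPE only */
        printf("Hello World from CPE. Generating Array:\n");
        for (i = 0; i < 10; ++i) { printf("%d ", arr[i]); }
        printf("\n");
    }
}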

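Odd issues

Function naming

Do not start function names with slave_; otherwise you will get an "undefined reference to slave_slave_***" error. During compilation, slave-core functions are renamed to slave_ followed by their original name, so a name that already starts with slave_ gets the prefix twice. Some of the compiler's built-in macros show this: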

#define SLAVE_FUN(x)        slave_##x
#define athread_spawn(y,z) __real_athread_spawn(slave_##y,z)