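A Crash Course in Mixed-Language Programming with ctypes

This course project is about operating systems, so the required language is fittingly operating-system flavored: C.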

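Background

The last time I wrote C/C++ was for the data structures course project in March of last year. That project also required C/C++, and I paired it with Qt5 to build my first program with a GUI. In the year and more since then, I have not touched C/C++ again. Unless a language is mandated, Python is still my first choice when I am free to pick.

This assignment is a little different: the core algorithms must be written in C with as few libraries as possible, but the GUI language is unrestricted. So I decided to experiment: build the GUI with Python's Tkinter library, then find a way to call the C code from Python. Python has a huge library ecosystem, so this should not be a big problem.

After some Googling, I found the document Python Tips - 21. Python C extensions. It describes three ways to call C functions from Python code: ctypes, SWIG, and the Python/C API.

ctypes looked simple enough to pick up quickly; SWIG seemed to have fewer users because of its complexity; the Python/C API is widely used but also looked fairly complicated. So I settled on ctypes as the glue between Python and C.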

A simple example

A simple addition function, saved in add.c:

//sample C file to add 2 numbers - int and floats

#include <stdio.h>

int add_int(int, int);
float add_float(float, float);

int add_int(int num1, int num2){
    return num1 + num2;
}

float add_float(float num1, float num2){
    return num1 + num2;
}

Compile the C file into a *.so or *.dll file:

#For Linux
$  gcc -shared -Wl,-soname,adder -o adder.so -fPIC add.c

#For Mac
$ gcc -shared -Wl,-install_name,adder.so -o adder.so -fPIC add.c

The Python code that calls it looks like this:

from ctypes import *

# Load the shared object file
adder = CDLL('./adder.so')

# Find the sum of two integers (Python ints are passed as C int by default)
res_int = adder.add_int(4, 5)
print("Sum of 4 and 5 = " + str(res_int))

# Find the sum of two floats: wrap the arguments in c_float
a = c_float(5.5)
b = c_float(4.1)

add_float = adder.add_float
# Without this declaration, ctypes would interpret the return value as a C int
add_float.restype = c_float
print("Sum of 5.5 and 4.1 = ", str(add_float(a, b)))

Output (Python 3):

Sum of 4 and 5 = 9
Sum of 5.5 and 4.1 =  9.600000381469727

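For more, and more complex, material, see the ctypes section of the official documentation, ctypes — A foreign function library for Python. A Chinese version is available at 16.16. ctypes - Python的外部函数库, although that translation is incomplete and covers version 3.5.2, which lags behind the latest release.

ctypes crash course

Here is a summary of what I learned over these two days of going from beginner to (arguably) proficient, organized around the common ctypes problems I ran into.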

0. Bitness matching
1. Basic types
2. String encoding
3. Structures
4. Pointers
5. Declaring return and argument types
6. Lost data

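0. Bitness matching

*.so and *.dll files come in 32-bit and 64-bit variants, and each can only be loaded by a Python interpreter of the same bitness. Concretely, a 32-bit *.so or *.dll file only works with 32-bit Python, and likewise for 64-bit.

The compile commands above seemed to produce 32-bit files by default on my machine. When I added -m64, gcc failed with an error, while the command without -m64, or with -m32, worked (I did not investigate why).

Someone on Stack Overflow said that the VS2015 x64 native tools can be used to generate 64-bit *.so and *.dll files. I preferred to simply use a 32-bit Python: download one and swap it in for the existing interpreter in the IDE settings.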
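Before deciding which kind of library to build, it helps to check the bitness of the interpreter that will load it. The following is a minimal sketch using only the standard library; the helper name python_bitness is made up for this illustration.

```python
import platform
import struct
import sys


def python_bitness():
    """Return 32 or 64 depending on the pointer size of the running interpreter."""
    # struct.calcsize("P") is the size in bytes of a C pointer (void *).
    return struct.calcsize("P") * 8


bits = python_bitness()
print(f"This Python is {bits}-bit")

# sys.maxsize is a second hint: it exceeds 2**32 only on 64-bit builds.
print("64-bit according to sys.maxsize:", sys.maxsize > 2**32)

# platform.architecture() reports a string such as '32bit' or '64bit'.
print("platform.architecture():", platform.architecture()[0])
```

A shared library compiled for the other bitness will be rejected when CDLL() tries to load it, so this check is worth running first.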

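1. Basic types

There are a few things to watch out for with ctypes, and the first is the basic types (the correspondence between ctypes, C, and Python types). The official documentation is long and tiring to read, so pulling out this table should make things easier to understand.

I will cover how to use them later; for now it is enough to keep in mind that type conversion is needed.

This part is taken from the ctypes section of the documentation, ctypes — A foreign function library for Python.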

ctypes type	C type	Python type
c_bool	_Bool	bool
c_char	char	1-character bytes object
c_wchar	wchar_t	1-character string
c_byte	char	int
c_ubyte	unsigned char	int
c_short	short	int
c_ushort	unsigned short	int
c_int	int	int
c_uint	unsigned int	int
c_long	long	int
c_ulong	unsigned long	int
c_longlong	__int64 or long long	int
c_ulonglong	unsigned __int64 or unsigned long long	int
c_size_t	size_t	int
c_ssize_t	ssize_t or Py_ssize_t	int
c_float	float	float
c_double	double	float
c_longdouble	long double	float
c_char_p	char * (NUL terminated)	bytes object or None
c_wchar_p	wchar_t * (NUL terminated)	string or None
c_void_p	void *	int or None

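The left column is the ctypes type, the middle column is the corresponding C type, and the right column is the corresponding Python type. When using ctypes you only need to pay attention to the mapping between the two left columns. Python is flexible, so converting between its own types is easy.

2. String encoding

C strings (char *) are plain byte sequences, typically ASCII or UTF-8, while Python 3 strings are Unicode text. So when passing content such as strings between the two languages, the data has to be re-encoded: encode to bytes before calling into C, and decode any bytes that come back.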
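To make the table a little more concrete, here is a small sketch of how a few of these types wrap Python values. It needs no compiled library; the variable names are only for illustration.

```python
from ctypes import c_char_p, c_double, c_float, c_int, sizeof

# Wrap a Python int in a C int; .value converts it back.
n = c_int(42)
print(n.value)

# Both C float and C double map to Python float,
# but c_float only keeps single precision.
x = c_double(4.1)
y = c_float(4.1)
print(x.value, y.value)

# c_char_p holds a NUL-terminated byte string, so it takes bytes, not str.
s = c_char_p(b"hello")
print(s.value)

# sizeof() reports the size of the underlying C type in bytes.
print(sizeof(c_float), sizeof(c_double))
```

The precision loss in c_float is the same effect as the 9.6000003... result in the add_float example earlier.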

For example:

void readFile(char *fileName){
    FILE *fp;
    char ch;
    fp = fopen(fileName, "r");
    if (fp == NULL){
        printf("[ERROR] FILE OPEN FAILED!!");
        exit(-1);
    }
    ...
    fclose(fp);
}

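To call this function from Python, first encode the string as UTF-8 or ASCII. c_char_p() converts the value to the corresponding C type; this will be explained in more detail later. Many problems come from skipping the type conversions in part 1, and the IDE does little checking of ctypes calls (at least PyCharm does not), so extra care is needed.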

c_filename = c_char_p(self.filename.encode('utf-8'))
file.readFile(c_filename)
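The encode and decode round trip can be tried without any C library. This is a sketch only; the file name below is a made-up example, not one from the project.

```python
from ctypes import c_char_p

filename = "数据/replay.txt"  # hypothetical path containing non-ASCII characters

# Python str -> bytes: this is what gets handed to C as a char *.
encoded = filename.encode("utf-8")
c_filename = c_char_p(encoded)

# The bytes a C function would see, and the way back to a Python str.
raw = c_filename.value
decoded = raw.decode("utf-8")
print(len(filename), "characters ->", len(encoded), "bytes")

# Passing a str directly is rejected in Python 3.
try:
    c_char_p(filename)
    str_accepted = True
except TypeError:
    str_accepted = False
print("str accepted by c_char_p:", str_accepted)
```

If the C side expects a different encoding than UTF-8, the same pattern applies with that encoding name.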

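3. Structures

If your C file defines a struct, you need to define a matching structure as a class in your Python file.

Here is a simple example; for other field types, refer to the conversion table in part 1.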

typedef struct Pages{
    int capacity;                    // capacity
    int load;                        // load
    char page[MAX_CAPACITY];         // curPage
    int pagePointer;                 // pointer(CLOCK)
    int pageTime[MAX_CAPACITY];      // exist time(FIFO/LRU)
    int postPageTime[MAX_CAPACITY];  // next time to use(OPT)
}Pages;

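The corresponding structure in Python is shown below. The array length 8 must match the value of MAX_CAPACITY in the C source: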

class Pages(Structure):
    _fields_ = [("capacity", c_int), ("load", c_int), ("page", c_char * 8),
                ("pagePointer", c_int), ("pageTime", c_int * 8), ("postPageTime", c_int * 8)]
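As a quick way to check such a definition, the structure can be instantiated and inspected from Python alone. The sketch below repeats the class with one field per line and assumes MAX_CAPACITY is 8, as the array lengths above suggest.

```python
from ctypes import Structure, c_char, c_int, sizeof

MAX_CAPACITY = 8  # assumption: must equal the C macro of the same name


class Pages(Structure):
    # Field order and types must mirror the C struct exactly.
    _fields_ = [
        ("capacity", c_int),
        ("load", c_int),
        ("page", c_char * MAX_CAPACITY),
        ("pagePointer", c_int),
        ("pageTime", c_int * MAX_CAPACITY),
        ("postPageTime", c_int * MAX_CAPACITY),
    ]


pages = Pages()          # all fields start out zeroed
pages.capacity = 3
pages.page = b"ABC"      # char arrays are assigned from bytes
pages.pageTime[0] = 7    # int arrays are indexed like lists

print(pages.capacity, pages.load, pages.page, list(pages.pageTime))
print("struct size in bytes:", sizeof(Pages))
```

Comparing sizeof(Pages) with sizeof(Pages) printed from the C side is a cheap sanity check that both layouts agree.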

4. Pointers

If you use pointers, a type conversion step is needed there as well.

void init_pages(Pages *pages, Info *info, PagesHistory *pagesHistory){
    ...
    }
}

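In Python, the structure instances then have to be passed by reference at the call site, for example with byref() (or pointer() when a real pointer object is needed). Note that POINTER() builds a pointer type from a ctypes type; it is meant for type declarations, not for wrapping an instance.

pages_replacement.init_pages(byref(self.pages), byref(self.info), byref(self.pages_history))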
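The three pointer helpers are easy to mix up, so here is a small sketch of how they relate, using a plain c_int instead of the project's structures.

```python
from ctypes import POINTER, byref, c_int, pointer

value = c_int(10)

# POINTER(c_int) is a pointer *type*, the counterpart of "int *" in C.
IntPtr = POINTER(c_int)

# pointer(value) is a pointer *instance* that points at value.
ptr = pointer(value)
print(isinstance(ptr, IntPtr))

# Writing through the pointer changes the original object.
ptr.contents.value = 25
print(value.value)

# byref(value) is a lightweight reference that is only useful
# as an argument in a foreign function call.
ref = byref(value)
print(type(ref).__name__)
```

In short: POINTER() for declarations such as argtypes, byref() or pointer() for actual arguments.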

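5. Declaring return and argument types

That said, I recommend declaring a C function's types before using it, which keeps the code more uniform.

For example, the Python call above can also be written like this: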

class Test:
  def __init__(self):
    # Instantiate the three structures
    self.pages = Pages()
    self.pages_history = PagesHistory()
    self.info = Info()

    # Declare the return type of init_pages (void maps to None; the default c_int also seems to work)
    pages_replacement.init_pages.restype = None
    # Declare the argument types of init_pages (PyCharm underlines argtypes, but this is the correct spelling)
    pages_replacement.init_pages.argtypes = [POINTER(Pages), POINTER(Info), POINTER(PagesHistory)]
    # Call the function
    pages_replacement.init_pages(self.pages, self.info, self.pages_history)
    ...
    # Call the function again
    pages_replacement.init_pages(self.pages, self.info, self.pages_history)

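Note: the return type is declared with restype and the argument types with argtypes. One is singular, the other plural.

This way, the return and argument types are declared once before the function is used. Otherwise every call has to convert its arguments explicitly, which is both ugly and error-prone.

One more example: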

void readReplayFile(PagesHistory *pagesHistory, char *fileName){
    FILE *fp;
    char str[MAX_CAPACITY*MAX_LENGTH];
    fp = fopen(fileName, "r");
    if (fp == NULL){
        printf("[ERROR] FILE OPEN FAILED!!");
        exit(-1);
    }
    ...
}

file.readReplayFile.restype = None
file.readReplayFile.argtypes = [POINTER(PagesHistory), c_char_p]
c_filename = c_char_p(self.filename.encode('utf-8'))
file.readReplayFile(self.pages_history, c_filename)
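The effect of declared argument types can be tried without compiling anything. In the sketch below a Python callback wrapped with CFUNCTYPE stands in for a C function of the form void reset_info(Info *info); the function name and the Info layout are both hypothetical, since the real Info struct is not shown here.

```python
from ctypes import CFUNCTYPE, POINTER, ArgumentError, Structure, byref, c_int


class Info(Structure):
    # Hypothetical layout, for illustration only.
    _fields_ = [("hits", c_int), ("misses", c_int)]


# Prototype equivalent to: void reset_info(Info *info)
# The first argument is the return type (None for void), the rest are argument types.
ResetProto = CFUNCTYPE(None, POINTER(Info))


def _reset_info(info_ptr):
    # Stand-in for the C implementation: zero the counters through the pointer.
    info_ptr.contents.hits = 0
    info_ptr.contents.misses = 0


reset_info = ResetProto(_reset_info)

info = Info(hits=5, misses=2)
reset_info(info)            # with POINTER(Info) declared, the instance is passed by reference
print(info.hits, info.misses)

info.hits = 9
reset_info(byref(info))     # an explicit byref() is accepted as well
print(info.hits, info.misses)

# A wrong argument is rejected before the call reaches the function.
try:
    reset_info(42)
    rejected = False
except ArgumentError:
    rejected = True
print("wrong argument rejected:", rejected)
```

A function loaded with CDLL behaves the same way once its argtypes and restype attributes are set: conversions happen automatically and mismatched arguments raise ArgumentError instead of silently corrupting memory.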

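6. Lost data

While looking things up over these few days, I saw many people mention that memory is sometimes freed on the C side, causing data to be lost. The common fix on Stack Overflow is to allocate the memory dynamically with malloc in C.

Fortunately, I did not run into this myself. I mention it here briefly, mostly as a note for the record.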
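A different approach from the malloc fix mentioned above, offered here only as a sketch, is to let Python own the memory: allocate a buffer with create_string_buffer() and have the C function write into it. In the example, ctypes.memmove stands in for such a C function, and the payload is made up.

```python
from ctypes import create_string_buffer, memmove, sizeof

# 32 zeroed bytes owned by Python; they stay valid as long as buf is referenced.
buf = create_string_buffer(32)

# Stand-in for a C function that fills a caller-supplied buffer.
payload = b"page data"
memmove(buf, payload, len(payload))

print(buf.value)     # bytes up to the first NUL
print(sizeof(buf))   # total capacity of the buffer
```

Because the buffer belongs to a Python object, it is not released behind Python's back when the C function returns; it only goes away once the last Python reference to it is dropped.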