The Zen of Python♂, line 3
Simple is better than complex.
In Python, each character of a string takes up 8 bits. (This is the str type of CPython 2; the sizes below come from a 64-bit build.)
>>> import sys
>>> sys.getsizeof('')
37
>>> sys.getsizeof('a')
38
As you can see, the empty string takes 37 bytes and the length-1 string 'a' takes 38 bytes. Adding the single character a adds exactly 1 byte.
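The same one-byte-per-character growth can be checked for longer strings. This is a minimal sketch, assuming CPython. It uses bytes literals, which are the same type as str in Python 2 and its closest equivalent in Python 3. Absolute sizes depend on the version and platform, so only the differences from the empty string are compared. per_char_growth is a hypothetical helper name, not something from the original post.

```python
import sys

def per_char_growth(max_len=8):
    """Return the size increase in bytes over the empty string, for lengths 1..max_len."""
    base = sys.getsizeof(b'')  # fixed overhead of an empty byte string
    return [sys.getsizeof(b'a' * n) - base for n in range(1, max_len + 1)]

growth = per_char_growth()
print(growth)  # on CPython each extra character costs one more byte
```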
Inside Python, string is implemented like this (
http://svn.python.org/projects/python/trunk/Include/stringobject.h)
typedef struct {
    PyObject_VAR_HEAD
    long ob_shash;
    int ob_sstate;
    char ob_sval[1];
    /* Invariants:
     *     ob_sval contains space for 'ob_size+1' elements.
     *     ob_sval[ob_size] == 0.
     *     ob_shash is the hash of the string or -1 if not computed yet.
     *     ob_sstate != 0 iff the string object is in stringobject.c's
     *       'interned' dictionary; in this case the two references
     *       from 'interned' to this object are *not counted* in ob_refcnt.
     */
} PyStringObject;
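To see where the fixed overhead of the struct above comes from, its layout can be mirrored with ctypes. This is only a sketch under assumptions: PyStringObjectLayout is a hypothetical class written for this arithmetic, and it assumes that PyObject_VAR_HEAD expands to a reference count, a type pointer and a length, as it does in a non-debug build. The numbers depend on the platform's C type sizes.

```python
import ctypes

class PyStringObjectLayout(ctypes.Structure):
    # Hypothetical mirror of PyStringObject, used for layout arithmetic only.
    _fields_ = [
        # Assumed expansion of PyObject_VAR_HEAD in a non-debug build:
        ("ob_refcnt", ctypes.c_ssize_t),
        ("ob_type", ctypes.c_void_p),
        ("ob_size", ctypes.c_ssize_t),
        # Fields declared explicitly in the struct:
        ("ob_shash", ctypes.c_long),
        ("ob_sstate", ctypes.c_int),
        ("ob_sval", ctypes.c_char * 1),
    ]

# Number of bytes that precede the character data.
header = PyStringObjectLayout.ob_sval.offset
print(header)  # 36 on a typical 64-bit Linux/macOS build

def expected_size(n):
    # header + n characters + the trailing NUL required by the invariants
    return header + n + 1

print(expected_size(0), expected_size(1))
```

On a 64-bit Linux build this gives 37 and 38, matching the sys.getsizeof output shown earlier.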
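Each char is stored in ob_sval and takes 8 bits. The fixed overhead in front of it, 36 bytes on a 64-bit build, comes from the macro PyObject_VAR_HEAD (reference count, type pointer and length, 24 bytes) plus ob_shash and ob_sstate (12 bytes). One more byte goes to the trailing NUL required by the invariants, which is why the empty string reports 37 bytes. The string implementation also uses a file-level static variable called interned, a dictionary that holds interned strings. Strings of length 0 or 1, that is, single chars, are interned automatically and shared, which saves space and speeds things up.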
/* This dictionary holds all interned strings. Note that references to
strings in this dictionary are *not* counted in the string's ob_refcnt.
When the interned string reaches a refcnt of 0 the string deallocation
function will delete the reference from this dictionary.
Another way to look at this is that to say that the actual reference
count of a string is: s->ob_refcnt + (s->ob_sstate?2:0)
*/
static PyObject *interned;
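The effect of the interned dictionary can be observed from Python code. This is a minimal sketch, assuming Python 3, where the function is spelled sys.intern (in Python 2 it is the builtin intern). The strings are built at runtime so that the compiler has no chance to merge them first.

```python
import sys

# Build two equal strings at runtime from the same pieces.
parts = ["inter", "ned ", "string"]
a = "".join(parts)
b = "".join(parts)
print(a == b)    # True: same contents

# After interning, equal strings map to one shared object.
ia = sys.intern(a)
ib = sys.intern(b)
print(ia is ib)  # True: both names refer to the interned copy
```

Comparing interned strings with is only checks a pointer, which is where the speed gain comes from.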
In fact, Python code sees neither pointers nor "bare" data structures (non-objects). Even the simplest integer is implemented like this
typedef struct {
    PyObject_HEAD
    long ob_ival;
} PyIntObject;
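The cost of wrapping an integer in an object header can be seen by comparing the size of a bare C long with the size of a Python int. This is a rough sketch, assuming CPython. Note that in Python 3 the int type is an arbitrary-precision object rather than the PyIntObject shown above, but it still carries an object header.

```python
import struct
import sys

n = 42
raw_long = struct.calcsize("l")  # size of a bare C long on this platform
boxed = sys.getsizeof(n)         # size of the whole Python int object

print(type(n))                   # the int has a type, like any other object
print(isinstance(n, object))     # True
print(raw_long, boxed)           # the object is larger than the bare value
```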
All in all, this design fits Python's philosophy that "everything is an ob♂ject♂" and "everything should be as simple as possible".