bytes

related file

  • cpython/Objects/bytesobject.c
  • cpython/Include/bytesobject.h
  • cpython/Objects/clinic/bytesobject.c.h

memory layout

[figure: memory layout of PyBytesObject]

The memory layout of PyBytesObject resembles the memory layouts of the tuple object and the int object, but it is simpler than either of them.
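The layout can be inspected from Python with a minimal ctypes sketch. It assumes a standard (non-free-threaded) CPython build, where the object header is ob_refcnt followed by ob_type, and where PyBytesObject still declares ob_shash (deprecated since 3.11 but still present). The `PyBytesObject` class and the `peek` helper are hand-written, hypothetical mirrors, not an API exported by CPython, and using `id()` as an address is a CPython implementation detail.

```python
import ctypes
import sys


class PyBytesObject(ctypes.Structure):
    """Hand-written mirror of CPython's PyBytesObject (assumed layout)."""
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),  # PyObject_HEAD: reference count
        ("ob_type", ctypes.c_void_p),     # PyObject_HEAD: pointer to the type
        ("ob_size", ctypes.c_ssize_t),    # PyObject_VAR_HEAD: number of bytes
        ("ob_shash", ctypes.c_ssize_t),   # cached hash, -1 if not computed
        ("ob_sval", ctypes.c_char * 1),   # first byte of the inline data
    ]


def peek(b: bytes):
    """Return (ob_size, ob_shash, ob_sval bytes including the trailing NUL).

    The caller must keep `b` alive while this runs; id(b) is only a
    memory address on CPython.
    """
    header = PyBytesObject.from_address(id(b))
    data_addr = id(b) + PyBytesObject.ob_sval.offset
    raw = ctypes.string_at(data_addr, header.ob_size + 1)
    return header.ob_size, header.ob_shash, raw


size, shash, raw = peek(b"abcdefg123")
print(size, raw)  # 10 b'abcdefg123\x00'

# sys.getsizeof counts the header up to ob_sval, the data, and the NUL byte
print(sys.getsizeof(b"") == PyBytesObject.ob_sval.offset + 1)  # True
```

The data is stored inline right after the header, and ob_sval always holds ob_size + 1 bytes so that the last one is a NUL terminator.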

example

empty bytes

A bytes object is immutable: whenever you need to modify one, you have to create a new bytes object instead. This keeps the implementation simple.

s = b""

[figure: memory layout of b""]
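Immutability is visible from plain Python: operations that look like modifications return new objects, and item assignment is rejected. A small sketch:

```python
s = b""
t = s + b"abc"            # concatenation allocates a brand-new bytes object
print(s, t)               # b'' b'abc': the original is untouched

try:
    t[0] = ord("x")       # in-place modification is not supported
except TypeError as exc:
    print("TypeError:", exc)

u = t.replace(b"a", b"x")  # "modifying" methods also return new objects
print(t, u)                # b'abc' b'xbc'
```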

ascii characters

Let's initialize a bytes object with ASCII characters.

s = b"abcdefg123"

[figure: memory layout of b"abcdefg123"]
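The ASCII example can be probed from Python. The `sys.getsizeof` numbers are CPython implementation details, but they reflect the inline storage shown above: each additional byte of data grows the object by exactly one byte.

```python
import sys

s = b"abcdefg123"
print(len(s))         # 10, read straight from ob_size
print(s[0], s[-1])    # 97 51: indexing yields the integer value of a byte

# Data lives inline after the header, so the size difference from the
# empty bytes object equals the number of data bytes.
print(sys.getsizeof(s) - sys.getsizeof(b""))  # 10
```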

nonascii characters

s = "我是帅哥".encode("utf8")  # "I am handsome"

[figure: memory layout of the UTF-8 encoded non-ASCII bytes]
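A bytes object knows nothing about characters; ob_size counts bytes after encoding. A short check of the non-ASCII example:

```python
text = "我是帅哥"                 # 4 CJK characters
s = text.encode("utf8")
print(len(text), len(s))         # 4 12: each of these characters is 3 bytes in UTF-8
print(s[:3])                     # b'\xe6\x88\x91', the UTF-8 encoding of "我"
print(s.decode("utf8") == text)  # True: decoding restores the original str
```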

summary

ob_shash

The field ob_shash stores the hash value of the bytes object; a value of -1 means the hash has not been computed yet.

The first time the hash value is computed, it is cached in the ob_shash field.

The cached value avoids recomputing the hash and speeds up dictionary lookups.
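The caching scheme can be modelled in pure Python. `CachedHashBytes` is a hypothetical class that only mimics the -1 sentinel logic; it delegates the actual hashing to the built-in `hash()` as a stand-in for CPython's internal byte-hashing function. Because Python's `hash()` never returns -1, -1 is safe to use as the "not computed yet" marker.

```python
class CachedHashBytes:
    """Hypothetical pure-Python model of the ob_shash caching scheme."""

    __slots__ = ("data", "ob_shash", "computations")

    def __init__(self, data: bytes) -> None:
        self.data = bytes(data)
        self.ob_shash = -1     # -1 means "hash not computed yet"
        self.computations = 0  # counts real hash computations (demo only)

    def __hash__(self) -> int:
        if self.ob_shash == -1:
            self.computations += 1
            # Stand-in for the real hashing routine; hash() never returns -1.
            self.ob_shash = hash(self.data)
        return self.ob_shash

    def __eq__(self, other: object) -> bool:
        return isinstance(other, CachedHashBytes) and self.data == other.data


key = CachedHashBytes(b"abcdefg123")
table = {key: 1}
for _ in range(3):
    table[key]            # every lookup calls __hash__ again
print(key.computations)   # 1: computed once, then served from the cache
```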

ob_size

The field ob_size is part of every PyVarObject. PyBytesObject uses it to store the number of bytes, which makes len() an O(1) operation and lets a bytes object hold arbitrary binary data, including embedded null bytes, whose end could not be found by scanning for the null terminator.
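To see why the length has to be stored explicitly, compare `len()` with how C-style code would read the same buffer. `ctypes.c_char_p(...).value` stops at the first NUL byte, the way `strlen`-based C code does:

```python
import ctypes

data = b"ab\x00cd"                  # binary data with an embedded NUL byte
print(len(data))                    # 5: len() reads ob_size, no scanning

# Treated as a C string, the buffer appears to end at the first NUL.
print(ctypes.c_char_p(data).value)  # b'ab'
```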

summary

PyBytesObject is essentially a Python wrapper around a C-style null-terminated char array, with ob_shash caching the hash value and ob_size storing the length.

The implementation of PyBytesObject resembles the embstr encoding in Redis: in both, the object header and the string data live in a single contiguous allocation.

redis-cli
127.0.0.1:6379> set a "hello"
OK
127.0.0.1:6379> object encoding a
"embstr"