Oh-Numpy
1. Import NumPy
import numpy as np
2. Print the NumPy version and its configuration
np.__version__
'2.2.2'
np.show_config()
Build Dependencies: blas: detection method: pkgconfig found: true include directory: /opt/_internal/cpython-3.13.0/lib/python3.13/site-packages/scipy_openblas64/include lib directory: /opt/_internal/cpython-3.13.0/lib/python3.13/site-packages/scipy_openblas64/lib name: scipy-openblas openblas configuration: OpenBLAS 0.3.28 USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell MAX_THREADS=64 pc file directory: /project/.openblas version: 0.3.28 lapack: detection method: pkgconfig found: true include directory: /opt/_internal/cpython-3.13.0/lib/python3.13/site-packages/scipy_openblas64/include lib directory: /opt/_internal/cpython-3.13.0/lib/python3.13/site-packages/scipy_openblas64/lib name: scipy-openblas openblas configuration: OpenBLAS 0.3.28 USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell MAX_THREADS=64 pc file directory: /project/.openblas version: 0.3.28 Compilers: c: commands: cc linker: ld.bfd name: gcc version: 10.2.1 c++: commands: c++ linker: ld.bfd name: gcc version: 10.2.1 cython: commands: cython linker: cython name: cython version: 3.0.11 Machine Information: build: cpu: x86_64 endian: little family: x86_64 system: linux host: cpu: x86_64 endian: little family: x86_64 system: linux Python Information: path: /tmp/build-env-6dq_gol7/bin/python version: '3.13' SIMD Extensions: baseline: - SSE - SSE2 - SSE3 found: - SSSE3 - SSE41 - POPCNT - SSE42 - AVX - F16C - FMA3 - AVX2 not found: - AVX512F - AVX512CD - AVX512_KNL - AVX512_KNM - AVX512_SKX - AVX512_CLX - AVX512_CNL - AVX512_ICL
3. Create a vector of size 10
Note that this simply creates an np.ndarray.
z = np.zeros(10)
z
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
4. Compute the memory size of an array
z = np.zeros((10, 10))
print("%d bytes" % (z.size * z.itemsize))
800 bytes
This simply multiplies the total number of elements, z.size (100), by the memory each element occupies, z.itemsize (8 bytes).
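As a quick cross-check, arrays also expose an nbytes attribute that reports the same product directly. A minimal sketch (the name manual is only for illustration):

```python
import numpy as np

z = np.zeros((10, 10))

# size * itemsize is the number of bytes taken by the array elements
manual = z.size * z.itemsize

# nbytes reports the same quantity without the manual multiplication
print(manual, z.nbytes)
```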
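5. Print the documentation of NumPy's add function from the command line
This is mostly a question about invoking Python from the command line. Starting from python --help, we find that python -c fits our needs, so we only need to run python -c "import numpy; numpy.info(numpy.add)" on the command line. This is equivalent to running the following in the Python interpreter: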
import numpy
numpy.info(numpy.add)
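Beyond that, there are several ways to view NumPy documentation: np.info(np.add) as above, help(np.add), and the less common print(np.add.__doc__), which prints essentially the same text as np.info. In general np.info is sufficient and convenient. In IPython, you can simply type np.add? and press Enter.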
6. Create a vector of size 10 whose fifth value is 1 and all others are 0
z = np.zeros(10)
z[4] = 1
z
array([0., 0., 0., 0., 1., 0., 0., 0., 0., 0.])
7. Create a vector containing all integers from 10 to 49
z = np.arange(10, 50)
z
array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49])
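Python's built-in range offers similar functionality. The comparison below, taken from the Scipy Lecture Notes, shows that squaring an array built with arange is far faster than a list comprehension over range: the array keeps its elements in one contiguous typed buffer, and the arithmetic runs in compiled code rather than in a Python-level loop.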
%timeit [i**2 for i in range(1000)]
36.6 μs ± 266 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
%timeit a = np.arange(1000) ** 2
1.23 μs ± 16.5 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
8. Reverse a vector
z = np.arange(10)
z
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
z = z[::-1]
z
array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
9. Create a 3x3 matrix containing the numbers 0 to 8
z = np.arange(0, 9).reshape(3, 3)
z
array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
We can extend the original exercise here. The implementation above fills 0 to 8 in row order; what if we want 0 to 8 arranged in column order?
# column order
z = np.arange(0, 9).reshape(3, 3).T
z
array([[0, 3, 6], [1, 4, 7], [2, 5, 8]])
We just need to transpose the original matrix :-)
10. Find the positions of the nonzero elements in [1,2,0,0,4,0]
z = np.array([1,2,0,0,4,0])
z.nonzero()
(array([0, 1, 4]),)
11. Create a 3x3 identity matrix
z = np.eye(3)
z
array([[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])
12. Create a 3x3x3 array filled with random values
z = np.random.random((3, 3, 3))
z
array([[[0.10123423, 0.21966112, 0.18899876], [0.11908867, 0.37073854, 0.131811 ], [0.36358605, 0.14836074, 0.09695158]], [[0.28505722, 0.0198312 , 0.82655425], [0.29247706, 0.92982139, 0.96068727], [0.34935797, 0.98729162, 0.57438858]], [[0.0751635 , 0.05223692, 0.92594158], [0.23688837, 0.21482642, 0.66818089], [0.06488248, 0.73560372, 0.68136474]]])
13. Create a 10x10 array filled with random values and find its maximum and minimum
z = np.random.random((10, 10))
z
array([[0.44113 , 0.3808509 , 0.02746347, 0.96999726, 0.60104135, 0.81909443, 0.46735998, 0.27213177, 0.30070838, 0.45319489], [0.56033826, 0.49942057, 0.07625198, 0.1580878 , 0.23466143, 0.24319792, 0.18267022, 0.9957254 , 0.09175414, 0.58720205], [0.83122233, 0.85223567, 0.68272392, 0.47054804, 0.1430139 , 0.63418961, 0.8363594 , 0.6349586 , 0.14426594, 0.82809394], [0.39554447, 0.70009108, 0.27523732, 0.46250024, 0.34815483, 0.08845787, 0.9116974 , 0.32023201, 0.68993804, 0.0167846 ], [0.7360231 , 0.94009083, 0.08273843, 0.06748668, 0.3254564 , 0.62978564, 0.41144829, 0.76430348, 0.31722187, 0.24191262], [0.42281886, 0.74644283, 0.54280632, 0.1105537 , 0.8243719 , 0.05425636, 0.28906517, 0.72102695, 0.02699448, 0.843014 ], [0.00782847, 0.35312571, 0.40465317, 0.63028924, 0.9963226 , 0.42614551, 0.54660927, 0.38908202, 0.82726585, 0.07093775], [0.01410549, 0.23904577, 0.54863807, 0.75826918, 0.57869667, 0.50594964, 0.97115857, 0.6493405 , 0.49316914, 0.59393411], [0.91357726, 0.94371939, 0.16684762, 0.02326755, 0.24744347, 0.85828464, 0.73675366, 0.09264827, 0.23496345, 0.94535806], [0.89210238, 0.20623822, 0.95908579, 0.21091583, 0.5893621 , 0.44828595, 0.11474124, 0.64756439, 0.35082326, 0.40222248]])
z.max(), z.min()
(np.float64(0.996322599906688), np.float64(0.007828471058030417))
14. Create a random vector of size 10 and compute its mean
z = np.random.random((10))
z
array([0.47956777, 0.6263883 , 0.67986127, 0.58973534, 0.87198368, 0.75330813, 0.68956554, 0.60547743, 0.05354576, 0.8230607 ])
z.mean()
np.float64(0.6172493912958619)
15. Create a 2D array with 1 on the border and 0 inside
z = np.ones((5, 5))
z[1:-1, 1:-1] = 0
z
array([[1., 1., 1., 1., 1.], [1., 0., 0., 0., 1.], [1., 0., 0., 0., 1.], [1., 0., 0., 0., 1.], [1., 1., 1., 1., 1.]])
16. Surround an existing (nxn) array with a border of 0
z = np.ones((5, 5))
z
array([[1., 1., 1., 1., 1.], [1., 1., 1., 1., 1.], [1., 1., 1., 1., 1.], [1., 1., 1., 1., 1.], [1., 1., 1., 1., 1.]])
m = np.pad(z, (1, 1), mode='constant', constant_values=0)
m
array([[0., 0., 0., 0., 0., 0., 0.], [0., 1., 1., 1., 1., 1., 0.], [0., 1., 1., 1., 1., 1., 0.], [0., 1., 1., 1., 1., 1., 0.], [0., 1., 1., 1., 1., 1., 0.], [0., 1., 1., 1., 1., 1., 0.], [0., 0., 0., 0., 0., 0., 0.]])
17. What is the result of the following expressions
0 * np.nan
np.nan == np.nan
np.inf > np.nan
np.nan - np.nan
np.nan in set([np.nan])
0.3 == 3 * 0.1
0 * np.nan
nan
Any arithmetic operation involving np.nan returns np.nan; for the same reason, np.nan - np.nan is also nan.
np.nan == np.nan
False
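This is reasonable. Suppose we read two columns from a dataset and both are entirely np.nan. If the expression above were designed to return True, we would declare the two columns equal while knowing nothing about their contents, which is clearly unreasonable. So the result is False.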
np.inf > np.nan
False
Likewise, infinity cannot be ordered against a missing value: every ordering comparison involving nan returns False.
np.nan in set([np.nan])
True
0.3 == 3 * 0.1
False
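Because floating-point (float) arithmetic is subject to rounding error, we should not test floats for exact equality. NumPy provides np.allclose to test whether floating-point values are approximately equal; it also accepts np.ndarray arguments.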
np.allclose(0.3, 3 * 0.1)
True
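The True result of np.nan in set([np.nan]) may look surprising next to np.nan == np.nan being False. Python's containment test checks identity before equality, and np.nan is a single shared object. The sketch below uses lists for simplicity and contrasts that case with two distinct NaN objects (the names a and b are only for illustration):

```python
import numpy as np

# The same object is found by identity, even though nan != nan
same_object = np.nan in [np.nan]

# Two distinct NaN objects are neither identical nor equal
a = float("nan")
b = float("nan")
distinct_objects = a in [b]

print(same_object, distinct_objects)
```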
18. Create a 5x5 matrix with the values 1,2,3,4 just below the diagonal
z = np.diag(np.arange(1, 5), k=-1)
z
array([[0, 0, 0, 0, 0], [1, 0, 0, 0, 0], [0, 2, 0, 0, 0], [0, 0, 3, 0, 0], [0, 0, 0, 4, 0]])
19. Create an 8x8 matrix and fill it with 0 and 1 in a chessboard pattern
As shown below, the black squares are marked with 1.
z = np.zeros((8, 8))
z[1::2, ::2] = 1 # fill rows 2, 4, 6, 8
z[::2, 1::2] = 1 # fill rows 1, 3, 5, 7
z
array([[0., 1., 0., 1., 0., 1., 0., 1.], [1., 0., 1., 0., 1., 0., 1., 0.], [0., 1., 0., 1., 0., 1., 0., 1.], [1., 0., 1., 0., 1., 0., 1., 0.], [0., 1., 0., 1., 0., 1., 0., 1.], [1., 0., 1., 0., 1., 0., 1., 0.], [0., 1., 0., 1., 0., 1., 0., 1.], [1., 0., 1., 0., 1., 0., 1., 0.]])
20. Given an array of shape 6x7x8, find the index (x, y, z) of the 100th element
print(np.unravel_index(99, (6, 7, 8)))
(np.int64(1), np.int64(5), np.int64(3))
The above is the reference answer. I did not know this function at first and used the approach below, which may serve as a reference.
z = np.arange(6*7*8).reshape(6, 7, 8)
np.where(z == 99)
(array([1]), array([5]), array([3]))
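This finds the position with NumPy, but the arithmetic behind it is not shown, so here is a brief explanation.
First, think of it this way: the later the dimension, the finer the granularity of the data. In the example above, the 6x7x8 block can be built as follows: lay out all 6 * 7 * 8 values in a single row, then group every 8 values into a "strip", giving 6*7 strips; next, stack every 7 strips vertically to tile a plane; this gives 6 planes, and stacking those 6 planes yields the final block.
So where is the 100th element? Its zero-based flat index is 99, and we locate it starting from the coarsest dimension. Each plane holds 7*8=56 elements, so 99 // 56 = 1 puts it in the second plane, i.e. index 1 along the first axis, matching the returned array([1]). Within that plane, 99 % 56 = 43 elements precede it. Each row holds 8 elements, so 43 // 8 = 5 puts it in the sixth row, i.e. index 5 along the second axis, matching array([5]). Finally, 43 % 8 = 3 elements precede it in that row, so the index along the last axis is 3, matching array([3]). The final index is therefore (1, 5, 3).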
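The arithmetic above can be sketched with divmod and compared against np.unravel_index. The variable names are chosen only for illustration:

```python
import numpy as np

shape = (6, 7, 8)
flat = 99  # zero-based position of the 100th element

# Peel off one axis at a time, from the coarsest to the finest
i, rest = divmod(flat, shape[1] * shape[2])  # 56 elements per plane
j, k = divmod(rest, shape[2])                # 8 elements per row

print((i, j, k))
print(np.unravel_index(flat, shape))
```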
21. Create an 8x8 checkerboard with the tile function
unit = np.array([[0, 1], [1, 0]])
z = np.tile(unit, (4, 4))
z
array([[0, 1, 0, 1, 0, 1, 0, 1], [1, 0, 1, 0, 1, 0, 1, 0], [0, 1, 0, 1, 0, 1, 0, 1], [1, 0, 1, 0, 1, 0, 1, 0], [0, 1, 0, 1, 0, 1, 0, 1], [1, 0, 1, 0, 1, 0, 1, 0], [0, 1, 0, 1, 0, 1, 0, 1], [1, 0, 1, 0, 1, 0, 1, 0]])
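tile literally means laying tiles, which is an apt picture of what the function does: we divide the 8x8 board into 4x4=16 "tiles" (unit here) and lay them side by side.
22. Normalize a 5x5 random matrix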
z = np.random.random((5, 5))
z = (z - z.mean()) / z.std()
z
array([[-0.70910491, 0.48982157, 1.3361218 , 1.02789447, 0.3519378 ], [-1.62182557, -1.47935268, 1.11603484, 0.63989904, 1.26060013], [ 1.1193641 , -1.26279274, -1.39294185, 0.3160582 , -1.38386977], [ 0.57000585, -1.36694524, 1.28394106, -0.39278706, -0.84427307], [ 0.64713634, 0.83918882, -0.75341648, 0.15040405, 0.05890133]])
23. Create a custom dtype that describes a color as four unsigned bytes (RGBA)
color = np.dtype([("r", np.ubyte, 1),
("g", np.ubyte, 1),
("b", np.ubyte, 1),
("a", np.ubyte, 1)])
color
dtype([('r', 'u1', (1,)), ('g', 'u1', (1,)), ('b', 'u1', (1,)), ('a', 'u1', (1,))])
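In the output above, the trailing 1 in each field makes NumPy treat the field as a subarray of shape (1,). If plain scalar fields are wanted instead, one option is to drop the shape, as in this sketch (the sample pixel value is made up for illustration):

```python
import numpy as np

# Four scalar unsigned-byte fields, with no per-field shape
color = np.dtype([("r", np.ubyte),
                  ("g", np.ubyte),
                  ("b", np.ubyte),
                  ("a", np.ubyte)])

# A single RGBA pixel stored with this dtype
pixel = np.array((255, 128, 0, 255), dtype=color)
print(color.itemsize, pixel["g"])
```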
24. Multiply a 5x3 matrix by a 3x2 matrix
z = np.dot(np.ones((5, 3)), np.ones((3, 2)))
z
array([[3., 3.], [3., 3.], [3., 3.], [3., 3.], [3., 3.]])
# the @ operator also works
z = np.ones((5, 3)) @ np.ones((3, 2))
z
array([[3., 3.], [3., 3.], [3., 3.], [3., 3.], [3., 3.]])
25. Given a 1D array, negate all elements strictly between 3 and 8
z = np.arange(10)
z[(z > 3) & (z < 8)] *= -1
z
array([ 0, 1, 2, 3, -4, -5, -6, -7, 8, 9])
26. What is the output of the following script
# Author: Jake VanderPlas
print(sum(range(5),-1))
from numpy import *
print(sum(range(5),-1))
sum(range(5), -1)
9
This uses Python's built-in sum, whose second argument is the start value added to the total, so the result is 10 - 1 = 9.
np.sum(range(5), -1)
np.int64(10)
This uses NumPy's np.sum, where -1 is not a number to be added but the axis argument: -1 selects the last axis, which for a 1D input is the only axis. See help(np.sum) for details.
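To keep the two functions apart without relying on from numpy import *, the sketch below calls each one explicitly (builtins is the standard-library module that holds Python's own sum):

```python
import builtins
import numpy as np

# Built-in sum: the second argument is the start value
python_result = builtins.sum(range(5), -1)   # 0+1+2+3+4 + (-1)

# np.sum: the second positional argument is axis; -1 is the last axis
numpy_result = np.sum(range(5), -1)

print(python_result, numpy_result)
```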
27. z is a vector of integers; which of the following expressions are legal
z = np.arange(10)
z
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
# 1
z ** z
array([ 1, 1, 4, 27, 256, 3125, 46656, 823543, 16777216, 387420489])
Arithmetic operators such as ** act element-wise on arrays; for example, 387420489 here equals 9**9.
# 2
2 << z >> 2
array([ 0, 1, 2, 4, 8, 16, 32, 64, 128, 256])
This is two shift operations in sequence, i.e. (2 << z) >> 2. Let us take it apart.
2 << z
array([ 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024])
This shifts 2 left by 0, 1, 2, ..., 9 bits, giving 2 << 0, 2 << 1, ..., 2 << 9, as shown below:
part1 = [2 << i for i in range(10)]
part1
[2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]
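This matches the output of 2 << z.
Next comes the right shift. The difference is that it applies to every element of the array 2 << z, shifting each one right by 2 bits. Shifting a value x right by 2 is the same as x // 4, so we continue from part1: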
[i // 4 for i in part1]
[0, 1, 2, 4, 8, 16, 32, 64, 128, 256]
This matches the output of 2 << z >> 2.
# 3
z <- z
array([False, False, False, False, False, False, False, False, False, False])
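This is mainly a matter of operator precedence. Pick any operator, such as <, and run help("<") to see the precedence table for all operators. Unary - binds more tightly than <, so the expression is equivalent to z < (-z), as the test below confirms: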
z < (-z)
array([False, False, False, False, False, False, False, False, False, False])
# 4
1j * z
array([0.+0.j, 0.+1.j, 0.+2.j, 0.+3.j, 0.+4.j, 0.+5.j, 0.+6.j, 0.+7.j, 0.+8.j, 0.+9.j])
NumPy supports complex arithmetic.
# 5
z/1/1
array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
That is, (z/1)/1.
# 6
try:
z < z > z
except Exception as e:
print(e)
The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Here I consulted the Python documentation on expressions and Stack Overflow, and found:
Formally, if a, b, c, …, y, z are expressions and op1, op2, …, opN are comparison operators, then a op1 b op2 c ... y opN z is equivalent to a op1 b and b op2 c and ... y opN z, except that each expression is evaluated at most once.
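In other words, a chained comparison in a single expression, such as x < y <= z, is valid syntax and is equivalent to x < y and y <= z, except that the repeated operand (y here) is evaluated only once. So the same expression on native Python lists returns normally, as shown below.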
l1 = [1, 2]
l1 < l1 > l1
False
(l1 < l1) and (l1 > l1)
False
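Since l1 < l1 and l1 > l1 are both False, their and is also False. For our z, which is an np.ndarray, the situation is different.
Here z < z > z is still evaluated as z < z and z > z. Both z < z and z > z evaluate without trouble, and each result is an array of length z.size whose elements are all booleans.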
z < z
array([False, False, False, False, False, False, False, False, False, False])
z > z
array([False, False, False, False, False, False, False, False, False, False])
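What fails is the and between them. To apply and, Python needs a single truth value for each operand, but NumPy cannot tell whether an array such as array([False, False, ...]) should count as False or True, because there are two ways to define the truth value of an array. One is all: the array is True only if every element is True, otherwise False. The other is any: the array is True as soon as one element is True, otherwise False. This ambiguity is why NumPy raises an error and suggests using any or all.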
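If the element-wise meaning is what we want, the chained comparison can be spelled out with &, and the result can then be reduced explicitly with any or all. A small sketch (the name both is only for illustration):

```python
import numpy as np

z = np.arange(10)

# Element-wise version of "z < z and z > z"
both = (z < z) & (z > z)

# Explicit reductions remove the ambiguity about the array's truth value
print(both.any(), both.all())
```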
28. What is the result of the following expressions
np.array(0) / np.array(0)
np.array(0) // np.array(0)
np.array([np.nan]).astype(int).astype(float)
np.array(0) / np.array(0)
/tmp/ipykernel_73994/873513115.py:1: RuntimeWarning: invalid value encountered in divide np.array(0) / np.array(0)
np.float64(nan)
Returns nan, with a warning that an invalid value was encountered during true division (true_divide), namely 0 divided by 0.
np.array(0) // np.array(0)
/tmp/ipykernel_73994/2018018105.py:1: RuntimeWarning: divide by zero encountered in floor_divide np.array(0) // np.array(0)
np.int64(0)
Returns 0, with a warning that a division by zero was encountered during floor division (floor_divide).
np.array([np.nan]).astype(int).astype(float)
/tmp/ipykernel_73994/699728972.py:1: RuntimeWarning: invalid value encountered in cast np.array([np.nan]).astype(int).astype(float)
array([-9.22337204e+18])
29. Round a float array away from zero
That is, -0.3, -0.5, -0.6 and so on round to -1 rather than 0; 0.3, 0.5, 0.6 and so on round to 1 rather than 0.
z = np.random.uniform(-10, 10, 10)
z
array([ 2.72380024, -5.97068664, 1.02848316, -6.37947006, -8.47874052, 5.51427231, -9.2120279 , -6.1678368 , -9.5128257 , 8.92989504])
np.copysign(np.ceil(np.abs(z)), z)
array([ 3., -6., 2., -7., -9., 6., -10., -7., -10., 9.])
30. Find the common values of two arrays
~We first look for a built-in function; since we do not know whether a suitable one exists, we can use np.lookfor to find it.~
np.lookfor was removed in the NumPy 2.0 release. Search NumPy's documentation directly.
np.info(np.intersect1d)
Find the intersection of two arrays. Return the sorted, unique values that are in both of the input arrays. Parameters ---------- ar1, ar2 : array_like Input arrays. Will be flattened if not already 1D. assume_unique : bool If True, the input arrays are both assumed to be unique, which can speed up the calculation. If True but ``ar1`` or ``ar2`` are not unique, incorrect results and out-of-bounds indices could result. Default is False. return_indices : bool If True, the indices which correspond to the intersection of the two arrays are returned. The first instance of a value is used if there are multiple. Default is False. Returns ------- intersect1d : ndarray Sorted 1D array of common and unique elements. comm1 : ndarray The indices of the first occurrences of the common values in `ar1`. Only provided if `return_indices` is True. comm2 : ndarray The indices of the first occurrences of the common values in `ar2`. Only provided if `return_indices` is True. Examples -------- >>> import numpy as np >>> np.intersect1d([1, 3, 4, 3], [3, 1, 2, 1]) array([1, 3]) To intersect more than two arrays, use functools.reduce: >>> from functools import reduce >>> reduce(np.intersect1d, ([1, 3, 4, 3], [3, 1, 2, 1], [6, 3, 4, 2])) array([3]) To return the indices of the values common to the input arrays along with the intersected values: >>> x = np.array([1, 1, 2, 3, 4]) >>> y = np.array([2, 1, 4, 6]) >>> xy, x_ind, y_ind = np.intersect1d(x, y, return_indices=True) >>> x_ind, y_ind (array([0, 2, 4]), array([1, 0, 2])) >>> xy, x[x_ind], y[y_ind] (array([1, 2, 4]), array([1, 2, 4]), array([1, 2, 4]))
Following the documentation, we can use it directly.
z1 = np.arange(-5, 5)
z2 = np.arange(10)
np.intersect1d(z1, z2)
array([0, 1, 2, 3, 4])
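31. How to ignore all NumPy warnings (not recommended)
# Reckless mode on :-)
defaults = np.seterr(all="ignore")
Z = np.ones(1) / 0
# Back to sanity
_ = np.seterr(**defaults)
# Error handling can also be configured in detail with a context manager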
with np.errstate(divide='warn'):
Z = np.ones(1) / 0
/tmp/ipykernel_73994/1125215274.py:3: RuntimeWarning: divide by zero encountered in divide Z = np.ones(1) / 0
32. Does the following expression return True
np.sqrt(-1) == np.emath.sqrt(-1)
np.sqrt(-1) == np.emath.sqrt(-1)
/tmp/ipykernel_73994/244602691.py:1: RuntimeWarning: invalid value encountered in sqrt np.sqrt(-1) == np.emath.sqrt(-1)
np.False_
np.sqrt(-1), np.emath.sqrt(-1)
/tmp/ipykernel_73994/178168484.py:1: RuntimeWarning: invalid value encountered in sqrt np.sqrt(-1), np.emath.sqrt(-1)
(np.float64(nan), np.complex128(1j))
33. How to get the dates of yesterday, today and tomorrow
yesterday = np.datetime64('today', 'D') - np.timedelta64(1, 'D')
today = np.datetime64('today', 'D')
tomorrow = np.datetime64('today', 'D') + np.timedelta64(1, 'D')
yesterday, today, tomorrow
(np.datetime64('2025-01-22'), np.datetime64('2025-01-23'), np.datetime64('2025-01-24'))
34. How to get all 31 dates of July 2016
z = np.arange('2016-07', '2016-08', dtype='datetime64[D]')
z
array(['2016-07-01', '2016-07-02', '2016-07-03', '2016-07-04', '2016-07-05', '2016-07-06', '2016-07-07', '2016-07-08', '2016-07-09', '2016-07-10', '2016-07-11', '2016-07-12', '2016-07-13', '2016-07-14', '2016-07-15', '2016-07-16', '2016-07-17', '2016-07-18', '2016-07-19', '2016-07-20', '2016-07-21', '2016-07-22', '2016-07-23', '2016-07-24', '2016-07-25', '2016-07-26', '2016-07-27', '2016-07-28', '2016-07-29', '2016-07-30', '2016-07-31'], dtype='datetime64[D]')
35. How to compute $((A+B)*(-A/2))$ in place (without copying)
A = np.ones(3)*1
B = np.ones(3)*2
C = np.ones(3)*3
np.add(A,B,out=B)
np.divide(A,2,out=A)
np.negative(A,out=A)
np.multiply(A,B,out=A)
array([-1.5, -1.5, -1.5])
36. Extract the integer part of a random array using 5 different methods
z = np.random.uniform(0, 10, 10)
z
array([8.02607027, 7.46437046, 7.66450121, 3.82822738, 5.41068636, 3.32256939, 9.27552892, 5.33239052, 7.91256662, 7.92753186])
# 1
z - z % 1
array([8., 7., 7., 3., 5., 3., 9., 5., 7., 7.])
# 2
np.floor(z)
array([8., 7., 7., 3., 5., 3., 9., 5., 7., 7.])
# 3
np.ceil(z) - 1
array([8., 7., 7., 3., 5., 3., 9., 5., 7., 7.])
# 4
z.astype(int)
array([8, 7, 7, 3, 5, 3, 9, 5, 7, 7])
# 5
np.trunc(z)
array([8., 7., 7., 3., 5., 3., 9., 5., 7., 7.])
37. Create a 5x5 matrix whose rows each contain the values 0 to 4
# 1, the reference answer
z = np.zeros((5, 5))
z += np.arange(5)
z
array([[0., 1., 2., 3., 4.], [0., 1., 2., 3., 4.], [0., 1., 2., 3., 4.], [0., 1., 2., 3., 4.], [0., 1., 2., 3., 4.]])
# 2, using tile
z = np.tile(np.arange(5), (5, 1))
z
array([[0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4]])
38. Given a generator function that yields 10 integers, use it to build an array
# 1, the reference answer
def gen():
for i in range(10):
yield i
z = np.fromiter(gen(), dtype=float, count=-1)
z
array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
# 2, list comprehension
z = np.array([i for i in gen()])
z
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
39. Create a vector of size 10 with values ranging from 0 to 1, both excluded
z = np.linspace(0, 1, 11, endpoint=False)[1:]
z
array([0.09090909, 0.18181818, 0.27272727, 0.36363636, 0.45454545, 0.54545455, 0.63636364, 0.72727273, 0.81818182, 0.90909091])
40. Create a random vector of size 10 and sort it
z = np.random.random(10)
z
array([0.65281861, 0.32809737, 0.3265718 , 0.56394927, 0.65133119, 0.33415774, 0.12370651, 0.80815425, 0.25217611, 0.77599015])
z.sort()
z
array([0.12370651, 0.25217611, 0.3265718 , 0.32809737, 0.33415774, 0.56394927, 0.65133119, 0.65281861, 0.77599015, 0.80815425])
41. How to sum a small array faster than np.sum
z = np.arange(10)
np.add.reduce(z)
np.int64(45)
%timeit np.add.reduce(z)
800 ns ± 3.49 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
%timeit np.sum(z)
1.6 μs ± 15.4 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Here np.add.reduce is roughly twice as fast.
42. Check whether two arrays A and B are equal
A = np.random.randint(0,2,5)
B = np.random.randint(0,2,5)
# 1, assuming A and B have the same shape
# allows a tolerance, suitable for comparing floats
np.allclose(A, B)
True
# 2, checks both shape and values
# requires the values to be exactly equal
np.array_equal(A, B)
True
43. Make an array immutable (read-only)
z = np.zeros(10)
z.flags
C_CONTIGUOUS : True F_CONTIGUOUS : True OWNDATA : True WRITEABLE : True ALIGNED : True WRITEBACKIFCOPY : False
z.flags.writeable = False
z.flags
C_CONTIGUOUS : True F_CONTIGUOUS : True OWNDATA : True WRITEABLE : False ALIGNED : True WRITEBACKIFCOPY : False
try:
z[0] = 1
except Exception as e:
print(e)
assignment destination is read-only
44. Given a 10x2 matrix of Cartesian coordinates, convert them to polar coordinates
z = np.random.random((10, 2))
x, y = z[:, 0], z[:, 1]
r = np.sqrt(x**2 + y**2)
theta = np.arctan2(y, x)
r, theta
(array([0.33820331, 1.31494561, 0.69826993, 0.65504453, 1.16197209, 0.76539786, 0.79805555, 1.24695648, 0.53250039, 0.57669093]), array([0.47338591, 0.80078892, 1.27664152, 1.31267719, 1.00810859, 0.39820197, 0.12497058, 0.76005997, 0.32861008, 1.12416952]))
45. Create a random vector of size 10 and replace its maximum value with 0
z = np.random.random(10)
z
array([0.00997928, 0.95496242, 0.63923037, 0.27144055, 0.19121816, 0.72670271, 0.63900711, 0.95916728, 0.52407859, 0.25870044])
z[z.argmax()] = 0
z
array([0.00997928, 0.95496242, 0.63923037, 0.27144055, 0.19121816, 0.72670271, 0.63900711, 0. , 0.52407859, 0.25870044])
46. Create a structured array with x and y coordinates covering the [0,1]x[0,1] area
z = np.zeros((5, 5), [('x', float), ('y', float)])
z = np.meshgrid(np.linspace(0, 1, 5),
np.linspace(0, 1, 5))
z
(array([[0. , 0.25, 0.5 , 0.75, 1. ], [0. , 0.25, 0.5 , 0.75, 1. ], [0. , 0.25, 0.5 , 0.75, 1. ], [0. , 0.25, 0.5 , 0.75, 1. ], [0. , 0.25, 0.5 , 0.75, 1. ]]), array([[0. , 0. , 0. , 0. , 0. ], [0.25, 0.25, 0.25, 0.25, 0.25], [0.5 , 0.5 , 0.5 , 0.5 , 0.5 ], [0.75, 0.75, 0.75, 0.75, 0.75], [1. , 1. , 1. , 1. , 1. ]]))
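Note that the cell above rebinds z to the tuple returned by np.meshgrid, so the structured array created on the first line is discarded. A sketch that keeps the structured array and fills its fields instead, assuming the same 5x5 grid:

```python
import numpy as np

# Structured array with two float fields per grid point
z = np.zeros((5, 5), dtype=[("x", float), ("y", float)])

# meshgrid returns the x and y coordinate grids; store each in its field
z["x"], z["y"] = np.meshgrid(np.linspace(0, 1, 5),
                             np.linspace(0, 1, 5))

# First row of x values and first column of y values
print(z["x"][0], z["y"][:, 0])
```

Each element of z is then an (x, y) record, so z[i, j]["x"] and z[i, j]["y"] give the coordinates of grid point (i, j).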
47. Given two arrays X and Y, construct the Cauchy matrix C and compute its determinant
$$C_{ij} = \frac{1}{x_i-y_j}$$
x = np.arange(8)
y = x + 0.5
C = 1 / np.subtract.outer(x, y)
C
array([[-2. , -0.66666667, -0.4 , -0.28571429, -0.22222222, -0.18181818, -0.15384615, -0.13333333], [ 2. , -2. , -0.66666667, -0.4 , -0.28571429, -0.22222222, -0.18181818, -0.15384615], [ 0.66666667, 2. , -2. , -0.66666667, -0.4 , -0.28571429, -0.22222222, -0.18181818], [ 0.4 , 0.66666667, 2. , -2. , -0.66666667, -0.4 , -0.28571429, -0.22222222], [ 0.28571429, 0.4 , 0.66666667, 2. , -2. , -0.66666667, -0.4 , -0.28571429], [ 0.22222222, 0.28571429, 0.4 , 0.66666667, 2. , -2. , -0.66666667, -0.4 ], [ 0.18181818, 0.22222222, 0.28571429, 0.4 , 0.66666667, 2. , -2. , -0.66666667], [ 0.15384615, 0.18181818, 0.22222222, 0.28571429, 0.4 , 0.66666667, 2. , -2. ]])
np.linalg.det(C)
np.float64(3638.163637117973)
In fact np.subtract.outer amounts to broadcasting here, so we can also write it as follows.
C_test = 1 / (x.reshape(8, 1)- y.reshape(1, 8))
C_test
array([[-2. , -0.66666667, -0.4 , -0.28571429, -0.22222222, -0.18181818, -0.15384615, -0.13333333], [ 2. , -2. , -0.66666667, -0.4 , -0.28571429, -0.22222222, -0.18181818, -0.15384615], [ 0.66666667, 2. , -2. , -0.66666667, -0.4 , -0.28571429, -0.22222222, -0.18181818], [ 0.4 , 0.66666667, 2. , -2. , -0.66666667, -0.4 , -0.28571429, -0.22222222], [ 0.28571429, 0.4 , 0.66666667, 2. , -2. , -0.66666667, -0.4 , -0.28571429], [ 0.22222222, 0.28571429, 0.4 , 0.66666667, 2. , -2. , -0.66666667, -0.4 ], [ 0.18181818, 0.22222222, 0.28571429, 0.4 , 0.66666667, 2. , -2. , -0.66666667], [ 0.15384615, 0.18181818, 0.22222222, 0.28571429, 0.4 , 0.66666667, 2. , -2. ]])
np.linalg.det(C_test)
np.float64(3638.163637117973)
# check that both approaches return the same C
np.array_equal(C, C_test)
True
48. Print the minimum and maximum representable value for each NumPy scalar type
for dtype in [np.int8, np.int16, np.int32, np.int64]:
info = np.iinfo(dtype)
print(f"{dtype}: min={info.min}, max={info.max}")
for dtype in [np.float16, np.float32, np.float64,]:
info = np.finfo(dtype)
print(f"{dtype}: min={info.min}, max={info.max}")
<class 'numpy.int8'>: min=-128, max=127 <class 'numpy.int16'>: min=-32768, max=32767 <class 'numpy.int32'>: min=-2147483648, max=2147483647 <class 'numpy.int64'>: min=-9223372036854775808, max=9223372036854775807 <class 'numpy.float16'>: min=-65504.0, max=65504.0 <class 'numpy.float32'>: min=-3.4028234663852886e+38, max=3.4028234663852886e+38 <class 'numpy.float64'>: min=-1.7976931348623157e+308, max=1.7976931348623157e+308
49. Print all the values of an array (without truncation)
with np.printoptions(threshold=np.inf):
z = np.ones((10, 10))
print(z)
[[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]]
50. Given a value, find the closest value in an array
# the given array
z = np.random.uniform(0, 1, 10)
z
array([0.97025996, 0.36085926, 0.09509649, 0.43063634, 0.85445517, 0.44517651, 0.9316781 , 0.47078064, 0.80017135, 0.17822901])
# the given value
x = 0.5
# locate the index of the closest value
index = np.abs(z - x).argmin()
# retrieve that value
z[index]
np.float64(0.47078064496366556)
51. Create a structured array representing a position (x, y) and a color (r, g, b)
z = np.zeros(10, [('position', [('x', float, 1),
('y', float, 1)]),
('color', [('r', float, 1),
('g', float, 1),
('b', float, 1)])
]
)
z
array([(([0.], [0.]), ([0.], [0.], [0.])), (([0.], [0.]), ([0.], [0.], [0.])), (([0.], [0.]), ([0.], [0.], [0.])), (([0.], [0.]), ([0.], [0.], [0.])), (([0.], [0.]), ([0.], [0.], [0.])), (([0.], [0.]), ([0.], [0.], [0.])), (([0.], [0.]), ([0.], [0.], [0.])), (([0.], [0.]), ([0.], [0.], [0.])), (([0.], [0.]), ([0.], [0.], [0.])), (([0.], [0.]), ([0.], [0.], [0.]))], dtype=[('position', [('x', '<f8', (1,)), ('y', '<f8', (1,))]), ('color', [('r', '<f8', (1,)), ('g', '<f8', (1,)), ('b', '<f8', (1,))])])
52. Consider a random array of shape (10, 2) representing points in the plane; find the pairwise distances between the points
z = np.random.random((10, 2))
x, y = np.atleast_2d(z[:, 0], z[:, 1])
d = np.sqrt((x - x.T)**2 + (y - y.T)**2)
d
array([[0. , 0.36156554, 0.25792355, 0.69852624, 0.06046602, 0.55974987, 0.44745959, 0.51535957, 0.91050336, 0.68792257], [0.36156554, 0. , 0.51496789, 0.34198684, 0.31153682, 0.57121176, 0.69180151, 0.34077339, 0.74043802, 0.80160021], [0.25792355, 0.51496789, 0. , 0.80367147, 0.25447245, 0.39136883, 0.19261297, 0.49162044, 0.81302103, 0.44748816], [0.69852624, 0.34198684, 0.80367147, 0. , 0.64409442, 0.69889723, 0.95232077, 0.40096981, 0.64195424, 0.96127519], [0.06046602, 0.31153682, 0.25447245, 0.64409442, 0. , 0.51617982, 0.44708542, 0.45590656, 0.85391212, 0.66273654], [0.55974987, 0.57121176, 0.39136883, 0.69889723, 0.51617982, 0. , 0.39188871, 0.29794784, 0.44238147, 0.26334318], [0.44745959, 0.69180151, 0.19261297, 0.95232077, 0.44708542, 0.39188871, 0. , 0.59452006, 0.83401915, 0.32480842], [0.51535957, 0.34077339, 0.49162044, 0.40096981, 0.45590656, 0.29794784, 0.59452006, 0. , 0.4130361 , 0.56045551], [0.91050336, 0.74043802, 0.81302103, 0.64195424, 0.85391212, 0.44238147, 0.83401915, 0.4130361 , 0. , 0.62292175], [0.68792257, 0.80160021, 0.44748816, 0.96127519, 0.66273654, 0.26334318, 0.32480842, 0.56045551, 0.62292175, 0. ]])
Using np.atleast_2d makes x and y 2D arrays directly, so that broadcasting works in the following line. We could instead replace that line with the reshapes below, at the cost of brevity:
x = z[:, 0].reshape(10, 1)
y = z[:, 1].reshape(1, 10)
We can also use SciPy's built-in function, which is more efficient.
import scipy
import scipy.spatial
d = scipy.spatial.distance.cdist(z, z)
d
array([[0. , 0.36156554, 0.25792355, 0.69852624, 0.06046602, 0.55974987, 0.44745959, 0.51535957, 0.91050336, 0.68792257], [0.36156554, 0. , 0.51496789, 0.34198684, 0.31153682, 0.57121176, 0.69180151, 0.34077339, 0.74043802, 0.80160021], [0.25792355, 0.51496789, 0. , 0.80367147, 0.25447245, 0.39136883, 0.19261297, 0.49162044, 0.81302103, 0.44748816], [0.69852624, 0.34198684, 0.80367147, 0. , 0.64409442, 0.69889723, 0.95232077, 0.40096981, 0.64195424, 0.96127519], [0.06046602, 0.31153682, 0.25447245, 0.64409442, 0. , 0.51617982, 0.44708542, 0.45590656, 0.85391212, 0.66273654], [0.55974987, 0.57121176, 0.39136883, 0.69889723, 0.51617982, 0. , 0.39188871, 0.29794784, 0.44238147, 0.26334318], [0.44745959, 0.69180151, 0.19261297, 0.95232077, 0.44708542, 0.39188871, 0. , 0.59452006, 0.83401915, 0.32480842], [0.51535957, 0.34077339, 0.49162044, 0.40096981, 0.45590656, 0.29794784, 0.59452006, 0. , 0.4130361 , 0.56045551], [0.91050336, 0.74043802, 0.81302103, 0.64195424, 0.85391212, 0.44238147, 0.83401915, 0.4130361 , 0. , 0.62292175], [0.68792257, 0.80160021, 0.44748816, 0.96127519, 0.66273654, 0.26334318, 0.32480842, 0.56045551, 0.62292175, 0. ]])
53. Convert a 32-bit float array into a 32-bit integer array (without using extra memory)
z = np.zeros(10, dtype=np.float32)
z
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)
z = z.astype(np.int32, copy=False)
z
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)
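Note that astype with copy=False still allocates a new array when the dtype changes; copy=False only avoids the copy when no conversion is needed. A sketch of a conversion that reuses the original buffer, relying on float32 and int32 having the same item size, is to assign through an integer view (the sample values are chosen for illustration):

```python
import numpy as np

z = np.arange(10, dtype=np.float32) * 1.5

# Reinterpret the same buffer as int32; no data is copied here
y = z.view(np.int32)

# Assigning casts each float to int and writes it into the shared buffer
y[:] = z

print(y, np.shares_memory(y, z))
```

After the assignment, z itself no longer holds meaningful floats, because its bytes now store the integer values.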
54. How to read the following file
1, 2, 3, 4, 5
6, , , 7, 8
, , 9,10,11
from io import StringIO
# a "fake" file
s = StringIO("""1, 2, 3, 4, 5\n
6, , , 7, 8\n
, , 9,10,11\n""")
z = np.genfromtxt(s, delimiter=",", missing_values=' ')
z
array([[ 1., 2., 3., 4., 5.], [ 6., nan, nan, 7., 8.], [nan, nan, 9., 10., 11.]])
55. Python has the built-in enumerate; what is the NumPy equivalent?
Z = np.arange(9).reshape(3,3)
for index, value in np.ndenumerate(Z):
print(index, value)
(0, 0) 0 (0, 1) 1 (0, 2) 2 (1, 0) 3 (1, 1) 4 (1, 2) 5 (2, 0) 6 (2, 1) 7 (2, 2) 8
for index in np.ndindex(Z.shape):
print(index, Z[index])
(0, 0) 0 (0, 1) 1 (0, 2) 2 (1, 0) 3 (1, 1) 4 (1, 2) 5 (2, 0) 6 (2, 1) 7 (2, 2) 8
56. Generate a 2D Gaussian-like array
X, Y = np.meshgrid(np.linspace(-1,1,10), np.linspace(-1,1,10))
D = np.sqrt(X*X+Y*Y)
sigma, mu = 1.0, 0.0
G = np.exp(-( (D-mu)**2 / ( 2.0 * sigma**2 ) ) )
G
array([[0.36787944, 0.44822088, 0.51979489, 0.57375342, 0.60279818, 0.60279818, 0.57375342, 0.51979489, 0.44822088, 0.36787944], [0.44822088, 0.54610814, 0.63331324, 0.69905581, 0.73444367, 0.73444367, 0.69905581, 0.63331324, 0.54610814, 0.44822088], [0.51979489, 0.63331324, 0.73444367, 0.81068432, 0.85172308, 0.85172308, 0.81068432, 0.73444367, 0.63331324, 0.51979489], [0.57375342, 0.69905581, 0.81068432, 0.89483932, 0.9401382 , 0.9401382 , 0.89483932, 0.81068432, 0.69905581, 0.57375342], [0.60279818, 0.73444367, 0.85172308, 0.9401382 , 0.98773022, 0.98773022, 0.9401382 , 0.85172308, 0.73444367, 0.60279818], [0.60279818, 0.73444367, 0.85172308, 0.9401382 , 0.98773022, 0.98773022, 0.9401382 , 0.85172308, 0.73444367, 0.60279818], [0.57375342, 0.69905581, 0.81068432, 0.89483932, 0.9401382 , 0.9401382 , 0.89483932, 0.81068432, 0.69905581, 0.57375342], [0.51979489, 0.63331324, 0.73444367, 0.81068432, 0.85172308, 0.85172308, 0.81068432, 0.73444367, 0.63331324, 0.51979489], [0.44822088, 0.54610814, 0.63331324, 0.69905581, 0.73444367, 0.73444367, 0.69905581, 0.63331324, 0.54610814, 0.44822088], [0.36787944, 0.44822088, 0.51979489, 0.57375342, 0.60279818, 0.60279818, 0.57375342, 0.51979489, 0.44822088, 0.36787944]])
57. Randomly place p elements in a 2D array
n = 5
p = 3
z = np.zeros((n, n))
# replace=False ensures p distinct positions are chosen
np.put(z, np.random.choice(range(n*n), p, replace=False), 1)
z
array([[0., 0., 0., 0., 0.], [0., 0., 0., 0., 1.], [0., 0., 0., 0., 0.], [0., 0., 0., 1., 0.], [1., 0., 0., 0., 0.]])
58. Subtract the mean of each row of a matrix
# 1, the reference answer
z = np.arange(10).reshape(2, 5)
z
array([[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]])
z_new = z - z.mean(axis=1, keepdims=True)
z_new
array([[-2., -1., 0., 1., 2.], [-2., -1., 0., 1., 2.]])
Note that setting keepdims makes broadcasting straightforward and saves a manual reshape.
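To see why keepdims matters, the sketch below compares the shapes of the row means with and without it (the names flat_means and kept_means are only for illustration):

```python
import numpy as np

z = np.arange(10).reshape(2, 5)

# Without keepdims the row means have shape (2,), which does not
# broadcast against (2, 5)
flat_means = z.mean(axis=1)

# With keepdims the shape is (2, 1), which broadcasts across the columns
kept_means = z.mean(axis=1, keepdims=True)

print(flat_means.shape, kept_means.shape)
centered = z - kept_means
```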
We can also apply a centering function to each row of the matrix.
# 2, the apply approach
def centered(xs):
return xs - xs.mean()
z_new = np.apply_along_axis(centered, 1, z)
z_new
array([[-2., -1., 0., 1., 2.], [-2., -1., 0., 1., 2.]])
59. Sort an array by a given column
z = np.random.randint(0, 10, (3, 3))
z
array([[3, 5, 8], [3, 0, 3], [3, 5, 9]])
# sort by the second column
z[z[:, 1].argsort(), ]
array([[3, 0, 3], [3, 5, 8], [3, 5, 9]])
60. Tell whether a 2D array has null columns (all zeros)
z = np.random.randint(0, 3, (3, 10))
z
array([[0, 2, 2, 2, 2, 2, 0, 2, 0, 0], [2, 0, 0, 2, 2, 1, 1, 2, 1, 0], [0, 2, 0, 1, 0, 1, 2, 1, 0, 2]])
print((~z.any(axis=0)).any())
False
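Whenever there is a null column, z.any(axis=0) is False at that column, so ~z.any(axis=0) is True there, and applying any then necessarily returns True. Conversely, if there is no null column, ~z.any(axis=0) is False everywhere, and applying any still returns False.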
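Since the random array above happens to have no null column, a small hand-made array makes the True case visible. The array below is invented for illustration, with an all-zero middle column:

```python
import numpy as np

# Hand-made example: the middle column is all zeros
z = np.array([[1, 0, 2],
              [0, 0, 3]])

col_has_nonzero = z.any(axis=0)   # one boolean per column
null_columns = ~col_has_nonzero   # True where a column is all zeros

print(null_columns, null_columns.any())
```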
61. Given a value, find the closest value in an array
Same as exercise 50.