Oh-Numpy
1. Import NumPy
import numpy as np
2. Print the NumPy version and its configuration
np.__version__
'2.2.2'
np.show_config()
Build Dependencies: blas: detection method: pkgconfig found: true include directory: /opt/_internal/cpython-3.13.0/lib/python3.13/site-packages/scipy_openblas64/include lib directory: /opt/_internal/cpython-3.13.0/lib/python3.13/site-packages/scipy_openblas64/lib name: scipy-openblas openblas configuration: OpenBLAS 0.3.28 USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell MAX_THREADS=64 pc file directory: /project/.openblas version: 0.3.28 lapack: detection method: pkgconfig found: true include directory: /opt/_internal/cpython-3.13.0/lib/python3.13/site-packages/scipy_openblas64/include lib directory: /opt/_internal/cpython-3.13.0/lib/python3.13/site-packages/scipy_openblas64/lib name: scipy-openblas openblas configuration: OpenBLAS 0.3.28 USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell MAX_THREADS=64 pc file directory: /project/.openblas version: 0.3.28 Compilers: c: commands: cc linker: ld.bfd name: gcc version: 10.2.1 c++: commands: c++ linker: ld.bfd name: gcc version: 10.2.1 cython: commands: cython linker: cython name: cython version: 3.0.11 Machine Information: build: cpu: x86_64 endian: little family: x86_64 system: linux host: cpu: x86_64 endian: little family: x86_64 system: linux Python Information: path: /tmp/build-env-6dq_gol7/bin/python version: '3.13' SIMD Extensions: baseline: - SSE - SSE2 - SSE3 found: - SSSE3 - SSE41 - POPCNT - SSE42 - AVX - F16C - FMA3 - AVX2 not found: - AVX512F - AVX512CD - AVX512_KNL - AVX512_KNM - AVX512_SKX - AVX512_CLX - AVX512_CNL - AVX512_ICL
3. Create a vector of size 10
Note that this simply creates an np.ndarray.
z = np.zeros(10)
z
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
4. Compute the memory size of an array
z = np.zeros((10, 10))
print("%d bytes" % (z.size * z.itemsize))
800 bytes
Note that this simply multiplies the total number of elements, z.size (100), by the memory taken by a single element, z.itemsize (8 bytes).
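As a small cross-check of the multiplication above, the sketch below compares the manual product with the nbytes attribute of the array. The variable name manual is only illustrative.

```python
import numpy as np

z = np.zeros((10, 10))          # float64 by default, 8 bytes per element
manual = z.size * z.itemsize    # 100 elements * 8 bytes
print(manual, z.nbytes)         # both give the size of the data buffer in bytes
```

Note that this counts only the data buffer, not the small overhead of the array object itself.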
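5. Print the documentation of NumPy's add function from the command line
This is mainly a question of how to invoke Python from the command line. Starting from python --help, we find that python -c fits our needs, so it is enough to run python -c "import numpy; numpy.info(numpy.add)" on the command line. This is equivalent to running the following in the Python interpreter: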
import numpy
numpy.info(numpy.add)
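In addition, NumPy documentation can be viewed in several ways: np.info(np.add) as above, help(np.add), and the less commonly used docstring attribute, print(np.add.__doc__) (essentially the same text as np.info). In general np.info is sufficient and convenient. In IPython, you can simply type np.add? and press Enter.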
6. Create a vector of size 10 whose fifth value is 1 and all others are 0
z = np.zeros(10)
z[4] = 1
z
array([0., 0., 0., 0., 1., 0., 0., 0., 0., 0.])
7. Create a vector containing all integers from 10 to 49
z = np.arange(10, 50)
z
array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49])
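Note that Python also has the built-in function range with similar functionality. In the simple comparison below the NumPy version is much faster, mainly because the squaring runs as one vectorized operation over a contiguous array rather than as a Python-level loop. The example comes from the Scipy Lecture Notes.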
%timeit [i**2 for i in range(1000)]
36.6 μs ± 266 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
%timeit a = np.arange(1000) ** 2
1.23 μs ± 16.5 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
8. Reverse a vector
z = np.arange(10)
z
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
z = z[::-1]
z
array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
9. Create a 3x3 matrix containing the numbers 0 to 8
z = np.arange(0, 9).reshape(3, 3)
z
array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
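We can extend the original exercise here: in the implementation above, 0 to 8 are laid out in row order. What if we want them laid out in column order?
# column order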
z = np.arange(0, 9).reshape(3, 3).T
z
array([[0, 3, 6], [1, 4, 7], [2, 5, 8]])
All it takes is transposing the original matrix :-)
10. Find the positions of the non-zero elements in [1,2,0,0,4,0]
z = np.array([1,2,0,0,4,0])
z.nonzero()
(array([0, 1, 4]),)
11. Create a 3x3 identity matrix
z = np.eye(3)
z
array([[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])
12. Create a 3x3x3 array filled with random values
z = np.random.random((3, 3, 3))
z
array([[[0.10123423, 0.21966112, 0.18899876], [0.11908867, 0.37073854, 0.131811 ], [0.36358605, 0.14836074, 0.09695158]], [[0.28505722, 0.0198312 , 0.82655425], [0.29247706, 0.92982139, 0.96068727], [0.34935797, 0.98729162, 0.57438858]], [[0.0751635 , 0.05223692, 0.92594158], [0.23688837, 0.21482642, 0.66818089], [0.06488248, 0.73560372, 0.68136474]]])
13. Create a 10x10 array filled with random values and find its maximum and minimum
z = np.random.random((10, 10))
z
array([[0.44113 , 0.3808509 , 0.02746347, 0.96999726, 0.60104135, 0.81909443, 0.46735998, 0.27213177, 0.30070838, 0.45319489], [0.56033826, 0.49942057, 0.07625198, 0.1580878 , 0.23466143, 0.24319792, 0.18267022, 0.9957254 , 0.09175414, 0.58720205], [0.83122233, 0.85223567, 0.68272392, 0.47054804, 0.1430139 , 0.63418961, 0.8363594 , 0.6349586 , 0.14426594, 0.82809394], [0.39554447, 0.70009108, 0.27523732, 0.46250024, 0.34815483, 0.08845787, 0.9116974 , 0.32023201, 0.68993804, 0.0167846 ], [0.7360231 , 0.94009083, 0.08273843, 0.06748668, 0.3254564 , 0.62978564, 0.41144829, 0.76430348, 0.31722187, 0.24191262], [0.42281886, 0.74644283, 0.54280632, 0.1105537 , 0.8243719 , 0.05425636, 0.28906517, 0.72102695, 0.02699448, 0.843014 ], [0.00782847, 0.35312571, 0.40465317, 0.63028924, 0.9963226 , 0.42614551, 0.54660927, 0.38908202, 0.82726585, 0.07093775], [0.01410549, 0.23904577, 0.54863807, 0.75826918, 0.57869667, 0.50594964, 0.97115857, 0.6493405 , 0.49316914, 0.59393411], [0.91357726, 0.94371939, 0.16684762, 0.02326755, 0.24744347, 0.85828464, 0.73675366, 0.09264827, 0.23496345, 0.94535806], [0.89210238, 0.20623822, 0.95908579, 0.21091583, 0.5893621 , 0.44828595, 0.11474124, 0.64756439, 0.35082326, 0.40222248]])
z.max(), z.min()
(np.float64(0.996322599906688), np.float64(0.007828471058030417))
14. Create a random vector of size 10 and compute its mean
z = np.random.random((10))
z
array([0.47956777, 0.6263883 , 0.67986127, 0.58973534, 0.87198368, 0.75330813, 0.68956554, 0.60547743, 0.05354576, 0.8230607 ])
z.mean()
np.float64(0.6172493912958619)
15. Create a 2D array with 1 on the border and 0 inside
z = np.ones((5, 5))
z[1:-1, 1:-1] = 0
z
array([[1., 1., 1., 1., 1.], [1., 0., 0., 0., 1.], [1., 0., 0., 0., 1.], [1., 0., 0., 0., 1.], [1., 1., 1., 1., 1.]])
16. Surround an existing (n x n) array with a border of 0s
z = np.ones((5, 5))
z
array([[1., 1., 1., 1., 1.], [1., 1., 1., 1., 1.], [1., 1., 1., 1., 1.], [1., 1., 1., 1., 1.], [1., 1., 1., 1., 1.]])
m = np.pad(z, (1, 1), mode='constant', constant_values=0)
m
array([[0., 0., 0., 0., 0., 0., 0.], [0., 1., 1., 1., 1., 1., 0.], [0., 1., 1., 1., 1., 1., 0.], [0., 1., 1., 1., 1., 1., 0.], [0., 1., 1., 1., 1., 1., 0.], [0., 1., 1., 1., 1., 1., 0.], [0., 0., 0., 0., 0., 0., 0.]])
17. What is the result of the following expressions?
0 * np.nan
np.nan == np.nan
np.inf > np.nan
np.nan - np.nan
np.nan in set([np.nan])
0.3 == 3 * 0.1
0 * np.nan
nan
Arithmetic operations involving np.nan all return np.nan; the same holds for np.nan - np.nan.
np.nan == np.nan
False
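This is reasonable. Suppose we read two columns from a dataset and both consist entirely of np.nan. If the expression above returned True, we would conclude that the two columns are equal while knowing nothing about their values, which is clearly unreasonable. So it returns False.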
np.inf > np.nan
False
Likewise, infinity cannot be compared with a missing value.
np.nan in set([np.nan])
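True
Membership tests on a set check identity before equality, and np.nan is the same object on both sides, so this returns True even though np.nan == np.nan is False.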
0.3 == 3 * 0.1
False
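Because floating-point (float) arithmetic is inexact, we should not compare floats for equality directly. NumPy provides np.allclose to test whether floats are approximately equal; it also supports comparing np.ndarray values.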
np.allclose(0.3, 3 * 0.1)
True
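Since NaN never compares equal to itself, a dedicated test is needed to find it. The sketch below, on a made-up array a, shows np.isnan for element-wise detection, the equal_nan option of np.allclose, and math.isclose from the standard library for scalar comparisons.

```python
import math
import numpy as np

a = np.array([1.0, np.nan, 3.0])

nan_mask = np.isnan(a)                        # element-wise NaN test
strict = np.allclose(a, a)                    # False: NaN is not close to NaN by default
lenient = np.allclose(a, a, equal_nan=True)   # NaNs in matching positions count as equal
close = math.isclose(0.3, 3 * 0.1)            # tolerant scalar comparison
```

Whether NaNs in matching positions should count as equal depends on what the missing values mean in the data at hand.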
18. Create a 5x5 matrix with the values 1,2,3,4 just below the diagonal
z = np.diag(np.arange(1, 5), k=-1)
z
array([[0, 0, 0, 0, 0], [1, 0, 0, 0, 0], [0, 2, 0, 0, 0], [0, 0, 3, 0, 0], [0, 0, 0, 4, 0]])
19. Create an 8x8 matrix and fill it with 0s and 1s in a chessboard pattern
As shown below, the black squares are marked 1.
z = np.zeros((8, 8))
z[1::2, ::2] = 1 # fill rows 2, 4, 6, 8
z[::2, 1::2] = 1 # fill rows 1, 3, 5, 7
z
array([[0., 1., 0., 1., 0., 1., 0., 1.], [1., 0., 1., 0., 1., 0., 1., 0.], [0., 1., 0., 1., 0., 1., 0., 1.], [1., 0., 1., 0., 1., 0., 1., 0.], [0., 1., 0., 1., 0., 1., 0., 1.], [1., 0., 1., 0., 1., 0., 1., 0.], [0., 1., 0., 1., 0., 1., 0., 1.], [1., 0., 1., 0., 1., 0., 1., 0.]])
20. Given an array of shape 6x7x8, find the index (x, y, z) of the 100th element
print(np.unravel_index(99, (6, 7, 8)))
(np.int64(1), np.int64(5), np.int64(3))
The above is the reference answer. I did not know this function at first and used the approach below, which may serve as a reference.
z = np.arange(6*7*8).reshape(6, 7, 8)
np.where(z == 99)
(array([1]), array([5]), array([3]))
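NumPy locates the position for us, but the calculation itself is not shown, so here is a brief explanation.
First, think of it intuitively as "the later the dimension, the finer the granularity of the data". In the example above, the 6x7x8 cube can be seen as built as follows: lay all 6 * 7 * 8 values out in a single line, then group every 8 values into a "strip", which gives 6*7 strips; next, stack every 7 strips into a plane; this yields 6 planes, and stacking them gives the final "cube".
So where is the 100th element? Its zero-based flat index is 99, and we locate it starting from the coarsest dimension. Each layer holds 7*8=56 elements, so 99 // 56 = 1 puts it in the second layer, which gives index 1 along that dimension (the returned array([1])). Removing the 56 elements of the first layer leaves 99 - 56 = 43. Since each plane is 7x8, 43 // 8 = 5 puts it in the 6th row, which gives index 5 (the returned array([5])). The remainder 43 % 8 = 3 is the index within that row (the returned array([3])). The final index is therefore (1, 5, 3).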
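The layer, row and column reasoning can be written out with divmod. This is a minimal sketch for a C-ordered (row-major) array; the names flat, rest and manual are illustrative only.

```python
import numpy as np

shape = (6, 7, 8)
flat = 99    # the 100th element has zero-based flat index 99

# coarsest dimension first: each layer holds 7 * 8 = 56 elements
i, rest = divmod(flat, shape[1] * shape[2])
# inside the layer: each row holds 8 elements
j, k = divmod(rest, shape[2])

manual = (i, j, k)
expected = tuple(int(v) for v in np.unravel_index(flat, shape))
print(manual, expected)
```

Both tuples agree, which is the same index that np.unravel_index reports.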
21. Create an 8x8 chessboard using the tile function
unit = np.array([[0, 1], [1, 0]])
z = np.tile(unit, (4, 4))
z
array([[0, 1, 0, 1, 0, 1, 0, 1], [1, 0, 1, 0, 1, 0, 1, 0], [0, 1, 0, 1, 0, 1, 0, 1], [1, 0, 1, 0, 1, 0, 1, 0], [0, 1, 0, 1, 0, 1, 0, 1], [1, 0, 1, 0, 1, 0, 1, 0], [0, 1, 0, 1, 0, 1, 0, 1], [1, 0, 1, 0, 1, 0, 1, 0]])
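tile literally means laying tiles, which is a vivid metaphor for what the function does: here we divide the 8x8 board into 4x4=16 "tiles" (the unit above) and lay them side by side.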
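For comparison, the same pattern can also be derived from index parity instead of tiling. The sketch below assumes the convention used above (top-left square is 0) and checks the result against the tiled board.

```python
import numpy as np

rows, cols = np.indices((8, 8))    # row and column index of every cell
board = (rows + cols) % 2          # 1 where row + column is odd

tiled = np.tile(np.array([[0, 1], [1, 0]]), (4, 4))
print(np.array_equal(board, tiled))
```

The parity form generalizes to boards whose side is not a multiple of the tile size.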
22. Normalize a 5x5 random matrix
z = np.random.random((5, 5))
z = (z - z.mean()) / z.std()
z
array([[-0.70910491, 0.48982157, 1.3361218 , 1.02789447, 0.3519378 ], [-1.62182557, -1.47935268, 1.11603484, 0.63989904, 1.26060013], [ 1.1193641 , -1.26279274, -1.39294185, 0.3160582 , -1.38386977], [ 0.57000585, -1.36694524, 1.28394106, -0.39278706, -0.84427307], [ 0.64713634, 0.83918882, -0.75341648, 0.15040405, 0.05890133]])
23. Create a custom dtype that describes a color as four unsigned bytes (RGBA)
color = np.dtype([("r", np.ubyte, 1),
("g", np.ubyte, 1),
("b", np.ubyte, 1),
("a", np.ubyte, 1)])
color
dtype([('r', 'u1', (1,)), ('g', 'u1', (1,)), ('b', 'u1', (1,)), ('a', 'u1', (1,))])
24. Multiply a 5x3 matrix by a 3x2 matrix
z = np.dot(np.ones((5, 3)), np.ones((3, 2)))
z
array([[3., 3.], [3., 3.], [3., 3.], [3., 3.], [3., 3.]])
# the @ operator can also be used
z = np.ones((5, 3)) @ np.ones((3, 2))
z
array([[3., 3.], [3., 3.], [3., 3.], [3., 3.], [3., 3.]])
25. Given a 1D array, negate all elements strictly between 3 and 8
z = np.arange(10)
z[(z > 3) & (z < 8)] *= -1
z
array([ 0, 1, 2, 3, -4, -5, -6, -7, 8, 9])
26. What is the output of the following script?
# Author: Jake VanderPlas
print(sum(range(5),-1))
from numpy import *
print(sum(range(5),-1))
sum(range(5), -1)
9
This uses Python's built-in sum function, whose second argument is the start value that is added to the total, so the result is 10 - 1 = 9.
np.sum(range(5), -1)
np.int64(10)
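This uses NumPy's np.sum, where -1 is not a number to add but the value of the axis parameter, i.e. the axis along which to sum (-1 means the last axis). See help(np.sum) for details.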
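The difference between the two second arguments is easier to see on a 2D array. A minimal sketch, with a made-up matrix m and the built-in accessed explicitly through the builtins module so that no star import can shadow it:

```python
import builtins
import numpy as np

m = np.arange(6).reshape(2, 3)           # [[0, 1, 2], [3, 4, 5]]

py_total = builtins.sum(range(5), -1)    # start value -1: 0 + 1 + 2 + 3 + 4 - 1
np_last = np.sum(m, -1)                  # sum along the last axis: one value per row
np_first = np.sum(m, 0)                  # sum along axis 0: one value per column
print(py_total, np_last, np_first)
```

For the 1D input range(5) the last axis is the only axis, which is why np.sum(range(5), -1) simply returns 10.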
27. z is a vector of integers; determine whether the following expressions are legal
z = np.arange(10)
z
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
# 1
z ** z
array([ 1, 1, 4, 27, 256, 3125, 46656, 823543, 16777216, 387420489])
Arithmetic operations between arrays, such as **, are applied element by element (element-wise); for example, 387420489 here equals 9**9.
# 2
2 << z >> 2
array([ 0, 1, 2, 4, 8, 16, 32, 64, 128, 256])
This is essentially two shift operations, i.e. (2 << z) >> 2. Let us take it apart.
2 << z
array([ 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024])
That is, 2 is shifted left by 0, 1, 2, ..., 9 bits respectively, which gives 2 << 0, 2 << 1, ..., 2 << 9, as shown below:
part1 = [2 << i for i in range(10)]
part1
[2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]
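This matches the output of 2 << z.
Next comes the right shift. The difference is that here every element of the array 2 << z is shifted right by 2 positions. Shifting a value x right by 2 means taking x // 4, so we continue processing part1: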
[i // 4 for i in part1]
[0, 1, 2, 4, 8, 16, 32, 64, 128, 256]
This matches the output of 2 << z >> 2.
# 3
z <- z
array([False, False, False, False, False, False, False, False, False, False])
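The main point here is operator precedence. Pick any operator, e.g. <, and help("<") lists the precedence of all operators, ordered by default from lowest to highest. Unary minus (-) binds more tightly than <, so the expression is equivalent to z < (-z), as tested below: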
z < (-z)
array([False, False, False, False, False, False, False, False, False, False])
# 4
1j * z
array([0.+0.j, 0.+1.j, 0.+2.j, 0.+3.j, 0.+4.j, 0.+5.j, 0.+6.j, 0.+7.j, 0.+8.j, 0.+9.j])
NumPy supports arithmetic with complex numbers.
# 5
z/1/1
array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
That is, (z/1)/1.
# 6
try:
    z < z > z
except Exception as e:
    print(e)
The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
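Here I consulted the Python documentation on expressions and an answer on Stack Overflow, and found:
Formally, if a, b, c, …, y, z are expressions and op1, op2, …, opN are comparison operators, then a op1 b op2 c ... y opN z is equivalent to a op1 b and b op2 c and ... y opN z, except that each expression is evaluated at most once.
In other words, a chained comparison such as x < y <= z is syntactically valid and equivalent to x < y and y <= z, except that the repeated operand (here y) is evaluated only once. So the same chained comparison works fine on a native Python list, as the code below shows.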
l1 = [1, 2]
l1 < l1 > l1
False
(l1 < l1) and (l1 > l1)
False
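Since l1 < l1 and l1 > l1 are both False, their and is also False. For our z, which is an np.ndarray, the situation is different.
Here z < z > z is still evaluated as z < z and z > z. Both z < z and z > z evaluate normally, and each result is an array of length z.size whose elements are all booleans.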
z < z
array([False, False, False, False, False, False, False, False, False, False])
z > z
array([False, False, False, False, False, False, False, False, False, False])
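What fails is the and between the two. To apply and, Python needs a single truth value for an array such as array([False, False, ...]), and NumPy cannot tell whether it should be False or True, because there are two ways to define the boolean value of an array: all, where the array counts as True only if every element is True, and any, where the array counts as True as soon as one element is True. This ambiguity is why NumPy raises an error and suggests using any or all.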
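When an element-wise "and" of two comparisons is what is wanted, the usual replacement for a chained comparison is the & operator or np.logical_and. A small sketch with illustrative bounds 2 and 7:

```python
import numpy as np

z = np.arange(10)

both = (z > 2) & (z < 7)                # element-wise "and"; the parentheses are required
same = np.logical_and(z > 2, z < 7)     # equivalent function form

any_true = both.any()    # True if at least one element is True
all_true = both.all()    # True only if every element is True
print(z[both], any_true, all_true)
```

The parentheses matter because & binds more tightly than the comparison operators.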
28. What is the result of the following expressions?
np.array(0) / np.array(0)
np.array(0) // np.array(0)
np.array([np.nan]).astype(int).astype(float)
np.array(0) / np.array(0)
/tmp/ipykernel_73994/873513115.py:1: RuntimeWarning: invalid value encountered in divide np.array(0) / np.array(0)
np.float64(nan)
Returns nan, with a warning that an invalid value was encountered during true division (true_divide), i.e. 0 divided by 0.
np.array(0) // np.array(0)
/tmp/ipykernel_73994/2018018105.py:1: RuntimeWarning: divide by zero encountered in floor_divide np.array(0) // np.array(0)
np.int64(0)
Returns 0, with a warning that a division by zero was encountered during floor division (floor_divide).
np.array([np.nan]).astype(int).astype(float)
/tmp/ipykernel_73994/699728972.py:1: RuntimeWarning: invalid value encountered in cast np.array([np.nan]).astype(int).astype(float)
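array([-9.22337204e+18])
Casting nan to an integer has no well-defined result; here it produced the smallest int64 value together with a warning, and the value may differ across platforms and NumPy versions.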
29. Round a float array away from zero
That is, -0.3, -0.5, -0.6 and so on round to -1 rather than 0; 0.3, 0.5, 0.6 and so on round to 1 rather than 0.
z = np.random.uniform(-10, 10, 10)
z
array([ 2.72380024, -5.97068664, 1.02848316, -6.37947006, -8.47874052, 5.51427231, -9.2120279 , -6.1678368 , -9.5128257 , 8.92989504])
np.copysign(np.ceil(np.abs(z)), z)
array([ 3., -6., 2., -7., -9., 6., -10., -7., -10., 9.])
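An alternative formulation picks ceil for positive values and floor for the rest. The sketch below, on a small hand-picked array, checks that it agrees with the copysign version.

```python
import numpy as np

z = np.array([-0.3, -0.5, -1.2, 0.3, 0.5, 2.0])

a = np.copysign(np.ceil(np.abs(z)), z)          # round the magnitude up, then restore the sign
b = np.where(z > 0, np.ceil(z), np.floor(z))    # ceil for positives, floor otherwise
print(a, b)
```

The np.where form is arguably easier to read, while the copysign form avoids evaluating both ceil and floor over the whole array.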
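30. Find the common values between two arrays
~We first consider built-in functions, but since we do not know whether such a function exists, we can use np.lookfor to find the one we need.~
np.lookfor was removed in the NumPy 2.0 release. Search NumPy's documentation directly.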
np.info(np.intersect1d)
Find the intersection of two arrays. Return the sorted, unique values that are in both of the input arrays. Parameters ---------- ar1, ar2 : array_like Input arrays. Will be flattened if not already 1D. assume_unique : bool If True, the input arrays are both assumed to be unique, which can speed up the calculation. If True but ``ar1`` or ``ar2`` are not unique, incorrect results and out-of-bounds indices could result. Default is False. return_indices : bool If True, the indices which correspond to the intersection of the two arrays are returned. The first instance of a value is used if there are multiple. Default is False. Returns ------- intersect1d : ndarray Sorted 1D array of common and unique elements. comm1 : ndarray The indices of the first occurrences of the common values in `ar1`. Only provided if `return_indices` is True. comm2 : ndarray The indices of the first occurrences of the common values in `ar2`. Only provided if `return_indices` is True. Examples -------- >>> import numpy as np >>> np.intersect1d([1, 3, 4, 3], [3, 1, 2, 1]) array([1, 3]) To intersect more than two arrays, use functools.reduce: >>> from functools import reduce >>> reduce(np.intersect1d, ([1, 3, 4, 3], [3, 1, 2, 1], [6, 3, 4, 2])) array([3]) To return the indices of the values common to the input arrays along with the intersected values: >>> x = np.array([1, 1, 2, 3, 4]) >>> y = np.array([2, 1, 4, 6]) >>> xy, x_ind, y_ind = np.intersect1d(x, y, return_indices=True) >>> x_ind, y_ind (array([0, 2, 4]), array([1, 0, 2])) >>> xy, x[x_ind], y[y_ind] (array([1, 2, 4]), array([1, 2, 4]), array([1, 2, 4]))
With the documentation at hand, the function can be used directly.
z1 = np.arange(-5, 5)
z2 = np.arange(10)
np.intersect1d(z1, z2)
array([0, 1, 2, 3, 4])
31. How to ignore all NumPy warnings (not recommended)
# suicide mode on :-)
defaults = np.seterr(all="ignore")
Z = np.ones(1) / 0
# back to sanity
_ = np.seterr(**defaults)
# error handling can also be configured in detail
with np.errstate(divide='warn'):
    Z = np.ones(1) / 0
/tmp/ipykernel_73994/1125215274.py:3: RuntimeWarning: divide by zero encountered in divide Z = np.ones(1) / 0
32. Does the following expression return True?
np.sqrt(-1) == np.emath.sqrt(-1)
np.sqrt(-1) == np.emath.sqrt(-1)
/tmp/ipykernel_73994/244602691.py:1: RuntimeWarning: invalid value encountered in sqrt np.sqrt(-1) == np.emath.sqrt(-1)
np.False_
np.sqrt(-1), np.emath.sqrt(-1)
/tmp/ipykernel_73994/178168484.py:1: RuntimeWarning: invalid value encountered in sqrt np.sqrt(-1), np.emath.sqrt(-1)
(np.float64(nan), np.complex128(1j))
33. How to get the dates of yesterday, today and tomorrow
yesterday = np.datetime64('today', 'D') - np.timedelta64(1, 'D')
today = np.datetime64('today', 'D')
tomorrow = np.datetime64('today', 'D') + np.timedelta64(1, 'D')
yesterday, today, tomorrow
(np.datetime64('2025-01-22'), np.datetime64('2025-01-23'), np.datetime64('2025-01-24'))
34. How to get all 31 dates of July 2016
z = np.arange('2016-07', '2016-08', dtype='datetime64[D]')
z
array(['2016-07-01', '2016-07-02', '2016-07-03', '2016-07-04', '2016-07-05', '2016-07-06', '2016-07-07', '2016-07-08', '2016-07-09', '2016-07-10', '2016-07-11', '2016-07-12', '2016-07-13', '2016-07-14', '2016-07-15', '2016-07-16', '2016-07-17', '2016-07-18', '2016-07-19', '2016-07-20', '2016-07-21', '2016-07-22', '2016-07-23', '2016-07-24', '2016-07-25', '2016-07-26', '2016-07-27', '2016-07-28', '2016-07-29', '2016-07-30', '2016-07-31'], dtype='datetime64[D]')
35. How to compute $((A+B)*(-A/2))$ in place (without copying)
A = np.ones(3)*1
B = np.ones(3)*2
C = np.ones(3)*3
np.add(A,B,out=B)
np.divide(A,2,out=A)
np.negative(A,out=A)
np.multiply(A,B,out=A)
array([-1.5, -1.5, -1.5])
36. Extract the integer part of a random array using 5 different methods
z = np.random.uniform(0, 10, 10)
z
array([8.02607027, 7.46437046, 7.66450121, 3.82822738, 5.41068636, 3.32256939, 9.27552892, 5.33239052, 7.91256662, 7.92753186])
# 1
z - z % 1
array([8., 7., 7., 3., 5., 3., 9., 5., 7., 7.])
# 2
np.floor(z)
array([8., 7., 7., 3., 5., 3., 9., 5., 7., 7.])
# 3
np.ceil(z) - 1
array([8., 7., 7., 3., 5., 3., 9., 5., 7., 7.])
# 4
z.astype(int)
array([8, 7, 7, 3, 5, 3, 9, 5, 7, 7])
# 5
np.trunc(z)
array([8., 7., 7., 3., 5., 3., 9., 5., 7., 7.])
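The five methods agree above because the sample values are positive and not whole numbers. A brief sketch on a hand-picked array with negative values shows where floor and truncation differ:

```python
import numpy as np

z = np.array([-2.5, -0.5, 0.5, 2.5])

floored = np.floor(z)       # rounds toward negative infinity
truncated = np.trunc(z)     # rounds toward zero
as_int = z.astype(int)      # also truncates toward zero
print(floored, truncated, as_int)
```

np.ceil(z) - 1 additionally breaks down for values that are already whole numbers, so np.trunc is the safer general choice for "integer part".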
37. Create a 5x5 matrix in which every row holds the values 0 to 4
# 1, the reference answer
z = np.zeros((5, 5))
z += np.arange(5)
z
array([[0., 1., 2., 3., 4.], [0., 1., 2., 3., 4.], [0., 1., 2., 3., 4.], [0., 1., 2., 3., 4.], [0., 1., 2., 3., 4.]])
# 2, using tile
z = np.tile(np.arange(5), (5, 1))
z
array([[0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4]])
38. Given a generator function that yields 10 integers, use it to build an array
# 1, the reference answer
def gen():
    for i in range(10):
        yield i
z = np.fromiter(gen(), dtype=float, count=-1)
z
array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
# 2, list comprehension
z = np.array([i for i in gen()])
z
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
39. Create a vector of size 10 with values ranging from 0 to 1, both excluded
z = np.linspace(0, 1, 11, endpoint=False)[1:]
z
array([0.09090909, 0.18181818, 0.27272727, 0.36363636, 0.45454545, 0.54545455, 0.63636364, 0.72727273, 0.81818182, 0.90909091])
40. Create a random array of size 10 and sort it
z = np.random.random(10)
z
array([0.65281861, 0.32809737, 0.3265718 , 0.56394927, 0.65133119, 0.33415774, 0.12370651, 0.80815425, 0.25217611, 0.77599015])
z.sort()
z
array([0.12370651, 0.25217611, 0.3265718 , 0.32809737, 0.33415774, 0.56394927, 0.65133119, 0.65281861, 0.77599015, 0.80815425])
41. How to sum a small array faster than np.sum
z = np.arange(10)
np.add.reduce(z)
np.int64(45)
%timeit np.add.reduce(z)
800 ns ± 3.49 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
%timeit np.sum(z)
1.6 μs ± 15.4 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
We can see that np.add.reduce is roughly twice as fast here.
42. Check whether two arrays A and B are equal
A = np.random.randint(0,2,5)
B = np.random.randint(0,2,5)
# 1, assuming A and B have the same shape
# tolerant comparison, suitable for floating-point values
np.allclose(A, B)
True
# 2, checks both shape and values
# requires the values to be exactly equal
np.array_equal(A, B)
True
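With integer arrays the two checks behave the same, so a floating-point example may show the difference more clearly. The arrays a, b and c below are made up for illustration.

```python
import numpy as np

a = np.array([0.1 + 0.2, 1.0])
b = np.array([0.3, 1.0])

exact = np.array_equal(a, b)     # False: 0.1 + 0.2 is not exactly 0.3
approx = np.allclose(a, b)       # True: equal within the default tolerances

c = np.array([[0.3, 1.0]])
shape_sensitive = np.array_equal(b, c)    # False: shapes (2,) and (1, 2) differ
broadcasted = np.allclose(b, c)           # True: allclose broadcasts before comparing
print(exact, approx, shape_sensitive, broadcasted)
```

Because np.allclose broadcasts its inputs, it does not verify that the shapes match, which is why the first method assumes equal shapes.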
43. Make an array immutable (read-only)
z = np.zeros(10)
z.flags
C_CONTIGUOUS : True F_CONTIGUOUS : True OWNDATA : True WRITEABLE : True ALIGNED : True WRITEBACKIFCOPY : False
z.flags.writeable = False
z.flags
C_CONTIGUOUS : True F_CONTIGUOUS : True OWNDATA : True WRITEABLE : False ALIGNED : True WRITEBACKIFCOPY : False
try:
    z[0] = 1
except Exception as e:
    print(e)
assignment destination is read-only
44. Given a 10x2 matrix of Cartesian coordinates, convert them to polar coordinates
z = np.random.random((10, 2))
x, y = z[:, 0], z[:, 1]
r = np.sqrt(x**2 + y**2)
theta = np.arctan2(y, x)
r, theta
(array([0.33820331, 1.31494561, 0.69826993, 0.65504453, 1.16197209, 0.76539786, 0.79805555, 1.24695648, 0.53250039, 0.57669093]), array([0.47338591, 0.80078892, 1.27664152, 1.31267719, 1.00810859, 0.39820197, 0.12497058, 0.76005997, 0.32861008, 1.12416952]))
45. Create a random vector of size 10 and replace its maximum value with 0
z = np.random.random(10)
z
array([0.00997928, 0.95496242, 0.63923037, 0.27144055, 0.19121816, 0.72670271, 0.63900711, 0.95916728, 0.52407859, 0.25870044])
z[z.argmax()] = 0
z
array([0.00997928, 0.95496242, 0.63923037, 0.27144055, 0.19121816, 0.72670271, 0.63900711, 0. , 0.52407859, 0.25870044])
46. Create a structured array whose elements are x and y coordinates covering the [0,1]x[0,1] area
z = np.zeros((5, 5), [('x', float), ('y', float)])
# fill the two fields in place instead of rebinding z to the meshgrid result
z['x'], z['y'] = np.meshgrid(np.linspace(0, 1, 5),
                             np.linspace(0, 1, 5))
z
array([[(0.  , 0.  ), (0.25, 0.  ), (0.5 , 0.  ), (0.75, 0.  ), (1.  , 0.  )], [(0.  , 0.25), (0.25, 0.25), (0.5 , 0.25), (0.75, 0.25), (1.  , 0.25)], [(0.  , 0.5 ), (0.25, 0.5 ), (0.5 , 0.5 ), (0.75, 0.5 ), (1.  , 0.5 )], [(0.  , 0.75), (0.25, 0.75), (0.5 , 0.75), (0.75, 0.75), (1.  , 0.75)], [(0.  , 1.  ), (0.25, 1.  ), (0.5 , 1.  ), (0.75, 1.  ), (1.  , 1.  )]], dtype=[('x', '<f8'), ('y', '<f8')])
47. Given two arrays X and Y, build the Cauchy matrix C and compute its determinant
$$C_{ij} = \frac{1}{x_i-y_j}$$
x = np.arange(8)
y = x + 0.5
C = 1 / np.subtract.outer(x, y)
C
array([[-2. , -0.66666667, -0.4 , -0.28571429, -0.22222222, -0.18181818, -0.15384615, -0.13333333], [ 2. , -2. , -0.66666667, -0.4 , -0.28571429, -0.22222222, -0.18181818, -0.15384615], [ 0.66666667, 2. , -2. , -0.66666667, -0.4 , -0.28571429, -0.22222222, -0.18181818], [ 0.4 , 0.66666667, 2. , -2. , -0.66666667, -0.4 , -0.28571429, -0.22222222], [ 0.28571429, 0.4 , 0.66666667, 2. , -2. , -0.66666667, -0.4 , -0.28571429], [ 0.22222222, 0.28571429, 0.4 , 0.66666667, 2. , -2. , -0.66666667, -0.4 ], [ 0.18181818, 0.22222222, 0.28571429, 0.4 , 0.66666667, 2. , -2. , -0.66666667], [ 0.15384615, 0.18181818, 0.22222222, 0.28571429, 0.4 , 0.66666667, 2. , -2. ]])
np.linalg.det(C)
np.float64(3638.163637117973)
np.subtract.outer here is equivalent to broadcasting, so we can also write it as follows.
C_test = 1 / (x.reshape(8, 1)- y.reshape(1, 8))
C_test
array([[-2. , -0.66666667, -0.4 , -0.28571429, -0.22222222, -0.18181818, -0.15384615, -0.13333333], [ 2. , -2. , -0.66666667, -0.4 , -0.28571429, -0.22222222, -0.18181818, -0.15384615], [ 0.66666667, 2. , -2. , -0.66666667, -0.4 , -0.28571429, -0.22222222, -0.18181818], [ 0.4 , 0.66666667, 2. , -2. , -0.66666667, -0.4 , -0.28571429, -0.22222222], [ 0.28571429, 0.4 , 0.66666667, 2. , -2. , -0.66666667, -0.4 , -0.28571429], [ 0.22222222, 0.28571429, 0.4 , 0.66666667, 2. , -2. , -0.66666667, -0.4 ], [ 0.18181818, 0.22222222, 0.28571429, 0.4 , 0.66666667, 2. , -2. , -0.66666667], [ 0.15384615, 0.18181818, 0.22222222, 0.28571429, 0.4 , 0.66666667, 2. , -2. ]])
np.linalg.det(C_test)
np.float64(3638.163637117973)
# check that both methods return the same C
np.array_equal(C, C_test)
True
48. Print the minimum and maximum representable value for each NumPy scalar type
for dtype in [np.int8, np.int16, np.int32, np.int64]:
    info = np.iinfo(dtype)
    print(f"{dtype}: min={info.min}, max={info.max}")
for dtype in [np.float16, np.float32, np.float64,]:
    info = np.finfo(dtype)
    print(f"{dtype}: min={info.min}, max={info.max}")
<class 'numpy.int8'>: min=-128, max=127 <class 'numpy.int16'>: min=-32768, max=32767 <class 'numpy.int32'>: min=-2147483648, max=2147483647 <class 'numpy.int64'>: min=-9223372036854775808, max=9223372036854775807 <class 'numpy.float16'>: min=-65504.0, max=65504.0 <class 'numpy.float32'>: min=-3.4028234663852886e+38, max=3.4028234663852886e+38 <class 'numpy.float64'>: min=-1.7976931348623157e+308, max=1.7976931348623157e+308
49. Print all the values of an array (without truncation)
with np.printoptions(threshold=np.inf):
    z = np.ones((10, 10))
    print(z)
[[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]]
50. Given a value, find the nearest value in an array
# the given array
z = np.random.uniform(0, 1, 10)
z
array([0.97025996, 0.36085926, 0.09509649, 0.43063634, 0.85445517, 0.44517651, 0.9316781 , 0.47078064, 0.80017135, 0.17822901])
# the given value
x = 0.5
# locate the position of the nearest value
index = np.abs(z - x).argmin()
# retrieve that value
z[index]
np.float64(0.47078064496366556)
51. Create a structured array representing a position (x, y) and a color (r, g, b)
z = np.zeros(10, [('position', [('x', float, 1),
('y', float, 1)]),
('color', [('r', float, 1),
('g', float, 1),
('b', float, 1)])
]
)
z
array([(([0.], [0.]), ([0.], [0.], [0.])), (([0.], [0.]), ([0.], [0.], [0.])), (([0.], [0.]), ([0.], [0.], [0.])), (([0.], [0.]), ([0.], [0.], [0.])), (([0.], [0.]), ([0.], [0.], [0.])), (([0.], [0.]), ([0.], [0.], [0.])), (([0.], [0.]), ([0.], [0.], [0.])), (([0.], [0.]), ([0.], [0.], [0.])), (([0.], [0.]), ([0.], [0.], [0.])), (([0.], [0.]), ([0.], [0.], [0.]))], dtype=[('position', [('x', '<f8', (1,)), ('y', '<f8', (1,))]), ('color', [('r', '<f8', (1,)), ('g', '<f8', (1,)), ('b', '<f8', (1,))])])
52. Given a random array of shape (10, 2) representing points in the plane, compute the distances between all pairs of points
z = np.random.random((10, 2))
x, y = np.atleast_2d(z[:, 0], z[:, 1])
d = np.sqrt((x - x.T)**2 + (y - y.T)**2)
d
array([[0. , 0.36156554, 0.25792355, 0.69852624, 0.06046602, 0.55974987, 0.44745959, 0.51535957, 0.91050336, 0.68792257], [0.36156554, 0. , 0.51496789, 0.34198684, 0.31153682, 0.57121176, 0.69180151, 0.34077339, 0.74043802, 0.80160021], [0.25792355, 0.51496789, 0. , 0.80367147, 0.25447245, 0.39136883, 0.19261297, 0.49162044, 0.81302103, 0.44748816], [0.69852624, 0.34198684, 0.80367147, 0. , 0.64409442, 0.69889723, 0.95232077, 0.40096981, 0.64195424, 0.96127519], [0.06046602, 0.31153682, 0.25447245, 0.64409442, 0. , 0.51617982, 0.44708542, 0.45590656, 0.85391212, 0.66273654], [0.55974987, 0.57121176, 0.39136883, 0.69889723, 0.51617982, 0. , 0.39188871, 0.29794784, 0.44238147, 0.26334318], [0.44745959, 0.69180151, 0.19261297, 0.95232077, 0.44708542, 0.39188871, 0. , 0.59452006, 0.83401915, 0.32480842], [0.51535957, 0.34077339, 0.49162044, 0.40096981, 0.45590656, 0.29794784, 0.59452006, 0. , 0.4130361 , 0.56045551], [0.91050336, 0.74043802, 0.81302103, 0.64195424, 0.85391212, 0.44238147, 0.83401915, 0.4130361 , 0. , 0.62292175], [0.68792257, 0.80160021, 0.44748816, 0.96127519, 0.66273654, 0.26334318, 0.32480842, 0.56045551, 0.62292175, 0. ]])
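Here np.atleast_2d gives us x and y directly as two-dimensional arrays of shape (1, 10), which makes it easy to use broadcasting afterwards. The following could replace that line, but it is less concise:
x = z[:, 0].reshape(1, 10)
y = z[:, 1].reshape(1, 10)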
We can also use the function that ships with SciPy, which is typically somewhat more efficient.
import scipy
import scipy.spatial
d = scipy.spatial.distance.cdist(z, z)
d
array([[0. , 0.36156554, 0.25792355, 0.69852624, 0.06046602, 0.55974987, 0.44745959, 0.51535957, 0.91050336, 0.68792257], [0.36156554, 0. , 0.51496789, 0.34198684, 0.31153682, 0.57121176, 0.69180151, 0.34077339, 0.74043802, 0.80160021], [0.25792355, 0.51496789, 0. , 0.80367147, 0.25447245, 0.39136883, 0.19261297, 0.49162044, 0.81302103, 0.44748816], [0.69852624, 0.34198684, 0.80367147, 0. , 0.64409442, 0.69889723, 0.95232077, 0.40096981, 0.64195424, 0.96127519], [0.06046602, 0.31153682, 0.25447245, 0.64409442, 0. , 0.51617982, 0.44708542, 0.45590656, 0.85391212, 0.66273654], [0.55974987, 0.57121176, 0.39136883, 0.69889723, 0.51617982, 0. , 0.39188871, 0.29794784, 0.44238147, 0.26334318], [0.44745959, 0.69180151, 0.19261297, 0.95232077, 0.44708542, 0.39188871, 0. , 0.59452006, 0.83401915, 0.32480842], [0.51535957, 0.34077339, 0.49162044, 0.40096981, 0.45590656, 0.29794784, 0.59452006, 0. , 0.4130361 , 0.56045551], [0.91050336, 0.74043802, 0.81302103, 0.64195424, 0.85391212, 0.44238147, 0.83401915, 0.4130361 , 0. , 0.62292175], [0.68792257, 0.80160021, 0.44748816, 0.96127519, 0.66273654, 0.26334318, 0.32480842, 0.56045551, 0.62292175, 0. ]])
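The same pairwise distances can also be obtained with a single broadcasted subtraction over the point array, without splitting it into x and y. A minimal sketch using a seeded generator and the illustrative names pts and diff:

```python
import numpy as np

rng = np.random.default_rng(0)
pts = rng.random((10, 2))

# (10, 1, 2) - (1, 10, 2) broadcasts to (10, 10, 2): every pair of points
diff = pts[:, None, :] - pts[None, :, :]
d = np.sqrt((diff ** 2).sum(axis=-1))    # Euclidean distance matrix, shape (10, 10)
print(d.shape)
```

This form works unchanged for points in any number of dimensions, at the cost of a temporary array of shape (n, n, dims).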
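53. Convert a 32-bit float array into a 32-bit integer array in place (without using extra memory)
z = np.zeros(10, dtype=np.float32)
z
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)
# astype allocates a new array whenever the dtype changes, even with copy=False,
# so reinterpret the same buffer as int32 and write the converted values into it
y = z.view(np.int32)
y[:] = z
y
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)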
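Whether a conversion really reuses the original buffer can be checked with np.shares_memory. The sketch below contrasts astype with a view; the names converted and viewed are illustrative.

```python
import numpy as np

z = np.arange(5, dtype=np.float32)

converted = z.astype(np.int32, copy=False)   # dtype differs, so a new buffer is allocated
viewed = z.view(np.int32)                    # same bytes reinterpreted as int32

copied = np.shares_memory(z, converted)      # False
shared = np.shares_memory(z, viewed)         # True

viewed[:] = z    # write the converted values into the shared buffer
print(copied, shared, viewed)
```

After the assignment the buffer holds integer bit patterns, so z itself no longer contains meaningful float values.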
54. How to read the following file?
1, 2, 3, 4, 5
6, , , 7, 8
, , 9,10,11
from io import StringIO
# a "fake" in-memory file
s = StringIO("""1, 2, 3, 4, 5\n
6, , , 7, 8\n
, , 9,10,11\n""")
z = np.genfromtxt(s, delimiter=",", missing_values=' ')
z
array([[ 1., 2., 3., 4., 5.], [ 6., nan, nan, 7., 8.], [nan, nan, 9., 10., 11.]])
55. What is the NumPy equivalent of Python's built-in enumerate?
Z = np.arange(9).reshape(3,3)
for index, value in np.ndenumerate(Z):
    print(index, value)
(0, 0) 0 (0, 1) 1 (0, 2) 2 (1, 0) 3 (1, 1) 4 (1, 2) 5 (2, 0) 6 (2, 1) 7 (2, 2) 8
for index in np.ndindex(Z.shape):
    print(index, Z[index])
(0, 0) 0 (0, 1) 1 (0, 2) 2 (1, 0) 3 (1, 1) 4 (1, 2) 5 (2, 0) 6 (2, 1) 7 (2, 2) 8
56. Generate a 2D Gaussian distribution
X, Y = np.meshgrid(np.linspace(-1,1,10), np.linspace(-1,1,10))
D = np.sqrt(X*X+Y*Y)
sigma, mu = 1.0, 0.0
G = np.exp(-( (D-mu)**2 / ( 2.0 * sigma**2 ) ) )
G
array([[0.36787944, 0.44822088, 0.51979489, 0.57375342, 0.60279818, 0.60279818, 0.57375342, 0.51979489, 0.44822088, 0.36787944], [0.44822088, 0.54610814, 0.63331324, 0.69905581, 0.73444367, 0.73444367, 0.69905581, 0.63331324, 0.54610814, 0.44822088], [0.51979489, 0.63331324, 0.73444367, 0.81068432, 0.85172308, 0.85172308, 0.81068432, 0.73444367, 0.63331324, 0.51979489], [0.57375342, 0.69905581, 0.81068432, 0.89483932, 0.9401382 , 0.9401382 , 0.89483932, 0.81068432, 0.69905581, 0.57375342], [0.60279818, 0.73444367, 0.85172308, 0.9401382 , 0.98773022, 0.98773022, 0.9401382 , 0.85172308, 0.73444367, 0.60279818], [0.60279818, 0.73444367, 0.85172308, 0.9401382 , 0.98773022, 0.98773022, 0.9401382 , 0.85172308, 0.73444367, 0.60279818], [0.57375342, 0.69905581, 0.81068432, 0.89483932, 0.9401382 , 0.9401382 , 0.89483932, 0.81068432, 0.69905581, 0.57375342], [0.51979489, 0.63331324, 0.73444367, 0.81068432, 0.85172308, 0.85172308, 0.81068432, 0.73444367, 0.63331324, 0.51979489], [0.44822088, 0.54610814, 0.63331324, 0.69905581, 0.73444367, 0.73444367, 0.69905581, 0.63331324, 0.54610814, 0.44822088], [0.36787944, 0.44822088, 0.51979489, 0.57375342, 0.60279818, 0.60279818, 0.57375342, 0.51979489, 0.44822088, 0.36787944]])
57. Randomly place p elements in a 2D array
n = 5
p = 3
z = np.zeros((n, n))
# replace=False keeps the p positions distinct
np.put(z, np.random.choice(range(n*n), p, replace=False), 1)
z
array([[0., 0., 0., 0., 0.], [0., 0., 0., 0., 1.], [0., 0., 0., 0., 0.], [0., 0., 0., 1., 0.], [1., 0., 0., 0., 0.]])
58. Center each row of a matrix (subtract the row mean)
# 1, the reference answer
z = np.arange(10).reshape(2, 5)
z
array([[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]])
z_new = z - z.mean(axis=1, keepdims=True)
z_new
array([[-2., -1., 0., 1., 2.], [-2., -1., 0., 1., 2.]])
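Note that setting keepdims here makes broadcasting convenient and saves a manual reshape.
We can also apply a centering function to each row of the matrix to get the same result.
# 2, the apply approach
def centered(xs):
    return xs - xs.mean()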
z_new = np.apply_along_axis(centered, 1, z)
z_new
array([[-2., -1., 0., 1., 2.], [-2., -1., 0., 1., 2.]])
59. Sort an array by one of its columns
z = np.random.randint(0, 10, (3, 3))
z
array([[3, 5, 8], [3, 0, 3], [3, 5, 9]])
# sort by the second column
z[z[:, 1].argsort(), ]
array([[3, 0, 3], [3, 5, 8], [3, 5, 9]])
60. Check whether a 2D array has a null column (all zeros)
z = np.random.randint(0, 3, (3, 10))
z
array([[0, 2, 2, 2, 2, 2, 0, 2, 0, 0], [2, 0, 0, 2, 2, 1, 1, 2, 1, 0], [0, 2, 0, 1, 0, 1, 2, 1, 0, 2]])
print((~z.any(axis=0)).any())
False
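Whenever a column is all zeros, z.any(axis=0) is False at that position, so ~z.any(axis=0) is True there, and applying any to the result must return True. Conversely, if there is no null column, ~z.any(axis=0) is False everywhere, and any still returns False.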
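Since the random sample above happens to contain no null column, a small deterministic case may help. The matrix below is made up so that its middle column is all zeros.

```python
import numpy as np

z = np.array([[1, 0, 2],
              [0, 0, 3],
              [4, 0, 0]])

col_has_nonzero = z.any(axis=0)      # [True, False, True]
null_cols = ~col_has_nonzero         # [False, True, False]
has_null_col = null_cols.any()
which = np.flatnonzero(null_cols)    # indices of the all-zero columns
print(has_null_col, which)
```

np.flatnonzero is an optional extra here; it reports which columns are null rather than only whether one exists.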
61. Given a value, find the nearest value in an array
Duplicate of exercise 50.
62. Given two arrays of shape (3, 1) and (1, 3), how to add them using an iterator?
A = np.arange(3).reshape(3,1)
B = np.arange(3).reshape(1,3)
it = np.nditer([A,B,None])
for x,y,z in it:
    z[...] = x + y
print(it.operands[2])
[[0 1 2] [1 2 3] [2 3 4]]
63. Create an array class that has a name attribute
class NamedArray(np.ndarray):
    def __new__(cls, array, name="no name"):
        obj = np.asarray(array).view(cls)
        obj.name = name
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        # propagate the name to views and slices
        self.name = getattr(obj, 'name', "no name")
Z = NamedArray(np.arange(10), "range_10")
print (Z.name)
range_10
64. Given a vector of values and a vector of indices, add 1 to the former at each indexed position (mind repeated indices)
# Author: Brett Olsen
Z = np.ones(10)
indices = np.random.randint(0,len(Z),20)
Z_new = Z + np.bincount(indices, minlength=len(Z))
Z_new
array([3., 3., 3., 2., 5., 2., 6., 2., 3., 1.])
# Another solution
# Author: Bartosz Telenczuk
np.add.at(Z, indices, 1)
Z
array([3., 3., 3., 2., 5., 2., 6., 2., 3., 1.])
65. Accumulate the elements of a list X into an array F according to an index list I (indices)
X = [1,2,3,4,5,6]
indices = [1,3,9,3,4,1]
F = np.bincount(indices,X)
print(F)
[0. 7. 0. 6. 5. 0. 0. 0. 0. 3.]
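The usage of bincount here is a little convoluted... let us start with an example :-)
Think of the numbers in I (indices in the code) as personal bank account numbers. The largest number here is 9, so for now we only need to consider accounts 0 to 9; these 10 accounts correspond exactly to the 10 positions of the resulting F. Further, I and X together can be read as the transaction log of these accounts, where I holds the account number and X the corresponding amount. For example, since I[0]=1, account 1 made a transaction with amount X[0]=1, so 1 is added to account 1; account 1 has one more transaction (I[5]=1) with amount X[5] = 6, so the total for account 1 over this period is 6 + 1 = 7, which gives F[1] = 7.
In short, the task is to compute the total amount of each account from the transaction log formed by X and I.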
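The bank-account reading can be spelled out as an explicit loop and compared against np.bincount. The loop is only meant as an illustration of what the weights argument does; the names account, amount and F_loop are hypothetical.

```python
import numpy as np

X = [1, 2, 3, 4, 5, 6]    # transaction amounts
I = [1, 3, 9, 3, 4, 1]    # account numbers

# one slot per account, from 0 up to the largest account number
F_loop = np.zeros(max(I) + 1)
for account, amount in zip(I, X):
    F_loop[account] += amount

F = np.bincount(I, weights=X)
print(F_loop, F)
```

Both arrays hold the per-account totals, with zeros for accounts that never appear in I.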
66. Given an image of shape (w, h, 3), count the number of distinct colors
# Author: Fisher Wang
w, h = 256, 256
indices = np.random.randint(0, 4, (h, w, 3)).astype(np.ubyte)
colors = np.unique(indices.reshape(-1, 3), axis=0)
n = len(colors)
print(n)
64
67. Consider a four-dimensional array; compute the sum over the last two axes
A = np.random.randint(0,10,(3,4,3,4))
A.sum(axis=(-2,-1))
array([[31, 58, 39, 71], [44, 53, 57, 54], [41, 78, 45, 55]])
68. Given a vector D, compute the means of the subsets of D selected by an index array S
D = np.random.uniform(0,1,100)
S = np.random.randint(0,10,100)
D_sums = np.bincount(S, weights=D)
D_counts = np.bincount(S)
D_means = D_sums / D_counts
print(D_means)
[0.49781183 0.56667629 0.56930058 0.53154984 0.41628091 0.36431595 0.4371647 0.4536847 0.46683434 0.55721722]
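Continuing the example from exercise 65: there we computed the total amount of each account from the transaction log; here we compute the average amount per transaction for each account, that is, the account's total divided by its number of transactions.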
69. Get the diagonal of a matrix product (dot product)
# Author: Mathieu Blondel
A = np.random.uniform(0,1,(5,5))
B = np.random.uniform(0,1,(5,5))
# Slow version
np.diag(np.dot(A, B))
# Fast version
np.sum(A * B.T, axis=1)
# Faster version
np.einsum("ij,ji->i", A, B)
array([1.78243831, 0.24610741, 0.32904767, 1.46215755, 0.31891836])
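All three versions compute the same quantity, since the i-th diagonal entry of the product is the sum over j of A[i, j] * B[j, i]. The check below is a small sketch with a seeded generator (the seed and the names d_slow, d_fast, d_einsum are assumptions). The faster versions skip the off-diagonal entries that np.dot computes and np.diag then discards.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only to make the check reproducible
A = rng.random((5, 5))
B = rng.random((5, 5))

d_slow = np.diag(A @ B)                  # full product, then keep the diagonal
d_fast = np.sum(A * B.T, axis=1)         # row i of A times column i of B
d_einsum = np.einsum("ij,ji->i", A, B)   # same sum written as index notation

print(np.allclose(d_slow, d_fast), np.allclose(d_slow, d_einsum))  # True True
```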
70. How to insert three zeros between each pair of consecutive values of the array [1, 2, 3, 4, 5]
z = np.array([1, 2, 3, 4, 5])
nz = 3 # number of zeros to insert
v = np.zeros(len(z) + nz*(len(z)-1))
v[::nz+1] = z
v
array([1., 0., 0., 0., 2., 0., 0., 0., 3., 0., 0., 0., 4., 0., 0., 0., 5.])
71. Multiply two arrays of shapes (5, 5, 3) and (5, 5)
A = np.ones((5,5,3))
B = 2*np.ones((5,5))
print(A * B[:,:,None])
[[[2. 2. 2.] [2. 2. 2.] [2. 2. 2.] [2. 2. 2.] [2. 2. 2.]] [[2. 2. 2.] [2. 2. 2.] [2. 2. 2.] [2. 2. 2.] [2. 2. 2.]] [[2. 2. 2.] [2. 2. 2.] [2. 2. 2.] [2. 2. 2.] [2. 2. 2.]] [[2. 2. 2.] [2. 2. 2.] [2. 2. 2.] [2. 2. 2.] [2. 2. 2.]] [[2. 2. 2.] [2. 2. 2.] [2. 2. 2.] [2. 2. 2.] [2. 2. 2.]]]
72. Swap two rows of an array
A = np.arange(25).reshape((5, 5))
A
array([[ 0, 1, 2, 3, 4], [ 5, 6, 7, 8, 9], [10, 11, 12, 13, 14], [15, 16, 17, 18, 19], [20, 21, 22, 23, 24]])
# Swap the first and second rows
A[[0, 1]] = A[[1, 0]]
A
array([[ 5, 6, 7, 8, 9], [ 0, 1, 2, 3, 4], [10, 11, 12, 13, 14], [15, 16, 17, 18, 19], [20, 21, 22, 23, 24]])
73. Given 10 triplets describing 10 triangles, find the set of all edges
faces = np.random.randint(0,100,(10,3))
F = np.roll(faces.repeat(2,axis=1),-1,axis=1)
F = F.reshape(len(F)*3,2)
F = np.sort(F,axis=1)
G = F.view( dtype=[('p0',F.dtype),('p1',F.dtype)] )
G = np.unique(G)
print(G)
[( 0, 24) ( 0, 52) ( 0, 64) ( 0, 96) ( 3, 5) ( 3, 77) ( 4, 20) ( 4, 58) ( 4, 73) ( 4, 77) ( 5, 77) ( 8, 62) ( 8, 76) ( 9, 54) ( 9, 69) (17, 45) (17, 53) (20, 58) (24, 52) (45, 53) (48, 89) (48, 99) (54, 69) (62, 76) (64, 96) (68, 69) (68, 96) (69, 96) (73, 77) (89, 99)]
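The approach here is quite clever; breaking it down step by step helps to see how each edge is extracted. The subtle detail is sorting the pair that describes each edge (without sorting, the later comparison would wrongly treat (a, b) and (b, a) as different edges). After that, view converts the dtype so that the pairs can be compared with np.unique.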
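The step-by-step breakdown might look like the sketch below, using two made-up triangles that share an edge. Note one substitution: the last step uses np.unique(..., axis=0) instead of the structured-dtype view, which gives the same set of edges as a plain 2D array.

```python
import numpy as np

faces = np.array([[0, 1, 2],
                  [2, 1, 3]])  # hypothetical triangles sharing the edge (1, 2)

step1 = faces.repeat(2, axis=1)           # [[0 0 1 1 2 2], [2 2 1 1 3 3]]
step2 = np.roll(step1, -1, axis=1)        # [[0 1 1 2 2 0], [2 1 1 3 3 2]]
edges = step2.reshape(len(step2) * 3, 2)  # one (start, end) pair per row
edges = np.sort(edges, axis=1)            # so that (a, b) and (b, a) match
unique_edges = np.unique(edges, axis=0)   # drop the duplicated shared edge

print(unique_edges)
```

Six edges go in and five come out, because the edge (1, 2) is shared by both triangles.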
74. Given A, we have C = np.bincount(A). Given C, how do we recover the corresponding A?
C = np.bincount([1,1,2,3,4,4,6])
A = np.repeat(np.arange(len(C)), C)
print(A)
[1 1 2 3 4 4 6]
75. Compute averages using a sliding window
def moving_average(a, n=3):
    ret = np.cumsum(a, dtype=float)
    ret[n:] = ret[n:] - ret[:-n]
    return ret[n - 1:] / n
Z = np.arange(20)
print(moving_average(Z, n=3))
[ 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18.]
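The cumulative-sum trick works because the sum of a window ending at position k equals cumsum[k] - cumsum[k - n]. As a sanity check, the sketch below compares it against np.convolve on made-up data; the sample array and the names fast and reference are assumptions.

```python
import numpy as np

def moving_average(a, n=3):
    ret = np.cumsum(a, dtype=float)
    ret[n:] = ret[n:] - ret[:-n]   # window sum = difference of two prefix sums
    return ret[n - 1:] / n

a = np.array([4, 8, 6, -1, -2, -3, -1, 3, 4, 5])  # hypothetical sample data
fast = moving_average(a, n=3)
reference = np.convolve(a, np.ones(3) / 3, mode="valid")

print(np.allclose(fast, reference))  # True
```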
76. Given a 1D array Z, build a 2D array whose first row is Z[0], Z[1], Z[2], whose second row is Z[1], Z[2], Z[3], and so on, with the last row being Z[-3], Z[-2], Z[-1]
from numpy.lib import stride_tricks
def rolling(a, window):
    shape = (a.size - window + 1, window)
    strides = (a.itemsize, a.itemsize)
    return stride_tricks.as_strided(a, shape=shape, strides=strides)
Z = rolling(np.arange(10), 3)
print(Z)
[[0 1 2] [1 2 3] [2 3 4] [3 4 5] [4 5 6] [5 6 7] [6 7 8] [7 8 9]]
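As an alternative to computing strides by hand, NumPy 1.20 and later provide sliding_window_view, which builds the same windows and returns a read-only view. This is a sketch of that alternative, not the method used in the original answer.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.arange(10)
windows = sliding_window_view(a, 3)   # one row per window of length 3

print(windows.shape)                  # (8, 3)
print(windows[0], windows[-1])
print(windows.flags.writeable)        # False
```

The read-only default guards against accidentally writing through overlapping windows, which is easy to do with as_strided.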
77. How to negate a boolean in place, and how to flip the sign of a number in place
Z = np.random.randint(0,2,100)
np.logical_not(Z, out=Z)
Z = np.random.uniform(-1.0,1.0,100)
np.negative(Z, out=Z)
array([-0.23755749, -0.55475899, 0.13235314, 0.32766585, 0.89372547, -0.51604401, 0.29658938, 0.21303257, 0.28310276, 0.63122374, 0.88897031, 0.75103678, 0.92842855, -0.6682605 , 0.56913457, 0.28311256, -0.11452513, 0.16162531, -0.63786765, 0.99869718, 0.06608717, -0.01783016, -0.58795909, 0.99166502, 0.73065402, 0.01965904, 0.38957628, 0.88040177, 0.53989097, 0.41985721, 0.31485496, 0.44470555, 0.75452553, -0.20153337, -0.31690058, -0.27618554, 0.96737028, 0.65659492, 0.13246001, 0.49902167, -0.62099293, 0.22705428, 0.84680274, -0.6087285 , -0.33888391, 0.81064531, 0.35195344, 0.39034723, -0.06219297, -0.40671093, -0.48255719, -0.01860387, 0.31714624, 0.14273996, -0.46043585, 0.97447261, -0.98817214, 0.663474 , -0.77110392, 0.57425401, -0.74857315, -0.77490312, 0.78400796, 0.08371795, -0.63655882, 0.32395379, -0.35119195, 0.52912727, -0.0058194 , -0.06066968, -0.5801619 , 0.9034518 , -0.29083442, 0.23830605, 0.48821214, 0.19582874, 0.53502429, -0.13793191, 0.01086209, 0.14367384, -0.84110448, -0.96160559, -0.62869067, 0.95910018, -0.14307867, 0.37050809, -0.23561234, 0.41819084, -0.42485089, 0.54864131, -0.44723871, 0.50481356, 0.57923845, -0.34382593, 0.7632119 , -0.7742601 , -0.49798704, 0.38378989, -0.55397598, 0.84709091])
78. Compute the distance from a point p to each line i, where line i is given by (P0[i], P1[i]) and P0, P1 are arrays of corresponding points
def distance(P0, P1, p):
    T = P1 - P0
    L = (T**2).sum(axis=1)
    U = -((P0[:,0]-p[...,0])*T[:,0] + (P0[:,1]-p[...,1])*T[:,1]) / L
    U = U.reshape(len(U),1)
    D = P0 + U*T - p
    return np.sqrt((D**2).sum(axis=1))
P0 = np.random.uniform(-10,10,(10,2))
P1 = np.random.uniform(-10,10,(10,2))
p = np.random.uniform(-10,10,( 1,2))
print(distance(P0, P1, p))
[ 2.87016544 4.29956268 4.38862866 9.23114998 0.3151408 5.22935363 12.76342318 4.89162077 0.37943315 2.49854235]
79. Following the previous exercise, how do you compute the distance from each point p[j] in a set of points p to each line (P0[i], P1[i])?
# based on distance function from previous question
P0 = np.random.uniform(-10, 10, (10,2))
P1 = np.random.uniform(-10,10,(10,2))
p = np.random.uniform(-10, 10, (10,2))
print(np.array([distance(P0,P1,p_i) for p_i in p]))
[[10.35207887 12.67378836 0.28563335 2.05003475 11.98721709 16.0372439 0.18738185 6.71561003 5.19739808 14.51734939] [ 4.78764602 6.9552522 0.23868599 4.51692933 6.55259803 11.95368222 4.44239072 7.25505132 9.12125235 12.09448479] [ 0.3424046 3.24495396 11.08478199 2.95043208 3.09834515 2.45426952 3.70360735 4.04404986 5.76053795 2.35555852] [12.35970385 9.60929799 11.16600823 7.43323311 14.75586415 7.3625687 5.89792628 4.15903423 2.13222642 3.99726047] [ 9.07055955 6.16068616 10.99864691 6.08963671 11.55422557 4.81344826 3.21513533 3.98258514 0.1083089 2.41316215] [ 8.24258727 3.30229214 15.10621688 9.08680117 11.06184439 0.2328981 4.10794616 8.08352549 1.62311933 2.40818201] [ 4.06800165 3.30295308 1.33983835 6.50346845 1.91412377 3.16443419 10.89870645 5.70311792 14.03121426 5.66692027] [10.32873106 10.02864928 5.74847724 2.33222938 12.37736858 10.83273177 2.23316298 1.25854785 2.19083484 8.68276187] [12.84979915 12.24219229 6.77278662 4.08226934 14.8990718 11.93608043 4.62807683 0.2282104 0.02213922 8.94199149] [ 9.61265759 8.17671198 8.00791372 3.88506107 11.85382862 8.09854297 2.51669393 0.99650791 1.40863968 5.89104338]]
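80. Given an arbitrary array, write a function that takes the array and an element as arguments and returns the subpart centered on that element (padding with a fill value when necessary)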
# Author: Nicolas Rougier
Z = np.random.randint(0,10,(10,10))
shape = (5,5)
fill = 0
position = (1,1)
R = np.ones(shape, dtype=Z.dtype)*fill
P = np.array(list(position)).astype(int)
Rs = np.array(list(shape)).astype(int)
Zs = np.array(list(Z.shape)).astype(int)
R_start = np.zeros((len(shape),)).astype(int)
R_stop = np.array(list(shape)).astype(int)
Z_start = (P - Rs//2)
Z_stop = (P + Rs//2) + Rs%2
R_start = (R_start - np.minimum(Z_start, 0)).tolist()
Z_start = np.maximum(Z_start, 0).tolist()
R_stop = np.maximum(R_start, (R_stop - np.maximum(Z_stop - Zs, 0))).tolist()
Z_stop = np.minimum(Z_stop, Zs).tolist()
# Convert slices to tuples for proper indexing
r = tuple([slice(start, stop) for start, stop in zip(R_start, R_stop)])
z = tuple([slice(start, stop) for start, stop in zip(Z_start, Z_stop)])
R[r] = Z[z]
print(Z)
print(R)
[[7 8 0 0 0 3 5 1 6 7] [8 6 6 0 5 4 0 2 6 3] [9 0 8 1 6 5 2 8 5 9] [7 8 8 1 4 9 1 8 0 6] [6 8 1 1 5 6 6 2 8 8] [4 9 7 7 7 9 6 7 5 5] [9 3 0 1 8 7 5 9 4 6] [5 6 9 0 5 9 8 5 3 8] [4 4 3 2 5 2 1 5 7 5] [8 1 5 8 8 0 1 1 4 3]] [[0 0 0 0 0] [0 7 8 0 0] [0 8 6 6 0] [0 9 0 8 1] [0 7 8 8 1]]
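PS: Many of the three-star exercises (64 onward) feel poorly posed... They test more flexible use of NumPy, and the logic is fine, but without concrete examples or a real problem behind them they come across as rather hollow Orz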
81. Given the array Z = [1,2,3,4,5,6,7,8,9,10,11,12,13,14], how do you generate R = [[1,2,3,4], [2,3,4,5], [3,4,5,6], ..., [11,12,13,14]]?
# 1. The reference answer's method
Z = np.arange(1,15,dtype=np.uint32)
R = stride_tricks.as_strided(Z,(11,4),(4,4))
print(R)
[[ 1 2 3 4] [ 2 3 4 5] [ 3 4 5 6] [ 4 5 6 7] [ 5 6 7 8] [ 6 7 8 9] [ 7 8 9 10] [ 8 9 10 11] [ 9 10 11 12] [10 11 12 13] [11 12 13 14]]
# 2. A special case of exercise 76
z = np.arange(1, 15)
rolling(z, 4)
array([[ 1, 2, 3, 4], [ 2, 3, 4, 5], [ 3, 4, 5, 6], [ 4, 5, 6, 7], [ 5, 6, 7, 8], [ 6, 7, 8, 9], [ 7, 8, 9, 10], [ 8, 9, 10, 11], [ 9, 10, 11, 12], [10, 11, 12, 13], [11, 12, 13, 14]])
82. Compute the rank of a matrix
# 1. The reference answer's method
Z = np.random.uniform(0,1,(10,10))
U, S, V = np.linalg.svd(Z) # Singular Value Decomposition
rank = np.sum(S > 1e-10)
print(rank)
10
# 2. Call the API directly
np.linalg.matrix_rank(Z)
np.int64(10)
83. Find the most frequent value (mode) in an array
Z = np.random.randint(0,10,50)
Z
array([7, 7, 2, 8, 3, 0, 8, 4, 5, 4, 6, 6, 6, 1, 8, 4, 3, 2, 7, 6, 8, 2, 6, 1, 6, 5, 9, 6, 6, 6, 8, 9, 1, 6, 8, 7, 1, 5, 1, 8, 2, 7, 6, 2, 7, 9, 2, 3, 2, 3])
print(np.bincount(Z).argmax())
6
If we are not restricted to NumPy, we can call the API that SciPy provides directly
from scipy import stats
stats.mode(Z)
ModeResult(mode=np.int64(6), count=np.int64(11))
84. Extract all contiguous 3x3 blocks from a 10x10 matrix
Z = np.random.randint(0,5,(10,10))
n = 3
i = 1 + (Z.shape[0]-n)
j = 1 + (Z.shape[1]-n)
C = stride_tricks.as_strided(Z, shape=(i, j, n, n), strides=Z.strides + Z.strides)
print(C)
[[[[3 1 3] [0 0 3] [1 3 3]] [[1 3 2] [0 3 3] [3 3 0]] [[3 2 1] [3 3 0] [3 0 4]] [[2 1 1] [3 0 0] [0 4 0]] [[1 1 0] [0 0 2] [4 0 4]] [[1 0 2] [0 2 4] [0 4 2]] [[0 2 0] [2 4 0] [4 2 4]] [[2 0 4] [4 0 3] [2 4 2]]] [[[0 0 3] [1 3 3] [1 4 3]] [[0 3 3] [3 3 0] [4 3 3]] [[3 3 0] [3 0 4] [3 3 0]] [[3 0 0] [0 4 0] [3 0 4]] [[0 0 2] [4 0 4] [0 4 3]] [[0 2 4] [0 4 2] [4 3 3]] [[2 4 0] [4 2 4] [3 3 4]] [[4 0 3] [2 4 2] [3 4 4]]] [[[1 3 3] [1 4 3] [0 3 3]] [[3 3 0] [4 3 3] [3 3 4]] [[3 0 4] [3 3 0] [3 4 0]] [[0 4 0] [3 0 4] [4 0 1]] [[4 0 4] [0 4 3] [0 1 4]] [[0 4 2] [4 3 3] [1 4 0]] [[4 2 4] [3 3 4] [4 0 1]] [[2 4 2] [3 4 4] [0 1 2]]] [[[1 4 3] [0 3 3] [3 3 2]] [[4 3 3] [3 3 4] [3 2 4]] [[3 3 0] [3 4 0] [2 4 3]] [[3 0 4] [4 0 1] [4 3 2]] [[0 4 3] [0 1 4] [3 2 3]] [[4 3 3] [1 4 0] [2 3 4]] [[3 3 4] [4 0 1] [3 4 0]] [[3 4 4] [0 1 2] [4 0 4]]] [[[0 3 3] [3 3 2] [1 1 0]] [[3 3 4] [3 2 4] [1 0 1]] [[3 4 0] [2 4 3] [0 1 1]] [[4 0 1] [4 3 2] [1 1 4]] [[0 1 4] [3 2 3] [1 4 1]] [[1 4 0] [2 3 4] [4 1 2]] [[4 0 1] [3 4 0] [1 2 4]] [[0 1 2] [4 0 4] [2 4 1]]] [[[3 3 2] [1 1 0] [3 4 0]] [[3 2 4] [1 0 1] [4 0 3]] [[2 4 3] [0 1 1] [0 3 1]] [[4 3 2] [1 1 4] [3 1 3]] [[3 2 3] [1 4 1] [1 3 1]] [[2 3 4] [4 1 2] [3 1 3]] [[3 4 0] [1 2 4] [1 3 1]] [[4 0 4] [2 4 1] [3 1 0]]] [[[1 1 0] [3 4 0] [3 0 3]] [[1 0 1] [4 0 3] [0 3 0]] [[0 1 1] [0 3 1] [3 0 4]] [[1 1 4] [3 1 3] [0 4 4]] [[1 4 1] [1 3 1] [4 4 3]] [[4 1 2] [3 1 3] [4 3 2]] [[1 2 4] [1 3 1] [3 2 3]] [[2 4 1] [3 1 0] [2 3 3]]] [[[3 4 0] [3 0 3] [1 4 0]] [[4 0 3] [0 3 0] [4 0 4]] [[0 3 1] [3 0 4] [0 4 1]] [[3 1 3] [0 4 4] [4 1 4]] [[1 3 1] [4 4 3] [1 4 0]] [[3 1 3] [4 3 2] [4 0 3]] [[1 3 1] [3 2 3] [0 3 1]] [[3 1 0] [2 3 3] [3 1 1]]]]
85. Create a 2D array subclass such that Z[i, j] == Z[j, i]
# Author: Eric O. Lebigot
# Note: only works for 2d array and value setting using indices
class Symetric(np.ndarray):
    def __setitem__(self, index, value):
        i,j = index
        super(Symetric, self).__setitem__((i,j), value)
        super(Symetric, self).__setitem__((j,i), value)
def symetric(Z):
    return np.asarray(Z + Z.T - np.diag(Z.diagonal())).view(Symetric)
S = symetric(np.random.randint(0,10,(5,5)))
S[2,3] = 42
print(S)
[[ 0 11 12 5 17] [11 2 10 9 11] [12 10 5 42 5] [ 5 9 42 0 12] [17 11 5 12 4]]
86. Given p matrices of shape (n, n) and p vectors of shape (n, 1), compute their tensor product
p, n = 10, 20
M = np.ones((p,n,n))
V = np.ones((p,n,1))
S = np.tensordot(M, V, axes=[[0, 2], [0, 1]])
print(S)
[[200.] [200.] [200.] [200.] [200.] [200.] [200.] [200.] [200.] [200.] [200.] [200.] [200.] [200.] [200.] [200.] [200.] [200.] [200.] [200.]]
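The axes argument pairs axis 0 of M with axis 0 of V (the p index) and axis 2 of M with axis 1 of V (the shared n index), so the result is the sum of the p matrix-vector products. The sketch below checks this reading on small random inputs; the sizes, the seed, and the names loop and ein are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only to make the check reproducible
p, n = 4, 3
M = rng.random((p, n, n))
V = rng.random((p, n, 1))

S = np.tensordot(M, V, axes=[[0, 2], [0, 1]])
loop = sum(M[i] @ V[i] for i in range(p))   # explicit sum of p products
ein = np.einsum("pij,pjk->ik", M, V)        # the same contraction as einsum

print(S.shape)                                        # (3, 1)
print(np.allclose(S, loop), np.allclose(S, ein))      # True True
```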
87. Given a 16x16 array, split it into 4x4 blocks and compute the sum of each block
Z = np.ones((16,16))
k = 4
S = np.add.reduceat(np.add.reduceat(Z, np.arange(0, Z.shape[0], k), axis=0),
                    np.arange(0, Z.shape[1], k), axis=1)
print(S)
[[16. 16. 16. 16.] [16. 16. 16. 16.] [16. 16. 16. 16.] [16. 16. 16. 16.]]
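When the array dimensions are exact multiples of the block size, the same block sums can also be obtained by reshaping, which some readers may find easier to follow than nested reduceat calls. This is an alternative sketch, not the original answer; it uses np.arange data so that blocks differ from one another.

```python
import numpy as np

Z = np.arange(16 * 16).reshape(16, 16)  # hypothetical non-constant input
k = 4

S_reduceat = np.add.reduceat(np.add.reduceat(Z, np.arange(0, Z.shape[0], k), axis=0),
                             np.arange(0, Z.shape[1], k), axis=1)

# Axes become (block row, row in block, block column, column in block);
# summing over the two "in block" axes leaves one value per block.
S_reshape = Z.reshape(Z.shape[0] // k, k, Z.shape[1] // k, k).sum(axis=(1, 3))

print(np.array_equal(S_reduceat, S_reshape))  # True
```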
88. Implement the Game of Life using arrays
def iterate(Z):
    # Count neighbours
    N = (Z[0:-2,0:-2] + Z[0:-2,1:-1] + Z[0:-2,2:] +
         Z[1:-1,0:-2]                + Z[1:-1,2:] +
         Z[2:  ,0:-2] + Z[2:  ,1:-1] + Z[2:  ,2:])
    # Apply rules
    birth = (N==3) & (Z[1:-1,1:-1]==0)
    survive = ((N==2) | (N==3)) & (Z[1:-1,1:-1]==1)
    Z[...] = 0
    Z[1:-1,1:-1][birth | survive] = 1
    return Z
Z = np.random.randint(0,2,(50,50))
for i in range(100):
    Z = iterate(Z)
print(Z)
[[0 0 0 ... 0 0 0] [0 0 0 ... 0 0 0] [0 0 0 ... 0 0 0] ... [0 0 0 ... 0 0 0] [0 0 0 ... 0 0 0] [0 0 0 ... 0 0 0]]
89. Get the n largest values of an array
Z = np.arange(10000)
np.random.shuffle(Z)
n = 5
# Slower
print (Z[np.argsort(Z)[-n:]])
[9995 9996 9997 9998 9999]
# Faster
print (Z[np.argpartition(-Z,n)[:n]])
[9999 9998 9997 9996 9995]
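One caveat worth keeping in mind: np.argpartition only guarantees that the first n positions hold the n largest values, not that they are ordered among themselves. If a sorted result is needed, sorting just those n values is cheap. A minimal sketch, with a seeded permutation as an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only to make the example reproducible
Z = rng.permutation(10000)
n = 5

top_unsorted = Z[np.argpartition(-Z, n)[:n]]   # the n largest, in arbitrary order
top_sorted = np.sort(top_unsorted)[::-1]       # sort only n values, descending

print(top_sorted)  # [9999 9998 9997 9996 9995]
```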
90. Given arrays of arbitrary size, compute their Cartesian product
def cartesian(arrays):
    arrays = [np.asarray(a) for a in arrays]
    shape = (len(x) for x in arrays)
    ix = np.indices(shape, dtype=int)
    ix = ix.reshape(len(arrays), -1).T
    for n, arr in enumerate(arrays):
        ix[:, n] = arrays[n][ix[:, n]]
    return ix
print (cartesian(([1, 2, 3], [4, 5], [6, 7])))
[[1 4 6] [1 4 7] [1 5 6] [1 5 7] [2 4 6] [2 4 7] [2 5 6] [2 5 7] [3 4 6] [3 4 7] [3 5 6] [3 5 7]]
91. Convert a regular array into a structured array
Z = np.array([("Hello", 2.5, 3),
              ("World", 3.6, 2)])
Z
array([['Hello', '2.5', '3'], ['World', '3.6', '2']], dtype='<U32')
# Use the public np.rec namespace (np.core is deprecated in NumPy 2.x)
R = np.rec.fromarrays(Z.T,
                      names='col1, col2, col3',
                      formats='S8, f8, i8')
R
rec.array([(b'Hello', 2.5, 3), (b'World', 3.6, 2)], dtype=[('col1', 'S8'), ('col2', '<f8'), ('col3', '<i8')])
92. Compute the cube of a large vector in three different ways
x = np.random.rand(int(5e7))
%timeit np.power(x,3)
477 ms ± 1.85 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit x*x*x
83.8 ms ± 908 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit np.einsum('i,i,i->i',x,x,x)
78.4 ms ± 459 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
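93. Consider two arrays A and B of shapes (8,3) and (2,2). How do you find the rows of A that contain elements of each row of B, regardless of the order of the elements in B?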
A = np.random.randint(0,5,(8,3))
B = np.random.randint(0,5,(2,2))
C = (A[..., np.newaxis, np.newaxis] == B)
rows = np.where(C.any((3,1)).all(1))[0]
print(rows)
[1 2 3 5 7]
94. Given a (10, 3) matrix, find the rows whose elements are not all equal
Z = np.random.randint(0,5,(10,3))
Z
array([[3, 4, 3], [2, 3, 1], [4, 4, 4], [1, 3, 0], [0, 0, 1], [1, 2, 3], [4, 4, 0], [3, 1, 4], [4, 0, 2], [3, 4, 1]])
# Works for any dtype
E = np.all(Z[:,1:] == Z[:,:-1], axis=1)
U = Z[~E]
U
array([[3, 4, 3], [2, 3, 1], [1, 3, 0], [0, 0, 1], [1, 2, 3], [4, 4, 0], [3, 1, 4], [4, 0, 2], [3, 4, 1]])
# Works for numeric dtypes only
U = Z[Z.max(axis=1) != Z.min(axis=1),:]
U
array([[3, 4, 3], [2, 3, 1], [1, 3, 0], [0, 0, 1], [1, 2, 3], [4, 4, 0], [3, 1, 4], [4, 0, 2], [3, 4, 1]])
95. Convert a given vector of integers into a binary matrix
indices = np.array([0, 1, 2, 3, 15, 16, 32, 64, 128], dtype=np.uint8)
print(np.unpackbits(indices[:, np.newaxis], axis=1))
[[0 0 0 0 0 0 0 0] [0 0 0 0 0 0 0 1] [0 0 0 0 0 0 1 0] [0 0 0 0 0 0 1 1] [0 0 0 0 1 1 1 1] [0 0 0 1 0 0 0 0] [0 0 1 0 0 0 0 0] [0 1 0 0 0 0 0 0] [1 0 0 0 0 0 0 0]]
96. Given a 2D array, extract the unique rows
Z = np.random.randint(0,2,(6,3))
Z
array([[0, 1, 0], [1, 1, 1], [1, 1, 1], [0, 1, 0], [1, 1, 0], [0, 1, 0]])
uZ = np.unique(Z, axis=0)
uZ
array([[0, 1, 0], [1, 1, 0], [1, 1, 1]])
97. Consider two arrays A and B; use np.einsum to write the equivalents of the outer, inner, sum and mul functions
A = np.random.uniform(0,1,10)
B = np.random.uniform(0,1,10)
np.einsum('i->', A) # np.sum(A)
np.einsum('i,i->i', A, B) # A * B
np.einsum('i,i', A, B) # np.inner(A, B)
np.einsum('i,j->ij', A, B) # np.outer(A, B)
array([[0.26215489, 0.26546696, 0.33141862, 0.25857708, 0.02226384, 0.32101759, 0.27731471, 0.16404438, 0.08255434, 0.41597821], [0.00881091, 0.00892223, 0.01113884, 0.00869066, 0.00074828, 0.01078926, 0.00932043, 0.00551346, 0.00277462, 0.01398085], [0.45462441, 0.46036813, 0.57474036, 0.44841984, 0.03860956, 0.55670307, 0.48091431, 0.28448289, 0.14316429, 0.72138211], [0.0179267 , 0.01815318, 0.0226631 , 0.01768204, 0.00152245, 0.02195185, 0.01896336, 0.0112177 , 0.00564524, 0.02844546], [0.08837339, 0.0894899 , 0.11172246, 0.0871673 , 0.00750522, 0.10821623, 0.09348383, 0.05529998, 0.02782938, 0.14022781], [0.34268495, 0.34701443, 0.43322547, 0.33800809, 0.02910296, 0.41962939, 0.36250164, 0.21443637, 0.1079138 , 0.54376049], [0.38113705, 0.38595234, 0.48183697, 0.37593541, 0.03236855, 0.4667153 , 0.40317735, 0.23849791, 0.12002263, 0.60477494], [0.18058261, 0.18286409, 0.22829419, 0.17811807, 0.01533621, 0.22112955, 0.19102529, 0.11300023, 0.05686668, 0.28654217], [0.38010653, 0.3849088 , 0.48053418, 0.37491896, 0.03228104, 0.4654534 , 0.40208723, 0.23785306, 0.11969811, 0.60313975], [0.29727966, 0.30103549, 0.37582368, 0.29322248, 0.02524686, 0.36402906, 0.31447066, 0.18602384, 0.09361537, 0.47171296]])
98. A path is described by two vectors (X, Y). How do you sample it at equidistant points?
phi = np.arange(0, 10*np.pi, 0.1)
a = 1
x = a*phi*np.cos(phi)
y = a*phi*np.sin(phi)
dr = (np.diff(x)**2 + np.diff(y)**2)**.5 # segment lengths
r = np.zeros_like(x)
r[1:] = np.cumsum(dr) # integrate path
r_int = np.linspace(0, r.max(), 200) # regular spaced path
x_int = np.interp(r_int, r, x) # interpolate x at the regularly spaced positions
y_int = np.interp(r_int, r, y) # interpolate y at the regularly spaced positions
99. Given a 2D array and an integer n, extract the rows that contain only integers and whose elements sum to n
X = np.asarray([[1.0, 0.0, 3.0, 8.0],
                [2.0, 0.0, 1.0, 1.0],
                [1.5, 2.5, 1.0, 0.0]])
n = 4
M = np.logical_and.reduce(np.mod(X, 1) == 0, axis=-1)
M &= (X.sum(axis=-1) == n)
print(X[M])
[[2. 0. 1. 1.]]
100. Given a vector, compute the 95% confidence interval of its mean
X = np.random.randn(100)
N = 1000 # number of bootstrap resamples
idx = np.random.randint(0, X.size, (N, X.size))
means = X[idx].mean(axis=1)
confint = np.percentile(means, [2.5, 97.5])
print(confint)
[-0.27742001 0.12618196]