Get the Index of Rows Whose Column Matches a Specific Value in Pandas

This article demonstrates how to get the index of the rows that satisfy a given condition in Pandas.

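Finding the index of particular rows is a common need in feature engineering. It is useful, for example, when removing outliers or anomalous values from a DataFrame. The index, that is, the row labels, can be found with several Pandas tools. The examples below work with the DataFrame created by the following snippet.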

import pandas as pd
import numpy as np
np.random.seed(0)
df = pd.DataFrame(np.random.randint(1, 20, size=(20, 4)), columns=list("ABCD"))
print(df)
     A   B   C   D
0   13  16   1   4
1    4   8  10  19
2    5   7  13   2
3    7   8  15  18
4    6  14   9  10
5   17   6  16  16
6    1  19   4  18
7   15   8   1   2
8   10   1  11   4
9   12  19   3   1
10   1   5   6   7
11   9  18  16   5
12  10  11   2   2
13   8  10   4   7
14  12  15  19   1
15  15   4  13  11
16  12   5   7   5
17  16   4  13   5
18   9  15  16   4
19  16  14  17  18

Get the Index of Rows Containing Integers/Floats in Pandas
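pandas.DataFrame.loc accesses rows and columns by their labels/names. When given a boolean condition in place of labels, it directly returns the rows that match that condition. Note the square brackets after df.loc in the snippet: loc is an indexer, not a method call.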
import pandas as pd
import numpy as np
np.random.seed(0)
df = pd.DataFrame(np.random.randint(1, 20, size=(20, 4)), columns=list("ABCD"))
print(df.loc[df["B"] == 19])
The rows that satisfy the boolean condition are returned as a DataFrame.
    A   B  C   D
6   1  19  4  18
9  12  19  3   1
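Because df.loc returns whole rows, the row labels can be read from the result's index attribute. The snippet below is a minimal sketch on a small hand-built DataFrame (the data and the names matches and labels are illustrative, not taken from the example above).

```python
import pandas as pd

# Small illustrative frame with non-consecutive row labels.
df = pd.DataFrame({"A": [1, 12, 5], "B": [19, 19, 7]}, index=[6, 9, 10])

# Select the matching rows, then read their labels from .index.
matches = df.loc[df["B"] == 19]
labels = matches.index.tolist()
print(labels)  # [6, 9]
```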
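Multiple conditions can be combined and applied together, as shown below. Wrap each condition in parentheses and join them with | (or) or & (and). This helps isolate rows based on specific criteria.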
import pandas as pd
import numpy as np
np.random.seed(0)
df = pd.DataFrame(np.random.randint(1, 20, size=(20, 4)), columns=list("ABCD"))
print(df.loc[(df["B"] == 19) | (df["C"] == 19)])
     A   B   C   D
6    1  19   4  18
9   12  19   3   1
14  12  15  19   1
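The operators |, & and ~ work element-wise on boolean Series, and each comparison needs its own parentheses because these operators bind more tightly than ==. The following sketch uses a small made-up DataFrame to show the three combinations side by side; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"B": [19, 19, 15, 4, 8], "C": [4, 3, 19, 19, 1]})

# OR: rows where either column equals 19.
either = df.loc[(df["B"] == 19) | (df["C"] == 19)]

# AND: rows where both conditions hold at once.
both = df.loc[(df["B"] == 19) & (df["C"] == 3)]

# NOT: rows where neither column equals 19.
neither = df.loc[~((df["B"] == 19) | (df["C"] == 19))]

print(either.index.tolist())   # [0, 1, 2, 3]
print(both.index.tolist())     # [1]
print(neither.index.tolist())  # [4]
```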
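Get the Index of Rows with pandas.DataFrame.index
If you only want the index labels of the rows that satisfy a boolean condition, indexing pandas.DataFrame.index with that condition is the simplest approach. Note that index is an attribute, not a method, so it is followed by square brackets rather than parentheses.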
import pandas as pd
import numpy as np
np.random.seed(0)
df = pd.DataFrame(np.random.randint(1, 20, size=(20, 4)), columns=list("ABCD"))
print(df.index[df["B"] == 19].tolist())
In the snippet above, the index labels of the rows where column B satisfies the boolean condition == 19 are returned, as shown below.
[6, 9]
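We call tolist() on the result to convert the Index to a plain list. Without it, the result is an Index object, which pandas versions before 2.0 display as Int64Index (newer versions print Index([6, 9], dtype='int64') instead).
Int64Index([6, 9], dtype='int64')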
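Indexing df.index with a mask returns row labels, which coincide with integer positions only when the DataFrame uses the default index. The sketch below, on a made-up DataFrame with string labels, contrasts the two; using np.flatnonzero for positions is one possible choice, not the only one.

```python
import numpy as np
import pandas as pd

# Illustrative frame whose row labels are strings, not 0..n-1.
df = pd.DataFrame({"B": [19, 7, 19]}, index=["x", "y", "z"])
mask = df["B"] == 19

# Row labels of the matching rows.
labels = df.index[mask].tolist()

# Zero-based positions of the matching rows.
positions = np.flatnonzero(mask.to_numpy()).tolist()

print(labels)     # ['x', 'z']
print(positions)  # [0, 2]
```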
The index alone can also be retrieved based on multiple conditions. The code can be written as follows.
import pandas as pd
import numpy as np
np.random.seed(0)
df = pd.DataFrame(np.random.randint(1, 20, size=(20, 4)), columns=list("ABCD"))
print(df.index[(df["B"] == 19) | (df["C"] == 19)].tolist())
[6, 9, 14]
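One use for the retrieved labels, in line with the outlier-removal motivation mentioned at the start, is to pass them to DataFrame.drop. This is a minimal sketch on a small made-up DataFrame; treating the value 19 as an "outlier" is purely an assumption for illustration.

```python
import pandas as pd

df = pd.DataFrame({"B": [19, 7, 19, 5], "C": [4, 19, 3, 6]})

# Labels of the rows where either column equals 19.
labels = df.index[(df["B"] == 19) | (df["C"] == 19)]

# Drop those rows; the original DataFrame is left unchanged.
cleaned = df.drop(index=labels)

print(labels.tolist())         # [0, 1, 2]
print(cleaned.index.tolist())  # [3]
```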
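Get the Index of Rows Containing Strings in Pandas
String values can be matched in two ways: exactly or partially. Both techniques from the previous section, df.loc and df.index, still apply; only the condition changes.
The following examples use the snippet below.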
import pandas as pd
df = pd.DataFrame(
    {
        "Name": ["blue", "delta", "echo", "charlie", "alpha"],
        "Type": ["Raptors", "Raptors", "Raptors", "Raptors", "Tyrannosaurus rex"],
    }
)
print(df)
      Name               Type
0     blue            Raptors
1    delta            Raptors
2     echo            Raptors
3  charlie            Raptors
4    alpha  Tyrannosaurus rex
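Get the Index of Rows with an Exact String Match
The equality condition used in the previous section can also find exact string matches in a DataFrame. Let's look for two strings.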
import pandas as pd
df = pd.DataFrame(
    {
        "Name": ["blue", "delta", "echo", "charlie", "alpha"],
        "Type": ["Raptors", "Raptors", "Raptors", "Raptors", "Tyrannosaurus rex"],
    }
)
print(df.index[(df["Name"] == "blue")].tolist())
print("\n")
print(df.loc[df["Name"] == "blue"])
print("\n")
print(df.loc[(df["Name"] == "charlie") & (df["Type"] == "Raptors")])
[0]
   Name     Type
0  blue  Raptors
      Name     Type
3  charlie  Raptors
As shown above, both the index and the matching rows can be retrieved.
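When several exact strings are of interest at once, Series.isin can stand in for a chain of == comparisons joined with |. The sketch below rebuilds the same DataFrame; the list of wanted names is an arbitrary choice for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["blue", "delta", "echo", "charlie", "alpha"],
        "Type": ["Raptors", "Raptors", "Raptors", "Raptors", "Tyrannosaurus rex"],
    }
)

# True for rows whose Name is exactly one of the listed strings.
wanted = ["blue", "echo"]
labels = df.index[df["Name"].isin(wanted)].tolist()
print(labels)  # [0, 2]
```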
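Get the Index of Rows with a Partial String Match
String values can be matched partially by chaining the str.contains function onto a DataFrame column. In the example below, we look for the string ha, which occurs in charlie and alpha.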
import pandas as pd
df = pd.DataFrame(
    {
        "Name": ["blue", "delta", "echo", "charlie", "alpha"],
        "Type": ["Raptors", "Raptors", "Raptors", "Raptors", "Tyrannosaurus rex"],
    }
)
print(df.index[df["Name"].str.contains("ha")].tolist())
print("\n")
print(df.loc[df["Name"].str.contains("ha")])
print("\n")
# str.contains is case-sensitive by default, so match the lowercase "rex".
print(df.loc[(df["Name"].str.contains("ha")) & (df["Type"].str.contains("rex"))])
[3, 4]
      Name               Type
3  charlie            Raptors
4    alpha  Tyrannosaurus rex
    Name               Type
4  alpha  Tyrannosaurus rex
This function is very useful for partial string matching across multiple columns of a DataFrame.
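Two str.contains parameters are worth knowing when matching partially: case controls case sensitivity (matching is case-sensitive by default), and na sets the result for missing values so the mask stays purely boolean. The following is a minimal sketch; the Series names_with_gap is made up to illustrate the na parameter.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["blue", "delta", "echo", "charlie", "alpha"],
        "Type": ["Raptors", "Raptors", "Raptors", "Raptors", "Tyrannosaurus rex"],
    }
)

# Default matching is case-sensitive: "Rex" does not match "rex".
case_sensitive = df.index[df["Type"].str.contains("Rex")].tolist()

# case=False ignores case, so the Tyrannosaurus row is found.
case_insensitive = df.index[df["Type"].str.contains("Rex", case=False)].tolist()

# na=False treats missing values as non-matches instead of NaN.
names_with_gap = pd.Series(["charlie", None, "alpha"])
safe_mask = names_with_gap.str.contains("ha", na=False)
gap_labels = names_with_gap.index[safe_mask].tolist()

print(case_sensitive)    # []
print(case_insensitive)  # [4]
print(gap_labels)        # [0, 2]
```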