Unpivot

unpivot converts a DataFrame from wide format to long format.

Dataset

import polars as pl

df = pl.DataFrame(
    {
        "A": ["a", "b", "a"],
        "B": [1, 3, 5],
        "C": [10, 11, 12],
        "D": [2, 4, 6],
    }
)
print(df)

shape: (3, 4)
┌─────┬─────┬─────┬─────┐
│ A   ┆ B   ┆ C   ┆ D   │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╪═════╡
│ a   ┆ 1   ┆ 10  ┆ 2   │
│ b   ┆ 3   ┆ 11  ┆ 4   │
│ a   ┆ 5   ┆ 12  ┆ 6   │
└─────┴─────┴─────┴─────┘

Eager mode + lazy mode

Both modes share the same API. The function is defined as follows:

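def unpivot(
    self,
    on: str | _selector_proxy_ | Sequence[str | _selector_proxy_] | None = None,
    *,
    index: str | _selector_proxy_ | Sequence[str | _selector_proxy_] | None = None,
    variable_name: str | None = None,
    value_name: str | None = None
) -> DataFrame

  • on: the columns to unpivot, i.e. the columns whose values are stacked into rows
  • index: the columns kept as they are; they act roughly as the key that identifies each original row
  • variable_name: the name of the output column that holds the names of the unpivoted columns (defaults to "variable")
  • value_name: the name of the output column that holds the values of the unpivoted columns (defaults to "value")

For example, unpivot columns C and D while keeping A and B as the index:

out = df.unpivot(
    on=["C", "D"],
    index=["A", "B"]
)
print(out)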

Let's look at exactly how the columns are unpivoted. Compare against the original data:

shape: (3, 4)
┌─────┬─────┬─────┬─────┐
│ A   ┆ B   ┆ C   ┆ D   │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╪═════╡
│ a   ┆ 1   ┆ 10  ┆ 2   │
│ b   ┆ 3   ┆ 11  ┆ 4   │
│ a   ┆ 5   ┆ 12  ┆ 6   │
└─────┴─────┴─────┴─────┘

Go through it row by row, looking first at the three columns A, B, and C:

  • a 1 C->10
  • b 3 C->11
  • a 5 C->12

Then look at the three columns A, B, and D:

  • a 1 D->2
  • b 3 D->4
  • a 5 D->6

Stacking the two groups gives the result printed by print(out):

shape: (6, 4)
┌─────┬─────┬──────────┬───────┐
│ A   ┆ B   ┆ variable ┆ value │
│ --- ┆ --- ┆ ---      ┆ ---   │
│ str ┆ i64 ┆ str      ┆ i64   │
╞═════╪═════╪══════════╪═══════╡
│ a   ┆ 1   ┆ C        ┆ 10    │
│ b   ┆ 3   ┆ C        ┆ 11    │
│ a   ┆ 5   ┆ C        ┆ 12    │
│ a   ┆ 1   ┆ D        ┆ 2     │
│ b   ┆ 3   ┆ D        ┆ 4     │
│ a   ┆ 5   ┆ D        ┆ 6     │
└─────┴─────┴──────────┴───────┘