Faster Pandas: tips and tools for speeding up pandas processing
Smaller
One prerequisite for speeding things up is memory optimization.
References:
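As a minimal sketch of what memory optimization can look like, the helper below downcasts numeric columns and converts repetitive string columns to the category dtype. The function name `shrink` and the sample columns are hypothetical, and the 50% uniqueness threshold is an arbitrary assumption.

```python
import pandas as pd


def shrink(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with smaller dtypes where that is possible."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_integer_dtype(out[col]):
            # Pick the smallest signed integer type that holds all values.
            out[col] = pd.to_numeric(out[col], downcast="integer")
        elif pd.api.types.is_float_dtype(out[col]):
            # float64 -> float32; note that this reduces precision.
            out[col] = pd.to_numeric(out[col], downcast="float")
        elif pd.api.types.is_string_dtype(out[col]):
            # Repetitive strings are far cheaper as categories.
            if out[col].nunique() < 0.5 * len(out):
                out[col] = out[col].astype("category")
    return out


# Hypothetical sample data, only for illustration.
df = pd.DataFrame({
    "id": range(1000),
    "score": [0.5] * 1000,
    "city": ["Paris", "Tokyo"] * 500,
})
small = shrink(df)
before = df.memory_usage(deep=True).sum()
after = small.memory_usage(deep=True).sum()
print(f"{before} bytes -> {after} bytes")
```

Downcasting floats is lossy, so it is probably best limited to columns where float32 precision is acceptable.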
Read by chunk
df_reader = pd.read_csv(file, low_memory=False, lineterminator="\n", usecols=columns_needed, iterator=True, chunksize=360000)
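With `chunksize` set, `pd.read_csv` returns an iterator of DataFrames rather than one DataFrame, so each chunk has to be processed and the partial results combined. The sketch below shows one way to do that, assuming a small in-memory CSV with hypothetical `user` and `amount` columns in place of a real file.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a large file on disk.
csv_text = "user,amount\na,1\nb,2\na,3\nb,4\na,5\n"

reader = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["user", "amount"],
    chunksize=2,  # tiny on purpose; a real file would use a much larger value
)

# Aggregate each chunk on its own so that only one chunk is in memory at a time.
partials = [chunk.groupby("user")["amount"].sum() for chunk in reader]

# Combine the per-chunk sums into the final totals.
totals = pd.concat(partials).groupby(level=0).sum()
print(totals)
```

This pattern works for aggregations that can be merged from partial results (sums, counts, min/max); something like a median would need a different approach.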
Parallel Processing
Modin
Introduction to Modin
"Want pandas to run faster? Then use Modin" (article by 机器之心 on Zhihu): https://zhuanlan.zhihu.com/p/62398921
Official Modin resources
https://github.com/modin-project/modin
https://modin.readthedocs.io/en/latest/index.html
Using Modin
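Modin is designed as a drop-in replacement for pandas, so in the simplest case only the import line changes. The sketch below assumes Modin and one of its execution engines (for example Ray or Dask) are installed, and falls back to plain pandas otherwise; the sample data is hypothetical.

```python
try:
    # Same API as pandas, but operations are distributed across cores.
    import modin.pandas as pd
except ImportError:
    # Fallback so the sketch still runs where Modin is not installed.
    import pandas as pd

# Hypothetical sample data, only for illustration.
df = pd.DataFrame({
    "group": ["a", "b", "a", "b"],
    "value": [1, 2, 3, 4],
})

# The rest of the code is written exactly as it would be with pandas.
totals = df.groupby("group")["value"].sum()
print(totals)
```

On data this small Modin will likely be slower than pandas because of startup overhead; any benefit is expected only on large frames.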
swifter
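swifter adds a `.swifter` accessor to pandas objects and tries to pick the fastest way to run `apply` (vectorized where it can, parallel otherwise). A minimal sketch, assuming swifter is installed and falling back to a plain `apply` if it is not; the `square` function and the data are hypothetical.

```python
import pandas as pd

try:
    import swifter  # noqa: F401  (the import registers the .swifter accessor)
    HAVE_SWIFTER = True
except ImportError:
    HAVE_SWIFTER = False


def square(x):
    """Toy function standing in for a more expensive computation."""
    return x * x


df = pd.DataFrame({"n": [1, 2, 3, 4]})

if HAVE_SWIFTER:
    df["n_squared"] = df["n"].swifter.apply(square)
else:
    df["n_squared"] = df["n"].apply(square)

print(df)
```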
pandarallel
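pandarallel spreads `apply`-style calls over several processes through `parallel_apply`. The sketch below assumes pandarallel is installed and that two workers are wanted (an arbitrary choice); it falls back to a plain `apply` otherwise. The `word_count` function and the data are hypothetical.

```python
import pandas as pd

try:
    from pandarallel import pandarallel

    # Must be called once before parallel_apply becomes available.
    pandarallel.initialize(nb_workers=2, progress_bar=False)
    HAVE_PANDARALLEL = True
except ImportError:
    HAVE_PANDARALLEL = False


def word_count(text):
    """Toy function standing in for a more expensive computation."""
    return len(text.split())


df = pd.DataFrame({"text": ["a b c", "hello world", "pandas"]})

if HAVE_PANDARALLEL:
    df["words"] = df["text"].parallel_apply(word_count)
else:
    df["words"] = df["text"].apply(word_count)

print(df)
```

Because each worker is a separate process, the overhead is only likely to pay off when the applied function is expensive or the frame is large.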
Multithreads
from tqdm import tqdm
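A minimal sketch of how `tqdm` might be combined with a thread pool: the DataFrame is split into parts, each part is handled by a worker thread, and `tqdm` reports progress as results arrive. The `summarize` function, the part size, and the worker count are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

import pandas as pd
from tqdm import tqdm


def summarize(part: pd.DataFrame) -> int:
    """Toy per-part task; real work might be an I/O call per part."""
    return int(part["value"].sum())


df = pd.DataFrame({"value": range(100)})

# Split the frame into four parts of 25 rows each.
parts = [df.iloc[i:i + 25] for i in range(0, len(df), 25)]

with ThreadPoolExecutor(max_workers=4) as pool:
    # pool.map preserves input order; tqdm needs total for a proper bar.
    results = list(tqdm(pool.map(summarize, parts), total=len(parts)))

total = sum(results)
print(total)
```

Threads in CPython share the GIL, so this mainly helps when the per-part work is I/O-bound or spends its time in code that releases the GIL.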