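I need to run computations on a DataFrame with more than ten million rows, which takes too long, so I decided to try pandarallel.
However, after importing it with
from pandarallel import pandarallel
and then running
pandarallel.initialize()
I get the following warning: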
WARNING: Logging before InitGoogleLogging() is written to STDERR
E0812 19:11:57.484051 2409853824 io.cc:168] Connection to IPC socket failed for pathname /var/folders/sp/vz74h1tx3jlb3jqrq__bjwh00000gp/T/pandarallel-32ts0h6r/plasma_sock, retrying 20 more times
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0812 19:11:57.485960 2409853824 store.cc:1116] Allowing the Plasma store to use up to 2GB of memory.
I0812 19:11:57.486624 2409853824 store.cc:1143] Starting object store with directory /tmp and huge page support disabled
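To make the output above easier to read: the prefixed lines appear to follow the glog layout (severity letter, MMDD date, time with microseconds, thread id, source file and line, then the message). Below is a minimal sketch, using only the Python standard library, that splits such lines into fields. The names `GLOG_LINE`, `SEVERITY_NAMES` and `parse_glog_line` are hypothetical helpers for illustration, not part of pandarallel or Plasma.

```python
import re

# Assumed glog-style prefix: severity letter, MMDD, HH:MM:SS.microseconds,
# thread id, source file and line number, then "] " and the message.
GLOG_LINE = re.compile(
    r"^(?P<severity>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread>\d+) "
    r"(?P<file>[^:\s]+):(?P<line>\d+)\] "
    r"(?P<message>.*)$"
)

SEVERITY_NAMES = {"I": "INFO", "W": "WARNING", "E": "ERROR", "F": "FATAL"}


def parse_glog_line(line):
    """Return a dict of fields for a glog-style line, or None if it does not match."""
    match = GLOG_LINE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["severity"] = SEVERITY_NAMES[fields["severity"]]
    fields["line"] = int(fields["line"])
    return fields


# Lines copied from the output above.
log_lines = [
    "WARNING: Logging before InitGoogleLogging() is written to STDERR",
    "E0812 19:11:57.484051 2409853824 io.cc:168] Connection to IPC socket "
    "failed for pathname /var/folders/sp/vz74h1tx3jlb3jqrq__bjwh00000gp/T/"
    "pandarallel-32ts0h6r/plasma_sock, retrying 20 more times",
    "I0812 19:11:57.485960 2409853824 store.cc:1116] Allowing the Plasma "
    "store to use up to 2GB of memory.",
    "I0812 19:11:57.486624 2409853824 store.cc:1143] Starting object store "
    "with directory /tmp and huge page support disabled",
]

for raw in log_lines:
    parsed = parse_glog_line(raw)
    if parsed is None:
        # The "WARNING: Logging before ..." notice has no glog prefix.
        print("(no glog prefix)", raw)
    else:
        print(parsed["severity"], parsed["file"], parsed["line"], parsed["message"])
```

Read this way, only the `E` line (from io.cc) is an error-level message, and it reports that the socket connection will be retried; the two `I` lines (from store.cc) are informational.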
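I checked the help documentation:
'''
This method takes 4 optional parameters:
- `shm_size_mb`: The size of the Pandarallel shared memory, in MB. If the
default one is too small, a larger one can be set. By default,
it is set to 2 GB. (int)
- `nb_workers`: The number of workers. By default, it is set to the number
of cores your operating system sees. (int)
- `progress_bar`: Set it to `True` to display a progress bar.
- `verbose`: The verbosity level. > 1 displays all logs, 1 displays only
initialization logs, < 1 displays no logs. (int)
'''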
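As I read it, the `verbose` rule in the help text amounts to the small function below. This is only a sketch of the documented rule: `describe_verbosity` is a hypothetical helper written for illustration, not a pandarallel function, and it says nothing about how the library implements logging internally.

```python
def describe_verbosity(verbose):
    """Return which logs the help text says are shown for a given verbosity."""
    if verbose > 1:
        return "all logs"
    if verbose == 1:
        return "initialization logs only"
    # Any value below 1, including 0 and negative numbers.
    return "no logs"


for level in (2, 1, 0):
    print(level, "->", describe_verbosity(level))
```

By this rule, `verbose=0` should mean that pandarallel itself displays no logs.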
So I passed the parameters explicitly:
pandarallel.initialize(shm_size_mb=6072, nb_workers=11, progress_bar=False, verbose=0)
But the warning still appears:
WARNING: Logging before InitGoogleLogging() is written to STDERR
E0812 19:12:56.934706 2409853824 io.cc:168] Connection to IPC socket failed for pathname /var/folders/sp/vz74h1tx3jlb3jqrq__bjwh00000gp/T/pandarallel-zxjkriqd/plasma_sock, retrying 20 more times