I'm counting how many lines there are in a log file of a bit over 800 MB. I have two versions, one single-threaded and one parallel, but the parallel one is far slower than the single-threaded one. How can I improve the parallel version's efficiency?
Could someone experienced help me analyze this?
The single-threaded version and its result:
import time

start_t = time.clock()

def block(file, size=65536):
    # yield the file in fixed-size chunks instead of line by line
    while True:
        nb = file.read(size)
        if not nb:
            break
        yield nb

with open("D:\Centos7\catalina.out", "r", encoding="UTF-8") as f:
    print(sum(chunk.count("\n") for chunk in block(f)))
print(time.clock() - start_t)
D:\new\Python\excersise>python test4.py
404325
4.501643689510964
So the single-threaded version takes about 4.5 seconds.
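(For comparison, the same chunked loop can also be written in binary mode so that no UTF-8 decoding happens at all while counting; this is only a minimal, untimed sketch on the same file:)

import time

start_t = time.clock()
with open("D:\Centos7\catalina.out", "rb") as f:   # "rb": count raw bytes, skip decoding
    print(sum(chunk.count(b"\n") for chunk in iter(lambda: f.read(65536), b"")))
print(time.clock() - start_t)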
The parallel version and its result:
import time
import multiprocessing as mp
from functools import reduce

def run(fn):
    # fn is a list of lines; its length is that chunk's line count
    return len(fn)

if __name__ == "__main__":
    start_t = time.clock()
    ff = []
    fp = open("D:\Centos7\catalina.out", "r", encoding="UTF-8")
    while True:
        fb = fp.readlines(1048576)  # read roughly 1 MB of lines per chunk (65536 bytes * 16 = 1 MB)
        if not fb:
            break
        ff.append(fb)
    fp.close()

    pool = mp.Pool(16)
    sum_t = reduce(lambda x, y: x + y, pool.map(run, ff))
    pool.close()
    pool.join()
    print(sum_t, time.clock() - start_t)
D:\new\Python\excersise>python test4.py
404325 11.875990500941482
This takes about 12 seconds, and the number barely changes whether I set the pool size to 4, 8 or 16.
How can I make the parallel version faster?
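One direction that might help: in the version above the parent process first reads the entire file serially, and then every chunk (a whole list of strings) has to be pickled and sent to a worker, so the inter-process copying can easily cost more than the counting itself. Below is a minimal sketch, not a measured solution, in which each worker opens the file on its own and only small (offset, length) tuples cross the process boundary; the path and the worker count are placeholders to adjust:

import os
import multiprocessing as mp

PATH = "D:\Centos7\catalina.out"       # same log file as above

def count_range(args):
    # each worker opens the file itself and counts newlines in its own byte range,
    # so only a small (start, length) tuple is pickled between processes
    start, length = args
    total = 0
    with open(PATH, "rb") as f:        # binary mode: no UTF-8 decoding needed for counting
        f.seek(start)
        remaining = length
        while remaining > 0:
            chunk = f.read(min(1 << 20, remaining))   # up to 1 MB per read
            if not chunk:
                break
            total += chunk.count(b"\n")
            remaining -= len(chunk)
    return total

if __name__ == "__main__":
    size = os.path.getsize(PATH)
    workers = 4                        # adjust to the number of physical cores
    step = size // workers + 1
    ranges = [(start, min(step, size - start)) for start in range(0, size, step)]
    with mp.Pool(workers) as pool:
        print(sum(pool.map(count_range, ranges)))

Because the byte ranges are disjoint and cover the whole file, the per-range newline counts add up to the same total as before, while only a handful of small tuples are transferred between processes instead of the file's full contents.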