A snippet from the book 《用python写网络爬虫》 that throttles (rate-limits) the crawler:
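from datetime import datetime
from urllib.parse import urlparse  # Python 2 original: import urlparse; urlparse.urlparse(url)
import time


class Throttle:
    """Delay downloads so requests to the same domain are spaced out."""

    def __init__(self, delay):
        self.delay = delay   # minimum seconds between requests to one domain
        self.domains = {}    # domain -> datetime of the last request

    def wait(self, url):
        domain = urlparse(url).netloc
        last_accessed = self.domains.get(domain)  # time of the PREVIOUS request, or None
        if self.delay > 0 and last_accessed is not None:
            # total_seconds() instead of .seconds, which drops fractions and the days part
            sleep_secs = self.delay - (datetime.now() - last_accessed).total_seconds()
            if sleep_secs > 0:
                print(last_accessed)
                print(sleep_secs)
                time.sleep(sleep_secs)
        self.domains[domain] = datetime.now()  # record this request for the next call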
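To see the throttling without waiting in real time, here is a minimal Python 3 sketch that swaps the real clock for a fake one. The `FakeClock` class and the `clock` constructor parameter are hypothetical additions, not from the book, and timestamps are plain float seconds instead of `datetime` objects. The logic of `wait` is otherwise the same: look up the previous timestamp, sleep for whatever remains of `delay`, then record the current time.

```python
from urllib.parse import urlparse


class FakeClock:
    """Hypothetical stand-in for real time: sleeping just advances the clock."""

    def __init__(self):
        self.now = 0.0
        self.sleeps = []  # every sleep duration requested, for inspection

    def time(self):
        return self.now

    def sleep(self, secs):
        self.sleeps.append(secs)
        self.now += secs


class Throttle:
    """Same per-domain logic as the book's class, but with an injectable clock."""

    def __init__(self, delay, clock):
        self.delay = delay
        self.clock = clock
        self.domains = {}  # domain -> time of the most recent request

    def wait(self, url):
        domain = urlparse(url).netloc
        # Read the PREVIOUS access time before it is overwritten below.
        last_accessed = self.domains.get(domain)
        if self.delay > 0 and last_accessed is not None:
            sleep_secs = self.delay - (self.clock.time() - last_accessed)
            if sleep_secs > 0:
                self.clock.sleep(sleep_secs)
        # Only now record this request's time, for the next call to use.
        self.domains[domain] = self.clock.time()


clock = FakeClock()
throttle = Throttle(delay=5, clock=clock)
throttle.wait("http://example.com/a")  # first visit to the domain: no sleep
clock.now += 1                         # pretend the download took 1 s
throttle.wait("http://example.com/b")  # same domain: sleeps 5 - 1 = 4 s
throttle.wait("http://other.org/")     # different domain: no sleep
print(clock.sleeps)  # [4.0]
```

Injecting the clock is only a testing convenience; with the real `datetime.now()` and `time.sleep()` the sequence of decisions is identical, just slower.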
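This doesn't look right to me. Isn't last_accessed updated on every call, instead of holding the previous value? How does this actually limit the crawl rate?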
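The key is the order of operations inside `wait`: `self.domains.get(domain)` runs at the top, so `last_accessed` holds the timestamp written by the previous call; the assignment `self.domains[domain] = datetime.now()` only runs at the end, after any sleep. A stripped-down sketch of just that read-then-write pattern (the `touch` function is a hypothetical helper for illustration, not from the book):

```python
# Hypothetical helper isolating the read-then-write order used in Throttle.wait.
domains = {}


def touch(domain, now):
    """Return the time stored by the previous call, then store `now`."""
    previous = domains.get(domain)  # read the old value first (None on first visit)
    domains[domain] = now           # overwrite only after reading
    return previous


print(touch("example.com", 10))  # None: nothing stored yet
print(touch("example.com", 12))  # 10: the value from the previous call
print(touch("example.com", 20))  # 12
```

Each call therefore compares "now" against the previous request's time, and only afterwards stores the new time for the next request.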