python - How to solve the scrapy-redis idle-run problem?
Problem description
In the scrapy-redis framework, the xxx:requests queue stored in redis has already been crawled completely, but the program keeps running. How can the program stop automatically instead of idling forever?
2017-07-03 09:17:06 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-07-03 09:18:06 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
The program can be stopped by calling engine.close_spider(spider, 'reason').
def next_request(self):
    block_pop_timeout = self.idle_before_close
    request = self.queue.pop(block_pop_timeout)
    if request and self.stats:
        self.stats.inc_value('scheduler/dequeued/redis', spider=self.spider)
    if request is None:
        self.spider.crawler.engine.close_spider(self.spider, 'queue is empty')
    return request
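This next_request method belongs to the scrapy-redis scheduler (scrapy_redis/scheduler.py); the two lines handling request is None are the addition. A minimal sketch of the same idea as a subclass, so the installed package does not need to be patched directly (the module path, class name, and setting value below are assumptions for illustration, not part of the scrapy-redis API):

# myproject/scheduler.py  (hypothetical module)
from scrapy_redis.scheduler import Scheduler


class ClosingScheduler(Scheduler):
    """Close the spider as soon as the redis queue stops returning requests."""

    def next_request(self):
        request = super().next_request()
        if request is None:
            # Nothing came back from redis within idle_before_close seconds,
            # so ask the engine to shut the spider down instead of idling.
            self.spider.crawler.engine.close_spider(self.spider, 'queue is empty')
        return request

It would then be enabled in settings.py with something like SCHEDULER = 'myproject.scheduler.ClosingScheduler' (name assumed).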
One thing is still unclear: when the spider is closed via engine.close_spider(spider, 'reason'), a few errors appear before it finally shuts down.
# Normal shutdown
2017-07-03 18:02:38 [scrapy.core.engine] INFO: Closing spider (queue is empty)
2017-07-03 18:02:38 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'finish_reason': 'queue is empty',
 'finish_time': datetime.datetime(2017, 7, 3, 10, 2, 38, 616021),
 'log_count/INFO': 8,
 'start_time': datetime.datetime(2017, 7, 3, 10, 2, 38, 600382)}
2017-07-03 18:02:38 [scrapy.core.engine] INFO: Spider closed (queue is empty)

# A few errors still appear after this before the spider really shuts down. Does the spider start
# several threads when it launches, so that once one of them closes the spider the others can no
# longer find it and raise errors?
Unhandled Error
Traceback (most recent call last):
  File "D:/papp/project/launch.py", line 37, in <module>
    process.start()
  File "D:\Program Files\python3\lib\site-packages\scrapy\crawler.py", line 285, in start
    reactor.run(installSignalHandlers=False)  # blocking call
  File "D:\Program Files\python3\lib\site-packages\twisted\internet\base.py", line 1243, in run
    self.mainLoop()
  File "D:\Program Files\python3\lib\site-packages\twisted\internet\base.py", line 1252, in mainLoop
    self.runUntilCurrent()
--- <exception caught here> ---
  File "D:\Program Files\python3\lib\site-packages\twisted\internet\base.py", line 878, in runUntilCurrent
    call.func(*call.args, **call.kw)
  File "D:\Program Files\python3\lib\site-packages\scrapy\utils\reactor.py", line 41, in __call__
    return self._func(*self._a, **self._kw)
  File "D:\Program Files\python3\lib\site-packages\scrapy\core\engine.py", line 137, in _next_request
    if self.spider_is_idle(spider) and slot.close_if_idle:
  File "D:\Program Files\python3\lib\site-packages\scrapy\core\engine.py", line 189, in spider_is_idle
    if self.slot.start_requests is not None:
builtins.AttributeError: 'NoneType' object has no attribute 'start_requests'
Answers
Answer 1: How do you know the queued requests have all been crawled? That condition has to be defined somewhere. If it is nothing complicated, you can use the built-in extension to close the spider:
scrapy.contrib.closespider.CloseSpider
CLOSESPIDER_TIMEOUT
CLOSESPIDER_ITEMCOUNT
CLOSESPIDER_PAGECOUNT
CLOSESPIDER_ERRORCOUNT
http://scrapy-chs.readthedocs...
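These are settings for Scrapy's built-in CloseSpider extension; they would go in settings.py, for example (the threshold values below are purely illustrative):

# settings.py -- illustrative thresholds; 0 leaves a condition disabled
CLOSESPIDER_TIMEOUT = 3600      # close the spider after it has been running for an hour
CLOSESPIDER_ITEMCOUNT = 10000   # ...or after 10000 items have been scraped
CLOSESPIDER_PAGECOUNT = 0       # page-count condition disabled
CLOSESPIDER_ERRORCOUNT = 0      # error-count condition disabled

Note that these settings close the spider when a fixed limit is reached; they do not detect an empty redis queue by themselves.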
