JobHunting board - Scraping WeChat articles with an off-the-shelf wheel: how do I fix this error?
o****g (Posts: 174):
I'm using this wheel, which first requires the scrapy package:
https://github.com/LKI/wescraper
It fails with the error below. It looks like no cookie was obtained? Why?
2018-01-23 17:51:22 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: scrapybot)
2018-01-23 17:51:22 [scrapy.utils.log] INFO: Versions: lxml 4.1.1.0, libxml2 2.9.7, cssselect 1.0.3, parsel 1.3.1, w3lib 1.18.0, Twisted 17.9.0, Python 2.7.13 |Anaconda 4.4.0 (64-bit)| (default, Dec 20 2016, 23:09:15) - [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)], pyOpenSSL 17.0.0 (OpenSSL 1.0.2l 25 May 2017), cryptography 1.8.1, Platform Linux-3.13.0-92-generic-x86_64-with-debian-jessie-sid
2018-01-23 17:51:22 [scrapy.crawler] INFO: Overridden settings: {'DUPEFILTER_CLASS': u'scrapy.dupefilter.BaseDupeFilter'}
2018-01-23 17:51:22 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2018-01-23 17:51:22 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-01-23 17:51:22 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-01-23 17:51:22 [scrapy.middleware] INFO: Enabled item pipelines:
[u'__main__.WeScraper']
2018-01-23 17:51:22 [scrapy.core.engine] INFO: Spider opened
2018-01-23 17:51:22 [py.warnings] WARNING: /home/ubuntu/anaconda2/lib/python2.7/importlib/__init__.py:37: ScrapyDeprecationWarning: Module `scrapy.dupefilter` is deprecated, use `scrapy.dupefilters` instead
  __import__(name)
2018-01-23 17:51:22 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-01-23 17:51:22 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2018-01-23 17:51:22 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET http://weixin.sogou.com/> from <GET http://weixin.sogou.com/weixin?type=2&sourceid=inttime_day&tsn=1&query=miawu>
2018-01-23 17:51:22 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET http://weixin.sogou.com/> from <GET http://weixin.sogou.com/weixin?type=2&sourceid=inttime_day&tsn=1&query=liriansu>
2018-01-23 17:51:23 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://weixin.sogou.com/> (referer: None)
2018-01-23 17:51:23 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://weixin.sogou.com/> (referer: None)
2018-01-23 17:51:23 [sogou.com/] DEBUG: Current cookie: {}
2018-01-23 17:51:23 [scrapy.core.scraper] ERROR: Spider error processing <GET http://weixin.sogou.com/> (referer: None)
Traceback (most recent call last):
  File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/scrapy/utils/defer.py", line 102, in iter_errback
    yield next(it)
  File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/scrapy/spidermiddlewares/offsite.py", line 30, in process_spider_output
    for x in result:
  File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/scrapy/spidermiddlewares/referer.py", line 339, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/home/ubuntu/wescraper/wescraper/wespider.py", line 97, in parse_keyword
    self.cookie_pool.set_return_header(response.headers.getlist('Set-Cookie'), current_cookie)
  File "/home/ubuntu/wescraper/wescraper/cookie.py", line 68, in set_return_header
    self.dump()
  File "/home/ubuntu/wescraper/wescraper/cookie.py", line 27, in dump
    lines = [cookie['SNUID'], cookie['SUID'], cookie['SUV']]
KeyError: u'SNUID'
2018-01-23 17:51:23 [sogou.com/] DEBUG: Current cookie: {}
2018-01-23 17:51:23 [scrapy.core.scraper] ERROR: Spider error processing <GET http://weixin.sogou.com/> (referer: None)
Traceback (most recent call last):
  (identical to the traceback above, ending in the same KeyError: u'SNUID')
2018-01-23 17:51:23 [scrapy.core.engine] INFO: Closing spider (finished)
2018-01-23 17:51:23 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 1209,
'downloader/request_count': 4,
'downloader/request_method_count/GET': 4,
'downloader/response_bytes': 49660,
'downloader/response_count': 4,
'downloader/response_status_count/200': 2,
'downloader/response_status_count/302': 2,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2018, 1, 23, 17, 51, 23, 392293),
'log_count/DEBUG': 7,
'log_count/ERROR': 2,
'log_count/INFO': 7,
'log_count/WARNING': 1,
'memusage/max': 40009728,
'memusage/startup': 40009728,
'response_received_count': 2,
'scheduler/dequeued': 4,
'scheduler/dequeued/memory': 4,
'scheduler/enqueued': 4,
'scheduler/enqueued/memory': 4,
'spider_exceptions/KeyError': 2,
'start_time': datetime.datetime(2018, 1, 23, 17, 51, 22, 293700)}
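For what it's worth, the log itself narrows this down. Both search requests (query=miawu and query=liriansu) were 302-redirected from the /weixin search URL to the bare homepage http://weixin.sogou.com/, which is what Sogou does when it treats a client as a crawler. That homepage response evidently never sets the SNUID cookie, yet cookie.py line 27 indexes cookie['SNUID'] unconditionally, so dump() crashes. A guard at the call site shown in the traceback (wespider.py line 97, inside parse_keyword) would at least turn the crash into a logged warning. This is a minimal sketch under that diagnosis, not wescraper's actual code; only the set_return_header call comes from the traceback, and the guard and warning text are my additions:

# Hypothetical patch inside parse_keyword, replacing the bare call at
# wespider.py:97. Assumes response and current_cookie as in the traceback.
set_cookies = response.headers.getlist('Set-Cookie')
if any('SNUID' in h for h in set_cookies):
    # Sogou issued the full cookie set; safe to hand it to the pool.
    self.cookie_pool.set_return_header(set_cookies, current_cookie)
else:
    # Redirected to the homepage without SNUID: skip storing this cookie
    # instead of letting cookie.py's dump() fail on the missing key.
    self.logger.warning('No SNUID in Set-Cookie; Sogou likely served the '
                        'anti-bot homepage instead of search results.')

Note this only stops the exception. The underlying problem is that Sogou refused to serve search results to the bare crawler; the likelier real fix is to seed the cookie pool with valid SNUID/SUID/SUV values captured from a real browser session on weixin.sogou.com (if wescraper supports that) and to throttle the request rate.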