How to write a program that logs in automatically and then downloads a file - Programming board


d*g
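Posts: 62

In my current job there is one thing I have to repeat every day:
open a browser and type in a domain name,
then enter a username and password and click Log in.
After a successful login it goes to another page.
After waiting 20-30 minutes, a link named after the current date appears on it.
I click that link to download the file and, when the save dialog pops up, save it to the local machine.
Since I have to do this every day, would it be possible to write a small program,
say in C# or VBA, in any case something that runs on Windows,
that downloads the file for me automatically?
Thanks in advance for the experts' help!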

l*********s
Posts: 5409

python, beautifulsoup + mechanize

l********a
Posts: 1154

python python python

j******n
Posts: 271

Try wget or curl, both of which are open source.

[In reply to d*g]


h*******s
Posts: 8454

It's easy with Python, although sites with a CAPTCHA are more troublesome...
#!/usr/bin/python
# -*- coding: utf-8 -*-
import time

import mechanize

username = 'xxx'
password = 'xxx'
addr = 'xxx'   # URL of the login page
delay = 60     # seconds to stay logged in before logging out

b = mechanize.Browser()
b.set_handle_robots(False)  # do not fetch or obey robots.txt
b.addheaders = [('User-Agent', 'Mozilla/4.0(compatible; MSIE 6.0; Windows 98;)')]

# connect
b.open(addr)

# log in by filling in and submitting the login form
b.select_form(name='loginForm')
b['username'] = username
b['password'] = password
feedback = b.submit()

# wait for a while
time.sleep(delay)

# log out by following the link whose URL contains logout.php
response = b.follow_link(url_regex=r'logout\.php')

[In reply to d*g]
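
The script above logs in, waits, and logs out, but it never fetches the file. A minimal sketch of the missing step, polling the page until a link named after today's date appears, might look like the following. It uses only the Python 3 standard library so that it runs without mechanize. `fetch_page` is a hypothetical callable that returns the HTML of the page shown after login (with mechanize it could wrap `b.open(url).read()`), and the `%Y-%m-%d` link-text format is an assumption, since the thread never shows the real one.

```python
import datetime
import time
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> element in a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that sits inside an <a href=...> element.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def find_dated_link(html, base_url, day, fmt="%Y-%m-%d"):
    """Return the absolute URL of the link whose text contains the date, or None."""
    wanted = day.strftime(fmt)  # assumed date format, adjust to the real site
    parser = LinkCollector()
    parser.feed(html)
    for href, text in parser.links:
        if wanted in text:
            return urljoin(base_url, href)
    return None


def wait_for_link(fetch_page, base_url, day, interval=60, max_tries=40,
                  sleep=time.sleep):
    """Poll fetch_page() until the dated link shows up; return its URL or None."""
    for attempt in range(max_tries):
        url = find_dated_link(fetch_page(), base_url, day)
        if url is not None:
            return url
        if attempt < max_tries - 1:
            sleep(interval)
    return None
```

With the default of 40 tries one minute apart, the loop covers roughly 40 minutes, which is a little more than the 20-30 minute wait described in the first post. The returned URL would then be downloaded through the same logged-in session, for example with something like mechanize's `b.retrieve(url, filename)`.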

d*g
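Posts: 62

Thanks a lot! But I ran into a problem. When I open the site in a web browser I can see the username and password boxes,
but I don't know how to get at them with the mechanize browser object, because
browser.select_form(nr=0) fails as follows:
Traceback (most recent call last):
  File "C:/Python27/work001.py", line 36, in <module>
    br.select_form(nr=0)
  File "build\bdist.win-amd64\egg\mechanize\_mechanize.py", line 524, in select_form
    raise FormNotFoundError("no form matching "+description)
FormNotFoundError: no form matching nr 0
I looked at the page source, and it is this kind of thing:
[markup missing from the archive]
There is no form in there.
How do I deal with this?

[In reply to h*******s]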


p*********t
Posts: 2690

Can an auto-login program be written in Java?

[In reply to h*******s]

g*****g
Posts: 34805

Check out HtmlUnit; it's pretty easy to do.

[In reply to p*********t]


h**********c
Posts: 4120

The username and password may be sent over HTTPS.
Hoping an expert can shed some light on this.

b***i
Posts: 3043

The main web page uses frames. What you need to look at is the real HTML page loaded into that frame.
The link (...jsp...) in the source is that HTML page, so open that address with the browser.

[In reply to d*g]
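
To make the frame advice concrete: the outer page contains only `<frame>` or `<iframe>` tags, and the login form lives in the document each of them points to. A small sketch, assuming Python 3 and a made-up outer page (the real markup is not shown in the thread), that lists those inner addresses so one of them can be opened directly:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameFinder(HTMLParser):
    """Record the src attribute of every <frame> and <iframe> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag in ("frame", "iframe"):
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def frame_urls(html, page_url):
    """Return absolute URLs of the documents loaded into the page's frames."""
    finder = FrameFinder()
    finder.feed(html)
    return [urljoin(page_url, src) for src in finder.sources]


# Hypothetical outer page, used only for illustration.
outer = """
<html><frameset rows="20%,80%">
  <frame src="banner.html">
  <frame src="/app/login.jsp?lang=en">
</frameset></html>
"""
print(frame_urls(outer, "https://example.com/portal/index.html"))
```

Each address returned this way can be opened on its own (with mechanize, `br.open(address)`), after which `select_form` has a chance of finding the form that the outer page does not contain.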



h*******s
Posts: 8454

Are addresses with a "?" in them what people call dynamic pages? mechanize doesn't seem to be able to handle that kind.

[In reply to b***i]


u******u
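Posts: 106

Selenium, a tool for automated testing of web pages. The file-saving part is a popup, so you will have to write something yourself to handle it.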

h**********c
Posts: 4120

Take Yahoo Mail, for example:
the username/password step goes over HTTPS,
and after that it is plain HTTP.
How does the browser handle this?
Will this affect the OP's goal?

h*******s
Posts: 8454

Yes, it works quite well, but it seems it has to open a window, which is pretty annoying...

[In reply to u******u]


i*****o
Posts: 1714

The login step just gives you the cookies; you then send those cookies along when requesting the
following pages.

[In reply to h**********c]
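
A minimal sketch of the cookie idea, assuming Python 3 and only the standard library: one opener keeps a cookie jar, the login POST fills it, and every later request through the same opener sends the cookies back. The URLs and field names in the commented usage are placeholders, not the OP's site. The same opener accepts both http:// and https:// addresses, so a mixed http -> https -> http sequence needs no socket handling in the script itself.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Build an opener that stores cookies and sends them back automatically."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def login_request(login_url, fields):
    """Build the POST request that submits the login form fields."""
    body = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(login_url, data=body)


def download(opener, url, path):
    """Fetch url through the logged-in opener and save the body to path."""
    with opener.open(url) as response, open(path, "wb") as out:
        out.write(response.read())


# Usage with placeholder values (not run here because it needs a real site):
# opener, jar = make_session()
# opener.open(login_request("https://example.com/login",
#                           {"username": "xxx", "password": "xxx"}))
# download(opener, "https://example.com/files/today.zip", "today.zip")
```

A mechanize `Browser` object keeps its own cookie jar in the same way, which is why the earlier script can submit the form and then follow links without handling cookies explicitly.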


c**t
Posts: 2744

Use Outlook (IMAP); google it yourself. Don't overcomplicate things.

[In reply to h**********c]


h**********c
Posts: 4120

The reason I mentioned Yahoo Mail (in a web browser)
is that it goes through an http -> https -> http sequence.
If everything is HTTP, you can use something like Wireshark to capture all the
traffic and then repeat the GET and POST requests you made, using Python, c-http, etc.
But with HTTPS there will be a handshake exchange. The
TCP port numbers are different, so there must be two sockets to handle it.
I would like to know the OP's case: is all the traffic HTTP, HTTPS, or mixed?
And for each situation, how should it be dealt with?

d*g
Posts: 62

My site consists entirely of HTTPS pages.
At first I type in a domain, for example http://aaaa.aa.aaa.aa
In IE's address bar it automatically turns into a very long dynamic address that contains the domain I originally typed
and even the current date and time, like this:
https://bb.bb.bbb/nidp/idff/sso?RequestID=idNHtKzhLTDjRBIzRANbbscOExLAk&
MajorVersion=1&MinorVersion=2&IssueInstant=2012-04-09T02%3A13%3A53Z&
ProviderID=https%3A%2F%2Fbb.bb.bbb%3A443%2Fnesp%2Fidff%2Fmetadata&RelayState
=MA%3D%3D&consent=urn%3Aliberty%3Aconsent%3Aunavailable&ForceAuthn=false&
IsPassive=false&NameIDPolicy=onetime&ProtocolProfile=http%3A%2F%
2Fprojectliberty.org%2Fprofiles%2Fbrws-art&target=https%3A%2F%2Fbb.bb.bbb%
3A443%2FLAGBroker%3F%2522https%3A%2F%2Faaaa.aa.aaa.aa%3A443%2F%2522&
AuthnContextStatementRef=secure%2Fpw%2Fform%2Fonehour%2Furi
Also, the source code around the username and password boxes on this page has no
[markup missing from the archive]
, only
[markup missing from the archive]

[In reply to h**********c]
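
The long address becomes easier to read once its percent-encoded query string is decoded. The sketch below (Python 3, standard library only) takes the URL exactly as quoted in the post, with the forum's line wraps removed, and prints three of its parameters. It appears to be a single-sign-on redirect (the parameters mention projectliberty.org), though that reading is an inference from the parameter names rather than something stated in the thread.

```python
from urllib.parse import parse_qs, urlsplit

# The redirect URL quoted in the post, with the forum's line wraps removed.
sso_url = (
    "https://bb.bb.bbb/nidp/idff/sso?RequestID=idNHtKzhLTDjRBIzRANbbscOExLAk&"
    "MajorVersion=1&MinorVersion=2&IssueInstant=2012-04-09T02%3A13%3A53Z&"
    "ProviderID=https%3A%2F%2Fbb.bb.bbb%3A443%2Fnesp%2Fidff%2Fmetadata&"
    "RelayState=MA%3D%3D&consent=urn%3Aliberty%3Aconsent%3Aunavailable&"
    "ForceAuthn=false&IsPassive=false&NameIDPolicy=onetime&"
    "ProtocolProfile=http%3A%2F%2Fprojectliberty.org%2Fprofiles%2Fbrws-art&"
    "target=https%3A%2F%2Fbb.bb.bbb%3A443%2FLAGBroker%3F%2522https%3A%2F%2F"
    "aaaa.aa.aaa.aa%3A443%2F%2522&"
    "AuthnContextStatementRef=secure%2Fpw%2Fform%2Fonehour%2Furi"
)

# parse_qs returns a list per key; every key occurs once here, so take item 0.
params = {key: values[0] for key, values in parse_qs(urlsplit(sso_url).query).items()}

for name in ("IssueInstant", "ProviderID", "target"):
    print(name, "=", params[name])
```

`IssueInstant` is the timestamp the OP noticed, and `target` carries the originally typed domain (still wrapped in `%22` because that part was encoded twice). Since `RequestID` and `IssueInstant` look like they are generated per visit, a script would presumably do better to open the short address and let the redirect happen than to hard-code this long one.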


h**********c
Posts: 4120

Then I suggest you study the openssl examples.
Maybe, if you are lucky, Python can support https.
Kind regards,

[In reply to d*g]


b***i
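Posts: 3043

Right-click near the login box and choose View Source; that shows the HTML of that frame rather than of the whole page.
In other words, you get to see the output of that JSP dynamic page, and it will contain the form.

[In reply to d*g]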


d*g
Posts: 62

Thanks. I right-clicked next to the submit button, viewed the source, and sure enough there was a form.
But after I put the form's name into the browser code, it still did not work:
br.select_form(name="IDPLogin")
# User credentials
br.form['Ecom_User_ID'] = 'ttt'
br.form['Ecom_Password'] = 'test'
# Login
br.submit()
The error is still that the form cannot be found:
Traceback (most recent call last):
  File "C:/Python27/work001.py", line 36, in <module>
    br.select_form(name="IDPLogin")
  File "build\bdist.win-amd64\egg\mechanize\_mechanize.py", line 524, in select_form
    raise FormNotFoundError("no form matching "+description)
FormNotFoundError: no form matching name 'IDPLogin'
I tried print br.forms and got the following:
0x00000000032586C8>>
That doesn't look like an error, though.

[In reply to b***i]
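
One detail in the post above: `print br.forms` prints the method object itself rather than the forms, because `forms` is a method and has to be called. The output shown, which ends in `0x00000000032586C8>>`, looks like the tail of such a bound-method repr (an assumption, since the start of that line is missing from the archive). The stand-in class below is hypothetical and is not mechanize itself; it only shows the difference between referring to a method and calling it.

```python
class FakeBrowser:
    """Hypothetical stand-in for a browser object whose forms() is a method."""

    def __init__(self, form_names):
        self._form_names = list(form_names)

    def forms(self):
        # Return an iterator over the forms found on the current page.
        return iter(self._form_names)


br = FakeBrowser(["IDPLogin"])

without_call = repr(br.forms)  # the method object: "<bound method ... at 0x...>>"
with_call = list(br.forms())   # the actual forms

print(without_call)
print(with_call)
```

With mechanize the equivalent check would be to loop over `br.forms()` and print each form; if that loop prints nothing, the page that was opened really contains no form, which fits the frame explanation given earlier in the thread.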


a9
Posts: 21638

Couldn't you trace it with Live HTTP Headers
or iehttpheaders?

[In reply to d*g]

h*******s
Posts: 8454

mechanize doesn't seem to be able to handle dynamic pages...

[In reply to d*g]


t****0
Posts: 861

I don't know Python. You could try Macros; it is simple and easy to learn.

[In reply to d*g]

C***x
Posts: 468

Zennoposter

i*****o
Posts: 1714

The iframe probably has to be opened in a new br.

[In reply to d*g]
