Gone with the Wind Sequel (Downloading and Installing Third-Party Libraries)

2: Requests is an HTTP request library.

Pass verify=False to skip HTTPS certificate verification.
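As a minimal sketch of where verify=False goes, the snippet below sets it once on a Session and once on a single call. The helper name fetch_unverified and the 10-second timeout are assumptions made for illustration, not part of the original text. Skipping verification leaves the connection open to interception, so it is best kept to testing.

```python
import requests

# A Session applies the same settings to every request sent through it.
session = requests.Session()
session.verify = False  # skip TLS certificate verification (insecure, testing only)


def fetch_unverified(url):
    """Hypothetical helper: GET a URL without verifying its certificate."""
    # requests emits an InsecureRequestWarning for unverified HTTPS requests.
    return requests.get(url, verify=False, timeout=10)
```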

Request methods provided by requests

requests.get('http://baidu.com')
requests.post('http://baidu.com')
requests.put('http://baidu.com')
requests.delete('http://baidu.com')
requests.head('http://baidu.com')
requests.options('http://baidu.com')

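(1) requests.get sends the request parameters to the server at the address given by the URL. The parameters can be appended to the URL, for example 'http://www.baidu.com?page=20&content=迷茫的月光&age=20',

or passed as a dictionary, as shown below.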

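data = {
    'page': 20,
    'content': '迷茫的月光',
}
url = 'http://www.baidu.com'
# For a GET request, query parameters are passed with params= (data= is for the request body)
html = requests.get(url, params=data)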
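To see how requests turns a dictionary into a query string, one option is to prepare a request without sending it. This is only a sketch: the parameter values are taken from the URL example above, and nothing is sent over the network.

```python
import requests

params = {'page': 20, 'age': 20}

# prepare() builds the final request object but does not send it.
prepared = requests.Request('GET', 'http://www.baidu.com', params=params).prepare()

print(prepared.url)   # the query string is ?page=20&age=20
print(prepared.body)  # None: a GET built this way carries no body
```

Passing the same dictionary as data= would place it in the request body instead of the URL, which most servers ignore for GET requests.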

(2) requests.post sends the request data to the server in the request body.

url = 'http://baidu.com'
html = requests.post(url, data={'page': '20'})  # or data=data, using the same dictionary as above
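The difference from GET can be checked the same way, by preparing a POST without sending it. A small sketch using the values from the example above:

```python
import requests

prepared = requests.Request('POST', 'http://baidu.com', data={'page': '20'}).prepare()

print(prepared.url)                      # no query string is added to the URL
print(prepared.body)                     # page=20
print(prepared.headers['Content-Type'])  # application/x-www-form-urlencoded
```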

(2) When using requests, you usually need to add request headers.

headers = {
    'Host': 'www.chanpin100.com',
    'Upgrade-Insecure-Requests': '1',
    'Pragma': 'no-cache',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 SE 2.X MetaSr 1.0',
}
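One reason custom headers matter is that, by default, requests announces itself as a script in the User-Agent header, which some sites reject. The sketch below prints that default and then attaches a browser-like header to a prepared request; the shortened User-Agent string is an assumption for illustration, and nothing is sent.

```python
import requests

# The User-Agent requests uses when none is supplied.
default_agent = requests.utils.default_headers()['User-Agent']
print(default_agent)  # starts with python-requests/

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36',
    'Pragma': 'no-cache',
}
prepared = requests.Request('GET', 'http://www.chanpin100.com', headers=headers).prepare()

# Header names are matched case-insensitively.
print(prepared.headers['user-agent'])
```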

(3) To crawl through a proxy IP, write it like this.

proxies = {
    # 'http': 'http://10.10.1.10:3128',
    # 'https': 'http://10.10.1.10:1080',
    'http': 'http://118.190.95.35',
}

The complete code looks like this:

import requests

# URL to request
url = 'http://www.baidu.com'

# headers: request header information
headers = {
    'Host': 'www.chanpin100.com',  # should match the host in url
    'Upgrade-Insecure-Requests': '1',
    'Pragma': 'no-cache',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 SE 2.X MetaSr 1.0',
}

# proxy IP address
proxies = {
    # 'http': 'http://10.10.1.10:3128',
    # 'https': 'http://10.10.1.10:1080',
    'http': 'http://118.190.95.35',
}

html = requests.get(url, headers=headers, proxies=proxies)
print(html)       # <Response [200]> means the request succeeded
print(html.text)  # the HTML source of the page at this URL
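Printing the response is fine for a quick look, but a script usually needs to react to a failed request. A minimal sketch, assuming a hypothetical helper named fetch_html and a 10-second timeout, neither of which comes from the original:

```python
import requests


def fetch_html(url, headers=None, proxies=None):
    """Hypothetical helper: return the page text, or raise on an HTTP error."""
    response = requests.get(url, headers=headers, proxies=proxies, timeout=10)
    # raise_for_status() raises requests.HTTPError for 4xx and 5xx responses.
    response.raise_for_status()
    return response.text
```

It would be called as fetch_html(url, headers=headers, proxies=proxies) with the values defined above. The status code is also available directly as response.status_code.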

(3) Saving binary content

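html = requests.get(url)             # fetch this URL
html_json = html.content             # raw response body as bytes
html_css = open(r'e:/python', 'wb')  # open the target file in binary write mode
html_css.write(html_json)
html_css.close()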
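The same steps can be wrapped in a function that closes the file automatically. This is a sketch under assumptions: the helper name save_binary, its path argument, and the timeout are illustrative and not from the original.

```python
import requests


def save_binary(url, path):
    """Hypothetical helper: download url and write the raw bytes to path."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    # 'wb' writes bytes unchanged; the with block closes the file on exit.
    with open(path, 'wb') as f:
        f.write(response.content)
    return len(response.content)
```

Using response.content rather than response.text matters for images and other non-text files, since text would be decoded to a string first.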

3: jsonpath is used specifically for parsing JSON data returned from the backend.

(1) jsonpath is used as jsonpath.jsonpath(json.loads(html.text), '$.title'), as shown in the figure below.

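This way the returned data can be parsed with jsonpath, provided that the API response text is first converted into Python objects with Python's built-in json library.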

It is written as follows:
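A minimal sketch of the two steps, converting with json and then querying with jsonpath. The sample response text and the '$..title' expression are assumptions made for illustration. The third-party jsonpath package has to be installed separately, so the sketch falls back to plain indexing when it is missing.

```python
import json

# Stand-in for html.text: an API response body as a JSON string (made-up data).
text = '{"data": {"items": [{"title": "first"}, {"title": "second"}]}}'

# Step 1: convert the JSON text into Python dicts and lists.
obj = json.loads(text)

# Step 2: query the parsed object.
try:
    import jsonpath  # third-party package: pip install jsonpath
    titles = jsonpath.jsonpath(obj, '$..title')  # every "title" at any depth
except ImportError:
    # Fallback without jsonpath: walk the known structure by hand.
    titles = [item['title'] for item in obj['data']['items']]

print(titles)
```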

jsonpath syntax

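4: Introduction to the re library: re is part of the Python standard library, and matching with regular expressions is often called fuzzy matching.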

Main functions of the re library

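re.search()
Searches the string for the first position that matches the regular expression and returns a match object.

re.match()
Matches the regular expression from the beginning of the string and returns a match object.

re.findall()
Searches the string and returns all matching substrings as a list.

re.split()
Splits the string at each match of the regular expression and returns a list.

re.finditer()
Searches the string and returns an iterator over the matches; each element is a match object.

re.sub()
Replaces every substring that matches the regular expression and returns the resulting string.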
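The six functions can be compared side by side on one string. The sample text and patterns below are made up for illustration.

```python
import re

text = 'order 66 shipped, order 7 pending'
pattern = r'\d+'  # one or more digits

first = re.search(pattern, text)          # first match anywhere in the string
at_start = re.match(pattern, text)        # None, because text does not start with a digit
all_numbers = re.findall(pattern, text)   # ['66', '7']
parts = re.split(r',\s*', text)           # ['order 66 shipped', 'order 7 pending']
spans = [m.span() for m in re.finditer(pattern, text)]  # [(6, 8), (24, 25)]
masked = re.sub(pattern, '#', text)       # 'order # shipped, order # pending'

print(first.group(), at_start, all_numbers, parts, spans, masked)
```

Both re.search() and re.match() return None when nothing matches, so the result should be checked before calling group() on it.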

Published by

风君子
