I recently needed to scrape trending news, so I'm warming up with the Weibo hot search list.

One page appears to hold 50 entries, so the script scrapes the current top 50.

Libraries used:

requests, bs4

Full code:


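import requests
import bs4

url = "https://s.weibo.com/top/summary"

# Browser-like request headers
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36',
    'Accept-Language': 'zh-CN,zh;q=0.9',
}

# Fetch the page; fail fast on a hung connection or an HTTP error status
response = requests.get(url=url, headers=headers, timeout=10)
response.raise_for_status()
html_text = response.text
soup = bs4.BeautifulSoup(html_text, "html.parser")
# Each hot-search entry sits in a <td class="td-02"> cell
news_list = soup.find_all("td", attrs={"class": "td-02"})
# Skip the first cell, which appears to be the pinned entry rather than a ranked one
for num, cell in enumerate(news_list[1:], start=1):
    title = cell.contents[1].contents[0]  # text of the second child (the <a> link)
    heat = cell.contents[3].contents[0]   # text of the fourth child (the heat <span>)
    print(str(num) + "、", title, heat)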
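
The positional `contents[1]` / `contents[3]` lookups above break if a cell has different whitespace or lacks a heat value. Below is a minimal sketch of a more tolerant parser, assuming each entry is a `td.td-02` cell containing an `<a>` title and a `<span>` heat value. The function name `parse_hot_search`, its `limit` parameter, and `SAMPLE_HTML` are made up for illustration and the sample markup is not real Weibo output.

```python
import bs4

# Hypothetical sample markup, shaped like the cells the script above expects.
SAMPLE_HTML = """
<table>
  <tr><td class="td-02"><a href="/weibo?q=pinned">Pinned topic</a></td></tr>
  <tr><td class="td-02"><a href="/weibo?q=a">Topic A</a><span>1200000</span></td></tr>
  <tr><td class="td-02"><a href="/weibo?q=b">Topic B</a><span>950000</span></td></tr>
</table>
"""


def parse_hot_search(html, limit=50):
    """Return (rank, title, heat) tuples from hot-search table cells."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    rows = []
    for cell in soup.find_all("td", attrs={"class": "td-02"}):
        link = cell.find("a")
        heat = cell.find("span")
        # Skip cells without a title link or a heat value (e.g. a pinned entry).
        if link is None or heat is None:
            continue
        rows.append((len(rows) + 1, link.get_text(strip=True), heat.get_text(strip=True)))
        if len(rows) == limit:
            break
    return rows


for rank, title, heat in parse_hot_search(SAMPLE_HTML):
    print(rank, title, heat)
```

Looking up tags by name instead of by position means a cell without a heat value is skipped rather than raising an `IndexError`. To use it on the live page, `response.text` from the script above could be passed in place of `SAMPLE_HTML`.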

Scraping result:
