Recently I needed to scrape trending news, so let's warm up with Weibo's hot search list.
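The list page appears to show 50 entries below a pinned topic, so the script skips the pinned row and scrapes the current top 50.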
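The sketch below makes "skip the pinned row, keep the next 50" concrete. It parses a hand-written HTML fragment rather than the live page, so it runs offline. The markup is an assumption modeled on what the full code expects: each row has a `td-02` cell holding an `<a>` title and a `<span>` heat value, and the first row is a pinned topic with no heat. `parse_hot_list` is a hypothetical helper name. It also uses `find("a")` / `find("span")` instead of the positional `contents` lookup in the full code, so a row missing its `<span>` does not raise an error.

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for the hot-search table (assumed markup, not fetched)
SAMPLE_HTML = """
<table><tbody>
<tr><td class="td-02"><a href="#">Pinned topic</a></td></tr>
<tr><td class="td-02"><a href="#">Topic A</a> <span>3021456</span></td></tr>
<tr><td class="td-02"><a href="#">Topic B</a> <span>1987321</span></td></tr>
<tr><td class="td-02"><a href="#">Topic C</a> <span>954210</span></td></tr>
</tbody></table>
"""


def parse_hot_list(html, limit=50):
    """Return up to `limit` (rank, title, heat) tuples, skipping the pinned first row."""
    soup = BeautifulSoup(html, "html.parser")
    cells = soup.find_all("td", attrs={"class": "td-02"})
    results = []
    # cells[0] is the pinned entry; ranking starts from the second cell
    for rank, cell in enumerate(cells[1:limit + 1], start=1):
        link = cell.find("a")    # topic title
        heat = cell.find("span")  # heat value, kept as text
        title = link.get_text(strip=True) if link else ""
        heat_text = heat.get_text(strip=True) if heat else ""
        results.append((rank, title, heat_text))
    return results


for row in parse_hot_list(SAMPLE_HTML, limit=2):
    print(row)
```

With `limit=2` this prints the first two ranked topics, and the pinned row never appears. The same function with the default `limit=50` would cap a real page at the top 50.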
Libraries used:
requests, bs4 (BeautifulSoup)
Full code:
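```python
import requests
import bs4

url = "https://s.weibo.com/top/summary"
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36',
    'Accept-Language': 'zh-CN,zh;q=0.9',
    # Weibo may redirect requests without a login cookie to a visitor page;
    # if the table comes back empty, add a 'Cookie' header copied from your browser.
}
response = requests.get(url=url, headers=headers, timeout=10)
response.raise_for_status()  # stop early on HTTP errors
htmlText = response.text
soup = bs4.BeautifulSoup(htmlText, "html.parser")

# Each hot-search row keeps its title and heat value in a <td class="td-02"> cell
newList = soup.find_all("td", attrs={"class": "td-02"})
num = 0
# Skip the first cell: it is the pinned topic, which has no heat value
for i in newList[1:]:
    title = i.contents[1].contents[0]  # text of the <a> link (topic title)
    heat = i.contents[3].contents[0]   # text of the <span> (heat value)
    num += 1
    print(str(num) + ".", title, heat)
```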
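The loop reads `contents[1]` and `contents[3]` rather than `[0]` and `[1]`. A likely reason is that `html.parser` keeps the line breaks between tags as whitespace text nodes, which push the tags onto odd indices. The fragment below assumes a cell with each child tag on its own line, purely to show that effect; the real page's whitespace may differ, which is why positional indexing is fragile.

```python
import bs4

# Assumed cell layout: each child tag on its own line, as in indented HTML source
CELL_HTML = """<td class="td-02">
<a href="#">Topic A</a>
<span>3021456</span>
</td>"""

cell = bs4.BeautifulSoup(CELL_HTML, "html.parser").td

# Print the index, node type, and a short preview of every child node
for index, child in enumerate(cell.contents):
    print(index, type(child).__name__, repr(str(child))[:30])
```

Here indices 0, 2 and 4 are the newline strings, while 1 and 3 are the `<a>` and `<span>` tags, matching the indices used in the full code.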
Scraping result: