[Notes] Getting Started with Python Web Crawlers @ 胖虎的祕密基地

First published in 2017

Starting to learn Python web crawling (Crawler).

System environment:

Windows 7
Python 3.5.2

pip packages:
requests, beautifulsoup4, jupyter

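The following is based on this tutorial video:

http://youtu.be/3xQTJi2tqgk

The video is from June 2014 and its author used Python 2.7, so the URL and Python syntax in it differ slightly from what I used.

Here is the code I wrote (2017/01/06):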

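import requests
from bs4 import BeautifulSoup

# Yellow Pages search: "coffee" near "Los Angeles, CA"
url = 'http://www.yellowpages.com/search?search_terms=coffee&geo_location_terms=Los+Angeles%2C+CA'

res = requests.get(url)

# The 'html5lib' parser requires the html5lib package (pip install html5lib)
soup = BeautifulSoup(res.content, 'html5lib')

# Each search result's details are inside a <div class="info">
get_data = soup.find_all("div", {"class": "info"})

for p in get_data:
    # First child node: the business name
    print(p.contents[0].find_all("a", {"class": "business-name"})[0].text)

    # Second child node: address and phone. Any field may be missing, which
    # raises IndexError (empty find_all result or no second child) or
    # AttributeError (the child is a text node), so that field is skipped.
    try:
        print(p.contents[1].find_all("span", {"class": "street-address"})[0].text)
    except (IndexError, AttributeError):
        pass

    try:
        # Remove the comma from the locality text
        print(p.contents[1].find_all("span", {"class": "locality"})[0].text.replace(",", ""))
    except (IndexError, AttributeError):
        pass

    try:
        print(p.contents[1].find_all("span", {"itemprop": "addressRegion"})[0].text)
    except (IndexError, AttributeError):
        pass

    try:
        print(p.contents[1].find_all("span", {"itemprop": "postalCode"})[0].text)
    except (IndexError, AttributeError):
        pass

    try:
        print(p.contents[1].find_all("div", {"class": "phones phone primary"})[0].text)
    except (IndexError, AttributeError):
        pass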
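To check the parsing logic without hitting the live site (whose markup has very likely changed since 2017), here is a minimal sketch that runs the same field lookups against a small hand-written HTML snippet. The snippet and the helper name `extract_listing` are hypothetical; only the class and itemprop names come from the code above. It uses `find()`, which returns `None` for a missing tag instead of raising, and it searches all descendants rather than relying on `contents[0]`/`contents[1]` (which can be whitespace text nodes). The built-in `html.parser` is used so html5lib is not needed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup mimicking one search result, using the
# class/itemprop names the scraper above expects (not real site output).
SAMPLE_HTML = """
<div class="info">
  <h2 class="n"><a class="business-name" href="#">Sample Coffee</a></h2>
  <div class="info-section">
    <p class="adr">
      <span class="street-address">123 Main St</span>
      <span class="locality">Los Angeles,</span>
      <span itemprop="addressRegion">CA</span>
      <span itemprop="postalCode">90012</span>
    </p>
    <div class="phones phone primary">(213) 555-0100</div>
  </div>
</div>
"""

# (field name, tag name, attribute filter) for each value to pull out
FIELDS = [
    ("name", "a", {"class": "business-name"}),
    ("street", "span", {"class": "street-address"}),
    ("locality", "span", {"class": "locality"}),
    ("region", "span", {"itemprop": "addressRegion"}),
    ("postal_code", "span", {"itemprop": "postalCode"}),
    ("phone", "div", {"class": "phones phone primary"}),
]


def extract_listing(info_div):
    """Hypothetical helper: return a dict of the fields found in one result.

    find() returns None when a tag is missing, so absent fields become None
    instead of raising IndexError as find_all(...)[0] would.
    """
    listing = {}
    for key, tag, attrs in FIELDS:
        el = info_div.find(tag, attrs)
        listing[key] = el.get_text(strip=True) if el else None
    if listing["locality"]:
        # Remove the comma, as replace(",", "") does in the code above
        listing["locality"] = listing["locality"].replace(",", "")
    return listing


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
for info in soup.find_all("div", {"class": "info"}):
    print(extract_listing(info))
```

Collecting each result into a dict (instead of printing field by field) also makes it easy to write the data to CSV or JSON later.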

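Next steps:

1. The program only scrapes the first page of results; it could be extended to fetch the other pages.

2. The code could be put on GitHub.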
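For the first item (fetching more than the first page), one possible approach is to loop over page numbers and add a page parameter to the query string. That Yellow Pages accepts a `page` query parameter is an assumption here (it is not from the original post or the video), and the helper names `build_search_url` and `scrape_pages` are hypothetical; check the site's actual "next page" link before relying on this sketch. `urlencode` reproduces the original URL's encoding (spaces as `+`, the comma as `%2C`).

```python
import time
from urllib.parse import urlencode

import requests
from bs4 import BeautifulSoup

BASE_URL = "http://www.yellowpages.com/search"


def build_search_url(terms, location, page=1):
    """Hypothetical helper: build a search URL; the 'page' parameter is assumed."""
    params = {"search_terms": terms, "geo_location_terms": location}
    if page > 1:
        params["page"] = page
    return BASE_URL + "?" + urlencode(params)


def scrape_pages(terms, location, max_pages=3, delay=1.0):
    """Hypothetical helper: yield business names from pages 1..max_pages."""
    for page in range(1, max_pages + 1):
        res = requests.get(build_search_url(terms, location, page), timeout=10)
        res.raise_for_status()  # stop on HTTP errors such as 403 or 404
        soup = BeautifulSoup(res.content, "html.parser")
        results = soup.find_all("div", {"class": "info"})
        if not results:  # no results: assume we went past the last page
            break
        for info in results:
            name = info.find("a", {"class": "business-name"})
            if name:
                yield name.get_text(strip=True)
        time.sleep(delay)  # pause between requests to avoid hammering the site
```

Usage: `for name in scrape_pages("coffee", "Los Angeles, CA"): print(name)`. Stopping at the first empty page avoids having to know the total page count in advance.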


Posted by 正義的胖虎 on PIXNET (痞客邦)
