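A long time ago, when I was learning Python web programming, I came across Python's urllib. With urllib.urlopen("url").read() you can easily read the static content of a page. Over time, however, more and more pages have come to build their content dynamically in the browser with JavaScript and libraries such as jQuery. (Content generated on the server, for example by PHP, is not the problem, because it arrives as ordinary HTML.) For such pages, fetching the HTML with urllib is no longer enough to get what we want.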
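To make the limitation concrete, here is a minimal sketch. The page source in RAW_HTML is invented for illustration and does not come from any real site, and ListItemCounter and count_list_items are hypothetical helper names. The sketch parses the markup exactly as a urllib-based scraper would receive it, before any script has run. The try/except import is only there so that it runs on both Python 2 and Python 3.

```python
try:
    from HTMLParser import HTMLParser      # Python 2, as used in this article
except ImportError:
    from html.parser import HTMLParser     # Python 3

# Hypothetical page source: the list is filled in by a script in the browser.
RAW_HTML = """
<html><body>
<ul id="items"></ul>
<script>
document.getElementById("items").innerHTML = "<li>first</li><li>second</li>";
</script>
</body></html>
"""


class ListItemCounter(HTMLParser):
    """Counts <li> start tags that are present in the markup itself."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.count += 1


def count_list_items(html):
    """Return the number of <li> elements found without running any script."""
    parser = ListItemCounter()
    parser.feed(html)
    parser.close()
    return parser.count
```

Calling count_list_items(RAW_HTML) finds no list items, because the text inside the script element is never executed by the parser. A browser engine that runs the script would show two.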
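Approach:
The simplest idea for handling dynamic page content is this: urllib cannot run the scripts that produce dynamic content, but a browser can. What a browser displays is really an HTML document that has already been processed. That gives us a good way to scrape dynamic pages. Python has a well-known GUI library, PyQt. Although PyQt is a GUI library, it includes QtWebKit, which is very useful here. Apple's Safari is built on the WebKit engine, and Google's Chrome was too before it moved to the WebKit-derived Blink engine. So we can use QtWebKit from PyQt to load a page and let it produce the processed HTML document, then parse that document and extract the information we want.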
I use Mac OS X myself. The same approach should also work on Windows and Linux.
1. Qt4 library
Install the library, not Qt Creator. Leave the library in its default install location on the Mac, which should be /home/username/Developer/. Do not change Qt4's default install path, or the installation may fail.
Official site: http://qt-project.org/downloads
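2. SIP and PyQt4
Both packages are available from the PyQt website. What you download is source code, so on Mac and Linux you have to compile it yourself.
Download page: http://www.riverbankcomputing.co.uk/software/pyqt/download
In a terminal, change to the directory where the archive was unpacked.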
Then run the following in the terminal to build and install:
    python configure.py
    make
    sudo make install
SIP and PyQt4 are installed the same way, but PyQt4 depends on SIP, so install SIP first and then PyQt4.
Once steps 1 and 2 are done, the PyQt4 module for Python is installed. Type import PyQt4 in a Python shell to check that the module can be found.
3. Spynner
spynner is a QtWebKit client that can simulate a browser: it loads pages, triggers events, fills in forms, and so on.
The module is available from the Python Package Index (PyPI).
Download page: https://pypi.python.org/pypi/spynner/2.5
After unpacking, cd into the unpacked directory and run sudo python setup.py install to install the module.
That completes the Spynner installation. Try import spynner in a Python shell to check that the module was installed correctly.
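The import checks from steps 2 and 3 can also be run in one go. The following is only a sketch: is_installed is a hypothetical helper name, and it assumes the three packages are importable as sip, PyQt4 and spynner. It runs on both Python 2 and Python 3.

```python
import importlib


def is_installed(module_name):
    """Return True if the named module can be imported, False otherwise."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # Listed in install order: PyQt4 depends on sip, and spynner needs PyQt4.
    for name in ("sip", "PyQt4", "spynner"):
        status = "ok" if is_installed(name) else "MISSING"
        print("%-8s %s" % (name, status))
```

If an entry is reported as missing, repeat the corresponding install step before moving on.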
Basic use of Spynner
Spynner is very powerful, but since my own experience with it is limited, I will just show how to display the source of a web page.
    #!/usr/bin/python
    # -*- coding: utf-8 -*-

    import spynner

    # Create a browser object.
    browser = spynner.Browser()

    # Keep the browser window hidden.
    browser.hide()

    # load() is a method of the browser object; it uses WebKit to load
    # the page whose URL is passed in as a string.
    browser.load("http://www.baidu.com")

    # browser.html holds the page source after WebKit has processed it.
    # Encode it as UTF-8 before printing.
    print browser.html.encode("utf-8")

    # You can also write it to a file and open that file in a browser.
    with open("Test.html", "w") as f:
        f.write(browser.html.encode("utf-8"))

    # Close the browser.
    browser.close()
With this program it is fairly easy to display the HTML source of a page as processed by WebKit.
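The file-writing step can be pulled out into a small helper so that the file is always closed and the encoding is handled in one place. This is a sketch rather than part of Spynner: save_html is a hypothetical name, and it assumes the page source is either a text (unicode) string or bytes that are already UTF-8 encoded.

```python
import io


def save_html(html, path):
    """Write a page's HTML to `path` as UTF-8 and return the byte count."""
    # Accept either text (unicode) or bytes that are already encoded.
    data = html if isinstance(html, bytes) else html.encode("utf-8")
    # The with block closes the file even if the write fails.
    with io.open(path, "wb") as handle:
        handle.write(data)
    return len(data)
```

In the script above it would be called as save_html(browser.html, "Test.html").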
A Spynner application
Here is a simple application of spynner: a short program that fetches every image on a page as you would see it in a browser. HTMLParser, BeautifulSoup and similar libraries can all parse the HTML document. I chose HTMLParser.
    #!/usr/bin/python

    import spynner
    import HTMLParser
    import os
    import urllib


    class MyParser(HTMLParser.HTMLParser):

        def handle_starttag(self, tag, attrs):
            if tag == 'img':
                # Skip <img> tags that have no src attribute.
                url = dict(attrs).get('src')
                if not url:
                    return
                name = os.path.basename(url)
                # Every extension needs its leading dot, including '.gif'.
                if name.endswith(('.jpg', '.png', '.gif')):
                    print "Download.....", name
                    urllib.urlretrieve(url, name)


    if __name__ == "__main__":
        browser = spynner.Browser()
        browser.show()
        browser.load("http://www.artist.cn/snakewu1994/StyleBasis_Four/en_album_607236.shtml")

        # Feed the WebKit-processed HTML to the parser.
        parser = MyParser()
        parser.feed(browser.html)

        print "Done"
        browser.close()
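This program downloads every image you see on the page. A few simple lines are enough for what would otherwise be a laborious job, and the images are handled in one batch. This really is the strength of Python: hand the hard work to third-party libraries.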
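One limitation of the program is that it passes each src value straight to urllib.urlretrieve, which only works when the page uses absolute image URLs. The sketch below shows one possible way to handle relative paths by resolving them against the page URL with urljoin. ImageCollector, collect_image_urls and IMAGE_EXTENSIONS are hypothetical names introduced here. The code collects URLs without downloading anything, and the try/except import lets it run on both Python 2 and Python 3.

```python
try:
    from HTMLParser import HTMLParser            # Python 2
    from urlparse import urljoin, urlparse
except ImportError:
    from html.parser import HTMLParser           # Python 3
    from urllib.parse import urljoin, urlparse

# Assumed set of extensions to keep; adjust as needed.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif")


class ImageCollector(HTMLParser):
    """Collects absolute image URLs instead of downloading them right away."""

    def __init__(self, base_url):
        HTMLParser.__init__(self)
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")              # None when src is missing
        if not src:
            return
        url = urljoin(self.base_url, src)         # resolve relative paths
        path = urlparse(url).path                 # ignore any ?query part
        if path.lower().endswith(IMAGE_EXTENSIONS) and url not in self.urls:
            self.urls.append(url)


def collect_image_urls(html, base_url):
    """Return the de-duplicated absolute image URLs found in `html`."""
    parser = ImageCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.urls
```

To use it with the program above, pass browser.html together with the same URL string that was given to browser.load, then download each returned URL with urllib.urlretrieve.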