Java can read and parse HTML with tools such as jsoup and htmlparser. Details follow:
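1. jsoup is an HTML parser for Java that can directly parse a URL or a string of HTML. It provides a very convenient API for extracting and manipulating data through DOM methods, CSS selectors, and jQuery-like operations. It is released under the MIT license.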
The main features of jsoup are:
Parse HTML from a URL, a file, or a string;
Find and extract data using DOM traversal or CSS selectors;
Manipulate HTML elements, attributes, and text.
Sample code:
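import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// input is assumed to be a java.io.File containing the saved HTML;
// the URL is the base URI used to resolve relative links.
Document doc = Jsoup.parse(input, "UTF-8", "http://www.dangdang.com");
Element content = doc.getElementById("content"); // null if no element has id="content"
Elements links = content.getElementsByTag("a");
for (Element link : links) {
    String linkHref = link.attr("href"); // value of the href attribute
    String linkText = link.text();       // visible text of the link
}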
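The sample above reads a file and walks the DOM. The other listed features (parsing a string, selecting with CSS, and modifying attributes) could look roughly like the sketch below. It assumes jsoup is on the classpath; the class name, the two helper methods, and the HTML snippet are made up for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

public class JsoupSelectorSketch {

    // Hypothetical helper: "text -> absolute URL" for every link under id="content".
    public static List<String> describeLinks(String html, String baseUri) {
        // Parse an in-memory string; baseUri is used to resolve relative hrefs.
        Document doc = Jsoup.parse(html, baseUri);
        // CSS selector: <a> elements with an href attribute inside #content.
        Elements links = doc.select("#content a[href]");
        List<String> result = new ArrayList<>();
        for (Element link : links) {
            result.add(link.text() + " -> " + link.absUrl("href"));
        }
        return result;
    }

    // Hypothetical helper: sets rel="nofollow" on every link and returns the body HTML.
    public static String markNoFollow(String html) {
        Document doc = Jsoup.parse(html);
        doc.select("a[href]").attr("rel", "nofollow"); // applies to all matched elements
        return doc.body().html();
    }

    public static void main(String[] args) {
        // Illustrative input only, not taken from a real page.
        String html = "<div id=\"content\">"
                + "<a href=\"/books\">Books</a> "
                + "<a href=\"http://example.com/\">Example</a>"
                + "</div>";
        for (String line : describeLinks(html, "http://www.dangdang.com/")) {
            System.out.println(line);
        }
        System.out.println(markNoFollow(html));
    }
}
```

To fetch a live page instead of a string, jsoup also offers Jsoup.connect(url).get(), which returns the same Document type.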
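2. htmlparser is an HTML parsing library written in pure Java. It does not depend on any other Java libraries and is mainly used to transform or extract HTML. It parses HTML at very high speed and is reported to do so reliably. The latest htmlparser version at the time of writing is 2.0. htmlparser is said to be the best tool currently available for parsing and analyzing HTML. Whether you want to scrape web page data or rewrite HTML content, you are likely to be impressed once you have used it.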
Online documentation:
http://www.osctools.net/apidocs/apidoc?api=HTMLParser
http://htmlparser.sourceforge.net/project-info.html
Sample code:
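import org.htmlparser.Node;
import org.htmlparser.Parser;
import org.htmlparser.util.NodeList;

// Parser(String) accepts a URL or file name and throws ParserException on failure.
Parser parser = new Parser("http://www.dangdang.com");
NodeList list = parser.parse(null);    // null filter: return all top-level nodes
Node node = list.elementAt(0);         // first top-level node
NodeList sublist = node.getChildren(); // may be null if the node has no children
System.out.println(sublist.size());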
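For the extraction use case, htmlparser can also filter nodes by tag class instead of walking children by hand. The following is a minimal sketch, assuming htmlparser 2.0 is on the classpath. The class name, the helper method, and the HTML snippet are hypothetical.

```java
import org.htmlparser.Parser;
import org.htmlparser.filters.NodeClassFilter;
import org.htmlparser.tags.LinkTag;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

import java.util.ArrayList;
import java.util.List;

public class HtmlParserLinkSketch {

    // Hypothetical helper: collects "text -> href" for every <a> tag in the given HTML.
    public static List<String> extractLinks(String html) throws ParserException {
        // Build a parser over an in-memory string rather than a URL.
        Parser parser = Parser.createParser(html, "UTF-8");
        // Keep only nodes that htmlparser recognised as link tags.
        NodeList links = parser.extractAllNodesThatMatch(new NodeClassFilter(LinkTag.class));
        List<String> result = new ArrayList<>();
        for (int i = 0; i < links.size(); i++) {
            LinkTag link = (LinkTag) links.elementAt(i);
            result.add(link.getLinkText() + " -> " + link.getLink());
        }
        return result;
    }

    public static void main(String[] args) throws ParserException {
        // Illustrative input only, not taken from a real page.
        String html = "<html><body>"
                + "<a href=\"http://example.com/a\">First</a>"
                + "<p>Some text</p>"
                + "<a href=\"http://example.com/b\">Second</a>"
                + "</body></html>";
        for (String line : extractLinks(html)) {
            System.out.println(line);
        }
    }
}
```

Passing a filter to extractAllNodesThatMatch searches the whole tree, so it does not matter how deeply the links are nested.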