news 2026/4/25 0:18:02

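In a previous example I used BeautifulSoup to scrape phone-repair shop listings from 58.com (58同城), and the library proved very convenient. This post is a fairly detailed introduction to BeautifulSoup, a beginner's guide of sorts. Documentation: http://www.crummy.com/software/BeautifulSoup/bs4/doc/

What is BeautifulSoup

Beautiful Soup is an HTML/XML parser written in Python. It handles malformed markup well and builds a parse tree from it. It also provides simple, commonly needed operations for navigating, searching, and modifying that tree, which can save you a lot of programming time. Let's go straight to an example:

```python
#!/usr/bin/env python3
from bs4 import BeautifulSoup

html_doc = """<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

# Name the parser explicitly; otherwise bs4 picks one itself and prints a warning
soup = BeautifulSoup(html_doc, "html.parser")

print(soup.title)             # the <title> tag
print(soup.title.name)        # the tag's name
print(soup.title.string)      # the text inside <title>
print(soup.p)                 # only the first <p> tag
print(soup.a)                 # only the first <a> tag
print(soup.find_all("a"))     # every <a> tag, as a list
print(soup.find(id="link3"))  # the first tag whose id is "link3"
print(soup.get_text())        # all text in the document, tags stripped
```

The output is:

```
<title>The Dormouse's story</title>
title
The Dormouse's story
<p class="title"><b>The Dormouse's story</b></p>
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
The Dormouse's story

The Dormouse's story
Once upon a time there were three little sisters; and their names were
Elsie,
Lacie and
Tillie;
and they lived at the bottom of a well.
...
```

As you can see, `soup` is the BeautifulSoup object, meaning the parsed tree of the document. Printing it gives back the formatted markup. `soup.title` returns the `title` tag, but `soup.p` returns only the first `p` tag in the document. To get every matching tag you have to use `find_all`. `find_all` returns a list-like sequence, so you can loop over it and pick out what you need one element at a time.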
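Looping over the `find_all` results, as described above, might look like the following minimal sketch. It assumes bs4 is installed (`pip install beautifulsoup4`) and uses the built-in `html.parser`. It works on a trimmed copy of the sample document, and the names `links` and `first_classes` are only illustrative. For each link it collects the text and the `href` attribute, and it also shows that `class` comes back as a list:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the sample document: only the paragraph with the three links
html_doc = """<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>"""

soup = BeautifulSoup(html_doc, "html.parser")

# find_all returns a ResultSet (a list subclass), so an ordinary for loop works
links = []
for a in soup.find_all("a"):
    # a.get("href") returns None if the attribute is missing; a["href"] would raise KeyError
    links.append((a.get_text(), a.get("href")))

for name, url in links:
    print(name, url)

# "class" can hold several values, so bs4 returns it as a list, not a string
first_classes = soup.a["class"]
print(first_classes)
```

When run, it prints each name followed by its URL (for example `Elsie http://example.com/elsie`) and then `['sister']`.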
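`get_text()` returns text. It works on any tag you get from a BeautifulSoup object, not only on the whole document. Try `print(soup.p.get_text())`.

You can also read a tag's other attributes. For example, `print(soup.a['href'])` gets the value of the `a` tag's `href` attribute. Other attributes such as `class` are read the same way, with `soup.a['class']`. Because `class` can hold several values, bs4 returns it as a list (`['sister']`) rather than a plain string.

Certain tags, such as `head`, can be reached directly with `soup.head`, as already shown above.

To get a tag's contents as a list, use the `contents` attribute. `print(soup.head.contents)` returns all of `head`'s children as a list, so you can pick one out with `[n]`, and `.name` gives a child tag's name. A tag's children are also available through `children`, but `print(soup.head.children)` does not print a list. `children` returns an iterator, so you only see something like `<listiterator object at 0x108e6d150>` (the exact text depends on the Python version). Convert it with `list()`, or iterate over it with a `for` loop.

As for the `string` attribute: if the tag has exactly one child, `string` returns that text. It also descends into a single child tag, which is why `soup.p.string` works even though the text sits inside `<b>`. If the tag has more than one child, `string` returns `None`. `print(soup.title.string)` returns `The Dormouse's story`. For tags with several children, use `strings`, which yields every piece of text inside the tag.

To move up the tree, use `parent`; to get every ancestor, use `parents`. Use `next_sibling` for the next sibling node and `previous_sibling` for the previous one. To get all of them, add an s: `next_siblings` and `previous_siblings`.

Searching the tree

To search the tree, use the `find_all` function:

find_all(name, attrs, recursive, string, limit, **kwargs)

(Older bs4 releases call the `string` argument `text`.) Examples:

```python
print(soup.find_all("title"))
print(soup.find_all("p", "title"))  # a string as the second argument filters on CSS class
print(soup.find_all("a"))
print(soup.find_all(id="link2"))
print(soup.find_all(id=True))       # any tag that has an id attribute
```

The output is:

```
[<title>The Dormouse's story</title>]
[<p class="title"><b>The Dormouse's story</b></p>]
[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
[<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]
[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
```

More ways to search, straight from examples:

```python
# Search by CSS class (class_ has a trailing underscore because class is a Python keyword)
print(soup.find_all("a", class_="sister"))
# Search with a CSS selector
print(soup.select("p.title"))

# Search by attributes
print(soup.find_all("a", attrs={"class": "sister"}))

# Search by text
print(soup.find_all(string="Elsie"))
print(soup.find_all(string=["Tillie", "Elsie", "Lacie"]))

# Limit the number of results
print(soup.find_all("a", limit=2))
```

The results are:

```
[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
[<p class="title"><b>The Dormouse's story</b></p>]
[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
['Elsie']
['Elsie', 'Lacie', 'Tillie']
[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]
```

In short, these functions let you find whatever you are looking for.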
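The navigation attributes covered earlier (`contents`, `children`, `string`, `strings`, `parent`, `parents`, `next_sibling`) can be tried together in a short sketch like the one below. It assumes bs4 with `html.parser` and a slightly trimmed copy of the sample document (the final `...` paragraph is left out). The variable names are illustrative. `find_next_sibling` and `find_previous_sibling` are standard bs4 methods not covered above; the sketch uses them to skip over the text nodes that `next_sibling` often lands on:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the sample document (the final "..." paragraph is omitted)
html_doc = """<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
</body></html>"""

soup = BeautifulSoup(html_doc, "html.parser")

# .contents is a real list, so it can be indexed
head_children = soup.head.contents
first_child_name = head_children[0].name          # 'title'

# .children is an iterator; list() materialises it
story = soup.find("p", class_="story")
story_children = list(story.children)             # text nodes and <a> tags, in document order

# .string: the text when there is a single child, None when there are several
title_string = soup.title.string                  # "The Dormouse's story"
story_string = story.string                       # None: this <p> has several children

# .strings yields every string below the tag, however deeply nested
story_strings = list(story.strings)

# Moving up the tree
lacie = soup.find(id="link2")
parent_name = lacie.parent.name                   # 'p'
ancestor_names = [tag.name for tag in lacie.parents]  # last one is '[document]', the soup itself

# Moving sideways: the next sibling is often a text node rather than a tag
next_node = lacie.next_sibling                    # ' and\n'
next_link = lacie.find_next_sibling("a")          # the Tillie link
previous_link = lacie.find_previous_sibling("a")  # the Elsie link

print(first_child_name, len(story_children), title_string, story_string)
print(ancestor_names)
print(repr(next_node), next_link["id"], previous_link["id"])
```

The sideways step shows why `next_sibling` can surprise you. In this markup the node right after the Lacie link is the text `" and\n"`, not the next `<a>` tag.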

