

news 2026/4/28 19:36:19

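1 Building a proxy pool
2 Using the proxy pool
2.1 Testing against a Django backend
3 Scraping a video site
4 Scraping news
5 Introduction to bs4 and traversing the document tree

1 Building a proxy pool

IP proxies:
- Every device has its own IP address.
- Your computer has an IP address -> it visits a website -> it visits too frequently -> the IP gets banned.
- Paid proxies: reliable and stable, and they provide an API.
- Free proxies: unstable, and you have to write the API yourself.
- Open source: https://github.com/jhao104/proxy_pool
  - It scrapes free proxies -> validates them -> stores them in redis.
  - It serves a web API with flask -> calling an endpoint returns a random proxy IP.

Setup steps:

    # 1. Clone the repository
    git clone git@github.com:jhao104/proxy_pool.git
    # 2. Open the project in PyCharm
    # 3. Install dependencies: create a virtual environment, then run
    pip install -r requirements.txt
    # 4. Edit the configuration file
    DB_CONN = 'redis://127.0.0.1:6379/0'
    # 5. Run the scheduler and the web program
    # start the scheduler
    python proxyPool.py schedule
    # start the web API service
    python proxyPool.py server

6. API overview:

    Path      Method  Description                  Parameters
    /         GET     API introduction             None
    /get      GET     Get a random proxy           Optional: ?type=https keeps only proxies that support https
    /pop      GET     Get and delete a proxy       Optional: ?type=https keeps only proxies that support https
    /all      GET     Get all proxies              Optional: ?type=https keeps only proxies that support https
    /count    GET     Show the number of proxies   None
    /delete   GET     Delete a proxy               ?proxy=host:ip

http and https proxies:
- Use an http proxy to visit http addresses.
- Use an https proxy to visit https addresses.

2 Using the proxy pool

"Public network" and "intranet" are networking terms that describe different network scopes and reachability. Their definitions and examples follow.

**Public network (Internet)**:
- **Definition**: The public network is the global Internet. It connects computers, servers, and devices around the world and lets them communicate over the Internet Protocol (IP).
- **Examples**:
  - When you open a website such as Google, Facebook, or Twitter in a browser, you communicate with that site's servers over the public network.
  - Email is also sent and received over the public network, for example with Gmail or Outlook.
  - Interacting with friends around the world on social media, such as posting tweets, sharing photos, or publishing videos.

**Intranet**:
- **Definition**: An intranet is a private network, usually used inside an organization, company, or institution for internal communication, data sharing, and resource management. It is usually not directly connected to the public network.
- **Examples**:
  - Corporate networks: most organizations have an internal network for communication between employees and for sharing internal resources. It can include internal websites, file shares, and an internal email system.
  - Home networks: at home, several devices (desktops, laptops, smartphones, smart-home devices, and so on) connect to one local router and form an internal network. These devices can share files, printers, and the Internet connection, but they are usually not directly exposed to the public network.

In both examples, the public network is the global Internet, while the intranet is a private network limited to a particular organization or household. An intranet usually needs specific access permissions to connect to the public network, and it is usually protected by a firewall or router to ensure security and privacy.

2.1 Testing against a Django backend

    import requests

    # Ask the proxy pool for a random http proxy
    res = requests.get('http://192.168.1.252:5010/get/?type=http').json()['proxy']
    proxies = {
        'http': res,
    }
    print(proxies)
    # The target URL is http, so an http proxy is used
    response = requests.get('http://139.155.203.196:8080/', proxies=proxies)
    print(response.text)

Steps:
1. Write a Django project that returns the visitor's IP on every request.
2. Deploy it on the public network -> python manage.py runserver 0.0.0.0:8000
3. Test from the local machine through a proxy:

    import requests

    res1 = requests.get('http://192.168.1.63:5010/get/?type=http').json()
    dic = {'http': res1['proxy']}
    print(dic)
    res = requests.get('http://47.93.190.59:8000/', proxies=dic)
    print(res.text)

Note: proxies are either transparent or high-anonymity.
- Transparent: the user's real IP is still visible to the server.
- High-anonymity: the visitor's real IP is hidden, and the server cannot see it.

3 Scraping a video site

    # Goal: scrape videos from https://www.pearvideo.com/ and save them locally
    import os
    import re

    import requests

    # The list request goes to:
    # https://www.pearvideo.com/category_loading.jsp?reqType=5&categoryId=1&start=0
    res = requests.get('https://www.pearvideo.com/category_loading.jsp?reqType=5&categoryId=1&start=0')
    # print(res.text)
    # Extract the video page paths with a regular expression
    video_list = re.findall('<a href="(.*?)" class="vervideo-lilink actplay">', res.text)
    # print(video_list)
    os.makedirs('./video', exist_ok=True)  # make sure the output directory exists
    for video in video_list:
        video_id = video.split('_')[-1]
        url = 'https://www.pearvideo.com/' + video
        print(url)
        # Request the video status endpoint -> extract the mp4 address -> download it
        header = {'Referer': url}
        res_json = requests.get(
            f'https://www.pearvideo.com/videoStatus.jsp?contId={video_id}&mrd=0.14435938848299434',
            headers=header).json()
        mp4_url = res_json['videoInfo']['videos']['srcUrl']
        # Swap the leading segment of the file name for cont-<video_id> (see measure 2 below)
        real_mp4_url = mp4_url.replace(mp4_url.split('/')[-1].split('-')[0], 'cont-%s' % video_id)
        print(real_mp4_url)
        # Save the video locally, streaming it in 1024-byte chunks
        res_video = requests.get(real_mp4_url, stream=True)
        with open('./video/%s.mp4' % video_id, 'wb') as f:
            for chunk in res_video.iter_content(1024):
                f.write(chunk)

    # res = requests.get('https://www.pearvideo.com/video_1526860')
    # print(res.text)

    # Anti-scraping measure 1: the request must carry a Referer header
    # header = {'Referer': 'https://www.pearvideo.com/video_1527879'}
    # res = requests.get('https://www.pearvideo.com/videoStatus.jsp?contId=1527879&mrd=0.14435938848299434', headers=header)
    # print(res.text)

    # Anti-scraping measure 2: the returned address is not the playable one
    # https://video.pearvideo.com/mp4/adshort/20190311/ 1698982998222 -13675354_adpkg-ad_hd.mp4   returned
    # https://video.pearvideo.com/mp4/adshort/20190311/ cont-1527879  -13675354_adpkg-ad_hd.mp4   playable
    # s = 'https://video.pearvideo.com/mp4/adshort/20190311/1698982998222-13675354_adpkg-ad_hd.mp4'
    # print(s.replace(s.split('/')[-1].split('-')[0], 'cont-1527879'))

4 Scraping news

    # Without a parsing library you would use regular expressions;
    # a parsing library handles HTML/XML for you
    import requests
    # pip install beautifulsoup4
    from bs4 import BeautifulSoup

    res = requests.get('https://www.autohome.com.cn/news/1/#liststart')
    # print(res.text)
    soup = BeautifulSoup(res.text, 'html.parser')
    # Searching with bs4: find every ul tag whose class is "article"
    ul_list = soup.find_all(class_='article', name='ul')
    print(len(ul_list))
    # Loop over each ul and find all of its li tags
    for ul in ul_list:
        li_list = ul.find_all(name='li')
        for li in li_list:
            h3 = li.find(name='h3')
            if h3:
                title = h3.text
                url = li.find(name='a')['href']
                # Protocol-relative links start with //, so add the scheme
                if url.startswith('//'):
                    url = 'https:' + url
                desc = li.find(name='p').text
                read_count = li.find(name='em').text
                img = li.find(name='img')['src']
                print(f'''
                Title: {title}
                URL: {url}
                Summary: {desc}
                Read count: {read_count}
                Image: {img}
                ''')

    # Exercise: scrape 5 pages, save the images locally,
    # and store the printed data in MySQL (create a table for it)

5 Introduction to bs4 and traversing the document tree

    # BeautifulSoup is a Python library (a parsing library) for extracting data from HTML or XML files
    # pip install beautifulsoup4
    from bs4 import BeautifulSoup

    html_doc = """
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title"><b>The Dormouse's story</b><span>lqz</span></p>

    <p class="story">Once upon a time there were three little sisters; and their names were
    <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
    <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
    <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
    and they lived at the bottom of a well.</p>

    <p class="story">...</p>
    """
    # The parser can be lxml (fast, but must be installed) or Python's built-in html.parser
    soup = BeautifulSoup(html_doc, 'html.parser')

    # print(soup.prettify())

    # ----- Key point: traversing the document tree -----
    # Traversing the tree means selecting directly by tag name. It is fast,
    # but if several tags share the same name, only the first one is returned.
    # 1. Usage: traverse with "."
    # res = soup.html.head.title
    # res = soup.p
    # print(res)
    # 2. Get the tag name
    # res = soup.html.head.title.name
    # res = soup.p.name
    # print(res)
    # 3. Get the tag attributes
    # res = soup.body.a.attrs  # all attributes in a dict: {'href': 'http://example.com/elsie', 'class': ['sister'], 'id': 'link1'}
    # res = soup.body.a.attrs.get('href')
    # res = soup.body.a.attrs['href']
    # res = soup.body.a['href']
    # print(res)

    # 4. Get the tag content
    # res = soup.body.a.text  # text of all descendants, concatenated
    # res = soup.p.text
    # res = soup.a.string  # returned only when the tag holds a single string; None when it has several children
    # res = soup.p.strings
    # print(list(res))

    # 5. Nested selection

    # The items below are good to know
    # 6. Child nodes and descendant nodes
    # print(soup.p.contents)  # list of all direct children of p
    # print(list(soup.p.children))  # an iterator over all direct children of p
    # print(list(soup.p.descendants))  # all descendants of p, at every depth

    # 7. Parent node and ancestor nodes
    # print(soup.a.parent)  # the parent of the a tag
    # print(list(soup.a.parents))  # all ancestors of the a tag: its parent, its parent's parent, and so on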
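    # 8. Sibling nodes
    # print(soup.a.next_sibling)  # the next sibling
    # print(soup.a.previous_sibling)  # the previous sibling
    # print(list(soup.a.next_siblings))  # all following siblings (a generator object)
    # print(soup.a.previous_siblings)  # all preceding siblings (a generator object)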
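
The traversal calls listed above are all commented out, so here is a small runnable sketch that exercises a few of them. It assumes `beautifulsoup4` is installed and uses a shortened copy of the sample document; the variable names (`first_p`, `first_a`, and so on) are illustrative and not part of the original notes.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Shortened copy of the sample document used in this section
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b><span>lqz</span></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
</body></html>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Dot access returns only the FIRST tag with that name
first_p = soup.p
first_a = soup.a

# 2. Tag name
title_name = soup.html.head.title.name

# 3. Attributes: dict-style access; class is multi-valued, so it is a list
a_href = first_a["href"]
a_classes = first_a.attrs["class"]

# 4. Content: .text joins every descendant string; .string is None here
#    because the first <p> has two children (<b> and <span>)
p_text = first_p.text
p_string = first_p.string
p_strings = list(first_p.strings)
a_string = first_a.string

# 7. Parent and ancestors
parent_name = first_a.parent.name
ancestor_names = [node.name for node in first_a.parents]

# 8. Siblings: the immediate sibling is the text between the tags,
#    so filter on Tag to keep only the following <a> elements
next_sib = first_a.next_sibling
following_link_ids = [s["id"] for s in first_a.next_siblings if isinstance(s, Tag)]

print(title_name, a_href, a_classes)
print(repr(p_text), p_string, p_strings, a_string)
print(parent_name, ancestor_names)
print(repr(next_sib), following_link_ids)
```

Note that `next_sibling` is often a piece of text (whitespace or punctuation) rather than a tag, which is why the sketch filters with `isinstance(s, Tag)` before reading `id`.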


http://www.hkea.cn/news/14452208/