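How to remove unwanted div, span, label, p, and other tags when processing an HTML file with bs4 in Python
This article shows how to remove a div whose style attribute is display:none;
Key code (calling soup(...) is shorthand for soup.find_all(...)):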
info = [s.extract() for s in soup(style="display:none;")]
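Result: the hidden div is extracted into info, and soup.text contains only the text of the remaining elements.
Full code: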
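#!/usr/bin/env python3
from bs4 import BeautifulSoup

# Sample document: the first div is hidden with an inline style.
html = ('<html><head></head><body><div style="display:none;">下午两点</div>'
        '吃过饭了吗<div>好好工作</div></body></html>')
soup = BeautifulSoup(html, 'lxml')  # the 'lxml' parser requires the lxml package

# extract() detaches each matching tag from the tree and returns it.
info = [s.extract() for s in soup(style="display:none;")]
print(info)       # the removed tags
print(soup.text)  # the text that is left in the document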
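
Note that style="display:none;" matches only that exact attribute string, so a tag written as style="display: none" would be missed. The following is a minimal sketch of a more tolerant variant, not part of the original code: the helper name remove_hidden, the HIDDEN_STYLE pattern, and the sample HTML are all hypothetical, and it assumes the built-in html.parser is acceptable in place of lxml. It restricts the search to the tag names mentioned in the title and matches the style value with a regular expression.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical pattern: "display:none" with optional spaces, in any letter case.
HIDDEN_STYLE = re.compile(r"display\s*:\s*none", re.IGNORECASE)


def remove_hidden(soup, tags=("div", "span", "label", "p")):
    """Detach the listed tags whose inline style hides them and return them."""
    hidden = soup.find_all(list(tags), style=HIDDEN_STYLE)
    # extract() is used, as in the article, so nested matches are handled safely.
    return [tag.extract() for tag in hidden]


# Illustrative input, not taken from the article.
html = ('<div style="display: none">a</div>'
        '<span style="DISPLAY:NONE;">b</span>'
        '<p>c</p><label style="color:red">d</label>')
soup = BeautifulSoup(html, "html.parser")
removed = remove_hidden(soup)
print(len(removed))
print(soup.get_text())
```

Passing a different tags tuple narrows or widens which elements are considered. Elements hidden through a CSS class or an external stylesheet are not detected by this approach, since only the inline style attribute is inspected.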