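[python3] Using urlparse and urlsplit, and how they differ

Conclusion

Here is the conclusion up front:

urlparse and urlsplit are generally used to analyze the structure of a web page URL, so that its components (scheme, domain, path, query fields and so on) can be extracted quickly.
The difference between them is that urlsplit does not split out params: the ;params part stays inside path, so the result has five fields instead of six.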
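A minimal side-by-side check of this difference, reusing the URL from the examples later in this article. It relies only on urlparse and urlsplit from the standard library:

```python
from urllib.parse import urlparse, urlsplit

# The example URL used throughout this article
url = "https://root:123456@www.abc.com:8083/uploads/;type=docx?filename=python3.docx#urllib"

parsed = urlparse(url)  # ParseResult: 6 fields, including params
split = urlsplit(url)   # SplitResult: 5 fields, no params

# urlparse moves ";type=docx" out of the path into params
print(parsed.path, parsed.params)  # /uploads/ type=docx
# urlsplit leaves it inside the path
print(split.path)                  # /uploads/;type=docx

# All other components are identical
for name in ("scheme", "netloc", "query", "fragment"):
    assert getattr(parsed, name) == getattr(split, name)
```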

How urlparse and urlsplit are used, and how they differ

First, let's look at the format of a standard URL:

scheme://username:password@hostname:port/path;params?query#fragment

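The components mean the following:

scheme: protocol
username:password: the account name and password used for authentication; rarely used in practice
hostname: host (IP address or domain name)
port: port
path: path
params: parameters (separated by ;)
query: query string (fields separated by &)
fragment: anchor, i.e. a position within the page, used for in-page navigation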
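After parsing, params and query are still single strings. A minimal sketch of splitting them further, using the standard library's parse_qsl/parse_qs for the &-separated query and a plain str.split for the ;-separated params. The URL below, with two params and two query fields, is a hypothetical example, and the params parsing assumes every param has the form key=value:

```python
from urllib.parse import urlparse, parse_qsl, parse_qs

# Hypothetical URL with two params and two query fields
url = "https://www.abc.com/uploads/;type=docx;lang=en?filename=python3.docx&page=2#urllib"
result = urlparse(url)

# params is everything after the first ';' in the last path segment
print(result.params)  # type=docx;lang=en
# Assumes each param is key=value
params = dict(item.split("=", 1) for item in result.params.split(";"))
print(params)  # {'type': 'docx', 'lang': 'en'}

# Query fields are separated by '&': parse_qsl keeps order,
# parse_qs groups values per key into lists
print(parse_qsl(result.query))  # [('filename', 'python3.docx'), ('page', '2')]
print(parse_qs(result.query))   # {'filename': ['python3.docx'], 'page': ['2']}
```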

With these components in mind, let's go straight to an example:

from urllib.parse import urlparse

url = "https://root:123456@www.abc.com:8083/uploads/;type=docx?filename=python3.docx#urllib"
result = urlparse(url)

print("scheme:", result.scheme)
print("username:", result.username)
print("password:", result.password)
print("host:", result.hostname)
print("port:", result.port)
print("path:", result.path)
print("params:", result.params)
print("query:", result.query)
print("fragment:", result.fragment)

Output:

scheme: https
username: root
password: 123456
host: www.abc.com
port: 8083
path: /uploads/
params: type=docx
query: filename=python3.docx
fragment: urllib

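As you can see, urlparse splits the URL into its components, which come back as strings except port, which is an int. This is the basic, and main, use of urlparse. So how does urlsplit differ in use?
The answer is that urlsplit does not separate the params field. Let's modify the code above and look at the result: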

from urllib.parse import urlsplit

url = "https://root:123456@www.abc.com:8083/uploads/;type=docx?filename=python3.docx#urllib"
result = urlsplit(url)

print(result)

print("scheme:", result.scheme)
print("username:", result.username)
print("password:", result.password)
print("host:", result.hostname)
print("port:", result.port)
print("path:", result.path)
# print("params:", result.params)  # SplitResult has no params attribute, so this line would raise AttributeError
print("query:", result.query)
print("fragment:", result.fragment)

Output:

SplitResult(scheme='https', netloc='root:123456@www.abc.com:8083', path='/uploads/;type=docx', query='filename=python3.docx', fragment='urllib')
scheme: https
username: root
password: 123456
host: www.abc.com
port: 8083
path: /uploads/;type=docx
query: filename=python3.docx
fragment: urllib

As you can see, the resulting SplitResult object has no params attribute, and ;type=docx now appears in path. If you want further confirmation, dir() shows the same thing:

>>> from urllib.parse import urlsplit, urlparse
>>> url = "https://www.abc.com:8083/uploads/;type=docx?filename=python3.docx#urllib"
>>> dir(urlsplit(url))
['__add__', '__class__', '__class_getitem__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__module__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmul__', '__setattr__', '__sizeof__', '__slots__', '__str__', '__subclasshook__', '_asdict', '_encoded_counterpart', '_field_defaults', '_fields', '_hostinfo', '_make', '_replace', '_userinfo', 'count', 'encode', 'fragment', 'geturl', 'hostname', 'index', 'netloc', 'password', 'path', 'port', 'query', 'scheme', 'username']
>>> dir(urlparse(url))
['__add__', '__class__', '__class_getitem__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__module__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmul__', '__setattr__', '__sizeof__', '__slots__', '__str__', '__subclasshook__', '_asdict', '_encoded_counterpart', '_field_defaults', '_fields', '_hostinfo', '_make', '_replace', '_userinfo', 'count', 'encode', 'fragment', 'geturl', 'hostname', 'index', 'netloc', 'params', 'password', 'path', 'port', 'query', 'scheme', 'username']

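Therefore, if you need the params field parsed out on its own, use urlparse. Note, however, that urlparse only splits params off the last path segment; the Python documentation suggests urlsplit when the newer URL syntax (RFC 2396), in which each path segment can carry its own parameters, is wanted.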
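A small sketch of that last-segment behavior, using a hypothetical URL in which two path segments carry parameters. urlparse moves only the last segment's parameters into params, while urlsplit leaves the whole path untouched; the final lines show one possible way (an illustration, not a library feature) to recover per-segment parameters from urlsplit's path by hand:

```python
from urllib.parse import urlparse, urlsplit

# Hypothetical URL where two path segments carry ;parameters
url = "https://www.abc.com/a;x=1/b;y=2?q=1"

p = urlparse(url)
print(p.path, p.params)  # /a;x=1/b y=2  (only the last segment's params are split off)

s = urlsplit(url)
print(s.path)            # /a;x=1/b;y=2  (nothing is split off)

# Per-segment parameters, recovered manually from urlsplit's path
segments = [seg.split(";") for seg in s.path.split("/")]
print(segments)          # [[''], ['a', 'x=1'], ['b', 'y=2']]
```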

Common attributes and methods of the objects returned by urlparse (ParseResult) and urlsplit (SplitResult)

Attributes

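scheme: protocol
username: user name used for authentication (None if absent)
password: password used for authentication (None if absent)
hostname: host name, lowercased (None if absent)
port: port as an int (None if absent)
netloc: network location, the raw username:password@hostname:port string
path: path
params: parameters (not available on SplitResult)
query: query string
fragment: anchor, i.e. a position within the page, used for in-page navigation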
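A quick check of a few of the attributes listed above: netloc keeps the raw text, while hostname is lowercased and port is converted to an int. When a part is missing, the derived attributes (username, password, port) are None and the raw fields are empty strings. The URLs are hypothetical variants of the article's example:

```python
from urllib.parse import urlparse

full = urlparse("https://root:123456@WWW.ABC.com:8083/uploads/")
print(full.netloc)    # root:123456@WWW.ABC.com:8083  (raw string, case preserved)
print(full.hostname)  # www.abc.com  (lowercased)
print(full.port, type(full.port).__name__)  # 8083 int

# Missing parts: derived attributes are None, raw fields are ''
bare = urlparse("https://www.abc.com/uploads/")
print(bare.username, bare.password, bare.port)  # None None None
print(repr(bare.query), repr(bare.params))      # '' ''
```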

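Methods

geturl: reassembles the components into a URL string (this may differ slightly from the input; for example, the scheme is lowercased and an empty fragment is dropped)
encode(encoding='ascii', errors='strict'): encodes every component and returns a new bytes-based object (ParseResultBytes or SplitResultBytes). Compared with the original object, the new one has a decode method instead of an encode method
count/index: rarely used; inherited from tuple, they return how many components are exactly equal to a given string, and the position of the first such component. Note that they compare whole components, not substrings: the tuple form of a ParseResult is (scheme, netloc, path, params, query, fragment), e.g. ('https', 'www.abc.com:8083', '/uploads/', 'type=docx', 'filename=python3.docx', 'urllib') for the URL in the dir() example above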
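A short sketch exercising the three methods above on the URL from the dir() example: geturl round-trips this particular URL unchanged, encode produces a ParseResultBytes that can be decoded back, and count/index operate on whole tuple elements:

```python
from urllib.parse import urlparse

url = "https://www.abc.com:8083/uploads/;type=docx?filename=python3.docx#urllib"
result = urlparse(url)

# geturl() reassembles the components into a URL string
print(result.geturl() == url)  # True for this URL

# encode() returns a ParseResultBytes, which has decode() instead of encode()
b = result.encode("ascii")
print(type(b).__name__)             # ParseResultBytes
print(b.hostname)                   # b'www.abc.com'
print(b.decode("ascii") == result)  # True

# count/index compare whole components, not substrings
print(tuple(result))  # ('https', 'www.abc.com:8083', '/uploads/', 'type=docx', 'filename=python3.docx', 'urllib')
print(result.index("type=docx"))  # 3
print(result.count("abc"))        # 0 (no component equals 'abc')
```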

Article compiled by 极客之家; original link: https://www.bmabk.com/index.php/post/97110.html