The urlparse Module
Functions for manipulating URLs
- URL's have the following general format
scheme:/netloc/path;parameters?query#fragment
- urlparse(urlstring) - Parses a URL into components
import urlparse
t = urlparse.urlparse("http://www.python.org/index.html")
# Produces ('http','www.python.org','/index.html','','','')
- urlunparse(tuple) - Turns tuple of components back into a URL string
url = urlparse.urlunparse(('http','www.python.org','foo.html',
'bar=spam',''))
# Produces "http://www.python.org/foo.html?bar=spam"
- urljoin(base, url) - Combines a base and relative URL
urlparse.urljoin("http://www.python.org/index.html","help.html")
# Produces "http://www.python.org/help.html"
|