The urlparse Module

Functions for manipulating URLs

  • URLs have the following general format
     scheme://netloc/path;parameters?query#fragment
  • urlparse(urlstring) - Parses a URL into components
     import urlparse
     t = urlparse.urlparse("http://www.python.org/index.html")
     # Produces ('http','www.python.org','/index.html','','','') 
  • urlunparse(tuple) - Turns tuple of components back into a URL string
     url = urlparse.urlunparse(('http','www.python.org','foo.html',
                                '','bar=spam',''))
     # Produces "http://www.python.org/foo.html?bar=spam"
  • urljoin(base, url) - Combines a base and relative URL
     urlparse.urljoin("http://www.python.org/index.html","help.html")
     # Produces "http://www.python.org/help.html"
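
The three functions above can be tried together in one short session. This is a minimal sketch, not part of the original slide: the sample URL is hypothetical, and the import fallback assumes that on Python 3 the same functions are available from urllib.parse.

```python
# Assumption: the slide targets Python 2's urlparse module. On Python 3 the
# same functions live in urllib.parse, so fall back to that under the old name.
try:
    import urlparse
except ImportError:
    from urllib import parse as urlparse

# Hypothetical URL chosen so that all six components are non-empty
url = "http://www.python.org/doc/index.html;v1?lang=en#top"

# Split into (scheme, netloc, path, parameters, query, fragment)
parts = tuple(urlparse.urlparse(url))
print(parts)
# ('http', 'www.python.org', '/doc/index.html', 'v1', 'lang=en', 'top')

# Reassembling the same six components gives back the original string
rebuilt = urlparse.urlunparse(parts)
print(rebuilt == url)      # True

# A relative name replaces the last path segment of the base URL
sibling = urlparse.urljoin(url, "help.html")
print(sibling)             # http://www.python.org/doc/help.html

# A relative URL with a leading slash replaces the whole path
rooted = urlparse.urljoin(url, "/about.html")
print(rooted)              # http://www.python.org/about.html
```

Note that urlunparse expects exactly six components, so unused fields are passed as empty strings, and that urljoin drops the base URL's parameters, query, and fragment when the relative URL supplies its own path.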
<<< O'Reilly OSCON 2000, Advanced Python Programming, Slide 96
July 17, 2000, beazley@cs.uchicago.edu
>>>