consoleart

Wednesday, November 24, 2010

Parse Hyperlinks - Python

While writing my previous post (downloading content from codingbat), I came across this one-line script for pulling the URL links out of a line of HTML:


import re
urls = re.findall(r'href=[\'"]p?([^\'" >]+)', line)

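r - marks the string as a raw string, so Python leaves the backslashes untouched and passes them on to the regular expression engine.
href=[\'"] - the match begins at the lowercase text href= (anywhere in the line, not only at its start), followed by exactly one quote character, either ' (single quote) or " (double quote).
p? - an optional p: the next character may be a p, which is matched but left out of the captured result.
([^\'" >]+) - captures one or more characters that are not a single quote, a double quote, a space or a greater-than symbol. The capture therefore stops just before the closing quote (or at a space or > if the quote is missing), and this captured group is what findall returns for each link.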
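
To see the pieces working together, here is a small sketch that runs the same expression over one line of HTML. The sample line and the function name extract_links are made up for illustration and are not from the codingbat script. Note how the optional p swallows the first letter of the second link.

```python
import re

# The pattern from the post, compiled once so it can be reused.
HREF_PATTERN = re.compile(r'href=[\'"]p?([^\'" >]+)')

def extract_links(line):
    """Return the captured link targets found in one line of HTML."""
    return HREF_PATTERN.findall(line)

# Hypothetical sample input, not taken from codingbat.
sample = '<a href="/prob/p187868">one</a> <a href=\'page2.html\'>two</a>'
print(extract_links(sample))  # ['/prob/p187868', 'age2.html']
```

The expression is case sensitive, so an uppercase HREF= is skipped unless re.IGNORECASE is passed to re.compile.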

Posted by Unknown at 8:53 PM
Labels: Parsing hyperlinks, Python, URLLIB
