consoleart

Wednesday, November 24, 2010

Parse Hyperlinks - Python

While working on my previous post (downloading content from codingbat), I came across this script for parsing URL links:

urls = re.findall(r'href=[\'"]p?([^\'" >]+)', line)

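r - marks the string as a raw string, so Python leaves the backslashes alone and passes them straight to the regex engine.
href=[\'"] - matches the literal text href= (lowercase only, since matching is case-sensitive by default) followed by either a single quote or a double quote. findall looks for this anywhere in the line, not just at the start.
p? - matches an optional p: the ? means zero or one occurrence, so the next character may be a p but does not have to be. If the link does start with a p, that p is consumed here and left out of the result.
([^\'" >]+) - the capture group, which is what findall returns. It matches one or more characters that are not a single quote, a double quote, a space or a greater-than symbol, so the captured URL stops just before the closing quote.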
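To see the pieces working together, here is a small sketch that runs the same pattern on a made-up line of HTML. The function name extract_links and the sample line are my own for illustration and are not part of the original script. Note how the optional p eats the first letter of the second link.

```python
import re

# Same pattern as in the post, compiled once for reuse.
LINK_PATTERN = re.compile(r'href=[\'"]p?([^\'" >]+)')


def extract_links(line):
    """Return the captured link targets found in one line of HTML.

    extract_links is a hypothetical helper name used only in this sketch.
    """
    return LINK_PATTERN.findall(line)


# Made-up sample input: one double-quoted link and one single-quoted link.
sample = '<a href="/prob/p187868">sleepIn</a> <a href=\'python.html\'>Python</a>'

print(extract_links(sample))
# → ['/prob/p187868', 'ython.html']
```

The first link starts with a slash, so p? matches nothing and the whole path is captured. The second link starts with a p, which p? consumes, so only ython.html comes back. For anything beyond a quick script, a real HTML parser is usually a safer choice than a regex.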

Posted by Unknown at 8:53 PM


Labels: Parsing hyperlinks, Python, URLLIB
