TODO

Escape HTML

https://stackoverflow.com/questions/275174/how-do-i-perform-html-decoding-encoding-using-python-django → Please add this to Util, @Siếu

Stack Overflow: How do I perform HTML decoding/encoding using Python/Django? I have a string that is HTML encoded: '''<img class="size-medium wp-image-113"\ style="margin-left: 15px;" title="su1"\ src=&quo...

Given the Django use case, there are two answers to this. Here is Django's django.utils.html.escape function, for reference:

    def escape(html):
        """Returns the given HTML with ampersands, quotes and carets encoded."""
        return mark_safe(force_unicode(html).replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;').replace('"', '&quot;').replace("'", '&#39;'))

To reverse this, the Cheetah function described in Jake's answer should work, but is missing the single-quote. This version includes an updated tuple, with the order of replacement reversed to avoid symmetric problems:

    def html_decode(s):
        """
        Returns the ASCII decoded version of the given HTML string. This does
        NOT remove normal HTML tags like <p>.
        """
        htmlCodes = (
            ("'", '&#39;'),
            ('"', '&quot;'),
            ('>', '&gt;'),
            ('<', '&lt;'),
            ('&', '&amp;'),  # keep last so '&amp;lt;' decodes to '&lt;', not '<'
        )
        for code in htmlCodes:
            s = s.replace(code[1], code[0])
        return s

    unescaped = html_decode(my_string)

This, however, is not a general solution; it is only appropriate for strings encoded with django.utils.html.escape. More generally, it is a good idea to stick with the standard library, as shown for each Python version in the sections that follow.
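The point about replacement order can be checked with a small experiment. The sketch below is illustrative only: `decode_with_order` and `HTML_CODES` are hypothetical names, not part of Django or the quoted answer. It decodes the same escaped string twice, once with '&amp;' replaced last (as in html_decode) and once with '&amp;' replaced first.

```python
# Entity table matching html_decode above; '&amp;' is deliberately last.
HTML_CODES = (
    ("'", "&#39;"),
    ('"', "&quot;"),
    (">", "&gt;"),
    ("<", "&lt;"),
    ("&", "&amp;"),
)


def decode_with_order(s, codes):
    """Apply the entity replacements in the given order (hypothetical helper)."""
    for char, entity in codes:
        s = s.replace(entity, char)
    return s


# The literal text "&lt;" is escaped as "&amp;lt;".
escaped = "&amp;lt;"

amp_last = decode_with_order(escaped, HTML_CODES)
amp_first = decode_with_order(escaped, tuple(reversed(HTML_CODES)))

print(amp_last)   # &lt;  (the original text is recovered)
print(amp_first)  # <     (the text was decoded one level too far)
```

Replacing '&amp;' first turns "&amp;lt;" into "&lt;", which the later '&lt;' replacement then decodes a second time. Leaving '&amp;' until the end avoids that.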

Python 2.x:

    import HTMLParser

    html_parser = HTMLParser.HTMLParser()
    unescaped = html_parser.unescape(my_string)

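Python 3.x:

    import html.parser

    # HTMLParser.unescape() was deprecated in Python 3.4 and removed in 3.9;
    # on those versions use html.unescape() instead.
    html_parser = html.parser.HTMLParser()
    unescaped = html_parser.unescape(my_string)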

>= Python 3.5:

    from html import unescape

    unescaped = unescape(my_string)
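For the "add to Util" TODO, one possible shape for the helpers is sketched below, assuming Python 3.4 or later. The names `escape_html` and `unescape_html` are placeholders, not existing Util functions; they only wrap the standard library's `html.escape` and `html.unescape`.

```python
import html


def escape_html(text):
    """Escape &, <, > and both quote characters (hypothetical Util helper)."""
    return html.escape(text, quote=True)


def unescape_html(text):
    """Decode named and numeric character references (hypothetical Util helper)."""
    return html.unescape(text)


original = '<img title="su1" alt=\'x\'> & more'
escaped = escape_html(original)

# Escaping and then unescaping returns the original string.
assert unescape_html(escaped) == original

# html.unescape also handles entities that the hand-written table does not list.
print(unescape_html("caf&eacute; &#8364;5 &#x27;ok&#x27;"))  # café €5 'ok'
```

Note that `html.escape` writes the single quote as `&#x27;` rather than `&#39;`, so the hand-written html_decode above would not reverse it. This is one concrete reason to prefer the standard-library pair.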