I’ve been playing with Beautiful Soup, a Python library for pulling data out of HTML and XML files. The documentation says:
It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work.
I have some ideas for using Beautiful Soup to build tools that extract and consolidate information from sources ranging from journal articles to our electronic medical record.
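To give a flavor of what navigating, searching, and modifying a parse tree looks like, here is a minimal sketch. The HTML snippet, its class names, and the example.com URLs are all made up for illustration, and I'm assuming the `bs4` package is installed and using Python's built-in `html.parser` so no extra parser is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML, invented for this example (not from any real source)
html = """
<html><body>
  <h1>Sample Article</h1>
  <p class="author">A. Writer</p>
  <p class="abstract">First finding.</p>
  <a href="https://example.com/ref1">Reference 1</a>
  <a href="https://example.com/ref2">Reference 2</a>
</body></html>
"""

# Parse with the standard-library parser
soup = BeautifulSoup(html, "html.parser")

# Navigating: reach the first <h1> tag by attribute access
title = soup.h1.get_text()

# Searching: find one tag by name and CSS class, then collect every link target
author = soup.find("p", class_="author").get_text()
links = [a["href"] for a in soup.find_all("a")]

# Modifying: replace the text inside the <h1> tag
soup.h1.string = "Renamed Article"
new_title = soup.h1.get_text()

print(title, author, links, new_title)
```

A real page would be messier than this, but the same three moves (walk to a tag, search for tags, change a tag) should cover a lot of simple extraction jobs.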
The name comes from Chapter 10 of Alice in Wonderland, wherein the Mock Turtle sings the following to the Gryphon:
'Beautiful Soup, so rich and green,
Waiting in a hot tureen!
Who for such dainties would not stoop?
Soup of the evening, beautiful Soup!
Soup of the evening, beautiful Soup!
Beau--ootiful Soo--oop!
Beau--ootiful Soo--oop!
Soo--oop of the e--e--evening,
Beautiful, beautiful Soup!

'Beautiful Soup! Who cares for fish,
Game, or any other dish?
Who would not give all else for two pennyworth only of beautiful Soup?
Pennyworth only of beautiful Soup?
Beau--ootiful Soo--oop!
Beau--ootiful Soo--oop!
Soo--oop of the e--e--evening,
Beautiful, beauti--FUL SOUP!'