How To: Get the rendered HTML of a webpage with Python

Jun 7, 2013

Hello again!

This time I am going to show you how to get the rendered HTML of a webpage using Python. For this task we need only ten lines of code and to import urllib2, the library that it will help us. The rendered HTML of a webpage is the HTML that our web-browser receives from the server and renders on the screen. In simple words, this is what we see on the screen and not the code used to generate the webpage.

This HTML code I use to find specific tags into the code.

Here is the code:

def get_page_code(link):
    import urllib2                 # import the necessary library
    html = ""                      # initiate an empty string as the variable that holds the html code
    req = urllib2.Request(link)    # initiate a request
    try:
        response = urllib2.urlopen(req) # store the response in a file
        html = response.read()     # read the content of the file
        response.close()           # close the file
    except ValueError:
        return html                # if there is an error return the empty string
    return html                    # return the generated html

Keep in mind that you will get a long string as output.

Tags: programmingpython

Archives

  1. January 2025
  2. The Cost of Slow Pipelines - A Tale of Wasted Time
  3. December 2024
  4. Keeping Software Simple to speed up Software Development
  5. October 2024
  6. The Kanban Café - A Story of Flow
  7. A Story on Accidental Complexity in Software Development
  8. February 2024
  9. Maximizing Software Development Productivity: The Power of Flow and Minimizing Interruptions
  10. December 2023
  11. Clean Code in Java: Writing Code that Speaks
  12. Clean Code in Java: A concise guide
  13. Understanding Value Objects in Java: A Brief Guide
  14. August 2023
  15. Must Have on Message Payload
  16. Centralised Management System For Message Schemas
  17. Consuming RabbitMQ Messages with Clojure: A Step-by-Step Tutorial with Tests
  18. January 2023
  19. Running a Spring Boot service with kubernetes
  20. December 2022
  21. Hosting a PWA with Jekyll and Github pages
  22. November 2022
  23. Global Day of Code Retreat
  24. Facilitating a mini Code Retreat
  25. October 2022
  26. The Curse of Optional
  27. September 2022
  28. Testing Spring Boot Microservices - Presentation
  29. March 2022
  30. TDD Workshop
  31. February 2022
  32. Value Objects in Java
  33. Efficient Java
  34. January 2022
  35. Spring Boot testing - Focus on your changes
  36. Product users - Personas
  37. December 2021
  38. Write code fit for testing
  39. November 2020
  40. Running a Spring Boot app with kubernetes
  41. September 2019
  42. Setup GPG on Mac and sign git repositories
  43. July 2019
  44. Running a Clojure Pedestal application on Raspberry Pi model B revision 2
  45. Clojure from zero to hero (part 3) - First endpoint
  46. Clojure from zero to hero (part 2) - A bit of syntax
  47. June 2019
  48. Clojure from zero to hero (1) - explaining project.clj
  49. Clojure from zero to hero (0) - creating a Pedestal app
  50. November 2017
  51. Introduction to Docker
  52. April 2015
  53. Git micro commits
  54. July 2014
  55. Google Glass Development - setup tools, environment and turn on debugging on Glass
  56. June 2013
  57. How To: Get the rendered HTML of a webpage with Python
  58. Set union of two lists in Python