Phparchitect's Guide to Web Scraping

    Matthew Turland

    Musketeers.Me, LLC
    2010
    192 pages
    6h 24m
    ISBN-13: 9780981034515
    English

    Despite all the advancements in web APIs and interoperability, it's inevitable that, at some point in your career, you will have to "scrape" content from a website that was not built with web services in mind. And, despite its sometimes less-than-stellar reputation, web scraping is usually an entirely legitimate activity, for example, to capture data from an old version of a website for insertion into a modern CMS.

    This book, written by scraping expert Matthew Turland, covers web scraping techniques and topics that range from the simple to the exotic, using a variety of technologies and frameworks:

    • Understanding HTTP requests
    • The PHP HTTP streams wrapper
    • cURL
    • pecl_http
    • PEAR: HTTP
    • Zend_Http_Client
    • Building your own scraping library
    • Using Tidy
    • Analyzing code with the DOM, SimpleXML and XMLReader extensions
    • CSS selector libraries
    • PCRE pattern matching
    • Tips and Tricks
    • Multiprocessing / parallel processing
