| Data mining and content aggregation with PHP |
|
This is an interesting article I came across. It is mirrored here, and the link to the original is provided at the end of the page. The functions described in this article will allow you to parse unstructured data from HTML pages using PHP and Perl-compatible regular expressions. The extracted data can then be stored in a structured fashion in a database of your choosing.

Data Mining Basics Tutorial:

Luckily, there are alternatives. Using PHP and MySQL, you can effectively accomplish the same content aggregation tasks at little cost, but you have to learn the basics first. Most data mining projects follow these basic steps.

Basic Data Mining Steps
Step 1. - Fetching the Data / HTML Content

Developers have done this work already. The Snoopy PHP class, which can be downloaded at http://sourceforge.net/projects/snoopy/, has all the necessary tools to download an HTML page from the internet.

It's advisable to be considerate when you are retrieving content from any site. Contact the Web site admin before fetching with any automated script, don't bombard the Web site with thousands of HTTP requests a second, always take notice of any copyrights, and set up a contract for content sharing if needed. Leeching content from a Web site without permission can lead to serious legal issues.

Steps 2, 3 & 4 - Parsing the Data

Once you have the HTML page(s), you will need to parse the data into a friendlier format. This is done so that pattern matching will be more reliable and consistent in the later steps. The functions shown below will greatly help in this matter.

PHP Data Parsing Functions:
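As a rough illustration of the kind of parsing helpers this step calls for, here is a minimal sketch. It is not the original article's code: the function names `normalize_html` and `extract_between` and the sample markup are hypothetical, and it assumes PHP 7 or later. Only built-in PHP functions (`preg_replace`, `preg_match_all`, `preg_quote`, `strip_tags`, `html_entity_decode`, `trim`) are relied on.

```php
<?php
// Hypothetical helper: collapse an HTML page into one predictable line
// so that later regular expressions match more reliably.
function normalize_html(string $html): string
{
    // Drop script and style blocks, which rarely hold content worth mining.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', '', $html);
    // Drop HTML comments.
    $html = preg_replace('/<!--.*?-->/s', '', $html);
    // Collapse runs of whitespace (including newlines) into a single space.
    $html = preg_replace('/\s+/', ' ', $html);
    // Remove the whitespace left between adjacent tags.
    $html = preg_replace('/>\s+</', '><', $html);
    return trim($html);
}

// Hypothetical helper: return the text found between every occurrence of
// $start and $end, with inner tags stripped and entities decoded.
function extract_between(string $html, string $start, string $end): array
{
    // preg_quote() makes the literal markers safe to embed in the pattern.
    $pattern = '/' . preg_quote($start, '/') . '(.*?)' . preg_quote($end, '/') . '/is';
    preg_match_all($pattern, $html, $matches);
    return array_map(
        function (string $fragment): string {
            return trim(html_entity_decode(strip_tags($fragment), ENT_QUOTES, 'UTF-8'));
        },
        $matches[1]
    );
}

// Made-up sample markup standing in for a fetched page.
$page = "<ul>\n  <li class=\"item\"><b>First</b> headline</li>\n"
      . "  <li class=\"item\">Second &amp; last</li>\n</ul>";

$clean  = normalize_html($page);
$titles = extract_between($clean, '<li class="item">', '</li>');
print_r($titles);
```

In a real project, `$page` would be the page body returned by the fetch step, and the values in `$titles` would be the rows you insert into your database. Normalizing first means the start and end markers only have to match one spelling of the markup, regardless of how the source page was indented.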
| Last Updated ( Friday, 18 July 2008 ) |