Patent attributes
A computer for identifying categories of web pages. The computer comprises a processor, a non-transitory memory, and an application stored in the non-transitory memory. When executed by the processor, the application builds an unvalidated table of uniform resource locators (URLs) in the non-transitory memory by crawling the World Wide Web, and navigates to at least some of the URLs stored in the unvalidated table. It analyzes the web pages to identify keywords and, using web page categorization rules based on the identified keywords, evaluates whether each URL belongs to one or more web page categories. For each evaluated URL, it stores an entry in a validated table in the non-transitory memory; each entry comprises the URL, the one or more categories associated with the URL, and the keywords identified in the web page associated with the URL. The application also performs a frequency analysis of the keywords associated with the URLs and adapts the web page categorization rules.
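The loop described above can be sketched in Python. This is a minimal illustration rather than the claimed implementation, and it rests on assumptions the abstract does not state. Pages are supplied by a `fetch` callable; here an in-memory dict stands in for crawling and navigation. Keywords are taken to be the most frequent non-stopword tokens on a page. Each categorization rule is a set of trigger keywords per category. Rules are adapted by adding any keyword that appears in a large share of a category's URLs. All names and thresholds (`extract_keywords`, `categorize`, `adapt_rules`, `min_hits`, `min_share`, `min_urls`) are hypothetical.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "for", "on", "with", "is"}


@dataclass
class ValidatedEntry:
    """One row of the validated table: URL, its categories, and its keywords."""
    url: str
    categories: list[str]
    keywords: list[str]


def extract_keywords(text: str, top_n: int = 5) -> list[str]:
    """Assumed keyword identification: the top_n most frequent non-stopword tokens."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower())
              if t not in STOPWORDS and len(t) > 2]
    return [word for word, _ in Counter(tokens).most_common(top_n)]


def categorize(keywords: list[str], rules: dict[str, set[str]], min_hits: int = 1) -> list[str]:
    """Assumed rule: a URL belongs to a category if >= min_hits keywords match its trigger set."""
    kw = set(keywords)
    return sorted(cat for cat, triggers in rules.items() if len(kw & triggers) >= min_hits)


def build_validated_table(unvalidated: list[str],
                          fetch: Callable[[str], Optional[str]],
                          rules: dict[str, set[str]]) -> list[ValidatedEntry]:
    """Visit each URL from the unvalidated table and store an entry for every page fetched."""
    table = []
    for url in unvalidated:
        text = fetch(url)
        if text is None:
            continue  # unreachable page: nothing to evaluate
        keywords = extract_keywords(text)
        table.append(ValidatedEntry(url, categorize(keywords, rules), keywords))
    return table


def keyword_frequencies(table: list[ValidatedEntry]) -> dict[str, Counter]:
    """Per category, count how many validated URLs carry each keyword."""
    freq: dict[str, Counter] = {}
    for entry in table:
        for cat in entry.categories:
            freq.setdefault(cat, Counter()).update(set(entry.keywords))
    return freq


def adapt_rules(rules: dict[str, set[str]], table: list[ValidatedEntry],
                min_share: float = 0.6, min_urls: int = 2) -> dict[str, set[str]]:
    """Return new rules with keywords seen in >= min_share of a category's URLs added.

    Categories backed by fewer than min_urls URLs are left unchanged to avoid overfitting.
    """
    freq = keyword_frequencies(table)
    sizes = Counter(cat for entry in table for cat in entry.categories)
    new_rules = {cat: set(triggers) for cat, triggers in rules.items()}
    for cat, counter in freq.items():
        if sizes[cat] < min_urls:
            continue
        for word, count in counter.items():
            if count / sizes[cat] >= min_share:
                new_rules[cat].add(word)
    return new_rules


if __name__ == "__main__":
    # Toy "web": a dict lookup plays the role of navigating to a URL.
    pages = {
        "http://example.com/a": "Football match score: the football league final, goals and score",
        "http://example.com/b": "Stock market prices: market shares rise, stock index up",
        "http://example.com/c": "League standings: match goals and team table",
    }
    rules = {"sports": {"football", "match"}, "finance": {"stock", "market"}}
    table = build_validated_table(list(pages) + ["http://example.com/missing"], pages.get, rules)
    for entry in table:
        print(entry)
    print(adapt_rules(rules, table))
```

In this toy run, "league" occurs on both sports pages, so it is promoted into the sports rule. The finance category rests on a single page and is left unchanged because of the `min_urls` guard. Keeping the adapted rules as a new dict, rather than mutating the old one, makes it easy to compare rule sets before re-categorizing.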