A search engine is a software program for information retrieval, designed to find specific information stored in a database based on criteria provided by the user. Search engines allow users to find the information they are looking for, typically through queries based on keywords or phrases. When multiple results are returned, they are usually presented as a ranked list ordered by relevance or importance. The most familiar form is the web-based search engine, such as Google or Bing, which scans websites and indexes their pages for users to find. When a user enters a search term, the web search engine returns what it considers the most relevant results based on page titles, content, keywords, and a range of other ranking criteria. Companies and other site owners use search engine optimization (SEO) to try to get their web pages ranked higher on popular search engines.
The goal of a search engine is to help people find information. Search engines are designed to provide people with the right information based on criteria such as quality and relevance. Website and webpage providers, in turn, rely on search engines to make money and collect data. To achieve this, search engines must build user trust, which can be done through the following:
- Organic results—These are seen as more trustworthy than paid, ad-based results.
- Authority—Google seeks to establish a web page's authority to identify it as a source of accurate information.
- Privacy—Search engines, such as DuckDuckGo, use privacy protection to establish trust.
Crawler-based search engines use a bot to crawl and index new content added to the database. They generally take four steps before displaying websites in the search results:
- Crawling—Search engines crawl the web to fetch the pages available. This is done by a piece of software called a crawler, bot, or spider. Crawling frequency depends on the search engine, and it may take days between crawls. This is why a user can sometimes see old or deleted page content in the search results; the results will show the new, updated content once the search engine crawls the site again.
- Indexing—Indexing is the next step after crawling. It is the process of identifying the words and expressions that best describe the page. The identified words are referred to as keywords, and the page is assigned to them. When the crawler does not understand the meaning of a page, the site may rank lower in the search results, so pages need to be optimized for crawlers to make their content easily understandable. Once the crawlers pick up the correct keywords, the page is assigned to those keywords and can rank higher in search results.
- Calculating relevancy—A search engine compares the search string in the search request with the indexed pages from the database. Since more than one page will likely contain the search string, the search engine calculates the relevancy of each matching page in its index. There are various algorithms for calculating relevancy, each with different relative weights for common factors such as keyword density, links, or meta tags. That is why different search engines return different results pages for the same search string. All major search engines periodically change their algorithms, so site owners who want to keep their sites at the top must adapt their pages to the latest changes. This is one reason to devote ongoing effort to SEO.
- Retrieving results—The last step in a search engine's activity is retrieving the results and displaying them in the browser. The search engine sorts the countless pages of search results in order from the most relevant to the least relevant sites.
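The four steps above can be sketched in miniature. This is an illustrative toy, not a real engine: crawling is simulated with a dict of pre-fetched pages, indexing builds an inverted index, relevancy is a crude count of matching query terms, and retrieval sorts by that score. All URLs, page texts, and function names are invented for the example.

```python
from collections import defaultdict

# "Crawling": a real bot would fetch pages over HTTP; here the fetched
# pages are simulated as a mapping of URL -> page text (assumed data).
pages = {
    "example.com/a": "search engines index web pages by keywords",
    "example.com/b": "a crawler fetches web pages for the search engine",
    "example.com/c": "privacy focused search engines protect users",
}

# "Indexing": build an inverted index mapping each keyword to the set
# of pages that contain it.
index = defaultdict(set)
for url, text in pages.items():
    for word in text.split():
        index[word].add(url)

def search(query):
    # "Calculating relevancy": score each candidate page by how many
    # query terms it contains -- a stand-in for real ranking algorithms.
    scores = defaultdict(int)
    for term in query.split():
        for url in index.get(term, ()):
            scores[url] += 1
    # "Retrieving results": return pages sorted most to least relevant.
    return sorted(scores, key=scores.get, reverse=True)

print(search("search engines crawler"))
```

A production engine replaces each step with far more sophisticated machinery (distributed crawling, stemming, link analysis), but the pipeline shape is the same.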
Examples of crawler-based web search engines include most of the popular search engines in use, such as Google and Bing.
Human-powered directories, also referred to as open directory systems, depend on human-based activities for listings. They typically work by the following:
- Site owners submit a short description of the site to the directory, along with the category in which it is to be listed.
- Submitted sites are then manually reviewed and either added to the appropriate category or rejected for listing.
- Keywords entered in a search box are matched with the description of the sites. This means the changes made to the content of web pages are not taken into consideration, as it is only the description that matters.
- A good site with quality content is more likely to be reviewed for free than a site with poor content.
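The key point above, that directory search matches keywords against the submitted description rather than the live page content, can be sketched as follows. The directory entries and function name are illustrative assumptions, not part of any real directory's API.

```python
# A hand-curated directory: site -> (category, submitted description).
directory = {
    "example.org": ("News", "Daily technology news and reviews"),
    "example.net": ("Recipes", "Home cooking recipes and baking tips"),
}

def directory_search(query):
    # Keywords are matched only against the submitted description;
    # changes to the live page content are never considered.
    terms = query.lower().split()
    return [site for site, (category, desc) in directory.items()
            if any(term in desc.lower() for term in terms)]

print(directory_search("baking"))
```

Because only the stored description is searched, a site whose content has changed completely will still be found (or missed) based on what its owner originally submitted.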
Yahoo! Directory and DMOZ are well-known examples of human-powered directories. Automated search engines, like Google, have almost entirely replaced human-powered directory-style search engines.
Hybrid search engines use both crawler bots and manual indexing to list sites in search results. Some crawler-based search engines use crawlers as a primary mechanism and human-powered directories as a secondary mechanism.