Google and other companies write computer programs called "spiders" or "bots" (generic terms that can cover a wide variety of things) that go out to websites, read the data on their pages, and store it in the company's databases. Then, when you go to Google.com and run a search, the search engine can, in theory, work out the best results and which pages they are located on. In a nutshell, spiders are programs that harvest information.
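
To make the idea concrete, here is a rough sketch in Python of what a very small spider might do: read a page, record which words appear on it, and follow its links to more pages. This is only an illustration, not how Google actually does it. The `FAKE_WEB` dictionary, the `example.test` addresses, and the names `PageParser`, `crawl`, and `search` are all made up for the example, and the pages live in memory so nothing is fetched from the real internet. It also does no ranking at all; it only answers "which pages contain this word".

```python
import re
from collections import defaultdict, deque
from html.parser import HTMLParser

# Hypothetical stand-in for the web: URL -> HTML. A real spider would fetch
# each page over HTTP instead of looking it up in a dictionary.
FAKE_WEB = {
    "http://example.test/": (
        "<h1>Home</h1><p>Welcome to the bakery.</p>"
        '<a href="http://example.test/bread">Bread</a>'
    ),
    "http://example.test/bread": (
        "<p>Fresh sourdough bread daily.</p>"
        '<a href="http://example.test/">Home</a>'
        '<a href="http://example.test/missing">Gone</a>'
    ),
}


class PageParser(HTMLParser):
    """Collects the visible words and the outgoing links of one page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        # Lowercase the text and split it into simple alphanumeric words.
        self.words.extend(re.findall(r"[a-z0-9]+", data.lower()))


def crawl(start_url, pages):
    """Visit pages breadth-first and build a word -> set of URLs index."""
    index = defaultdict(set)
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        html = pages.get(url)
        if html is None:
            continue  # Page could not be "fetched"; skip it.
        parser = PageParser()
        parser.feed(html)
        parser.close()
        for word in parser.words:
            index[word].add(url)
        for link in parser.links:
            if link not in seen:  # Never queue the same page twice.
                seen.add(link)
                queue.append(link)
    return index


def search(index, word):
    """Return the URLs whose text contained the word, in sorted order."""
    return sorted(index.get(word.lower(), set()))


INDEX = crawl("http://example.test/", FAKE_WEB)
```

Calling `search(INDEX, "sourdough")` then looks the word up in the harvested index rather than re-reading every page, which is the basic reason a search can come back quickly. A real search engine adds a great deal on top of this, such as ranking the results and respecting each site's crawling rules.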