Sensitive documents under your website?
Like all search engines, Google uses a program to visit the server where your website is located. These programs are called “spiders” or “crawlers”. They read the documents on the server, index them in their databases and make them available to people searching the web.
In the old days, the spiders could interpret only certain file types, and short-sighted webmasters would “store” all kinds of documents on the server, assuming they were invisible to the search engines. This in itself was a security risk: hackers wrote their own programs to find the documents anyway.
Nightmare: in the old days, you could search for “passwords” in a search engine and get back hundreds of password files stored on servers by oh-so-foolish web folk.
Nowadays, a simple piece of code in a robots.txt file tells the search engine spider to skip the document and not publish it. This can be used for files that are subordinate to the site, such as “raw” images. The code doesn’t stop hackers, though: it is a simple instruction to search engines and NOT a firewall.
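To make the robots.txt idea concrete, here is a minimal sketch in Python that reads a sample robots.txt the way a well-behaved spider would. The folder name /raw-images/, the example.com URLs and the helper is_crawlable are made up for illustration; only the standard library's urllib.robotparser is real.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ask every spider to skip the /raw-images/ folder.
ROBOTS_TXT = """\
User-agent: *
Disallow: /raw-images/
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if a spider that obeys robots.txt may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/index.html",
                "https://example.com/raw-images/photo.tif"):
        print(url, is_crawlable(ROBOTS_TXT, url))
```

Note that the check is purely voluntary: the file under /raw-images/ is still served to anyone who asks for it directly, which is exactly why robots.txt is no substitute for keeping private files off the server.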
Google’s mission, after all, is to make available any public document it comes across. “Our specialty is discovering, crawling and indexing publicly available information,” said Google spokesman David Krane. “We define public as anything placed on the public Internet… The primary burden falls to the people who are incorrectly exposing this information.”
However, many people are still taken by surprise when documents meant for internal use, some of them sensitive, are suddenly in plain view on the web.
Webmasters should know how to protect documents before they start on a site. If it ain’t for the world to see, don’t put it out there, no matter how convenient!
Check with your people that you do not have internal, private documents on the website server – sites tend to start nice and tidy, and then, after a few revisions, grow messy. Clean it up now unless you like hanging your washing in public.
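If the site has grown too messy to check by eye, a small script can do a first pass. The sketch below, in Python, walks a local copy of the web root and lists files whose names or extensions look internal. The suffix and keyword lists and both function names are assumptions for illustration, not a standard; tune them to whatever counts as private in your organisation, and treat the output as a list of candidates for a human to review.

```python
from pathlib import Path

# Hypothetical patterns: adjust to what counts as internal on your site.
SUSPECT_SUFFIXES = {".doc", ".docx", ".xls", ".xlsx", ".sql", ".bak", ".csv"}
SUSPECT_WORDS = ("password", "passwd", "internal", "confidential", "backup")


def looks_sensitive(path: Path) -> bool:
    """Flag a file by its extension or by a suspicious word in its name."""
    name = path.name.lower()
    return (path.suffix.lower() in SUSPECT_SUFFIXES
            or any(word in name for word in SUSPECT_WORDS))


def find_suspect_files(web_root: str) -> list[Path]:
    """Walk the web root recursively and return files worth a second look."""
    root = Path(web_root)
    return sorted(p for p in root.rglob("*")
                  if p.is_file() and looks_sensitive(p))
```

Calling find_suspect_files with the path to your local copy of the site returns the flagged paths. A name-based check like this will miss sensitive files with innocent names, so it complements the conversation with your people rather than replacing it.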