What is a robots.txt file, and what is its structure?

Abhigyan Singh 12th Jul 2020

The Robots Exclusion Protocol, or robots.txt protocol, is a method to prevent cooperating web crawlers and other web robots from accessing all or part of a website that is otherwise publicly viewable. Robots are often used by search engines to categorize and archive websites, or by webmasters to proofread source code.

It is a file created by webmasters to instruct robots (typically search engine robots) on how to crawl and index pages on their website.

Website owners use the robots.txt file to give instructions about their site to web robots; this convention is called the Robots Exclusion Protocol.

If a site owner wishes to give instructions to web robots, they must place a text file called robots.txt in the root of the website hierarchy (e.g. https://www.abc.com/robots.txt). Robots that choose to follow the instructions try to fetch this file and read the instructions before fetching any other file from the website. If this file doesn't exist, web robots assume that the site owner wishes to give no specific instructions, and they crawl the entire site.
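This fetch-then-check behavior can be sketched with Python's standard urllib.robotparser module. The rules and URLs below are illustrative only, not taken from any real site, and the rules are supplied inline rather than fetched over HTTP:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt, supplied inline instead of being fetched.
rules = """\
User-agent: *
Disallow: /private/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# A cooperating crawler checks each URL against the rules before fetching it.
print(parser.can_fetch("*", "https://www.abc.com/private/page.html"))  # False
print(parser.can_fetch("*", "https://www.abc.com/public/page.html"))   # True
```

In a real crawler you would call `parser.set_url("https://www.abc.com/robots.txt")` followed by `parser.read()` so the file is fetched from the site itself before any other request is made.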
A robots.txt file on a website functions as a request that specified robots ignore specified files or directories when crawling the site. This might be, for example, out of a preference for keeping content out of search engine results, a belief that the content of the selected directories might be misleading or irrelevant to the categorization of the site as a whole, or a desire that an application operate only on certain data. Note, however, that links to pages listed in robots.txt can still appear in search results if they are linked to from a page that is crawled.

Robots can also simply ignore your /robots.txt. In particular, malware robots that scan the web for security vulnerabilities, and email-address harvesters used by spammers, will pay no attention to it. The /robots.txt file is also publicly available: anyone can see which sections of your server you don't want robots to use.

Structure of a Robots.txt File

The structure of a robots.txt file is very simple and flexible: it is an open-ended list of user agents and disallowed files and directories. Writing a robots.txt file requires no knowledge of any programming language, but authors must understand the structure of this file as well as its importance.

Basically, the syntax is as follows:
User-agent:
Disallow:
User-agent: names the search engine crawler (or other robot) that the following rules apply to.
Disallow: lists the files and directories to be excluded from indexing.
If you write this file without understanding its structure, it can directly affect which of your URLs get indexed.
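For example, a complete robots.txt combining these directives might look like the following (the directory names and the crawler name BadBot are illustrative only):

```
# Ask all robots to skip two hypothetical directories
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/

# Ask one specific crawler to stay out of the entire site
User-agent: BadBot
Disallow: /
```

A Disallow: line with an empty value allows everything for that user agent, while Disallow: / excludes the whole site; the most specific matching User-agent group is the one a cooperating robot follows.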

Authored By Abhigyan Singh

He is a regular blogger and has written on many different topics. He loves to surf the Internet, is always looking for new ideas about technology and innovation, and shares this information with all technology lovers.
