Interaction with search engines
Managing a site in line with search engine rules involves a number of details worth knowing...
How to get the English search engine?
When you enter google.com, you are automatically redirected to the regional version of the engine (Google.fr, for example). This suits most users, but not a webmaster or a user who wants to search on google.com itself. To avoid the redirect, add a language parameter to the URL:
https://www.google.com/?hl=en
The same applies to every other language.
To reach google.com without being redirected to the French engine, you can also enter this URL in the address bar:
www.google.com/ncr
This address can be bookmarked. "ncr" stands for "no country redirect".
How to exclude a page from the index?
Insert a meta tag into the <head> section of the HTML page:
<meta name="robots" content="noindex" />
The robots.txt file at the root of the site may also contain instructions telling search engines to exclude pages.
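A quick way to verify that a page carries the noindex directive is to look for the robots meta tag in its head. Here is a minimal sketch using only Python's standard library; the sample HTML is illustrative, not taken from a real site.

```python
# Sketch: detect a robots "noindex" meta tag in an HTML page,
# using only the standard-library HTML parser.
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        # Triggered for <meta ...> and <meta ... /> alike.
        if tag == "meta":
            d = dict(attrs)
            if (d.get("name", "").lower() == "robots"
                    and "noindex" in (d.get("content") or "").lower()):
                self.noindex = True

def page_is_noindexed(html: str) -> bool:
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.noindex

html_doc = '<html><head><meta name="robots" content="noindex" /></head><body></body></html>'
print(page_is_noindexed(html_doc))  # True
```

A page without the tag (or with a robots value that does not contain "noindex") returns False.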
Manage a temporarily unavailable site
Taking a site offline is often mismanaged, even though the downtime is usually known in advance.
If nothing is done, other webmasters may assume that a minor site has closed for good and remove their backlinks to it. Likewise, search engine robots may treat the downtime as a negative signal.
If decommissioning is planned, the ideal is to return the HTTP 503 status code, which is intended for this very situation. In PHP, the code for the home page (or for all pages, in the case of a CMS) can be as follows:
header('HTTP/1.1 503 Service Temporarily Unavailable');
header('Retry-After: Mon, 25 Jan 2011 12:00:00 GMT');
This code was provided by Google.
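The same idea can be sketched outside PHP, for instance as a minimal WSGI application in Python. This is an illustrative sketch, not Google's example: the Retry-After value (here a delay in seconds) is a placeholder, and a full HTTP date also works.

```python
# Sketch: answer every request with 503 plus a Retry-After hint,
# the behavior recommended for planned downtime.
def maintenance_app(environ, start_response):
    """Minimal WSGI app for a site under maintenance."""
    start_response(
        "503 Service Temporarily Unavailable",
        [("Content-Type", "text/html; charset=utf-8"),
         # Tell crawlers when to come back (seconds or an HTTP date).
         ("Retry-After", "3600")],
    )
    return [b"<h1>Site temporarily unavailable</h1>"]

# Simulate one request to inspect the response status.
captured = {}
def fake_start_response(status, headers):
    captured["status"] = status

body = maintenance_app({}, fake_start_response)
print(captured["status"])  # 503 Service Temporarily Unavailable
```

To actually serve it, the app can be passed to wsgiref.simple_server.make_server from the standard library.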
Manage duplicate content
Duplicate content is the presence of duplicate pages, not on the site or sites themselves, but in the index of Google or another search engine.
This can occur when different URLs point to the same page, or when pages are copied. Duplication could be a way for a site to monopolize the first page of results, yet this is never seen in practice, so we can conclude that the engines really do penalize duplicate content.
Duplicate content also includes reproducing part of an article from another site on one's own site. This is a guaranteed penalty factor, unless it is a quotation placed in a "blockquote" tag. Quotations must be accompanied by original text.
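A quotation marked up as described above might look like this; the cited URL and texts are placeholders:

```html
<p>As one article on the subject puts it:</p>
<blockquote cite="https://www.example.com/article">
  Duplicate content is penalized by search engines.
</blockquote>
<p>Original commentary on the quotation goes here.</p>
```

The surrounding paragraphs provide the personal text that should accompany the quote.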
Design change management
Webmasters have often experienced a loss of ranking after redesigning a site without changing its content, immediately after Googlebot's next pass.
This experience has been shared on WebmasterWorld. Rankings return to their previous state after a variable delay. It is likely that a massive change triggers some kind of signal in the engine.
In addition, Google recommends not changing the design at the same time as changing domains and redirecting pages.
It is therefore recommended to change the site gradually rather than all at once. If something triggers a drop in ranking, it will be easier to see why.
You can edit snippets
This is what Google calls the description displayed under the page title in the search results. It can be changed; Google explained this on its blog for webmasters ("Improve snippets with a meta description makeover"). You must use the meta description tag, which is placed in the head of the page and has the following form:
<head>
...other tags...
<meta name="description" content="readable and useful information">
</head>
The text assigned to the content attribute must have certain qualities:
- It should be written in good French, a sentence or two, not a string of keywords.
- It should briefly describe the content of the page.
- It should be attractive: arouse the desire to see the content.
The meta description is especially relevant in the following cases:
- When the page has dynamic content that robots cannot know about.
- When it contains mostly images or videos and little text.
- When the query exactly matches the content of the page.
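The qualities above can be checked mechanically, at least in part. The sketch below is an illustrative helper, not a tool from the article: the ~160-character limit is a widespread rule of thumb (longer descriptions are truncated in snippets), not an official figure, and the regex assumes the name attribute comes before content.

```python
# Sketch: a simple sanity check of a page's meta description.
import re

def check_meta_description(html: str, max_length: int = 160):
    """Return (found, problems) for the page's meta description."""
    match = re.search(r'<meta\s+name="description"\s+content="([^"]*)"',
                      html, re.IGNORECASE)
    if not match:
        return False, ["no meta description found"]
    text = match.group(1)
    problems = []
    if len(text) > max_length:
        problems.append("too long (%d characters)" % len(text))
    if len(text.split()) < 3:
        problems.append("too short to be a descriptive sentence")
    return True, problems

page = '<head><meta name="description" content="A practical guide to managing a site for search engines."></head>'
found, problems = check_meta_description(page)
print(found, problems)  # True []
```

Whether the text is attractive, of course, remains a human judgment.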
Why a site map is useful
A site map is a standard XML or HTML file that lists all the pages of a site as URLs. A sitemap can be created automatically by a CMS, or with a script such as Simple Map on a static site.
- The main goal of a sitemap is to make crawling by Google easier, but it has other benefits.
- Dynamic links are ignored by search engine robots; an XML or HTML sitemap gives them a static link to those pages.
- The XML sitemap format is now used by all the leading search engines; a single format is universally accepted.
- Each time the contents of the site change, the site map must be regenerated, but it only needs to be registered with the engine once.
- Once the sitemap is registered, Google provides statistics and analysis of your site, including possible errors.
- The address of the XML sitemap can be placed in the robots.txt file.
- There is a special sitemap format for video indexing.
In conclusion, maintain an XML sitemap if your site is poorly indexed, if indexing is not updated quickly, or if you want statistical information.
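A minimal XML sitemap, following the standard sitemaps.org format, might look like this; the URLs and dates are placeholders:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2011-01-25</lastmod>
    <changefreq>weekly</changefreq>
  </url>
  <url>
    <loc>https://www.example.com/page.html</loc>
  </url>
</urlset>
```

Only the loc element is required for each URL; lastmod and changefreq are optional hints for crawlers.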
Links and further reading: see the Sitemaps.org FAQ.
How robots.txt is handled
The file must be at the root of the website. It tells search engines which pages to crawl and which to skip.
Typical default content for robots.txt is:
User-Agent: *
Disallow:
User-Agent names the crawler of a search engine (* matches all of them), and Disallow specifies the full path (from the root) to the page or directory to exclude from crawling.
To exclude the cgi directory, the format would be:
User-Agent: *
Disallow: /cgi-bin/
To exclude a file:
User-Agent: *
Disallow: /rep/nomfichier.html
The file names given are case-sensitive.
Do not place multiple file names or robot names on the same line; instead, use several User-Agent groups, or several Disallow lines under the same User-Agent.
Do not insert blank lines; a line that is not a directive should begin with # to mark it as a comment.
You can check the validity of the robots.txt file with Google's webmaster tools, which also let you edit it online.
According to Matt Cutts of Google, if a page is covered by a Disallow rule, the Google robot ignores it and does not parse it; but if the page has backlinks, it may still appear in the results (Disallow does not mean noindex). The links pointing to the page are then used to build its description.
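The way a crawler interprets Disallow rules can be sketched with the parser in Python's standard library. The rules mirror the examples above; the URLs are placeholders.

```python
# Sketch: how a robot interprets robots.txt Disallow rules,
# using Python's standard-library parser.
from urllib.robotparser import RobotFileParser

rules = """\
User-Agent: *
Disallow: /cgi-bin/
Disallow: /rep/nomfichier.html
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# The excluded directory and file are blocked for every robot...
print(parser.can_fetch("*", "https://www.example.com/cgi-bin/script.cgi"))  # False
print(parser.can_fetch("*", "https://www.example.com/rep/nomfichier.html"))  # False
# ...while everything else remains crawlable.
print(parser.can_fetch("*", "https://www.example.com/index.html"))  # True
```

Note that, as the section explains, "crawlable" is decided here; whether a blocked page still shows up in results via its backlinks is a separate matter.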
Further information
Invisible instructions for engines, with the X-Robots-Tag header.
Google has written an FAQ about its Googlebot robot: how to ensure the site is crawled, and more.