In Episode 5 of my blogging series, I promised to cover how to make your blog known to robots as well as popular among humans. However, it is important that you first understand what crawling, indexing, ranking and rendering are all about, so that you can enjoy the golden Episode 6.
In summary, this is a prerequisite for “learn blogging and SEO Episode 6”. You may want to follow the blogging series by clicking here.
Now, if you are yet to jump into the blogging world, click here to start your personal blog. Also, if you have up to $35 or ₦15,000 but are having issues starting a blog, feel free to drop a comment or click here to visit the WordPress blog designing and hosting episode. I will help you start your blog today.
Back to the topic: the difference between crawling, indexing, ranking and rendering. What really are these terms about, and does knowing them help you? You will find out right here. Note that this topic covers technical concepts and may introduce you to strange terms; however, I will simplify things as much as I can so that life will be easy for you.
See Also: 30 reasons why you should blog.
Crawling, Ranking, Indexing And Rendering
When you hear "crawling", spiders and other crawling creatures come to mind, right? You are not wrong at all. However, this topic takes you to another kind of crawling mechanism.
Now, what Is Crawling? Crawling is the process by which search engines discover updated content on the web, such as new blogs or pages, changes to existing sites or blog posts, and dead links.
Let me simplify this: crawling is the process whereby search engines like Google, Bing and Yahoo use their crawlers (bots) to go through the web to find new posts, new blogs, new links and updated posts, and to refresh the cached versions they have.
After crawling your pages, the Google crawler decides whether to index them or not. But when you use robots.txt to stop bots from crawling your web pages, they will not crawl those pages, and pages that are never crawled cannot be indexed or ranked in search results. The program that does this work is referred to as a 'crawler', 'bot' or 'spider'.
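To see how a robots.txt file controls crawling, here is a small sketch using Python's standard `urllib.robotparser`. The rules and URLs (`example.com`, the `/private/` folder) are hypothetical examples, not taken from any real blog:

```python
from urllib import robotparser

# A minimal robots.txt, as a blog might serve it at https://example.com/robots.txt.
# "Disallow: /private/" tells crawlers not to fetch anything under /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# A crawler like Googlebot checks these rules before fetching a page.
print(rp.can_fetch("Googlebot", "https://example.com/blog/post-1"))   # True: allowed
print(rp.can_fetch("Googlebot", "https://example.com/private/draft")) # False: blocked
```

Well-behaved crawlers check this file before fetching any page, which is exactly why a robots.txt mistake can make whole sections of a blog invisible to search engines.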
What Is Indexing? Once a search engine processes each of the pages it crawls, it compiles a massive index of all the words it sees and their location on each page. It is essentially a database of billions of web pages.
This extracted content is then stored, with the information then organised and interpreted by the search engine’s algorithm to measure its importance compared to similar pages.
Servers based all around the world allow users to access these pages almost instantaneously. Storing and sorting this information requires significant space and both Microsoft and Google have over a million servers each.
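To make the idea of an index concrete, here is a toy sketch of an inverted index in Python: each word maps to the pages (and positions) where it appears. The two "pages" and their text are invented for illustration; a real search index is vastly larger and more sophisticated:

```python
# Two tiny "pages" standing in for crawled web pages.
pages = {
    "page1": "blogging helps you learn seo",
    "page2": "learn blogging and seo today",
}

# Build the inverted index: word -> list of (page, position) entries.
index = {}
for url, text in pages.items():
    for position, word in enumerate(text.split()):
        index.setdefault(word, []).append((url, position))

# Looking up a word instantly tells the engine which pages contain it and where.
print(index["seo"])  # [('page1', 4), ('page2', 3)]
```

Storing positions as well as pages is what lets an engine match phrases and judge how prominently a word appears, as described above.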
What Is Ranking? Every day, you say things like, "I want to rank higher on Google and Bing search results". Do you really know what ranking is about and how Google ranks Flashlearners? Let's see…
Once a keyword is entered into a search box, search engines check for pages within their index that are the closest match; a score is assigned to these pages based on an algorithm made up of hundreds of different ranking signals.
These pages (or images & videos) will then be displayed to the user in order of score.
So in order for your site to rank well in search results pages, it’s important to make sure search engines can crawl and index your site correctly – otherwise they will be unable to appropriately rank your website’s content in search results.
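As a rough illustration of scoring and ordering, here is a toy ranker in Python. It scores pages simply by how many query words they contain; real engines combine hundreds of signals, so treat this purely as a sketch with made-up pages:

```python
# Invented example pages; in reality these come from the search index.
pages = {
    "page1": "blogging helps you learn seo",
    "page2": "learn blogging and seo today",
    "page3": "recipes for jollof rice",
}

def rank(query):
    """Score each page by query-word overlap and return matches, best first."""
    words = set(query.split())
    scores = {url: len(words & set(text.split())) for url, text in pages.items()}
    return sorted((url for url in scores if scores[url] > 0),
                  key=lambda url: scores[url], reverse=True)

print(rank("learn seo"))  # page1 and page2 match; page3 does not appear at all
```

Notice that page3 never shows up for this query: a page that is not in the index (or does not match) cannot be ranked, which is the point made above about crawling and indexing coming first.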
What Is GoogleBot, Crawling, And Indexing?
Kissmetrics simplified the terms around Googlebot, crawling and indexing. If Googlebot had not crawled and indexed this Flashlearners page, you would not have been able to see it as a result in Google search. It is crawling and indexing that make your blog visible in search engines.
- The Googlebot is simply the search bot software that Google sends out to collect information about documents on the web to add to Google’s searchable index.
- Crawling is the process where the Googlebot goes around from website to website, finding new and updated information to report back to Google. The Googlebot finds what to crawl using links.
- Indexing is the processing of the information gathered by the Googlebot from its crawling activities. Once documents are processed, they are added to Google’s searchable index if they are determined to be quality content. During indexing, the Googlebot processes the words on a page and where those words are located. Information such as title tags and ALT attributes are also analyzed during indexing.
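Since title tags and ALT attributes are among the signals analyzed during indexing, here is a small sketch of extracting them from a page's HTML with Python's standard `html.parser`. The sample HTML is invented for illustration:

```python
from html.parser import HTMLParser

class SignalExtractor(HTMLParser):
    """Pulls out the <title> text and image ALT attributes from HTML."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.alts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
        elif tag == "img":
            for name, value in attrs:
                if name == "alt":
                    self.alts.append(value)

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

parser = SignalExtractor()
parser.feed('<html><head><title>Learn SEO</title></head>'
            '<body><img src="a.png" alt="crawling diagram"></body></html>')
print(parser.title)  # Learn SEO
print(parser.alts)   # ['crawling diagram']
```

This is why filling in descriptive titles and ALT text matters: they are plain, machine-readable signals an indexer can lift straight out of your markup.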
What is Rendering? Rendering displays what you see on your screen while surfing the internet. It communicates with the networking layer of the browser to grab HTML code and other items passed from a remote server. The majority of web pages crawled are now being rendered by Google. This page you are reading now is a rendered webpage.
Most of the time, in fact nearly all the time, Googlebot is blocked from fetching AdSense code, so it cannot be rendered. You can see this when you fetch and render your web page using Google Search Console (formerly Webmaster Tools).
How A Web Page Is Rendered
Every day you search for things on Google and Bing, get results, and click to view them. Then Flashlearners opens and you read what you searched for. Nice, right? But have you ever wondered what it takes for a web page to open (render)? That is what you are about to learn in the steps below, adapted from Friendlybit:
- You want to find something on Flashlearners, so you quickly type a URL into the address bar of your Opera Mini, Internet Explorer or Google Chrome.
- The browser parses the address you entered to find the protocol, host, port, and path.
- It forms an HTTP request. To reach the host, it first needs to translate the human-readable host name into an IP number, which it does with a DNS lookup on the host.
- Then a socket needs to be opened from the user’s computer to that IP number, on the port specified (most often port 80)
- When a connection is open, the HTTP request is sent to the host
- The host forwards the request to the server software (most often Apache) configured to listen on the specified port
- The server inspects the request (most often only the path), and launches the server plugin needed to handle the request (corresponding to the server language you use, PHP, Java, .NET, Python?)
- The plugin gets access to the full request and starts to prepare an HTTP response.
- To construct the response, a database is (most likely) accessed. A database search is made based on parameters in the path (or data) of the request.
- Data from the database, together with other information the plugin decides to add, is combined into a long string of text (probably HTML).
- The plugin combines that data with some meta data (in the form of HTTP headers), and sends the HTTP response back to the browser.
- The browser receives the response, and parses the HTML (which with 95% probability is broken) in the response
- A DOM tree is built out of the broken HTML
- Stylesheets are parsed, and the rendering information in each gets attached to the matching node in the DOM tree
- The browser renders the page on the screen according to the DOM tree and the style information for each node
- You see the page on the screen
- You get annoyed the whole process was too slow.
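The second step above, where the browser parses the address into protocol, host, port, and path, can be sketched with Python's standard `urllib.parse`. The URL here is an invented Flashlearners-style address used purely for illustration:

```python
from urllib.parse import urlsplit

# The browser splits the typed address into its components.
parts = urlsplit("http://flashlearners.com:80/blog/seo?episode=6")

print(parts.scheme)    # 'http'              -> the protocol
print(parts.hostname)  # 'flashlearners.com' -> translated to an IP via DNS lookup
print(parts.port)      # 80                  -> the port the socket connects to
print(parts.path)      # '/blog/seo'         -> sent in the HTTP request line

# The request line and Host header the browser would then send over the socket:
request = f"GET {parts.path} HTTP/1.1\r\nHost: {parts.hostname}\r\n\r\n"
```

Everything after this point (the socket, the server plugin, the DOM tree) happens with these four pieces as the starting ingredients.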
7 Key Components Of Your Web Browser
Path Interactive has the following to say about the 7 components of your web browser. You will really enjoy it.
1. Layout Engine: This takes your input, for example what you type into the search box or the address bar, and passes it to the rendering engine.
2. Rendering Engine: This converts a mere code into beautiful pictures and visual displays.
3. User Interface: This is what you see while using a browser. It is the interface through which you communicate with your browser. It is in the user interface you search for things or check your bookmarks and browsing history.
5. Network Layer: This part of the browser works behind the scenes and handles network functions such as encryption, HTTP and FTP requests, and network settings such as timeouts and the handling of HTTP status codes.
7. Operating System Interface: The browser must interact with the operating system to draw several elements of the page, like drop-down boxes and the chrome of a window (the close, maximize, and minimize buttons).
What Does This Mean For SEO?
The fact that Google looks at the fully rendered version of a webpage means that you can no longer look solely at the source code of a site to understand how it is perceived by a search engine spider.
You should assume that search engine spiders see the same page you see in your browser as it appears on page load.
SUMMARY: Google crawls your site and then indexes what it sees as a cached version of the page.
The page's design may change between one crawl and the next, hence the term "cache" is used almost as a caveat to say that the page may have changed since Google last crawled it.
If your web pages aren't crawled, then they can't be indexed. Making sure your site can be crawled by bots is a priority. Set up a Google Webmaster Tools account and then submit an XML sitemap to help Google crawl and index it.
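For reference, an XML sitemap is just a list of your URLs in a standard format. Here is a minimal sketch with a hypothetical blog address and dates; your sitemap plugin or generator produces something like this for every post:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/blog/seo-episode-6</loc>
    <lastmod>2019-01-15</lastmod>
  </url>
  <url>
    <loc>https://example.com/blog/start-a-blog</loc>
    <lastmod>2019-01-10</lastmod>
  </url>
</urlset>
```

Submitting this file in Webmaster Tools tells Google exactly which pages exist and when they last changed, so the crawler does not have to discover them all through links.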
Others On The Series....
- Recommended: Complete season 1 of my blogging series
- Important: Full Season 2 of my blogging series
- Recommended: Complete Season 3 summary
- S03E1: How to remove strange characters in your blog
- S03E2: How to upload Apk files to WordPress library
- S03E2: Write long and interesting posts
- S03E3: Common mistakes to avoid in blogging
- S03E5: How to cure 404 error in blogging
- S03E6: Best banks for bloggers
- S03E7: Future and present state of SEO
- S03E8: Causes of "error occurred during indexing"
- S03E9: Types of links in SEO
- S03E10: Before you pay for hosting
- Recommended: Latest on the series