This week I have been experimenting with a new database based on data taken from an access log. This database basically keeps information of all the pages that had been accessed on our website in the last month. An example of a register of this database would be:
Table 1. Register example of an access log database.
As you see in Table 1 we have a database’s register that represents one page of our website where we give information about live music performed in Barcelona. On this register we include information about user_visits, bot_visits, google_visits, page_vies, bounces and filters. All this information can be aggregated by processing the information on your apache access log. Now I will explain in detail the different numbers that we have extracted for each page:
- User visits: This includes visits that are coming from non-bot user agents. This means real visits from users.
- Bot visits: This includes the number of visits that had came from Googlebot in this period.
- Google visits: This includes numbers of visits from users that had came from Google. This information can be extracted analyzing the URL referer of each access. When we detect that a visit is from a user and its referer is google we count this visit as a google visit.
- Page views: When a page has been accessed by a Google’s user we count how many subsequent accesses had been made on our website by this user. So if a specific page in our website causes a high number of page views means that a user spends more time with us. On the other hand, when a page causes low number of page views (or only 1) means that this user has left our website without interacting much.
- Bounces: Counts how many Google visits had made only one page view. When a user makes one page view is because he has closed his window or he hasn’t clicked anywhere. So the page has not attracted his attention at all. Usually this happens when a user is searching information in Google and he checks several results in the search engine until he finds the appropriate part of information that he is looking for. On this way, 2 or 3 pages can receive this bounces.
- Filters: In the section filters I include information about the internal information of this page. Information that is not included in the access log and it comes from my specific use case. So by TC I mean type + city. In this case the “live-music-barcelona” (type: “Live Music”, city: “Barcelona”) would fit in this category called TC. We use this filters to perform analysis using GROUP BY. Aside this field you can think about anything you want. So any field useful to group information can be added to this database.
But is this information enough to take right decisions? All this information is useful but it can become even more useful if we mix it in new factors that keep information of several fields in one field. Here I have experimented with some factors. Some of them are already known and some of them are new.
- Bounce rate: Keeps the percentage of bounces for each page. So if all the page views of a page have caused a bounce the bounce rate will be 100% which is very bad because the engagement of this page is null because no one has stayed on this website after starting from this page. On the other hand when a page causes a good percentage of bounce rate 0% means that everyone that enters to your website through this page does something later. Formula : bounces/google_visits
- Average Page views: Keeps the page views’s average that users make after visiting your page through Google. This number is calculated for each page that has received visits from Google. Formula : page_views/google_visits
- Bot’s efficiency: This factor keeps the number of google visits that a bot had caused. For instance, if a bot had visited a page 1 time and this had caused a total of 4 visits to your website through this page. This would mean that the efficiency of this page was 4:1. A bad case of efficiency would be a page that had been visited 30 times because it was very linked on your home page and it hasn’t brought any visit (0:30). Formula: google_visits/bot_visits
- Bot’s efficiency (II): You can make this factor of efficiency even more useful by adding the concept of page views caused by a bot. I think this is more useful because includes the potential number of page views that you will get after a bot visit. So it is interesting to give priority to this pages on your internal linking to increase your total number of page views and your final revenues. Formula: page_views/bot_visits
Using this new numbers and factors, now we can perform data analysis of our access log from a database and get useful information in order to optimize our different groups of pages. As you will see on Table 2 we have created groups by the different filters that our website has. We won’t describe in detail all the different groups of pages created by our filters field but the idea is to have something to group your information and do not become mad with a table with millions and millions of rows. If you analyze your information in small groups is always easier to reach conclusions. Each group represents a different combination of filters TC means type + city i.e. “Live Music in Barcelona”, TCD means type + city + day i.e. “Live Music in Barcelona on 4th of July”, TCmu means type + music + city i.e. “Rock Live Music in Barcelona”. So a group of pages TCmu will contain information with pages such as “Rock Live Music in Barcelona”, “Jazz Live Music in Barcelona”, and “Pop Live Music in Barcelona” in the same group. So basically we are obtaining statistics about all the pages on our website that talk about music and we compare this group with other groups that might be talking about other subjects.
But how do we use this factors to analyze this information? We have used the new factor bot_efficiency2 to sort this groups from greater efficiency to poorer efficiency. The group that is classified as the best is the group ‘TC’ that at the same time is the best group if we sort by sum(page_views) and also the second better if we sort by bounce rate. This proves that bot_efficiency2 can be a good indicator of quality and a good source information if you want to reorganize your internal link building and make the Googlebot go more to the point. On the other hand, we can see how other groups are not performing very well such as ‘L’ or ‘VT’. They are working with efficiencies between 0.85 and 1.4 which means that every bot visit is only giving us 1:1 visits while other groups of pages are giving us 16:1. So our common sense says the more we maximize those pages with good efficiency the more we can increase our total number of page views and provide to the user a better navigational experience.
mysql> select sum(user_visits), sum(page_views), sum(bot_visits), sum(google_visits), (sum(page_views)/sum(google_visits)) as avg_pv, (sum(page_views)/sum(bot_visits)) as bot_efficiency2, (sum(google_visits)/sum(bot_visits)) as bot_efficency,(sum(bounces)/sum(google_visits)) as bounce_rate, filters from access_log where google_visits >= 1 group by filters having sum(google_visits) >= 25 order by bot_efficiency2 desc;
Table 2. Example of a select using this factors and sorting results by bot_efficiency2 (using group by filters)
To sum up, we have experimented with a new way of organizing our access log and structure it in a database that gives us the freedom of grouping pages and analyze in detail the behavior of each group. Here I had only shown one of the examples of groups that we can make. But with this methodology you can create groups for almost anything that fits well in your use case. By adding to a database this information of apache access logs (that looks quite useless in a log) we can take our website to the next level and take control of our search engine optimization.
We also have discovered a new factor that is not very well known by SEO practitioners that measures the bot’s efficiency of the google bot. This factor gives us the total number of page views that a visit from a bot generates on our website. This new factor seems that correlates very well with other known factors such as bounce rate or average page views.
Finally I would like to thank you to all the readers that have taken time to read this article. I’ll appreciate any contribution, suggestion, question or idea that you might have reached reading this article. I am still analyzing this data and in the next months I will experiment more and I expect to write more articles following up this issue.