Reconsideration Requests
Show Video

Google+ Hangouts - Office Hours - 20 June 2014

Direct link to this YouTube Video »

Key Questions Below

All questions have Show Video links that will fast forward to the appropriate place in the video.
Transcript Of The Office Hours Hangout
Click on any line of text to go to that point in the video

JOHN MUELLER: Yes, OK, welcome everyone to today's Google Webmaster Central Office Hours Hangout. My name is John Mueller. I am a webmaster trends analyst at Google in Switzerland, so I work together with the Web Search engineering team and together with webmasters like you all, to try to make sure that information is flowing in both directions. We have a bunch of questions that were submitted already through the Q&A feature, but if one of you wants to start off with a question of your own, feel free to jump on in.

AUDIENCE: So [INAUDIBLE] I have a question for you. Recently the Google [INAUDIBLE] 4.0 update came out, right? So we have a client website which is affected by this update. I just wanted to clarify a few things on the doorway aspects, because it's [INAUDIBLE] all the web pages might have some problem with that. Just to elaborate, we have a website called, and it's a hotel website.

JOHN MUELLER: Oh, it's your website.


JOHN MUELLER: I didn't know was yours. OK. OK, go ahead.

AUDIENCE: We have a hotel website called, and it's a-- They have geographies like Delhi, city-wide geographies. So what the site structure is like is, all the listing pages are Delhi [INAUDIBLE]. So all of the hotel listings are coming on that URL. Now we have individual property pages, hotel property pages. Let's say, [INAUDIBLE]. So all of the individual property pages are coming like that. So will they be considered doorway pages?

JOHN MUELLER: It can be. I mean, if these are pages that have minimal unique content, that could be something that could be seen as a Doorway Page. If the content on these pages is really low quality and primarily the same as you'd have from elsewhere, or maybe compiled from other sources, like maybe content from the Wikipedia page put together, then that's something that could be seen as a low-quality Doorway Page. On the other hand, if these are individual pages for individual properties that are unique, they have good information of their own, they can stand by themselves, then that's something that could be fine. It's normal, for example, that maybe you have different locations where you're active, and you have some information about those locations because they're all slightly unique and separate. But if you're just creating pages for different parts of the city, and there's one hotel in the city, but you're creating pages that try to match different aspects, and you're just tweaking keywords, or just adding a sentence or removing one here or there, then that would really be seen as kind of a low quality type page.

AUDIENCE: So the thing is, we do not have pages which have similar content, because there is no duplicate content, but there's a possibility of title tags and things like that targeting similar keywords. So is there a possibility that they are being considered low-quality web pages?

JOHN MUELLER: I wouldn't focus only on the title tags. I would really make sure that these pages can stand on their own, that it's not something that is primarily meant to be diluted across a large number of pages and reused, but really that they stand on their own, so that when someone comes to this page, they understand this is the definitive source of information on the specific topic. It's not something that's tweaked slightly just to make it look like it matches the keywords it's targeting. But really, that it's the definitive source of information here. So if this is about a specific hotel in that city, having the information about that hotel is really important there, that you have everything that matches there, and if the title tag looks similar to something else that's on the site, that's, that's a technicality.

AUDIENCE: OK. So if we've got a website-- let's say there's one company that has copied the data and content of our website. What is the possibility that we can, we can also be penalized because of that?

JOHN MUELLER: If someone else is just taking content from your website, the main way to handle that is to maybe look into the legal aspect of whether or not you can take action based on that. That's probably the most direct way, because if you can force the other person to remove that content because it's really your content, maybe it's even copyrighted, then that's resolved outside of these other programs, and the algorithms don't have to think about that. On the other hand, if this content stays up, then it's something where we try to recognize the primary source of this information, and we usually do a pretty good job of that. But it can get tricky. For example, if your content is copied onto a website that is pretty good, and your website overall is seen as something that's fairly low quality, then it's hard for us to understand that this low-quality website still has individual pages that are actually pretty good.

AUDIENCE: OK. Let's say we have a website which is creating dynamic pages, [INAUDIBLE] getting information from databases-- we have a database with the [INAUDIBLE] and the hotels [INAUDIBLE] in each city. But for some reason, some cities do not have any hotel listings. So although we have those cities in the database, there are no hotels, so that page is blank. And it is created on the website dynamically when people are searching or coming to that particular city. So will that be a low-quality aspect for Google?



JOHN MUELLER: Yes, definitely. I mean, this is something where someone is searching for information, and you're providing a page that says, oh, we don't have any information on this topic, but here are the keywords. That's something we would essentially see as an empty search results page. And that's something we don't want to find in search, and when our algorithms recognize this on a website on a larger scale, then it becomes really hard for the algorithms to figure out whether these are just some low-quality pages that should be treated differently, versus the whole website having a lot of low-quality pages and needing to be treated differently. I would definitely noindex those or 404 them, whatever you can do there.
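The advice above can be sketched in code: when the database has no hotels for a city, the dynamically generated listing page should return a 404 (or a noindex) instead of an empty page. This is an illustrative sketch only; the data and function names are hypothetical, not from any real site.

```python
# Illustrative sketch: serve a 404 for city pages that would otherwise be
# empty "search results" pages. HOTELS_BY_CITY and city_page_response are
# hypothetical names.

HOTELS_BY_CITY = {
    "delhi": ["Hotel A", "Hotel B"],
    "agra": [],  # city exists in the database, but has no listings
}

def city_page_response(city):
    """Return (HTTP status, extra headers) for a city listing page."""
    hotels = HOTELS_BY_CITY.get(city, [])
    if not hotels:
        # No listings: don't serve an empty page with just keywords on it.
        return 404, {}
    # Listings exist: a normal, indexable page.
    return 200, {"X-Robots-Tag": "all"}
```

Returning a noindex header or meta tag instead of the 404 would also work; the point is that empty listing pages should not be indexable.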

AUDIENCE: OK. One last question from my site.


AUDIENCE: Can-- can [INAUDIBLE] internal links lead to a Penguin penalty?


AUDIENCE: Internally?



JOHN MUELLER: No. The Penguin algorithm looks at a lot of webspam issues, and usually, when we're looking at links, those are problems with links coming from external sites, where PageRank is being passed from an external website to your website. And how you handle that within your website, that's essentially up to you. That's not something that we would take action on.

AUDIENCE: So, again, based on that: if we've got a website and suddenly, in one month, we can see a sudden spike in links coming externally from a particular website-- in that same month, can we be penalized by Penguin?

JOHN MUELLER: Penguin takes a certain amount of time to update its data and to update the data that's shown in search. At the moment, that's more in the period of several months, I think. And it's been quite a while since the last update. So usually that wouldn't be an issue. So there's time to go through the disavow tool and disavow those links if you want, or to contact the webmaster and have them removed completely.

AUDIENCE: Well, thank you.


AUDIENCE: Thank you.

JOHN MUELLER: All right, let's take a look at some of these questions. [INTERPOSING VOICES]


AUDIENCE: Can I ask you a question?


AUDIENCE: Thank you very much. Well, we are a very new car rental start-up here in Spain, and we launched it just three years ago. And we received great coverage from blogs, newspapers, and so on, from time to time. Our competitors are very well established, with more than 12 years in most cases, because they are very well known companies, sometimes local companies. But my question is, how long does it take to achieve the full ranking that the website deserves for the most competitive keywords? In this field, you know, very competitive keywords?

JOHN MUELLER: Yeah, there's no absolute answer for this kind of a question. Because it can happen that something that's completely innovative jumps up and suddenly starts ranking number one for these queries, because it's so innovative that everyone is writing about it and it's being recommended by everyone. Kind of like you would see, I think, with maybe Uber, for example, where they're doing something very different, and they're starting to become relevant in an area that has been almost static for a long time. So there's potential for things to move very quickly sometimes, but at the same time, if these websites have been working very hard on being a great website, being a great service in this area, then it's sometimes very hard to jump in and kind of beat them at their own game. Because they have a lot of experience, there's a lot of supporting information that we find online that says, oh, these websites are very well known, they're doing their job well. These are businesses that I would trust, for example. And those are things that take quite a long time to actually build up. So I would say there's potential to be disruptive and to change a lot of things in search very quickly, but it's not something that you can count on, and it's not something where I would say there's a certain amount of time that you have to wait, or a certain point when you should absolutely see changes immediately.

AUDIENCE: We have been a novelty, not so much as Uber, you know, but we try to compete with websites like Kayak or with little brokers here in Spain. And it's very hard for us to achieve [INAUDIBLE] positions, with, [? least ?] in these three years, a very good evolution. But it's like a barrier to jump onto the first page of results. We have a very good profile, being very honest with our battling work. We have had very good coverage. And three years is a lot of time. Could it be much more time? Or less?

JOHN MUELLER: I can't say, I really can't say. So this is something where I think there's always a lot of potential for new websites to be very innovative and do something very different. Whereas, if you try to compete in the same game as an existing business, where there's a whole market already, and essentially just try to be a little bit better, or almost as good, or just as good, then that's very hard to get into, especially if it's something that has been built up over the years. So if you'd like to see more of your website in the search results, I'd probably look into where you can kind of gain a foothold where existing companies aren't so active. Where the work that you've been doing over those three years gives you leverage in ways that these existing companies don't have-- they don't have time to do that, they're not interested in doing that, or, I don't know. So, really trying to find a way where you can be active where the existing market leaders don't really want to bother. Because if you just try to compete with the existing market leaders on something that you have that's pretty good, that's almost the same as what they're offering, then that's going to be very hard online. It's going to be hard offline as well.

AUDIENCE: OK, thank you very much.

JOHN MUELLER: All right, so let's grab some of these questions. When using the new "Fetch as Google" in Webmaster Tools, we see the text content of the page; however, when we look at the cache of a page in Google's index, we don't see it. Is that a problem? I think that's fairly normal, because with "Fetch as Google," we actually try to render the JavaScript and all the content on the pages. Whereas with the indexed cached page, we try to focus on the main page as we actually fetched it, so the HTML page. What I would do there, to kind of confirm that we're recognizing the content on your pages, is to do a site: query and include some of those keywords. And often you'll see in the snippet that we actually pick up those keywords, or you'll see those pages ranking in the search results, which kind of tells you that even if these keywords aren't in the HTML page, we were able to pick them up and use them for indexing.

AUDIENCE: Google, we actually trying to render the JavaScript and all the content on the pages, whereas with the index, [INAUDIBLE] we try to focus on--

JOHN MUELLER: It has an echo. OK.

JOHN MUELLER: All right, do Panda and Penguin work together when assessing a site's trust? For example, would a site be less likely to be hit by Penguin if it's not affected by Panda and is seen as high quality, and not as a spam site? We generally try to treat these separately, because otherwise it would be more like one combined algorithm rather than two separate algorithms that are working on their own. So that's not something where we try to mix things together, because you could easily get into a weird loop where you say, we use some data from Penguin in the Panda algorithm, and the Penguin algorithm uses some data from Panda, and then you're in this cycle of things just kind of being self-supporting without any real reason. So we try to treat them as completely separate things that look at separate signals. Sometimes these signals overlap, in that a low-quality website might also be doing webspam things that the other algorithms pick up on.

Does a manual penalty or algorithmic devaluation affect sitelinks? Will low-quality sites lose sitelinks after they're hit? Good question. I don't think so. Sitelinks are mostly based on how we understand the structure of a website, and since we mostly show them, as far as I know, when the site is ranking number one in the search results anyway, it's more something that kind of makes it easier for users to get into the site when they're explicitly looking for that site anyway. So it wouldn't necessarily be something that we'd suppress when we recognize that there's a problem on the site. Because usually what happens when a site has some kind of an algorithmic demotion is that it just doesn't rank number one for those specific keywords, and then we wouldn't show sitelinks anyway. But I don't think we would suppress sitelinks just because of other quality factors. Let me mute you, there's a bit of background noise.

What is Google's view on high-quality aggregators like or Will Google continue to rank them? They don't offer unique content but rank top for most countries within Google search. I don't know either of these websites, so at least they're not ranking number one for the queries that I do, so I can't really say much about these specific websites. In general, when we see aggregators in web search, it's important for us to recognize that these sites offer something unique and compelling beyond just the aggregation of the content. And some aggregators, I don't know these specific ones, but some aggregators do a fairly good job of doing that. For example, sometimes you'll see news portals that offer a compilation of different news sources and put together content that's actually pretty useful to have in web search. But just aggregated content, just taking feeds from other websites, just taking product feeds from websites and republishing that, that's essentially low-quality content that our algorithms would pick up on, and from a manual point of view, we'd probably try to take action to suppress that. Some background noise again.

Is content that's in a hidden div, that appears when a user clicks a button and gets crawled on page load, valued less than content that's displayed? An example would be a product page that has tabs for description, reviews, and details. In general, we try to recognize these kinds of situations and try to treat them appropriately. But what can happen is that our algorithms assume that this content is actually hidden on the page and that it shouldn't be used for indexing, or that it shouldn't be valued as highly for ranking. So if you have information in tabs that are actually hidden by default, and you think this information is really important and people might want to look at it and find it in search, then I'd recommend turning that into a separate page. So instead of using just JavaScript or just CSS to switch tabs, maybe set up a separate URL for this specific type of tab, if that's really content that you think is really important. So for example, if you have, I don't know, a product listing and you have some details on the side that you can switch with tabs, and those attributes aren't really primarily that important, but maybe someone who wants to buy this product wants to know-- like, I don't know, how fast this product moves, or the different color shades it's available in, whether it's for 110 volts or 220 volts-- those kinds of details might not be so relevant for someone who's actually searching for a product, but they could be useful for someone who's evaluating that specific product. So those are the types of things where I'd say hiding them in a tab is fine. Whereas if this is your main product description, and you have this hidden on a tab, then that's something you'd probably want to put separately, because when users come to that page, first of all, they have to understand that they can click on this tab to find this information. So they would feel kind of misled if we sent them to that page and said, hey, this is where you can find the information, but they can't see it by default. And secondly, our algorithms, when they look at that page, they think, oh, this is hidden text on a page, we can just ignore this for indexing, and that's probably not what you're trying to do there either. So if this is important content, put it on a separate URL. If it's not important content, if it's just supporting information for someone who's already on this page, then keeping it in tabs like that is probably fine.

Why is it that after implementing Google Tag Manager across the site and upgrading to Universal Analytics, we're seeing significantly less direct organic referral traffic? That's something you'd probably have to ask someone from the Analytics team. So maybe ask in the Analytics help forum, or check Google+ to see if there's someone from Analytics who can help you with those kinds of questions. I'm not really that aware of all the details of Analytics.

Does Google take a negative view of links gained by HARO? I don't really know what that service is. A service where reporters ask for help and, when they feel it appropriate, will reference the user or website with a link, in many cases DoFollow. I don't really know what that website is. I assume it's something, maybe like the stock exchange or something like that, and from our point of view, that's essentially kind of like any other forum where there's user-generated content. Where the webmaster feels that these links are appropriate, that these are great links that can be trusted, then using them with a DoFollow is fine. Whereas if you see people just using that to drop links to their service to try to gain some kind of manipulative advantage in web search, then that's something we'd take action on. So I don't know this specific service, but assuming it's kind of like a forum, I would treat those links like you would treat other forum links.

AUDIENCE: I have another question.


AUDIENCE: Thank you. I have a question. In the design of the page, we have implemented an iFrame for our booking [INAUDIBLE] to be [INAUDIBLE] of them. It's a big space, and the content goes a little bit lower on the page. Can it be affected by the [INAUDIBLE] filter algorithm?

JOHN MUELLER: So basically, on your page, you have an iFrame with a big piece of content and then more content below the iFrame?

AUDIENCE: Yeah, our content is below.

JOHN MUELLER: And the iFrame is part of your content? Or is this some external service that you're kind of tying in?

AUDIENCE: It's external. We are an affiliate of them. They are a booking agent, and we are trying to test how it works to get some sales.

JOHN MUELLER: Yeah, so I assume we'd focus on the primary content there. So what we would see is essentially this big block that's there, which might be blocked by robots.txt, so that we can't actually even look at the content there. If we can look at the content, it'll be clear to us that this is a block that's reused on a lot of other pages as well. So it's something where we'd probably try to focus on the primary content there, and if that's way below the fold, then that's probably not that great. But depending on how you implement this, how you kind of integrate this within your website, it might not be terrible either. So I would look at it from the point of view that people are using certain keywords to reach this page. You probably know what kind of keywords they're using to reach that page, and if that page overall matches their intent, then you're probably OK. Whereas if that page is essentially just the same as lots of other affiliate pages, at least at first glance, then maybe it makes sense to redesign this page a little bit so that it's clear from the beginning what's unique about your page compared to all of those other affiliates.

AUDIENCE: OK, thank you.

JOHN MUELLER: And that's similar to advice we have in general for affiliates, so if you have affiliate content, that's not automatically a red flag for us. That's essentially fine, but we want to make sure that your pages have something unique and compelling where it makes sense for us to actually show your site in the search results. So if someone is searching for a certain book, and we have 100 websites that are essentially just affiliates of one big electronic reseller, then it doesn't make sense for us to show all of these affiliates. Maybe it only makes sense to show the original source. But on the other hand, if you're also an affiliate and you have some real unique information about this book, and you have something that all of these other affiliates don't that maybe even the original source doesn't have, then that's something great that we'd want to show in search, a little bit higher. Maybe, sometimes we would even show that higher than the original source because there's some great information on here that the original source doesn't have, some information that we think users would love to see.


JOHN MUELLER: So just being an affiliate isn't bad, but you really need to make sure that you're providing something significantly better than just affiliate content.

AUDIENCE: We try to do that. The question was about the space, the visible space on the page. I was worried because the iFrame occupies a little bit bigger space, and we can't change that. And I was worried because our content is real, it's really great, but it's lower on the page. I was worried. It's so sensitive sometimes.

JOHN MUELLER: I would try to look at it how users would look at that. So if they land on this page, is this something where they'd see the unique content? Or is your content essentially only there for the search engine so that there's some text to index? And if the content is only there for the search engines, then that's something I'd worry about. Whereas, if this is really something where users land on this page, they understand that there's lots of great content on here, and there's this big affiliate block, then that's fine.

AUDIENCE: OK. Thank you.

JOHN MUELLER: OK, can you please explain how to avoid duplication of pages in multi-regional websites? I'm handling websites which mostly have a similar content structure and [INAUDIBLE] the duplication of pages, how to avoid it. In general, with multilingual, multinational websites, that's something that's very common, and it's not something where we would penalize a website for having that. For example, if you have content for the UK and for Australia, then maybe a lot of the text is exactly the same, it's not that you're rewriting the content just for Australia. So those are situations where we definitely understand that it's normal for this content to be very similar or even the same. What I would do there, is to look at the hreflang markup, and use that to connect these pages. So what happens then, is we understand that these pages are for different locations, and we understand which page is available for which location. So a user in the U.K. would see the U.K. content, and the user in Australia would have the Australian content. So I wouldn't worry so much about the duplication. I'd still make sure you have something unique on these pages, maybe like your local address or something like that, but I'd definitely look into using the hreflang annotations between these pages.
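A minimal sketch of the hreflang annotations described above, with hypothetical URLs: every regional variant lists all alternates, including itself, and the same set of link elements goes into each variant's head. The variant set and helper name here are illustrative.

```python
# Hypothetical regional variants of the same page; an x-default entry
# covers users who match neither region.
VARIANTS = {
    "en-gb": "https://example.com/uk/",
    "en-au": "https://example.com/au/",
    "x-default": "https://example.com/",
}

def hreflang_links(variants):
    """Build the <link rel="alternate"> elements for a page's <head>."""
    return "\n".join(
        f'<link rel="alternate" hreflang="{lang}" href="{url}" />'
        for lang, url in variants.items()
    )
```

The same annotations can alternatively be supplied in a sitemap file or in HTTP headers instead of HTML link elements.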

AUDIENCE: And John I have a question.


AUDIENCE: Remember [INAUDIBLE] was asking a question about the sitemap submission? I'm really curious, because one month ago we already submitted it, and there is not a single URL of this type of content indexed yet. But I'm really curious because we have unique content, we have unique-- everything is unique, we built everything for this reason, because obviously there was a factor of-- Google found out we are not [INAUDIBLE]. But still, I'm curious, we are not [? affected ?] [INAUDIBLE] still, it's not going on.

JOHN MUELLER: So you submitted a sitemap and none of the pages are indexed?

AUDIENCE: No, no. This is one of them which I just [? sat and ?] [? checked. ?]

JOHN MUELLER: OK, let me just take a quick look. OK. And other pages from the website are being indexed?

AUDIENCE: Yeah, other pages are going-- they keep changing; some days they are at 1,200, some days they are at 700. [INAUDIBLE] are going down. But still, it's going on, we are looking and all is fine. But with this, nothing is happening.

JOHN MUELLER: Yeah, so I think this is just a matter of time, a little bit. What usually happens is, when we look at a website overall, we try to understand how much resources we should put into actually crawling and indexing all of this content. And sometimes it happens that we get millions of pages submitted in a sitemap file, but we realize it probably doesn't make sense to actually crawl and index all of those millions of pages. So that's a situation that sometimes happens. Another thing, when I look at your website just from a site: query, is that I see most of your URLs are with www. And the URL you posted in the chat is without the www. So if your sitemap file is submitting URLs that are different from the URLs that we actually indexed, then we won't count them in the sitemap file's indexed count. So make sure you're really doing them--

AUDIENCE: Because we already have a canonical tag on each URL, I don't know. This is what-- the URL having the canonical tag for the [? deleted ?] URL. So Google knows which one is the real one. And the second question is, because I read a lot in blogs and in [INAUDIBLE]: if we are using advertising above the fold, is that affecting the quality of the page?

JOHN MUELLER: Yes, we do have an algorithm that tries to recognize advertising above the fold, and if a significant part of the above the fold content is filled with advertising, then that's something where we would say this is a bad user experience. Where the user is searching for something specifically, they land on a page that is essentially the big advertising that they see, they don't realize that there's actually content there. And that for us is a really bad user experience. And that's something where we would potentially take action on as well. So that's something that our algorithms will try to figure out and try to resolve there. I think, looking at your site, especially with the www, non-www version, I would make sure that you're really sending all of the signals within your site, to make sure that the right version is being crawled and indexed.


JOHN MUELLER: So that you're redirecting--

AUDIENCE: This is [INAUDIBLE] URL which we are sending you, so we are using this kind of [INAUDIBLE] advertising in our pages.

JOHN MUELLER: I'd have to take a look at that specifically.

AUDIENCE: I'm sorry I'm taking more time for my question than [INAUDIBLE]. It's like taking long.

JOHN MUELLER: Let me just click on your link. I think, to some extent, at least what I see there, that's fine. That's a big photo that belongs to the article, something like that. So that looks OK. But looking specifically at the issue with the sitemaps that you mentioned, one thing I notice is your website doesn't redirect www or non-www to a preferred version. And also, if I access the URLs with www or non-www, then the canonical tags [INAUDIBLE] as well. So it always points to the current version, it doesn't actually say this is the preferred version.

AUDIENCE: I will take [? the suggestion ?] again, once again. Thanks.

JOHN MUELLER: So I'd really make sure that all of the signals you give us are consistent in pointing to the version that you want to have indexed.
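One way to picture that consistency, as a sketch with placeholder hostnames: pick the preferred host once, and derive both the 301 redirect target and the rel=canonical URL from the same normalization function, so the two signals can never disagree. The choice of www as preferred is an assumption for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption for this sketch: the www version is the preferred one.
PREFERRED_HOST = "www.example.com"

def canonical_url(url):
    """Map any www/non-www URL of the site onto the preferred host."""
    parts = urlsplit(url)
    host = parts.netloc
    if host in ("example.com", "www.example.com"):
        host = PREFERRED_HOST
    # Drop the fragment; keep scheme, path, and query as-is.
    return urlunsplit((parts.scheme, host, parts.path, parts.query, ""))
```

The 301 redirect, the canonical tag, and the URLs listed in the sitemap would all use this same normalized form.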

AUDIENCE: OK, great. Thanks. Another question: what is the effect after [INAUDIBLE] of this photo [INAUDIBLE]? So if, for example, there are 1,000 photos [INAUDIBLE] because they were not indexed, is that no problem?

JOHN MUELLER: That's no problem, yeah, definitely no problem.

AUDIENCE: Thank you. Thanks.

AUDIENCE: When you were speaking about multilingual websites-- as you are in Switzerland, you know that we live with many languages, and we run a multi-- a user-generated content website. So what users usually do is, on the same page, they translate because they have different communities. So they have, like, French and English and German on the same page. We know that practice, and we try to teach our users not to do it, but if you've been looking for flats or for a [? card, ?] you've sometimes seen classified ads with different languages on the same page. What would you advise? Or do you take action against that?

JOHN MUELLER: So what we try to do is recognize the primary language of a page, and in cases like that, sometimes it's not easy. Or sometimes there is no primary language because all of these are kind of equivalent. So that makes it a lot harder for us. So if at all possible, we really recommend making sure you have one primary language on the page and that the other languages are on separate URLs. If you can't do that for some cases, for example, for user-generated content like that, that's the way it is. It's not something that will break your website completely. We do recognize multiple languages on the page and we'll try to take that into account in the search results. But the harder it is for us to recognize like the primary content, or to recognize the primary page, the primary language on that page, the harder it is to be absolutely certain that we'll be able to recognize it. So if you give us a page, for example, that's like 50-50 French and German, then that's very hard for us to realize how we should rank this page. Should we show it to German users? Or should we show it to French users? Or should we show it to both users? It's not absolutely clear. So it's not that we would penalize a page for having content like that, but it makes it a lot harder for us to show it in the right places. So you can't kind of be guaranteed that we'll always show it as a French page because maybe it just looks like a German page that's explaining some words in French.

AUDIENCE: Thank you. And maybe another question regarding user-generated content. So as you understand, we are a classified ads website, and the issue we have is that the time to live for each page is difficult for us to evaluate. So when users delete their ad, or when they update their ad, they want the modification to be replicated instantly on Google, as you can imagine. So what can we do, at least when we remove a classified ad, or when the price decreases or increases, whatever-- how can we decrease the latency to have the modification reflected in the index?

JOHN MUELLER: So I think there are two main aspects there. On the one hand, you want to make sure that your website can be crawled as quickly as possible. So making sure that it has a low latency for the HTML pages for Googlebot is important, and that our crawl rate isn't slowing your server down is really important. I think that's kind of the primary prerequisite for this. And for fast updates, we really recommend using sitemaps, which maybe you're already using. And one alternative that you could use instead of sitemaps, or in addition to sitemaps, is, for example, RSS feeds using PubSubHubbub, where you essentially send us an update of the content immediately, and we can process that almost immediately.

AUDIENCE: But how can we do that? Because I haven't seen any information regarding this.

JOHN MUELLER: You can use a normal RSS feed as a sitemap file. And in the RSS feed, in the header, you can supply a link to your hub. And we'll use that PubSubHubbub interface to crawl the RSS feed normally. So this is something that you can do with RSS feeds specifically, and it's documented, I think, for the PubSubHubbub project. It's nothing special just for webmasters, it's essentially, you can use this for any RSS feed and any RSS reader.
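As a sketch of what John describes, an RSS feed used as a sitemap can advertise its PubSubHubbub hub with a link element in the channel header. The hub URL and item data below are illustrative placeholders, not specific endpoints:

```python
# Sketch: a minimal RSS feed whose channel header advertises a
# PubSubHubbub hub, so subscribers (including crawlers) can be pinged
# about updates almost immediately. Hub URL and items are made up.
def build_rss(hub_url, items):
    """items: list of (title, url) tuples for recently changed pages."""
    entries = "".join(
        f"<item><title>{t}</title><link>{u}</link></item>" for t, u in items
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">'
        "<channel>"
        f'<atom:link rel="hub" href="{hub_url}"/>'
        f"{entries}"
        "</channel></rss>"
    )

feed = build_rss(
    "https://pubsubhubbub.example.com/",
    [("New ad", "https://example.com/ads/123")],
)
print(feed)
```

The rel="hub" link in the channel header is how the hub is discovered; the same feed can be submitted as a sitemap file.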

AUDIENCE: Because the number one request is regarding the page deleted.

JOHN MUELLER: Page deleted is-- I don't think you can submit that directly, like in a sense that you tell us these pages are removed. But you can ping us those URLs in the sitemap file as well, and say this page was updated. We'll crawl that page, we'll see the 404 result code, and we'll update that a little bit faster. So you can kind of ping us in the sitemap file about things that were actually removed as well.
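Pinging Google about a changed sitemap, as mentioned here, was done via the sitemap ping endpoint at the time. A minimal sketch; the sitemap URL is hypothetical, and actually sending the ping is just a plain HTTP GET to the URL built here:

```python
# Sketch: build the sitemap ping URL. A GET request to this URL asks
# Google to re-fetch the sitemap, picking up updated pages and pages
# that now return 404. The sitemap filename is an illustrative example.
from urllib.parse import urlencode

def sitemap_ping_url(sitemap_url):
    """Return the Google sitemap ping URL for a given sitemap."""
    return "https://www.google.com/ping?" + urlencode({"sitemap": sitemap_url})

url = sitemap_ping_url("https://example.com/sitemap-updates.xml")
print(url)
# To send it: urllib.request.urlopen(url)
```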

AUDIENCE: OK, also this is not the community OK. Interesting.

JOHN MUELLER: So what will happen is, when we look at the sitemap file in Webmaster Tools, we will show that the number of index URLs is obviously lower than the number you submitted because they're 404, which is normal, so that we don't index them like that. But you can submit those URLs like that anyway.

AUDIENCE: So what you're saying, regarding the sitemap, is that we can send you not the full view but an instant view of the website; we can [INAUDIBLE] of modifications.

JOHN MUELLER: Yes, you can do that. So in the sitemap file, you can either include everything from your website or you can just focus on the updates that you've done a certain period of time. That's essentially up to you, we don't require any specific format there.
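A delta sitemap like the one John describes might look like this sketch, listing only recently changed URLs with their lastmod dates. The URLs and dates are made up for illustration:

```python
# Sketch: a sitemap containing only recently changed URLs, each with a
# <lastmod> date, instead of the whole site. Data below is illustrative.
from datetime import date

def delta_sitemap(changes):
    """changes: list of (url, lastmod_date) tuples for recently updated pages."""
    urls = "".join(
        f"<url><loc>{u}</loc><lastmod>{d.isoformat()}</lastmod></url>"
        for u, d in changes
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        f"{urls}</urlset>"
    )

xml = delta_sitemap([("https://example.com/ads/123", date(2014, 6, 20))])
print(xml)
```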

AUDIENCE: Excellent, thanks a lot, John.


AUDIENCE: John, again I'm going to say, I have a question related to the sitemap.


AUDIENCE: We have submitted a sitemap for a client. And when we look at Webmaster Tools, what we see is that there is no number for these indexed pages. Those submitted are not being seen. However, if we look at Google searches, these pages are indexed in the searches.


AUDIENCE: So, what could be the possible reason for that?

JOHN MUELLER: There are two possibilities. One is that you need to wait a little bit, so when you submit a sitemap file, it takes, I think, maybe a couple of hours for the index count to be updated. So maybe you just needed to wait a little bit longer. Another one is similar to your www non-www change that I had mentioned for your website, in the sense that if you submit URLs in the sitemap file that aren't exactly the way that your website uses them, then we'll index the content but we'll index them on different URLs, and then we won't count it in the sitemap file. So the content might be indexed, but it's not the same URL, so we don't show the number there.

AUDIENCE: OK, but in both cases, I have checked that the URL is the same, exactly the same www, and we submitted the URLs a couple of months back.

JOHN MUELLER: OK, I would still double-check the URL because sometimes there are things like uppercase, lowercase, or a slash at the end of the URL, or maybe a parameter at the end of the URL, all of those things can kind of add up. And if we index the URL just slightly differently than you submitted in the sitemap file, then we wouldn't count it. Otherwise, the sitemap file is one of, I think, maybe the most exact way of getting an index count in Webmaster Tools. It should really be one-to-one. It'll test exactly what we have indexed compared to your sitemap file.

AUDIENCE: So the possibility of not having the same URL could be thought of for one to ten URLs. But we have many hundreds of URLs, and we ensure that they're all URLs from the [? plan. ?] But it's still not showing a single URL as in [INAUDIBLE], so it's something that I would like to understand more.

JOHN MUELLER: I mean, you can send me the link, but really, in all of the cases I've looked at before that had this kind of a problem, there was a subtle inconsistency there, with things like a trailing slash, or a slash dot HTML, or index.html at the end. All of these things are systematic errors that are easy to add into the sitemap file, that just subtly don't match exactly what we find for indexing. And often when you look at it as a webmaster, you say all these URLs should be the same because it's just this period, or this comma, or like dot HTML that's different, and it's obvious that it's the same. But our systems, they're really picky. They want exactly that URL as the one that's indexed. [INTERPOSING VOICES]
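The subtle mismatches John mentions can be caught with a simple normalization pass over the sitemap before submitting it. This is an illustrative heuristic, not how Google actually canonicalizes URLs:

```python
# Sketch: flag sitemap URLs that differ from the indexed/served URLs only
# in subtle ways (case, trailing slash, index.html). The normalization
# rules here are illustrative assumptions.
def normalize(url):
    url = url.strip()
    if url.endswith("/index.html"):
        url = url[: -len("index.html")]
    # Add a trailing slash to extensionless paths.
    if not url.endswith("/") and "." not in url.rsplit("/", 1)[-1]:
        url += "/"
    return url.lower()

def find_mismatches(sitemap_urls, indexed_urls):
    """Return sitemap URLs that only subtly differ from an indexed URL."""
    indexed = set(indexed_urls)
    normalized_indexed = {normalize(u) for u in indexed}
    return [
        u for u in sitemap_urls
        if u not in indexed and normalize(u) in normalized_indexed
    ]

print(find_mismatches(
    ["https://example.com/Hotels/Delhi"],
    ["https://example.com/hotels/delhi/"],
))
```

Any URL this flags would be counted as "not indexed" against the sitemap even though its content is indexed under the slightly different form.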

AUDIENCE: Sorry to interrupt. As per your suggestion, I'll recheck it, but prior to this Hangout, half an hour before that I checked the same thing.



JOHN MUELLER: So you can also send me the sitemap link and I can double-check that as well.

AUDIENCE: OK, thank you.


AUDIENCE: I have another question too, about sitemaps maybe.


AUDIENCE: We have some pages with thin content, and we've removed the links to them. So they are not linked by us. But is it better to leave them in Webmaster Tools, or maybe send a sitemap without them? What's the best way to delete thin content pages? Remove them?

JOHN MUELLER: What I would do there is put a "noindex" on these pages so that they're really clearly removed. From our point of view, even if they're not specifically linked within your website, if you have a sitemap file, or if we've indexed them in the past and we have them in our search results, and we recognize that they're thin content, then that's something where our algorithms would use that. So it doesn't matter so much that you're not internally linking to them as much, but as long as they're indexed, then we essentially try to use that information for understanding your website. So putting a "noindex" on them would be a great idea. You can keep them linked within your website if you think they're useful for people who go to your website and navigate the website. Leaving them there is fine. So with the "noindex," the user doesn't see that, but at least they're taken out of the search results.

AUDIENCE: Thank you.

JOHN MUELLER: All right, let's take another one from the Q&A. What should we do with a site with a popular forum section, where we produce good quality content in blog, and members produce low-quality content? For example, content with spelling mistakes in a few sentences only. What should we do? One, close the forum section? And this probably goes on somewhere. So generally speaking, our algorithms look at your website overall. We don't recognize that there, well for a lot of our algorithms, we don't recognize the different parts of your website and try to treat them separately. So when we look at your website overall, and we see that there's a mix of good content and kind of middle content and really bad content a little bit, then we have to try to make some kind of a quality assessment of this website overall. And if the low-quality content is really the largest amount of content on your site, the most relevant part there, then it might be that our quality assessment from the algorithms turns out to be more on the lower quality side of things. So that's something kind of, to keep in mind there. What I generally recommend doing there is not closing the forum, because that's maybe where your community is, where people are who want to keep coming back to your website. Some people move the forum to a different domain if they want to split it off completely. Other things you can do are just work to make sure that the quality of the forum as it's being indexed, is higher than it actually overall is. So you could do that, for example, by trying to find some kind of a mechanism within your forum where you can recognize higher quality content automatically and leave that indexed, and the lower quality content, when your algorithms recognize that, just have a "no index" on them for example. 
So in that case, the user within your website can still navigate the forum, they can find the low-quality content, if they want to chat about the weather and it's not something that we really care about. But at the same time, that content isn't being used for indexing and isn't being used by our algorithms. So this is something where you know your forum best, you know your users best, and you can probably find some kind of a mechanism to figure that out. So like you said, you mentioned spelling mistakes, or short sentences; maybe that's something where you could say everything shorter than a couple of sentences has a "noindex." If the user goes back to the forum post and updates that to something that's more significant, then maybe you can automatically change that to index again. But using some kind of a mechanism like that makes it a lot easier for us to focus on the high quality parts of your website, and for you to kind of keep this user-generated area open so that your users can build a community. They can interact without you having to worry about it kind of dragging your site down.
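One possible mechanism along the lines John sketches: noindex forum posts below a length threshold, and flip them back to index when they are edited into something substantial. The word-count signal and the threshold are assumptions for illustration; a real forum would choose its own quality signals:

```python
# Sketch: emit a robots meta tag per forum post, noindexing very short
# posts. The 25-word threshold is an arbitrary illustrative choice.
def robots_meta(post_text, min_words=25):
    """Return the robots meta tag for a post: noindex short/thin posts."""
    word_count = len(post_text.split())
    directive = "index, follow" if word_count >= min_words else "noindex, follow"
    return f'<meta name="robots" content="{directive}">'

print(robots_meta("nice post thanks"))
print(robots_meta("A detailed answer " * 10))
```

Because the tag is recomputed whenever the post is rendered, an edit that makes the post substantial automatically switches it back to indexable.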

JOSHUA: Hi John, I've got a question.

JOHN MUELLER: Go for it Joshua.

JOSHUA: OK. In Google+, since I have a bit of an eccentricity about using clean URLs, and it's really nice that you can take off the beginnings of a URL, and Google+ recognizes those, so you can trim those down. In Google+'s URLs, when you first enter the system, it automatically adds a U, a slash, and a zero. Does that serve a particular purpose? Do you know what that is? Is that for a redirect? I know there is one, which is for the pages, that's with a slash, B. So if you're controlling your page, and then if you take out the B, then all of a sudden you're on your profile instead of the page. But in the Google+ thing, I usually take those out, so that I don't have to--


JOSHUA: You know, because if I want to grab a URL quickly, to share it in something, then I like to take out the extra characters.

JOHN MUELLER: Yeah. I would have to watch out for that as well. So the slash U and the number is a mechanism that we use for multi-login. So if you have multiple accounts associated with that browser profile that you're in at the moment, so that you can switch between those. So sometimes you have slash U, slash 0, slash U, slash one, slash two, for the different accounts. So if you share a URL, stripping that out is a great practice because that way, it automatically uses whatever the user who's looking at it has as their, basically their default profile.
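Stripping the multi-login segment before sharing can be done with a small helper like this sketch; the URL shape is based on the /u/0 pattern discussed above, and the example profile URL is made up:

```python
# Sketch: remove the multi-login "/u/<n>" path segment from a Google+
# URL before sharing, so the link opens with the viewer's own default
# profile. The profile URL below is a hypothetical example.
import re

def strip_multilogin(url):
    """Remove a /u/<number> path segment directly after the host, if present."""
    return re.sub(r"(https://plus\.google\.com)/u/\d+(/|$)", r"\1\2", url)

print(strip_multilogin("https://plus.google.com/u/0/+SomeUser/posts"))
```

URLs without the segment pass through unchanged, so it is safe to run on any shared link.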

JOSHUA: And those are not indexed like that either. They're always--


JOSHUA: --stripped out for indexing too so that's why--

JOHN MUELLER: I'm pretty sure.

JOSHUA: I've seen them as the slash U and the slash zero, but also I've noticed if you-- the only place those come from is on the right hand side up at the top. So if you take them out at the beginning when you start your session, then you don't get them reoccurring at all for the rest of your sessions, but-- [INTERPOSING VOICES]

JOHN MUELLER: No, it's really from the multi-login, and sometimes for example, I have a Google Apps account and I have my Gmail account, and switching between the two is possible, and the slash U, slash one, or slash zero is basically just a way of kind of keeping that state in the URL. It's not something that we use for indexing, we try to strip that out as well, and I think if you share a URL, it's best to kind of strip that out so that it doesn't focus on any specific profile on the person who is clicking on it.

JOSHUA: OK, that's good. And one more question about the-- last week you were talking about comments and how, if they're on your page, those are considered part of your page because the site owner's responsible for those. And so that had me also a little bit concerned, and I wondered about it in the past. In Blogger, and also if you're using the WordPress Google+ plug-in for comments, you get all the Google+ comments appearing at the bottom of your screen. And then if I share a post on Google+ that's super SEO, and then it says super SEO one more time in it, and then usually like 50 people or a lot of people share it, and they may or may not add something to that share. So on the page itself, we could get 100 or 200 repetitions of that word just in the comments. And I'm talking about usually a good quality article, so maybe it's the bottom one third or further down than a fourth of the page. Is that something that I should be concerned about? Or would that mostly get ignored? I wouldn't want to turn off comments for that being a problem, would I?

JOHN MUELLER: I think that's perfectly fine. That's completely normal, and that's something we understand as well. All of the commenting systems that use JavaScript, like the Google+ one, I think the Facebook system as well, or Disqus, which I think also has a system that you can embed with JavaScript, they're a little bit harder for us to understand, to pull out this content, but I think we've been getting pretty good at pulling that out as well. But even there, it's something where we understand that some of these things get repeated a few times, and we wouldn't necessarily say the whole page is keyword stuffing because everyone has taken the title of your blog post and reused that in a share that they've made. So that's definitely not something I'd worry about.

JOSHUA: All right, thanks.

AUDIENCE: So John, I have a question on the same thing. Let's say, [INAUDIBLE] you at Google talk about thinking about the users. So let's say we have users who are pretty much concerned about luxury. So let's say they are very biased towards luxury, so we have a hotel [INAUDIBLE] of hotels, and there are a couple of websites we use which are talking about luxury, but not hotels. Let's say, luxury brands, luxury clothing, watches. So would it be fine to link that website to our website?

JOHN MUELLER: If this is something that you're doing because it makes sense for the users, I would say go for it. I think that that can be very useful. On the other hand, if you're doing this because of some relationship you have with the other website, then I'd just make sure that you're using NoFollow links so that it's kind of clear to us that this is more of a business relationship, and we shouldn't be passing page rank there. And that's something that users-- they can still click on these links, they still see these links fine, and it's still a value for the user. The content is still kind of there as a value for the user. So from that point of view, both of these options are fine.

AUDIENCE: So let me put out a scenario of what we are thinking of. Let's say a person is looking for shopping, right? And he's looking for luxury brands, and we are talking about the luxury brand. And we are talking about shopping for a particular city. And we are linking that luxury brand that is present in that city. And this is being linked to our blog, which is talking about that city, that area. So would it be fine to link [? up? ?]

JOHN MUELLER: Sure, I think that makes sense, yeah. Again, if this is something that you're only doing because of a business relationship, just make sure you have a NoFollow there. But if this is something that you think makes sense for the user, go for it. I mean, we do this as well, in our blog posts, if we think something is relevant for the user, we'll link to that. That's a part of how the web works.

AUDIENCE: OK. Then one thing I would like to understand about keywords. We use some tools to understand the keyword ranking for a particular URL. And for a few keywords, we find the ranking very inconsistent. Sometimes it is on the first page, sometimes it is on the fourth page, within a month. So why is this variation? Is it just because of some update on Google's side? Or should we also think of something to take care of, so that we can have consistency in that?

JOHN MUELLER: You'll never have consistent ranking. I think that will never happen. Because we make changes all the time, we make over 600 changes in our algorithm every year. We recalculate the data based on the web, and the web changes every second I would say. And at the same time, there are a lot of personalization factors involved here as well. So specifically, if you see that there are very few searches for something, and sometimes you're ranking very high, sometimes you're ranking fairly low. For example, you would see that in Webmaster Tools. That's something where probably there's some personalization aspect involved. Maybe the user is in that location, and they're searching and we think this is relevant for them. Maybe the other user is in a different location, and we think this is not so relevant for them. So that's something, you'll always see fluctuations, it's never going to be consistent in the sense that you can say, well, you're doing it right, you're always ranking number five in the search results because there's always some fluctuation happening.

AUDIENCE: Oh, OK. Thank you.

JOHN MUELLER: All right.

AUDIENCE: I have a little more question, if I can.

JOHN MUELLER: OK, we have time for maybe one last question.

AUDIENCE: Returning to the beginning: don't you think that, with very well established websites that maybe don't have a very good backlink profile but simply have been there forever, a new entrant with very good backlinks from well-known, trustable sources, maybe newspapers, faces a problem? The time it takes for this new website to arrive at the position it deserves for the power it has behind it-- for a question of time, because it's not aged enough, or you haven't seen it enough to arrive there-- isn't that a barrier to new entrants in a very competitive field? Because the websites that exist there are very well established. They are there simply because they've been there for a very long time, without any other merit.

JOHN MUELLER: It's not so much the case that we think this is an old website, it deserves to rank there. But it's just that, consistently over time, we've collected a lot of signals that say this is a relevant page here. But it doesn't mean that it can never be changed. For example, maybe you have an encyclopedia page on the periodic table and it's the perfect reference for this content, and it doesn't change because the periodic table hasn't changed in quite some time. But if someone were to recognize a new element that belongs in the periodic table and they create a better periodic table, then it would make sense for us to swap to the other one as quickly as we can. So it's not so much the case that because this is an old website, it deserves to rank, but rather because consistently, we've seen that this is a website that works well for users. And if you're just trying to be kind of the same as they are, then it's really hard for us to recognize that this is pretty much just as good as the other one. So what you should be looking into doing is really making sure that you're much more than just the same, that you have something unique that makes sense from our side to say, OK, we should rethink our algorithms and make sure that this is really number one. And also when we talk to our engineers about these types of questions, we'll often take examples that we get from the Hangouts here, and we'll go to the engineers and say, why is this not ranking? Because it's kind of similar to the other ones here, it deserves to rank as well. And our engineers often come back and say, well, if we already have other pages that are just as good, why would we show more pages of the same? It really needs to be something where the engineers say, yes, we're doing something wrong, we're breaking something for our users by not showing this result as number one, because it's clearly, by far, the best page here.
Whereas, if it's just as good as other ones the engineers will say, well why should we spend extra work on making this website also show in the rankings when it doesn't provide anything special that we should be pushing stronger to our users.

AUDIENCE: OK, I'm thinking, in the booking segment, we work with booking agents, and we try to make it with much more information for the user. But maybe this information is not visible to Google because it's something you find when you are doing the search. Maybe you're renting a car, then you find more information about the price, the extra airport charge, the extra charges. We are better from this point of view, but maybe we should work on making better content for the users too.

JOHN MUELLER: Or think about how you can maybe do it completely differently, and think about what you can do so that you're not directly competing with the existing market players in that search area, but rather that you're doing something that turns it around, so that it really makes sense for our engineers when they look at your website and they say, well, we should definitely be showing this for these types of queries because it's clearly something that's not just the same as the others.

AUDIENCE: OK, thank you very much.

JOHN MUELLER: OK, great. So with that, let's close down here. It's been a great Hangout, lots of good questions, lots of good feedback and in the discussions here as well. I really appreciate you guys taking your time, and I wish you guys a great weekend.

AUDIENCE: Thank you very much, a good weekend too.

AUDIENCE: Thank you.

JOHN MUELLER: See you next time. OK, bye everyone.

AUDIENCE: Thank you.

JOSHUA: See ya.