Reconsideration Requests

Google+ Hangouts - Office Hours - 16 October 2015

Transcript Of The Office Hours Hangout

JOHN MUELLER: OK. Welcome, everyone, to today's Google Webmaster Central Office Hours Hangout. My name is John Mueller, and I'm a Webmaster Trends Analyst here at Google in Switzerland. And part of what we do is talk with webmasters and publishers like the ones here in the Hangout, the ones that submitted lots of questions, people in the forum as well. And looks like we have a bunch of people here already. As always, if any of you want to get started with a question, feel free to jump on in now. [INTERPOSING VOICES]

ROBB: I'll ask a question.


ROBB: In the UK, we have two brands, and we're going to merge them. And so we're going to do a site move and 301. Do we need a disavow file at both after the move or just the new one, since the old one will probably be ignored?

JOHN MUELLER: Just the new one. So basically, just where the links go and where they get redirected to. That's all you need to describe.

ROBB: Yeah, I thought so. OK.

MIHAI APERGHIS: John, I have an issue with my [? automotive ?] publisher in the United States. He's in Google News. I'm not sure if you're the best person to ask. But I tried the forums, and they said this was an issue. So in Webmaster Tools, in Search Console, I see that the News sitemap has reported an unknown news site for the past few weeks. Yet in the partner dashboard, the Google News Publisher Center, everything seems OK. It's verified. The website is included. But the News sitemap still shows it. I tried recently. It still shows errors that the URL is not [INAUDIBLE] website [INAUDIBLE].

JOHN MUELLER: I'd go through the News contact form there. So within the Help Center, if you search for something like Contact Google News, you can go to the contact form. And there, you should be able to submit. Then I can have someone take a look at that.

MIHAI APERGHIS: OK. OK. So I tried the [INAUDIBLE] forums. So you say this would be a better option.

JOHN MUELLER: Yeah. I mean, what we sometimes see is that in the Google News site map, you have to specify the publisher name, I think. And we've sometimes seen sites have, like, a typo in there or a slightly different version of the name than we have in our database. And that confuses things a bit.

MIHAI APERGHIS: Right. But it worked. We didn't change anything, and it worked fine up until weeks ago, something like that. And somebody from the Product forums [INAUDIBLE] known issue, that they don't have any workaround at the moment, or a fix, or something like that. So I thought maybe you [? might not ?] know [INAUDIBLE].

JOHN MUELLER: I don't know. [LAUGHS] I'm meeting some folks from the Google News team next week. If you want to send me a link, I can pass that on to them as well.

MIHAI APERGHIS: Sounds good. Sounds good. Thanks.

JOHN MUELLER: All right. All right. Let's go through some of the submitted questions. And we can see how far we go. And we'll probably have time for questions and answers in between as well. "How can you increase domain authority of your website without link building? Are there any other techniques we can use? We recovered from a manual penalty due to unnatural links. So we're very reluctant to let an SEO agency use this method. What do you recommend?" So I think the first aspect here is that domain authority isn't something that we define on our side. So I don't really know what goes into this kind of arbitrary metric that you call domain authority then. So I can't really help you with what you need to do to change that. And from my point of view, if you just recovered from a manual action from links that were built, then I really wouldn't recommend just going off to build links unnaturally again, because you just kind of slide into that again. And that's probably not in your best interest. "For some reason, Googlebot spiders and creates URLs from our e-commerce site that we didn't create. We are forever setting 301s. Can it damage our rankings if we have too many 301s? And if so, should we do something different?" So too many 301s shouldn't be a problem. In general, what happens is we follow up to five 301s in a row. And then if we can't reach the destination page, then we'll try again next time. So that's something that does have an effect on how we crawl. But if you're talking about too many 301s in that you have too many individual pages that just 301 once to a final page, then that's definitely not a problem.
However, if you're saying that Googlebot is crawling URLs that you didn't create, then my guess is that something within your website is set up incorrectly, that maybe you have some relative links somewhere that are pointing at URLs that you didn't know about, where maybe the rewriting that you're doing on the server side creates kind of this infinite structure, where we can go down, down, down through different levels and folders, and keep getting the same content. And that's something that you should ideally clean up, because what happens there is we get kind of lost with all of this crawling. And even if there are 301s for the individual URLs there, we kind of get lost crawling all of these unnecessary URLs. And we might not be able to crawl and index your new and updated content as quickly as we could otherwise. So that's something I'd try to figure out where that's coming from. There are a number of crawlers out there that you can use to kind of crawl your website on your own. Screaming Frog is a really popular one. There's also Xenu's Link Sleuth, which is a Windows app, I think, that you can just run across your website to see where it goes, how far it goes, where it finds which URLs. And ideally, I'd recommend cleaning that up.
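
The "infinite structure" described above often comes from a relative link that resolves against an ever-deeper path, combined with server-side rewriting that answers every one of those paths. The following is a minimal Python sketch of the URL-resolution half of that effect. It is not a description of how Googlebot works; the `example.com` URLs and the `simulate_relative_link` helper are hypothetical and chosen only for illustration.

```python
from typing import List
from urllib.parse import urljoin


def simulate_relative_link(start_url: str, href: str, depth: int) -> List[str]:
    """Resolve the same href repeatedly, as a crawler would if every
    page along the way contained that same link."""
    urls = [start_url]
    for _ in range(depth):
        # urljoin applies standard relative-URL resolution against the last URL.
        urls.append(urljoin(urls[-1], href))
    return urls


if __name__ == "__main__":
    # A relative href nests one folder deeper on every hop.
    for url in simulate_relative_link("https://example.com/shop/", "shop/", 3):
        print(url)
    # A root-relative href resolves to the same URL every time.
    for url in simulate_relative_link("https://example.com/shop/", "/shop/", 3):
        print(url)
```

If a crawl of your own site turns up paths that keep growing like the first case, switching the offending link to a root-relative or absolute URL is one common way to stop it.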

NEERAJ: Yeah. John, actually, regarding this 301 database up query [INAUDIBLE]. John, when I redirect one page to another page, Google always says that please direct 301 or 302 in relevant page. OK?

JOHN MUELLER: What does it say? I'm sorry. I missed that-- the last part.

NEERAJ: Yeah. Google always suggests that please redirect your relevant pages to relevant page.


NEERAJ: So does Google, before passing its value to another page, really consider that really, this page was relevant to this page, and this is why you say that please redirect only to the relevant page? Or just with the help of 301, you pass the value and do not consider whether this page was relevant or not?

JOHN MUELLER: We don't blindly treat redirects as clear redirects, because sometimes there's situations that it's a temporary issue, or sometimes it's a situation that the whole website is redirecting to one URL. And actually, that's more like a 404. So we do try to understand what's happening in the bigger picture for the website with these redirects. And if these are just individual pages that are being redirected, then that's perfectly fine. But if you're using 301 redirects as a way to clean up a site instead of a 404, then we would probably treat that just as a 404. So we do try to take a little bit closer look at what's actually happening and don't blindly trust everything that we see.

ROBB: So John, can you clarify something you said there? Maybe you did at the end there. But you said that if there's too many pages being 301ed, you might consider it a 404. Do you mean just within a site, if someone's trying to use that as a [INAUDIBLE]?

JOHN MUELLER: Oh, yeah. Yeah.

ROBB: But not with an official site, moving, et cetera, and doing it in Webmaster Tools then.

JOHN MUELLER: No. No, no. I mean, too many different URLs redirected to the exact same--

ROBB: Just a [INAUDIBLE]. Right. OK.

JOHN MUELLER: Yeah. So that's something that we sometimes see where instead of a 404, it's like all of this missing content is redirected to the home page. And that's kind of confusing for users. It doesn't really make sense for us, because we can't really equate these pages. So we treat them as soft 404s.

ROBB: Right. But if you took, say, every stock item of type of T-shirt-- and it was every time it goes [INAUDIBLE], and then redirected that back to the category page, and that kept happening, you wouldn't consider that 404, because it's just going back to the main--

JOHN MUELLER: Yeah. That would probably be OK. Yeah. I think there is kind of a sliding scale there. If you're, like, saying, well, these are individual products, and it's a small part of my website, and you're redirecting them to the Category page because that fits best, I think that's fine. But if you're saying, well, half of my website is gone, and I'm redirecting it to my home page, then, of course, that's more of a 404 situation, where, actually, the website is kind of missing this content now. And the home page isn't really an equivalent replacement for it.

ROBB: All right. That's interesting, because with old sites, I think we've probably made that mistake when if we're winding down a site and we're folding it out, sometimes we'll take the products, fold it into a category, then fold the categories into the home page in the hope that the home page will just become one. And then we can throw on that [INAUDIBLE]. But essentially, we could be damaging it as we go.

JOHN MUELLER: No. I mean, you wouldn't be damaging it, because what would happen is we would treat them as 404s. And if you made them 404 yourself, then we would treat them as 404s, too. So in the worst-case scenario, it's essentially the same as if you were putting 404s there directly.

ROBB: Right. But you're not folding up that page rank via 301s. OK. So that's the wrong way to do a slow transition [INAUDIBLE]-- folding your business up, and then moving it to another site. You shouldn't really fold it up into the home page first slowly and then move it across. OK.
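
The "sliding scale" in the exchange above can be approximated when auditing your own redirect map. The sketch below is a rough self-check in Python, assuming you can export your redirects as a source-to-target mapping. The `flag_catch_all_targets` helper and the 0.5 threshold are hypothetical; Google has not published any threshold, and this is not how its soft-404 detection works.

```python
from collections import Counter
from typing import Dict, List


def redirect_shares(redirects: Dict[str, str], total_urls: int) -> Dict[str, float]:
    """Share of the site's URLs that redirect to each target."""
    counts = Counter(redirects.values())
    return {target: n / total_urls for target, n in counts.items()}


def flag_catch_all_targets(
    redirects: Dict[str, str], total_urls: int, threshold: float = 0.5
) -> List[str]:
    """Targets that absorb at least `threshold` of the site (arbitrary cutoff)."""
    shares = redirect_shares(redirects, total_urls)
    return sorted(t for t, share in shares.items() if share >= threshold)


if __name__ == "__main__":
    # Ten-URL site: six old pages point at the home page, one at a category.
    redirects = {f"/old-{i}": "/" for i in range(6)}
    redirects["/tshirt-red"] = "/category/tshirts"
    print(flag_catch_all_targets(redirects, total_urls=10))
```

A target flagged here is only a prompt to ask whether the destination is really an equivalent replacement; where it is not, returning a 404 directly matches what the transcript says Google would end up doing anyway.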

JOHN MUELLER: Yeah. All right. "Two affiliate sites from the same manufacturer with the same [? DA and PA ?]--" I think that's domain authority again. "--and page rank. Both have unique product descriptions, et cetera, but my site has a lot more unique content, links, et cetera. The main difference is my site recovered from a manual penalty last year. How can this be?" It's really hard to say. I mean, two sites are always pretty different. So it's hard to compare them directly. I don't know exactly what domain authority is looking at there. Page rank-- you need to keep in mind that we stopped updating Toolbar PageRank I think two years ago, three years ago, something like that. So those are potentially metrics that are kind of stale and not very useful. So I wouldn't really use that as a way to compare websites directly. The main difference with one site having recovered from a manual penalty last year, I guess that's a good thing if you cleaned up any kind of unnatural links there. But I don't really know what the actual question here is, because two websites are bound to be very different. And they will appear different in the search results as well. We do take into account over 200 factors when it comes to crawling, indexing, and ranking. So just because some of those metrics are similar or maybe even exactly the same, when you look at the numbers, doesn't mean that we'll show them in rankings exactly the same. "For [INAUDIBLE], we're seeing widely different results for [? 'fee' ?] and [? 'feen' ?]--" which are like two words for "fairy" in German. "We understand singular and plural [? periods ?] have different results, but the extent seems very strange. Do you know why this might be?" So I guess in general, we generally don't have a kind of semantic model of the languages in that we'll try to understand, oh, this word is this, and this is the plural of this word. And we've looked up in our dictionary and said this word is a synonym for that.
But rather, we try to do this algorithmically. And we try to learn that as we go along. And so if we see that people are looking for the same kind of content with, like, two different words, then we might assume that they're synonyms or very similar. On the other hand, if we see that people aren't looking for the same kind of content for words that are very similar-looking, then we might assume that they're not synonyms. So I suspect what's happening there is that you're just kind of seeing an effect of how people are searching in German and how maybe they use these words in different contexts, in different way-- in ways that suggest to us, or to our automated systems, that they're not exact synonyms, that we shouldn't be showing exactly the same search results. And that doesn't mean that you need to do anything special. It's essentially just how users search and how we pick that up. "I have a question regarding engagement of a site as a ranking factor. If there are two sites which have the same bounce rate, I guess, do you rank the one better where a user spends more time on a page, clicks around more, or isn't that considered a ranking factor at all?" So we don't use anything from Analytics as a ranking factor in Search. So from that point of view, that's something that you can kind of skip over. We do sometimes use some information about clicks from search when it comes to analyzing algorithms. So when we try to figure out which of these algorithms are working better, which ones are causing problems, which ones are causing improvements in the search results, that's where we would look into that. But it's not something that you would see on a per-site or per-page basis. "If you change your site structure, does it take only as long as it takes for Google to re-crawl your site in order to get your new ranking positions? Or does Google put a time constraint on things in order to ascertain that nothing else is going to change?"
If you're changing your site structure within your website, then we do this on a per-URL basis kind of as we go along. So it's not something that we would freeze a site in place or we would say, well, we're going to drop this site from search until we've figured it out again. We go through it on a per-URL basis. Some of those will be picked up faster. Some of them take a lot longer to be re-crawled. And essentially, that's kind of a flowing transition from one state to the next one. It's not something where your site should disappear from search and pop back up again. The important part here is, of course, that you do make sure to follow our guidelines for site restructuring, that you set up redirects properly, so that we can understand this old page is equivalent to this new page. And that's how we should kind of treat that from our side. We sometimes see sites that just delete all of the old content and set up a new site. And in those cases, it is really hard for us to understand what actually is happening here, because all of the old content, all of the old URLs might be gone except for the home page. And all of the new content is something that looks different, has different URLs, a different structure on it. So we kind of have to learn all of that from zero again. So as much as possible, make sure that you're following our guidelines for site restructuring, URL changes, so that we can kind of handle that as a flowing transition instead of having to learn everything new again. "In Search Console, from an SEO perspective, what can the Crawl Stats section tell us, and what should we be looking at to understand how Google perceives one's site? Also, are too many internal links a bad thing?" I don't think internal links are really something most sites really have to worry about unless it's really a situation where we can't crawl through a website to actually find the content. So that might be a problem. But too many internal links, I don't see that as being much of a problem.
From the Crawl Stats, essentially what you want to look for there is that we can crawl your pages in a fairly timely way, so that, I think the average time to crawl the URL-- I forgot what it's called. But the bottom graph doesn't go too high up, that you're looking at something that's well below a second per URL. So this is not to render the page, to kind of load the page completely, but the individual HTML files, or image files, or whatever that we're fetching from your server. And I'd aim definitely for something below a second there if you can make it happen with, like, a CDN. Or with a fancy setup, going below 100 milliseconds, that's a good thing to kind of aim for. The main reason there is that the speed that it takes to actually fetch the URLs is one of the things that we take into account when we determine how fast we can crawl a website. And if we can see that a website is responding very quickly, we can obviously crawl it a lot faster and pick up the new content a lot easier than if a website is really sluggish, and we're not sure if we're going to be able to crawl 1,000 URLs today or cause the server to go down by doing that. So that's kind of one aspect there. It doesn't mean that we'll rank the site lower. But if we can't crawl it, then we can't rank the content. "I had location-specific landing pages with unique, rich content, which Google classes as possible throwaway pages. I removed them all, and now my rankings have significantly dropped. My site has suffered significantly. Was removing those pages a bad idea?" So I suspect if you're saying that Google classified them as possible throwaway pages, then you had a manual action on those pages, which means someone from the web spam team took a look at your website and said, well, these are all pages that are essentially targeting the same keywords that are just funneling people to the same stuff over and over again, with different variations of the same keywords.
Then that's something I'd really clean up. And it might be that your site had unnatural visibility, essentially, because of those pages, which is why maybe the traffic has dropped after you've removed them. But the manual web spam action would have had the same effect as well. So that's something where I'd recommend really making sure that the landing pages that you have provide value of their own-- that they're not just City Name Cleaners, and the same content as before, maybe a scraped Wikipedia image, so that you really have something that's kind of unique to present there. Maybe refine it down to the individual areas instead of city name so that you don't have thousands of pages which are just essentially regenerated content, but maybe, I don't know, 10, 20, 50 pages that are targeting specific areas where you do have specific services or products to offer. So that's kind of what I'd recommend there. I'd really kind of shy away from these doorway pages, because it's auto-generated pages that essentially are a really bad search experience for our users. "We use rel=publisher tag on all pages on our site. This is linked to Google+, which shows our headquarters address. Should the publisher tag be linked to our other Google+ pages for our brand pages instead of the main [? HQ one ?]?" I don't know. This is something you'd probably want to check with the Google+ folks, or with the Google My Business folks if these are your local business pages. I don't know exactly how they handle this kind of situation. "Our schema reviews have been removed from the search results by Google. We had implemented them incorrectly, as we thought Google Shopping would be the same for organic results. So we didn't have them placed on our site. How do we get them back?" So if they were removed with a manual action, you would see that in the Manual Action Viewer section in Search Console. And once you've cleaned that up, you can do a reconsideration request there.
And someone from the team will take a look at that and see, this is good, or this is still problematic, and give you kind of feedback based on that. And if it's for the manual action and the reconsideration request goes through, then over time, that will pop back up again in the search results. On the other hand, if you don't have a manual action for this, then that's something that was kind of algorithmically determined. And cleaning that up is a good first step. But it's also something that takes a bit of time, because it has to be reprocessed, re-crawled, and kind of reworked on our side to understand that OK, this site actually implements them properly. We can trust this site to do it right. And our algorithms can just start showing that up again. So it kind of depends on what actually happened there with regards to which direction you need to go. "I've noticed that our website has a 302 redirect from the .com to the for a few months. Has this been passing link juice?" So I think there's a big misconception out there about 302s being bad for your website and being bad for your page rank, and your page rank disappears, and you don't pass any value. And that's definitely not the case. When we recognize a redirect and we see it's a 302, we'll assume it's a temporary redirect at first. And we'll assume that you want the initial URL indexed, not the redirection target. And in general, that's one thing that we try to do there. However, when we recognize that it's actually more like a permanent redirect and the 302 is something that you maybe accidentally set up, then we do treat that as a 301. And we say, well, instead of indexing the redirecting URL, we'll index the redirection target instead. So it's not a matter of passing page rank or not. Both of these redirects do pass page rank. It's just a matter of which of the URLs we actually show in the search results. Is it the one that is redirecting, which would make sense if it's a temporary redirect?
Or is it the one that it's redirecting to, which would make sense if it's a permanent redirect? And we do look at the result, the HTTP code there if it's a 301 or a 302. But we also try to be smarter about that and try to fix any mistakes that the webmaster might have made there. "Is there any chance we can do a Site Clinic Hangout?" Yes, we can definitely do that again. They always take a lot more time, so I try to find a time frame when I have a little bit more time to actually go through all of these sites. But it's definitely on my mind as well. "We're considering implementing HTTPS, but wanted to find out if we get hurt by having to redirect our existing HTTP links to the HTTPS version of a site? When you redirect a link, you lose a little bit of value, right?" It is true that there is a very small bit of value that kind of gets lost with any kind of a redirect there. But if you're doing this with any website, that's not something you really need to worry about. So if you're redirecting from HTTP to HTTPS, then that's definitely not something that I would see as holding me back from moving to a secure protocol, to making my site a little bit more secure for my users. So that's definitely not something where you'd expect to see any kind of a visible change-- where you might say, well, my site has dropped in ranking, because I redirected it. That's definitely not going to happen. "We notice a lot of 301, 301, 30-- 200 status codes on our website instead of just one occurrence of the 301. Does this directly impact our rankings? Do you lose double the amount of link juice passed because there were three 302 [? 301s ?]?" No. That's perfectly fine. I mean, we follow those kind of redirects, and we recognize their final destination page. And that's perfectly fine. I would make sure that you're not doing more than five redirects in one go. But in general, that's not something that I've seen people do. So if you have two redirects there, that's perfectly normal.
Sometimes that's also an effect of how the site is set up, that maybe there's a redirect from the non-www to the www version, and on the www version, there's a redirect to the actual new URL. And those are very common setups. And that's not something that would cause any problem for your site.

NEERAJ: Yeah, John, regarding this question--


NEERAJ: This user has asked for a 301 to 301 to 200. But I think there could be any issue if 301 goes to 302, and then 302 goes to 200. Or in terms of passing page link juice, is there any issue if there is one 302 between this?

JOHN MUELLER: No. No. That's perfectly fine. I think that's also the kind of situation where we try to figure out what the webmaster's trying to do and just try to do that for them directly. So I've never seen any situation where you have a combination of 301 and 302 that would cause any problem on either side.
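
The chains discussed above (301 to 301 to 200, or a 301 followed by a 302) can be checked offline against your own redirect table. The following Python sketch follows a chain and stops after five hops, the figure mentioned in the hangout. The table format, the `resolve` helper, and the `example.com` URLs are hypothetical; this only mirrors the limit described in the transcript and is not Googlebot's actual logic.

```python
from typing import Dict, List, Optional, Tuple

MAX_HOPS = 5  # the hangout mentions following up to five redirects in a row

# Hypothetical redirect table: source URL -> (status code, target URL)
RedirectTable = Dict[str, Tuple[int, str]]


def resolve(
    start: str, table: RedirectTable, max_hops: int = MAX_HOPS
) -> Tuple[Optional[str], List[int]]:
    """Follow redirects from `start`.

    Returns (final_url, statuses). final_url is None when the chain
    exceeds max_hops or loops back on itself.
    """
    statuses: List[int] = []
    seen = {start}
    current = start
    while current in table:
        if len(statuses) == max_hops:
            return None, statuses  # chain is longer than the hop limit
        status, target = table[current]
        statuses.append(status)
        if target in seen:
            return None, statuses  # redirect loop
        seen.add(target)
        current = target
    return current, statuses


if __name__ == "__main__":
    table = {
        "http://example.com/old": (301, "http://www.example.com/old"),
        "http://www.example.com/old": (302, "http://www.example.com/new"),
    }
    print(resolve("http://example.com/old", table))
```

Running this over every redirecting URL on a site makes it easy to spot chains that could be collapsed into a single hop, even though, per the answer above, a two-hop chain is not a problem in itself.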


JOHN MUELLER: "We've seen lots of discussions about creating good content, adding value, fixing duplicate content, et cetera. But what does a content mean based on different nature of websites? I see people creating lots of text, which is really never even read." So that's a good point. Yeah. It's not the case that we say your website should have a lot of text. With content, we essentially mean what you're providing on your website, which could be a service, which could be information, which could be a tool. It could be something that is based on images, and there's absolutely no text at all on a page. And that's really something that you have to work out yourself-- what you want to provide on your website, so that users go to your website. And it's definitely not that you need to generate a big block of text in order to be seen in search. Sometimes a little bit of text and a fancy game or something like that can be really effective. "How should I treat out-of-stock product pages? Should I show a custom 404, a 410, or a 301 to the Category page?" We talked about this with Robb in the beginning. And essentially, it kind of depends on the individual situation there. So for example, if it's individual products that are going out of stock, and the Category page has good replacements, then maybe it makes sense to redirect to that. If it's individual products that are going out of stock, and you have another replacement that you can use, that instead of that product, maybe that's a good replacement there as well. You could 301 for that. On the other hand, if these are products that are going out of stock and the Category page is really kind of unrelated to that, then maybe just returning a 404 is perfectly fine. I wouldn't worry so much about the difference between 404 and 410 in a situation like that. We do process 410s a tiny bit faster. But in practice, that's not going to have any effect on your website. 
So I wouldn't lose any sleep on whether or not I should use a 404 or a 410. They're very, very similar on our side. But with regards to redirecting to a different product or a Category page, that's something you kind of have to look at on a per-page or per-product basis, and think about how equivalent is this new page actually to the old page. And in the worst case, what will happen is we'll see that as a soft 404. And we will say, well, they're redirecting 500 products to a category page, and the category page is totally unrelated. So we assume that this is meant to be a 404, and we'll treat it as a 404, which means we don't pass any page rank to the category page. But essentially, it's the same as if you did the right thing from the start. So our algorithms are trying to figure out what you mean and make sure that that actually happens. "There's a Google My Business profile on a location where we no longer operate. However, it still says we have a location here on our brand name. We don't own this listing and want to delete it. How can we remove it?" I don't know. You'd have to check with the Google My Business folks. It might be that you can do something through Map Maker, depending on where you're located, if Map Maker is active there. But I'd really double-check with the Google My Business or the Maps folks to see what can be done there. "What carries more weight with regards to affecting rankings-- Panda, or Penguin, or the 200 other factors that you have?" Everything. Everything has a lot of weight. It really depends on what's happening. It's not that we say, this factor is the only thing that applies, or this factor is the only thing that applies. Sometimes we-- [MIC FEEDBACK] Lots of noise. Scary. Sometimes we recognize that people are looking for something new, and we'll try to show something new. And then maybe a week later, we'll recognize that oh, for this query, they're not looking for anything new anymore.
They want the reference work that was put online 10 years ago. So these are things that change all the time. It's not one single factor that affects all of the search results, and that you need to focus on to fix everything and to rank number one. "Do internal server errors with a 500 impact my ranking?" Yes, they can. So there are two aspects there. On the one hand, server errors are a sign for us that maybe we're crawling your website too quickly, that maybe we're causing problems on your server. So we'll back off with the crawl rate a little bit when we see 500 errors. On the other hand, we see 500 errors, when they persist, as being kind of like a 404 in that we will have to drop those URLs from the search results, because we think maybe these URLs don't work for users either. So those are kind of the two aspects there. It's not that a site will get penalized for having server errors. We just won't crawl it as quickly. And we drop those individual URLs that are returning server errors persistently. The second part of the question goes on. "In Search Console, my site is having a larger number of internal server errors, which are actually pages which I block using noindex, nofollow tags. How can I fix that?" So if these pages are blocked with noindex, nofollow tags, then we wouldn't see those tags if we get a 500 server error. And for those individual pages, if we drop them because they have a 500 or if we drop them because they have a noindex, it's essentially kind of the same. But the effect that you will see, still, in a case like this is that we will try to crawl a little bit slower, because we think maybe we're the reason why your server's returning the server errors. And we don't want to be rude to your website and crawl it harder than it can actually take. So fixing that, one thing I'd try to do is figure out why your server is returning 500 errors for those pages.
And maybe you can just have it return the normal search result, the normal page with the noindex tag, so that we can learn to drop that. Maybe you can just return a 404 and tell us, well, these pages really don't need to be crawled and indexed. And we'll respect that as well. "Does changing the title tag too often affect the overall rankings of a page? For example, I changed the page title to better match the search queries that I want to rank for. However, this resulted in a drop in rankings. I changed it back, but it never recovered." So I think we do take the title tag into account when it comes to ranking, but we do primarily also use that for the title in the search results. So if someone sees your entry in their search results, they'll understand what this page is about and understand that this might be relevant to them or might not be relevant to them. And if you've changed that, that shouldn't really be a problem. So if you're just changing your title tags, and you see a significant drop in rankings, then that wouldn't be related to title tags. "On our travel site, does it make sense to prioritize the site structure, the products where we add the most value and unique content to accommodations easier to reach by Googlebot, because we know these accommodations will be available on many other sites?" So let me try to parse this question properly. So I think what he's asking is, should we move the content where we have something unique and valuable into a more visible place on our website with regards to site structure, making links from the home page, those kind of things? And in general, that's something I'd really go for. If these are pages where you think you're providing a lot of value, where you also see that users love these pages, then I'd try to put them in a more visible place. I think that's a logical thing to do. 
It does help us to better understand that these pages are more important for your website as well if we see that they're linked from the home page, for example. So if these are really important topics, if these are maybe destinations where you're saying, well, I have a lot of knowledge here, and I really want to make sure that anyone who goes to my website knows that there's some really great content here that they can take a look at, then that's something I'd aim to do. I think that makes a lot of sense. That's not something where I'd kind of artificially hide the nice and great stuff that I have there. "Updating the robots.txt file in Webmaster Tools isn't clear. It simply says Submit and makes it seem like a test submit, not a request to update and pull the latest one." That's good feedback. I'll double-check with the team on that. So essentially, updating the robots.txt file there, the idea is you can try it out in Search Console to see if it matches what you're trying to do. And then you still have to download it, put it on your server, and we'll pick it up again. So what generally happens there is we'll crawl the robots.txt file about once a day for a website. And if it changes, then we'll take that into account. But obviously, if you update your robots.txt file maybe 10 minutes after we've looked at it already, then it'll take it up to a day for us to actually take that into account. And the Submit feature in Search Console tells us we need to check this out a lot faster. So we'll go and crawl the current robots.txt file a lot quicker after you've submitted it there, because we think, well, if you've taken the time to update it and to let us know that you've updated, we need to reflect that a lot faster.
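The point about trying out robots.txt rules before putting them on the server can also be approximated locally. The following is a minimal sketch using Python's standard `urllib.robotparser`; the draft rules and the example.com URLs are made up for illustration, and Python's parser is not guaranteed to interpret every rule exactly the way Googlebot does, so the tester in Search Console remains the reference.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical draft rules; replace with the file you plan to upload.
DRAFT_ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
Disallow: /tmp/
"""


def check_draft(robots_txt, urls, user_agent="Googlebot"):
    """Return {url: True if the draft rules allow crawling that URL}."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so nothing is fetched over the network.
    parser.parse(robots_txt.splitlines())
    return {url: parser.can_fetch(user_agent, url) for url in urls}


if __name__ == "__main__":
    results = check_draft(DRAFT_ROBOTS_TXT, [
        "https://example.com/",
        "https://example.com/search/?q=hotels",
        "https://example.com/tmp/report.html",
    ])
    for url, allowed in results.items():
        print("allowed" if allowed else "blocked", url)
```

A check like this only validates the draft; the file still has to be uploaded to the server before any crawler sees it.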

GARY: Hey, John, could I ask you-- how you doing? Nice to see you again. Been a while. [LAUGHS] A quick question is I'm actually doing a huge culling of our site. So we've currently got 10,000 pages on the site, and I'm dropping it down to 1,000. I'm actually removing 9,000 pages of content to do with office space. And so what I'm doing right now is I've gone in and I've put a bunch of URLs in the URL Removal tool. And hopefully, that's going to remove all of the corporate pages that I want gone. And then from there, I read that I also have to have things in a robots.txt file in order for that to sort of block the process and make it do what I want. What kind of time frame am I looking in terms of seeing those URLs being removed from Google? I have noindex and nofollowed all the pages also.

JOHN MUELLER: OK. So you need to make sure that you're not combining robots.txt with noindex, because if they're blocked by robots.txt, we don't see the noindex. So I'd leave them crawlable and just have the noindex there. I think we're working on updating documentation there to make that a little bit clearer, because that's a common mistake that we see. With the URL Removal tool, you should see those effects within less than a day, about. But one thing to keep in mind with the URL Removal tool is it removes these results from the search results. It doesn't remove them from our index. So the subtle difference there is that when the removal request expires and that URL is still in our index, because maybe it doesn't have a noindex at that time or whatever, then it might show up again. And I think that's 90 days, something like that.
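The mistake described here, a noindex on a page that robots.txt prevents the crawler from fetching, can be spotted with a small audit script. This is only a sketch under assumptions: the `pages` mapping (URL to whether the page carries a noindex) is a hypothetical inventory you would build yourself, and `urllib.robotparser` stands in for the crawler's own robots.txt handling.

```python
from urllib.robotparser import RobotFileParser


def hidden_noindex(robots_txt, pages, user_agent="Googlebot"):
    """Return noindexed URLs that robots.txt blocks from being crawled.

    pages maps URL -> True if the page carries a noindex directive.
    A crawler that cannot fetch a page never gets to see its noindex.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return sorted(
        url for url, noindex in pages.items()
        if noindex and not parser.can_fetch(user_agent, url)
    )


if __name__ == "__main__":
    # Hypothetical rules and page inventory for illustration only.
    robots = "User-agent: *\nDisallow: /offices/\n"
    inventory = {
        "https://example.com/offices/london": True,
        "https://example.com/offices/paris": True,
        "https://example.com/guides/moving": True,
        "https://example.com/about": False,
    }
    for url in hidden_noindex(robots, inventory):
        print("noindex cannot be seen:", url)
```

Any URL this reports would need its robots.txt block lifted for the noindex to take effect.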

GARY: But they have been noindexed already.

JOHN MUELLER: If they have been noindexed, then that should be perfectly enough time for us to actually re-crawl those, and recognize that they're noindexed, and remove them completely from our index. So the URL Removal tool kind of speeds things up in that they won't be shown in search. And during those 90 days, we have a little bit of time to actually re-crawl those pages and see, well, there's a noindex here, so we can remove it completely for the long run.

GARY: And are you saying, then, I shouldn't be using the robots.txt file to block those at the moment?


GARY: OK. Because the documentation-- I only just did that about 20 minutes ago. The documentation all over the forums and everything-- and this is official webmaster stuff-- says that they do need to be used in conjunction to actually get me a step-by-step guide to do exactly that. And I'll send you some links in an email just so that you can see that.

JOHN MUELLER: Sure. Yeah. That's good. So there's some confusing differences there, in that we have one version of the tool for webmasters, which is in Search Console, and another version for normal users. And for the normal user tool, you can use a robots.txt file to let us know that this page is kind of gone. For webmasters, they need to kind of follow the normal steps that are necessary for an organic update of the website through the noindex and all of that.

GARY: OK. And the final part of that is that I'm actually taking into account one of the things that you were saying about combining content. So instead of having 20 pages with a small bit on each page, I'm piling them all into one page and reducing everything down to a couple of hundred pages at the most. Now, I don't want to submit those new pages to Google until the others have been removed, so that that old content that's being reintroduced isn't kind of duplicated content from the site. I'm not sure if it makes any sense for me to do it in this order. I was kind of under the impression once the content is removed, the index kind of forgets about that content. And then I can kind of reintroduce it as new content. But that's just kind of where my brain's at. Is there any sense to that?

JOHN MUELLER: I don't think you need to do that. I think you can just go ahead and put your new content up. And even if the old content is still live in the search results, that shouldn't be a problem, primarily because these pages aren't one-to-one copies. It's like individual blocks of text might be the same there, but it's not the case that you have exact copies of the same content there that we would pick one and kind of hide the other one. So in a case like that, I would have no trouble just saying, well, put it all online and take the other ones down. Do it this order, or do it the other order. It's really up to you.

GARY: Yeah. Because the content is really Panda in how it looks at content, essentially, and making sure that it looks at our site in a totally different way now.

JOHN MUELLER: I don't think you need to do anything special with regards to the order there.

GARY: Amazing. Thanks, John.


ROBB: John, can I just ask something else based on that?


ROBB: So during that, let's say it is 90 days. If you've removed a page from the index, that page is still sitting in your servers. Is that still passing all the relevant authority and/or penalties, or anything else? Even though you'd never see it if you Googled it, it's still sitting there and giving a signal to the rest of the pages it may still link to, et cetera. It's just an invisible--

JOHN MUELLER: Yeah, yeah.


JOHN MUELLER: So I mean, it's similar. I mean, essentially, what happens is we have this page in our index. And when we put the search results together for a specific query, we double-check if this is on that list of the urgent removals. And if it's on that list, we won't show it in the search results. So it's kind of in our system still normally. It can be processed normally. The links pass value normally. But we just don't show it in the search results.
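As a toy model of what is described here, the removal list can be pictured as a filter applied when results are assembled, while the page itself stays in the index. Everything in this sketch is hypothetical (the function name, the data shapes, and the 90-day lifetime, which the transcript only gives as "something like that"); it is not a description of how Google's systems are actually built.

```python
from datetime import date, timedelta

# Assumption: the transcript only says "I think that's 90 days".
REMOVAL_LIFETIME = timedelta(days=90)


def visible_results(ranked_urls, removals, today):
    """Filter ranked URLs against a removal list at query time.

    removals maps URL -> date the removal was requested. The URL stays
    in the ranked list (the "index"); it is only hidden while the
    request is still active.
    """
    def hidden(url):
        requested = removals.get(url)
        return requested is not None and today < requested + REMOVAL_LIFETIME

    return [url for url in ranked_urls if not hidden(url)]


if __name__ == "__main__":
    ranked = ["/a", "/b", "/c"]
    removals = {"/b": date(2015, 10, 1)}
    print(visible_results(ranked, removals, date(2015, 10, 16)))
    print(visible_results(ranked, removals, date(2016, 1, 15)))
```

The second call illustrates the expiry case: once the request lapses, a URL that is still indexed can show up again unless a noindex has been picked up in the meantime.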

ROBB: If you're looking at another related page, you could still see that hidden one in a link report, and that might be a bit confusing to [INAUDIBLE].

JOHN MUELLER: Oh, definitely. Yeah.

ROBB: All right. OK.

JOHN MUELLER: We'll definitely still see that. I mean, you can also see situations where you have a page that has a noindex but that has links on it that do pass PageRank. So that's kind of a similar situation there, except that with a noindex, of course, you can double-check the headers and see, oh, it has a noindex. That's why it's not showing. And if it's with the Urgent Removal tool, then you don't really see that as a site owner or a person who's just randomly looking for that URL.
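Double-checking a page for a noindex, as mentioned here, means looking in two places: the `X-Robots-Tag` response header and the robots meta tag in the HTML. The sketch below does both for a response you have already fetched. The helper names are made up, and the header handling is simplified: it ignores user-agent-prefixed forms such as `googlebot: noindex`.

```python
from html.parser import HTMLParser


class _RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> / <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            self.directives.update(
                part.strip().lower() for part in attr.get("content", "").split(",")
            )


def has_noindex(headers, html):
    """Return True if the header dict or the HTML declares noindex."""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            if "noindex" in [part.strip().lower() for part in value.split(",")]:
                return True
    parser = _RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
    print(has_noindex({}, page))
    print(has_noindex({"X-Robots-Tag": "noindex"}, "<p>hello</p>"))
```

There is no equivalent check for the removal tool, which is the difference being pointed out: that state is not visible in the page or its headers.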


JOHN MUELLER: "This week's AJAX announcement was a bit vague. Care to rephrase?" Oh, man. We worked so long on this blog post, and now it's vague. Ah! So I think the biggest aspect here is really that we're working on really rendering as many pages as possible. And in a lot of times, people were using the AJAX crawling scheme as a way to say, well, Google can't really crawl my pages properly. Therefore, I'll try to do it for them with the AJAX crawling scheme. And we'll try to do it like that. And we just want to say that this is no longer necessary. If you have a normal website that uses JavaScript to create your content, you don't really need to use the AJAX crawling scheme anymore. We can pretty much crawl and process most types of JavaScript setups, most type of JavaScript-based sites. And we can pick that up directly for our index. So that's essentially what this announcement was, that we're not recommending the AJAX crawling setup anymore. We're still going to respect those URLs. We'll still be able to crawl and index those URLs like that. But in the future, we recommend that you just, like, either do pre-rendering directly in the sense that users and search engines would see a pre-rendered page, which is something that there are also a number of third-party services out there that do this for you. Or just use JavaScript directly as we have it. And in general, we'll be able to pick that up. You can double-check that with the Fetch and Render tool in Search Console. And that essentially gives you a good idea of what Googlebot is able to pick up when it does render the page. And there are probably going to be individual parts where we don't pick up everything perfectly just yet. I think that's kind of normal. That's something we're always working on improving. 
If you do see that for your specific JavaScript framework or your setup that you have there, we can't pick it up properly, then I'd first check to make sure that the JavaScript runs properly, that you don't have any JavaScript errors in there, because if the JavaScript crashes when we try to render the page, then like any browser, we won't be able to render the page properly. Then I'd check whether all of the files are actually accessible for us-- we see a lot of situations where JavaScript files are blocked by robots.txt or server responses are blocked by robots.txt. And if that's the case, we can't see the content. And you'll see that flagged in the Fetch and Render tool in Search Console, for example. And if that still doesn't help, if it still looks like Googlebot isn't able to pick up your content properly, then, by all means, do post in our Help forum, so that we can take a look at that, pass it on to engineers, and get that improved, because that's something we do want to improve over time to make sure that it actually does work for as many frameworks as possible.

ARTHUR: John, can I step in with a question?


ARTHUR: The problem here-- I mean, not a problem. But anyway, how about we are using JavaScript to serve the content faster to the clients, to the visitors, and Google comes and sees the normal URLs and the JavaScript URLs? Is that somehow considered duplicate content?

JOHN MUELLER: No. I mean, we'll process the JavaScript as well. And if we see the final version on the page, that's perfectly fine. If you use separate URLs for JavaScript and for Googlebot, then I think you might run into problems, because we don't really know how to fold those URLs together. But for the most part, we see websites use JavaScript on their normal URLs to add additional functionality, for example, or to kind of do lazy loading, those kind of things. And all of that is perfectly fine and shouldn't be a problem.

ARTHUR: Yeah, we don't hide content from Google, and we don't serve Google other content than the normal users. But we just use the JavaScript, because from a point of view, it will only change the products on a page and doesn't stay to load all the designs and stuff, you know? And that's a very fast serving for the client.

JOHN MUELLER: That sounds good. I wouldn't worry about that.

ARTHUR: OK. Thank you.


JOHN MUELLER: Ooh, I can't hear you.

NEERAJ: --issue. There was one problem I was facing.

JOHN MUELLER: Can you repeat the question?


JOHN MUELLER: Or maybe if you can type it into the chat.

NEERAJ: --on a [? jet ?] [? scroll-in. ?]

JOHN MUELLER: I can barely hear you. Maybe you can type it into the chat. I'll pick it up from there.

NEERAJ: --I am thinking.

JOHN MUELLER: OK, I'll wait for your question in the chat and go through some of the next ones here very briefly. "What are some ways small businesses can get ahead in search engines and gain more exposure to the websites?" Ooh. That's a big topic. So I think there are two things that I see a lot of small businesses, especially local small businesses, make mistakes. On the one hand, they don't explain exactly what the [? audit is ?] that they're offering on their website. So it's really hard for search engines to understand where we should be showing this content. On the other hand, one thing that I frequently see is that sites try to be the same as other big sites in that maybe your local bookstore tries to position itself as being kind of like Amazon and that it has all the books that you can possibly read. But at the same time, of course they're not Amazon. It's a very different business. And they have very unique selling propositions of their own. But those USPs aren't really directly visible on their website. So really making it clear what it is that you're offering, what it is that search engines should be showing your website to other users for, and trying to find a little niche of your own, where you can say, well, I'm a fantastic expert in this kind of book, for example, or this kind of a product. Or I am the only person in this region who has this to offer? Kind of this unique value that you're providing, making clear that that's visible directly as well. All right. Let me just check the question from the chat. "I want to index my page without the hash-bang and my company even not giving an HTML Snapshot to search engines. So after yesterday's announcement, should I work on server delivery from HTML Snapshot or what?" So the first thing that needs to happen is we need to have individual URLs. 
So if you have the current URLs with the hash-bang, then that's something you might want to migrate to a cleaner normal-looking URL structure using HTML5, pushstate, for example, as a way to navigate using JavaScript within a website. So that's probably a fairly big step that needs to be done at some point. And then with regards to serving HTML Snapshots or not, I think that's something you'd probably want to look at in a second step. Or if you're doing that at the same time, use the Fetch and Render tool in Search Console to see how Google is able to pick up the content without HTML Snapshot. And maybe also double-check to see where else you need to have your pages visible. If you are using a special social network that you need to have your content visible in, then double-check to see that they can pick up the right content from those pages as well. And if you need to use HTML Snapshots or pre-rendering for any of those pages for any of the places where you want your content shown, then Google will be able to use that as well. But it's a hard question to answer.
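For context on the hash-bang URLs discussed in this answer: under the AJAX crawling scheme, a crawler requests a `#!` URL by moving the fragment into an `_escaped_fragment_` query parameter. The sketch below shows that mapping as it is understood from the published scheme. The function name is made up, and the set of characters left unescaped is an approximation, so treat it as illustrative rather than a reference implementation.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def to_escaped_fragment(url):
    """Map a hash-bang URL to its _escaped_fragment_ form.

    URLs without a "#!" fragment are returned unchanged.
    """
    parts = urlsplit(url)
    if not parts.fragment.startswith("!"):
        return url
    # Percent-escape characters such as '&', '#', '%', '+' and spaces
    # so the state survives as a single query parameter value.
    state = quote(parts.fragment[1:], safe="=/:,;@$!'()*?")
    param = "_escaped_fragment_=" + state
    query = parts.query + "&" + param if parts.query else param
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


if __name__ == "__main__":
    print(to_escaped_fragment("http://example.com/#!key=value"))
    print(to_escaped_fragment("http://example.com/page?a=b#!state=1&x=2"))
```

Moving to pushState-style URLs, as suggested in the answer, removes the need for this mapping altogether because each state then has an ordinary URL of its own.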

NEERAJ: Yeah, John. Actually, I have done Fetch as Google [? both. ?] And [INAUDIBLE]. Actually, I [? wasn't ?] even thinking to work on this [INAUDIBLE] issue.

JOHN MUELLER: You're breaking up. I can hardly hear you.

NEERAJ: Oh. Let me type it.

JOHN MUELLER: Fetch as Google. Yeah. So Fetch as Google doesn't work for hash-bang pages directly. You need to fetch the escaped fragment version there. If you have URLs that have a normal URL structure, then Fetch as Google will work for that. So that's something that you could be doing there to kind of try things out. "Is it possible to circumvent the wait time between making a change and Google crawling your site by social presence? It seems it ought to make sense that higher traffic might trigger a crawl." So what I'd recommend doing there is using maybe something like a site map file to let us know that these pages have changed. Give us a new change date in the site map file. You can also use the Fetch as Google and Submit to Indexing tool in Search Console to let us know about that. In general, I don't think we use social networks to kind of speed up the re-crawling of changed content, primarily because those links tend to have a nofollow attached to them. So we wouldn't even notice that we'd have to re-crawl those pages. So that's something where I think social networks are probably a very, very indirect way of letting us know that this content has changed. There's a question here about iOS app indexing. I'd really recommend going to the Help forum about that, because I don't know the details of how that's handled on iOS at the moment. But I know we have some people looking at the forums with regards to app indexing questions. And they should be able to kind of help you resolve that. "What's the difference between the Google index and the Googlebot? How does it affect the search results?" So Googlebot is essentially the crawler that we send out to the web to crawl the individual pages. And the Google index is essentially where we store those individual pages when we have crawled them. So Googlebot takes the content from the web and passes it on to the Google index. And then the Google index is used as a basis for the Google search results. Let's see. 
Lots of questions left. Let me try to pick some. There's a question about misconceptions. We could probably do a big Hangout on that. Completely. And "My site is reporting 55 pages with mobile usability issues. However, when I test the pages individually, they're reported as mobile-friendly. How long does it take?" Well, we generally re-crawl these pages regularly. And depending on how big your website is, it might be that we re-crawl them every couple of days, so this report will refresh very quickly. Or it might be that there are pages within your website that take up to half a year to be re-crawled, which might be these 55 pages that are kind of lingering along there. But that's generally not something I'd really worry about then. If you want to speed that up, you can use the Fetch and Submit to Indexing tool in Search Console. "I understand a new gTLD is becoming available-- .law. Just as you have to prove you're an educator to get a .edu, you have to have a law license to get a .law. Do you see that this will affect the trust factor of a domain on .law?" From our point of view, no. We would, at least initially, treat these as any other generic top-level domain.

MALE SPEAKER: [INAUDIBLE] worry about that. If you want to speed that up, you can use Fetch and Submit to Indexing tool.

JOHN MUELLER: So if this is a new top-level domain and you have very strict requirements, we would still treat it as any other generic top-level domain that we would run across. And potentially, over time, if we recognize that these domains on this top-level domain are really, really different and need to be treated in a different way, then that's something that we might be able to take into account. But that's definitely not something that I would count on. I mean, even with .edu, where you have to prove that you're an educator, we see a lot of spam on .edu and we see a lot of hacked sites on .edu. We see a lot of pharmaceutical advertisements that are placed there by I don't know who on .edu sites. So we can't really say that .edu is a step above everything else just because it has some requirements to get the domain name. So theoretically, we could take that into account if we recognize over time that this content is really, really significantly different and it makes sense for our relevance algorithms to treat it differently. But at least initially, we're going to treat that as any other generic top-level domain. And of course, these domains have a chance to become really relevant and really fantastic websites. They're not held back by anything. It's just that we're not giving any special bonus out just because you're using something like a .law or another gTLD that has specific requirements behind it. All right. We still have a bunch of questions left. But I bet you here in the Hangout still have questions, too. So let me just open it up for you guys. What's on your mind? What can I help you with?

ARTHUR: John, can I tell you one more thing?


ARTHUR: When we use multiple-entity schema markups, Google Search Console show all the properties marked up twice. This is a problem in interpreting or is just a renderer problem? I don't know.

JOHN MUELLER: So you're using multiple types of markup for the same content? Or--

ARTHUR: Yeah, it's multiple entities. I will show you an example right now in chat.

JOHN MUELLER: So it's like multiple things on the same HTML page, something like that?

ARTHUR: It's an entity declaration which use multiple entities. I mean, I can use a product and the [? residence ?] in the same time. In this case, Google Console shows all the properties twice. Like, once it's parsing for product, and the other time, it's parsing for [? residence. ?]

JOHN MUELLER: Yeah, I think that's normal. That should be fine.

ARTHUR: OK. And the second problem is that the Structured Data Testing tool is having a bug. And it doesn't recognize multiple entities although Google Console sees them. The problem is we are not able to verify the correctitude of the implementation prior to uploading on the website because of this bug.

JOHN MUELLER: OK. I'd love to have an example. So if you can send me some example URLs, I would love to take a look at that with the Structured Data Testing tool people, because I know they're pretty responsive with regards to changes and bugs. So that's something where I suspect they'll be happy to fix something.

ARTHUR: OK. Thank you very much. I will send you through email a detailed example. Thank you.


MIHAI APERGHIS: Hey, John. There's a recent report that shows that there's a lower correlation of links to ranking this year versus last year. Is it something that might be true? You're putting less emphasis on a number of things or maybe on the after effects, something like that? Or you're trying to go that route?

JOHN MUELLER: I don't know. I haven't seen that report. I don't know where that came from. I don't think anything significantly is changing there in the sense that we drop links completely. I don't think that's happened.

MIHAI APERGHIS: No, but I mean, is it something-- is it maybe a goal of the algorithm-- relying less and less on links or on the anchor text? Is it something that you're trying to do going forward? Maybe rely more on other factors?

JOHN MUELLER: I don't know. I mean, we always make changes in search. And I think depending on the factors that we're working on that we see are very promising to make the search results more relevant-- [INTERPOSING VOICES]

JOHN MUELLER: We do make changes there. So I would expect to see various reports saying, well, Google search is changing. And maybe it's changing this direction now. Maybe it changes in another direction next week. We're constantly working on improving things. So that's something where I don't know what that specific report is looking at or what they're picking up on. But I wouldn't be surprised if reports found things are different.

MIHAI APERGHIS: Right. And really quickly, is there any news regarding the Penguin update?

JOHN MUELLER: Regarding what?

MIHAI APERGHIS: The Penguin update.

JOHN MUELLER: The Penguin update. I don't have anything new on that to share at the moment.


JOHN MUELLER: All right. More questions from you all here in the Hangout if you have anything on your mind.

JUCHEL: Hello?


JUCHEL: Hi. Yeah. I just want to ask, in order to have a Knowledge Graph for either person or place, what are the things that we need to consider in doing that? I mean, for Knowledge Graph, either extracted from our website or from Wikipedia, and for how long? I mean, how long can we have that kind of result?

JOHN MUELLER: So the Knowledge Graph is generated completely algorithmically. It does use things like structured data on a website to extract information from a website to show that. But it's not something that you can control as a webmaster per se, because we do try to look at the information from multiple sources and figure out which of these aspects are actually real information and which of these aspects might be just something from one individual source on the web. And who knows if we can trust that source? So that's something where we try to look at things overall. We use the structured data markup on the pages to pick that up. But we can't guarantee that we'll be able to use that all the time or right away from any specific website. So using structured markup is a great way to help us there, but it's not a guarantee that we can use that for the Knowledge Graph.
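For readers unfamiliar with the structured data markup referred to here, one common form is a JSON-LD block using schema.org vocabulary. The sketch below generates such a block for a person. The helper name and the sample values are invented for illustration, and, as the answer stresses, publishing markup like this does not guarantee that it will be used for the Knowledge Graph.

```python
import json


def person_jsonld(name, job_title, same_as):
    """Build a <script> tag containing schema.org Person markup as JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "Person",
        "name": name,
        "jobTitle": job_title,
        # sameAs lists other pages that describe the same person.
        "sameAs": list(same_as),
    }
    body = json.dumps(data, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


if __name__ == "__main__":
    # Sample values are placeholders, not real profiles.
    print(person_jsonld(
        "Jane Example",
        "Architect",
        ["https://example.com/about/jane"],
    ))
```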

JUCHEL: OK, cool. Thanks.


DAWN: Hello? Hello?


DAWN: Hello? Hello?


DAWN: Ah, hi, hi. Can you hear me?


DAWN: OK. I'm here. Hey! Finally. I had a quick question about something that Gary said at PubCon last week in his closing keynote speech. He was talking about noindexing thin content rather than trying to remove it, obviously, to avoid any sort of Panda-related issues. And he said to also add it to the XML site map. My only concern, really, was that is that not a waste of [INAUDIBLE], so to speak? When you add it to [INAUDIBLE] there.

JOHN MUELLER: So adding it to the site map makes sense as a temporary way to kind of speed things up for the re-crawling. So if you put a noindex tag on a page and we would re-crawl that page every six months, for example, then it takes a while, quite a long time for us to notice that there's a noindex there. And by putting that URL into the site map, you're saying, well, this URL changed. And we'll try to re-crawl it a little bit faster. We'll see that there's a noindex there, and we'll drop it a little bit faster. And at that point, you can remove it from the site map. You don't need to keep it there. So it's kind of like you put it temporarily into the site map maybe for a week or two. And then you take it out, and you say, well, it's really gone for good now. Google doesn't need to crawl it anymore. They don't really need to worry about it anymore.
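The temporary site map described here can be a very small file: just the changed URLs with a fresh last-modified date. The following sketch builds one with Python's standard library, using the namespace from the sitemaps.org protocol. The URLs and dates are placeholders, and the function name is made up.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML for an iterable of (url, lastmod) pairs.

    lastmod is a date string such as "2015-10-16".
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Placeholder URLs for pages that were recently given a noindex.
    print(build_sitemap([
        ("https://example.com/thin-page-1", "2015-10-16"),
        ("https://example.com/thin-page-2", "2015-10-16"),
    ]))
```

Following the advice in the answer, a file like this would be submitted for a week or two and then removed once the pages have been re-crawled.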

DAWN: OK. That's brilliant. Can I ask one other quick question as well?


DAWN: A few weeks ago, you mentioned about canonicalization. My understanding of canonicalization has been that it's always got to be a near exact duplicate of a page. So when you end it with things like .html, .php, you know, variations of the same exact output. But recently, you mentioned that you can canonicalize to pages that have the same sort of value, if you like. So is it OK if you've got very, very thin subcategory pages on a site that ended up-- I've got a site that ended up with an infinite URL, and it just spiraled out of control. And we ended up with like 1 and 1/2 million indexed pages. Crawl stats just went through the floor at the time.


DAWN: They've been trying to pull it back for quite a long time. So it occurred to me, if we've got very thin subcategory pages en masse, can we canonicalize to a content-rich category and add a noindex as well? Or would you just do one or the other?

JOHN MUELLER: Yeah. I'd try to do one or the other. The difficulty with having a noindex on, like, one pair of a canonical page set is that we don't really know exactly what you mean with that. So one page can be indexed, but has a rel=canonical pointing to another page that has a noindex tag on it. They're very different pages, because this one can't be indexed, this one can be indexed. But you're saying they're equivalent. So we don't really know what to do there. So what generally happens is we'll ignore the rel=canonical and just keep that one page indexed. So if you have both of the pages noindexed, then that's fine. Then you don't really need a rel=canonical between those pages. If you say this is one page that kind of replaces these individual pages, then having a rel=canonical pointing at that one page would be fine. But in a case like that, I'd try to avoid having one of them have a noindex and the other one not.
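The mixed signal described here, a rel=canonical linking two pages of which only one is noindexed, is easy to look for in a crawl export. This is a minimal sketch over a hypothetical data shape (a dict of URL to its noindex flag and canonical target); how you collect that data is outside its scope.

```python
def canonical_conflicts(pages):
    """Find canonical pairs where exactly one side is noindexed.

    pages maps URL -> {"noindex": bool, "canonical": URL or None}.
    Returns a sorted list of (source_url, canonical_target) pairs.
    """
    conflicts = []
    for url, info in pages.items():
        target = info.get("canonical")
        # Skip self-references and targets we have no data for.
        if not target or target == url or target not in pages:
            continue
        if info["noindex"] != pages[target]["noindex"]:
            conflicts.append((url, target))
    return sorted(conflicts)


if __name__ == "__main__":
    # Hypothetical crawl data for illustration only.
    crawl = {
        "/cat/shoes": {"noindex": False, "canonical": None},
        "/cat/shoes/red-size-9": {"noindex": True, "canonical": "/cat/shoes"},
        "/cat/shoes/blue": {"noindex": False, "canonical": "/cat/shoes"},
    }
    for source, target in canonical_conflicts(crawl):
        print(f"{source} -> {target}: only one of the pair is noindexed")
```

Each reported pair is a place to pick one signal, either the noindex or the rel=canonical, rather than both.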

DAWN: So best to have the noindex submit to XML site map, and then remove, and just take out the rel=canonicals?

JOHN MUELLER: That sounds good. Yeah.

DAWN: OK. Because over time, will Googlebot begin to not trust that any rel=canonicals you put on if you start to get a bit silly with it?

JOHN MUELLER: It's really rare. It's really rare that that happens. So we sometimes see cases where people have a rel=canonical pointing at their home page and they put that across the whole website. And obviously, that's something we can recognize and say, well, this is clearly wrong. We should be ignoring that. But if you're messing around with a rel=canonical and you have some things pointing to a category page, some things here, then that's probably not something where we'd say, well, we can't trust the rel=canonical at all on this site.

DAWN: OK. That's brilliant. Thank you very much. I'll leave now so somebody else can jump in. Thanks, John. Cheers.


MIHAI APERGHIS: John, regarding rel=canonical, I assume it's a bad idea for [INAUDIBLE] putting rel=canonical on pagination pages to the main, so you have a category with a lot of pages. I've seen that some CMSes simply use a rel=canonical from those pagination pages to the main page of the category. So page two, three, four, five of the category have a rel=canonical to the main page. I assume it's a bad idea. They should probably use a rel=next param for just noindex page [INAUDIBLE] rather than just use canonical.

JOHN MUELLER: Yeah. I think for paginated sets like that, I'd aim to use, like, the rel=next, rel=previous. I don't think it's going to cause any big problems with the rel=canonical, provided we can still crawl to the individual pages that are linked. So for example, if you have a set of paginated pages and they point to individual products, then on page five, there's a link to this one product that we don't see anywhere else on the website. If you always have the rel=canonical pointing to the first page of that set, then we won't find that link to that individual product. But if we can crawl the website normally and we can see the rest of the content, then it should be fine. I wouldn't really worry about that. I wouldn't treat that as a critical issue on a website.
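The rel=next and rel=previous markup mentioned here is a pair of `<link>` elements in each page's head, pointing at the neighbouring pages of the set. The sketch below generates them for one page. The `?page=N` URL pattern is an assumption about the site, not something from the discussion, and the function name is made up.

```python
from html import escape


def pagination_links(base_url, page, total_pages):
    """Return the rel=prev / rel=next <link> tags for one paginated page.

    Assumes page 1 lives at base_url and later pages at base_url?page=N.
    """
    def page_url(number):
        return base_url if number == 1 else f"{base_url}?page={number}"

    links = []
    if page > 1:
        links.append(f'<link rel="prev" href="{escape(page_url(page - 1))}">')
    if page < total_pages:
        links.append(f'<link rel="next" href="{escape(page_url(page + 1))}">')
    return links


if __name__ == "__main__":
    for tag in pagination_links("https://example.com/shoes", 2, 5):
        print(tag)
```

The first page of a set gets only a next link and the last page only a prev link, so every page in the series stays connected to its neighbours.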

MIHAI APERGHIS: OK. But there are some cases where you might just not find certain products, let's say, because you know there's a rel=canonical there, so you're just going to see the main category page. And you might never crawl those products if those products aren't accessible from somewhere else.

JOHN MUELLER: Yeah. Yeah. I mean, theoretically, that can happen. I think in practice, most websites have a strong enough mesh across the whole website that from one product, you can get to any other one by clicking long enough, even if you don't go through a paginated set. But theoretically, that could be a situation.
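Whether a site's internal linking is a strong enough mesh can be tested on a crawl of your own site: walk the link graph once normally and once while ignoring the deeper paginated pages, then compare. The sketch below does this with a breadth-first search over a hypothetical link map; the function names and the sample graph are illustrative only.

```python
from collections import deque


def reachable(links, start, skip=()):
    """Breadth-first search over a link graph, never entering pages in skip."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in seen and target not in skip:
                seen.add(target)
                queue.append(target)
    return seen


def orphaned_without(links, start, skip):
    """Pages that can only be reached by going through the skipped pages."""
    lost = reachable(links, start) - reachable(links, start, skip) - set(skip)
    return sorted(lost)


if __name__ == "__main__":
    # Hypothetical site: product /p/b is linked only from page 2 of the category.
    site = {
        "/": ["/shoes"],
        "/shoes": ["/shoes?page=2", "/p/a"],
        "/shoes?page=2": ["/p/b", "/p/a"],
        "/p/a": ["/p/c"],
    }
    print(orphaned_without(site, "/", {"/shoes?page=2"}))
```

An empty result suggests the theoretical problem raised in the question does not apply to the site; any URLs listed are products that depend on the paginated pages to be discovered.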


JOHN MUELLER: All right. Yeah, we're kind of over time. The next Hangout, I think, is in two weeks, something like that, which will be at a more US-friendly time. So [? Kristin, ?] I think that'll be easier for you. And after that, we should be getting back to kind of a more regular schedule as well. So thank you all for joining. I hope this was useful. Lots of really good questions here. Lots of good discussions. And I wish you all a great weekend. And maybe I'll see you again in more of the future Hangouts. Bye, everyone.

GARY: Thanks, John. Nice to see you.