Reconsideration Requests

Google+ Hangouts - Office Hours - 11 August 2014


Transcript Of The Office Hours Hangout

JOHN MUELLER: OK, welcome, everyone, to today's Google Webmaster Central Office Hours Hangout. Today, we have a little bit of a special hangout in that I allowed users to submit individual questions about their sites and to have a little bit of time to discuss those issues directly, one on one. I invited a few people already. Two of you are here already. So let's get started. Which one of you wants to get started first? Or do you want to do a quick round of introductions?

DANNY KHATIB: Up to you.

JOHN MUELLER: OK. Yeah, so you're first on the list here. So why don't you go ahead, Danny?

DANNY KHATIB: OK, sure. My name's Danny Khatib. I'm the founder and president of Livingly Media. We've been around since 2006, publishing lifestyle content across three brands. We hit about 25 million users, do around 400 million page views a month. We're a venture-backed company. We've been around for a while. Our problem, I think, is a unique one. And it relates to our flagship property. When we started, we were a mix of user generated content and professional content. We had an open blogging platform where users could write whatever they want, sort of in an open mic fashion, or re-syndicate content from Blogspot or WordPress and other platforms. And then we would mix that content with content that our editors would write and photo and video partnerships with folks like Getty and music labels and all that, all mixed together-- same subdomain, no directory, no easy way to regex it out. After a few years, the blogger network had exploded in size. It was doing very well for a while. We had over 100,000 bloggers who published 40 million URLs, 40 million articles-- a lot of content, all indexed in Google. And over the last year, we've come to appreciate that the network was overrun with essentially user generated spam, despite having moderators, despite having algorithms. We had no control for it. And so we killed the whole feature. We no longer have it. The site is entirely professional content now. We did that at the end of the year. Recently, we found manual actions, which is this list in Webmaster Tools. And under Partial Actions, for Pure Spam, we had over 1,000 URLs listed. And so they were already deleted from our site, but they hadn't yet fallen out of the Google index. So we went through the URL Removal Tool, pulled them all out, filed a reconsideration request. Those 1,000 URLs came off the list, and another 1,000 URLs popped in, because the affected URLs list is capped at 1,000 URLs. So now we're pretty scared. We've submitted reconsideration requests.
We've said, look, we took down 40 million URLs. All of these are gone. Can somebody please just manually go through the list? I've submitted eight reconsideration requests. Every time 1,000 URLs get off the list, another 1,000 URLs pop up in their place. This could take months or years. And I'm not quite sure why, because 99% of the content's already fallen out of the index on its own. So it's all gone. It's just that the manual action is hanging around. Any advice on how to either submit a different kind of reconsideration request or have Google sort of basically reprocess the list or the site?

JOHN MUELLER: So especially for larger sites that have a lot of user generated content, this list of user generated spam that we find is essentially informational for you. It's meant in the sense that maybe these are things that slipped under your radar that you didn't actually clean up yet. But they're not meant in the sense that you need to clean this regularly. So we see this as a way of letting you know that there are some things that we found that you might want to catch up on. But it's not something that would otherwise be affecting your site. So essentially, what happens on our side is we take those individual URLs down from the search results because we think they're spam. But the rest of your site essentially is still treated the way it normally would be treated. It's not something that if this is shown in your Webmaster Tools account, the rest of your site will be demoted or treated in any way badly. So cleaning this up, I think, is a good idea. But especially when you're talking about that many URLs, I don't think it's something that makes sense to do this manually. I just-- leave that list there. Double check from time to time to see that you're really taking out the spammy stuff, that you're actually cleaning that up and removing that from your site, maybe checking your algorithms to see that they're really picking these things up algorithmically as much as possible. But otherwise, it's not something that you need to absolutely clean up.

DANNY KHATIB: Got it. See-- go ahead.

JOHN MUELLER: Yeah, and the other thing I thought I'd mention is the URL Removal Tool. You don't necessarily need to use that for this kind of situation. Because if they're already being removed for webspam reasons, you don't need to remove them additionally for whatever other reasons. I'd use that tool more for urgent removals. If someone urgently needs to remove something that they accidentally posted, then that's a great way to get that out there. But if this is essentially normal site maintenance that you're doing, then you don't need to use the URL Removal Tool.

DANNY KHATIB: So it's unlikely, in your opinion then, that if we had, let's say, 50,000 of these URLs flagged with a Pure Spam action that it would affect the larger domain's SERP rankings at some point?

JOHN MUELLER: It would only affect the larger rankings if, in general, your site's quality were lower quality kind of thing. So it's not the case that these are flagged from a manual webspam point of view, but more that our overall quality algorithms look at your site and they might say, overall, the quality isn't that great. It's not that the quality algorithm looks at the manual web spam reports. But if they were to look at your site overall and say overall quality isn't that great, then those algorithms might pick up on that. But that's essentially independent of anything from the web spam side.

DANNY KHATIB: Got it. Got it. So as long as we've already deleted all the content and it's falling out of the index, we should be OK even if it still hangs around in the Manual Actions list.

JOHN MUELLER: Yeah, exactly.

DANNY KHATIB: OK, that's helpful. Thank you.

JOHN MUELLER: Great. All right. Spiros.

SPIROS GIANIOTIS: Yes, hi, John. I'm Spiros Gianiotis. I'm from Athens, Greece. And we've been around on the internet since 1996 with the first travel domain in Greece. We handle a lot of clients in the tourism and travel industry. One of our clients is a major hotel chain. And they have something like around 25 to 30 separate domains for their hotels. And they're considering putting them under one domain, either as subdirectories, one per hotel, or as subdomains. The question is which one would be best? Please note that they're all hotels in Greece. Greece is a very small country. And a lot of these hotels are relatively close in vicinity-- not that they're nearby, let's say, within walking distance. But geographically, they're very close. So what would you have to say, John?

JOHN MUELLER: I think it kind of depends on what you want to achieve there. From a technical point of view, you could move that into subdomains. You could move them into subdirectories. You could leave them on separate domains if you wanted to. Geotargeting might be a factor that you might want to look at. If they're on a .gr domain, then they would be geotargeting Greece. Maybe that's fine for you. Maybe you would like to have it more generic on a .com. Essentially, you could have an international website on a .gr domain. And that's fine too. But if you wanted to, for example, target users specifically in France, then on a .gr domain, you'd be targeting more those in Greece or the kind of average global audience. Past that, I think it's mostly up to you how you want to organize these. So you could leave them separate if you think that these are essentially separate entities that should be shown separately, that should be treated separately by users as well. If you'd rather see this as one strong entity, maybe put them together on the same domain. Subdomain or subdirectory is essentially up to you. Sometimes, there are technical reasons for one or the other. So that might be an option too. It kind of depends on how strongly you want to organize everything into one group compared to keeping it separate. Sometimes there are also maintenance issues around there. So with 25 hotels, I imagine it's handleable on separate domains as well. But if you'd go up to 100 or 1,000 hotels, then at some point, it makes sense to combine everything into one domain and treat it more as a package that you could easily maintain, instead of completely separate domains. So from that point of view, I don't really have the magic answer for you. But I hope that kind of gives you some ideas to think about there.

SPIROS GIANIOTIS: So because I've heard various explanations saying that if they were under one domain in subdirectories, that would strengthen the domain rather than putting them in subdomains-- does that have any logic?

JOHN MUELLER: In a case like that, we treat subdomains the same as subdirectories. So it's not something where you'd have any big advantage there. I'd look at it more from a technical point of view. Sometimes it's easier in subdirectories. Sometimes, it's easier in subdomains.

SPIROS GIANIOTIS: Mhm. OK, thank you very much.

JOHN MUELLER: All right. Let me open it up to everyone else. But if either of you have any questions, feel free to ask away in the meantime.


JOHN MUELLER: Any questions?


DANNY KHATIB: Well, I mean, I can ask another one.


DANNY KHATIB: So as I mentioned, related to the problem I had before where we had almost 40 million articles that we've actually deleted, now about 96% or 97% of them have fallen out of the index after about 4 or 5 months. But it seems like there's still 1% or 2% hanging around. And they're not falling out with the same pace and vigor as the rest did. So I'm a little concerned that there's almost a million articles that have been deleted that are still low quality and that are still in the index. Is there any-- there's no regex pattern that I can use to sort of handle that targeted strike, to tell Google to remove those million URLs. Is there any way that I can more efficiently get those 1 million URLs dropped from the index?

JOHN MUELLER: I mean, one thing you could do is set up a sitemap file with those URLs and say that these essentially got changed in the meantime so that we start crawling them again. That's something you could do to kind of trigger a re-crawl there. I wouldn't recommend leaving that for the long run, because then you have the disconnect between you're saying these are URLs that should be crawled and indexed, and actually, there is no content. [PHONE RINGING] But if this is something that is a one-time thing that you want to have re-crawled and re-indexed, then that might be a possibility. Also with the sitemap file, one advantage that you have there is you'd see how many URLs are indexed for those individual sitemap files within Webmaster Tools. So you kind of see the progress as it's moving along there. Especially for larger sites, there are some URLs we crawl quickly, every couple of days. And others can take several months, maybe even up to half a year, to be re-crawled again. So there's some amount of latency that's expected for sites like that, where a large amount might be re-crawled and reprocessed fairly quickly. And the rest just takes a while to actually be picked up again.
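The temporary sitemap described here can be produced with a short script. This is just a sketch of the idea with made-up URLs, not anything shown in the hangout: it lists the already-deleted URLs with a recent lastmod date so that Googlebot re-crawls them and sees the 404s sooner.

```python
# Sketch: build a temporary sitemap for already-deleted URLs so they get
# re-crawled and dropped from the index sooner. URLs here are placeholders.
from datetime import date
from xml.sax.saxutils import escape

def build_sitemap(urls, lastmod=None):
    """Return sitemap XML listing each URL with a recent <lastmod> date."""
    lastmod = lastmod or date.today().isoformat()
    entries = "\n".join(
        "  <url>\n"
        f"    <loc>{escape(u)}</loc>\n"
        f"    <lastmod>{lastmod}</lastmod>\n"
        "  </url>"
        for u in urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</urlset>\n"
    )

# Hypothetical deleted-article URLs:
xml = build_sitemap([
    "http://example.com/blog/deleted-post-1",
    "http://example.com/blog/deleted-post-2",
])
print(xml)
```

As noted above, such a file should only stay in Webmaster Tools until the re-crawl has happened, since it deliberately points at URLs that no longer have content.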

DANNY KHATIB: Great, that's helpful. Thank you.

JOHN MUELLER: All right. Let's grab some questions from the Q&A that were also submitted here. "My site offers an affiliate scheme. And all articles out there explain what to do when linking to a website, but don't explain what to do if you're the website all these webmasters are linking to. I use affiliate parameters in my URLs." Essentially what you'd want to avoid is that you're using this affiliate scheme to build PageRank, so that it looks like you're artificially creating links to your site like this. What I'd recommend doing is just making sure that the code snippets that you provide for your affiliates really include the rel="nofollow" in there so that they don't pass PageRank. And that's essentially the best thing you can do there. Apart from that, you don't necessarily need to do anything in between. I know some websites have a domain set up that they redirect through that's also blocked by robots.txt, which is another way of additionally blocking the PageRank from passing. But in general, that's not something that you'd really need to do there. So just make sure that your affiliates also use a rel="nofollow" when linking to your site so that we don't pass PageRank through those links.

"Does a sandbox really exist? My website started to rank after five to six months. But all ranking dropped after 10 days. Now, my keywords rank after the sixth page in the search results. I haven't used any unethical tactics." So traditionally, the sandbox has been something where, as far as I recall, new websites would essentially be kind of held back almost when they start. And what you're describing there seems something completely different. So you're saying this website was fine for five to six months, and then dropped. So that would kind of point to something completely different anyway. From our point of view, we don't have anything that we would call a sandbox. There are some aspects of our algorithms that might have similar effects.
But there's nothing specific where we'd say all new websites are blocked for a certain period of time until we show them in search results. In addition, sometimes it's even the case that new websites, where they show up, they show up fairly well in the search results because we don't have a lot of information about this website. But it looks great. It looks like something that we'd like to show our users. So maybe we'll show it even a little bit more than we would otherwise. So--
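Going back to the affiliate question: the snippet handed out to affiliates just needs rel="nofollow" on the link itself. A sketch, with an invented domain and parameter name:

```html
<!-- Affiliate link snippet given to webmasters. rel="nofollow" tells
     Google not to pass PageRank through the link. The domain and the
     ?aff= parameter are placeholder examples. -->
<a href="http://www.example.com/?aff=12345" rel="nofollow">Visit Example Store</a>
```

The alternative mentioned above, routing affiliate clicks through an intermediate domain that is blocked in robots.txt, achieves the same effect but, per the answer, generally isn't necessary.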

AUDIENCE: Is that a bonus that you're giving the site when it's showing up better than others?

JOHN MUELLER: It's not necessarily something like a bonus where we'd say all new websites get this extra cookie when they start showing up in the search results. But sometimes, it's just the case that we don't have a lot of data for a new website. And we have to make decisions based on limited data. And just because we don't have a lot of data doesn't mean that the website is bad. But at the same time, it also doesn't mean that it's the best website out there. So we have to make some kind of an informed decision algorithmically on how we should treat the situation where we don't have a lot of data about a website. And sometimes, it happens that these websites sink in rankings over time until they settle down into some stable state. Sometimes they go up a little bit over time and settle into a little bit of a better state. So it's kind of a situation where you have a little bit of information about a website, but you don't really have all of the information that you can use to make a really informed decision on how and where this website should be ranking. And that's sometimes something that webmasters could see as a kind of a sandbox effect, where we have limited information. We have to make do with that information. It takes a while for that data to be collected. And after a while, you'll see some changes. So it's not that there's specifically a sandbox to hold those sites back. But sometimes you might see situations where, after a certain period of time, the ranking kind of settles down into a different position.

"Say I use 301 redirects to move from site A to site B. After six months, the redirects are removed because another company will use site A. Does the site authority, PageRank, et cetera, stay with site B, or does it go back to site A?" That's a bit of a tricky question, because six months is kind of a problematic time, in the sense that some of the URLs will have been re-crawled and re-processed with the new URLs. And others, we might not have.
So if after six months, you're kind of splitting those two domains into completely separate sites, then it's possible that some of the signals that we have remain attached to the old website. And some are already transferred to the new one. So that's something where I'd recommend at least keeping the redirect in place for longer-- maybe a year, maybe even longer than that-- and also making sure that all of the external signals are also updated. So if there were links to site A that are actually meant for site B now, maybe contact the webmaster and say, hey, we moved to a different domain. Please update your links to the new one so the users get there directly. And if those links are updated, then we can use those directly and pass all of the data to the new domain. Whereas if there were only these redirects in place, and it was only for a limited time, then it's conceivable that at least some of the signals go this way, and some of them go the other way.

"Does performing a site move with Webmaster Tools and 301s cause an instant reevaluation of the destination site with regards to algorithms like [INAUDIBLE], thus stopping people from trying to run away from a penalty?" We do have algorithms that try to follow up on site moves to make sure that any signals that we attached to the old website are forwarded to the new one. So if there are algorithmic problems with one version of the site and you just 301 redirect to a different domain, then traditionally, those signals would be forwarded as well. That's especially the case with link-based algorithms, where of course the 301 redirect forwards the PageRank anyway. So all of those problematic links that you might have with your old domain, if you're just redirecting them to a new domain, then they'll be forwarded as well. So that's something where you'd probably see these problematic parts of an old site move onto the new site, if you're just 301 redirecting.
If you think that the old site is so bad that you can't possibly fix it and you really wanted to start over, then I'd really recommend starting a new website, not redirecting from the old one.
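For reference, a site-wide 301 of the kind being discussed is usually configured at the server rather than per page. A hypothetical Apache .htaccess sketch, with invented domain names:

```apache
# On site A's server: permanently (301) redirect every request to the
# same path on site B. site-a.example and site-b.example are placeholders.
RewriteEngine On
RewriteRule ^(.*)$ http://site-b.example/$1 [R=301,L]
```

Because the redirect is permanent, signals (good and bad) follow it, which is exactly the point made above about penalties moving along with the 301.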

AUDIENCE: John, I have a related question here.


AUDIENCE: So if a site is affected by Panda, and that's for thin content, but the site had some really good content, what if we take just that good content and redirect it to a new subdomain of the same site-- not a new domain, but a new subdomain on the same site that is more relevant? So for example, if it's a broad-based website like Wikipedia that has a lot of content and you might not be able to figure out what it's about, then you get a specific subdomain for a specific niche, like health care, and transfer some of the good content over there. So will Panda still transfer to the new subdomain, or what do you think would happen?

JOHN MUELLER: It's hard to say completely because sometimes we treat subdomains as part of the website itself. So that's something where maybe you're just kind of like moving things around within the same bucket. The other problem there is if you're moving the high quality content out of your website, then that's kind of a weird situation in that it looks like you want to keep the low quality content on your old website, but at the same time, move some of the high quality content to a different website. So what I'd recommend doing there is more of the opposite and either removing the low quality content or moving the low quality content to a different domain so that it's really clear when we look at your domain, we can see, OK, this domain has a history of really high quality content overall. Overall when we look at the pages that are indexed there, it's a good mix. There's the right amount of content here. There's good high quality content here. And the low quality content that we might have seen in the past is actually no longer there. Maybe it's on a different domain. Maybe it's noindex. Maybe it's 404 completely. So that's kind of the direction I'd go there. Instead of moving the high quality content out, maybe just really cleaning things out and taking the low quality content out of the site and focusing only on the high quality content on that existing site.

AUDIENCE: OK, so you said that you look at the subdomain as part of the main website then.

JOHN MUELLER: Sometimes, yeah.

AUDIENCE: Sometimes, OK.

JOHN MUELLER: That's [INAUDIBLE], right? [LAUGHS] I mean, there are situations where clearly, a subdomain or even a subdirectory are separate sites. So when you think of things like shared hosting environments-- for example, on Blogger, each user's blog is on its own subdomain. And they're separate sites. We need to treat them as separate sites. Sometimes there's shared hosting that uses subdirectories in the same way, in the sense that they say, this subdirectory is this user. A different subdirectory is a different user. And we can treat those as separate sites as well. But if you're talking about the same overall site and you're just moving it into a subdomain, then chances are our algorithms are going to look at that and say, well, this is all just part of the same website. They're using different subdomains, which is fine. But it's not that we need to treat these subdomains as really separate websites, because they're actually kind of the same.

AUDIENCE: OK, thank you.


JOHN MUELLER: Another thing maybe worth mentioning there is if there's low quality content on your site that you want to keep for users, but you want to prevent it from causing problems with Google, one idea could be just to use a noindex meta tag for that content. So if you know that some pages are low quality, but you think many people, when they're browsing through my website, they want to see this content, then the noindex lets you keep it on the site. But it prevents it from causing problems on Google's side.
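The noindex meta tag mentioned here is a single line in the page's head. For example:

```html
<!-- Keep this page available to visitors, but ask search engines
     not to include it in their index: -->
<meta name="robots" content="noindex">
```

For non-HTML resources, the same directive can be sent as an `X-Robots-Tag: noindex` HTTP response header instead.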

AUDIENCE: Got it. Thank you.

JOSHUA BERG: John, also about Panda, I'd like to know, is the newer Panda-- I mean, isn't it more page specific than site specific, so that it filters more at a page level?

JOHN MUELLER: It does try to be more granular, but it's not going to go down to a per-URL basis. So we do try to take into account parts of a website when we can pick that up. But it's not the case that it's on a per-URL basis. So this is really just something where if you recognize that there's low quality content overall on your website, then that's definitely something I'd work on cleaning up overall and making sure that, in general, when we look at your website, we understand that this is a high quality website. So when a new page appears on your website, we don't have to analyze the page's content first. We can say, well, this website is a great website overall. We don't really have anything to fear by ranking this new page, that we don't know that much about, fairly highly in search results. Whereas if overall we think the content on this website generally isn't so hot, then new content that we might find there will probably be treated a little bit more cautiously.


JOSHUA BERG: I was saying it was a little odd that a site I suspected had gotten hit by the newer Panda algorithm-- the home page was reduced a lot in ranking. But a lot of the articles, especially some of the very good, popular articles, didn't even budge. They stayed very high in rankings. So would it be safe-- am I assuming wrong that maybe that wasn't Panda related?

JOHN MUELLER: I don't know. It's hard to say without knowing the site. But theoretically, a situation like that could be possible. For example, if the homepage isn't very strong but these individual articles are really popular, then maybe these articles are also reduced slightly in ranking. But that reduction is so small compared to the overall good signals that we have for those individual articles that it's not extremely visible.

JOSHUA BERG: OK, so we could have Panda maybe give an overall minor reduction and then maybe a stronger reduction on certain pages as well? Or do you mean just an overall one?

JOHN MUELLER: I imagine it's just an overall reduction there. And I don't know. From your description, my first thought would be it's probably something else. It's probably not Panda. But there are situations where there's a slight reduction based on these broader site-wide algorithms. And you might see subtle changes like that. So individual pages might drop a little bit more than other pages, just because we have so many good signals for those other pages as well. So--

JOSHUA BERG: OK, there were some layout possibilities, like the page layout algorithm, or maybe something Payday Loans related.

JOHN MUELLER: I don't know. I'd keep an open mind in a situation like that. But it's something where if you're seeing subtle differences in reduction in ranking for these individual pages, it's probably worth looking at a variety of factors. Maybe things are just adding up in a weird way.

JOSHUA BERG: OK, thanks.

SPIROS GIANIOTIS: John, coming back to what you were saying earlier regarding subdomains-- in my case, where we're talking about these separate hotels, would Google see these as different sites if they were in subdomains, since they have a distinctive character?

JOHN MUELLER: It's very possible, yes. But it all kind of depends on how you build that website up. If it looks like it's essentially one big website with different subdomains for individual places, then that's something where we'd say that this looks like one big website, and we'll just treat it as one big website. And that's not necessarily a bad thing. So it's not something where we'd say you only have one [INAUDIBLE] in the search results. Sometimes you have multiple slots regardless. So that's something where I wouldn't focus so much on the Google side, whether or not it looks at it as one site or not, but rather, find the layout that works best for you as a webmaster and that makes the most sense for your users.


JOHN MUELLER: All right.

JOSH BACHYNSKI: Hey there, John?


JOSH BACHYNSKI: Hi, John. I had a quick question about the new signal everyone's talking about, which of course is HTTPS.


JOSH BACHYNSKI: I'm wondering. Is it a part of Panda? Or is it a standalone algorithm?

JOHN MUELLER: It's separate.

JOSH BACHYNSKI: It's separate, OK. So does it run on an infrequent basis? Or does it run on a regular basis?

JOHN MUELLER: Essentially, it looks at what we have indexed for the website. So it's not something like Panda or Penguin, which are site-wide algorithms that have to aggregate a lot of signals about the website. We essentially look at it on a per-URL basis. So that's something that kind of runs automatically on its own. It's not something you would need to wait for a refresh for. It's essentially a continuously updating [INAUDIBLE].


AUDIENCE: So businesses can take their time, right? I mean, to go to this--

JOHN MUELLER: Sure. I mean, at the moment, it's a very lightweight signal. So it's not the case that if you have an HTTP website, you will disappear from search. We think it's a great idea to move over. And I imagine over time, it's something that more and more websites will be doing. So I'd definitely look into it, especially if you're doing an update at some point. But I wouldn't see this as something where you have to halt everything that you're doing and move over to HTTPS so that you can remain in search. So I'd definitely keep it in mind and think about it as you're revamping your website. But I wouldn't see this as something that should cause you to stop everything else and move over.

AUDIENCE: John, suppose you install the certificate, the HTTPS certificate, but you have an error-- like the certificate is not installed correctly at 2048 bits. And so would you still give the site a good-- would the ranking benefit still be applied if that person hasn't correctly installed the certificate?

JOHN MUELLER: So what usually would happen there is that every user, when they access the URL, they'd see that error directly in the browser. And the browser would block them from going to your site.


JOHN MUELLER: So that's a fairly big block. And that's not something we have under control. That's essentially the browser saying this certificate doesn't work for this website. So that's something that's fairly problematic that I'd work to fix as quickly as possible. From our point of view, what happens there is, assuming the same content is on HTTP and HTTPS, we'd see that error as well when we try to crawl those pages. And we'd say, OK, we know the same content is on both of these URLs. But this one has a broken certificate and this one is OK. So we'll use the OK version, the HTTP version, assuming the same content is on both URLs. If you're doing a redirect, then of course it doesn't matter which one we show. The user is going to see the certificate error page anyway in the browser. But if it's just implemented incorrectly and you still have the same content on both versions, we'll probably just fall back to the HTTP version for indexing.

AUDIENCE: OK, thanks.

WILLIAM ROCK: Hey, John? Hey, John? Can I ask a little bit more about the SSL? I've got one in the Q&A based around this. There are a lot of questions that I've got from just random companies, CEOs, about why it's so important. If they're not actually running a CMS or they're not running something else, why is it important for them to go to SSL? And then for the ones that are basically running e-commerce, what kind of levels of SSL do they need? I read through the document as well, so I know that answer. But I want to kind of get it from you.

JOHN MUELLER: Yeah, so from our point of view, if you implement HTTPS properly, then that's fine for us. It's not something where we'd say this specific certificate is good and the other one is bad. I imagine maybe in the long run, we'll be able to differentiate a little bit more. But at the moment, it's really just either it works or it doesn't work. And that's kind of what we look at there. With regards to the type of sites, I think it's important to keep in mind that there are three things that HTTPS does. On the one hand, it's authentication. So it tells the user that they're accessing the right website. On the other hand, the content that's being transferred between the website and the user, in both directions, is encrypted. So for one, it can't be modified by third parties. We've seen ISPs add ads into those pages, add tracking pixels. Hotels tend to do that every now and then-- they'll put extra ads in the pages. They'll maybe even change some of the ads. And additionally, this content can't be listened to. So it's not something where, if you submit something to a site, it's out in the open as it's transferred. Using HTTPS kind of protects you from that. A good analogy, I guess, is essentially that if you're using HTTP without encryption, you're kind of sending your content on a postcard written in pencil to the user and hoping that they get it right. And in general, the people along the path might be good people and say, OK, well, this is good content. I'll just forward this postcard on without reading it, without changing it. But you never really know. And the user, when they receive this postcard, if it's written in pencil, they can't really tell if this content has been changed, if others have been listening in, watching this. So it's really hard to tell what has happened there.
And even for seemingly uncritical sites, sometimes the user feels that this isn't something that they want just anyone to see them looking at. So if you're looking at a small business site and you're looking at the job section there, then maybe that's not something that you want your employer to know that you're looking at. So these are kind of the situations where even if the content isn't a credit card number, even if you're not doing financial transactions, there's a lot of stuff that maybe users want to keep private. And it's almost hard for you as a webmaster to make that decision for the user. So being able to have everything on HTTPS gives you that security by default.

WILLIAM ROCK: Thank you, John. I think the other piece is basically the security of a company physically versus a company online. And I think that a lot of people forget that they secure their companies with security alarm systems and this and that. But then when you go online, they forget that that's another portion or extension of their business that can potentially get ruined.

JOHN MUELLER: Yeah, I think it's just important to keep in mind that HTTPS doesn't protect your website from being hacked or it doesn't protect your servers from being manipulated. It essentially just protects the connection between the user and your server. So if your server gets hacked by someone, if somehow malware makes it to your server, that's something that you can't protect with HTTPS that you really need to stay on top of separately as well. So it's not a magic bullet. It's not that if you switched to HTTPS, then all of your security problems will be solved. But it's definitely something that at least keeps the connection from your server to the user in a secure way so that random people can't listen to it. They can't manipulate it. And kind of protects you on that front.

WILLIAM ROCK: So it's kind of an additional layer for businesses to help protect themselves, and for Google, it's about protecting the user experience.

JOHN MUELLER: I guess you could see it like that, yeah. I mean, at the moment it's a very lightweight signal for us. So it's not the case that if you don't switch to HTTPS, then you'll disappear from search or that you'll have this big disadvantage compared to your competitors. But I think over time, that might change as users become more and more used to HTTPS as they see that it makes sense to have a secure connection to those websites that they're active on, even if they're not exchanging financial information.

WILLIAM ROCK: Yeah, I think the attraction is just that people are wanting the ranking portion of this versus the security portion of this.

JOHN MUELLER: Yeah, it's always a tricky situation.

WILLIAM ROCK: Thank you, John.

JOHN MUELLER: All right. Let's grab some more from the Q&A. "Why is Penguin still not ready? Too susceptible to negative SEO? Google isn't happy with the results it will give out?" This is definitely something that the engineers are working on. And we're looking into what we can do to speed that up. At the moment, I don't have a magic answer for you as to why. That's always hard to answer anyway. But at the moment, we don't have an update just around the corner. But we're working on speeding that up and making sure that it also works a little bit faster in the future. "Do you know 100% that there will be a Penguin update?" It's definitely something we're working on, yeah. So I'm pretty sure that there will be something like a Penguin update. It's not the case that we'll just leave this data like that forever. It's definitely something that we're working on cleaning up. "I'm building a sitemap and adding hreflang tags to it. Should I include the x-default as well? Your article doesn't explain anything about the x-default in the sitemap." So the x-default is a way of specifying the default language and location page if you have anything that you'd like to treat as a default page. And you can use that in a sitemap file just as you can use any other language tag. So it's not something that would be specific to just the on-page markup or just the sitemap. You can treat it the same as you would maybe the EN or the German or whatever pages you have there. So that's something you can definitely include in your sitemap. Another hreflang question. "I have a site that uses subdomains to target a dozen countries specifically, like the UK, US, Australia, et cetera, as well as by region, such as asia.domain, africa.domain. What's the best way to approach hreflang for these regions?" So one thing you can do with hreflang is use the same page and include it for multiple language and location tags.
So you could, for example, have one page that's valid in the UK and Australia, and a different URL that's valid for the US. And what you would do there is just include separate hreflang meta tag or sitemap entries for the different language-location areas, and just specify the same URL again. And that same URL can also be your x-default. So it's not the case that you have to have separate URLs for each of these language variations. You just need to specify and say this is the page for Australia. This is the page for the UK-- might be the same one. And this is the one for the US. So that's something you could split up like that.
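As a sketch of the setup John describes (the subdomains here are placeholders), a sitemap entry can list the same URL for both the UK and Australia, a second URL for the US, and reuse one of them as the x-default:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>https://uk.example.com/</loc>
    <!-- The same URL serves both the UK and Australia -->
    <xhtml:link rel="alternate" hreflang="en-GB" href="https://uk.example.com/"/>
    <xhtml:link rel="alternate" hreflang="en-AU" href="https://uk.example.com/"/>
    <xhtml:link rel="alternate" hreflang="en-US" href="https://us.example.com/"/>
    <!-- x-default can reuse an existing URL; it doesn't need its own page -->
    <xhtml:link rel="alternate" hreflang="x-default" href="https://us.example.com/"/>
  </url>
  <url>
    <loc>https://us.example.com/</loc>
    <!-- Each listed URL repeats the full set of annotations -->
    <xhtml:link rel="alternate" hreflang="en-GB" href="https://uk.example.com/"/>
    <xhtml:link rel="alternate" hreflang="en-AU" href="https://uk.example.com/"/>
    <xhtml:link rel="alternate" hreflang="en-US" href="https://us.example.com/"/>
    <xhtml:link rel="alternate" hreflang="x-default" href="https://us.example.com/"/>
  </url>
</urlset>
```

Note that each URL in the set carries the same complete group of hreflang annotations, including an entry pointing back to itself.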



JOSH BACHYNSKI: I have a hypothetical search question for you.


JOSH BACHYNSKI: Is it hypothetically-- I'm just going to brainstorm this a bit. Is it hypothetically possible to run a version of Penguin that just releases the sites that have done their due diligence and deleted the bad links and disavowed the rest? Is it possible in the interim just to run a version of Penguin that just releases those guys? Because they've been waiting for a long time. I don't want to get whiny on you. They've been waiting for a long time, for over 10 months. And I imagine some of them have gone out of business waiting so long. Is it just possible to do a version of Penguin-- could you pass this on to the engineers-- just one that will let people up that have done the work to clean things up?

JOHN MUELLER: Essentially what that would need is a complete data refresh. So it's not something that is like just a tweak. It would essentially need to have everything re-run completely. So that's not something where we'd probably just do that randomly one afternoon and just push that out. I think one of the reasons also why this is taking a little bit longer is because we just want to make sure that the next data that we push is actually the right kind of data that we'd like to have reflected in search results. So it's not something where we'd kind of just rerun a part of the algorithm and push that data. We'd really need to update that data completely.

WILLIAM ROCK: OK, thanks, John.


JOSHUA BERG: Is the-- yeah, go ahead.

AUDIENCE: Sorry, can you hear me?


AUDIENCE: I was just wondering if you have the ability to-- sorry, as the previous guy was just saying. You know sometimes you release things that say we've run the algorithms without using links, for example. And the results are terrible. Or they're worse than they are with the links. Internally, can you run the algorithm by removing Penguin altogether? And does that look worse than currently? Because I think that's essentially what he's saying. So run it without Penguin internally, does it look worse? No, OK, then forget it.

JOHN MUELLER: Well, I mean, if we can improve the search results by just turning something off, then that's something we'd love to do. In general, the less complexity that we have in web search, the happier the engineers are, the easier they can work on future projects as well. So if with any algorithm that we had, the search results were the same or better with turning something off by deleting code, by deleting files, then that's something we'd definitely want to do. So since that's not quite what we're doing here, then I'm pretty sure the engineers have done those evaluations and said this makes a big difference. And it's vital for us that this algorithm remain in place until we have something that replaces it. So that's something where if the possibility were to exist that it would be better by not having this in place, then we'd definitely jump to do that. And every now and then, we do take specific parts of our algorithms out and we say, OK, our new algorithms are covering this area as well as maybe two or three other places. We can take this out. We can remove this algorithm completely. We don't need the data files anymore. We don't need this algorithm at all. It saves us time. It saves us complexity. It makes it easier for us to create new code, new algorithms. So we'll just delete that code completely. And that's something that I think every healthy software company has to do. They have to go through the code regularly and say, hey, this is something we don't need anymore or maybe it focuses on some aspects that webmasters aren't doing anymore-- maybe, I don't know, keyword stuffing is probably something that webmasters generally aren't doing anymore-- or maybe algorithms like Panda are picking up on it a lot better. Maybe we don't need a separate keyword stuffing algorithm, so we can just delete that. So if we can, we'll try to delete stuff and clean our code base up that way.
But if it's still in place, then we've probably been looking at the metrics there and saying this does make a really big difference. And it's vital that we keep this in place for the moment.

AUDIENCE: Probably.

JOHN MUELLER: Yeah. I mean, these are all things where we regularly talk to engineering teams about this. And we give them examples of things that we've seen from the help forums, from Google+, elsewhere. And we say, hey, in these situations, it looks like the webmaster has been doing the right thing. Our algorithms should be reflecting that at some point. And this is the kind of data that they use to make those kinds of decisions as well. And at some point, they might just run a new evaluation and say, hey, what would our search results look like with this algorithm turned off or with this algorithm turned on or slightly tuned. And where does it makes sense to make those changes? And it's definitely not the case that we'd artificially keep our search results bad by sticking to algorithms that don't make much sense. If we can improve the search results, we'll do that.


JOSHUA BERG: John, is there quite an increase in negative SEO or maybe just the controversy of it that would be an important part that's like a hold up with the new Penguin or just being careful in that regard that we don't have that--

JOHN MUELLER: Yeah. I mean, we always have to be careful in that regard and to make sure that we're algorithmically and manually picking up on those issues so that they don't cause any problems. And it's not a new topic. It's been out there since, I don't know-- since beginning of Google almost where people would say, oh, if Google thinks this is bad, I'll make it look like my competitor is doing this. And this is something that our algorithm has to live with. We have to understand that this is happening and to kind of work around it.

JOSH BACHYNSKI: Hey, John. So just to make sure my point was clear because the conversation got sidetracked a bit, I just think it would be a good part of your public relations strategy to pass on to the engineers that whether or not Google thinks the sites are worthy to be in the index, if the sites have done the work to clean up the supposedly bad links, personally, I just think that they should be rewarded for taking those actions at their own expense, on their own time, and not be held down for so long. That's just my personal opinion. I won't go on with anything further. And I think that passing that on to the engineers-- because this is not a good public relations situation for Google-- might be a good idea. Thanks very much.

JOHN MUELLER: Sure, I'll pass that on.

AUDIENCE: I have a question about the Removal Tool. Can I just ask a one minute question?


AUDIENCE: So when I submitted the stuff-- for instance, for a client that had stuff that was cached, and the other person on the other end removed, for instance, blog comments. So they closed that, right? So when I submitted it, I basically let them know that the content is outdated and so forth. Are there certain ways that you need to write to them? I mean, it got denied for no reason. The stuff is not there anymore about that specific person.

JOHN MUELLER: So we have two variations of that. One is for the webmasters. So if you have the site verified, then essentially, we'll take those URLs and do that automatically. If it's not your website, then you have to specify individual words that were removed. So you don't write a sentence saying, this and that and that were all removed, but rather just the individual words-- where all of the words that you mention are no longer on the page itself but are still in the cached page.

AUDIENCE: Yeah, the page has changed. And the Google's cached version is [INAUDIBLE].

JOHN MUELLER: Yeah, so you'd just-- for example, if they removed the word "John," and if you search for the page and the word "John" is no longer on the page at all, then you'd specify just "John" in that keyword area where you'd say this was removed. You wouldn't say "this guy in Switzerland that I know," because maybe some of those words from that sentence would still be on there.


JOHN MUELLER: So just the individual words that were actually removed.

AUDIENCE: OK. Thank you.

JOHN MUELLER: All right. "Should I delete old pages immediately after 301 redirection, or is it better to wait several weeks?" So technically, if a page is redirecting, then there's no content there. So you can delete those pages immediately as long as those URLs are still redirecting. "We experienced a boost in indexed pages by manually submitting our XML sitemap files in Webmaster Tools. What are the reasons for Google to increase indexed pages if one submits XML sitemaps in Webmaster Tools manually?" In general, we use sitemap files to crawl and re-index content. So if we think that there's content there based on your sitemap files, we'll try to pick it up like that. Usually, if it's a normal website, we can also pick up the same content through normal crawling and indexing. So this isn't something where you'd need to submit a sitemap file. But sometimes it helps to make sure that we pick up all the changed or new pages, especially if crawling is somehow complicated. So for example, if you have a website that's very large and has a lot of lower-level pages that are maybe, I don't know, 20 links away from the homepage, and you changed some of those pages, it might take a while for us to recognize that those pages changed or that new pages were added there. So with a sitemap file, you can let us know about those changes within your website fairly quickly. There are other ways you can do the same thing-- for example, by linking to those pages from your homepage and saying, OK, on this lower-level page, we have some new content-- maybe new articles in an e-commerce shop, maybe an updated article, or something like that where you'd have a listing in the sidebar, something like that on the homepage. So the sitemap file is great for letting us know about these updates. But you don't necessarily need to do that. Let's see. Here's another one.
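As a minimal sketch of that, a sitemap entry for an updated deep page just pairs the URL with a lastmod date (the URL and date here are made up for illustration):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <!-- A page many clicks from the homepage that was recently changed -->
    <loc>https://www.example.com/category/subcategory/deep-page</loc>
    <lastmod>2014-08-11</lastmod>
  </url>
</urlset>
```

The lastmod value is what lets crawlers notice a change without having to re-crawl their way down through 20 levels of internal links.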
"Should we disavow links that have been obtained from others stealing our content, such as content hyperlinks now on other domains and pointing back to us? Does Google know these are duplicates and devalue the links and not count them?" In general, we recognize those kinds of situations and treat them appropriately and just ignore those links. What I'd recommend doing there is, if you find something like that, just disavow the whole domain. And then you're sure that it's definitely covered. I wouldn't take any situation where you run across some kind of a problem and you know you could fix it, but maybe Google could fix it automatically, and just leave it and hope that the search engines will magically handle everything. If you can fix it, why not just take it into your hands and clean it up yourself? So that's something where putting a domain in a disavow file is trivial to do. And if you've seen it there, then fixing it on your side is really easy. We'll probably also handle it appropriately on our side as well. But if you see it, why not just take care of it yourself?
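The disavow file John is describing is just a plain-text file uploaded through the Disavow Tool. Lines starting with # are comments, and a domain: line covers every link from that host (the domain below is a placeholder):

```
# Scraper site republishing our articles with links back to us
domain:scraper-example.com
```

A file with a single domain: line like this is perfectly valid; it doesn't imply anything about links you haven't listed.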

AUDIENCE: John, the Disavow Tool in Webmaster Tools has a really strong warning on it-- you know, this will void your warranties, so most people don't have to use it. But I guess what you're saying is, if you can, just do it.

JOHN MUELLER: I guess my point is more that if you see this problem and you're worried about this problem, you can take it into your own hands and just take care of it so that you don't have to worry about whether or not Google fixes it on its own. If you can take care of it yourself, then you're sure that it's taken care of. You don't have to rely on this vague algorithm that will probably be able to handle it right.

AUDIENCE: But what I'm worried about is for somebody who doesn't have a disavow file, and you suddenly discover some scraper who got your content and also links back to you. And I put that in a disavow file. That's the only thing in my disavow file.

JOHN MUELLER: That's fine.

AUDIENCE: That is signal to Google that I think that everything else is OK when the reality is--

JOHN MUELLER: No, that's--

AUDIENCE: --I haven't really looked at anything else.

JOHN MUELLER: That's fine. We use that mostly as a technical tool. So if we find the domain listed in your disavow file, then we don't follow those links to your website. It's not the case that we would say, oh, they have a disavow file. Therefore, they must be spammers and they know what they're doing. It's more that we take these links. And we say, oh, they're in the disavow file. We'll ignore them. Fine. That's done. It's not the case that you have any kind of negative impact from using the Disavow tool.


JOHN MUELLER: All right. We just have a few minutes left. I'll open it up to you guys.

JASON: Hi, John.



JOHN MUELLER: I didn't quite hear you.

AUDIENCE: Jason from

JASON: Oh, yeah. This is Jason from. I'm asking on behalf of our team. On a product assortment page for shopping, is it considered valuable to include related products as part of the primary content of the page? Or should related products be presented only as supplementary content?

JOHN MUELLER: That's fine to have on those pages as well. From our point of view, it helps us to understand the context of those pages better because we see links to those related products. So that's fine. If you really have a good way of picking up the related products, then that's a good thing to cross-link like that.

JASON: All right, perfect. Thank you.

AUDIENCE: Was there any update on August 8?

JOHN MUELLER: I'm sure there was. I don't know. When was August 8? We do updates all the time. So I'm sure there was an update on August 8. But I'm not sure exactly what you're referring to.

WILLIAM ROCK: Hey, John. I've got a question.


WILLIAM ROCK: And I know-- it goes [INAUDIBLE] local rank and what's happening in the SERPs and basically it goes with Yelp. We've seen some weird results [INAUDIBLE]. I'm looking at that as a possible false positive. But some of those reviews that are coming up are just low quality signals. They've got multiple local queried results. They're basically pulling up localized search queries, which is good that that's happening. But there's also ways that they created-- not these guys, but other companies out there with similar-- not really [INAUDIBLE] pages but local ranked pages that still actually rank in the algorithm today based off the techniques that they've done. I'd like to show you that later down the road. But I think I'm seeing something interesting showing with the switch of local-- not just the Google+ by business, but the actual [INAUDIBLE]. [ALERT TONE]

JOHN MUELLER: Yeah, I'd probably need to take a look at examples there.


JOHN MUELLER: It's always tricky when you're looking at things like Yelp and kind of other local directories. Because for some websites, those could essentially be their homepages. And maybe they don't have a big homepage of their own. Or maybe the homepage that they have is essentially just a PDF or a big image file that doesn't have any content that we can index. So to some extent, it makes sense to show some of those local directories, because sometimes those are kind of like the homepages for those businesses. But--

WILLIAM ROCK: What I want to show you is more how it's being spammed for doctors, especially. And those reports are actually showing up at the top of results as a negative. And basically what Yelp has done is they've actually told us that we have to pay for those links to be removed, even though they're actually fake names that are actually bashing on a doctor. So it doesn't make any sense. One was an old employee. And the other one was-- you know. And so it's easy to actually manipulate Yelp. And I'm looking at that as a false positive [INAUDIBLE].

JOHN MUELLER: Yeah, I'd probably look at that with them. I don't know. We don't really have much influence on them. But yeah, it's always good to send examples. So if you see something where we're picking something up incorrectly, where we're ranking them badly, send that along so that we can bring it up to the team and take a look and see what we can do to improve that.

JOSH BACHYNSKI: Hey, John. Do you have time for an entity search question?

JOHN MUELLER: A really quick one.

JOSH BACHYNSKI: Oh, OK. The quick entity search question is, how is this going to work together with the HTTPS signal? My concern is that I have all these social signals and stuff like that for the HTTP version. How are we going to coordinate that with the new HTTPS version? Because essentially, it's a domain move. And there's all kinds of SEO issues with domain moves, as you know.

JOHN MUELLER: Domain moves are a bit different because they really involve different host names. What we've seen with most of the social information like that-- at least Google+, the +1 button-- is that they essentially transfer completely from HTTP to HTTPS if you have the redirects set up appropriately. So that's something that I imagine will essentially just work. It might take a little bit of time for things to move over for the individual URLs, kind of like what you'd have with the move from www to non-www or the other way around. But essentially, the move from HTTP to HTTPS is a lot less critical than a domain move, and a lot less critical than even a subdomain hostname move from www to non-www. So that's something that I think works out fairly well. That's not something where there'd need to be anything special that you would need to do on your side, or that we'd need to significantly change on our side.
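The "redirects set up appropriately" mentioned here are typically site-wide permanent (301) redirects from the HTTP URLs to their HTTPS equivalents. As one illustrative sketch, on an Apache server with mod_rewrite enabled, that could look like this in an .htaccess file:

```apacheconf
# Permanently redirect every HTTP request to the same path on HTTPS
RewriteEngine On
RewriteCond %{HTTPS} off
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
```

Other servers (nginx, IIS) have their own equivalents; the key point is that the redirect is a 301 and preserves the host and path so each HTTP URL maps one-to-one to its HTTPS version.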

JOSH BACHYNSKI: Thanks, John. As always, I just want to say that I think you're great and these Hangouts are awesome. Thanks very much.

JOHN MUELLER: Thanks, Josh. All right. One last question. We're a bit over time.

AUDIENCE: Can I ask, John?

JOHN MUELLER: Sure, go for it.

AUDIENCE: Well, it's the same question I've had over the last few months really, on whether you've got time to look at-- you said we were waiting for an algorithm update for our site, which wasn't one of the major algorithm updates. But we've really seen no change, even though you've previously confirmed that everything was now fine with the site. There was no problems with it other than waiting for the algorithm to update. But--

JOHN MUELLER: I'll check, yeah. I'll check with the guys on that again. I don't know what happened, like what has gone on over the last few weeks. But--

AUDIENCE: Yeah, it's been a while since we spoke. Actually, it was on the last dedicated-- one of these where there were 10-minute slots, which I'm sure you remember. But we've since implemented hreflang to another site, specifically for the US, which seems to have bolstered it a bit. But the original is still nowhere near what it should be, in our opinion. It's still at least 60% down on last year. Now we're just not sure when-- I've just stuck that into the comments, the URL again-- when that's going to update.

JOHN MUELLER: OK. I'll double check, yeah.

AUDIENCE: --algorithm. And we won't find anything in the news because it's not-- as far as you said last time, it's not an algorithm that anyone will report on.

JOHN MUELLER: Yeah, OK. I'll double check with the team on that, see what we can do.

AUDIENCE: OK. All right.


AUDIENCE: Excellent.

JOHN MUELLER: So with that, I'd like to thank you all for your time. Thanks for all the good questions. And I hope I'll see you guys again in one of the future Hangouts.


JOSHUA BERG: Thanks, John. Great show.

AUDIENCE: Thanks, John.

WILLIAM ROCK: I'll see you, Josh.