Reconsideration Requests

Google+ Hangouts - Office Hours - 07 November 2014


Transcript Of The Office Hours Hangout

JOHN MUELLER: OK. Welcome to today's Google Webmaster Central Office Hours Hangout. We have a bunch of people here already. We had some lively discussions just before we clicked the button. And there are tons of questions submitted as well. So with the questions, I thought I'd do something a little bit different and skip over the Penguin-type questions, because we've been going through those over and over again. And I think there's lots of great information already around the Penguin algorithms in the earlier Hangouts. And there are probably some things that we've skipped over because we never got to them in the past. [MUELLER'S VOICE ON DELAY] I think you have-- you're probably watching it delayed. [MUELLER'S VOICE ON DELAY] All right. [MUELLER'S VOICE ON DELAY] OK, let's grab one here. What should we look at on our web pages to prevent low-quality content? For example, any technical specifications? For example, duplicate content on external sites without a link to us, and thin content? I think overall, when it comes to the quality of content, I would try to-- [MUELLER'S VOICE ON DELAY] I think you have the video playing in the background somewhere. When it comes to the quality of content on a website, I'd try to use technical methods as a tool, but not as a primary means of recognizing high-quality or low-quality content. So if there are things you can do on your website, kind of in an aggregate way, to-- [MUELLER'S VOICE ON DELAY] OK. There are things you could do on your website to kind of recognize where users are finding higher-quality content, maybe ways that you can recognize higher-quality content on your own. But I wouldn't use that as a primary way of making that distinction. So essentially, when it comes to the quality of your content, that's something that an algorithm can't really determine on your side.
That's something where I'd really recommend looking at it yourself, taking a step back, and looking at your website overall, to make sure that everything you provide on these pages really shows that this is the highest-quality content. And that goes beyond just the text. So you mentioned, for example, when people copy your text on other websites, or when there's thin content. It goes beyond just the black-and-white text you have there, and really includes everything on these pages, which could be things like the design, the way you set your site up, the navigation, all of that. When a normal user comes to your site, they bring all of those impressions about the quality of the website together, and that gives them a feeling of whether or not they can trust this website, whether or not the information there is actually real information, or if it's just something put together by some random blogger, for example. Not to say that random bloggers are lower quality, but if you're looking for something technical or medical, for example, then maybe you want to make sure that this information is actually from someone who knows what they're talking about. So really take a step back, and don't just look at the text; instead, really make sure that everything around your website tells the user that this is high-quality content, this is content that they can trust, this is a website, a business, that they can trust. All of that should come together.

AUDIENCE: Hi, John, can I ask a quick question on thin content?


AUDIENCE: It's kind of linked. I read a blog post this week where someone's content, which had previously been classified as thin content, had been reclassified as a soft 404. It was a test that they'd run. Is there a specific way that you classify what you think is thin content in Webmaster Tools?

JOHN MUELLER: So, Webmaster Tools essentially shows the information that our algorithms are pulling together there. So if our algorithms think that this is soft 404 content, then that will be visible in Webmaster Tools. I looked at that specific case as well, and I noticed that the thin content page was actually a blank page. And that's something where I'd say, OK, there's no content here; it looks like this page was put up accidentally; this is a perfect example of a soft 404 page. So it's not a matter of the content being thin, but of there actually being no content at all on those pages. Other examples that were there, I think, were the kind of no-results search pages, where you're searching for a tag or keyword. That's a URL that we could be indexing like that, but the page actually says, oh, there are no results found for this keyword or this tag. And that's also a typical soft 404 page, where essentially there's no content there, nothing that we should be indexing; we should be treating it like a 404. And that's kind of where these algorithms are headed, in the sense that they look at your site, they try to find pages that don't really belong in the index because there's actually no content on these pages. And then we take those out as soft 404s. And we bubble the information that the algorithms figured out up into Webmaster Tools, so that you have kind of an understanding of the pages that we took out. In a lot of cases, these are also technical issues on the website, where maybe it's a normal page that returns 200, but it says, oh, this page doesn't exist. And that's the kind of technical issue that the webmaster could fix. And by fixing that issue, we can essentially optimize our crawling a little bit, avoid having to recrawl pages that actually don't have any content, and focus more on the pages that actually do have content.

AUDIENCE: OK, no, that's fine. And I know you mentioned in a Hangout a couple of weeks ago, with regards to 404 pages, that if you don't want that 404 page to be shown anymore, you just let it filter out. Is it the same with soft 404s? So if you essentially don't value that page, like your tag pages where you've got no results for the search, you just allow them to fall out?

JOHN MUELLER: Yeah. Yeah. Exactly. Essentially, pages that we flag as soft 404, we treat as 404s internally. So they fall out of the index. We crawl them less frequently. We're essentially trying to optimize the results that are shown for your site there, and making sure that we're not needlessly crawling your site in ways that don't really make sense for your site.

AUDIENCE: OK. That's perfect. Thanks.
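The fix John implies for those no-results tag and search pages can be done on the site side by returning a real 404 status instead of a 200 with an empty template. A minimal sketch in Python; the function name and page structure are made up for illustration and don't belong to any particular framework:

```python
def search_response(query, results):
    """Return (status_code, body) for a hypothetical tag/search page.

    A page that renders "no results found" with a 200 status looks
    like a soft 404 to crawlers; returning a real 404 status matches
    what the page actually is.
    """
    if not results:
        # No matching content: send a real 404 instead of a 200
        # "no results found" page, so crawlers don't index it.
        return 404, "No results found for '%s'" % query
    return 200, "\n".join(results)
```

Served this way, an empty tag page falls out of the index on its own rather than waiting to be flagged as a soft 404.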

AUDIENCE: Hey, John, can I ask a question?


AUDIENCE: On that note, we started looking at some of the links reported in Google Webmaster Tools pointing back to our site. One common thing we came across was that a lot of websites use Google or Bing search results to generate pages of content. Here's an example. Those pages then get indexed by Google and get reported as links back to our site. That doesn't look right. And this is the most common type of spammy link we get back to our site: links from automatically generated pages of search results. I was wondering whether you guys have thought about this, whether you have any way of removing these from the index and not even showing webmasters those links pointing to their sites, because it's massive and it's junk.

JOHN MUELLER: Yeah. That's an interesting idea. I know we index these pages sometimes, and we show them as links in Webmaster Tools sometimes, but essentially these are pages that we're not going to count for or against your site, because we can recognize that they're basically linking to everything. And my recommendation there would be, if you see this kind of thing, just disavow the whole domain, and you're kind of done with the problem on your side.

AUDIENCE: Here's the interesting thing on this part. That was our initial reaction. This particular domain that I sent here in the chat, the main domain, is the sixth-largest website in Poland, and it's very reputable. And then they have all of these subdomains where all of this crap activity is happening. Would you disavow the subdomains and not the main domain? We're in the USA.

JOHN MUELLER: Yeah. If you think that there are good links on the rest of the site, definitely kind of split that up. But this is the kind of thing you could also submit with the spam report form, to kind of let us know about it. Because we do try to take these out, but we primarily just try to take action on them algorithmically, so that we can process them, we can index them, but we're not going to use those links for anything.

AUDIENCE: Cool. OK. So, one suggestion from the webmaster's standpoint: if you guys could just not report those to us at all, that could be a lot of help, because that's the majority of the spam we get.

JOHN MUELLER: That's an interesting idea. I think, in the past, the Webmaster Tools team has been reluctant to kind of filter the links by weight because then we'd give out some of our information about which links we count and which ones we don't count as much. But maybe if there's a way to recognize things that are really spammy and filter those out, maybe that would make sense.

AUDIENCE: Right. Or maybe a category of links. The same way you wouldn't report a Google search page as a link, and you don't index that, search-results pages on other domains, a whole category of pages, maybe shouldn't be reported even if the domain is very reputable. It's not even based on quality, but on the category of links.

JOHN MUELLER: It's tricky but I'll definitely pass that on to the team.

AUDIENCE: OK. Thank you.
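For the subdomain case discussed here, the disavow file format supports a `domain:` prefix, which can be scoped to a subdomain rather than the whole registered domain. A sketch with placeholder hostnames (the real ones would come from the links report in Webmaster Tools):

```text
# Disavow only the subdomains hosting the auto-generated
# search-result pages, not the reputable main domain.
domain:search.example.pl
domain:szukaj.example.pl
# A single spammy URL can also be listed on its own line:
http://inne.example.pl/results?q=widget
```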

JOHN MUELLER: How is Google handling websites in HTML5? There are multiple H1 tags and they're not in semantic order, meaning an H2 tag could be above an H1. What does Google want us to do? Do we just focus on one H1 tag, or multiple? Essentially, you can do it either way there. So we do understand HTML5 a little bit. We're not currently processing the individual tags from HTML5 in a special way, but that's something we do try to recognize there. And we also try to render these pages on our side, which helps us understand which ones are really headings and which ones aren't headings. So that's something where I wouldn't artificially dumb your pages down just because of Google. And instead, if you think that this markup makes a lot of sense for HTML5, then by all means, use that.

We still have a manual action for a hacked site and no idea why. Does Google send manual action hacked-site notifications because of A/B testing or any other tracking code? No, we don't flag A/B testing as hacking, unless you're A/B testing something that, on the one hand, looks like a flower shop, and in the B version, is a pharmaceutical site, for example. That's the typical situation that we run across with regards to hacked sites: you have your main site that's focused on one topic, and the hacked content that someone has kind of snuck into your site is focusing on something like the typical spammy pharmaceutical-type thing. And sometimes what we'll find is that these hackers are cloaking the content to Googlebot. So I'd definitely use the Fetch as Google tool to double-check the content on your site. I'd also double-check the content on your server directly. Like, maybe check your server logs to see what actually is being pinged, what is being accessed. Because sometimes these hackers also add the content to your site in a way that doesn't get indexed directly.
So for example, if someone is using email spam, and they're pointing at hacked content on your site that isn't meant to be indexed for search, then that email spam essentially still points at that hacked content on your site, which is something that you need to clean up. So I wouldn't focus just on, like, a site: query, for example. I wouldn't focus just on what you see in your browser, but instead try to double-check what Googlebot sees with Fetch as Google, and double-check your server logs to see if there are any parts of your site that aren't actually meant to be active at all, which might not be shown in search, but which are a sign that something is hacked or still broken. It's definitely not something where we'd say, you're doing A/B testing, and the one version has a blue background and the other version has a red background, therefore the site is hacked. So we really have to have found something really problematic on a site in order to flag it as being hacked.

There's a Penguin question I'll skip. And another one. I'm sorry. Question from Josh. In the wake of Google's current algorithms, is it worth trying to recover, aside from Penguin and Panda? If the business doesn't care about the domain at all, would it likely be faster just to start again with a new site, a new design, and a viral content strategy? I think this is something that you always have to kind of keep in your mind. In some cases, it might make sense to start over with a new website. Especially if the business has just recently started on a domain, then that's a decision that's probably easier to make. If you've been working on the same website for years, and you run across a situation that you can't solve, or can't solve in a reasonable amount of time, then maybe moving to a different domain makes a lot of sense. I'd try to avoid that as much as possible, because I know that if you've been working on one domain for a really long time, then it's a lot of work to start over with a new domain.
It's not something where you can just copy and paste the code and everything works as it did before. So I think this is always a tough decision; sometimes it makes sense, sometimes it's a little bit of an easier decision. But it's never something where I'd say there's a general answer that everyone should follow, like: if your site, I don't know, has low-quality content, then you should always move to a new domain. Because if you have low-quality content and you move it to a new domain, that doesn't really solve the basic problem. But it's always a tricky one.

Let's see. Domain A, with a manual penalty, is being 301'd to my client's site. We can't stop the redirect because we don't have control of it. Is the penalty affecting my client's site? If it is, how can we stop it? This is something that we tend to recognize fairly well, and that we tend to kind of ignore in our systems fairly well. So if you have a normal website and some random person is redirecting a banned website to your website, then that's not something that's going to be skewing our algorithms in any way. So that's not something we'd see as a site move. That's not something where we'd say these signals need to be forwarded to your site as well. Our algorithms are generally pretty good at being robust against that kind of manipulation.

I've heard you talk about noindexing pages of lower-quality content. For example, if we add a new author and aren't sure of their quality yet, do pages with original content that are noindexed generate the same PageRank as an indexed page? So PageRank is essentially based on the links to those pages, and if those pages aren't indexed, then they aren't really going to be collecting PageRank for the site in general. Because the links are going to a page that essentially isn't shown in search, so it's kind of disappearing there. So from that regard, that's something maybe to keep in mind.
On the other hand, that's also something where I'd say, if a lot of people are linking to pages that you think are lower-quality content, then maybe you need to adjust your definition of lower-quality content. If everyone is recommending these pages, maybe they aren't as bad as you thought. Maybe they're worth showing in search normally. So that's something you always have to balance and think about when you're cleaning up your site with regards to lower-quality content: what you want to do with those pages. Maybe you can combine a few of those pages together to create one version that's definitely really high-quality content. And in that case, you could 301 those pages to your new page that replaces them. But if you're just always noindexing new content, and you're seeing that lots of people are linking to some of that new content, maybe you need to think about that strategy again and change it slightly.

My client uses best-in-class content and links. But after two years now, we're running out of steam, with poor rankings despite decent social and brand presence; it's hard to keep any site going without some organic love. I haven't looked at the site, but I can copy the URL out and take a look afterwards to see if there's something that I can spot there. But in general, what I'd do there is maybe go to the help forum, maybe post somewhere else where you can talk with peers, and see if there's anything specific that they can spot, be it from a technical point of view or from a quality point of view, for example. I'm kind of worried that you mentioned they have best-in-class content and links, which sounds like you've been artificially building links to some extent. I don't know if that's the case, but that's something you probably want to watch out for and potentially clean up, if that's something that maybe a previous SEO has been doing.

AUDIENCE: Hey, John. On that point, your suggestion about conducting a survey can be very helpful for webmasters who have been working on a site for a long time and have become biased, as a way to get some outside feedback. It was certainly very helpful for us. So I would definitely encourage other webmasters to do it as well.

JOHN MUELLER: So I think you're mentioning the 23 questions blog post from [INAUDIBLE], from a couple of years back now. That's something that I think makes a lot of sense, especially if you've been focusing on your own website for a long time: give external people who aren't associated with your site a task to complete on your website, and have them go through those 23 questions, to figure out whether random, external people see any issues that you might be overlooking. Maybe have them complete the same tasks on your strongest competitor as well, to see where the differences might be. And it's always tough if you've worked on your website for a really long time, because it's your baby and you've gotten used to the quirks. And you think, oh, well, this is kind of complicated, but actually it's good for you, if only you would do it the way I recommend. But when you're talking to users, I think it's really important to take their feedback seriously, and to take a step back and accept all of this feedback, even if you don't necessarily agree with it offhand.

AUDIENCE: Yeah, absolutely. It was definitely very eye-opening for us, so I'd highly encourage people. And on that note, did you also get my message with the results that we got?

JOHN MUELLER: I didn't get that, no.

AUDIENCE: I sent it yesterday. So--

JOHN MUELLER: OK. I'll double-check again today.

AUDIENCE: Perfect. Thank you.

JOHN MUELLER: And we do that as well. So we are looking to visit various webmasters and agencies, SEOs, to see how they use our tools, to see where our opinion is skewed: where we say, oh, this is an obvious and easy-to-use tool, and when you talk to normal people, they say, I have no idea what to do here, this is all black magic, I just click around and hope it does something. So that's the kind of feedback that's painful for us to hear sometimes, because we think, oh, we understand everything and we have the needs covered. But it's important to get that kind of feedback so that we can move forward.

What's the best technique to shift from HTTP to HTTPS? A 301, I think, is good, but what's the use of a 307? We generally recommend a 301 redirect, because that's a permanent redirect, saying that all traffic should be moving to the HTTPS version. I'm not sure where the 307 is coming from; a 307 is a temporary redirect, so it wouldn't signal a permanent move. But I'd really make sure that if you're moving to HTTPS, you're moving everything to HTTPS there. This is something where these recommendations are still kind of new in general. Our push to encourage sites to move to HTTPS is fairly new, and there might be some issues that you run across which would be useful for us to hear about as well. For example, we've seen issues with regards to ads on your site. We've seen issues with regards to widgets that are either placed on your site, or that you might be providing for third parties to put on their sites; if they move to HTTPS and you're not on HTTPS, then that kind of clashes with the security features on their pages. So depending on the type of site, it might be trivial to move to HTTPS, and for other sites it might be really, really complicated, and not something that you'd want to do from today to tomorrow. So take your time. Make sure you're really covering all bases, and that things are actually working out for your users and with regards to search.
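For the 301 side of an HTTPS move, a server-level permanent redirect is usually the core piece. A minimal sketch in nginx configuration, with placeholder hostnames; the certificate setup and the HTTPS server block are assumed to exist separately:

```nginx
# Permanent (301) redirect of all HTTP traffic to the HTTPS
# version of the same host and path.
server {
    listen 80;
    server_name example.com www.example.com;
    return 301 https://$host$request_uri;
}
```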
In Webmaster Tools, under Content Keywords, insignificant words appear for my site. How can I change this? That's something I wouldn't worry about. So the Content Keywords are based on what we find while crawling your website. They're not a sign that we focus on those words for indexing or for ranking. So this is primarily something where you can double-check that the primary content of your site is actually being picked up, and double-check whether really unrelated words are being picked up. For example, if you have, I don't know, a flower shop, and you see pharmaceutical terms in there, then that's a good sign that maybe your site is hacked somewhere. But I wouldn't worry about it bringing up insignificant words, if it has things like www or HTTP in there as well, because these are just things that we picked up while crawling your site. They don't mean that these are the main words we use for ranking or indexing your site. And the same thing applies to the singular and plural words that you mentioned, and to words in Latin or non-Latin characters. Essentially, this is what we pick up while crawling. It doesn't mean that we use it for actually ranking your site.

AUDIENCE: John, Can I just ask another quick Webmaster Tools question?


AUDIENCE: We've got a site that we recently inherited where, essentially, in Webmaster Tools, when we go into the sitemap section, it's showing that we've got 1,600 URLs submitted in sitemaps, but only one indexed. All of the URLs within that sitemap are fresh; they're all new. We do a search in Google itself and it's showing that there are 5,000 results for that domain, so URLs are actually being indexed. And the URLs in the sitemap that are reported as not indexed actually show up when you search in Google. So it just seems there's not really clarity between the two. I didn't know if that was something that you've come across before.

JOHN MUELLER: Usually that means that the sitemap URLs don't match the exact URLs that we actually index. So, for example, you might have www and non-www mixed up there. You might have HTTP and HTTPS mixed up there. You might have URLs with tracking parameters in the sitemap, and we index them without the tracking parameters. But the sitemap statistics are, I'd say, the one module in Webmaster Tools that I feel is as current as possible. It works really, really well, we calculate the indexed counts every day, and it's something I'd say you can really rely on. So if you're seeing that kind of mismatch there, then to me that's a sign that either the sitemap file is completely new-- maybe it was just submitted today and just hasn't been recalculated yet-- or, most likely, that you're submitting some URLs in the sitemap file that don't match the ones we actually pick up for indexing. So what I would do is open the sitemap file in a text browser or text editor, for example, copy some of the URLs out, and do an info: query on Google for each specific URL, to see which version we're actually picking up for indexing. And you might see that there are extra parameters in one or the other version. Maybe there's an upper- and lowercase difference in the URLs somewhere. All of those things add up, and essentially, if it's not a one-to-one match, then we don't count it as being indexed. And that's also a sign that your sitemap file probably isn't optimal the way it is. Because when we crawl your website, we pick one version of the URL, and if your sitemap file says something different, it's kind of clashing with what we figure out during crawling. So making sure that it's all consistent really helps us to trust both of those versions a little bit better.

AUDIENCE: OK, no, it's fine. I'll do some digging. Cheers.

JOHN MUELLER: Sure. The same thing applies to images. We see that as well, where you'll submit an image sitemap file and you'll see a really low count. And looking at the image URLs that are actually in the sitemap file, you can often see that they aren't the ones that are actually embedded on the pages. So it's kind of normal to see a disconnect between those counts. But the more consistent you can make it, the easier we can actually focus on that.
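One way to pre-check the exact-match issue John describes is to compare each sitemap URL against the version that actually gets indexed, after stripping the usual suspects (www, tracking parameters). A rough Python sketch; the set of tracking parameters is an assumption and would need adjusting per site:

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical tracking parameters to ignore when comparing.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid"}

def normalize(url):
    """Reduce a URL to the parts that matter for a sitemap-vs-index
    comparison: scheme, host (without "www."), path, and any
    non-tracking query parameters."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    query = sorted((k, v) for k, v in parse_qsl(parts.query)
                   if k not in TRACKING_PARAMS)
    return (parts.scheme, host, parts.path, tuple(query))

def likely_mismatch(sitemap_url, indexed_url):
    """True when the exact strings differ, which is what the sitemap
    indexed count keys on, even if the normalized forms agree."""
    return sitemap_url != indexed_url
```

A pair like `http://www.example.com/page?sessionid=abc` (in the sitemap) and `http://example.com/page` (indexed) normalizes to the same thing, yet still counts as "not indexed" in the sitemap report, which is exactly the disconnect described above.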

AUDIENCE: John, question back on the previous question about HTTPS, did you ever put together that little chart for HTTPS and hreflang and--?


AUDIENCE: I'll take that as a no.

JOHN MUELLER: No, no, no, no. I actually prepared it.

AUDIENCE: Oh, you did?


JOHN MUELLER: --sharing that.

AUDIENCE: --share it at the end or something.

JOHN MUELLER: Here we go. So, I think-- let me double-check. Yeah, I think this is right. I just finalized this earlier today. So essentially what you have is different language versions, in this case English, French, and German, and you have the HTTP and HTTPS versions. So this is essentially the version that redirects, or has a rel canonical, to the HTTPS version, and between the canonical versions you have the hreflang markup. So this is the same with mobile and desktop pages, where you'd have your mobile page here that redirects, or has a canonical, to your desktop page. And the important part is that the hreflang tag is between the versions that we actually pick up for indexing. So if you have a redirect or rel canonical to the HTTPS version, that's the one that you should be using for the hreflang. If you have it set up differently, so that the redirect is pointing in the other direction, then of course you have a different version that's actually indexed, and that's the one you should be using for the hreflang. Does that make sense?

AUDIENCE: John, can I step in with a question related to this one?


AUDIENCE: I mean, so you say that the HTTP versions shouldn't have hreflang between them because they already use the rel canonical links, right?

JOHN MUELLER: Yes. I mean you can have them there. But we would be ignoring them if that's not the version that we pick up as a canonical.

AUDIENCE: I understand. Because it's easier to-- when you implement the hreflang, you can implement it all over. It's much easier than implementing it on half of the website. And I was just asking if it hurts.

JOHN MUELLER: No. If you have the hreflang between pages that aren't canonical, we basically ignore it. So the important part is really that it's at least in there with the canonical URLs. If you also have it on other ones, that doesn't really matter that much, because we ignore it; but the ones that you do want to have reflected should be between the canonical URLs. And that also includes things like tracking parameters, for example. So if you have any URL on your website that has a canonical that's slightly different, then make sure that the hreflang is between the canonical versions, and not that tracking-URL version that you don't want to have indexed.

AUDIENCE: I understand. Does this extend to the pagination part? I mean, having rel next and rel prev along with the rel canonical on those pages?

JOHN MUELLER: If you have the pagination set up that it makes sense for that, then you can use that. I don't know if that makes sense for all pagination in general. Because maybe you have like different pagination for the French or the German version just because the text is much longer or much shorter. So that's something you'd want to watch out for. But if you have really equivalent URLs and one version that's indexed like that, sure you can use hreflang.

AUDIENCE: Yeah, well, I can give you a short example. You have two types of listing the products on your page. One is list view and the other one is grid view, with much larger photos and so on and so forth. We already have the pagination implemented on both versions. But the rel canonical is used just to point out to Google that I want only the list view to be indexed and to get the visitors. And since I've implemented it at the site level, the pagination is on all versions. So that's why I was asking about mixing up the pagination with the rel canonical, and whether it hurts.

JOHN MUELLER: If these URLs aren't being indexed, if they have a rel canonical pointing somewhere else, then that hreflang is essentially ignored. So it doesn't hurt your site to have that there. But at least I'd make sure that between the URLs that you do want to have indexed, like the canonicals and the version of that list that you want to have indexed, make sure that it's at least there and that it's consistent there. So not that you've got the hreflang pointing from the grid version to the list version because then they're not equivalent pages. But it sounds like you have that covered.

AUDIENCE: Correct. Yeah. Thank you very much.
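The setup in John's chart can be written out as markup. A sketch with placeholder URLs: the hreflang annotations live on, and point to, the canonical HTTPS pages, while the non-canonical HTTP variants only declare their canonical:

```html
<!-- On https://example.com/en/ (the canonical English version).
     hreflang points at the canonical HTTPS URL of each language
     version, never at HTTP or tracking-parameter variants. -->
<link rel="alternate" hreflang="en" href="https://example.com/en/">
<link rel="alternate" hreflang="fr" href="https://example.com/fr/">
<link rel="alternate" hreflang="de" href="https://example.com/de/">

<!-- On http://example.com/en/ (the non-canonical HTTP version),
     only the canonical pointer is needed; any hreflang here
     would be ignored anyway. -->
<link rel="canonical" href="https://example.com/en/">
```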


AUDIENCE: John, back on the sitemap questions. For a site that has had, let's say, many or most of its pages dropped from the index, and then the sitemap was greatly trimmed down, and then even more pages were dropped-- I mean, we're talking about an e-commerce site here. Is that probably a strong indicator of quality issues, barring other technical problems, because those pages were just maybe not considered useful enough for the index?

JOHN MUELLER: So, I don't quite understand the question. So you're saying that you have a site that has a lot of pages, those pages are also in the sitemap file, and we've gone from indexing all of the site to a much smaller part of the site?

AUDIENCE: Yeah. Yes. Some people who were working on this site prior to my looking at it had trimmed the site down somewhat, and they'd also set different parameter settings, which was causing thousands of pages to be shown. But then it went-- oh, and they also had images in the same sitemap file, marked as files instead of images, which was showing an excessive amount. But after they fixed all that, it still went down, and the number of indexed pages went from thousands to less than 100, and then even down from there. There were some quality issues as well, so--

JOHN MUELLER: It's hard to say. It sounds like that's a combination of different problems, which might include some technical issues as well, depending on how they trimmed the site down. Maybe they put a noindex on certain parts of the site. Maybe that noindex is being propagated to parts of the site that you don't want to have noindexed. So what I tend to do in a case like that, when I run across it in the forums, is take the sitemap file and double-check that it's representative of the website, that it really includes the URLs that should be indexed, and then take individual samples from the sitemap file and see if they're indexed. Double-check that those pages are actually technically OK: that they don't have a noindex, that they're not 404s, that they're not redirecting, that they don't have a broken canonical tag on them, all of those things. And usually, by looking at a sample of pages, especially when you're talking about something going down from thousands to less than 100, you'll find that's usually a sign of some systematic problem that you can pick up from those samples.

AUDIENCE: And in the index, there are still a considerable amount of links that are showing up, but they're not showing up as indexed under the submitted sitemaps in Webmaster Tools.

JOHN MUELLER: So again, the sitemap information in Webmaster Tools is based on the exact URLs that are in the sitemap file. So if you do a site query and you see 100,000 URLs there, that might be because we're indexing URLs that aren't a one-to-one match with your sitemap file. So if there's this kind of disconnect between the sitemap file and the canonicals, or the indexed versions that we happen to pick up on, then that's something you would see in Webmaster Tools there: the submitted URL count is fairly high, but the actual indexed count is significantly lower, which usually means that you're submitting URLs that don't match what we pick up for indexing.

AUDIENCE: Yeah, there were some conflicting issues with the canonical tags, actually.

JOHN MUELLER: That's something where I'd take the sitemap file, and just split it up into parts. You can submit the parts of the sitemap file separately in Webmaster Tools as well. And maybe you can find a section where it says, submitted 100 and indexed only one, and that means you can probably take pretty much any URL from that sitemap file and kind of double-check or diagnose what's going wrong based on that one.
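The splitting John suggests can be scripted. A minimal sketch of chunking one urlset into several independently submittable sitemap files (the function name and chunk size are arbitrary; real sitemaps would be written out to files rather than kept in memory):

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def split_sitemap(xml_text, chunk_size):
    """Split one <urlset> into several smaller urlsets of at most chunk_size URLs."""
    root = ET.fromstring(xml_text)
    urls = [loc.text for loc in root.iter(f"{{{NS}}}loc")]
    chunks = []
    for i in range(0, len(urls), chunk_size):
        urlset = ET.Element(f"{{{NS}}}urlset")
        for u in urls[i:i + chunk_size]:
            ET.SubElement(ET.SubElement(urlset, f"{{{NS}}}url"), f"{{{NS}}}loc").text = u
        chunks.append(ET.tostring(urlset, encoding="unicode"))
    return chunks

# Hypothetical five-URL sitemap split into parts of two.
xml = (
    f'<urlset xmlns="{NS}">'
    + "".join(f"<url><loc>https://example.com/p{i}</loc></url>" for i in range(5))
    + "</urlset>"
)
parts = split_sitemap(xml, 2)
print(len(parts))  # → 3
```

Each part can then be submitted separately in Webmaster Tools, and the part with the worst submitted-versus-indexed ratio points at the URLs to diagnose.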

AUDIENCE: All right, good. Thanks a lot.

JOHN MUELLER: Sure. Question here. If a blog post was published in 2013 and then updated at a similar time in 2014, should we note the date when it was updated on the page, or should we change the original published date to keep it looking fresh? So first of all, you don't need to make it look fresh. We'll usually recognize these kinds of changes ourselves. But I'd still update it, or maybe put an updated date on the blog post itself, so that it's clear that it was updated. I'd definitely also change the last modification date in the sitemap file, if you're using a sitemap file there, so that we can kind of pick up those changes, index them fairly quickly, and recognize that. But again, this is not something you need to do to artificially make it look fresh by putting a new date on something that hasn't changed at all; rather, it kind of helps us to understand what's happening on these pages. So we'll usually recognize those kinds of changes anyway, but having a date makes it a little bit easier to pinpoint those changes and figure out, for example, how frequently we should be crawling this page. So if this page changes every day and we thought it just changed monthly, then maybe we need to recognize that more easily and start crawling more frequently.
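For reference, the last-modification date John mentions is the lastmod field in the sitemap entry. The URL and date here are made up for illustration:

```xml
<url>
  <loc>https://example.com/blog/some-post</loc>
  <!-- bump this only when the post actually changes -->
  <lastmod>2014-11-07</lastmod>
</url>
```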

AUDIENCE: Hey, John, how do you deal with scrapers of blog posts? I always thought that the solution, to help you guys not get tricked, is to file a DMCA. And if the DMCA gets approved, then you guys would not consider that. But last week you told me that even URLs that have been approved by your DMCA team are considered in the index, and therefore I'm assuming that they're taken into consideration in Panda. Since we don't have the ability to use disavow for that, how do we have the ability to inform you of scrapers?

JOHN MUELLER: I wouldn't worry about scrapers when it comes to our quality algorithms. There, we focus mostly on your website directly. So scrapers are always happening out there. There are always copies of the content anyway, sometimes for technical reasons. Things like www versus non-www are a really common example. A lot of websites have multiple domains where they mirror the same content. And those are issues where we wouldn't say this is a quality problem per se, because essentially the quality on your website might be perfectly fine, but the quality of the content on the scraper site is probably seen as being pretty bad. So that's not something you need to worry about when it comes to the quality algorithms. I'd still-- I mean, if people are copying your content and using it in ways that you don't want them to, I'd still look into the DMCA process to see if that works for you there, just to help clean those kinds of issues up. But it's not something where I'd say you need to do this to have the Panda algorithms look better on your website.

AUDIENCE: But can you elaborate a little bit on that? If Google doesn't understand who the original source of the content is, and the same content is spread across sites, wouldn't that dilute the quality of the content on your site? I had an incident-- someone recently showed me an example of a website that had a manual action, and one of the three examples on the manual action was a scraper.

JOHN MUELLER: That sounds like--

AUDIENCE: I'm pretty sure an algorithm can get tricked.

JOHN MUELLER: Yeah, I mean that sounds like something where, from a manual point of view, we should have used better examples. And that happens. We pull together these examples manually, so that kind of thing can happen there. But when it comes to the original content, theoretically we could run across a situation where we rank a scraper higher than an original site. But I think, for the most part, that's been working fairly well. And I haven't seen a lot of specific complaints about that recently. I know in the past it was a little bit different. But we've worked really hard on being able to recognize original content and treat it appropriately. So if you run across situations where scrapers are ranking higher than your own content for generic queries, that's something that's really useful for us to have. But I believe, for the most part, we're catching those cases appropriately.

AUDIENCE: And Panda doesn't consider it low quality if it sees the same content you have all over the place?

JOHN MUELLER: Yeah. I mean, like I mentioned before, it's not just about the text. It's not just like being able to copy and paste the text into Google and say, there's only one result for this text. It's essentially about everything around that site, the way that you present your content.

AUDIENCE: Good, thank you.

JOHN MUELLER: I think there was a question from Carlo.

AUDIENCE: Yeah, hi there, John. What was the question I put in the queue, the Q&A, or the question I put in the chat?

JOHN MUELLER: I don't know. I don't have the chat open. I don't see it.

AUDIENCE: I put one in the chat when we first started the Hangout. It's just my [INAUDIBLE]. I know we're not talking about Penguin, but since you've asked me for my question, I'll chuck it in there. I've seen a good recovery, which is fantastic, and I'm really happy about that. But obviously, it's not actually for the keywords where I think my website is relevant. So I'm just wondering if we're still being held back by Penguin to some degree. I'm guessing Penguin has levels of strictness. Have we done enough to remove most of that, or are we now totally clear? It would be fantastic just to know, so I can basically move on from the whole incident.

JOHN MUELLER: Can you post your URL in the chat?

AUDIENCE: Yes, it's in the chat. It's the first question. If you just go straight to the topics there.



JOHN MUELLER: I don't know if I can pull up something real quick. We're still seeing some issues there. So I'd have to take a look at exactly what's happening, but I think that's something where I wouldn't count the web spam issues-- maybe the links that you found-- as being completely fixed. And maybe dig for some more to see if there are some more things that you can clean up there.

AUDIENCE: Always digging. Always digging. If you ever do get a chance to have a look, if you could send me a sample link or two and give me another pointer-- maybe I'm missing something totally obvious, I'm honestly not sure. But I was making a great recovery, so I've obviously done something right.

JOHN MUELLER: Maybe disavow.

AUDIENCE: Maybe it's a problem with disavow, I'm not quite sure.

JOHN MUELLER: Sounds like you're on the right track then, but we're still seeing significant things there that you could probably be doing better. But it's hard to say without looking at the exact situation.

AUDIENCE: I appreciate you taking a look. Thank you.


AUDIENCE: John, I think a mirror would make a really good background for you there.


AUDIENCE: Like, you get those tools?

JOHN MUELLER: Oh. I mean, we have various diagnostic tools internally to figure out where things are going wrong, what could be happening there. I guess for the most part, the tools I would use in the forums, for example, are more focused on technical issues, because that's the type of issue that I think most sites tend to have. That's the type of information you get in Webmaster Tools as well. So for the largest part, that's something people already have available, if they knew where to look. And in many cases, they just don't know where to look. Or they kind of miss the email where Webmaster Tools says, hey, you noindexed your home page-- are you sure that was what you wanted to do? And we see a surprisingly large number of sites that you would think know how to make good websites make mistakes like that.

AUDIENCE: Many people don't use Webmaster Tools sufficiently. I really like the new mobile usability section. That's really useful. Do you think we might see some of those become ranking factors?

JOHN MUELLER: I don't want to pre-announce anything there. But I could definitely see that as being useful. So especially if you're on a smartphone and we show you a site that we know ahead of time doesn't work really well on smartphone. That seems like something we could be doing better. So I wouldn't be surprised if something like that happened. But at the same time, I don't have anything specific to announce yet.

AUDIENCE: But because those things affect users' usability, dwell time, and things like that, in some roundabout way, they at least do have some effect on--

JOHN MUELLER: There's always that indirect effect of a user going and clicking on your site, and saying, oh, I can't use this at all. This is crap. I'm never going to tell my friends about this. Whereas, maybe they go to another site and they say, oh, this works fantastically on my phone and I always browse the internet on my phone on my commute. So this is something I should tell my friends about. So that kind of indirect effect is definitely there. And as more and more people are using smartphones as their primary internet device, that's going to become really strong, I'd imagine. But taking that into account as a ranking factor directly is probably a whole different ball game.

AUDIENCE: Yeah. All right. Thanks.

JOHN MUELLER: All right. Question from Gary. Your city office co UK ranks number one for virtual office in Wolverhampton. I'm probably pronouncing that wrong. Oh, Gary's not here. OK. In Google com--

AUDIENCE: I have him on Skype, John. I'm his proxy today.

JOHN MUELLER: OK. I changed the hreflang so that it ranks for-- the site dropped to position five. Does your city office dot com still have some penalty issues? I think when you change things like hreflang back and forth like that, you'll probably see some effects in the search results. It's always possible that we also treat these domains slightly differently, based on maybe older issues that are still kind of lingering there. So I think if you're seeing changes like that, one possibility is to kind of leave it like that and let it settle down. Another possibility is to say, OK, this isn't working, I'll just revert it back to the way it was before, because that was working a lot better. We just changed all the content on our website. What if our competitor or anyone else copies our content? How does Google know which site put it up originally and who is copying? Do we have to change our content again because someone copied our new content? No. This is something that we tend to pick up on fairly well, I think. We tend to recognize original content sources fairly well. So if you're seeing people copy your content, sometimes you can do things like a DMCA process, depending on, I don't know, the type of content and how it's legally handled. That's one way to kind of take that content out of Google. Maybe you could also apply that to the hoster and they'll take action as well. But in general, we try to recognize original content fairly well. And it's not the case that you have to constantly tweak your own content to stay slightly different from scrapers. And some of these sites essentially aren't set up as scrapers, but rather as, what do you call them, proxy servers, for example, that essentially pull up the live content of another website. And that's something we also try to recognize ourselves and handle appropriately.
So we wouldn't necessarily take those kinds of sites out of the search results completely, because maybe they're useful for people who aren't able to access your site directly. So for instance, if your content is, let's say, blocked in one country and a proxy makes it available for users in that country, maybe that's a good way for those users to at least get to your content. But that doesn't necessarily mean that this is going to cause problems for your website just because someone else is proxying your content somewhere else. So I wouldn't focus too much on these copies. If you think there's something legal that you can do about it with a DMCA process, that might be a step you could take. I can't give you legal advice there, so I don't know if that always applies to your website. But past that, I wouldn't, like, artificially try to constantly change my content just to not match what scrapers have.

AUDIENCE: John, on that point, can Penguin-- I know you said Panda doesn't get tricked, but can Penguin get tricked by seeing something as a guest post when it appears that someone went and posted this piece of content, and it has rich anchor text as well, because it was intended for internal use?

JOHN MUELLER: I don't know. I'd have to see some examples to look at that. That sounds like a pretty edge case. But I am happy to take a look at some examples if you spot something like that. I haven't run across that kind of content re-use yet. But it's something where we understand that this situation is sometimes problematic, and we try to take it into account appropriately. Usually it's more problematic if this is really done on a large scale. If we see tons of guest blog posts, for example, that are kind of trying to pass PageRank to your website, then that starts looking a little bit sneaky.

AUDIENCE: But how would we spot it? We wouldn't know if it's holding us back or not.

JOHN MUELLER: I wouldn't worry about that. Like with all problems, if you spot that and you recognize that this is something that could be seen both ways, then I would just put it in the disavow file and move on. That's, I think, one of the advantages of the disavow file: these things don't need to keep you up at night. If you disavow them, you essentially have it solved.
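For reference, the disavow file being discussed is just a plain-text list, one URL or domain: entry per line, uploaded through Webmaster Tools. The entries here are placeholders:

```text
# Links we could not get removed, reviewed 2014-11-07
domain:spammy-article-directory.example
https://some-blog.example/guest-post-with-rich-anchors
```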

AUDIENCE: John, like would you disavow something like this?

JOHN MUELLER: I'd need to take a look at the links directly. That looks like a doc file, like a Word file.

AUDIENCE: It's a Word file with links inside it. So they copied one of our articles. It has a lot of rich anchor text because it was for internal use and they're legitimate. But they copied--

JOHN MUELLER: Yeah. I wouldn't worry about something like that.

AUDIENCE: You wouldn't disavow it?

JOHN MUELLER: No. I mean, like I mentioned before, if you're unsure, if you ever run into situations where you think, ah, should I do something about this or not-- if you put it in your disavow file, it's essentially solved. So you kind of take that load off your shoulders and you can sleep better at night, just knowing that it's essentially fixed. You don't have to worry about it anymore.

AUDIENCE: Right. But it creates an additional burden, whether we need to be doing that and checking to find all of these places or not.

JOHN MUELLER: If you see them and you think they're problematic, I'd take action, but otherwise, if you're, like, considering watching your links every day, then that's probably overdoing it. So just because you're worried that these things might show up isn't something that I'd say you need to be kind of tracking all the time. OK. Are disavow files constantly read, or are they only read on a Penguin refresh? Oh no, another Penguin question. We do process these automatically all the time. So essentially, you submit them and they're live. You see the error status right away in Webmaster Tools and, the next time we crawl the URLs that are mentioned there, we kind of discount the links that are on those pages. So that happens automatically. That happens right away when you submit the files. The Penguin data just essentially takes into account the data that has been collected until then, and reflects that with the next algorithm update. But essentially all our other algorithms that take [INAUDIBLE] into account already treat those as being nofollowed, or similar to nofollowed, so that's something you don't really need to worry about. Another Penguin issue about a site. I'll copy that out, just to make sure we don't miss that. Maybe look into it. Is there a potential duplicate content issue if I add client testimonials given to me on LinkedIn to my website? I'm not doing this for SEO reasons. The testimonials are for people who come to my website but don't visit me on LinkedIn. Sure. You can do that. That sounds like a good idea. And this isn't something where I'd say duplicate content will penalize your website. Essentially, this is the kind of content that makes sense to put on a website. So, by all means, put that on there, make it easier for people to understand who you are and how you do business. I think that always helps. How often does the new mobile usability feature in Webmaster Tools update?
I have 61 pages with errors. I think I've fixed those now and would like to know if what I did worked. I believe it's updated every couple of days at the moment. We might be able to speed that up a little bit. I imagine in the beginning it's a little bit irregular with the updates, but that's something that should become fairly regular, like the other features in Webmaster Tools. What if we have a sitemap that only contains a part of a website-- just the product pages, for example? That's fine. If you want to let us know about those pages, a sitemap file is a great way to do that. It doesn't have to contain the whole website. We're not going to drop any pages from our index because they're not in the sitemap file, or no longer in a sitemap file. It helps us to know about all the pages on your website if you can do that. But if you can just do it for a part of your website, that's fine too.

AUDIENCE: John, did you say before that hreflang takes time to settle?

JOHN MUELLER: Any time you make changes with regard to which URLs are shown, which URLs are kind of chosen as canonicals, that can take a bit of time to settle down. So that's not something where you'd see the change immediately.

AUDIENCE: So there could be fluctuations that you would see. Even though I've seen the change and the swap happen, the dot com is obviously the powerful site-- the co dot UK doesn't really have any links or anything going to it. So I'm surprised to see the dot com, that's had the penalty previously, not kind of [INAUDIBLE] swap out one for one with the co dot UK. And the only signal would be that your city office dot co dot UK is a co dot UK signal for a UK inquiry. Otherwise, there's really nothing that's different at all about the two sites.

JOHN MUELLER: Yeah, it's hard to say. I imagine we have various signals that we kind of aggregate together to try to figure out how we should handle that. For example, sometimes we run across sites that use hreflang incorrectly, and we need to be able to recognize that and treat them slightly differently. So that's something where I imagine, to a large extent, it'll just flip over when we kind of pick up that new canonical. But some of that information also takes time to settle down and to be kind of solidified, to make sure that this is actually trustworthy information that we can use like this. So there's always some-- like, you'll see a jump happening at one point and then some settling down until it actually reaches the point where it matches all of the signals that we have and everything's kind of consistent.

AUDIENCE: Sure. So I should just sit on it for maybe a couple of weeks and see what happens. And if it doesn't work out, then I know your city office dot co dot UK is the stronger signal.

JOHN MUELLER: I leave that up to you. If this is something that's causing serious problems, by all means revert it. If it's something that you feel comfortable with testing or trying out, then maybe you ought to leave it like that. I don't want to kind of make that decision for you.

AUDIENCE: Yeah. No. The idea is that I want to have my dot com site ranking for the hreflang EN inquiries, or English inquiries, and my co dot UK to manage the EN-GB inquiries. But I'm not confident to make that switch after just testing this one page, which we don't consider to be vital to our business. So that's giving me the feeling that I don't want to now switch my dot com back to all EN inquiries, because the difference between being number one and number five is somewhere in the region of about 90% of the traffic.

JOHN MUELLER: Yeah. I mean that's a good reason to do individual page tests like that. So that's one way, if this is not a critical page for your site and you can live with it kind of taking a while to settle down, that will give you information about whether or not it makes sense to do this move for the whole site or maybe to revert that, even for that testing page.
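The dot com / co dot UK split being tested corresponds to hreflang annotations along these lines. The domains and path are placeholders, and each page in the pair needs to carry the same set of annotations, including one pointing at itself:

```html
<!-- on the .com page, and mirrored on the .co.uk page -->
<link rel="alternate" hreflang="en" href="https://www.example.com/virtual-office" />
<link rel="alternate" hreflang="en-GB" href="https://www.example.co.uk/virtual-office" />
```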

AUDIENCE: All right. Thanks, John. I appreciate it.

JOHN MUELLER: Sure. And here's a not-Penguin question. Can you confirm if there's a Google sandbox and, if so, what the approximate time span is for a relatively new website in a highly competitive niche to get out of it? We don't have a sandbox specifically. It's not something where we'd say we have an algorithm that holds back all new websites. But there might be some effects that look similar. So it really kind of depends on what website you're looking at, what kind of problems you're looking at there. For the largest part, if we can recognize that a new website is really fantastic and really something that everyone loves, that we should be showing more, then we'll try to reflect that right away. And that could mean that we show this site within hours of going live, even for competitive queries. But that doesn't mean that all websites will be shown highly ranked hours after they go live. For some sites, it might just take a long time. For some other queries, for example, it might be that we just have so much information about the existing search results and no information about your website that it's really hard for us to judge where we should be placing it. And sometimes what we'll see is that we'll make guesses about where we should be placing this website, and over the course of a couple of weeks or so, it'll kind of settle down to the place where we think, OK, this really is a place where it matches. And you might see that it kind of goes up a little bit, or you might see that it starts a little bit higher and then goes down a little bit. That's the kind of thing that's really hard for us to judge if we see a really new website that we don't have a lot of information about. So that's something that could be seen as kind of a sandboxy effect. But it's not the case that we have any algorithm that says, oh, you're a new website, therefore you're going to be locked down to page number 10 for the first couple of months. Because we think that doesn't really make sense.
That doesn't really do justice to the new websites that are coming out there, some of which might be really fantastic and kind of blow away the existing competition in some of these areas. So we try to find the right approach there, but with new websites it's sometimes tricky to find the right place to put them. I'd like to know how you guys rank images when searched for on Google, and how do you know whether the images submitted in the sitemaps are original and not downloaded from Google? We do try to recognize original images, just like any other kind of content, and treat them appropriately. I don't know what the specific ranking factors are for, like, image search, because I imagine it's slightly different from web search-- it's a lot harder for us to recognize what an image is actually about. In general, what I'd recommend doing is making sure you have some kind of unique and valuable landing page for your images, so that when you have something really interesting on your website, a really interesting image, we can kind of pick up that image and understand what that image is about based on the context of that page. So problematic, for example, would be if you just have a gallery page, where you have a grid of hundreds of images and there's no information about those images. Much better would be, for example, to have some kind of an image landing page where the image is in the foreground, and there's some text about that image that tells us what the context of this image is. And that happens naturally, for example, if you put an image into a blog post. If you have a blog post about a topic and you have your nice image next to that blog post, we automatically see a lot of information and can associate that with the image. So the easier you can make it for us to associate information on your pages, usually textual information, with the image, the more likely we can use that for image search as well.
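If you do list images in a sitemap, Google's image sitemap extension lets you attach exactly the kind of textual context described here. The URLs and captions below are invented, and in a real file the image namespace declaration would normally sit on the enclosing urlset element:

```xml
<url xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <loc>https://example.com/photos/canal-at-dusk</loc>
  <image:image>
    <image:loc>https://example.com/img/canal-at-dusk.jpg</image:loc>
    <image:title>Canal at dusk</image:title>
    <image:caption>Long-exposure shot of the old canal basin at dusk.</image:caption>
  </image:image>
</url>
```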
And again, that's the extreme example: no information at all on a page versus information on a page. That's one thing to keep in mind. Sometimes, with regard to original images, we also kind of have that clash where maybe the original photographer puts it up on a gallery page on his site, and the page just has a title like DSC and some long number, and in the description below it basically has the attributes of the photograph-- what aperture it used, what speed it was taken at, what device it was taken with-- but no real information about the image itself. And if someone were to take that image and put it into a blog post, then we'd have a lot more information about that image, based on the blog post. So if someone goes to image search and searches for something that would bring up this image, and we have that photographer's landing page that has absolutely no textual information, compared to a blog post that has a lot of information, then chances are we'll show the blog post instead of the original, maybe the photographer's landing page, just because that has the information that the user was searching for. And that provides us additional context about what this image is about. So that's something to kind of keep in mind. If you're putting original images up, make sure that the context of those images is available there as well-- both for us, so that our search engines can understand it, and also for the user, so that when they click on the image they actually understand what this image is about and see something around that image. OK, wow, we're way over time. And lots of questions left, even by skipping the Penguin questions. Wow. Let's just take one last question from you guys and then we'll--

AUDIENCE: Can I have the first one, or send it to someone else? So my question is that we have a UK site as well, which does essentially exactly the same thing. But obviously the content is different, because it's all UK activities rather than US. There's not really anything we do differently in the UK than we do in the US-- we create unique content, we've got a big social presence, et cetera. Am I at risk of that one having exactly the same issues?

JOHN MUELLER: I don't think so, no. No. That should be fine.

AUDIENCE: That was very quick.


AUDIENCE: Hey John, can I ask a question?


AUDIENCE: It's in fact related to HTTP and HTTPS. Maybe you've answered this earlier, but my question is whether it is really important for all websites to move over to HTTPS, because I'm reading some articles around the web suggesting that after moving to HTTPS, they are actually facing problems and their [INAUDIBLE]. So would you comment on this, regarding the importance of moving to HTTPS?

JOHN MUELLER: I think in the midterm, you'll see more or less most sites moving to HTTPS. That's a development that's not going to be really held back by any of the other issues. To a large extent, moving to HTTPS shouldn't have any negative effect on your search results. We have seen individual cases where it didn't quite work out like that and that's the kind of a thing where we need to have feedback and make sure that our algorithms are doing the right thing. But for the largest part, when we look at our kind of aggregated information and check the sites there, that seems to be working really well. So I think in the future, more and more sites are going to be moving to HTTPS in general, and they're going to be starting on HTTPS directly. So that's kind of a development that's not going to change. It's not like a trend. It's not something that's fashionable today, that won't be fashionable next week. That's a development that's just going to happen. And whether or not you want to be like on the forefront of that and move along with that, learning kind of the problems that are associated with that, as one of the first people in there, or whether you want to do that later on when everything is working really well, that's kind of up to you. Personally, I like to be more on the forefront of things, and I think it builds up your knowledge base a little bit by having experience of problems that might show up there. But ultimately, it's up to you and it depends on the website. For some websites, I think it's critical that they move as quickly as possible. If you're handling user content, if you're doing something that's personalized to the user, I think that's something that should be in a secured connection as quickly as possible. For other websites, maybe it doesn't make that much sense now, but I think in the future, people are going to expect that.

AUDIENCE: OK. Thank you.

JOHN MUELLER: All right. Let's take a break here then. Thank you all for your questions and the lively discussions. It's been really interesting again and I wish you guys a great weekend.

AUDIENCE: Thank you, John.

AUDIENCE: Have a great weekend, mate.

JOHN MUELLER: Bye, everyone.