Reconsideration Requests

Google+ Hangouts - Office Hours - 10 October 2014


Transcript Of The Office Hours Hangout

JOHN MUELLER: Welcome everyone to today's Google Webmaster Central office hours Hangout. My name is John Mueller. I am a webmaster trends analyst here at Google Switzerland, and part of what I do is talk to webmasters like you all to make sure that you have the information you need to make fantastic websites, and so that our engineers have information from you guys to see what we need to do to improve our search results. So one thing we talked about, I think in one of the previous Hangouts, was to do some more general information, some themes maybe. I compiled a short list of best practices and myths that I'd love to share with you guys to start off with, and after that, we'll start off with the Q&A. There are lots of questions submitted already, and feel free to submit more as we go along. All right. Let's see if I can switch over to the presentation here. So I guess we can start off with something basic. Essentially, one of the things we love to see is that we can actually look at the content, for example. So we recommend using fast and reusable technologies as the way to present your content so that it works across all different types of devices. Googlebot can render pages now, so JavaScript isn't something you need to avoid completely, but I'd still make sure that we can actually view it. So check on a smartphone, check with the Fetch as Google feature in Webmaster Tools, check with the tool to see that it's actually loading in a reasonable time. Responsive web design is a great way to create one website that you can reuse across a number of devices. So if you don't have anything specific for mobile yet, that might be something to look into. I'd recommend avoiding Flash and other rich media for textual content. So if there's text on your page and you'd like to have it indexed directly, make sure it's actually on the page, make sure it's loaded as HTML somehow, so that we can pick it up and index it as well as we can.
Someone has a bit of noise in the background. Let me just mute you. Feel free to unmute if you have any questions. If videos are critical, make sure that they work everywhere. Certain types of videos don't work on mobile phones, for example. So there's a special HTML5 tag you can use to provide alternate media for those devices, or if you use a common video system like YouTube or Vimeo, then those almost always work as well. And regarding your website architecture, most CMS systems work out of the box nowadays. You don't need to tweak anything. You don't need to rearrange everything to use search engine-friendly URLs or anything like that. Most of them pretty much just work. For indexing, we recommend using sitemaps and RSS or Atom feeds. Sitemaps let us know about all of your URLs. These are essentially files where you list everything that you have on your website, and you let us know about that. We can use that to recognize URLs that we might have missed along the way. RSS and Atom feeds are a great way to let us know about new or updated URLs. So if you publish both of these at the same time, then usually what will happen is we'll crawl the sitemap files regularly, but we'll crawl the feed a lot more frequently, because usually it's a smaller file and it's easier for your server to serve it. We can use something like PubSubHubbub to fetch it essentially immediately once you publish something. An important aspect there is to make sure you use the right dates in both of these files. So don't just use the current server date; make sure this is actually the date that matches your primary content. So if you have, for example, a sidebar that changes randomly according to daily news, that's not something you'd use as a change date for these pages. It should really be the primary content. You can include comments if you want, but essentially it should be the primary content, the date [INAUDIBLE].
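As a sketch of the date advice above, a minimal sitemap entry might look like this, with `lastmod` reflecting the last change to the primary content rather than the server time. The URL and date here are hypothetical:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/articles/blue-shoes</loc>
    <!-- lastmod should be the date the article itself last changed,
         not the current server date or a rotating sidebar's date -->
    <lastmod>2014-10-01</lastmod>
  </url>
</urlset>
```

An RSS or Atom feed alongside this would list only the newest items with the same primary-content dates, so it stays small and cheap for the server to serve frequently.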
And the indexed URL count confirms indexing of exactly the URLs that you submitted. If the index count doesn't match what you think you have indexed, for example if you check in Index Status, then usually that's a sign that the URLs you're submitting there are slightly different from the ones we find during crawling. So that's something where I'd really double check the exact URLs that are submitted there. I'm happy to also help with that if you have a thread in a forum or on Google+ with an example where it looks like the index count is lower than it should be, but in almost all cases that I've looked at so far, essentially the URLs have to match exactly what was actually found during crawling. We recommend using the rel canonical where you can. This is a great way to let us know about the preferred URL that you'd like to have indexed. If you use tracking parameters, like for analytics, then that's something that sometimes is shown -- sometimes people will link to those URLs with tracking parameters, and the rel canonical lets us know the URL that you actually want to have indexed. It can have the right upper and lower case, it can be on the right host name with the right protocol -- essentially the one you want. It's fine to have that on the page itself. Make sure you set it up correctly so that they don't all point at the home page. That's a common mistake we see. And it's important that the pages be equivalent, so don't do this across different types of pages. So if you have a blue shoe and a red shoe and different pages for those, those pages aren't really equivalent, so I'd recommend not setting a rel canonical across those pages. Sorry. Was there a question?
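To illustrate the rel canonical point, a parameterized URL can declare its clean preferred version like this (URLs hypothetical):

```html
<!-- Served on http://www.example.com/shoes/blue?utm_source=newsletter -->
<link rel="canonical" href="http://www.example.com/shoes/blue">
```

The clean page itself can carry the same self-referential tag; the common mistake to avoid is pointing every page's canonical at the home page, or across pages that aren't actually equivalent.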

AUDIENCE: Oh, yes.


AUDIENCE: I'd like to ask you about canonical setting. My question is right or wrong. So can I send a group job?



JOHN MUELLER: Oh, wow. OK. Long question. Best practice for canonical setting: a small ecommerce site with not so many items, with pagination and faceted navigation. I'd have to take a look at exactly what you're looking at there, but I'd recommend taking a look at the blog post we did on faceted navigation earlier this year.

AUDIENCE: Yeah. I've already read this. But in my example, the faceted pages are not so important compared to the main canonical page, so does it matter if it is canonical or [INAUDIBLE]?

JOHN MUELLER: Yeah. It should be essentially equivalent content. So one thing the engineers sometimes do is take a look at the text that actually matches between these pages, and if the text matches, then that's a good sign. If the order is slightly different, then that's less of a problem, but the text itself should be matching. So if you have the same items in a different order, that's fine. If you have completely different items, then that's something where I'd say don't use a rel canonical there.

AUDIENCE: I see. Thanks.

JOHN MUELLER: All right. Let's look at the next one here. This is one we see in almost every Hangout: a duplicate content question. So there's a myth that duplicate content will penalize your site, and that's definitely not the case. I certainly wouldn't worry about duplicate content within your website. We're pretty good at recognizing these kinds of duplicates and ignoring them. It still makes sense to clean them up for crawling, and if you're reusing content from other websites -- if that's the kind of duplicate content you have -- then just make sure you're being reasonable with that. Make sure you're using that in reasonable amounts and not just taking all of your content from other people's websites. With regards to where duplicate content does affect your site, I made this nice and simple diagram -- but maybe not so simple. Essentially this is our somewhat simplified web search pipeline. We start off with a bunch of URLs that we know about from your website, we send them to a scheduler, which makes sure that we don't crash your server by crawling everything all the time, and Googlebot goes off and crawls them from your website, brings the results back, and says, oh, I found a bunch of new URLs, these are here. It also takes the content that it found on those pages and passes it on to indexing. And essentially what happens is, if you have a lot of duplicate content with different URL parameters, for example, we'll collect a ton of URLs for your website that we know about, with all these different variations. We try to schedule them regularly so that we can double check to make sure that we're not missing anything, and we'll essentially be kind of stuck in this cycle here, crawling all of these duplicates instead of being able to focus on your main content.
So if you have things that change quickly on your website, we might be stuck here crawling all of your duplicates in the meantime, which makes it a little bit slower for us to pick up your actual content. So that's one aspect where duplicate content can make a difference. This is especially true if you have a lot of duplication. If you just have the www and non-www duplicates, then that's less of a problem -- that's essentially everything twice. But if you have everything 10 times or 20 times or 100 times, then that's going to make a big difference in how we can crawl your website. Then once we've picked up the content, we send it off to indexing. Indexing tries to recognize duplicates as well, and will reject those kinds of duplicates and say, hey, I already know about this content under a different URL; I don't really need to keep an extra copy of this. So this is another place where we'll run through your duplicates. We'll pass them on to the system here, and at this point, the system will say, well, I don't actually need this content -- we didn't need to waste our time essentially trying to pick it up. Sometimes what also happens is that pages aren't completely duplicated. In cases like that -- for example, you have one article that you post on your US English website and you post the same article on your UK English website, which makes sense; it's your company website, and the same article might be relevant for both of your audiences -- that's essentially fine. And in those cases, we do actually index them, but we try to filter them out in the search results. So that looks a little bit like this, with a very simplified figure here, where essentially we have these three pages that we were crawling and indexing, and the green part here is essentially the content that's the same across these pages.
So this could be, for example, the same article that you have on your UK English page, your US English page, and your Australian English page. So the article is the same, the headings are different, and there's maybe some other content on these pages. What happens in these cases is we'll index all three of these pages, and in the search results, if someone searches for part of the generic content, we'll try to match one of these URLs depending on whichever one we think is the most relevant -- that might be based on, for example, the location, since these are different country versions -- and we'll just show that one. So one of these will be shown here. We'll show other search results as well, but essentially, we're just picking one of these to show in the search results. And what happens then if someone searches for part of the generic content plus something specific to the page itself? Then, of course, we'll show that specific version in the search results. So it's the generic text that people were searching for, as well as something unique. So if this article is an article about a certain type of shoe, for example, and someone searches for that shoe type directly, we'll show one of these pages depending on whatever our algorithms think makes sense. If someone is searching for this type of shoe and mentions that they want to buy it in the UK, then we'll probably pick the version that actually has a UK address on it and show that one in the search results. So this isn't something where we're penalizing your website. Essentially we're taking these three pages and trying to pick the best one to show in the search results, and the others will be filtered out, because it doesn't make sense to show the same content to the user multiple times. So I hope that kind of helps with the duplicate content question.
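Google's actual duplicate detection isn't public, but the general idea of comparing overlapping text between pages can be sketched with word shingles and Jaccard similarity. This is purely an illustrative toy, not what the indexing system does; the page texts are hypothetical:

```python
def shingles(text, n=3):
    """Break text into overlapping n-word shingles."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def similarity(a, b, n=3):
    """Jaccard similarity of two texts' shingle sets (0.0 to 1.0)."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

# Hypothetical page texts: two country versions sharing one article,
# and an unrelated page
us_page = "Our new trail shoe is light and fast with a cushioned sole"
uk_page = "Our new trail shoe is light and fast with a cushioned sole"
other = "A completely different page about office space rentals"

print(similarity(us_page, uk_page))  # → 1.0 (same article text)
print(similarity(us_page, other))    # → 0.0 (no shared shingles)
```

A score near 1.0 would suggest the pages are duplicates of each other, matching John's point that near-identical text gets treated as the same content even when the order or surrounding chrome differs slightly.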

AUDIENCE: And John, does Panda have any sort of play in this whole thing, you know, when it looks at quality across all the pages? Would you, in an ideal world, be better off doing something different? I mean, in our circumstances, obviously, we have locations in close proximity to each other. Or perhaps if you're looking for office space within a certain distance that has one desk available, you're going to start to get these groupings of locations where there is duplicate content, quite suitably, across the board. And I wonder -- people are still talking about this in multiple forums -- is Panda still going to have some effect on your site, in other words, on the quality?

JOHN MUELLER: So primarily this type of filtering that we do for duplicate content is a technical thing that we do just to find the right match there, and it's not something that we primarily use as a quality factor. There are lots of really good reasons why you'd duplicate some of your content across different sites. There are technical reasons, there are marketing reasons, there might be legal reasons why you would want to do that. So that's essentially not something where we would say this technical thing that you're doing is a sign of lower quality content. On the other hand, this is something that users might notice if you do it excessively -- if you create doorway pages, if you duplicate content from other people's websites and act like it's your content. We see that a lot with Wikipedia content, for example. And those are the types of things where the duplicate content itself isn't really the problem, but essentially the problem is you're not providing any unique value of your own on these pages. You're not providing a reason for us to actually index and show these pages in the search results. In this case, where you have multiple versions for different countries, for example, there's a good reason to have all of these different pages, a good reason to have them indexed, and a good reason to have them shown in search sometimes, because they are very relevant for these types of queries. But if you take this to excess and you create pages for every city, and actually the content itself isn't really relevant for every city, then that's something where our algorithms would start seeing that as a sign of lower quality content, maybe even webspam. So that's a situation you'd want to watch out for. Technically, having content duplicated by itself is not a reason for us to say this is lower quality content. There can always be really good reasons why you'd want to do that.

AUDIENCE: Sure. Thanks, John. Is there a sort of set number of words you use as a guide to what you would consider to be duplicate content? I mean, would anything, let's say, over five words that are similar be considered duplicate content, or would you take it right down to single words or sentences, or do you take it in batches of sort of 10 or 20 words in a row? Does that make sense, that question?

JOHN MUELLER: Yeah. So when it comes to technically recognizing duplicate content like this, we use a variety of ways to do that, and sometimes we'll even look at things like saying, well, this looks very similar -- it's not exactly duplicate, but it's very similar, therefore we'll treat it as being the same. So that's something where we wouldn't say you need to focus on a certain number of words, or where you'd need to alternate every other sentence or something like that. I don't think this type of filtering is something you'd need to work around. This is essentially a normal part of web search, a normal thing that you don't need to eliminate from your website.

AUDIENCE: Yeah. OK. Excellent. Thanks, John.


AUDIENCE: Just a question on when you mentioned duplicate content from other websites. Would you recommend putting a kind of link or citation back to that website, just to kind of show that you're not trying to steal that content, that you're actually just referencing it?

JOHN MUELLER: You can do that. It's not something that our algorithms specifically look for, but I think that's always a good practice. It helps people to understand your content a little bit better, but it's not something that we technically watch out for. And similarly, using something like the HTML5 cite tag is something you can do if you want. It's not something where we would say, oh, this looks like a quote from another website, therefore we shouldn't count it for this website. Sometimes discussions around a quote are just as valuable as the quote itself. So it's something I think is a good practice, but it's not something that our algorithms would specifically look for.
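The HTML cite attribute plus a visible link back might look like this (URLs hypothetical). As John says, this is good practice for readers rather than something the algorithms specifically reward:

```html
<blockquote cite="http://www.example.com/original-article">
  The quoted passage from the other site goes here.
</blockquote>
<p>Source: <a href="http://www.example.com/original-article">Original article</a></p>
```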

AUDIENCE: OK. No, that's fine.

JOHN MUELLER: All right. Let's look at a few more of these. Robots.txt -- we see a lot around robots.txt nowadays. So it used to be that people would say they used robots.txt to eliminate duplicate content. We'd see a lot of things like this on a site, where the robots.txt file disallows everything with parameters in it. And that's essentially causing more problems than it solves, because if we can't crawl those pages, we can't recognize that they're duplicates, and we can't filter them out afterwards. Essentially you have your original content and all of these duplicates with the parameters, which we'll index just with the URL alone, and we won't know that they actually belong together. So if someone were to link to one of these URLs with a parameter, we wouldn't know that we can actually forward this link on to your main content. We'd say, oh well, someone is linking to this URL that happens to be roboted -- it must be relevant, so maybe we'll show it in the search results as well. So robots.txt is a really bad way of dealing with duplicate content. Best is, of course, to have a clean URL structure from the start. That's not always possible if you're working with an existing site. Using 301 redirects is a great way to clean that up, and using rel canonical is very useful. The parameter handling tool also helps in situations where you might not be able to use redirects or rel canonical.
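The problematic pattern described here often looks something like this in a robots.txt file, blocking every parameterized URL so that Google can never see that they are duplicates:

```txt
# Anti-pattern: hides parameterized duplicates from crawling, so links
# to these URLs can't be consolidated onto the canonical versions
User-agent: *
Disallow: /*?
```

A 301 redirect or a rel canonical from the parameterized URL to the clean one lets the duplicates be recognized and their signals consolidated instead of hidden.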

AUDIENCE: John, just going back to that. You're saying just let Google crawl everything? Because we do have some of that stuff blocked in robots.txt, where the dropdowns for search and dropdowns for filtering are all being, or were being, indexed, and so we'd have thousands of pages. And previously the general feeling was that that was duplicate content, so it doesn't look good. Particularly if you go into the webmaster forums and ask that question, you get jumped on straight away. The general feeling in those forums is: block it, get rid of it, don't have it appear if it's duplicate.

JOHN MUELLER: Yeah. I think it's always a good practice to clean up duplicate content if you recognize it. I think that's something I'd recommend doing if you can do that.

AUDIENCE: No, the duplicates I'm talking about are category pages where someone, like on our site, for example, which I know you know, sorts everything in California by price, but then by location, and then by something else, activity type. You've got three different filters there, which gives half a [INAUDIBLE] depending on how you do it. You can't clean that up. You have to let people, the users, use those dropdowns, because they're useful for the user. So do you either let Google crawl those dropdowns and crawl half a million URLs and decide, or do you canonical everything back to the main category?

JOHN MUELLER: It depends. So again, we have a blog post, I think on faceted navigation, from a while back. I'd double check that. I'm not sure exactly--

AUDIENCE: The one with the gummy candy that I think was referenced there.

JOHN MUELLER: No, I think it's a newer one, from this year or late last year maybe, but it kind of goes into the faceted navigation aspect there, where it's tricky sometimes because we have all these different categories and filters and options and ways to sort this content. But I touch upon that a little bit back here, I think, so we can look at that a bit later as well. But those are always tricky to handle -- faceted navigation, and how much of that you should allow to be crawled, how much of that should be noindex or rel canonical. That's always hard.

AUDIENCE: Is there a [INAUDIBLE] or don't we just stick it in [INAUDIBLE]?

JOHN MUELLER: It really depends on the website. Some content makes sense to index separately because it provides value when you index it like that -- like if people are searching for events in California, then having a California landing page for that.

AUDIENCE: Right. Which we know we do have.

JOHN MUELLER: Yeah. That makes sense.

AUDIENCE: But some people in California are just actually looking by price and they're not that.

JOHN MUELLER: Yeah. But I think that's the situation where people would go to your website and use it there first, whereas nobody would go to search and say, I'd like to have all events in California listed by price, in search.

AUDIENCE: No. I mean, it does happen a little bit -- everything in California with gifts under $100. People do search for that stuff in particular. These days people search by budget, because they assume Google knows everything. So it will do it.

JOHN MUELLER: Yeah. I mean, that's something where you have to make a judgment call yourself -- like how relevant are these specific pages, or how random are they. If it's just people entering search queries and we're indexing every word that people are searching for, that's probably not that useful, but that really depends a lot on your website and how you have that set up.

AUDIENCE: But in general, best practice is to canonical all of the search results into the main category.

JOHN MUELLER: It depends. I think this is a good topic for another Hangout. Yeah. I can see that there are lots of aspects here that we could cover. So let me just go through these and--

AUDIENCE: One little bit of advice for everybody regarding that robots.txt, though, is that I actually blocked a hell of a lot of pages on my site in the effort to stop Googlebot from crawling all of those pages and wasting its time doing it. What I've done by homing in on what Googlebot needs to look at is that it's actually now indexing and crawling my pages faster than ever before, because it's not wasting time crawling pages it doesn't need to, and we've seen an incredible change in robot crawls as a result of that. So in my experience, if you know what you're doing, block whatever you can, and you're going to allow Googlebot to really do a good job on your site. That's my opinion.

JOHN MUELLER: I'd have to take a look at how you implemented that before saying yes, but that caveat -- if you know what you're doing -- applies to a lot of things, especially around websites.


JOHN MUELLER: OK. So let's go through some more of these. With regard to robots.txt, we especially want to be able to crawl CSS and JavaScript files nowadays, because we want to know what the pages actually look like, which is especially important when you have a mobile-friendly page. Because if we can't crawl your CSS and JavaScript, we can't recognize that this page is a really great mobile page, and then it wouldn't be treated as such in search. Another aspect is error pages, where if you have a URL, for example, that all of your 404 pages redirect to, and we can't crawl that URL, we can't recognize that these URLs are errors, and we will try to recrawl those URLs more often than we might otherwise. So being able to see that a URL is an error is really useful for us, and not something that causes problems for you. Mobile pages -- we sometimes see that the m-dot domain, for example, is blocked by robots.txt. That's a problem. International sites and translations -- we still occasionally see that people say, well, this is my main page, and this is my page for Germany, but the German version is just a translation, therefore I'll block Google from crawling it. That might be a problem, because then we can't see the German version. With regards to robots.txt best practices: if your content shouldn't be seen by Google at all, then by all means block it. If there are, for example, legal reasons why you don't want something crawled, that's something you might want to just block with robots.txt. If you have resource-expensive content, for example complicated searches, or tools that take a long time to run, then those are the types of things you might also want to block with robots.txt, so that Googlebot doesn't go through your site and say, oh well, let me try all possible words that I found on the internet and insert them into your search page, because I might be able to find something new there.
That might cause a lot of CPU usage on your website, slowing things down. So that's the type of thing you'd probably want to block with robots.txt. Robots.txt doesn't prevent indexing, so if you don't want something indexed, then I'd recommend using noindex or server-side authentication. For example, if there is something confidential on your website, the robots.txt file isn't going to prevent it from ever showing up in search. You really need to block that on the server itself, using something like a password, so that people who find the URL can't actually access the content. As I mentioned, don't block JavaScript, CSS, and other embedded resources. If you use AJAX, don't block those responses from being crawled, so that we can actually pick up all of this content and use it for indexing. Another myth we always hear is, my website worked for the last five years -- why is it suddenly not showing up in search anymore? -- where webmasters essentially say, well, it's been working so far, so I'm not going to change anything. And it's important to keep in mind that the web constantly changes. Just because it worked earlier doesn't mean it will continue working. Google also constantly works on its algorithms, and user needs change over time as well. So I definitely recommend staying on top of things and making sure that you're not just sticking to your old version unnecessarily. So make sure that you're going with the times and really offering something that users want now, in a way that they can use now -- which sometimes means enabling mobile-friendly websites, for example, to let all the new users who are using smartphones as a primary device also get your content. Some other myths that we see regularly: shared IP addresses -- that's fine. We know that there's a limited number of IP addresses. I wouldn't worry about someone else using the same IP address. That's something that happens with a lot of hosters.
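Putting those points together, a robots.txt along these lines would block only the expensive parts while leaving embedded resources crawlable (the paths are hypothetical):

```txt
User-agent: *
# Block resource-expensive internal search results from being crawled
Disallow: /search
# Leave CSS, JavaScript, and other embedded resources crawlable:
# simply don't add Disallow rules for paths like /css/ or /js/
```

To keep a page out of the index entirely, the page itself would carry `<meta name="robots" content="noindex">` or sit behind server-side authentication, since robots.txt only controls crawling, not indexing.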
Too many 404 pages -- that's also fine, unless, of course, these pages are ones that you want to have indexed. So that's something you want to watch out for, but if we're crawling pages that shouldn't exist, and we see a 404 and report that in Webmaster Tools, that's fine. That's the way it should be. Affiliate sites are also OK from our point of view, but you should really have your own content. So don't be an affiliate site that just copies all of the same affiliate content that everyone else has. Really make sure that you're providing something useful of your own. The value should be in your content, not in the link that you're providing. Disavow files -- we sometimes see webmasters say, I don't want to submit one, because then Google would think I did something wrong, and that's totally wrong. You shouldn't hold off on using a disavow file. If you find something problematic linking to your website that you don't want to be associated with, go ahead and disavow that. For us, it's primarily a technical tool. It takes these links out of our system and treats them similarly to nofollow links, and then you don't have to worry about those links anymore. So even if those links are things that you didn't have anything to do with -- maybe a previous SEO set them up and you don't want to admit that maybe they did something wrong -- use the disavow file and make sure that they're out of the system, so that you don't have to worry about them. The order of text in an HTML file isn't important. You can put your main content at the top or the bottom. We can handle pretty large HTML files these days and still recognize the content there, so that's not something where you have to micro-optimize at that level. Keyword density is something we hear about regularly -- just write naturally instead.

AUDIENCE: So on the order of text in an HTML file -- you're saying that only from a coding point of view, not from a visual point of view? So content-wise, the text, as I think you've discussed before, is essentially probably better being higher up from a visual aspect?

JOHN MUELLER: I'd definitely make sure that at least part of your primary content is visible when the user first lands on your page, so that when they click on the search result, they can recognize, oh, this is what I was looking for. And from that point of view, that's something that's generally higher up. It doesn't have to be the first thing on the page. But essentially what I mentioned here is that sometimes people think that if they move the div with the primary content to the top of their HTML and then use CSS to show it at the right position, that would be better than just having the div wherever it is in the HTML, and that's not something you need to worry about.

AUDIENCE: Yeah. I mean, we're actually internally discussing right now -- we're about to put nice, large images at the very top of our location pages, and this is one of the questions I actually wanted to put to you today. They're going to be somewhere in the region of sort of an 800 by 400 view. So the first thing really you're going to see is this beautiful image of the location that you were actually looking for, and our customers identify better with an image than they do with a lot of text -- beyond that, they're just interested in pricing. So are we going to be affected by maybe Hummingbird or something like that by doing this?

JOHN MUELLER: No. You should be fine.


JOHN MUELLER: Yeah. I mean, this is something where, if this image is your primary content for that page, that's fine. I wouldn't do it in a way where the logo is the primary image on the page. So if your company's logo is taking up the whole page and you have some random information from a sidebar showing up on the first screen, then that's probably not that great, but if the image of this specific location -- the image matching this specific content -- is primarily visible, that's fine.

AUDIENCE: Yeah. So Googlebot, or some of your crawlers, are pretty much able to distinguish the difference between what is a consistent top image, like a logo, and what is a unique image on that specific page.


AUDIENCE: OK. Wonderful. Thanks.

JOHN MUELLER: Another one I didn't put on here, but we get a lot of questions about, is the keywords meta tag. It's one of our most popular blog posts even now, and essentially we don't use the keywords meta tag at all for ranking. So I imagine some of you know that well. If you're new to this area, then that might be something where you think, oh, this is a great way to put keywords for ranking, but we don't use that at all. Let's see. This is the last slide so--



AUDIENCE: Which other tags would you say are almost completely irrelevant?

JOHN MUELLER: There are a lot of tags out there.

AUDIENCE: Yes, a lot of them like the ones that say follow and abating and stuff like that, things that--

JOHN MUELLER: Those are the ones that we essentially ignore. We ignore, what is it, the revisit-after tag. That's something that's very commonly used. That's something we don't use at all. I'd have to think which ones we don't use. That's always harder than which ones we do use. But essentially we try to look at the primary content on the pages instead and not focus so much on the meta tags there.

AUDIENCE: In fact, it's important, but in many cases, it doesn't get used when it's not relevant enough to the query. Is that right?

JOHN MUELLER: Which one did you mean?

AUDIENCE: The description.

JOHN MUELLER: Description. Yeah, we use that for the snippet in the search results, but we don't use that for ranking. So if you have a specific snippet you want to have shown, that's something you can use, but that's not something where you need to stuff in any keywords. Essentially, make it so that users will recognize what your content is about. And maybe it includes a keyword, say, that they're searching for so that they know this is relevant for their query.
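The distinction John draws here can be sketched in markup; the store and wording below are invented for illustration:

```html
<head>
  <!-- Used for the search snippet, not for ranking -->
  <meta name="description"
        content="Compare prices on handmade leather boots, with free returns and next-day delivery.">
  <!-- Ignored for ranking entirely, per John's earlier point -->
  <meta name="keywords" content="boots, leather boots, shoes">
</head>
```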

AUDIENCE: A lot of people have used different tools that will put matching keywords in all their tags and URL, and surprisingly this is still very common. There's a popular SEO tool used in WordPress, and I suggest people not use those because it makes everything very much the same.

JOHN MUELLER: Yeah. That's something where I'd primarily focus on what works for your users. Sometimes it's easy for users to work with wordy URLs and with like identifiers in the URL, but essentially that's something that we wouldn't focus on primarily. So if you're spending a lot of time tweaking those kinds of keywords, you're probably spending time on something that we're not really valuing that much. All right. Let me go through these last four and then we can jump to the Q and A. One thing that I think is important is make sure you have all the versions of your site listed in Webmaster Tools. We treat these sites as separate sites, so you'll potentially have different information for some of these sites. If you have a clear canonical setup for your website, then we'll have most of your information in that canonical version. If you've never set up a canonical for your website, then we might have some of this information split across the different versions of the URL that we have indexed, so that can include things like links to your site. It can include things like the index status information, those kinds of things. The Fetch as Google render view is an extremely valuable tool that I recommend using regularly. Go, for example, to the search queries feature in Webmaster Tools, pull out the top 10 URLs from your website, and make sure that they really render properly with Fetch as Google, so that there's no embedded content that's blocked by robots.txt and that it kind of matches what you would see when you look at it in a browser. Mobile is extremely important at the moment. There are lots of people who are using mobile primarily to access the internet, so make sure you can use your site on a smartphone, and don't just look at your home page. Try to actually complete a task. So if you have an e-commerce site, try to search for something within your website. Try to actually order it.
See that you can fill out all of the fields that are required, that it's not a complete hassle to actually go through and order something there. Kind of take it step by step and go from someone who's first visiting your website to actually completing whatever task that you'd like them to do. And finally, don't get comfortable. Always measure. Always ask for feedback. Always think about ways that you can improve your website. The whole internet is changing very quickly, and our algorithms are changing quickly. What users want is changing quickly, and if you get too comfortable, then it's easy to get stuck in a situation where everything has changed around you and suddenly your website isn't as relevant as it used to be. So make sure you're kind of staying with the trends. And with that, I think that's it with this presentation.
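The robots.txt mismatch John mentions usually comes down to accidentally blocking the CSS or JavaScript that Googlebot needs to render the page; a minimal sketch, with hypothetical paths:

```
User-agent: *
# Let the crawler fetch the resources needed to render pages,
# so Fetch as Google matches what a browser shows
Allow: /css/
Allow: /js/
# Keep genuinely private sections blocked
Disallow: /checkout/
```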

AUDIENCE: John, why is the Fetch as Google limited when it comes to submissions? So you've got, I think, 500 submissions for singular pages and only 10 for the larger option, and why is it limited per Webmaster Tools account, specifically, since it's not per site?

JOHN MUELLER: I don't know why we chose to do that specifically there, but that's something where we kind of have to reprocess those URLs in a special way. So that's something where we'd like to kind of limit the use there so that it doesn't become like the primary way that content is submitted to Google. And in general, if you make normal changes on your website, we can pick that up fairly quickly. I wouldn't use this tool as a normal way of kind of letting us know about changes on your website. Instead, I'd use things like sitemaps and feeds to kind of automate that whole process. So that's something where there are usually better ways to get this content into Google, but there are sometimes exceptions where you need to kind of manually tell Google to, OK, go ahead and reindex this content as quickly as you can. And that's kind of why we have that there, and to keep people from kind of overusing this for things that it doesn't actually need to be used for, we have those limits there.

AUDIENCE: So if you were to set something in your sitemap to say this page was updated 10 minutes ago, whatever it is, if Google then picks up on that sitemap on a very regular basis, the minute it picks that up, will it kind of perform a similar action?

JOHN MUELLER: Usually, yeah. It's something where if we can trust your sitemap file, if we can see that the URLs you submit there are real URLs, if we can trust your dates to be correct, then we'll try to take that into account as quickly as possible, and that can, in some cases, be seconds or minutes after you submit the sitemap file. For news sites, for example, that happens really quickly. If we see news content on a site that we know has been submitting good sitemap files with the proper dates in them, then we pick that up within a few seconds. So that's something that is fairly automated. It works really well, and it's a great way to kind of get the new content into the search results.

AUDIENCE: And so if you were to incorrectly indicate something, to say these 20 pages will get updated weekly but really it's more monthly or sometimes six-monthly, is there a signal that kind of distrusts your sitemap?

JOHN MUELLER: It's not so much that we kind of have a signal about your website and say we don't trust you on that, but if we look at a sitemap file and we see all the pages on the whole website changed just 5 seconds ago, then chances are your server is kind of just sending the server date, and that's the kind of thing we'd be looking for. If we look at the sitemap file, and we say, well, this kind of matches what we expect from this website and there are some new pages there, then that's kind of a really strong signal for us to say, OK, well, this time around this sitemap file looks great. We should trust it. We should pick up this new content. And that's kind of what we're looking for. The change frequency is something we don't use that much from a sitemap file, so if you can, just submit the actual date of the change instead.
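A sitemap entry along the lines John describes, carrying the real last-change date rather than a changefreq hint; the URL and date here are made up:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/news/article-123</loc>
    <!-- The actual date of the last content change -->
    <lastmod>2014-10-10T14:30:00+00:00</lastmod>
  </url>
  <!-- <changefreq> is largely ignored, and a lastmod of "right now"
       on every URL suggests the server is just emitting the current date -->
</urlset>
```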

AUDIENCE: Excellent. Thanks, John.

AUDIENCE: Just a very quick question on the sitemap bit. Do you use priority in sitemaps? So you can prioritize pages that you think are more important on your website than others. Is that something you use at all?

JOHN MUELLER: I think we don't use that at the moment for web search. I think it might be used for custom search engines, but I'm not 100% sure. But for web search, we've gone back and forth about using that. We've looked at ways that people are using this, and to a large extent, it hasn't been as useful as we initially thought, so we're not really using that.

AUDIENCE: OK. No, that's perfect. And just very quickly on RSS feeds. Is there anywhere in particular that you kind of advise uploading that feed? You don't upload it as a sitemap, do you, within Webmaster Tools, that sort of thing?

JOHN MUELLER: You can submit it as a sitemap in Webmaster Tools. What I'd recommend doing, if you can do that, if you have a really fast-changing website, is make sure you're using PubSubHubbub as well. So pick a hub and make sure that your setup, your CMS, supports PubSubHubbub so that you can push the content that way, because with PubSubHubbub, you're essentially telling us every time you make a change on your website, and then we can pick that up immediately.
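For illustration, the publish notification John describes is just a form-encoded POST to the hub naming the feed that changed. A minimal sketch in Python; the hub here is Google's public one from that era, and the feed URL is a placeholder:

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_publish_ping(hub_url, topic_url):
    """Build the POST request telling a PubSubHubbub hub that the
    topic (e.g. your RSS feed URL) has fresh content to fetch."""
    body = urlencode({"hub.mode": "publish", "hub.url": topic_url})
    return Request(
        hub_url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )

# Example: notify the hub about an updated (hypothetical) feed.
# A CMS would send this with urllib.request.urlopen(req) on every publish.
req = build_publish_ping("https://pubsubhubbub.appspot.com/",
                         "https://example.com/feed.xml")
```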

AUDIENCE: No, that's perfect. Thank you.

JOHN MUELLER: All right. Let's go through some of the submitted questions and a bunch of these that people voted on as well. You spoke of duplicate content within the website. What about duplicate content on a site that has copied or scraped you? I realize we should attempt a DMCA removal request, but is there any way Google can better determine the original content? We do try to do that to some extent. Sometimes that works really well. Sometimes it doesn't work so well. Oftentimes, when I see it not working so well, I see that the site that's being copied has a lot of other issues as well. So it's always a tricky situation if we run across one website where we can tell that actually this is the original content, but this website has so many bad signals attached to it that we don't really know how much of this content we should trust, how much to trust this website. So that's something where if you're seeing scrapers ranking above you, I'd just double check to make sure that your website is doing things right as well and maybe use a DMCA tool if that's something you can use. Check with your lawyers, your legal team. If you can't use a DMCA, or if something just doesn't work right with the ranking, where you're seeing these scrapers rank above your site and you're actually doing everything properly, then that's something you can definitely always send feedback to us about so that we can take a look and see what we should be doing differently here. That's always a tricky situation. I know our teams are looking at this kind of a problem as well and seeing what we can do to kind of make that easier for the webmaster. Links added in the source code, but not present on the front end. Does Google consider this as a backlink? Is it something spammy or a way of connecting to your website? So I think the question is more around like hidden text, hidden content, hidden links, those kinds of things.
In general, we recognize when text or content is hidden, and we try to kind of discount it for web search. The same applies to links essentially. So if these are hidden links, then I wouldn't count on those always being used the same as something that's directly visible. So if this is a link that you want to have counted for your website, then that's something where I'd make sure that within your website, or however you're linking, it's something that's actually visible to users, that's usable for users, so that users can use those links as well, and it's not just something that you're kind of hiding in your HTML code for technical reasons. So make sure it's actually something that works for your users. Is it true that Google is in the final stages of releasing Penguin? Would you estimate it's days or weeks away now? Yes, we're working on it. I estimate probably like a few weeks, not too much longer. But as always, these kinds of time frames are really hard to judge because we do a lot of testing towards the end, and we need to make sure that things are working right before we actually release these kinds of changes. So that's something where if it's not released yet, then I can't really promise you that it will be released whenever. I know it's pretty close, and I know a lot of you are waiting for that, so soon, but not today. When rebranding, would you recommend going through backlinks and requesting that the anchor text, if it's a brand name, and the URL be changed to the new brand name and domain? We definitely recommend making sure that the links change to your new domain as much as possible so that we can forward kind of the PageRank or the value of that link on specifically to the right URL directly instead of having to jump through redirects. Also for users, when they click on those links, the redirect sometimes, for example, if they're on a mobile device, takes fairly long to get processed and kind of go through.
So if you can have those links go directly to your site, if these are important links, then that's something I recommend checking in with them and having them update. Whenever I make a change to any of the content on my site, I see a drop in performance, often a few weeks, before I'm back to where I used to be. Why is this? When you make significant changes on your website, specifically around the layout and the text [INAUDIBLE] of that, I think that's normal because we have to reprocess everything, especially if you're changing things like your internal linking structure. If you're switching to a slightly different CMS, then that's something where we essentially have to really understand the whole website again, and it would be normal to see changes and fluctuations in search because of that. If you're just fixing typos or changing small text pieces on a page, then usually that shouldn't be something where you'd see fluctuations in search. So it kind of depends on what kind of changes you are making there. For small textual changes, I wouldn't worry about that causing any kind of fluctuations unless, of course, you're removing something that people are very desperately searching for. So if you have a page about blue shoes and everyone is searching for blue shoes because they're really popular at the moment, and you change that page to talk about green t-shirts, for example, then, of course, you're going to see some changes there because people aren't finding what they're looking for. Our algorithms can't confirm that this is actually a good page anymore. We have to understand it again. Why is my site crawled every week? How often should it be crawled? Well, that kind of depends on your website and how often you're changing your content. The important thing to keep in mind is we're not crawling sites, we're crawling pages. So usually we kind of differentiate between the types of pages on a site.
We'll try to pick up a home page maybe a little bit more frequently than some of the lower-level pages, especially when we can recognize that something hasn't changed for a really long time. So if you have an archive from 1995, chances are those articles aren't changing that regularly, and we don't have to recrawl them every couple of days. So maybe we'll only recrawl those every couple of months instead. On the other hand, maybe the home page has current news events on it, so we'll have to recrawl it every couple of hours or every couple of days. So that's something where there isn't any fixed time where I'd say this page should be recrawled this frequently. It really depends on your website, on the number of changes you make there, but also, to some extent, on how we value your website and how important we think it is to pick up all of these changes. So we'll sometimes see with like lower-quality blogs that are essentially just resharing copied content from other sites that we'll kind of slow down our crawling of those sites just because we're saying, well, there's nothing really important that we've missed here if we went to crawling every couple of days instead of every couple of minutes. So that's something where we'd probably adjust our crawling scheme, but if you have a normal website and you have regular changes on there, and you're using something like sitemaps or RSS to let us know about those changes, then we should be kind of keeping up with that and trying to keep up with the crawling there. The relationship between usability and SEO. I guess that's really kind of a big and almost philosophical topic.
Essentially, if your website isn't usable for users, then users aren't going to recommend it, and that's something that we'll recognize in search as well and try to kind of reflect in search. But it's not the case that we run any kind of usability tests on normal web pages to say, oh, this has, I don't know, the wrong text color, therefore people can't read it directly, and we should kind of demote it in search. So that's something where there's more of an indirect relationship there. What are some current SEO tactics as per the latest algorithm updates? I guess the best tactic, if you will, that is still current is having good content. Go ahead, Joshua.

JOSHUA: I think maybe that's tactics in quotes. Latest tactics.

JOHN MUELLER: I think, I mean, the good part about all of this at the moment is that if you're using a normal CMS system, then the technical foundation for your website is probably pretty sane, and it's not something where you'd have to apply any kind of special tactics to kind of get that into Google. And the actual tactics on ranking better are more about like finding ways to create really good and useful content, finding ways to be timely in search, finding ways to kind of provide something that users value, and that's not a technical tactic, if you will. There's no step-by-step guide to getting there. That's something where you have to work with your users and find the right solution. Is it true that any changes you do to a site, disavowing bad links, for example, won't come into effect before the next algorithm update? The disavow file is processed continuously, so that's taken into account with every change that we do on our side. Let me see if there are any higher--

AUDIENCE: Rodney had a question.

AUDIENCE: John, can I ask one?


AUDIENCE: Let me just throw this doc into the chat, if I can. Is someone playing squash in the background?

JOHN MUELLER: Ooh. Ken, let me mute you out for a second. I can't seem to mute you. OK. Go ahead.

AUDIENCE: OK. I just pasted a URL, a doc, into the chat, and it's something that I've asked before, and Gary has asked before, in relation to hreflang and secure. So I wondered, given that we have two versions of each now on four sites, as per that diagram, can we effectively cut out the middle and not bother with-- I don't know if everyone else can see that, but I know other people have asked similar questions before.

JOHN MUELLER: Yeah. Oh, gosh. Yeah, I wanted to make a slide on that. Yeah. I know someone who is working on a blog post--

AUDIENCE: I did one for you.

JOHN MUELLER: I know someone who is working on a blog post around hreflang and canonicals, so I think that would apply there as well. But I think in a case like this, essentially what you have is your two sites. I'm just going to assume like one is US, one is UK.

AUDIENCE: No, it's all US. We're only using the hreflang as a kind of workaround, as you know. They're both US, so we just had the one site, which was previously subject to some unknown algorithm issues, unless you have an update for me on that. But moving with hreflang to also a US-based site helped that. But then the secure thing was released, so if possible, use secure. So we did, but we actually saw a drastic drop when secure was released, the same as when Barry Schwartz was on the call a couple of weeks ago and he said his secure site saw it. He showed you his Webmaster Tools, if you remember, and we had very similar percentage drops. We're now actually considering unwinding that because it's not recovered within the last 30 days. After moving to secure we've lost another 50% of our traffic, and it's just not recovered. So we're thinking of unwinding that. But aside from that, we think there's no point in having four sites in the mix when we could just have two and go from the original to the secure new, rather than the original to secure, then over to new non-secure, then over to new secure. Because it's just more work and more processing, and the more steps, the more that Google can in one way or another misunderstand what we're trying to do, whether rightly or wrongly.

JOHN MUELLER: So essentially with the hreflang, you need to make sure that you're doing it between URLs that are canonical. So for example, if you have one URL with a parameter and one URL without a parameter, and your rel canonical or your redirect is going to the one without the parameter, then that's the one you should be using for the hreflang pair. So essentially if you're using redirects that point to one version of your site, then that's the version you should be using kind of between the hreflang settings there.

AUDIENCE: These four are absolutely identical, just so you know. They are absolutely identical apart from HTTP versus HTTPS. The content is 100% identical. So going from one to four, given what you said there, would make sense rather than going from one to two, from three to four and then two to four.

JOHN MUELLER: I'm trying to visualize how--

AUDIENCE: I may draw a diagram for you so you can visualize it easily.

JOHN MUELLER: I see some of that, but I'm not actually sure how it's actually implemented in practice, but essentially you just need to make sure that you're kind of connecting the hreflang between canonical URLs, because if we see an hreflang tag pointing at a URL that we're not indexing like that, then essentially we ignore that hreflang tag. So it really needs to be between the URLs that we're actually indexing. So if you're using 301s or rel canonical to point to one version of the URL, then that's the one you should be using for the hreflang pair, and if you're not using 301s, if you're like keeping one version like this and the other version is the other canonical, then using the hreflang directly between those two is fine as well. But it should just be the one that we're actually picking up for search and indexing like that.
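John's point about pairing hreflang only between canonical URLs can be sketched in markup; the domains are placeholders, and the US/UK split is assumed for illustration rather than the caller's exact setup:

```html
<!-- On https://example.com/page, the version that redirects/canonicals resolve to -->
<link rel="canonical" href="https://example.com/page">
<link rel="alternate" hreflang="en-us" href="https://example.com/page">
<link rel="alternate" hreflang="en-gb" href="https://example.co.uk/page">
<!-- If either href pointed at a URL that redirects or canonicalizes
     elsewhere, the hreflang annotation would be ignored -->
```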

AUDIENCE: Right. Which I believe is--

JOHN MUELLER: I'll double check your sites because I'm a bit confused which one is which and how you have that actually set up. If you want, feel free to send me a note on Google+ and I'll--

AUDIENCE: Yeah, I'll send another email with what we-- because I know I answered your last one. So I'll send you another note of what we did, but I think it's come up again because of the secure issue, and I was hoping Barry would be here so he could say. But if anyone else in the chat, in the call area, has had the same issues with secure-- I haven't seen it on any forums.

JOHN MUELLER: Yeah. So we looked at a whole bunch of clients.


JOHN MUELLER: Yeah. We looked at a whole bunch of sites because we thought it seemed strange to hear this from a handful of people at the same time, but for most of them, it's actually doing the right thing. So I think there might be some kind of quirks in some of our algorithms there still, but by and large, I think moving to HTTPS should just work out for most sites. But I am happy to take another look at how you have yours set up specifically, because that sounds like it's not the typical hreflang or canonical or secure site move. Yeah.

AUDIENCE: Right. I mean, if all else is equal, the boost that you would normally get from secure I assume is negligible anyway if everything else is normal, so it's better to unwind it and gain the 50% traffic back, because you're never going to gain 50% by having secure versus non-secure.

JOHN MUELLER: Yeah. I think if you can confirm that it's really from that, then that definitely makes sense, or it's something where you can say, well, I will just roll this back for the moment and reconsider it maybe in a couple of months. When I see that other people are posting lots of good experiences about that move, that might be something to do. I think that's always a sane approach when it comes to these types of issues.

AUDIENCE: OK. I'll drop you an email with diagrams.

JOHN MUELLER: Great. Yeah, I'll double check the pages. OK. So with that, I think we're a bit out of time. Thank you all for joining in. Lots of good questions. Lots of good feedback. I hope I'll see you guys again in one of the future Hangouts as well.

AUDIENCE: Thanks, John.

AUDIENCE: Excellent Hangout. Thanks, John. Have a wonderful weekend.

JOHN MUELLER: Thank you, everyone.


JOHN MUELLER: Have a good weekend. Bye.

AUDIENCE: Thanks so much.