Transcript Of The Office Hours Hangout
JOHN MUELLER: OK.
Welcome everyone to today's
Google Webmaster Central
office-hours hangout.
My name is John Mueller.
I'm a webmaster trends analyst
here at Google in Switzerland
and part of what we do is talk
with webmasters and publishers,
like the ones here in
the hangout and the ones
that submitted a
bunch of questions.
It looks like there are some
technical issues with the Q&A
stuff so what I'll
be doing is first
going through the ones that
were posted on the event listing
and then going through
everything else.
But as always, before
we start with that,
for those of you who don't
come into these regularly,
is there anything on
your mind that you'd
like to get started with?
Any open questions, problems,
comments you might have?
Nothing?
Oh, I'm sure it'll come up.
BARRY: You guys
are perfect, so--
JOHN MUELLER: All right.
Thank you, Barry.
I'm glad I have that on record.
OK.
So let's see where we can get
started with the questions.
We heard sites that are
getting multiple DMCA hits
will suffer in ranking.
We're a UGC site and we average
four to five DMCAs every month.
Is that number OK or is there
a concern in terms of ranking
of the website?
So we don't have
any specific numbers
on number of DMCA
requests that are OK,
and the number that's too much.
So that's not something
I can really say that much about.
It seems compared
to the sites that
are shown in the
Transparency Report
that four to five DMCAs is
pretty much on the low end,
so that's probably not
something that critical.
Regardless, I'd still respond
to these as appropriate
and make sure that you're
taking the right steps
to prevent these going
forward in the future as well.
Does Google give any
preference to internal links
from navigation over
ones on a web page?
And if you have multiple
links on the same page going
to the same internal page,
is there a priority for which
link is more relevant?
We do try to understand
the structure of a page,
in general, when we
crawl and index pages
to understand the difference
between the boilerplate-- So
the part of the page
that doesn't change
so much across a
website compared
to the main primary
content of the page--
but I think with regards
to internal navigation,
if you have a website
that has a normal kind
of internal navigational
structure with a menu
or with related links,
those kinds of things, then
that's not something where
you need to worry about where
you have your link placed.
And having multiple
links to the same page
can be perfectly fine.
So that's not
something where I'd
say you need to
artificially tweak
the web structure
of your site so
that it matches some esoteric design
that you think is optimal.
As long as these links
are not nofollow,
as long as there's kind of a
normal structure of a website
there, ideally with maybe a
hierarchy with higher level
pages, lower level
pages, and categories,
those kind of
things, then that's
not something where
I'd really worry
about where those links
are actually placed.
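As an illustration of that kind of normal internal linking, a page might reach the same category page from both its menu and its body copy; a minimal sketch with hypothetical URLs:

  <nav>
    <a href="/categories/shoes/">Shoes</a>
    <a href="/categories/hats/">Hats</a>
  </nav>
  <main>
    <p>Browse our full range of
    <a href="/categories/shoes/">shoes</a>.</p>
    <!-- Two links to the same page on one page is fine;
         neither needs special treatment -->
  </main>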
hreflang question.
Let's say a site CMS pushes
up multiple copies of pages
to country specific directories
and each has a correct hreflang
tag but the content has
not been translated,
how will Google react
to or treat those pages
if it's all in English,
but the hreflang tag
says it's in Spanish?
I think we do try to have some
kind of protection in there
for these kind of issues.
What will probably
happen is we'll just say,
well, these pages
are all the same.
We should treat them as one page
instead of as multiple pages.
But in general, if you know
that your site is doing this,
I'd try to prevent that.
And really make sure that
you're using hreflang properly
so that when we
look at your site,
we know we can trust the
hreflang markup on your site
in general.
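For reference, a trustworthy hreflang setup along those lines would only annotate pages in the language they are actually written in; a minimal sketch with hypothetical URLs:

  <!-- On https://example.com/en/page, where /es/page is a real Spanish translation -->
  <link rel="alternate" hreflang="en" href="https://example.com/en/page" />
  <link rel="alternate" hreflang="es" href="https://example.com/es/page" />
  <!-- If /es/page were still in English, the hreflang="es" annotation
       should be dropped until the page is actually translated -->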
If I spend a lot
of time improving
a particular part of my site,
for example, product pages,
can these rank really
well separately
from the rest of my site or
will they never rank highly
if there are issues
elsewhere on my site?
So yes.
Specific parts of your
site can rank individually
from the rest of your site.
For the most part, we
do try to understand
the content and the
context of the pages
individually to show
them properly in search.
There are some things where
we do look at a website
overall though.
So for example, if you add
a new page to a website
and we've never seen
that page before,
we don't know what the
content and context is there,
then understanding
what kind of a website
this is helps us to
better understand
where we should kind of start
with this new page in search.
So that's something
where there's
a bit of both when
it comes to ranking.
It's the pages individually,
but also the site overall.
We're a UK site but
we use CloudFlare CDN.
When I check online, it says my
website is hosted in America,
even though we have the
servers in the UK as well.
So what do I need to do there?
If you use normal
geotargeting setup
either by having a country
specific top level domain
or by setting it
in Search Console,
then we'll figure that out.
So that doesn't really matter.
Lots of sites use content
delivery networks,
and depending on
your location, you
might see a different
server location as well,
and that's perfectly
fine with us.
So I'd just make sure that
Search Console is set properly
or that you're using a
country code top level domain.
We've been trying to
improve our descriptions
and added useful content
around our product pages.
Can this affect the rankings
of the parent category pages
as well?
Sometimes, but in
general not so much.
So if we're just looking at
the content of these pages,
then we try to look
at them separately.
Obviously, if we see
overall that lots of people
are recommending your website
because of something lower
level on your website, then
we'll say, well, in general,
this website seems to
be well-recommended.
Maybe we should trust it more.
We use tabs on our
product pages for UX.
I know Google discounts
anything that's hidden.
We have lots of useful
info, so should we
use h-tags as titles,
which contain
hidden content to help improve
these since it's all relevant?
Essentially, we try to
ignore that if you're
using hidden content.
So we would find that probably
when we crawl your page,
when we look at the html,
but when we render it,
we'll notice that it's
kind of hidden content,
and we won't give
it as much weight
as we would something
that's visible.
So if you have
specific parts that
are important for
this page, I'd just
make sure that they're
visible by default
when this page is opened.
Also, so that users when
they go to this page,
they see kind of what
they were searching
for in the search results.
On the other hand, if you
have auxiliary content
that you think isn't
that critical for a page,
then maybe putting it in a
tab like this makes sense.
FEMALE SPEAKER: John?
John?
JOHN MUELLER: Yes.
FEMALE SPEAKER: Can I just ask
a quick question on that point?
JOHN MUELLER: Sure.
FEMALE SPEAKER: Say
for instance you're
using effectively
hidden content,
but it's great from
a UX perspective,
and then you
decide, well, Google
doesn't like hidden content
and you get rid of it.
And then the conversion
just drops through the floor
because people, humans,
actually like it.
If you put it back,
you're not potentially
putting yourself in a kind of
endangering position, are you?
In that Google would
just literally say,
well, I'm not going to
take it into account.
But it's great for humans.
It's obviously great just
getting that balance, isn't it?
JOHN MUELLER: Yeah.
I don't know if you could
argue that hidden content is
great for humans.
But in general in that situation
where you remove it and you add
it back, we would just
treat it algorithmically
like any other
content on the site.
So in particular what we
found is really problematic
is when you're
searching for something
and you find the
snippet, and you think,
oh, this matches what
I was looking for.
You click on the
results, but the page
that the website
shows you doesn't
have anything about what
you were searching for.
That's kind of the
tricky situation
that we're trying to
prevent with treating
hidden content like this.
So if it's-- I don't know--
you're searching for a hotel
and you click on this page,
and all you see is information
about car rentals, for example.
Because hotel is in a
separate tab that's hidden,
you don't actually
see that on the page.
So having a user
actually figure out,
well, if I just do the right
thing on this page, then
actually I'll see the content.
That's almost-- I don't know--
a hard step for a lot of people.
So if like the hotel
content is important,
I'd just put it
on a separate URL.
You can still use a
tab navigation on top.
But just make sure it's a
separate URL with that content
visible by default.
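A minimal sketch of that setup, keeping the tab navigation but giving each tab its own URL with its content visible by default (URLs hypothetical):

  <ul class="tabs">
    <li><a href="/listing/hotels">Hotels</a></li>
    <li><a href="/listing/car-rentals">Car rentals</a></li>
  </ul>
  <!-- /listing/hotels renders the hotel content directly on the page,
       not in a hidden panel -->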
FEMALE SPEAKER: So effectively
you could have tabs,
but you could have it so
that the URL was linked
to from the original content?
JOHN MUELLER: Exactly.
FEMALE SPEAKER: OK.
But also, say for
instance you had
like a stepped
process for somebody
which was instructions
on how to use the site,
and that was a reasonably good
user experience, easy to use
for humans, but you didn't want
to include all that content
for Google, literally,
because it perhaps
has too much information
and you don't
want to be at risk of
being overoptimized.
Is that acceptable?
JOHN MUELLER: Yeah.
FEMALE SPEAKER: OK.
Well, thank you.
ROB: John, in that
scenario though, you
can't control what you rank for.
If someone is searching for
hotels and you show a result
for hotels, but when you
click through it's showing car
rentals, is that your
fault or Google's?
JOHN MUELLER:
That's kind of what
we're trying to improve by not
valuing that hidden content so
much.
So that-- From our
point of view, any time
a user clicks on
something and doesn't
find what they were
looking for, it's
our fault. By default,
it's always our fault.
Even if the website got
hacked during the time
we tried to crawl
it or whatever,
we think it's essentially
always something
that we should be able to
handle a little bit better.
But in a case like that when
you're looking for a hotel
and you get a car rental page,
then that essentially means
we weren't ranking things
the way that we should,
and that sometimes leads to
decisions like this where we
say, well, actually
this hotel information
was in a hidden tab.
Maybe we should have
devalued it a little bit.
And if we find a better match
for that hotel, we'll show it,
but if we don't
find a better match
and this is essentially the only
content we have for that hotel,
and it's in a hidden
tab, then we'll
just show it anyway,
because that's
the only source of this
information for that user.
ROB: Oh.
JOHN MUELLER: Let's see.
Some more questions here.
My company name is
actually three words,
which make up a phrase,
for example, Big Red Box.
So my website
within the content,
should I be referring to
my company as Big Red Box
or as one word BigRedBox,
as that's what the URL does?
So will Google recognize both
of these as being my company?
Essentially we try
to recognize kind
of like a phrase that
belongs together.
When we see that working
together within a website,
we'll try to recognize that
and treat it as one thing.
So it's not something
where you'd artificially
have to use your domain name
instead of your company name
when you're talking
about your company.
When a user explicitly wants to
go to my site via Google search
on mobile, does it count
towards the quality of the site
the same as if he
wanted to see my website
in the desktop search results?
Are browsing data from Chrome
browser used in web search?
As far as I know, we don't
use that in web search.
So that's essentially
something where
you have to interact
with your users
and if users are happy with
the content that they find,
then that's essentially
between you and the user.
And we kind of see the
more indirect effects
when people are, for example,
linking to the content.
Is there any SEO benefit of
using the title attribute
on links?
Is that a ranking signal?
Does it contribute in any way?
I don't think we use the title
attribute at all for the link
anchor text there.
I think what we do
use is if you're
using an image, for
example, and you
link that, then we take the alt
text out and try to use that.
But I don't think there is any
SEO benefit of using a title
attribute there.
So it definitely wouldn't
make sense to say, well,
here's a link with
this anchor text
and then within the a tag also
add the same thing as a title.
I don't think that would give
any additional value there.
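To make that concrete, a small sketch: the title attribute on the first link is the part described as carrying no extra value, while the alt text on a linked image is what gets used like anchor text:

  <!-- title attribute: no additional SEO value, per the answer above -->
  <a href="/widgets" title="All our widgets">Widgets</a>

  <!-- alt text on a linked image: used similarly to anchor text -->
  <a href="/widgets"><img src="/img/widget.png" alt="Widgets"></a>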
FEMALE SPEAKER: Can I ask
another quick question, John,
just on that particular subject?
JOHN MUELLER: Sure.
FEMALE SPEAKER: Because
it's something actually
that I've been thinking
about quite a bit.
You know that we've got Image
Bot that collects images,
and we've got the normal
Google Bot that crawls
text, if you'd like, webtexts.
So the image is collected
presumably by Google Image,
but the other week you
mentioned the alt text,
is that collected by
Google Web or is it
collected by Google Image?
Because you mentioned that the
actual alt text can be part
of the actual rendered content.
So is the actual benefit
actually in the fact
that it's Google
Bot versus Google
Image that takes that back,
and the same with text,
the title, potentially,
on links, so it's actually
part of the actual rendered
code, as such, the content?
JOHN MUELLER: In practice,
we'd combine that.
So we take what we crawl
with Googlebot, which
will be the text
part of the page,
and combine that with
the image that we
crawl with Googlebot-Image.
So it's not that just
one of these crawls
ends up being the
image in the index.
We essentially
take both of them.
So especially for
image search, we always
have to combine the image
with a landing page.
We can't look at them
individually and say,
well, we're just
focusing on the image,
and we don't care
about any landing page,
because we need that extra
context from the landing page.
And that includes the alt
text, that includes things
like captions, additional text
on the page in that section
of the page, all of that.
FEMALE SPEAKER: So in effect it
actually adds further context
to the content overall?
JOHN MUELLER: Yeah.
FEMALE SPEAKER: Effectively
on the whole page
by adding the alt tags.
[INAUDIBLE]
JOHN MUELLER: Yeah.
FEMALE SPEAKER: Right.
Thank you.
JOHN MUELLER: Sure.
I'm working on a website
and no matter what I change,
the schema rating stars aren't
showing in search results.
I've tried to use software
application reviews.
Nothing works.
At the same time,
the schema test tools
show that all is
OK on the website.
So when it comes to structured
data and rich snippets,
we essentially look
at it on three levels.
Primarily, on the one hand, it
has to be technically correct.
So that's something that you're
testing with those testing
tools.
On the other hand, it has to be
compliant with our policies
so that you're marking up
the right type of things.
For example, one type of
issue that we sometimes see
is that people will take
a random piece of content
and mark it up as a recipe, and,
of course, it's not a recipe.
So that's the kind
of thing that we
look for, that our
algorithms look for.
And finally we have to be kind
of trusting of the website
overall.
So it kind of has to
reach a certain quality
bar from our point of view.
So that's something that
might be in effect here.
I'd probably have to check
the website directly.
But that's something
that is generally
the issue when we
see people say,
well, I have implemented
everything properly.
I've implemented it
in the right way,
but it's still not showing up,
then usually it's just a sign
that you need to work on
your website overall to kind
of really give it an extra
boost in terms of quality.
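For the first, technical level, review markup of the kind the question mentions would look something like this JSON-LD sketch (all values hypothetical); note that passing the testing tools only covers this level, not the policy or site-quality ones:

  <script type="application/ld+json">
  {
    "@context": "https://schema.org",
    "@type": "SoftwareApplication",
    "name": "Example App",
    "aggregateRating": {
      "@type": "AggregateRating",
      "ratingValue": "4.4",
      "ratingCount": "127"
    }
  }
  </script>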
When should you have--
MALE SPEAKER: The
website in question--
JOHN MUELLER: Yeah, go ahead.
MALE SPEAKER: The
website in question
has already been available
for a few years.
[INAUDIBLE] Lots of traffic
here right now from Google, so--
JOHN MUELLER: I can
barely understand you
from-- It keeps breaking up.
Maybe you can add it as
a comment in the chat
and I can pick it up there.
But we can get back to that.
I have a travel blog with
a lot of old entries.
If I create a new site and link
to older posts from the new site,
is that OK?
Will Google think that it's a
manipulation of its algorithm?
In general, what I would
do if you are setting up
a new site is just
do a normal site move
so that you're actually
redirecting from your old site
to the new one.
So that's probably what
I'd try to do there.
If you can't do a
site move, if you
want to keep your old content
separately, then linking to it
is perfectly fine.
That's not something
where I'd say would
be particularly problematic.
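A minimal sketch of such a site move, assuming the old site runs on Apache with mod_alias; every old URL is 301-redirected to its equivalent on the new domain (domains hypothetical):

  # .htaccess on the old domain
  RedirectMatch 301 ^/(.*)$ https://new-travel-blog.example/$1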
We've now appeared
in press releases.
Somehow they forgot to
add the backlink.
We've asked them to do so.
We didn't pay for that,
and now it appears.
What will Google think?
So in general, I'd recommend not
using press releases as a way
to build links, because
that's something that has
been abused a lot in the past.
And our algorithms and
our [INAUDIBLE] team
are generally
trying to recognize
those kind of situations
where essentially you're
creating the content
and publishing it
on other people's sites.
So that's something
where I'd really
try to avoid using press
releases as a way of kind
of artificially building links.
We did a site redesign,
at the same time
did a migration to
https just last year.
But since the switch, the
impressions and the click data
in Search Console has
become nonexistent,
yet we haven't seen any
loss in organic traffic,
and we're still number
one for our brand.
So what's up?
This is actually a really
common thing that a lot of sites
run into when they
move to https,
and it's not really
a sign that you're
doing anything particularly
wrong or that your website is
penalized or anything.
It's essentially just that
Search Console is looking
at sites on a per URL basis.
And when you move to https,
that's a different URL.
So you have to verify the
https version of your site
as well in Search Console
so that we can show you
the https specific data there.
So probably everything is fine.
It's just a matter of adding
that site to Search Console
and looking there.
ROB: John, was that
Liam's question?
JOHN MUELLER: Yes.
ROB: Yeah, he posted.
I answered that on
the page, and he
said, yes, he did add the
other domain separately.
JOHN MUELLER: OK.
So he's probably not looking
at that domain then.
Maybe it's added and you're just
looking at the old version
instead of the new one.
That's really a common thing.
Rob spotted that
right away probably.
It's one of those things
that we should probably
be doing better in the UI.
We've improved our content,
gained some backlinks,
solved 404 problems, and
now our site looks great.
How long do we have
to wait until we
see ranking improvements?
Essentially,
for the most part,
we work on adjusting
the ranking in real time
as we recrawl and reindex,
reprocess all of the URLs
that are involved,
so this is something
where you'd see a kind
of a steady change
in rankings over time.
Sometimes there are
bigger steps that
are taken when an algorithm needs
to reprocess data for a website
overall.
But in general,
you should see kind
of a steady change
over time regardless
of any other changes.
We had a page on our old domain.
Now we're redirecting it
to a new domain via 301,
but by mistake we didn't
create the content.
Now we have the content but
there's been a two-month gap.
Will Google consider
it as a new page?
Probably we'll
still follow the 301
and try to pick that up again.
So that's not
something where I'd
kind of worry if we treat it as
a new page or as an old page.
We'd be able to at least forward
the signals to that new page
and to kind of move
all of the information
that we have to that new URL.
So these things can
happen sometimes.
It's good that you noticed it.
And if you notice that you
might have missed other URLs
like that, I'd just go ahead and
kind of re-add that content so
that it's back on your site.
FEMALE SPEAKER: John, can I
just ask another really quick
question?
JOHN MUELLER: Sure.
FEMALE SPEAKER: Sorry everybody.
On that point about the
lifetime between crawling
and the shuffle, et
cetera, and reindexing,
et cetera-- It may have
been asked before--
but what kind of
length of time is that?
So we get-- We analyze
our logs and we see
that Google is crawling today.
How long is it likely
to be before it
goes through the
machine at the other end
and then gets spat out
into a changed result
or reindexed in a different
position, generally?
JOHN MUELLER: Generally,
it really depends.
So for some things we're able
to process these things really
quickly, so maybe a
minute or two even.
For other things, it
takes a bit longer.
Maybe, I don't know,
half a day or a day
depending on what all
needs to be updated.
And that also depends a
bit on the data centers.
So it might be that it's
updated in one data center first
and a couple of hours later
you see it somewhere else.
But it's hard to kind of
give a general timing there
from the crawl point
to when it's actually
reflected in the live results.
And some of the algorithms
might have different update
frequencies there as well.
So I have seen
reports, for example,
that the cached page is
updated a couple days later
and the snippet
takes another day,
but the page
actually already
ranks for those terms that
are on those updated URLs.
So that's something where
individual processes
within Google might take
different lengths of time.
FEMALE SPEAKER: Is there
ever an instance where
a result gets removed and then
the shuffling around happens
and then it just goes back in?
Because sometimes
I'll see things
that just literally disappear.
And then they come back
in in a higher position.
It's almost like-- It literally
is like a card index file.
You take them out and
you put them back in.
You can kind of catch
those times where
it's just appeared for a while.
JOHN MUELLER: That
shouldn't happen.
I tend to take something
like that as a bug
and bring it to the teams here.
But in practice
that should be more
of a fluid change in that
we update the data and then
the new data is there.
It's not that we kind of take
the URL out, update the data,
and then put it back again.
That should kind of happen live.
What sometimes
happens is especially
with really maybe new content
that we picked up really
quickly where we said, oh,
we need to push this really
quickly to the search results
is we'll push it really quickly
to the search results, and
then after a while notice,
well, it's not getting the pick
up that we kind of assumed.
People aren't responding
to it by linking to it.
Those kind of things.
And we say, well, we'll just
put this in the normal queue
for the next time.
So kind of this time
between something
that we pushed as
something that we assumed
would be fairly urgent, but
then noticed, well, maybe it's
not that urgent after all.
We'll put it in the
normal queue and there
might be a small gap
between that sometimes.
But in general, if it's just
a URL that gets updated,
we should be able to
do that on the fly
without actually
things dropping out.
FEMALE SPEAKER: And sorry.
Just one final quick question.
I've been saving these
up for quite a while.
JOHN MUELLER: That's good.
FEMALE SPEAKER: So
say, for instance, you
have an issue with
an infinite loop,
et cetera, so lots of
random URLs in the search
and you're trying
to deal with that.
If you inadvertently
somehow let Google
into a parameter driven
section of the site,
and it literally churns
out lots of links,
but then you 410 them--
Effectively you chop off
that arm, so to speak-- those
links that have been crawled,
do they go into a massive queue?
So effectively, because they're
already sent off down the pipe,
you'll end up with them all
getting crawled at some point?
Because--
JOHN MUELLER: Probably.
FEMALE SPEAKER: --honestly,
I've got none of these left
on a site that I'm
kind of looking at.
There's no-- you know, when I
crawl it, there's no issues.
But when I check the
logs, some days there's
like 50,000 [INAUDIBLE].
And then other days,
it's like 30,000 410s.
So clearly I've got rid of them.
But then they're in a
queue somewhere, randomly.
JOHN MUELLER: Yeah, so I think
there are two aspects there.
On the one hand, kind
of the first time
that we try to crawl
these URLs, they might end up
in a queue like that.
So that's something
where sometimes we
send a message to
the site saying,
hey, we found too many
URLs on your website.
And you look at it and
you say, well, of course.
My website is big.
I have a lot of URLs.
But usually this is because
we ran into a section
where we found a new type
of parameter in the URL.
We don't really know
what to do with it.
So we'll queue
all of these first
and we'll try to go
through them to figure out
is this parameter really
something relevant or not.
That's the one side.
The other part, once
we've crawled them
and we see even that
there's a 404 or 410,
we'll try to refresh
these over time.
So it might be
that, I don't know,
every couple of
months we'll say,
well, we have extra
capacity for this website.
We could crawl it
a little bit more.
And we have all
of these URLs that
were 404 last time, but maybe
the webmaster changed something
and there's actually
content there.
So we'll go out and kind
of fill up the capacity
that we have for the
website with these URLs
that we think are
probably 404s but we just
want to double check.
FEMALE SPEAKER: But
if you 410 them,
you're basically
saying they're gone.
JOHN MUELLER: Well, the next
time they might come back.
So we try them again.
And the difference between
the 410 and the 404
is mostly when the
content used to exist.
When we know that there
used to be content there,
we've indexed it before,
and if you return a 404,
then we think, well, maybe
this is just a mistake.
We'll double check
again before we actually
remove the URL from the index.
And with a 410, we say, well,
this is a pretty strong signal
that the webmaster
doesn't want this,
so we'll take it
out immediately.
So the 404/410 difference
is mostly with regards
to how quickly it falls out.
But once it's out, we don't
really differentiate between
a 404 and a 410.
So we'll check these again
every couple of months.
And if you have a
lot of URLs, then you
might have this kind of
steady stream of us just
like refreshing these URLs to
double check that we're not
missing anything.
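If you do want retired sections to return 410 rather than 404, a small sketch, again assuming Apache's mod_alias (the path is hypothetical); as discussed here, the practical difference is mainly how quickly the URL first drops out:

  # Return 410 Gone for a removed section
  Redirect gone /old-forum/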
FEMALE SPEAKER: So is there
anything you can do to stop it?
JOHN MUELLER: Usually you
don't need to do anything
to stop it, because--
FEMALE SPEAKER: Well,
I'd prefer Google to be
going in the right places.
JOHN MUELLER: Yeah,
I mean usually we
use this more to kind of
fill up the extra capacity
that we have.
We say, well, we've crawled
all the important URLs
from this website.
We know we could crawl
10,000 more URLs.
So let's just double
check the rest
of the stuff on our list
to make sure that we're not
missing anything.
So it's not that you'd lose
anything by us crawling
those URLs, or that you'd
gain anything by us
not crawling them.
It's essentially just
our system saying, well,
there's this big
backlog over here
that we don't really know
what to do with, so let's
just double check it
when we have extra time.
FEMALE SPEAKER: But you could--
So you could effectively
if you see that
happening, you know,
that it's not
crawlable from all the
crawls you do with
crawling tools,
it's an indication that
maybe you're not actually
using your capacity, so
you could grow your site.
You could fill it back up
with good stuff, couldn't you?
JOHN MUELLER: You
can always do that.
Yeah.
I mean the capacity,
or the amount
of crawling that we do per day
from a website, on the one hand
is tied to the technical
limitations that we see.
Where we see, well,
if we crawl more,
then your server starts getting
slow, starts returning server
errors, those kind of things.
So that's kind of a technical
capacity issue there.
And usually you don't
want to fill that up,
because if you have Googlebot
crawl as much as it can,
then probably your
users are going
to notice that things are a
little bit slower than usual
as well.
So that's kind of the
thing where usually you'd
expect to have a little extra
room available for Google
to crawl a little bit more.
FEMALE SPEAKER: OK.
That's great.
Thanks John.
Cheers.
MALE SPEAKER: As a
continuation of this question,
would it be better
to switch a page
that I don't want to be seen
anymore from 404 to 410?
JOHN MUELLER: I don't think
that would make any difference
with regards to crawling.
Once it's removed from
the search results,
we essentially
treat them the same.
It's just that step from
having it visible in the search
results to it being removed.
That's a tiny bit faster.
And in practice, it's almost
like a theoretical difference
because normal websites they
kind of have this time where
these URLs drop out anyway, and
if that happens one day earlier
or one day later,
it's not really
going to make any
practical difference.
MALE SPEAKER: But you
just said that 410
will be removed
faster, and by that
I will be able to free
capacity for the other parts
of my website.
JOHN MUELLER: Yes.
But normally we should be
able to crawl more than enough
from a website anyway.
So it's really,
really rare that we're
so limited that we need
individual URLs to kind of drop
out faster so that we can
crawl a little bit more.
And that's usually a sign that
your web server is so limited
that it's already
slow for normal users
that you're kind of running on
the last part of your server's
power.
ROB: John, would you combine
the signals from any 400 code
with a noindex or
something, and then
use that as a
reinforcement signal,
say actually maybe that you
shouldn't bother anymore?
JOHN MUELLER: We don't do that.
ROB: Or will that just
prevent you getting there so
that you would never learn?
JOHN MUELLER: We
wouldn't pick up any
of the content on the 404 page.
So if you have a
noindex there, we
would essentially ignore that.
ROB: So you wouldn't
come back to that
if you put it on that
parameter or that folder
or whatever it was.
JOHN MUELLER: I
think if you really
want us to kind of stop
crawling that part of the site,
I'd block it in robots.txt.
But in practice for like normal
site changes where you're
adding things and removing some
things, that's totally too much
I think.
I'd just let it return 404.
We'll just recrawl those
parts from time to time
to kind of see that we're
not missing anything.
But I wouldn't worry about that.
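For the case where you genuinely want a parameter-driven section kept out of crawling, the robots.txt blocking mentioned above might look like this sketch (paths and parameter names hypothetical):

  User-agent: *
  Disallow: /search-results/
  Disallow: /*?sessionid=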
FEMALE SPEAKER:
Just on that point,
just to sort of help
with that query.
Those URLs that I
was speaking about,
they are deindexed
now by the way.
Because I deindexed from
like 1 and 1/2 million
URLs to 130-odd thousand.
So they're not there.
They're not indexed.
But they're still
getting crawled.
JOHN MUELLER: Yeah.
FEMALE SPEAKER:
This is what I mean.
There's definitely like
some sort of random queue
that they're in.
So I don't know whether that
adds a little bit of help
for anybody.
They're not there anymore.
They're not in the index.
JOHN MUELLER: Yeah.
I think that happens to
a lot of sites over time.
And sometimes we see sites that
significantly revamped the kind
of content that they have.
Or they'll say, all of my blog
is gone or all of my forum
is gone, and suddenly 2/3 of
their URLs are returning 404.
And from our point of view,
that's perfectly fine.
We'll see that the first
time when we crawl it.
And then we'll
know, oh, probably
they don't exist anymore.
We'll still recheck them after
a couple of months, slower
than we would crawl normal
URLs with like a lower priority
almost.
And that's not
something that you
need to kind of
artificially suppress.
So I'd just let those return
404 and focus on the normal part
of the site instead.
FEMALE SPEAKER: OK.
Thanks.
JOHN MUELLER: All right.
Let me grab some questions
from the Q&A app.
MALE SPEAKER: Can you
get back to my question
about the rich snippet?
JOHN MUELLER: OK.
MALE SPEAKER: It
looks like I have
an internet
connection better now,
so I can ask a little follow-up.
JOHN MUELLER: OK.
Go ahead.
MALE SPEAKER: You said
that the website needs
to have some sort of trust.
How can I measure the trust?
The website in question
has been on the internet
for a couple of years now.
It gets quite a lot of
traffic from Google.
It's really fast.
It has all the needed things
that I can think about.
How can I measure trust?
Because I do know that
this website doesn't
show rich snippet at all.
None.
[INAUDIBLE] no.
Stars for reviews.
And we did try to switch types.
For applications we
tried reviews only.
Each time we left it
for a couple of months,
like half a year,
just to see maybe it
takes some time to add them.
But we don't see them at
all on the [INAUDIBLE].
JOHN MUELLER: So what
you can sometimes
do is a site query
to see if we were
technically able
to pick them up.
So sometimes we would show
that the rich snippet's there.
That might be something
to double check.
MALE SPEAKER: --I will check.
JOHN MUELLER: Yeah.
So in general, we
don't have any kind
of public trust metric
or quality metric
that webmasters can pick
up and kind of optimize on.
We really try to understand the
quality of a website overall.
And for that, one
thing that you could do
is go through the-- What is it?
23 questions that Amit Singhal
posted a bunch of years
ago now on the Webmaster
Blog about things
that you can ask yourself
about high quality sites.
And take some of
those questions and go
through maybe a group
of people who aren't
related to your website at all.
Have them go through some
tasks on your website,
some tasks maybe on competitors
where you think this is the best
website of the same type.
And try to understand
where users
are seeing a difference
between the two websites
and where maybe you could take
a hard look at your website
and say, OK.
This part of my website that I
thought was really fantastic,
maybe people don't
recognize it as being
as fantastic as it could be.
Maybe I need to
focus on that more.
But these are the
things where it's not
a matter of using the right
html code, the right meta tags.
You really have to take a step
back and look at your website
overall, maybe with fresh eyes
to get kind of the hard truth
that usually the
webmaster doesn't
want to hear by themselves.
And I have the same difficulty
when I send things out
for review. I try to get
the really honest feedback
and sometimes that clashes with
what I thought I would expect.
But that's the type
of feedback that you
need to get as a
webmaster to really
get a fresh look at your site
and to see where people who
aren't connected to your
site might see issues
or might see ways that
you can improve things.
ROB: The Webmaster
Forums are a good place
if you want harsh feedback.
JOHN MUELLER: Yeah.
MALE SPEAKER: Or if you had
some type of zero to ten
score that you could
assign to a web page--
Ten being the highest.
Zero being the lowest-- we
would be able to figure out
how important a page is.
JOHN MUELLER: Well, you kind
of see that already, right?
You enter your preferred
keywords in this box
and then you see a list of
sites that we rank in the order
that we think they're relevant.
So--
MALE SPEAKER: Right.
But you could rank
a site very well,
but not have rich snippets.
MALE SPEAKER: My point exactly.
Our website is ranked very
well for lots of keywords,
especially in
different languages,
but we don't see
rich snippets at all.
JOHN MUELLER: Yeah.
I'd really try to see what
you can do from a quality
point of view overall.
So I haven't looked
at your website.
So maybe I'm totally
off the mark.
But in general, when it
comes to rich snippets, when
we don't show them,
it's really a matter
of us assuming that the
quality of this website
isn't as high as we
would like to see it.
MALE SPEAKER: OK.
Thank you.
FEMALE SPEAKER: John, there was
a-- I stumbled across Google's
search engine optimization
starter guide for webmasters in
some article the other day.
It's quite old.
I think it was from 2010
or something like that.
There looked to be still quite
a lot of fairly current stuff
in there.
Will that ever kind of be
updated, that document?
It's kind of just really
well-buried actually.
I stumbled across it by
accident the other day.
It was about 40 or 50 pages
long, something like that.
JOHN MUELLER: Yeah.
We should do that.
FEMALE SPEAKER: It had a
lot of really useful stuff
that's actually
still relevant.
A lot of it is really
still relevant.
JOHN MUELLER: Yeah.
I know we did a revamp
a couple years ago.
So I don't know if you saw the
original one or the slightly
revamped one.
But even in the
revamped one, there
are things like all of the
mobile information that
has significantly changed.
So it's not so much a matter
of like your general SEO type
things, but it's more
that the technical details
have significantly changed.
And those are things we'd like
to get updated in that guide.
But I don't know what
the specific plans are
at the moment to update that.
I know it's a topic that
comes up every now and then.
FEMALE SPEAKER:
I'm just thinking
that you know that
probably would
be really useful for people.
JOHN MUELLER: Yeah.
We should do that.
FEMALE SPEAKER: Yeah.
JOHN MUELLER: All right.
Let me grab a handful of
questions from the Q&A,
and then we can open
things up for all of you.
We want to run A/B test
on our landing page
by serving one type of
content to returning
users and a different
one to new users.
Does Google consider this
type of A/B test cloaking?
So essentially we would never
see the returning user type
content because Googlebot
doesn't keep cookies.
So if there's something
useful that you're
showing to returning
users, we would essentially
miss all of that.
So from that point
of view, I don't
know if that's in
your best interest.
And also, this isn't a
traditional type of A/B test
because you're not comparing
the same type of users.
You're essentially
splitting things
up into returning
users and new users.
So that's something where
if you're testing something
specifically with regards to how
people react to your website,
I don't know if you'd
actually see useful results
in a case like that.
Obviously, maybe
there are things
that make sense by
doing it like this,
but if you're doing
a traditional A/B
test to see which
variation is better,
then you're looking at
very different user groups
automatically by
splitting it up like that.
If I was to show an extract
of product descriptions
on a [INAUDIBLE]
page, how should I
do that so it doesn't
negatively impact me,
as the text might be duplicated
on the actual product page?
In general, that's no problem.
So we'll recognize snippets
of text that are duplicated,
and we'll try to find, more or
less, the most relevant page
for that snippet
of text, and we'll
try to show that one
in the search results.
So it's not a
matter of us saying,
well, there's a lot of
duplicated content here.
But rather we say, well,
this specific piece of text
that someone is searching for is
duplicated on these five pages
and from these, this is
the most relevant one.
So we'll show that
one in search.
Trying to improve my product
descriptions for SEO.
I know that you discount
content behind tabs.
I think we looked at this one.
We're an e-commerce business
and have a dedicated website
for all countries where
English is a main language.
All of them use hreflang tags.
Is there any added
SEO benefit in terms
of rankings in serving unique
content in each one of these?
Any additional SEO
benefits-- Essentially
what would happen in
a case like this is we
would rank all of these pages
individually and depending
on where the user's
location is we'd
swap out the URL shown in
the search for the best
matching one.
So it's not so much that you'd
have a ranking difference here,
but rather we try to show
the most appropriate one
in the search results.
And if all of these URLs
are really exactly the same,
then you probably
don't have any value
in actually creating these
pages in the first place.
So that's something.
Maybe it's worth reconsidering.
But if you're changing
these pages subtly--
if you have different addresses,
different phone numbers
on there, different currencies,
different prices, then that's
obviously a good
use of the hreflang.
So the text itself
might be the same,
but the metadata on the
page might be a little bit
different.
But again, hreflang doesn't
change the rankings.
It only makes sure that the
most appropriate one is actually
swapped in at the place
where it would rank.
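A sketch of that multi-country English setup, with hypothetical domains; as described above, these annotations only control which URL is swapped into the result, not how it ranks:

  <link rel="alternate" hreflang="en-gb" href="https://example.co.uk/product" />
  <link rel="alternate" hreflang="en-us" href="https://example.com/product" />
  <link rel="alternate" hreflang="en-au" href="https://example.com.au/product" />
  <link rel="alternate" hreflang="x-default" href="https://example.com/product" />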
Why is content on
mass-produced URLs
from hacked domains,
which Google even
identifies as being
hacked, indexed,
sometimes over the original?
The links from these sites
to legitimate domains
are also counted and indexed
in Webmaster Tools.
So with regard to the links,
I guess the tricky part
there is we do show
links in Search Console
even when we say that we're not
passing any PageRank there.
So that might be
something that you
wouldn't need to worry about.
In general, we are aware of
all of these different types
of hacking, and we do try
to work to prevent them
from causing issues in search.
And sometimes that works
a little bit better.
Sometimes we have to
do more manual work
to actually clean that up.
But I know the teams are
working on improving this,
and they regularly
work on improving this,
because it's always this
kind of constant battle
between normal websites and
those that are getting hacked.
In general, I'd really
take this as a tip
to make sure that your
website, when you put it out,
is made in a way that
doesn't get hacked that easily,
that you're really
on top of things
when it comes to
updating your CMS,
that you're on top of things
when it comes to making sure
that you have the security
settings right when security
alerts come out, that you're
really paying attention
to what's actually
happening with your website.
But we do try to
figure out what we
can do with these kind of hacked
sites a little bit better.
One thing that I've been
talking about with the team
is that maybe we'll put together
a special form where people
can submit sites that
they think are hacked
and that might be
something that we need
to look at from a
manual point of view
or that we need to take
into account with regards
to our algorithms when we try
to deal with hacked content
to recognize it automatically.
All right.
Let's open it up for more
questions from you all.
We have a couple minutes left.
What else has been on your
mind this week or recently
or at all?
MALE SPEAKER: Hey, John.
JOHN MUELLER: Hi.
MALE SPEAKER: Hey,
good to see you.
I got a specific question
that just popped up when I
was browsing earlier this week.
So Google has its "no
splash screen on mobile
for downloading the app" policy.
When I browse
m.yelp.com, they take me
to a splash screen that's
not really a splash screen.
It's the website moved down.
So you see the
splash screen that
would be blocking
first click free,
but in fact, if you click
on Continue to the Site,
it just scrolls you down.
Is this OK?
Or something that can be
implemented more widely?
Something that just
Google's ignoring?
Because they're doing
essentially a splash screen
that takes up the full screen,
but it's on the same page.
JOHN MUELLER: Yeah.
We've looked into
a number of these,
and we'll probably have to
try to figure something out
on how to react appropriately
to issues like that.
So it's essentially trying to--
So I don't know specifically
what Yelp is doing that
you're seeing there,
because I don't
use Yelp that much.
But it's essentially
something where
people are trying to
get around our policies,
where we say, well, we
don't want you to run
an app install interstitial.
And they tweak it so that
technically it's not really
an interstitial, but actually
from a user interface
point of view, it does look
a lot like an interstitial.
So those are the
type of things where
we're considering maybe taking
manual action and saying,
well, we need to flag
this as an interstitial,
even if it isn't flagged
by our algorithms.
But I think that the
teams are definitely
aware of these
kind of situations
and they're trying to figure
out how best to respond to that.
So I definitely
wouldn't use something
like this as an
example of something
that you should copy as well,
because just because one
person is kind of getting away
with doing something sneaky
doesn't mean that it's something
that you should copy as well,
especially when you
recognize that they're
kind of trying to sneak
their way past this policy.
MALE SPEAKER: And is there any--
I know you can't say specifically,
but where does that
gray area kind of start?
Because with exit
intent, scrolling down
on the same page-- There's
a million ways to do this,
and some of them are more
invasive than others.
What would you-- If I think
it's OK for my experience,
that's fine.
Or would you say try to
follow the intent of the law
from Google?
JOHN MUELLER: I don't
know where we would
draw the line specifically.
So I have seen some blog
posts, drafts, go around
about that internally.
And for some of that, maybe
we can give some more guidance
on what specifically
to look for.
But I don't know
if we have anything
specific to announce just yet.
ROB: John, can I ask a question
following up from the last one
before that, about the
duplicate snippet on a product
page versus its category.
JOHN MUELLER: Sure.
ROB: I'll give you
a URL in the chat,
but it may well be
that the answer is
it's just your site, Rob.
Because that particular result
is for a little snippet of text
that's on the product page to
say that gift certificate is
for that specific person.
It's for one participant,
and it's for that experience.
And yet, the first
four or five results
are the category pages
for that product,
and nowhere on
that category page
does that snippet appear at all.
It's just kind of the page
above in the hierarchy
to find product.
And so that doesn't
seem to be right at all.
[INAUDIBLE] because it's
our site I can't even--
JOHN MUELLER: No, no.
No, no.
It's not everything
about your site.
ROB: So almost everything?
JOHN MUELLER: So I would
have to kind of double
check what's actually
happening there
where we're picking that up.
But my guess is some of that
snippet was on those category
pages and because your
category pages are pretty
prominent within a
website, in general,
then we assume that they're
more relevant for something
like this.
But--
ROB: They've never appeared
on a category ever.
They're part
of the participant tab,
and when you go
to a product page,
they're in the little tabs, a
bit like we were talking about
before.
[INAUDIBLE] the tabs-- who
it's for, when it's for,
the availability, all of those.
It's never ever appeared
on those pages ever.
JOHN MUELLER: Yeah.
I mean, it's a
fairly long query.
So if you're taking
like a full sentence,
then some of those words
might be things that we say,
well, this isn't that relevant
for this specific query
or we can rewrite
this slightly and then
try to find the best
matches for that.
So I'd assume--
ROB: It doesn't look for
an exact match first?
JOHN MUELLER: No, no.
It tries to find the
most relevant one.
If you want to find
an exact match,
you'd have to put it in quotes.
And--
ROB: If you do, then it goes
straight to the right place.
JOHN MUELLER: So I don't know.
So one thing I'd generally
try to do in a case
like this is figure
out is this something
that really users are seeing
or is this something that you
ran across because
you're kind of creating
this content yourself and you
know exactly what it should be.
And kind of looking
at the search query
information in
Search Console to see
does this really kind
of match an issue
that people are
actually having or
do I just think it's an issue
but nobody actually sees it.
ROB: Well, it's because
we're constantly
searching for different
things from our site
just to see if the pages
appear anywhere,
because you know how
closely we're trying
to monitor what's going on.
So any kind of-- I've
never seen that before,
that kind of behavior,
where obviously you're
seeing it as part of our site.
So you know that it's us,
because the first five
results or six are us.
So knowing that, what
you're really doing
is saying we don't like
the product page, it seems.
JOHN MUELLER: I don't know
if we could say that we
don't like the product page.
It's just that probably
what we're seeing
is that the category
pages are more
prominent within your
website so we give them
a little bit more weight.
But I don't know.
I'd have to take this query to
the search quality folks just
to kind of see what
specifically we're picking up
with regards to ranking there.
ROB: But if you take
it to them, they're
just going to go,
oh, not again, John.
JOHN MUELLER: I can take
it to different people.
There are lots of
people at Google.
Don't worry.
But my guess is
just that this is
too artificial of a query
for them to really say, well,
we should rank this or this
first for something like that.
Maybe these are just really
close by with regards
to like the final score that
they get and we're like, well,
we'll just put
them in this order,
but it doesn't really matter.
ROB: OK.
MALE SPEAKER: Hey John, I
got a quick question for you.
JOHN MUELLER: All right.
MALE SPEAKER: So regarding
the new AMP carousel
for publishers, if you
rank within that carousel,
does that interfere with the
HTML version of your page
showing up in the
traditional search results?
JOHN MUELLER: Can you
repeat that last part?
MALE SPEAKER: If you show
up in an AMP carousel,
does that inhibit the
HTML version of your site
from showing up within the
traditional search results?
JOHN MUELLER: I believe
at the moment it doesn't.
I believe we show
it in both places.
But that's probably something
that can change over time,
where if we see that we need to
deduplicate the results better,
then maybe at some
point we'll say, well,
if it's ranking up here, then
we'll filter it out down here.
MALE SPEAKER: OK.
Thanks.
JOHN MUELLER: All right.
MALE SPEAKER: Another question.
JOHN MUELLER: I
need to head out.
There's someone that
needs this room.
So if you have more
questions, feel
free to add them to
the next Hangout,
and I'll pick them up there.
Thank you all for joining and--
FEMALE SPEAKER: Thanks, John.
Cheers.
JOHN MUELLER: Hope
to see you all again
in one of the future Hangouts.
FEMALE SPEAKER: Cheers.
Thank you.
Thanks.
Bye.
MALE SPEAKER: Take care.
Bye-bye.