Reconsideration Requests

Google+ Hangouts - Office Hours - 11 August 2014


Transcript Of The Office Hours Hangout

JOHN MUELLER: OK, welcome, everyone, to today's Google Webmaster Central Office Hours Hangout. Today, we have a little bit of a special hangout in that I allowed users to submit individual questions about their sites and to have a little bit of time to discuss those issues directly, one on one. I invited a few people already. Two of you are here already. So let's get started. Which one of you wants to get started first? Or do you want to do a quick round of introductions?

DANNY KHATIB: Up to you.

JOHN MUELLER: OK. Yeah, so you're first on the list here. So why don't you go ahead, Danny?

DANNY KHATIB: OK, sure. My name's Danny Khatib. I'm the founder and president of Livingly Media. We've been around since 2006, publishing lifestyle content across three brands. We hit about 25 million users, do around 400 million page views a month. We're a venture-backed company. We've been around for a while. Our problem, I think, is a unique one. And it relates to our flagship property. When we started, we were a mix of user generated content and professional content. We had an open blogging platform where users could write whatever they want, sort of in an open mic fashion, or re-syndicate content from Blogspot or WordPress and other platforms. And then we would mix that content with content that our editors would write and photo and video partnerships with folks like Getty and music labels and all that, all mixed together-- same subdomain, no directory, no easy way to regex it out. After a few years, the blogger network had exploded in size. It was doing very well for a while. We had over 100,000 bloggers who published 40 million URLs, 40 million articles-- a lot of content, all indexed in Google. And over the last year, we've come to appreciate that the network was overrun with essentially user generated spam, despite having moderators, despite having algorithms. We had no control for it. And so we killed the whole feature. We no longer have it. The site is entirely professional content now. We did that at the end of the year. Recently, we found manual actions, which is this list in Webmaster Tools. And under Partial Actions, for Pure Spam, we had over 1,000 URLs listed. And so they were already deleted from our site, but they hadn't yet fallen out of the Google index. So we went through the URL Removal Tool, pulled them all out, filed a reconsideration request. Those 1,000 URLs came off the list, and another 1,000 URLs popped in, because the affected URLs list is capped at 1,000 URLs. So now we're pretty scared. We've submitted reconsideration requests.
We've said, look, we took down 40 million URLs. All of these are gone. Can somebody please just manually go through the list? I've submitted eight reconsideration requests. Every time 1,000 URLs get off the list, another 1,000 URLs pop up in their place. This could take months or years. And I'm not quite sure why, because 99% of the content's already fallen out of the index on its own. So it's all gone. It's just that the manual action is hanging around. Any advice on how to either submit a different kind of reconsideration request or have Google sort of basically reprocess the list or the site?

JOHN MUELLER: So especially for larger sites that have a lot of user generated content, this list of user generated spam that we find is essentially informational for you. It's meant in the sense that maybe these are things that slipped under your radar that you didn't actually clean up yet. But they're not meant in the sense that you need to clean this regularly. So we see this as a way of letting you know that there are some things that we found that you might want to catch up on. But it's not something that would otherwise be affecting your site. So essentially, what happens on our side is we take those individual URLs down from the search results because we think they're spam. But the rest of your site essentially is still treated the way it normally would be treated. It's not something that if this is shown in your Webmaster Tools account, the rest of your site will be demoted or treated in any way badly. So cleaning this up, I think, is a good idea. But especially when you're talking about that many URLs, I don't think it's something that makes sense to do this manually. I just-- leave that list there. Double check from time to time to see that you're really taking out the spammy stuff, that you're actually cleaning that up and removing that from your site, maybe checking your algorithms to see that they're really picking these things up algorithmically as much as possible. But otherwise, it's not something that you need to absolutely clean up.

DANNY KHATIB: Got it. See-- go ahead.

JOHN MUELLER: Yeah, and the other thing I thought I'd mention is the URL Removal Tool. You don't necessarily need to use that for this kind of situation. Because if they're already being removed for webspam reasons, you don't need to remove them additionally for whatever other reasons. I'd use that tool more for urgent removals. If someone urgently needs to remove something that they accidentally posted, then that's a great way to get that out there. But if this is essentially normal site maintenance that you're doing, then you don't need to use the URL Removal Tool.

DANNY KHATIB: So it's unlikely, in your opinion then, that if we had, let's say, 50,000 of these URLs flagged with a Pure Spam action that it would affect the larger domain's SERP rankings at some point?

JOHN MUELLER: It would only affect the larger rankings if, in general, your site's quality were lower quality kind of thing. So it's not the case that these are flagged from a manual webspam point of view, but more that our overall quality algorithms look at your site and they might say, overall, the quality isn't that great. It's not that the quality algorithm looks at the manual web spam reports. But if they were to look at your site overall and say overall quality isn't that great, then those algorithms might pick up on that. But that's essentially independent of anything from the web spam side.

DANNY KHATIB: Got it. Got it. So as long as we've already deleted all the content and it's falling out of the index, we should be OK even if it still hangs around in the Manual Actions list.

JOHN MUELLER: Yeah, exactly.

DANNY KHATIB: OK, that's helpful. Thank you.

JOHN MUELLER: Great. All right. Spiros.

SPIROS GIANIOTIS: Yes, hi, John. I'm Spiros Gianiotis. I'm from Athens, Greece. And we've been around on the internet since 1996 with the first travel domain in Greece. We handle a lot of clients in the tourism and travel industry. One of our clients is a major hotel chain. And they have something like around 25 to 30 separate domains for their hotels. And they're considering putting them under one domain, either as subdirectories, one per hotel, or as subdomains. The question is which one would be best? Please note that they're all hotels in Greece. Greece is a very small country. And a lot of these hotels are relatively close in vicinity-- not that they're nearby, let's say, within walking distance. But geographically, they're very close. So what would you have to say, John?

JOHN MUELLER: I think it kind of depends on what you want to achieve there. From a technical point of view, you could move that into subdomains. You could move them into subdirectories. You could leave them on separate domains if you wanted to. Geotargeting might be a factor that you might want to look at. If they're on a .gr domain, then they would be geotargeting Greece. Maybe that's fine for you. Maybe you would like to have it more generic on a .com. Essentially, you could have an international website on a .gr domain. And that's fine too. But if you wanted to, for example, target users specifically in France, then on a .gr domain, you'd be targeting more those in Greece or the kind of average global audience. Past that, I think it's mostly up to you how you want to organize these. So you could leave them separate if you think that these are essentially separate entities that should be shown separately, that should be treated separately by users as well. If you'd rather see this as one strong entity, maybe put them together on the same domain. Subdomain or subdirectory is essentially up to you. Sometimes, there are technical reasons for one or the other. So that might be an option too. It kind of depends on how strongly you want to organize everything into one group compared to keeping it separate. Sometimes there are also maintenance issues around there. So with 25 hotels, I imagine it's handleable on separate domains as well. But if you'd go up to 100 or 1,000 hotels, then at some point, it makes sense to combine everything into one domain and treat it more as a package that you could easily maintain, instead of completely separate domains. So from that point of view, I don't really have the magic answer for you. But I hope that kind of gives you some ideas to think about there.

SPIROS GIANIOTIS: So because I've heard various explanations saying that if they were under one domain in subdirectories, that would strengthen the domain rather than putting them in subdomains-- does that have any logic?

JOHN MUELLER: In a case like that, we treat subdomains the same as subdirectories. So it's not something where you'd have any big advantage there. I'd look at it more from a technical point of view. Sometimes it's easier in subdirectories. Sometimes, it's easier in subdomains.

SPIROS GIANIOTIS: Mhm. OK, thank you very much.

JOHN MUELLER: All right. Let me open it up to everyone else. But if either of you have any questions, feel free to ask away in the meantime.


JOHN MUELLER: Any questions?


DANNY KHATIB: Well, I mean, I can ask another one.


DANNY KHATIB: So as I mentioned, related to the problem I had before where we had almost 40 million articles that we've actually deleted, now about 96% or 97% of them have fallen out of the index after about 4 or 5 months. But it seems like there's still 1% or 2% hanging around. And they're not falling out with the same pace and vigor as the rest did. So I'm a little concerned that there's almost a million articles that have been deleted that are still low quality and that are still in the index. Is there any-- there's no regex pattern that I can use to sort of handle that targeted strike, to tell Google to remove those million URLs. Is there any way that I can more efficiently get those 1 million URLs dropped from the index?

JOHN MUELLER: I mean, one thing you could do is set up a sitemap file with those URLs and say that these essentially got changed in the meantime so that we start crawling them again. That's something you could do to kind of trigger a re-crawl there. I wouldn't recommend leaving that for the long run, because then you have the disconnect between you're saying these are URLs that should be crawled and indexed, and actually, there is no content. [PHONE RINGING] But if this is something that is a one-time thing that you want to have re-crawled and re-indexed, then that might be a possibility. Also with the sitemap file, one advantage that you have there is you'd see how many URLs are indexed for those individual sitemap files within Webmaster Tools. So you kind of see the progress as it's moving along there. Especially for larger sites, there are some URLs we crawl quickly, every couple of days. And others can take several months, maybe even up to half a year, to be re-crawled again. So there's some amount of latency that's expected for sites like that, where a large amount might be re-crawled and reprocessed fairly quickly. And the rest just takes a while to actually be picked up again.
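The temporary sitemap described here can be produced with a short script. This is just a sketch of the idea with made-up URLs, not anything shown in the hangout: it lists the already-deleted URLs with a recent lastmod date so that Googlebot re-crawls them and sees the 404s sooner.

```python
# Sketch: build a temporary sitemap for already-deleted URLs so they get
# re-crawled and dropped from the index sooner. URLs here are placeholders.
from datetime import date
from xml.sax.saxutils import escape

def build_sitemap(urls, lastmod=None):
    """Return sitemap XML listing each URL with a recent <lastmod> date."""
    lastmod = lastmod or date.today().isoformat()
    entries = "\n".join(
        "  <url>\n"
        f"    <loc>{escape(u)}</loc>\n"
        f"    <lastmod>{lastmod}</lastmod>\n"
        "  </url>"
        for u in urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</urlset>\n"
    )

# Hypothetical deleted-article URLs:
xml = build_sitemap([
    "http://example.com/blog/deleted-post-1",
    "http://example.com/blog/deleted-post-2",
])
print(xml)
```

As noted above, such a file should only stay in Webmaster Tools until the re-crawl has happened, since it deliberately points at URLs that no longer have content.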

DANNY KHATIB: Great, that's helpful. Thank you.

JOHN MUELLER: All right. Let's grab some questions from the Q&A that were also submitted here. "My site offers an affiliate scheme. And all articles out there explain what to do when linking to a website, but don't explain what to do if you're the website all these webmasters are linking to. I use affiliate parameters in my URLs." Essentially what you'd want to avoid is that you're using this affiliate scheme to build PageRank, so that it looks like you're artificially creating links to your site like this. What I'd recommend doing is just making sure that the code snippets that you provide for your affiliates really include the rel="nofollow" in there so that they don't pass PageRank. And that's essentially the best thing you can do there. Apart from that, you don't necessarily need to do anything in between. I know some websites have a domain set up that they redirect through that's also blocked by robots.txt, which is another way of additionally blocking the PageRank from passing. But in general, that's not something that you'd really need to do there. So just make sure that your affiliates also use a rel="nofollow" when linking to your site so that we don't pass PageRank through those links.

"Does a sandbox really exist? My website started to rank after five to six months. But all ranking dropped after 10 days. Now, my keywords rank after the sixth page in the search results. I haven't used any unethical tactics." So traditionally, the sandbox has been something where, as far as I recall, new websites would essentially be kind of held back almost when they start. And what you're describing there seems something completely different. So you're saying this website was fine for five to six months, and then dropped. So that would kind of point to something completely different anyway. From our point of view, we don't have anything that we would call a sandbox. There are some aspects of our algorithms that might have similar effects.
But there's nothing specific where we'd say all new websites are blocked for a certain period of time until we show them in search results. In addition, sometimes it's even the case that new websites, where they show up, they show up fairly well in the search results because we don't have a lot of information about this website. But it looks great. It looks like something that we'd like to show our users. So maybe we'll show it even a little bit more than we would otherwise. So--
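Going back to the affiliate question: the snippet handed out to affiliates just needs rel="nofollow" on the link itself. A sketch, with an invented domain and parameter name:

```html
<!-- Affiliate link snippet given to webmasters. rel="nofollow" tells
     Google not to pass PageRank through the link. The domain and the
     ?aff= parameter are placeholder examples. -->
<a href="http://www.example.com/?aff=12345" rel="nofollow">Visit Example Store</a>
```

The alternative mentioned above, routing affiliate clicks through an intermediate domain that is blocked in robots.txt, achieves the same effect but, per the answer, generally isn't necessary.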

AUDIENCE: Is that a bonus that you're giving the site when it's showing up better than others?

JOHN MUELLER: It's not necessarily something like a bonus where we'd say all new websites get this extra cookie when they start showing up in the search results. But sometimes, it's just the case that we don't have a lot of data for a new website. And we have to make decisions based on limited data. And just because we don't have a lot of data doesn't mean that the website is bad. But at the same time, it also doesn't mean that it's the best website out there. So we have to make some kind of an informed decision algorithmically on how we should treat the situation where we don't have a lot of data about a website. And sometimes, it happens that these websites sink in rankings over time until they settle down into some stable state. Sometimes they go up a little bit over time and settle into a little bit of a better state. So it's kind of a situation where you have a little bit of information about a website, but you don't really have all of the information that you can use to make a really informed decision on how and where this website should be ranking. And that's sometimes something that webmasters could see as a kind of a sandbox effect, where we have limited information. We have to make do with that information. It takes a while for that data to be collected. And after a while, you'll see some changes. So it's not that there's specifically a sandbox to hold those sites back. But sometimes you might see situations where, after a certain period of time, the ranking kind of settles down into a different position.

"Say I use 301 redirects to move from site A to site B. After six months, the redirects are removed because another company will use site A. Does the site authority, PageRank, et cetera, stay with site B, or does it go back to site A?" That's a bit of a tricky question, because six months is kind of a problematic time, in the sense that some of the URLs will have been re-crawled and re-processed with the new URLs. And others, we might not have.
So if after six months, you're kind of splitting those two domains into completely separate sites, then it's possible that some of the signals that we have remain attached to the old website. And some are already transferred to the new one. So that's something where I'd recommend at least keeping the redirect in place for longer-- maybe a year, maybe even longer than that-- and also making sure that all of the external signals are also updated. So if there were links to site A that are actually meant for site B now, maybe contact the webmaster and say, hey, we moved to a different domain. Please update your links to the new one so the users get there directly. And if those links are updated, then we can use those directly and pass all of the data to the new domain. Whereas if there were only these redirects in place, and it was only for a limited time, then it's conceivable that at least some of the signals go this way, and some of them go the other way.

"Does performing a site move with Webmaster Tools and 301s cause an instant reevaluation of the destination site with regards to algorithms like [INAUDIBLE], thus stopping people from trying to run away from a penalty?" We do have algorithms that try to follow up on site moves to make sure that any signals that we attached to the old website are forwarded to the new one. So if there are algorithmic problems with one version of the site and you just 301 redirect to a different domain, then traditionally, those signals would be forwarded as well. That's especially the case with link-based algorithms, where of course the 301 redirect forwards the PageRank anyway. So all of those problematic links that you might have with your old domain, if you're just redirecting them to a new domain, then they'll be forwarded as well. So that's something where you'd probably see these problematic parts of an old site move onto the new site, if you're just 301 redirecting.
If you think that the old site is so bad that you can't possibly fix it and you really wanted to start over, then I'd really recommend starting a new website, not redirecting from the old one.
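For reference, a site-wide 301 of the kind being discussed is usually configured at the server rather than per page. A hypothetical Apache .htaccess sketch, with invented domain names:

```apache
# On site A's server: permanently (301) redirect every request to the
# same path on site B. site-a.example and site-b.example are placeholders.
RewriteEngine On
RewriteRule ^(.*)$ http://site-b.example/$1 [R=301,L]
```

Because the redirect is permanent, signals (good and bad) follow it, which is exactly the point made above about penalties moving along with the 301.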

AUDIENCE: John, I have a related question here.


AUDIENCE: So if a site is affected by Panda, and that's for thin content, but the site had some really good content, what if we take just that good content and redirect it to a new subdomain of the same site-- not a new domain, but a new subdomain on the same site that is more relevant? So for example, if it's a broad-based website like Wikipedia that has a lot of content and you might not be able to figure out what it's about, then you get a specific subdomain for a specific niche, like health care, and transfer some of the good content over there. So will Panda still transfer to the new subdomain, or what do you think would happen?

JOHN MUELLER: It's hard to say completely because sometimes we treat subdomains as part of the website itself. So that's something where maybe you're just kind of like moving things around within the same bucket. The other problem there is if you're moving the high quality content out of your website, then that's kind of a weird situation in that it looks like you want to keep the low quality content on your old website, but at the same time, move some of the high quality content to a different website. So what I'd recommend doing there is more of the opposite and either removing the low quality content or moving the low quality content to a different domain so that it's really clear when we look at your domain, we can see, OK, this domain has a history of really high quality content overall. Overall when we look at the pages that are indexed there, it's a good mix. There's the right amount of content here. There's good high quality content here. And the low quality content that we might have seen in the past is actually no longer there. Maybe it's on a different domain. Maybe it's noindex. Maybe it's 404 completely. So that's kind of the direction I'd go there. Instead of moving the high quality content out, maybe just really cleaning things out and taking the low quality content out of the site and focusing only on the high quality content on that existing site.

AUDIENCE: OK, so you said that you look at the subdomain as part of the main website then.

JOHN MUELLER: Sometimes, yeah.

AUDIENCE: Sometimes, OK.

JOHN MUELLER: That's [INAUDIBLE], right? [LAUGHS] I mean, there are situations where clearly, a subdomain or even a subdirectory are separate sites. So when you think of things like shared hosting environments-- for example, on Blogger, each user's blog is on its own subdomain. And they're separate sites. We need to treat them as separate sites. Sometimes there's shared hosting that uses subdirectories in the same way, in the sense that they say, this subdirectory is this user. A different subdirectory is a different user. And we can treat those as separate sites as well. But if you're talking about the same overall site and you're just moving it into a subdomain, then chances are our algorithms are going to look at that and say, well, this is all just part of the same website. They're using different subdomains, which is fine. But it's not that we need to treat these subdomains as really separate websites, because they're actually kind of the same.

AUDIENCE: OK, thank you.


JOHN MUELLER: Another thing maybe worth mentioning there is if there's low quality content on your site that you want to keep for users, but you want to prevent it from causing problems with Google, one idea could be just to use a noindex meta tag for that content. So if you know that some pages are low quality, but you think many people, when they're browsing through my website, they want to see this content, then the noindex lets you keep it on the site. But it prevents it from causing problems on Google's side.
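The noindex meta tag mentioned here is a single line in the page's head. For example:

```html
<!-- Keep this page available to visitors, but ask search engines
     not to include it in their index: -->
<meta name="robots" content="noindex">
```

For non-HTML resources, the same directive can be sent as an `X-Robots-Tag: noindex` HTTP response header instead.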

AUDIENCE: Got it. Thank you.

JOSHUA BERG: John, also about Panda, I'd like to know, is the newer Panda-- I mean, isn't it more page specific than site specific, so that it filters more at a page level?

JOHN MUELLER: It does try to be more granular, but it's not going to go down to a per-URL basis. So we do try to take into account parts of a website when we can pick that up. But it's not the case that it's on a per-URL basis. So this is really just something where if you recognize that there's low quality content overall on your website, then that's definitely something I'd work on cleaning up overall and making sure that, in general, when we look at your website, we understand that this is a high quality website. So when a new page appears on your website, we don't have to analyze the page's content first. We can say, well, this website is a great website overall. We don't really have anything to fear by ranking this new page, that we don't know that much about, fairly highly in search results. Whereas if overall we think the content on this website generally isn't so hot, then new content that we might find there will probably be treated a little bit more cautiously.


JOSHUA BERG: I was saying it was a little odd that a site I suspected had gotten hit by the newer Panda algorithm-- the home page was reduced a lot in ranking. But a lot of the articles, especially some of the very good, popular articles, didn't even budge. They stayed very high in rankings. So would it be safe-- am I assuming wrong that maybe that wasn't Panda related?

JOHN MUELLER: I don't know. It's hard to say without knowing the site. But theoretically, a situation like that could be possible. For example, if the homepage isn't very strong but these individual articles are really popular, then maybe these articles are also reduced slightly in ranking. But that reduction is so small compared to the overall good signals that we have for those individual articles that it's not extremely visible.

JOSHUA BERG: OK, so we could have Panda maybe give an overall minor reduction and then maybe a stronger reduction on certain pages as well? Or do you mean just an overall one?

JOHN MUELLER: I imagine it's just an overall reduction there. And I don't know. From your description, my first thought would be it's probably something else. It's probably not Panda. But there are situations where there's a slight reduction based on these broader site-wide algorithms. And you might see subtle changes like that. So individual pages might drop a little bit more than other pages, just because we have so many good signals for those other pages as well. So--

JOSHUA BERG: OK, there were some layout possibilities, like the page layout algorithm, or maybe something Payday Loans related.

JOHN MUELLER: I don't know. I'd keep an open mind in a situation like that. But it's something where if you're seeing subtle differences in reduction in ranking for these individual pages, it's probably worth looking at a variety of factors. Maybe things are just adding up in a weird way.

JOSHUA BERG: OK, thanks.

SPIROS GIANIOTIS: John, coming back to what you were saying earlier regarding subdomains-- in my case, where we're talking about these separate hotels, would Google see these as different sites if they were in subdomains, since they have a distinctive character?

JOHN MUELLER: It's very possible, yes. But it all kind of depends on how you build that website up. If it looks like it's essentially one big website with different subdomains for individual places, then that's something where we'd say that this looks like one big website, and we'll just treat it as one big website. And that's not necessarily a bad thing. So it's not something where we'd say you only have one [INAUDIBLE] in the search results. Sometimes you have multiple slots regardless. So that's something where I wouldn't focus so much on the Google side, whether or not it looks at it as one site or not, but rather, find the layout that works best for you as a webmaster and that makes the most sense for your users.


JOHN MUELLER: All right.

JOSH BACHYNSKI: Hey there, John?


JOSH BACHYNSKI: Hi, John. I had a quick question about the new signal everyone's talking about, which of course is HTTPS.


JOSH BACHYNSKI: I'm wondering. Is it a part of Panda? Or is it a standalone algorithm?

JOHN MUELLER: It's separate.

JOSH BACHYNSKI: It's separate, OK. So does it run on an infrequent basis? Or does it run on a regular basis?

JOHN MUELLER: Essentially, it looks at what we have indexed for the website. So it's not something like Panda or Penguin, which are site-wide algorithms that have to aggregate a lot of signals about the website. We essentially look at it on a per-URL basis. So that's something that kind of runs automatically on its own. It's not something you would need to wait for a refresh for. It's essentially a continuously updating [INAUDIBLE].


AUDIENCE: So businesses can take their time, right? I mean, to go to this--

JOHN MUELLER: Sure. I mean, at the moment, it's a very lightweight signal. So it's not the case that if you have an HTTP website, you will disappear from search. We think it's a great idea to move over. And I imagine over time, it's something that more and more websites will be doing. So I'd definitely look into it, especially if you're doing an update at some point. But I wouldn't see this as something where you have to halt everything that you're doing and move over to HTTPS so that you can remain in search. So I'd definitely keep it in mind and think about it as you're revamping your website. But I wouldn't see this as something that should cause you to stop everything else and move over.

AUDIENCE: John, suppose you install the certificate, the HTTPS certificate, but you have an error-- like the certificate is not installed correctly at 2048 bits. And so would you still give the site a good-- would the ranking benefit still be applied if that person hasn't correctly installed the certificate?

JOHN MUELLER: So what usually would happen there is that every user, when they access the URL, they'd see that error directly in the browser. And the browser would block them from going to your site.


JOHN MUELLER: So that's a fairly big block. And that's not something we have under control. That's essentially the browser saying this certificate doesn't work for this website. So that's something that's fairly problematic that I'd work to fix as quickly as possible. From our point of view, what happens there is, assuming the same content is on HTTP and HTTPS, we'd see that error as well when we try to crawl those pages. And we'd say, OK, we know the same content is on both of these URLs. But this one has a broken certificate and this one is OK. So we'll use the OK version, the HTTP version, assuming the same content is on both URLs. If you're doing a redirect, then of course it doesn't matter which one we show. The user is going to see the certificate error page anyway in the browser. But if it's just implemented incorrectly and you still have the same content on both versions, we'll probably just fall back to the HTTP version for indexing.

AUDIENCE: OK, thanks.

WILLIAM ROCK: Hey, John? Hey, John? Can I ask a little bit more about the SSL? I've got one in the Q&A based around this. There are a lot of questions that I've got from just random companies, CEOs, about why it's so important. If they're not actually running a CMS or they're not running something else, why is it important for them to go to SSL? And then for the ones that are basically running e-commerce, what kind of levels of SSL do they need? I read through the document as well, so I know that answer. But I want to kind of get it from you.

JOHN MUELLER: Yeah, so from our point of view, if you implement HTTPS properly, then that's fine for us. It's not something where we'd say this specific certificate is good and the other one is bad. I imagine maybe in the long run, we'll be able to differentiate a little bit more. But at the moment, it's really just either it works or it doesn't work. And that's kind of what we look at there. With regards to the type of sites, I think it's important to keep in mind that there are three things that HTTPS does. On the one hand, it's authentication. So it tells the user that they're accessing the right website. On the other hand, the content that's being transferred between the website and the user, in both directions, is encrypted. So for one, it can't be modified by third parties. We've seen ISPs add ads into those pages, add tracking pixels. Hotels tend to do that every now and then-- they'll put extra ads in the pages. They'll maybe even change some of the ads. And additionally, this content can't be listened to. So it's not something where, if you submit something to a site, it's out in the open as it's transferred. Using HTTPS kind of protects you from that. A good analogy, I guess, is essentially that if you're using HTTP without encryption, you're kind of sending your content on a postcard written in pencil to the user and hoping that they get it right. And in general, the people along the path might be good people and say, OK, well, this is good content. I'll just forward this postcard on without reading it, without changing it. But you never really know. And the user, when they receive this postcard, if it's written in pencil, they can't really tell if this content has been changed, if others have been listening in, watching this. So it's really hard to tell what has happened there.
And even for seemingly uncritical sites, sometimes the user feels that this isn't something that they want just anyone to see them looking at. So if you're looking at a small business site and you're looking at the job section there, then maybe that's not something that you want your employer to know that you're looking at. So these are kind of the situations where even if the content isn't a credit card number, even if you're not doing financial transactions, there's a lot of stuff that maybe users want to keep private. And it's almost hard for you as a webmaster to make that decision for the user. So being able to have everything on HTTPS gives you that security by default.

WILLIAM ROCK: Thank you, John. I think the other piece is basically the security of a company physically versus a company online. And I think that a lot of people forget that they secure their companies with security alarm systems and this and that. But then when you go online, they forget that that's another portion or extension of their business that can potentially get ruined.

JOHN MUELLER: Yeah, I think it's just important to keep in mind that HTTPS doesn't protect your website from being hacked or it doesn't protect your servers from being manipulated. It essentially just protects the connection between the user and your server. So if your server gets hacked by someone, if somehow malware makes it to your server, that's something that you can't protect with HTTPS that you really need to stay on top of separately as well. So it's not a magic bullet. It's not that if you switched to HTTPS, then all of your security problems will be solved. But it's definitely something that at least keeps the connection from your server to the user in a secure way so that random people can't listen to it. They can't manipulate it. And kind of protects you on that front.

WILLIAM ROCK: So it's kind of an additional layer for businesses to help protect themselves, and for Google, it's about protecting the user experience.

JOHN MUELLER: I guess you could see it like that, yeah. I mean, at the moment it's a very lightweight signal for us. So it's not the case that if you don't switch to HTTPS, then you'll disappear from search or that you'll have this big disadvantage compared to your competitors. But I think over time, that might change as users become more and more used to HTTPS as they see that it makes sense to have a secure connection to those websites that they're active on, even if they're not exchanging financial information.

WILLIAM ROCK: Yeah, I think the attraction is just that people are wanting the ranking portion of this versus the security portion of this.

JOHN MUELLER: Yeah, it's always a tricky situation.

WILLIAM ROCK: Thank you, John.

JOHN MUELLER: All right. Let's grab some more from the Q&A. "Why is Penguin still not ready? Too susceptible to negative SEO? Google isn't happy with the results it will give out?" This is definitely something that the engineers are working on. And we're looking into what we can do to speed that up. At the moment, I don't have a magic answer for you as to why. That's always hard to answer anyway. But at the moment, we don't have an update just around the corner. But we're working on speeding that up and making sure that it also works a little bit faster in the future. "Do you know 100% that there will be a Penguin update?" It's definitely something we're working on, yeah. So I'm pretty sure that there will be something like a Penguin update. It's not the case that we'll just leave this data like that forever. It's definitely something that we're working on cleaning up. "I'm building a sitemap and adding hreflang tags to it. Should I include the x-default as well? Your article doesn't explain anything about the x-default in the sitemap." So the x-default is a way of specifying the default language and location page if you have anything that you'd like to treat as a default page. And you can use that in a sitemap file just as you can use any other language tag. So it's not something that would be specific to just the on-page markup or just the sitemap. You can treat it the same as you would maybe the EN or the German or whatever pages you have there. So that's something you can definitely include in your sitemap. Another hreflang question. "I have a site that uses subdomains to target a dozen countries specifically, like the UK, US, Australia, et cetera, as well as by region, such as asia.domain, africa.domain. What's the best way to approach hreflang for these regions?" So one thing you can do with hreflang is use the same page and include it for multiple language and location tags.
So you could, for example, have one page that's valid in the UK and Australia, and a different URL that's valid for the US. And what you would do there is just include separate hreflang meta tag or sitemap entries for the different language-location areas, and just specify the same URL again. And that same URL can also be your x-default. So it's not the case that you have to have separate URLs for each of these language variations. You just need to specify and say this is the page for Australia. This is the page for the UK-- might be the same one. And this is the one for the US. So that's something you could split up like that.
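As a sketch of the setup John describes (the subdomains here are placeholders), a sitemap entry can list the same URL for both the UK and Australia, a second URL for the US, and reuse one of them as the x-default:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>https://uk.example.com/</loc>
    <!-- The same URL serves both the UK and Australia -->
    <xhtml:link rel="alternate" hreflang="en-GB" href="https://uk.example.com/"/>
    <xhtml:link rel="alternate" hreflang="en-AU" href="https://uk.example.com/"/>
    <xhtml:link rel="alternate" hreflang="en-US" href="https://us.example.com/"/>
    <!-- x-default can reuse an existing URL; it doesn't need its own page -->
    <xhtml:link rel="alternate" hreflang="x-default" href="https://us.example.com/"/>
  </url>
  <url>
    <loc>https://us.example.com/</loc>
    <!-- Each listed URL repeats the full set of annotations -->
    <xhtml:link rel="alternate" hreflang="en-GB" href="https://uk.example.com/"/>
    <xhtml:link rel="alternate" hreflang="en-AU" href="https://uk.example.com/"/>
    <xhtml:link rel="alternate" hreflang="en-US" href="https://us.example.com/"/>
    <xhtml:link rel="alternate" hreflang="x-default" href="https://us.example.com/"/>
  </url>
</urlset>
```

Note that each URL in the set carries the same complete group of hreflang annotations, including an entry pointing back to itself.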



JOSH BACHYNSKI: I have a hypothetical search question for you.


JOSH BACHYNSKI: Is it hypothetically-- I'm just going to brainstorm this a bit. Is it hypothetically possible to run a version of Penguin that just releases the sites that have done their due diligence and deleted the bad links and disavowed the rest? Is it possible in the interim just to run a version of Penguin that just releases those guys? Because they've been waiting for a long time. I don't want to get whiny on you. They've been waiting for a long time, for over 10 months. And I imagine some of them have gone out of business waiting so long. Is it just possible to do a version of Penguin-- could you pass this on to the engineers-- just one that will let people up that have done the work to clean things up?

JOHN MUELLER: Essentially what that would need is a complete data refresh. So it's not something that is like just a tweak. It would essentially need to have everything re-run completely. So that's not something where we'd probably just do that randomly one afternoon and just push that out. I think one of the reasons also why this is taking a little bit longer is because we just want to make sure that the next data that we push is actually the right kind of data that we'd like to have reflected in search results. So it's not something where we'd kind of just rerun a part of the algorithm and push that data. We'd really need to update that data completely.

WILLIAM ROCK: OK, thanks, John.


JOSHUA BERG: Is the-- yeah, go ahead.

AUDIENCE: Sorry, can you hear me?


AUDIENCE: I was just wondering if you have the ability to-- sorry, as the previous guy was just saying. You know sometimes you release things that say we've run the algorithms without using links, for example. And the results are terrible. Or they're worse than they are with the links. Internally, can you run the algorithm by removing Penguin altogether? And does that look worse than currently? Because I think that's essentially what he's saying. So run it without Penguin internally, does it look worse? No, OK, then forget it.

JOHN MUELLER: Well, I mean, if we can improve the search results by just turning something off, then that's something we'd love to do. In general, the less complexity that we have in web search, the happier the engineers are, the easier they can work on future projects as well. So if with any algorithm that we had, the search results were the same or better with turning something off by deleting code, by deleting files, then that's something we'd definitely want to do. So since that's not quite what we're doing here, then I'm pretty sure the engineers have done those evaluations and said this makes a big difference. And it's vital for us that this algorithm remain in place until we have something that replaces it. So that's something where if the possibility were to exist that it would be better by not having this in place, then we'd definitely jump to do that. And every now and then, we do take specific parts of our algorithms out and we say, OK, our new algorithms are covering this area as well as maybe two or three other places. We can take this out. We can remove this algorithm completely. We don't need the data files anymore. We don't need this algorithm at all. It saves us time. It saves us complexity. It makes it easier for us to create new code, new algorithms. So we'll just delete that code completely. And that's something that I think every healthy software company has to do. They have to go through the code regularly and say, hey, this is something we don't need anymore or maybe it focuses on some aspects that webmasters aren't doing anymore-- maybe, I don't know, keyword stuffing is probably something that webmasters generally aren't doing anymore-- or maybe algorithms like Panda are picking up on it a lot better. Maybe we don't need a separate keyword stuffing algorithm, so we can just delete that. So if we can, we'll try to delete stuff and clean our code base up that way.
But if it's still in place, then we've probably been looking at the metrics there and saying this does make a really big difference. And it's vital that we keep this in place for the moment.

AUDIENCE: Probably.

JOHN MUELLER: Yeah. I mean, these are all things where we regularly talk to engineering teams about this. And we give them examples of things that we've seen from the help forums, from Google+, elsewhere. And we say, hey, in these situations, it looks like the webmaster has been doing the right thing. Our algorithms should be reflecting that at some point. And this is the kind of data that they use to make those kinds of decisions as well. And at some point, they might just run a new evaluation and say, hey, what would our search results look like with this algorithm turned off or with this algorithm turned on or slightly tuned. And where does it makes sense to make those changes? And it's definitely not the case that we'd artificially keep our search results bad by sticking to algorithms that don't make much sense. If we can improve the search results, we'll do that.


JOSHUA BERG: John, is there quite an increase in negative SEO or maybe just the controversy of it that would be an important part that's like a hold up with the new Penguin or just being careful in that regard that we don't have that--

JOHN MUELLER: Yeah. I mean, we always have to be careful in that regard and to make sure that we're algorithmically and manually picking up on those issues so that they don't cause any problems. And it's not a new topic. It's been out there since, I don't know-- since beginning of Google almost where people would say, oh, if Google thinks this is bad, I'll make it look like my competitor is doing this. And this is something that our algorithm has to live with. We have to understand that this is happening and to kind of work around it.

JOSH BACHYNSKI: Hey, John. So just to make sure my point was clear because the conversation got sidetracked a bit, I just think it would be a good part of your public relations strategy to pass on to the engineers that whether or not Google thinks the sites are worthy to be in the index, if the sites have done the work to clean up the supposedly bad links, personally, I just think that they should be rewarded for taking those actions at their own expense, on their own time, and not be held down for so long. That's just my personal opinion. I won't go on with anything further. And I think that passing that on to the engineers-- because this is not a good public relations situation for Google-- might be a good idea. Thanks very much.

JOHN MUELLER: Sure, I'll pass that on.

AUDIENCE: I have a question about the Removal Tool. Can I just ask a one minute question?


AUDIENCE: So when I submitted the stuff-- for instance, for a client that had stuff that was cached, and the other person on the other end removed, for instance, blog comments. So they closed that, right? So when I submitted it, I basically let them know that the content is outdated and so forth. Are there certain ways that you need to write to them? I mean, it got denied for no reason. The stuff is not there anymore about that specific person.

JOHN MUELLER: So we have two variations of that. One is for the webmasters. So if you have the site verified, then essentially, we'll take those URLs and do that automatically. If it's not your website, then you have to specify individual words that were removed. So you don't write a sentence saying, this and that and that were all removed, but rather just the individual words-- where all of the words that you mention are no longer on the page itself but are still in the cached page.

AUDIENCE: Yeah, the page has changed. And the Google's cached version is [INAUDIBLE].

JOHN MUELLER: Yeah, so you'd just-- for example, if they removed the word "John," and if you search for the page and the word "John" is no longer on the page at all, then you'd specify just "John" in that keyword area where you'd say this was removed. You wouldn't say "this guy in Switzerland that I know," because maybe some of those words from that sentence would still be on there.


JOHN MUELLER: So just the individual words that were actually removed.

AUDIENCE: OK. Thank you.

JOHN MUELLER: All right. "Should I delete old pages immediately after 301 redirection, or is it better to wait several weeks?" So technically, if a page is redirecting, then there's no content there. So you can delete those pages immediately as long as those URLs are still redirecting. "We experienced a boost in indexed pages by manually submitting our XML sitemap files in Webmaster Tools. What are the reasons for Google to increase indexed pages if one submits XML sitemaps in Webmaster Tools manually?" In general, we use sitemap files to crawl and re-index content. So if we think that there's content there based on your sitemap files, we'll try to pick it up like that. Usually, if it's a normal website, we can also pick up the same content through normal crawling and indexing. So this isn't something where you'd need to submit a sitemap file. But sometimes it helps to make sure that we pick up all the changed or new pages, especially if crawling is somehow complicated. So for example, if you have a website that's very large and has a lot of lower-level pages that are maybe, I don't know, 20 links away from the homepage, and you changed some of those pages, it might take a while for us to recognize that those pages changed or that new pages were added there. So with a sitemap file, you can let us know about those changes within your website fairly quickly. There are other ways you can do the same thing-- for example, by linking to those pages from your homepage and saying, OK, on this lower-level page, we have some new content-- maybe new articles in an e-commerce shop, maybe an updated article, or something like that where you'd have a listing in the sidebar, something like that on the homepage. So the sitemap file is great for letting us know about these updates. But you don't necessarily need to do that. Let's see. Here's another one.
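As a minimal sketch of that, a sitemap entry for an updated deep page just pairs the URL with a lastmod date (the URL and date here are made up for illustration):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <!-- A page many clicks from the homepage that was recently changed -->
    <loc>https://www.example.com/category/subcategory/deep-page</loc>
    <lastmod>2014-08-11</lastmod>
  </url>
</urlset>
```

The lastmod value is what lets crawlers notice a change without having to re-crawl their way down through 20 levels of internal links.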
"Should we disavow links that have been obtained from others stealing our content, such as content hyperlinks now on other domains and pointing back to us? Does Google know these are duplicates and devalue the links and not count them?" In general, we recognize those kinds of situations and treat them appropriately and just ignore those links. What I'd recommend doing there is, if you find something like that, just disavow the whole domain. And then you're sure that it's definitely covered. I wouldn't take any situation where you run across some kind of a problem and you know you could fix it, but maybe Google could fix it automatically, and just leave it and hope that the search engines will magically handle everything. If you can fix it, why not just take it into your hands and clean it up yourself? So that's something where putting a domain in a disavow file is trivial to do. And if you've seen it there, then fixing it on your side is really easy. We'll probably also handle it appropriately on our side as well. But if you see it, why not just take care of it yourself?
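The disavow file John is describing is just a plain-text file uploaded through the Disavow Tool. Lines starting with # are comments, and a domain: line covers every link from that host (the domain below is a placeholder):

```
# Scraper site republishing our articles with links back to us
domain:scraper-example.com
```

A file with a single domain: line like this is perfectly valid; it doesn't imply anything about links you haven't listed.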

AUDIENCE: John, the Disavow Tool in Webmaster Tools has a really strong warning on it-- you know, this will void your warranties, so most people don't have to use it. But I guess what you're saying is, if you can, just do it.

JOHN MUELLER: I guess my point is more that if you see this problem and you're worried about this problem, you can take it into your own hands and just take care of it so that you don't have to worry about whether or not Google fixes it on its own. If you can take care of it yourself, then you're sure that it's taken care of. You don't have to rely on this vague algorithm that will probably be able to handle it right.

AUDIENCE: But what I'm worried about is for somebody who doesn't have a disavow file, and you suddenly discover some scraper who got your content and also links back to you. And I put that in a disavow file. That's the only thing in my disavow file.

JOHN MUELLER: That's fine.

AUDIENCE: That is signal to Google that I think that everything else is OK when the reality is--

JOHN MUELLER: No, that's--

AUDIENCE: --I haven't really looked at anything else.

JOHN MUELLER: That's fine. We use that mostly as a technical tool. So if we find the domain listed in your disavow file, then we don't follow those links to your website. It's not the case that we would say, oh, they have a disavow file. Therefore, they must be spammers and they know what they're doing. It's more that we take these links. And we say, oh, they're in the disavow file. We'll ignore them. Fine. That's done. It's not the case that you have any kind of negative impact from using the Disavow tool.


JOHN MUELLER: All right. We just have a few minutes left. I'll open it up to you guys.

JASON: Hi, John.



JOHN MUELLER: I didn't quite hear you.

AUDIENCE: Jason from

JASON: Oh, yeah. This is Jason from. I'm asking on behalf of our team. On a product assortment page for shopping, is it considered valuable to include related products as part of the primary content of the page? Or should related products be presented only as supplementary content?

JOHN MUELLER: That's fine to have on those pages as well. From our point of view, it helps us to understand the context of those pages better because we see links to those related products. So that's fine. If you really have a good way of picking up the related products, then that's a good thing to cross-link like that.

JASON: All right, perfect. Thank you.

AUDIENCE: Was there any update on August 8?

JOHN MUELLER: I'm sure there was. I don't know. When was August 8? We do updates all the time. So I'm sure there was an update on August 8. But I'm not sure exactly what you're referring to.

WILLIAM ROCK: Hey, John. I've got a question.


WILLIAM ROCK: And I know-- it goes [INAUDIBLE] local rank and what's happening in the SERPs and basically it goes with Yelp. We've seen some weird results [INAUDIBLE]. I'm looking at that as a possible false positive. But some of those reviews that are coming up are just low quality signals. They've got multiple local queried results. They're basically pulling up localized search queries, which is good that that's happening. But there's also ways that they created-- not these guys, but other companies out there with similar-- not really [INAUDIBLE] pages but local ranked pages that still actually rank in the algorithm today based off the techniques that they've done. I'd like to show you that later down the road. But I think I'm seeing something interesting showing with the switch of local-- not just the Google+ by business, but the actual [INAUDIBLE]. [ALERT TONE]

JOHN MUELLER: Yeah, I'd probably need to take a look at examples there.


JOHN MUELLER: It's always tricky when you're looking at things like Yelp and kind of other local directories. Because for some websites, those could essentially be their homepages. And maybe they don't have a big homepage of their own. Or maybe the homepage that they have is essentially just a PDF or a big image file that doesn't have any content that we can index. So to some extent, it makes sense to show some of those local directories, because sometimes those are kind of like the homepages for those businesses. But--

WILLIAM ROCK: What I want to show you is more how it's being spammed for doctors, especially. And those reports are actually showing up at the top of results as a negative. And basically what Yelp has done is they've actually told us that we have to pay for those links to be removed, even though they're actually fake names that are actually bashing on a doctor. So it doesn't make any sense. One was an old employee. And the other one was-- you know. And so it's easy to actually manipulate Yelp. And I'm looking at that as a false positive [INAUDIBLE].

JOHN MUELLER: Yeah, I'd probably look at that with them. I don't know. We don't really have much influence on them. But yeah, it's always good to send examples. So if you see something where we're picking something up incorrectly, where we're ranking them badly, send that along so that we can bring it up to the team and take a look and see what we can do to improve that.

JOSH BACHYNSKI: Hey, John. Do you have time for an entity search question?

JOHN MUELLER: A really quick one.

JOSH BACHYNSKI: Oh, OK. The quick entity search question is, how is this going to work together with the HTTPS signal? My concern is that I have all these social signals and stuff like that for the HTTP version. How are we going to coordinate that with the new HTTPS version? Because essentially, it's a domain move. And there's all kinds of SEO issues with domain moves, as you know.

JOHN MUELLER: Domain moves are a bit different because they really involve different host names. What we've seen with most of the social information like that-- at least Google+, the +1 button-- is that they essentially transfer completely from HTTP to HTTPS if you have the redirects set up appropriately. So that's something that I imagine will essentially just work. It might take a little bit of time for things to move over for the individual URLs, kind of like what you'd have with the move from www to non-www or the other way around. But essentially, the move from HTTP to HTTPS is a lot less critical than a domain move, and a lot less critical than even a subdomain hostname move from www to non-www. So that's something that I think works out fairly well. That's not something where there'd need to be anything special that you would need to do on your side, or that we'd need to significantly change on our side.
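The "redirects set up appropriately" mentioned here are typically site-wide permanent (301) redirects from the HTTP URLs to their HTTPS equivalents. As one illustrative sketch, on an Apache server with mod_rewrite enabled, that could look like this in an .htaccess file:

```apacheconf
# Permanently redirect every HTTP request to the same path on HTTPS
RewriteEngine On
RewriteCond %{HTTPS} off
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
```

Other servers (nginx, IIS) have their own equivalents; the key point is that the redirect is a 301 and preserves the host and path so each HTTP URL maps one-to-one to its HTTPS version.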

JOSH BACHYNSKI: Thanks, John. As always, I just want to say that I think you're great and these Hangouts are awesome. Thanks very much.

JOHN MUELLER: Thanks, Josh. All right. One last question. We're a bit over time.

AUDIENCE: Can I ask, John?

JOHN MUELLER: Sure, go for it.

AUDIENCE: Well, it's the same question I've had over the last few months really, on whether you've got time to look at-- you said we were waiting for an algorithm update for our site, which wasn't one of the major algorithm updates. But we've really seen no change, even though you've previously confirmed that everything was now fine with the site. There was no problems with it other than waiting for the algorithm to update. But--

JOHN MUELLER: I'll check, yeah. I'll check with the guys on that again. I don't know what happened, like what has gone on over the last few weeks. But--

AUDIENCE: Yeah, it's been a while since we spoke. Actually, it was on the last dedicated-- one of these where there were 10-minute slots, which I'm sure you remember. But we've since implemented hreflang to another site, specifically for the US, which seems to have bolstered it a bit. But the original is still nowhere near what it should be, in our opinion. It's still at least 60% down on last year. Now we're just not sure when-- I've just stuck that into the comments, the URL again-- when that's going to update.

JOHN MUELLER: OK. I'll double check, yeah.

AUDIENCE: --algorithm. And we won't find anything in the news because it's not-- as far as you said last time, it's not an algorithm that anyone will report on.

JOHN MUELLER: Yeah, OK. I'll double check with the team on that, see what we can do.

AUDIENCE: OK. All right.


AUDIENCE: Excellent.

JOHN MUELLER: So with that, I'd like to thank you all for your time. Thanks for all the good questions. And I hope I'll see you guys again in one of the future Hangouts.


JOSHUA BERG: Thanks, John. Great show.

AUDIENCE: Thanks, John.

WILLIAM ROCK: I'll see you, Josh.