Friday, November 4, 2011

More Empirical Data


Attached you’ll find my April 2010 Top N Trends charts. They are an analysis of the Top 100 Google results for a large number of high-traffic searches, ranging from online retail to entertainment and news. I use charts like these to test the limits of “SEO” factors and to detect whether something may or may not be an SEO factor.

I promise you’ll love this data. It will raise more questions than it answers, but it will answer some too.

I will happily discuss individual charts in this thread and I hope to get new test ideas from you.

About my SEO Lab: I have a VPS Linux server that I don’t conduct business on. Its purpose is to be a reputation buffer for my actual sites. I conduct all my SEO experiments on that server, which I have dubbed the “SEO Lab”. I only move SEO tactics to my real sites when they have proven successful in the lab and drawn no attention or ill will from Google and others. Never experiment on your real sites… it’s just not worth the risk.

I use a grid of 6 “$5/month” PHP web hosting accounts as my anonymization network. They run PHP scripts that funnel Google searches round robin. I delay 3 seconds between batches but run 6 requests at a time, so it works out to 0.5 seconds per 100 results. The grid ties into the AdWords API, the AWS Alexa API, and others for metadata. It is all queue based, so I can feed a URL or keyword into the queue and it becomes part of the “Machine” overall.
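The queue-plus-rotation idea above can be sketched in a few lines. This is a minimal illustration, not the author's actual PHP scripts: the host names, the batch size of 6, and the 3-second delay are taken from the description, everything else is made up.

```python
import itertools
from collections import deque

# Hypothetical placeholder hosts standing in for the six cheap PHP accounts.
GRID_HOSTS = [f"host{i}.example.com" for i in range(1, 7)]

class RoundRobinGrid:
    """Rotate queued searches across hosts, 6 at a time per delay window."""

    def __init__(self, hosts, delay=3.0):
        self.hosts = itertools.cycle(hosts)  # endless round-robin rotation
        self.delay = delay                   # seconds to wait between batches
        self.queue = deque()                 # keywords/URLs waiting to run

    def enqueue(self, item):
        self.queue.append(item)

    def next_batch(self, size=6):
        """Pair up to `size` queued items with the next hosts in rotation."""
        batch = []
        while self.queue and len(batch) < size:
            batch.append((next(self.hosts), self.queue.popleft()))
        return batch

grid = RoundRobinGrid(GRID_HOSTS)
for kw in ["cars", "games", "diamonds"]:
    grid.enqueue(kw)
batch = grid.next_batch()  # (host, keyword) pairs for this round
```

With 100 results per Google request, 6 parallel requests every 3 seconds gives the 0.5 seconds per 100 results mentioned above.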

I hope you appreciate what was involved in getting this data. It’s my pride and joy.

April2010-A April2010-B April2010-C April2010-D April2010-E April2010-F April2010-G April2010-H April2010-I April2010-J


About the Charts

* Many charts appear twice. This is because the MAX value line compresses the MIN and AVG lines, hiding trends you’ll probably want to see. The second copy of a doubled chart omits the MAX value line.

* Some charts use percentages; when they do, it appears in the chart title. Otherwise the values are occurrences.

* The trends go from the Top 100 results across a butt load of searches down to the #1 spot across those same searches. That’s how you should read them.

* The “Summary” charts refer to the summary text that google displays with each result in the SERPS.

* Bytes means the number of characters in the property being tested.

* Matches means the number of search-term matches in the property being tested.

I hope you enjoy!

PS BHW keeps quirking my first attachment so here it is

Good question! I don’t test case variations; my matching is case insensitive.


Red = Max Value

Blue = Average

Green = Min Value

Baby Blue = Percentage


Q: Can you tell us what is URL bytes

A: It is the number of characters in the URL.


Everyone keeps expecting me to charge… no angle here… People in this forum can’t possibly afford me. I took the sites I manage from $8M per year to $35M per year… I’ll let you guess what they pay me to keep me from jumping ship to amazon, nordstrom, eddie bauer, REI, etc… I am giving back. That’s my angle!


Q: Hm… I don’t understand the double charts.

For example, the one with number of words and images?

Is it better to have more or fewer words and images?

A: Google used to display page size in the SERPs… 63KB, 130KB… I noticed way back then that there was a sweet spot for page 1 between 60KB and 220KB… too small and it wasn’t authoritative enough, too big and it’s more than a human being can likely sift through… Google still has a sweet spot, but I don’t know if they are looking at source size, or visible content size, or words, or sentences, or a balance between text and graphics… I just don’t know… so I measure.

These are just the measurements I made. They have no bearing on one another from one graph to the next… the image measurement just measures images and the words-per-page measurement just measures words per page… the fact that they appear together is just how I took the screenshot.


Even more important is that the rules appear to change between pages 10-3 and pages 2-1… TWO GOOGLE algorithms. Which do you tune for? I tune long-tail pages for the page 3 algorithm for the best revenue effects, and splash pages for the page 1 algorithm for traffic generation. I don’t know why I’m the only one who sees it. Google’s best kept secret is that there isn’t 1 algorithm… there are 2!


Q: Can you define HTML Bytes and Visible Bytes? Is that per page or per domain?

A: I don’t analyze websites… just the result pages as they are linked from the SERPs. HTML Bytes is the document size in bytes of the result page. If you were to view source on a result page and count the characters, you would measure what I labeled HTML Bytes.

Visible Bytes = HTML bytes – HEAD block – SCRIPT blocks – COMMENT blocks – STYLE blocks – HTML tags

In concept, it is the maximum number of characters of text content you are capable of displaying from the result source. The number assumes pages don’t cloak, but they all do at least a little.
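The Visible Bytes formula above can be approximated with Python's standard-library HTML parser. This is a rough sketch of the concept, not the author's actual measurement code: it drops HEAD, SCRIPT, and STYLE blocks, comments, and tags, then counts the characters that remain.

```python
from html.parser import HTMLParser

class VisibleBytesParser(HTMLParser):
    """Count text characters outside HEAD/SCRIPT/STYLE blocks.

    Comments are ignored because HTMLParser routes them to handle_comment,
    which we leave as the default no-op. Tags themselves never reach
    handle_data, so HTML markup is excluded automatically.
    """

    SKIP = {"head", "script", "style"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # nesting level inside skipped blocks
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.count += len(data)

def visible_bytes(html):
    parser = VisibleBytesParser()
    parser.feed(html)
    return parser.count
```

So `visible_bytes` applied to a saved result page gives an approximation of the formula HTML bytes minus HEAD, SCRIPT, COMMENT, and STYLE blocks minus tags.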


My phrases are one- and two-word phrases: things like cars, games, diamonds, lawsuit, tennis shoes, gold watch, sterling silver, surround sound, etc. For it to be a match in my data it has to be an exact match, including the whitespace (I do normalize and collapse whitespace for pattern matching). My matches are case insensitive.
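The matching rules just described, collapse whitespace, ignore case, then require an exact phrase match, can be sketched like this. The substring-count approach is my reading of "exact match including the whitespace", not the author's confirmed implementation.

```python
import re

def normalize(text):
    """Collapse runs of whitespace to single spaces and lowercase."""
    return re.sub(r"\s+", " ", text).strip().lower()

def count_matches(phrase, prop):
    """Count case-insensitive exact occurrences of a 1-2 word phrase
    in a property (title, summary, URL, etc.) after normalization."""
    return normalize(prop).count(normalize(phrase))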


Autogen Content Patterns


For those of us who like to autogenerate content, here are some secrets Google appears to be hiding in plain sight. My theory is that if it appears in their results, then it’s safe to say it is what they put in their database.

(FYI… my sites tend to have millions of pages and usually index around “Results 1 – 100 of about 259,000”, so these patterns mean a lot to me.)

I’ll Start With The Conclusion:

There is no hard and fast rule, but these appear to be the content details that Google favors when a keyword is “over-saturated”, like “games” or “cars”:

* Title should be no longer than 68 characters with maximum of 5 keyword repeats

* Summary Text should be no longer than 155 characters with a maximum of 8 keyword repeats

* URL should be 14-75 characters with a maximum of 2 keyword repeats

* Your cached pages should be no older than 2 – 7 Days … less is better

* Number of Similar pages doesn’t appear to matter

* Your domain name should be about 14 characters including the top level domain

* Your document size (no images or includes) should be about 48K and probably shouldn’t exceed 100K if possible

* The number of “More results” pages doesn’t appear to affect placement, but 539-2,300,000 pages was the Top 100 range for that feature
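The observed limits above can be turned into a simple pre-publication checker. The thresholds are the post's empirical Top 100 numbers, not official rules of any kind, and the function names are my own.

```python
# Empirical limits from the post's Top 100 analysis (not Google rules).
LIMITS = {
    "title_max_chars": 68,
    "title_max_repeats": 5,
    "summary_max_chars": 155,
    "summary_max_repeats": 8,
    "url_min_chars": 14,
    "url_max_chars": 75,
    "url_max_repeats": 2,
}

def check_page(keyword, title, summary, url):
    """Return a list of limit violations for an autogenerated page."""
    kw = keyword.lower()
    problems = []
    if len(title) > LIMITS["title_max_chars"]:
        problems.append("title too long")
    if title.lower().count(kw) > LIMITS["title_max_repeats"]:
        problems.append("too many keyword repeats in title")
    if len(summary) > LIMITS["summary_max_chars"]:
        problems.append("summary too long")
    if summary.lower().count(kw) > LIMITS["summary_max_repeats"]:
        problems.append("too many keyword repeats in summary")
    if not LIMITS["url_min_chars"] <= len(url) <= LIMITS["url_max_chars"]:
        problems.append("url length out of range")
    if url.lower().count(kw) > LIMITS["url_max_repeats"]:
        problems.append("too many keyword repeats in url")
    return problems
```

An empty list means the page fits every observed limit; anything returned names the limit it broke.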

Some Interesting Limits From the Top 100 Google Results For Searches Like Cars & Games

Title Sizes 4-68 characters

* LOW – cars

* HIGH – Enterprise Rent-A-Car: Rental Cars at Low Rates and Weekend Specials

Note: _… at the end is subtracted because it adds no value

Summary Text 0-155 characters

* LOW -

* HIGH – Research Luxury cars at; get free new Luxury car prices, invoice pricing, MSRP, dealer cost, and more, buy a new Luxury car, or read reviews

Note: _… at the end is subtracted because it adds no value

Keyword Repetition in Summary 0-8 Times

* LOW -

* HIGH – Mara pets games has free games and free flash games, such as arcade games, action games, puzzle games, sports games, online games and more.

Note: Technicality 9 if you count game and games as same – Free game downloads & online games at Big Fish Games – A new game every day! PC games & mac games – Play puzzle games, arcade games, mahjong games, …

Keyword Repetition in Title 0-5 Times

* LOW -

* HIGH – Games – Play Free Games, Online Games, Flash Games and Arcade Games

Keyword Repetition in URL 0-2 Times

* LOW -

* HIGH –*es.

Document Size 1-224 KB; 48KB average in the Top Ten, and fewer than 5 in the Top 100 are 100+KB

* LOW – aplu*** – 1k

* HIGH – amazo** – 224k

Number of Similar Pages 0-30; 8 out of the Top Ten were 27 or higher

* LOW –**m/search?hl=en&safe=off&client=safari&rls=en-us&q=related:dmoz.o***rg/Games/

* HIGH – google***.com/s***earch?hl=en&safe=off&client=safari&rls=en-us&

Cache Date: Not Cached, or Feb 22-27 … Top Ten Feb 26-27, Bottom Ten Feb 22-24

* LOW –*****ri

* HIGH –***

Note: Generated this data on Feb 29

Number of “More results” when present 539-2,300,000 pages

* LOW – More results from » Results 1 – 100 of about 539 from ap*** for cars.

* HIGH – More results from ca** » Results 1 – 100 of about 2,330,000 from for cars.

Size of URL 14-75 characters

* LOW – gam***

* HIGH – amaz***

Size of Full Domain Name (word.word) 6-23 characters Avg 14

* LOW –

* HIGH –

Size of Primary Domain Word (word).com 2-19 characters Avg 10

* LOW –

* HIGH –


How to Test Google Rank Equation

Test Google Rank Equation

I created a website with the purpose of reverse engineering the “Content Score” portion of Google’s rank equation. The idea is that each page on the site would test the use of a keyword (that no one else on the planet would use) in a single test. Here is an example of a test:

Source for Code 14: Keywords in blockquote tags



I created pages like this for every HTML tag and attribute and other qualities like domain, sub-domain, document, url params, etc…

The idea is that I type my GloballyUnusedKeyword into Google and see how Google ranks the single-test pages to find out what is most important. As a control I did this twice, with two GloballyUnusedKeywords. The results were the same in the two tests, which suggests that the sort order is not random.

The 2 websites were submitted in different weeks via Google’s addURL page. No other SEO was performed. No inbound links. This is purely a test of the different ways to use a keyword in content. The results surprised me.
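The one-variable-per-page setup can be sketched as a small page generator: each test page uses the unique keyword in exactly one way. The templates and code labels below are illustrative stand-ins, the real keyword was kept secret and the full tag list is not given.

```python
# Placeholder keyword; the author's real globally-unused terms are secret.
KEYWORD = "GloballyUnusedKeyword"

# A few of the single-variable tests, keyed by the post's "Code" labels.
# Each template uses the keyword in exactly one HTML feature.
TESTS = {
    "code14": "<blockquote>{kw}</blockquote>",
    "code17": "<address>{kw}</address>",
    "code11": "<p>{kw}</p>",
    "code53": '<img src="x.png" alt="{kw}">',
    "code37": '<meta name="description" content="{kw}">',
}

def build_page(code):
    """Build one test page whose only keyword use is the feature under test."""
    body = TESTS[code].format(kw=KEYWORD)
    return (
        f"<html><head><title>Test {code}</title></head>"
        f"<body>{body}</body></html>"
    )
```

Because the title and everything else on each page avoid the keyword, the resulting ranking of the pages isolates how much each single feature contributes.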

I hope you enjoy these findings as much as I did, and please offer suggestions and ideas for more or better empirical methods and experiments. I plan to keep my keywords a secret because I don’t want my work to get polluted by conversations.

The top 12

1. Code 14: Keywords in blockquote tags

2. Code 17: Keywords in address tags

3. Code 7: Keywords in underline tags

4. Code 46: Keywords in strike tags

5. Code 11: Keywords in p tags

6. Code index: A page that links to all tests

7. Code 56: Keywords in a tag link text

8. Code 53: Keywords in img alt text

9. Code 37: Keywords in meta description

10. Code 52: Keywords in HTML comments

11. Code 39: Keywords in meta author

12. Code 38: Keywords in meta keywords

What startled me was that Title, domain, headings (H1-H6 tags), and document name weren’t anywhere near the top of the list. That is totally contrary to just about all WH advice out there.

What also startled me was the placement of the index page in the results. The index page contains no references to the unique keyword (domain and URL keyword tests were submitted separately). So apparently linking to pages that mention the keyword is the 6th best thing you can do with your content, and is more important than on-page link text, meta tags, and alt text.

I only provided the top 12 because they haven’t changed ranking and appear to have “settled” into their spots. 13+ are still seeing movement daily. If and when they appear to settle, I’ll post an updated list.

This experiment is less than one month old. I’d imagine it is possible for things to change over time… If and when they do I’ll update


CSS-ONLY SEO Layout Technique


Notes: Use CSS to position each part of the page; then you can randomize the order of those parts in the static page’s HTML source.

Many people believe “on page” position matters. I’m not a believer, but I do like to randomize content order to make Google think a page is different or, in many cases, unique. I’m not going to go into detail on my “Spoofing Uniqueness” techniques in this post, but I will give you a small piece of the puzzle: making content order in the source arbitrary.

The only requirement of my CSS-only method is that you have to be OK with using a left-justified, fixed-width layout.






So if you believe “on page” position matters, then this is a CSS-only way to put your footer links at the top of the page. Or, if you want to read between the lines, this is a way to randomize the order of content blocks in your static source while yielding the same natural order in the rendered view. (If you look at my source you’ll see that I could put footer, content, header, leftnav, rightnav, or any number of other blocks in any order in the source and still render the same page view.)
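The technique can be illustrated with a small generator: every block gets a fixed `position:absolute` slot in the stylesheet, so shuffling the divs in the source changes nothing visually. The block names, pixel offsets, and 760px width below are invented for the sketch; only the absolute-positioning idea comes from the post.

```python
import random

# Each block: (HTML snippet, fixed CSS position). Offsets are made up.
BLOCKS = {
    "header":  ("<div id='header'>Header</div>",   "top:0; left:0;"),
    "leftnav": ("<div id='leftnav'>Nav</div>",     "top:80px; left:0;"),
    "content": ("<div id='content'>Content</div>", "top:80px; left:180px;"),
    "footer":  ("<div id='footer'>Footer</div>",   "top:600px; left:0;"),
}

def render(order=None):
    """Emit the page with blocks in `order` (or a random shuffle).

    The stylesheet pins every block to the same screen position, so any
    source order renders identically in a left-justified fixed-width layout.
    """
    names = list(BLOCKS) if order is None else list(order)
    if order is None:
        random.shuffle(names)
    css = "\n".join(
        f"#{name} {{ position:absolute; width:760px; {BLOCKS[name][1]} }}"
        for name in BLOCKS
    )
    html = "\n".join(BLOCKS[name][0] for name in names)
    return f"<style>\n{css}\n</style>\n{html}"
```

Calling `render(["footer", "content", "header", "leftnav"])` puts the footer first in the source, while the stylesheet still renders it at the bottom of the page.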

Many believe that this gets your best-tuned content to the top of the page, where it matters most. (I don’t know about that, but if you think so then…)

Many believe that this gets your footer links to the top of the page, where it matters most. (I don’t know about that, but if you think so then…)

I do believe this can help you reuse and freshen content blocks on pages to keep getting fresh content bonuses out of google without changing anything but some div orderings.
