Recently, I followed a Twitter link that landed me here: k-lytics.com. The page is a tutorial for authors about how to understand Amazon links and lessen the risk of review removal. Although the main takeaway, which is that it’s better to use a link that contains only the ASIN, is sound, the tutorial is wrong on just about every technical fact.
It’s hard to know where to start when everything is so . . . wrong.
Note: I’ve gone in and clarified when I’ve realized that I wasn’t specific enough or I used language that means something different to a developer or database admin than it likely does to a person who isn’t either of those things.
Amazon URLs don’t identify the person who did the search, so Amazon is not using incoming links as a criterion for review removal. The value of qid= in a URL does not assist in distinguishing the user account. While there are reasons to use a “clean” Amazon URL in your links, identification of your Amazon account as the link source is not one of them.
Before I continue, and on the off chance that someone who does not know my background reads this post, here’s a statement of my technical credentials:
I am a former web developer. I have worked in dev-ops. (Technically, I think I still do, but for a much smaller company without scheduled product release cycles.) I am a SQL Server DBA and data architect. It has been my job to design and maintain the database back-end for commercial, enterprise web applications. I have attended daily meetings with software architects and developers where my responsibility was to head off boneheaded code and bad database designs or to design such structures for them. My current job is with a much smaller company, but the skill set is still required.
When someone starts talking about interpreting URLs and particularly about databases, this is squarely in my technical expertise. Especially the database stuff.
The Actual Problem
Amazon has identified relationships between the poster of a review and the creator/seller of the product as reasons they will remove reviews. The exact words (see quote infra) are “perceived to have a close personal relationship” or “a direct or indirect financial interest.”
In order to establish these things, Amazon has to connect a given Amazon account with one or more external accounts. More on that later.
And so, you might think, of course a link on a third party site is an external thing that might create the appearance of a relationship. But link clicks would be a remarkably inefficient way of deriving that information.
A link, sitting on your website, or Facebook, or Twitter, or Pinterest, gets clicked on by someone, and that someone ends up at Amazon and buys your book. This is not behavior that Amazon, or anyone who sells stuff on the web, wants to discourage.
We know that Amazon has removed reviews from readers who love an author and post reviews of every book that author writes. That is because they have identified that the reader has done something such as “liking” the author on Facebook, and that there is, therefore, an outside, personal relationship between the reader and the author. Facebook, like other social media companies, provides a wealth of information about who likes what/whom. That information is pretty easy to find out. The contents of a link URL are irrelevant to that determination.
That relationship is NOT contained in the URL.
If you have an Amazon link on your Facebook page and someone clicks on it and buys your book, the smoking gun isn’t the URL string of the link. It’s the referrer information that tells Amazon that the click came from your Facebook page or profile. (And if it’s from your profile rather than your author page, then you are likely asking for a false positive.)
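To make the referrer point concrete, here is a minimal sketch of how any web server could read where a click came from. It assumes the browser sent the standard HTTP Referer header. The function name and the sample headers are hypothetical, and nothing here is a claim about what Amazon’s code actually does.

```python
from typing import Optional
from urllib.parse import urlparse


def referring_host(headers: dict) -> Optional[str]:
    """Return the host named in the Referer header, or None if the header is missing."""
    referer = headers.get("Referer")
    if not referer:
        return None
    return urlparse(referer).hostname


# Hypothetical request headers for a click that started on a Facebook page.
headers = {"Referer": "https://www.facebook.com/some.author.page"}
print(referring_host(headers))
```

Notice that the destination URL is never consulted. The referring site is reported separately from whatever is in the link.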
An Amazon URL is a Dumb and Inefficient Way to Infer Relationships
Amazon is unlikely to be using URL strings from incoming third parties (your website, Facebook, etc.) to figure out which reviews are suspect. They surely are interested in incoming links, but not as implied in that article. Parsing URLs for such information would be a strange and inefficient way to get that information, especially when third parties make it easy to mine far more relevant data.
Personally, my theory is that authors who are using the same email address for Amazon, writing-related social media activities, and their personal lives are more likely to run into problems with Amazon incorrectly deriving personal relationships where none exist. I suspect that logging in all over the place with a Facebook account only exacerbates the issue. Using the same email address at Amazon, Facebook, Twitter, etc., or filling in alternate email contacts at those sites where that alternate email is the same as your Amazon email account, is going to make it really easy for Amazon to find connections and come up with real, or incorrect, derivations of actual problematic relationships. Especially if you haven’t locked down your privacy settings.
The Technical Problems with the Analysis
Right. So, the claim is that a link containing stuff besides the ASIN is sufficient to invoke a review removal.
Amazon knows if a reviewer bought the item they’re reviewing. And they surely know what events led to the reviewer’s “buy this” click. If it’s a link from a third party site, then they have whatever information is in the referring link. However, that link does not contain the account information of the person who copied and pasted the link.
The tutorial implies that a qid value, should it exist, is sufficient to identify the account that created the link. That is false.
The qid value is an Epoch timestamp — the number of seconds since January 1, 1970. This value is precise only to the second. That right there tells you there’s a huge problem with the analysis in the tutorial.
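If you want to see this for yourself, here is a small sketch that pulls the qid out of a link and converts it to a date. The URL, the ASIN, and the qid value are made up for illustration, and the function name is mine.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Hypothetical link in the shape of a copied search result; the ASIN and qid are invented.
url = "https://www.amazon.com/dp/B000000000/ref=sr_1_1?ie=UTF8&qid=1445385600&sr=8-1"


def qid_to_utc(link: str) -> Optional[datetime]:
    """Read the qid parameter as Unix epoch seconds and return it as a UTC datetime."""
    params = parse_qs(urlparse(link).query)
    if "qid" not in params:
        return None
    return datetime.fromtimestamp(int(params["qid"][0]), tz=timezone.utc)


print(qid_to_utc(url))
```

All you get back is a moment in time, to the second. There is no account in there.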
The idea that the qid value provides enough information to identify the account that made the search is just … wrong.
The addition of the qid value does not and cannot guarantee uniqueness of the string. It is possible for two people to make the exact same search at the exact same second and click on the top result at the same second. In such a case those URLs will be identical.
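Here is a sketch of that collision. The URL layout below is an illustrative stand-in, not Amazon’s actual format, and the names (search_result_url, Alice, Bob) are hypothetical.

```python
from urllib.parse import urlencode


def search_result_url(asin: str, keywords: str, epoch_seconds: int) -> str:
    """Build a hypothetical search-result link that carries a second-precision qid."""
    query = urlencode({"keywords": keywords, "qid": epoch_seconds, "sr": "8-1"})
    return f"https://www.amazon.com/dp/{asin}?{query}"


# Two different people run the same search during the same second.
same_second = 1445385600
url_alice = search_result_url("B000000000", "historical romance", same_second)
url_bob = search_result_url("B000000000", "historical romance", same_second)

# The two strings are character-for-character identical.
print(url_alice == url_bob)
```

Given only the string, there is no way to tell which of the two people it came from.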
How a DBA Gets Fired
Uniqueness is a key component of database design. If the data architect gets this wrong because they fail to account for the possibility of collisions where two objects cannot be distinguished from each other, they’re going to be out of a job.
Unique Snowflakes MUST Exist
It is actually impossible to guarantee uniqueness with a timestamp that is precise only to the second. I imagine many people unfamiliar with such concepts think that precision to the second is pretty darn precise. But in this context, it is not. It’s also not precise enough for things like the Olympics, by the way.
When the ability to uniquely identify something is required, you don’t choose imprecise values to achieve that.
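For the database-minded, here is a minimal sketch of what happens if you try to use a second-precision timestamp as a unique key. It uses an in-memory SQLite database from the Python standard library. The table and column names are hypothetical, and this is a demonstration of the design flaw, not a description of anyone’s real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table that (wrongly) treats the epoch-seconds value as the unique key.
conn.execute("CREATE TABLE search_log (qid INTEGER PRIMARY KEY, account TEXT NOT NULL)")


def log_search(qid: int, account: str) -> bool:
    """Try to record a search; return False if the key collides with an existing row."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO search_log (qid, account) VALUES (?, ?)", (qid, account)
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = log_search(1445385600, "alice")
second = log_search(1445385600, "bob")  # same second, different account
print(first, second)
```

The second insert is rejected because the key already exists. Either you lose Bob’s search or you drop the uniqueness constraint, and then the value no longer identifies anything.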
Frankly, this is a dumb discussion. If you want to track search queries by account this isn’t how you do it.
The qid does have a useful purpose, but it’s not identifying the user who made the search.
Let me remind you that when Amazon needs to know what user account referred an incoming link, they don’t say, “No worries, we have that in every URL!” What they say is, sign up for an associates account so we can give you a uniquely identifying string that tells us the link came from you.
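By way of contrast, here is a sketch of reading an associates identifier out of a link. I’m assuming the identifier travels in a query parameter named tag, which is how associates links commonly look. The tag value and the function name are made up.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def associate_tag(link: str) -> Optional[str]:
    """Return the value of the tag query parameter, or None if the link has none."""
    values = parse_qs(urlparse(link).query).get("tag")
    return values[0] if values else None


# Hypothetical links: one with an associates tag, one copied from a search result.
tagged = "https://www.amazon.com/dp/B000000000?tag=examplesite-20"
untagged = "https://www.amazon.com/dp/B000000000/ref=sr_1_1?ie=UTF8&qid=1445385600&sr=8-1"

print(associate_tag(tagged))
print(associate_tag(untagged))
```

The first link names its source on purpose. The second has nothing of the kind.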
The tutorial goes on to state that the number of times the link was clicked on provides evidence of author manipulation. No. Mere clicks on a link are evidence of the popularity of the content and of the author. If the number of clicks alone were evidence of manipulation, then popular authors would disproportionately suffer from such a system. Further, if that were true, then no author should ever use associates links.
Additional information is needed in order to infer manipulation and that information is not in an Amazon URL.
I think it’s pretty ridiculous to think that Amazon would take punitive actions based on data that does not identify the account that made the link. The implication that the qid portion of the URL supplies that identification is, in a word, bullshit.
Here’s what Amazon says about its policy (found here):
Authors and artists can add a unique perspective and we very much welcome their customer reviews. While we encourage reviewers to share their enthusiasm and experience, there can be a fine line between that and the use of customer reviews as product promotion. We don’t allow anyone to write customer reviews as a form of promotion and if we find evidence that a customer was paid for a review, we’ll remove it. If you have a direct or indirect financial interest in a product, or perceived to have a close personal relationship with its author or artist, we’ll likely remove your review. We don’t allow authors to submit customer reviews on their own books even when they disclose their identity.
And here are a few of the items that prompt removal:
- A product manufacturer posts a review of their own product, posing as an unbiased shopper
- A customer posts a review in exchange for $5
- A family member of the product creator posts a five-star customer review to help boost sales
- An artist posts a positive review on a peer’s album in exchange for receiving a positive review from them
For that last one, substitute “author” for “artist” and “book” for “album.”
There’s very, very little in any Amazon URL that provides any of that information.
It’s not the purchase that is suspect. Amazon knows who bought what. Amazon is saying there is a non-commercial, personal relationship between the poster of a review and the author. The URL doesn’t provide a smoking gun of “These people are buddies outside this commercial transaction!”
What people suspect Amazon is doing for the purposes of determining those relationships is examining things like connections between Amazon accounts (Kindle sharing, mailing addresses, etc.) or links between Amazon email addresses and possibly IP addresses that indicate that one person is posting under multiple identities. They’re also believed to be looking at other social media accounts, including Facebook, and at places where unwise authors might obtain insincere reviews, such as Fiverr. That scrutiny includes taking legal action against those services. Gifting a book to a reader is something that appears to trigger an issue with a subsequent review.
Even More Problems
If you listen to that tutorial, you’ll come away thinking several incorrect things.
The tutorial implies that the qid, which is a Unix Epoch timestamp (the number of seconds since January 1, 1970) is a unique identifier. This is so false I immediately lost track of the tutorial because I was all wha??? (No worries! I listened three times to get their statements straight.) It manages to also imply that the qid somehow identifies the user making the query. That is also false.
It makes a big deal of demonstrating that a qid value changes over time. Um, doh?
Wrong about Short Links, Too
Then the tutorial talks about short links and it implies that using a short link will strip the identifying data from a copied Amazon URL. That, too, is false. Whatever is contained in the source URL that you paste into your short link destination will be used to resolve the destination of the click.
So, suppose you use bit.ly/mybook as the link you post at FB.
When someone clicks on your FB bit.ly link here is what happens:
The user goes along for the ride to bit.ly, where bit.ly looks up the destination you gave it for bit.ly/mybook. (This happens really quickly. The user is unlikely, but only unlikely, to notice the millisecond or so that they’re at bit.ly.)
Bit.ly sends the user to the destination you copied and pasted from Amazon. The ENTIRE URL you copied and pasted. Including any applicable qid or other search string.
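Here is a toy model of those two steps. It is a plain lookup table, not bit.ly’s actual implementation, and the short code, the stored URL, and the function name are all hypothetical.

```python
# Toy short-link table: the short code maps to whatever URL was pasted in when it was created.
SHORT_LINKS = {
    "mybook": "https://www.amazon.com/dp/B000000000/ref=sr_1_1?ie=UTF8&qid=1445385600&sr=8-1",
}


def resolve(short_code: str) -> str:
    """Return the stored destination exactly as it was pasted, query string and all."""
    return SHORT_LINKS[short_code]


destination = resolve("mybook")
print(destination)
```

Nothing is stripped along the way. If the qid was in the URL you pasted, it is in the URL the user lands on.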
Lastly, the tutorial completely omits consideration of the use of Amazon associates links. If it’s true that Amazon is using information from incoming third party links to figure out whose reviews to remove, then authors should NEVER use associates links. An associates link actually DOES identify the user account that made the link. But that’s an absurd result. Amazon wants people to use their associates links.
Precision to Websites and Databases
Amazon processes millions of transactions and there are, guaranteed, many many queries that occur in the exact same second. Database systems that need to know which transaction to commit first are looking at milliseconds and nanoseconds. Therefore, a timestamp that is precise only to the second is inadequate for the identification of separate transactions. An Epoch timestamp might uniquify, but it cannot uniquely identify. And, even if it were used to add some value to a search string to make it unique, an imprecise value like that would not guarantee there would not be a collision.
Here’s what the timestamp can efficiently do: create an easy, lightweight way to compare the start time of the product search result to actions taken later. So you know something like, how long it took the user to click buy. It’s easy and lightweight because all you have to do is some arithmetic like subtract one epoch value from another.
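That arithmetic looks something like this. Both timestamps are invented for the example, and the variable names are mine.

```python
# Hypothetical epoch-second values for one shopper's session.
search_started = 1445385600  # when the search results were produced
buy_clicked = 1445385725     # when the shopper clicked buy

# One subtraction gives the elapsed time in seconds.
elapsed = buy_clicked - search_started
minutes, seconds = divmod(elapsed, 60)
print(f"{elapsed} seconds ({minutes} min {seconds} s)")
```

No date parsing, no time zones, no lookups. That is the kind of job a value like this is suited for.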
Why You’d Want a Clean URL
Long URLs are subject to errors that break the link. Certain characters, such as spaces and ampersands, may need to be encoded so the URL is correctly parsed. When you copy a long URL, you might not get the entire thing. Dealing with all of that is a lot of work. A clean URL also makes it easier to read your HTML and your analytics.
But it’s not because Amazon is using a qid to identify the person who created the link.
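If you want to clean up a long link, here is a minimal sketch. It assumes the ASIN is a ten-character code that follows /dp/ or /gp/product/ in the path, and that a bare amazon.com /dp/ link is the clean form you want. The function name and sample URL are hypothetical.

```python
import re
from typing import Optional

# Assumed pattern: a 10-character ASIN right after /dp/ or /gp/product/.
ASIN_PATTERN = re.compile(r"/(?:dp|gp/product)/([A-Z0-9]{10})(?:[/?]|$)")


def clean_amazon_url(link: str) -> Optional[str]:
    """Return a link containing only the ASIN, or None if no ASIN is found."""
    match = ASIN_PATTERN.search(link)
    if match is None:
        return None
    return f"https://www.amazon.com/dp/{match.group(1)}"


# Hypothetical long link copied from a search result.
long_link = "https://www.amazon.com/Some-Title/dp/B000000000/ref=sr_1_1?ie=UTF8&qid=1445385600&sr=8-1"
print(clean_amazon_url(long_link))
```

The short form has no characters that need encoding and nothing that can get cut off when you copy it.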