In January 2023, internal data from Yandex, a popular search engine in Russia, leaked online. The leaked material included a document listing 1,922 ranking factors used by the company. Not all of them are active, however: 244 factors are marked as "unused" and have been removed from consideration, and 988 are marked as deprecated. Together that is 1,232 of the 1,922 factors, so about 64% of the document describes factors that are either not actively used or have been replaced. Some factors also appear to be out of date, and several of the authors who created them no longer work at Yandex.
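The 64% figure comes from adding the two inactive categories and dividing by the total. The short calculation below reproduces it, assuming (as the percentage implies) that no factor is counted as both unused and deprecated. The constant and function names are illustrative only.

```python
# Counts as reported for the leaked Yandex ranking-factor list
TOTAL_FACTORS = 1922
UNUSED = 244
DEPRECATED = 988


def inactive_share(total: int, unused: int, deprecated: int) -> float:
    """Fraction of factors flagged unused or deprecated.

    Assumes the two labels do not overlap, so their counts can simply be added.
    """
    return (unused + deprecated) / total


share = inactive_share(TOTAL_FACTORS, UNUSED, DEPRECATED)
print(f"{UNUSED + DEPRECATED} of {TOTAL_FACTORS} factors = {share:.1%}")
# prints: 1232 of 1922 factors = 64.1%
```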
The document confirms that Yandex uses a form of PageRank as a ranking factor; it is the first factor on the list. The document also highlights "pessimization," the term for penalizing a website, in which case its PageRank is reduced to zero. This supports the theory that recovering from a Yandex penalty is especially difficult.
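To make the idea of "PageRank reduced to zero" concrete, here is a minimal sketch of textbook PageRank computed by power iteration over a tiny, made-up link graph, with a hypothetical `pessimized` parameter that zeroes a penalized host's score. This is not Yandex's implementation: the leak does not describe how its PageRank variant is computed or exactly when a penalty is applied, and the host names, damping factor of 0.85, and iteration count are assumptions.

```python
def pagerank(links, damping=0.85, iterations=50, pessimized=frozenset()):
    """Textbook PageRank by power iteration over a dict of host -> outbound hosts."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node first receives its share of the "random jump" probability
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this node's rank evenly across its outbound links
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node (no outbound links): spread its rank over all nodes
                for target in nodes:
                    new_rank[target] += damping * rank[node] / n
        rank = new_rank
    # Hypothetical "pessimization": penalized hosts end up with a score of zero
    for node in pessimized:
        if node in rank:
            rank[node] = 0.0
    return rank


# Hypothetical three-host link graph
links = {
    "a.example": ["b.example"],
    "b.example": ["a.example", "c.example"],
    "c.example": ["a.example"],
}
baseline = pagerank(links)
penalized = pagerank(links, pessimized={"c.example"})
print(round(baseline["c.example"], 3), penalized["c.example"])
```

In this toy version the penalty is applied after the scores converge, so the penalized host still passes value to the hosts it links to; a real engine could just as plausibly exclude it during the computation.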
The document also covers user signals such as clicks, and shows that a site's overall performance can affect how it ranks for individual queries. URL structure is a ranking factor too: elements such as too many trailing slashes or numbers in the URL count as negatives, while a corresponding country or city identifier in the URL and a semantic relation between the URL and the query count as positives.
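The URL signals described above can be illustrated with a small feature extractor. This is a hypothetical sketch, not code from the leak: the function name `url_signals`, the feature names, and the `location_tokens` parameter are all invented for illustration, and nothing here reflects how Yandex weights these features. Note also that `query_overlap` is plain word matching, a crude stand-in for the semantic relation the leak describes.

```python
import re
from urllib.parse import urlparse


def url_signals(url, query, location_tokens=()):
    """Extract simple, hypothetical URL features of the kind the leak mentions."""
    parsed = urlparse(url)
    path = parsed.path
    # Lowercase word tokens (letters only) from the host and path
    words = set(re.findall(r"[a-z]+", (parsed.netloc + path).lower()))
    query_terms = set(query.lower().split())
    return {
        # Possible negatives: many slashes, trailing slashes, digits in the path
        "slash_count": path.count("/"),
        "trailing_slashes": len(path) - len(path.rstrip("/")),
        "digit_count": sum(ch.isdigit() for ch in path),
        # Possible positives: a location identifier and overlap with the query
        "has_location": any(tok.lower() in words for tok in location_tokens),
        "query_overlap": len(query_terms & words),
    }


sig = url_signals(
    "https://shop.example/moscow/winter-boots/", "winter boots", location_tokens=("moscow",)
)
noisy = url_signals("https://shop.example/item/12345//", "boots")
print(sig)
print(noisy)
```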
Another notable detail is that Yandex uses DSSM (Deep Structured Semantic Model), a neural network technique, to predict the number of products on a webpage from its URL and page title. The document also mentions page quality scores, with the host playing a role in determining a page's perceived quality.
The leak also revealed ranking factors related to "Your Money or Your Life" (YMYL) topics such as medical, financial, and legal information, as well as factors related to traffic and links from TikTok.
The document also shows that a host's reliability and the number of URLs on a domain that respond with errors affect a page's quality. Data from Yandex Metrika, the company's web analytics service, also influences rankings.
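One simple way to measure "URLs on a domain that respond with errors" is the share of crawled URLs per host that return an HTTP error status. The sketch below computes that share from a list of (URL, status) pairs. It is an assumption-laden illustration: the function name and sample data are hypothetical, it treats any 4xx or 5xx status as an error, and the leak does not say how Yandex actually counts or weights such errors.

```python
from collections import Counter
from urllib.parse import urlparse


def host_error_share(crawl):
    """Return, per host, the fraction of crawled URLs with a 4xx/5xx status.

    `crawl` is an iterable of (url, http_status) pairs.
    """
    totals = Counter()
    errors = Counter()
    for url, status in crawl:
        host = urlparse(url).netloc
        totals[host] += 1
        if status >= 400:  # assumption: every 4xx and 5xx response counts as an error
            errors[host] += 1
    return {host: errors[host] / totals[host] for host in totals}


# Hypothetical crawl results
crawl = [
    ("https://a.example/", 200),
    ("https://a.example/old-page", 404),
    ("https://a.example/broken", 500),
    ("https://b.example/", 200),
]
shares = host_error_share(crawl)
print(shares)
```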
Much of the leak reinforces what we already knew. Yandex is not Google, so its factors do not necessarily carry over to other search engines, but the leak is interesting either way.