Yandex Source Code Has Been Leaked To The Public

February 11, 2023

A former Yandex employee leaked the source code of the search engine and other services. The leak offers interesting insights into the inner workings of the search engine: ranking factors, weightings and more.

Yandex is the search engine market leader in Russia and fifth in the world in terms of page views. Although Yandex is not Google, the basic workings of search engines are comparable. The following findings do not necessarily apply directly to Google, but they do provide an interesting insight. The source code contains an extensive list of 1,922 different ranking factors. Of these, 999 are tagged TG_DEPRECATED, 242 TG_UNUSED, 149 TG_UNIMPLEMENTED and 115 TG_REMOVED. That leaves 417 active ranking factors, about twice as many as the 200 or so commonly assumed for Google.
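The figure of 417 active factors can be checked with a quick calculation. The sketch below uses only the counts quoted above and assumes that each factor carries at most one of the four tags; the variable names are purely illustrative.

```python
# Tag counts as quoted in this article.
total_factors = 1922
inactive_by_tag = {
    "TG_DEPRECATED": 999,
    "TG_UNUSED": 242,
    "TG_UNIMPLEMENTED": 149,
    "TG_REMOVED": 115,
}

# Assumption: the tags do not overlap, so the counts can simply be added up.
inactive_total = sum(inactive_by_tag.values())
active_factors = total_factors - inactive_total

print(inactive_total)   # 1505
print(active_factors)   # 417
```

If some factors carried more than one of these tags, the number of active factors would be higher than 417.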

As Google has already confirmed for its own search, Yandex also uses different algorithms and weights depending on the search query. For example, a distinction is made by time of day: there are morning and evening weights (IND_FI_MORNING_QUERY). There are also differences for adult entertainment (IND_FI_XPORNO_QUERY), commercial queries (IND_FI_QUERY_COMMERCIALITY_MX) and much more. An initial list of ranking factor weights (nav_linear.h) suggests that the most important ranking signals for Yandex can be found in these four areas:

Links: Like Google, Yandex uses a PageRank algorithm to rank the quality of links. Link text is important, as is the age of the link.
User signals: Google denies it, but Yandex’s source code clearly shows that user signals are a ranking factor. Values such as the CTR, time on site, bounce rate and number of visitors returning to the SERPs affect the ranking at Yandex.
Relevance ratings of the text content: The classic search engine techniques are of course also included. Yandex mainly relies on BM25, a well-known approach from information retrieval. Other classics, such as checking whether the keyword is contained in the URL, can also be found.
Trust and quality: Like Google, Yandex sets higher quality requirements for sensitive topics such as health and financial content. There are 7 different ranking factors for medical topics alone (FI_MEDICAL*).
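To give an idea of what BM25 does, here is a minimal sketch of the textbook formula in Python. It is not taken from the Yandex code. The function name bm25_scores, the parameter values k1=1.5 and b=0.75 (common defaults) and the smoothed IDF variant are assumptions made for illustration only.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`.

    Textbook BM25 with a smoothed IDF; an illustration, not Yandex's code.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Rare terms get a higher weight than common ones.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates (k1) and is normalized by length (b).
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "yandex source code leak".split(),
    "google ranking factors".split(),
    "yandex ranking factors in the leaked source code".split(),
]
scores = bm25_scores("yandex ranking".split(), docs)
print(scores)
```

In this toy example the third document scores highest because it contains both query terms, even though its greater length reduces the contribution of each individual match.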

Many of the assumptions about Google ranking factors can be found in the Yandex source code. This is not a confirmation that Google uses them, but a good indication. Yandex generally rates content published on Wikipedia.org better than other content. HTTP errors (4xx/5xx status codes) also have a negative effect on the ranking. As is known from Google, Yandex also rates HTTPS encryption and speed positively.

All in all, the Yandex code leak offers a very interesting insight into the inner workings of a modern search engine. Although not all findings can be transferred directly to Google, many assumptions made in recent years about the general functioning of large Internet search engines have been confirmed. I expect the SEO industry to have a few interesting weeks of new insights ahead.