It is not news that machine-translated websites are penalized by search engines. Google has built its technologies on the back of crawling reliable bilingual websites and freely available public data. After ditching rule-based engines (Systran) back in 2006, it embarked on a mission to use statistical machine translation (SMT) as a byproduct of its own data analysis. Websites that use machine translation to inform users are also crawled and aligned. Those alignments feed dirty data (read: uncertainty) back into the system, which worsens the probabilities and therefore the output (read: the translation). That is why Google Translate and website translation can’t marry.
A machine-translated website will be penalized by Google because its content is dirty data. It is also proof of laziness on the part of those responsible. The search giant wants to analyze natural, human data. We recently bumped into an article on Slator.com that got our feathers all aflutter. In short, it proved the point above. Translation companies and providers of proxy translation have long known about this issue, and proxy services often offer a cheap machine translation option anyway.
Nowadays, even e-commerce platforms (see the Magento help section on multilingual stores) advise against machine translation if you want professional results and better ranking. It may sound ironic, but search engines (read: Google) will penalize websites that use Google Translate for their multilingual content. lentrax has been a die-hard advocate of quality website localization. It developed Cor as a crawling and translation assistance technology that neither interferes with your code nor feeds machine-translated output to Google’s algorithms. Cor checks your content, extracts the text and sends it out for translation. Whenever we hear that a website or company plans to use raw proxy translation or simply Google Translate, it saddens us. The result is lost business, wasted time and wasted investment. Eventually the company must admit that the wrong option was chosen some time ago. It loses credibility, and then business and customers, when the intention was to win them.
Google’s violation guidelines
Google clearly bans automatically generated content (to combat black-hat SEO and similar techniques), including “Text translated by an automated tool without human review or curation before publishing”. You can find this in its violation guidelines. Raw machine translation produces unnatural text, and it is not difficult to detect that a text was produced by software. Such text can bury your website deep under penalties. This kind of careless publication is viewed as spam or, worse still, as copied or duplicated content. Recovering is hard unless you are prepared to redo properly what should have been done properly in the first place. Follow this link to learn more about the dangers of duplicate content.
But Pangeanic develops machine translation technologies, doesn’t it?
Yes, we do. We are a well-known developer of machine translation and other language technologies. We use them to automate processes, and they are particularly useful in controlled-language situations such as instruction manuals and documentation for the automotive industry. Machine translation is also extremely useful for gisting, that is, getting a quick idea of what a text in a foreign language says at lightning speed. It also helps translators in certain situations to pre-translate content and then post-edit it. The result always needs a final check to ensure it reads as natural language. If your website is rather big (a large e-commerce site, for example, can contain tens of millions of words) and you decide to translate sections of it using MT, there is a quality option to consider. We can offer a combination of human translation and machine translation engines trained on your previous translations (aligned as reliable “translation memories”). These engines will speak your language in your style and use your terminology, specific to your products, services and industry. Engines built from your own data, or our own engines customized with your data and terminology, produce better translations than general online machine translation tools. Our expert translators can then post-edit the content to make sure it conveys the message as it should.
Your website MUST PROVIDE VALUE
This is surely one of the most difficult things to achieve, but it is extremely important to search engines. Your content must be informative and engaging. Bounce rates indicate how visitors interact with your site, but a high bounce rate is not necessarily a sign of a bad website. Some of your pages may offer exactly the information the visitor was looking for. The visitor may have spent one, two, three or more minutes reading the page and then left without interacting because he or she found the answer. Check this informative post by Yoast on why a high bounce rate is not necessarily a bad thing for your website. A machine-translated website simply does not offer the quality content or the value that visitors want.
Multilingual SEO strategy
Keywords cannot be machine translated: people search for different things in different places.
A simple keyword like “sneakers” can serve as an example (follow this article for a list of the top ten disagreements between US and British English). The word is widely used in the US, although more so in some areas than in others. British English uses “trainers” (from “training shoes”). People looking for this kind of footwear will not land on your page if you use a different keyword, and the same happens across languages. Machine-translated keywords just won’t work in other languages. Pangeanic solves this challenge with specialist translators who have a flair for marketing and are aware of such issues. They use website analysis and SEO tools (SEMRush, Google AdWords, etc.) to check which terms are popular in each country or region. You can then make an informed decision about how to market your products on your website instead of relying on a generic or literal translation.