JavaScript Request & PinYin Romanization of Mandarin Chinese


Ralph Torello

Jan 28, 2016, 12:29:57 PM
to Google Translate API Developer Forum
Hello,

I'm Ralph, and I am writing a (pretty short) web-site where I translate Chinese News Articles and generate simple Vocabulary Lists for prospective students and people here in Dallas, Texas who are interested in language exchange. My interpretation of Google Translate (the web-site) is that it's a good tool for somebody who wants to improve their vocabulary, but as an "A.I." (Artificial Intelligence) translation tool, it's pretty terrible. I use it to look up words all the time, but there really isn't any software that generates good English for people to read.

I am using simple JavaScript to send requests to the "Google Cloud Server Translate API" to obtain Simplified Chinese (which is "zh-CN"), Traditional Chinese Characters ("zh-TW") and, of course, English ("en").
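
For readers following along, here is a minimal sketch of how a request of this kind can be put together against version 2 of the Translate API's REST interface. The endpoint and the `key`, `q`, `source` and `target` parameters belong to that interface; the function names, the `YOUR_API_KEY` placeholder and the sample reply are assumptions made for illustration, not code from the original post.

```javascript
// Sketch only: builds a Translate API v2 request URL and reads the reply.
// buildTranslateUrl, extractTranslations and YOUR_API_KEY are illustrative names.
var TRANSLATE_ENDPOINT = "https://translation.googleapis.com/language/translate/v2";

function buildTranslateUrl(apiKey, text, source, target) {
  // Every value is percent-encoded so Chinese text survives in the query string.
  return TRANSLATE_ENDPOINT +
    "?key=" + encodeURIComponent(apiKey) +
    "&source=" + encodeURIComponent(source) +
    "&target=" + encodeURIComponent(target) +
    "&q=" + encodeURIComponent(text);
}

// The v2 reply is JSON shaped like
// {"data": {"translations": [{"translatedText": "..."}]}}.
function extractTranslations(responseText) {
  var parsed = JSON.parse(responseText);
  return parsed.data.translations.map(function (t) {
    return t.translatedText;
  });
}

var url = buildTranslateUrl("YOUR_API_KEY", "你好吗?", "zh-CN", "en");
console.log(url);

// A hand-written reply in the documented shape, not real API output.
var sampleReply = '{"data":{"translations":[{"translatedText":"How are you?"}]}}';
console.log(extractTranslations(sampleReply));
```

Nothing in this reply shape carries a romanization, which is the gap the rest of this question is about.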

Now, the Google Translate Web-Site (http://translate.google.com) has an additional window that can produce "The PinYin Romanization" of Mandarin - which is simply a window that helps people pronounce the words using a standard & well-known system for converting Chinese to the English Alphabet.

I just want to ask: Is there a way to Make an API call to Google Translate's "Cloud Server" API Toolkit to obtain these pronunciation words?

I have researched "StackOverflow," and this Google Group, and found several *very old* posts regarding this subject (old as in Year 2011) - and found nothing helpful. If anyone can help, I would appreciate it, but please, don't send me any insulting, threatening or otherwise pejorative answers. I am a programmer, and I make mistakes - and I have no malicious intent, other than translating Chinese News Articles for people near where I live.

I have read that "Scraping the actual Google Translate Website" can work, but the answer/solution that I obtained from "Stack Overflow" didn't work at all... It used an XMLHttpRequest to make an HTTP GET request, but it failed, and I couldn't get the HTML.
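
A request of this kind can fail without any visible sign when the callback only acts on status 200, as the `httpGetAsync` function posted later in this thread does. The sketch below shows one way to make the outcome visible. `describeOutcome` is a hypothetical helper and the message wording is an assumption; the fact it relies on is that a finished XMLHttpRequest reports status 0 when the browser made no response available to the page, which is what happens on a network error or when a cross-origin request is blocked.

```javascript
// Sketch: turn an XMLHttpRequest's readyState/status pair into a readable message.
// describeOutcome is an illustrative helper, not part of any library.
function describeOutcome(readyState, status) {
  if (readyState !== 4) {
    return "Still loading (readyState " + readyState + ")";
  }
  if (status === 200) {
    return "OK";
  }
  if (status === 0) {
    // No response was made available to the page: a network failure, or the
    // browser blocked a cross-origin request.
    return "No response (network error or cross-origin request blocked)";
  }
  return "HTTP error " + status;
}

console.log(describeOutcome(4, 200));
console.log(describeOutcome(4, 0));
console.log(describeOutcome(4, 403));
```

Calling `describeOutcome(xmlHttp.readyState, xmlHttp.status)` inside `onreadystatechange` and printing the result would show which of these cases applies, instead of leaving the page unchanged.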

What I actually wanted to hear an answer to was - Is it possible to just "scrape" the HTML & PinYin Romanization off of the "translate.google.com" web-site in a simple way? As usual I was immediately insulted by Google and told that I was "probably a robot" trying to code something malicious. So, I guess, the final version of my question is:

Since it looks like the Google Translate API Toolkit DOES NOT provide a function call for getting the PinYin romanization, is there a legal/legitimate/well-accepted way, using simple JavaScript, to scrape the HTML/PinYin by making an HttpGet request to something like...

https://translate.google.com/m/translate#zh-CN/en/你好吗?

If you can provide some sample JavaScript that scrapes the words: "Ni hao ma?" from the web-site output, I would appreciate it. If this is something that Google does not consider "legal" or "compliant" with its policies, please, PROFESSIONALLY LET ME KNOW THE ANSWER.
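
One detail worth noting about an address of this form: everything after the `#` is a fragment, which the browser keeps to itself and does not send to the server. A plain HTTP GET of the address above would therefore ask only for `/m/translate`, and the language pair and the text would never reach Google. The sketch below uses the standard `URL` class to show the split; the variable names are illustrative.

```javascript
// Sketch: split the example address into what a GET sends and what stays local.
var pageUrl = new URL("https://translate.google.com/m/translate#zh-CN/en/你好吗?");

// Only the path and query string go into the HTTP request line.
var requestTarget = pageUrl.pathname + pageUrl.search;

// The URL parser percent-encodes the fragment, so decode it for display.
var fragment = decodeURIComponent(pageUrl.hash.slice(1));
var parts = fragment.split("/");

console.log("Sent to server:", requestTarget);
console.log("Kept in browser:", parts);
```

This is one reason a scraper that simply fetches such an address gets back the generic page rather than the pronunciation: on the real site, the fragment is read by script running in the page after it loads.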

P.S. Please do not post any redirects to other questions that don't answer the question - is this legal/legit? I have already scoured a lot of this forum & StackOverflow - only to realize, that, generally, the answer is - Google Cloud Translate API doesn't provide a means to obtain the pronunciation of Mandarin Chinese. I have given my BANK ACCOUNT INFORMATION to Google to pay for this service.

Ralph Torello

Jan 28, 2016, 2:48:15 PM
to Google Translate API Developer Forum
<HTML>
<HEAD>
<SCRIPT>

// *******************************************************************************
// Copied directly from "Stack Overflow" (DOESN'T WORK!)
// *******************************************************************************

function httpGetAsync(theUrl)
{
    var xmlHttp = new XMLHttpRequest();
    xmlHttp.onreadystatechange = function() { 
        if (xmlHttp.readyState == 4 && xmlHttp.status == 200)
            printIt(xmlHttp.responseText);
    }
    xmlHttp.open("GET", theUrl, true); // true for asynchronous 
    xmlHttp.send(null);
}


// *******************************************************************************
// Currently, this function just calls the "http get" function
// *******************************************************************************
function getIt()
{
  // var testHTML = "<HTML>\n<HEAD></HEAD>\n<BODY>\n<H1>Test this HTML</H1>\n<HR>\n</BODY>\n</HTML>";
  // printIt(testHTML);

  // *****************************************************************************
  // WOULD REPLACE THE FOLLOWING URL WITH 
  // (as below)
  // var theUrl = "https://translate.google.com/#zh-CN/en//你好嗎?"
  // *****************************************************************************
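  // This is just a temporary-place-holder URL
  // (the URL itself is missing from the post; example.com is an assumed stand-in)
  // *****************************************************************************
  var theUrl = "https://example.com/";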

  document.getElementById("print1").innerHTML = "Querying: " + theUrl;
  document.getElementById("rt").value = "Querying: " + theUrl;

  httpGetAsync(theUrl);

  // *****************************************************************************
  // NOTE: What I'd *really* like answered is whether or not this is "against
  //            the rules" - since, ostensibly, Google Corporation would prefer
  //            to have automated queries of its translation software
  //            billed to *somebody's* bank account (namely mine)...  The only
  //            problem with that logic is that the "Google Translate Cloud Server
  //            API" DOESN'T PROVIDE A FUNCTION to generate 羅馬拼音 (PinYin
  //            Romanization)... And it's actually pretty important to students and
  //            people interested in CHINESE/FOREIGN LANGUAGE EXCHANGE.
  //
  // AGAIN: If you can't answer this question, please don't send me insults
  //             about my character... An explanation for why, particularly with
  //             a link to the Google Policy Rule about this situation would be
  //             *much more useful* than negative accusations. (which I have received).
  // ******************************************************************************
}

// ******************************************************************************
// The goal of "printIt()" would ultimately be to "seek & copy" the 羅馬拼音
// .. which just means Chinese Romanization 
// from the HTTPS://translate.google.com/?q= request
// Right now, this function is simply behaving like a primitive "web-browser"
// ******************************************************************************

function printIt(str)
{
  // Escape "&" first, so the "&" in the entities added for "<" and ">" is left alone.
  str = str.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
  
  var arr = str.split("\n");
  var outStr = ""; // "HI,HI,HI\n<BR>Length of Arr is: " + arr.length;
  for (var j=0; j < arr.length; j++)
  {
    outStr += "<B>Line #" + j + "</B>" + arr[j] + "<BR>\n";
  }
  document.getElementById("print1").innerHTML = outStr;
  document.getElementById("rt").value = outStr;
}

</SCRIPT>
</HEAD>

<BODY>

<H1>Hi, This is a test of HTTP Get Requests</H1>
<BR><HR><BR>

<BUTTON onClick="getIt()">HTTP Get</BUTTON>
<BR><HR><BR>

<H2>Below is the received & parsed HTML from the web-server</H2>
<BR><HR><BR>

<DIV ID="print1">Empty Right Now</DIV>
<BR><HR><BR>

<H2>Here is a text-field that has the same text as above</H2>
<BR><HR><BR>
<INPUT TYPE="TEXT" ID="rt" VALUE="Empty Right Now.">

</BODY>
</HTML>

Zeehad (Cloud Platform Support)

Feb 5, 2016, 5:17:56 PM
to Google Translate API Developer Forum
Hello Ralph,

If you have not already done so, I invite you to review the Google APIs Terms of Service to ensure that your project's activity is consistent with them.

I find the following section relevant to your query:
 
e. Prohibitions on Content. Unless expressly permitted by the content owner or by applicable law, you will not, and will not permit your end users or others acting on your behalf to, do the following with content returned from the APIs:
 
1. Scrape, build databases, or otherwise create permanent copies of such content, or keep cached copies longer than permitted by the cache header;
2. Copy, translate, modify, create a derivative work of, sell, lease, lend, convey, distribute, publicly display, or sublicense to any third party;
3. Misrepresent the source or ownership; or
4. Remove, obscure, or alter any copyright, trademark, or other proprietary rights notices; or falsify or delete any author attributions, legal notices, or other labels of the origin or source of material.

Please note that Google technical staff are not able to provide legal advice on whether a given activity is consistent with the Terms of Service. If you are uncertain whether your activities are prohibited by the Terms of Service, I recommend that you consult your legal advisors who can advise you.

I hope it helps. Cheers!

Ralph Torello

Feb 8, 2016, 9:53:56 AM
to Google Translate API Developer Forum
I'm sorry, but this reply sort of demonstrates a lack of understanding about my project goals. The website is (now starting)

http://ChineseNewsBoard.com

It is a news database of articles (theoretically) from Asia. My project has nothing to do with republishing copyrighted Lady Gaga songs or passing off "Dan Brown" books as my own. News articles cannot be copyrighted and protected - that's really the first amendment of the constitution.

Furthermore, as I explained at least twice in my question - the "data" that I might "scrape" from a website is the Chinese Pronunciation from the Translate.Google.com user interface. I recently provided my bank account routing information to Google Cloud Server to use their Cloud Translate API. It works very well - but there aren't any procedure calls available in the API to get ... what I have mentioned now for the fourth time ... the pronunciation / Romanization for Mandarin Chinese...

My question is about whether or not it's possible to use HTTP GET requests to translate.google.com (directly) to "scrape off" the pronunciation part of the HTML.

Let me know...

Jesse Scherer (Google Cloud Support)

Feb 8, 2016, 11:48:21 AM
to Google Translate API Developer Forum
Hi Ralph,

TLDR: it doesn't sound like Google Translate is right for your project.

The Google Translate API provides text translations. As you correctly point out, the API does not provide any endpoint for pronunciation. So first: as you have learned, you cannot use the Google Translate API to get pronunciations.

Now, what you're asking is whether you can scrape the Google Translate website to get pronunciations. I'll try to address all of your concerns:
- The Google Translate API is not the Google Translate website. Paying for the former does not get you special privileges on the latter.
- Older StackOverflow threads which discuss scraping the Translate site to get pronunciations or MP3s may happen to work, but this functionality is not part of any paid product, so is not guaranteed to work in the future.
- The Translate website is a consumer product. This means it is intended to be used directly and interactively. Regardless of whether it is possible to use an HTTP GET request to retrieve one pronunciation, doing so repeatedly (to build a website) will trigger a CAPTCHA -- one of those sorts of pop-ups which challenge you to demonstrate you are not a robot. If "you" in this case is a script building your web site, then it will fail because that script is a robot. If "you" is Mr. Ralph Torello, at his computer, painstakingly collecting translations, I guess that won't be a problem.

Given that your project doesn't involve translation, but instead pronunciation, it looks like Forvo might be better suited (disclaimer: I have never used Forvo). (Older) Stack Overflow threads bemoan the lack of commercial licensing for that product, but these days they appear to offer just that.

I hope this helps.

Ralph Torello

Feb 8, 2016, 12:36:10 PM
to Google Translate API Developer Forum
Never TLDR people....  You kind of embarrass everyone...

I am using the Google Cloud Server Translate API to do dictionary lookups for Chinese words.  It's actually really good - and my JavaScript is working fine - but now that I have 1and1.com working, I'm going to move it to the server using PHP.

My question - if you want the "one sentence sound bite" version - is this: in addition to the translation of Chinese Characters into English, I would also like to add the PinYin Romanization to my vocabulary tables, so I am looking for a way to possibly scrape that data off of the Google Translate Website.

The "extra part" (Part B of my question) is... "is that legal according to Google Policy?"  [Perhaps it might be obvious that Google would expect its software writers to pay for automated lookups of its data... which I'm willing to do... it's just that the API doesn't do everything that the web-front-end can do.]

I will be researching Forvo now... Thanks for that pointer... I'll post if it works out! 

P.S.  The good thing about using the Google Cloud Server Translate API - is that the dictionaries are kept up-to-date with daily news happenings... So for instance... If I were to have used a static-dictionary to look up "習近平" (President of China) ... Google Translate's on-line dictionary knows about the "election in China" almost immediately.  I would have to manually update a static dictionary every time a new popular phrase comes up.

Thanks in advance for the Forvo pointer... I'll check it.

Here's a sample of what I'm generating.  And note Columns 1, 3, & 4 (And eventually 5 & 6 for Spanish & Korean) work well with the API... Column 2 - however - can only come from the website.


 

Ralph Torello

Feb 8, 2016, 12:39:26 PM
to Google Translate API Developer Forum
P.S. ... Don't ever say "StackOverflow" like those people are anything but the meanest and most uncommunicative people on the internet.  I think they need to institute mandatory literacy tests for the "super users" on that thing.

Ken

Feb 8, 2016, 2:59:57 PM
to Google Translate API Developer Forum
Please write this English text in Pinyin and Mandarin Chinese.

A phonetic (phonemic) alphabet is the only competent alphabet in the world. It can spell and correctly pronounce any word in our language -Mark Twain  

A fa̩netik (fa̩nīmik) ălfa̩bet iz dhī onlī kāmpa̩ta̩nt ălfa̩bet in dha̩ wa̩rld. It ka̩n spel a̩nd ka̩rektlī pra̩năuns enī wa̩rd in ār lăngvij. - Mārk Twein......Roman


અ ફનેટિક્ (ફનીમિક્) ઍલ્ફબેટ્ ઇઝ્ ધી ઓન્લી કામ્પટન્ટ્ ઍલ્ફબેટ્ ઇન્ ધ વર્લ્ડ્. ઇટ્ કન્ સ્પેલ્ અન્ડ્  કરેક્ટલી  પ્રનૅઉન્સ્ એની વર્ડ્ ઇન્ આર્ લૅન્ગ્વિજ્. -માર્ક્ ટ્વેઇન્ ....Gujanaagari

अ फनेटिक् (फनीमिक्) ऍल्फबेट् इझ् धी ओन्ली काम्पटन्ट् ऍल्फबेट् इन् ध वर्ल्ड्. इट् कन् स्पेल् अन्ड्  करेक्टली  प्रनॅउन्स् एनी वर्ड् इन् आर् लॅन्ग्विज्. -मार्क् ट्वेइन्........Devanaagari

ə fəˈnetɪk (fəˈniːmɪk) ˈælfəˌbet ɪz ðiː ˈoʊnliː ˈkɑmpətənt ˈælfəˌbet ˈɪn ðə ˈwərld. ˈɪt kən ˈspel ənd kəˈrekliː prəˈnæʊns ˈeniː ˈwərd ˈɪn ɑr ˈlæŋgwɪdʒ. -ˈmɑrk ˈtweɪn....IPA

Ralph Torello

Feb 8, 2016, 5:57:03 PM
to Google Translate API Developer Forum




Ken

Feb 9, 2016, 1:43:02 PM
to Google Translate API Developer Forum
Ralph,

I meant to ask you to write this paragraph, shown above in IPA notation, in Pinyin and in Mandarin Chinese.
No translation please. 

A phonetic (phonemic) alphabet is the only competent alphabet in the world. It can spell and correctly pronounce any word in our language -Mark Twain

Ralph Torello

Feb 9, 2016, 2:26:53 PM
to Google Translate API Developer Forum
I really sort of have an actual question.  Why are you acting like a child?  Leave me alone....  Trust me... I know nobody in USA gives a care about China or the Mandarin Language.

Ken

Feb 11, 2016, 3:24:14 PM
to Google Translate API Developer Forum
The US may care when China writes English word pronunciations and text in Pinyin and converts them back to traditional spellings through a converter.

Ralph Torello

Feb 11, 2016, 8:17:56 PM
to Google Translate API Developer Forum
People on the internet have absolutely no brains at all. just none. People in the (former) USA will care about Mandarin Chinese through education, literacy rates increasing here at home, a desire to care about advancements in technology in China, and respect for other residents of this (former) country, and respect for others throughout the world in other countries.

You just seem to want to pick a fight with me on this group for no reason whatsoever!!!! That is a mark of true brainlessness.

Zeehad (Cloud Platform Support)

Feb 12, 2016, 11:45:43 AM
to Google Translate API Developer Forum
I'm locking this thread now.