Translate a language into another language

Translation is one of the three Vertex AI pre-trained APIs on Google Distributed Cloud (GDC) air-gapped. With Translation, you can translate text written in any of the supported languages listed on this page into English.

This page describes the use of the following Translation methods available in Distributed Cloud:

Method Description
TranslateText Translate text from one language to another.
DetectLanguage Detect the language of the input text.
GetSupportedLanguages Return a list of languages supported by the Translation API.
GetOperation Get the status of Translation operations.
ListOperations Obtain a list of operations that match a filter.

This page also contains the list of supported languages for Translation.

To optionally define and use a glossary, see Define terms to translate.

You can test the Translation API on Distributed Cloud by following a tutorial that translates text from Spanish to English.

Considerations

When using the Translation API, consider the following aspects:

  • The maximum input length of contents for translation is 1024 characters.
  • The TranslateText method doesn't support batching.
  • Special characters and long inputs can reduce translation quality. If you experience translation issues, follow these steps:

    1. Remove all special and escape characters that might affect the processing of the request.
    2. Reduce the input size by splitting the contents into smaller sentences.
    3. Translate each of the smaller sentences again by sending them in separate requests to the endpoint.
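The following Python sketch shows one way to apply steps 2 and 3. It assumes the 1024-character limit applies to each request and uses a naive, non-language-aware sentence splitter (a regular expression on ., !, and ?). The function name split_into_chunks is hypothetical.

```python
import re

# Assumption: each TranslateText request accepts at most 1024 characters.
MAX_CHARS = 1024


def split_into_chunks(text, max_chars=MAX_CHARS):
    """Group sentences into chunks of at most max_chars characters each.

    Sentences are split naively after '.', '!' or '?' followed by whitespace.
    A single sentence longer than max_chars is hard-split every max_chars.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # Hard-split any sentence that alone exceeds the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        # Append the sentence to the current chunk if it still fits.
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Send each returned chunk in its own TranslateText request, because the method doesn't support batching.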

For more information, see translateText.

Before you begin

Follow these steps before using the Translation API:

  1. Set up a project for Vertex AI.

  2. Assign the AI Translation Developer (ai-translation-developer) role to a service account. You require this role to generate a token for request authentication and authorization.

  3. Enable the Translation pre-trained API to make the Translation service available to all projects.

  4. View the service status and endpoint for the Translation pre-trained API.

  5. Install the Translation client library.

    Follow these steps to ensure you have the correct version of the client library:

    1. Check if the Translation client library is installed and obtain the version number:

      pip freeze | grep translation
      

      If the client library is already installed, you obtain an output similar to the following example:

      google-cloud-translation==3.8.0
      

      The version number you obtain must match the version of the client library published at the following URL:

      https://GDC_URL/.well-known/static/client-libraries
      

      Replace GDC_URL with the URL of your organization in GDC.

      If the version numbers don't match, uninstall the client library:

      pip uninstall google-cloud-translation
      
    2. If you uninstalled or haven't installed the Translation client library, you must install it by specifying the filename corresponding to your operating system.

  6. Create a notebook to interact with the Translation pre-trained API from a Python script.

  7. Ask your Project IAM Admin to grant you the AI Translation Developer (ai-translation-developer) role in your project namespace.
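To script the version check from step 5, you could use Python's standard importlib.metadata module. This is a minimal sketch: the package name comes from the example pip freeze output above, the helper names are hypothetical, and EXPECTED_VERSION is a placeholder for the version listed at your organization's client-libraries URL.

```python
from importlib.metadata import PackageNotFoundError, version

# Package name taken from the example `pip freeze` output on this page.
PACKAGE = "google-cloud-translation"


def installed_version(package=PACKAGE):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def needs_reinstall(expected, package=PACKAGE):
    """Return True if the package is missing or its version differs from expected."""
    return installed_version(package) != expected


if __name__ == "__main__":
    # Placeholder: copy the version listed at
    # https://GDC_URL/.well-known/static/client-libraries
    expected = "EXPECTED_VERSION"
    print(installed_version(), "reinstall needed:", needs_reinstall(expected))
```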

Set your environment variables

Follow these steps to set the required environment variables in a Python script:

  1. Create a Python script in a JupyterLab notebook.

  2. Add the following code to the Python script:

    import os
    
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "APPLICATION_DEFAULT_CREDENTIALS_FILENAME"
    

    Replace APPLICATION_DEFAULT_CREDENTIALS_FILENAME with the name of the JSON file that contains the service account keys you created in the project, for example, my-service-key.json.

  3. Save the Python script with a name, for example, translation.py.

  4. Run the Python script to set the environment variables:

    python SCRIPT_NAME
    

    Replace SCRIPT_NAME with the name you gave to your Python script, for example, translation.py.
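Because changes to os.environ apply only to the running Python process and any child processes it starts, the variable must be set in the same script that later authenticates. The following sketch uses a hypothetical set_credentials helper that checks that the key file exists and contains valid JSON before setting the variable; it doesn't validate the key contents themselves.

```python
import json
import os


def set_credentials(path):
    """Point GOOGLE_APPLICATION_CREDENTIALS at a key file after basic checks."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Service account key file not found: {path}")
    with open(path) as f:
        json.load(f)  # Raises ValueError if the file isn't valid JSON.
    # Store an absolute path so later working-directory changes don't matter.
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = os.path.abspath(path)
    return os.environ["GOOGLE_APPLICATION_CREDENTIALS"]
```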

Get an authentication token

Obtain a token to authenticate the request before making a Translation API call from the CLI. This step is necessary if you use the curl tool to interact with the Translation service.

Follow these steps to get an authentication token:

  1. Install the google-auth client library:

    pip install google-auth
    
  2. Add the following code to the Python script you created when setting up the environment variables:

    import google.auth
    from google.auth.transport import requests
    import requests as reqs
    
    audience = "https://ENDPOINT"
    
    creds, project_id = google.auth.default()
    print(project_id)
    creds = creds.with_gdch_audience(audience)
    
    def test_get_token():
      sesh = reqs.Session()
      req = requests.Request(session=sesh)
      creds.refresh(req)
      print(creds.token)
    
    if __name__=="__main__":
      test_get_token()
    

    Replace ENDPOINT with the Translation endpoint that you use for your organization. For more information, view service status and endpoints.

  3. Save the Python script.

  4. Run the Python script to fetch the token:

    python SCRIPT_NAME
    

    Replace SCRIPT_NAME with the name you gave to your Python script, for example, translation.py.

    The output shows the authentication token.

Add the token to the header of the curl requests you make, as in the following example:

-H "Authorization: Bearer TOKEN"
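If you prefer to call the REST endpoints from Python instead of curl, the following standard-library sketch attaches the same headers. The helper names are hypothetical, and it assumes your environment already trusts the TLS certificate presented by the Translation endpoint; if it doesn't, you would need to pass an ssl context to urlopen.

```python
import json
import urllib.request


def build_request(url, token, body):
    """Build a POST request carrying the bearer token, like the curl -H flags."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def call(url, token, body, timeout=30):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_request(url, token, body), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```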

Translate text

The TranslateText method takes input text in one language and returns the text translated into another language. You can enter plain text or HTML as input.

If you enter HTML, the TranslateText method translates only the text between the HTML tags and leaves the tags themselves untranslated. However, it does translate the values of some HTML5 attributes, such as alt. The syntax for excluding text from translation, described in Exclude text from translation, also relies on HTML5 tags and attributes.

The output retains the untranslated HTML tags and includes the translated text between them. Because word order can differ between the source and target languages, the order of HTML tags in the output might differ from that in the input text.

The following code samples show how to translate text from one language into another using the Translation API.

curl

Follow these steps to make a curl request:

  1. Get an authentication token.

  2. Make the request:

curl -vv -X POST -H "Content-Type: application/json" -H "Authorization: Bearer TOKEN" https://ENDPOINT/v3/projects/PROJECT_ID:translateText -d '{"parent": "projects/PROJECT_ID", "source_language_code": "SOURCE_LANGUAGE", "target_language_code": "TARGET_LANGUAGE", "contents": ["INPUT_TEXT"]}'

Replace the following:

  • TOKEN: the authentication token you obtained.
  • ENDPOINT: the Translation endpoint that you use for your organization. For more information, view service status and endpoints.
  • PROJECT_ID: your project ID.
  • SOURCE_LANGUAGE: the language code of your input text. See the list of supported languages and their respective language codes.
  • TARGET_LANGUAGE: the language code you want to translate your text into. See the list of supported languages and their respective language codes.
  • INPUT_TEXT: your input text in the source language.

Use the mime_type field to specify the format of your input text. Set the mime_type field to one of the following values:

  • text/plain: your input is plain text.
  • text/html: your input is HTML text.

If you omit the mime_type field, it defaults to text/html.

The following example uses the mime_type field:

curl -vv -X POST -H "Content-Type: application/json" -H "Authorization: Bearer TOKEN" https://ENDPOINT/v3/projects/PROJECT_ID:translateText -d '{"mime_type": "text/html", "parent": "projects/PROJECT_ID", "source_language_code": "SOURCE_LANGUAGE", "target_language_code": "TARGET_LANGUAGE", "contents": ["INPUT_TEXT"]}'

The output returns the translated text.
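The following sketch builds the same path and JSON body as the curl examples, which can help catch mistakes before a request is sent. The function name is hypothetical, and it treats the 1024-character limit as applying to the combined length of contents, which is a conservative reading of the limit in Considerations.

```python
ALLOWED_MIME_TYPES = {"text/plain", "text/html"}
MAX_CHARS = 1024  # Assumption: limit applies to all contents combined.


def translate_text_request(project_id, source, target, contents, mime_type=None):
    """Return (path, body) for a translateText call, mirroring the curl example."""
    total = sum(len(text) for text in contents)
    if total > MAX_CHARS:
        raise ValueError(f"Input exceeds {MAX_CHARS} characters: {total}")
    body = {
        "parent": f"projects/{project_id}",
        "source_language_code": source,
        "target_language_code": target,
        "contents": list(contents),
    }
    # Omit mime_type to get the server default (text/html).
    if mime_type is not None:
        if mime_type not in ALLOWED_MIME_TYPES:
            raise ValueError(f"Unsupported mime_type: {mime_type}")
        body["mime_type"] = mime_type
    return f"/v3/projects/{project_id}:translateText", body
```

Prefix the returned path with https://ENDPOINT to get the full URL used in the curl request.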

Python

Follow these steps to use the Translation service from a Python script:

  1. Install the Translation client library version that matches your organization, as described in Before you begin.

  2. Add the following code to the Python script you created when setting up the environment variables:

    from google.cloud import translate
    import google.auth
    from google.auth.transport import requests
    from google.api_core.client_options import ClientOptions
    
    # Both the token audience and the client endpoint point to the Translation endpoint.
    audience = "https://ENDPOINT:443"
    api_endpoint = "ENDPOINT:443"
    
    def translate_client(creds):
      # Direct the client to your organization's endpoint instead of the default one.
      opts = ClientOptions(api_endpoint=api_endpoint)
      return translate.TranslationServiceClient(credentials=creds, client_options=opts)
    
    def main():
      # Load the key file named in GOOGLE_APPLICATION_CREDENTIALS and fetch a token.
      try:
        creds, project_id = google.auth.default()
        creds = creds.with_gdch_audience(audience)
        req = requests.Request()
        creds.refresh(req)
        print("Got token: ")
        print(creds.token)
      except Exception as e:
        print("Caught exception: " + str(e))
        raise
      return creds
    
    def translate_func(creds):
      tc = translate_client(creds)
      req = {
        "parent": "projects/PROJECT_ID",
        "source_language_code": "SOURCE_LANGUAGE",
        "target_language_code": "TARGET_LANGUAGE",
        "mime_type": "text/plain",
        "contents": ["INPUT_TEXT"]
      }
    
      # Send the request and print the full response.
      resp = tc.translate_text(req)
      print(resp)
    
    if __name__ == "__main__":
      creds = main()
      translate_func(creds)
    

    Replace the following:

    • ENDPOINT: the Translation endpoint that you use for your organization. For more information, view service status and endpoints.
    • PROJECT_ID: your project ID.
    • SOURCE_LANGUAGE: the language code of your input text. See the list of supported languages and their respective language codes.
    • TARGET_LANGUAGE: the language code you want to translate your text into. See the list of supported languages and their respective language codes.
    • INPUT_TEXT: your input text in the source language.

    Use the mime_type field to specify the format of your input text. Set the mime_type field to one of the following values:

    • text/plain: your input is plain text.
    • text/html: your input is HTML text.

    If you omit the mime_type field, it defaults to text/html.

  3. Save the Python script.

  4. Run the Python script to translate the text:

    python SCRIPT_NAME
    

    Replace SCRIPT_NAME with the name you gave to your Python script, for example, translation.py.
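Because TranslateText doesn't support batching, an input split into chunks needs one call per chunk. The following sketch reuses a client such as the one returned by translate_client and assumes each response holds one entry in translations for the single input string. translate_chunks is a hypothetical helper name.

```python
def translate_chunks(client, project_id, source, target, chunks, mime_type="text/plain"):
    """Translate each chunk in its own request and return the translated strings."""
    results = []
    for chunk in chunks:
        resp = client.translate_text(
            request={
                "parent": f"projects/{project_id}",
                "source_language_code": source,
                "target_language_code": target,
                "mime_type": mime_type,
                "contents": [chunk],
            }
        )
        # One input string per request, so take the first translation.
        results.append(resp.translations[0].translated_text)
    return results
```

Joining the results with spaces gives a reasonable approximation of the full translation, but sentence boundaries chosen during splitting can still affect quality.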

Exclude text from translation

Use one of the following HTML tags in the contents field of a request to exclude parts of your text from translation:

  • <span translate="no">TEXT</span>
  • <span class="notranslate">TEXT</span>

Replace TEXT with the portion of text you want to exclude from translation.

For example, if you have the following input text in Spanish:

Hola, esto es una prueba.

That text translates into English as the following sentence:

Hello, this is a test.

Suppose you only want to translate the following part of the text, excluding Hola, from the input text:

esto es una prueba.

That part of the text translates into English as the following sentence:

this is a test.

Use the HTML tags to exclude text from the translation. For example, the following curl request uses the <span class="notranslate">TEXT</span> tag to exclude Hola, from the previous Spanish input text when translating it into English:

curl -vv -X POST -H "Content-Type: application/json" -H "Authorization: Bearer TOKEN" https://ENDPOINT/v3/projects/PROJECT_ID:translateText -d '{"parent": "projects/PROJECT_ID", "source_language_code": "es", "target_language_code": "en", "contents": ["<span class=\"notranslate\">Hola,</span> esto es una prueba."]}'
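Since mime_type defaults to text/html, the following Python sketch escapes the rest of the input as HTML and wraps the phrase to keep in a notranslate span; json.dumps then produces a request body whose inner quotes are escaped correctly. The protect function is hypothetical, and its plain string replacement is naive: a phrase that also appears inside an escaped entity, such as amp, would be wrapped there too.

```python
import html
import json


def protect(text, phrase):
    """HTML-escape text and wrap every occurrence of phrase in a notranslate span."""
    escaped = html.escape(text, quote=False)
    escaped_phrase = html.escape(phrase, quote=False)
    return escaped.replace(
        escaped_phrase, f'<span class="notranslate">{escaped_phrase}</span>'
    )


if __name__ == "__main__":
    content = protect("Hola, esto es una prueba.", "Hola,")
    # Prints the contents field with the inner double quotes escaped for JSON.
    print(json.dumps({"contents": [content]}))
```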

Detect language

The DetectLanguage method returns the detected language of a text string.

For example, the following request detects English as the language of the input text Hello, this is a test:

curl

curl -vv -X POST -H "Content-Type: application/json" -H "Authorization: Bearer TOKEN" https://ENDPOINT/v3/projects/PROJECT_ID:detectLanguage -d '{"parent": "projects/PROJECT_ID", "contents": ["Hello, this is a test"]}'
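The following sketch picks the most likely language from a decoded detectLanguage response. It assumes the v3 response JSON has a languages list whose entries carry languageCode and confidence fields; top_language is a hypothetical name.

```python
def top_language(response):
    """Return the language code with the highest confidence, or None if absent.

    Assumes the JSON shape {"languages": [{"languageCode": ..., "confidence": ...}]}.
    """
    languages = response.get("languages", [])
    if not languages:
        return None
    best = max(languages, key=lambda lang: lang.get("confidence", 0.0))
    return best.get("languageCode")
```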

Get supported languages

The GetSupportedLanguages method returns the list of languages that the Translation API supports.

For example, the following request returns the languages supported by the Translation endpoint:

curl

curl -vv -X POST -H "Content-Type: application/json" -H "Authorization: Bearer TOKEN" https://ENDPOINT/v3/projects/PROJECT_ID:getSupportedLanguages -d "{}"
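The following sketch extracts language codes from a decoded supported-languages response, for example to check a code before sending a translation request. It assumes the response JSON has a languages list whose entries carry a languageCode field; both function names are hypothetical.

```python
def language_codes(response):
    """Return the sorted language codes listed in a supported-languages response."""
    return sorted(
        lang["languageCode"]
        for lang in response.get("languages", [])
        if "languageCode" in lang
    )


def is_supported(response, code):
    """Return True if code appears in the supported-languages response."""
    return code in language_codes(response)
```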

Get an operation

The GetOperation method returns the latest state of a long-running operation. Use this method to retrieve the operation result generated by the Translation API service. To use this method, specify your project ID and the Translation endpoint.

For example, the following request returns the state of a long-running operation, such as creating a glossary, that is running in your project:

curl

curl -vv -X GET -H "Content-Type: application/json" -H "Authorization: Bearer TOKEN" https://ENDPOINT/v3/projects/PROJECT_ID/operations/OPERATION_ID

Replace OPERATION_ID with the ID of the operation whose state you want to retrieve.
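To wait for a long-running operation such as glossary creation, you can call GetOperation repeatedly until the operation reports that it is done. The following sketch assumes the operation JSON follows the standard long-running operation shape with done, error, and response fields. fetch_operation is a hypothetical caller-supplied function that sends the GET request and returns the decoded JSON.

```python
import time


def wait_for_operation(fetch_operation, interval=5.0, timeout=600.0, sleep=time.sleep):
    """Poll until the operation is done, then return the final operation JSON."""
    waited = 0.0
    while True:
        op = fetch_operation()
        if op.get("done"):
            # A finished operation carries either "error" or "response".
            if "error" in op:
                raise RuntimeError(f"Operation failed: {op['error']}")
            return op
        if waited >= timeout:
            raise TimeoutError(f"Operation not done after {timeout} seconds")
        sleep(interval)
        waited += interval
```

The sleep parameter is injectable so the polling loop can be tested without real delays.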

List operations

The ListOperations method returns a list of the long-running operations that match a specified filter in the request. To use this method, specify your project ID and the Translation endpoint.

For example, the following request returns the list of operations running in your project and limits the page size to ten results per page:

curl

curl -vv -X GET -H "Content-Type: application/json" -H "Authorization: Bearer TOKEN" "https://ENDPOINT/v3/projects/PROJECT_ID/operations?page_size=10"
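When there are more operations than fit on one page, you can follow page tokens to collect them all. The following sketch assumes the standard long-running operations list format, where each page has an operations list and a nextPageToken field that you pass back as the page_token query parameter. fetch_page is a hypothetical caller-supplied function that sends the GET request and returns the decoded JSON.

```python
def list_all_operations(fetch_page, page_size=10):
    """Collect operations across all pages by following nextPageToken."""
    operations = []
    token = None
    while True:
        page = fetch_page(page_size, token)
        operations.extend(page.get("operations", []))
        token = page.get("nextPageToken")
        # An empty or missing token means this was the last page.
        if not token:
            return operations
```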

Supported languages

Translation lets you translate text written in a supported language into English (language code en). This section lists the supported languages that translate into English.

Supported languages to translate to English

The following languages translate to English (en):

Language Language code
Afrikaans af
Albanian sq
Amharic am
Arabic ar
Armenian hy
Azerbaijani az
Basque eu
Belarusian be
Bengali bn
Bosnian bs
Bulgarian bg
Catalan ca
Cebuano ceb
Chichewa ny
Chinese (Simplified) zh
Corsican co
Croatian hr
Czech cs
Danish da
Dutch nl
Estonian et
Farsi/Persian fa
Filipino fil
Finnish fi
French fr
Frisian fy
Galician gl
Georgian ka
German de
Greek el
Gujarati gu
Haitian Creole ht
Hausa ha
Hawaiian haw
Hebrew iw
Hindi hi
Hmong hmn
Hungarian hu
Icelandic is
Igbo ig
Indonesian id
Irish ga
Italian it
Japanese ja
Javanese jv
Kannada kn
Kazakh kk
Khmer km
Kinyarwanda rw
Korean ko
Kurdish (Kurmanji) ku
Kyrgyz ky
Lao lo
Latvian lv
Lithuanian lt
Luxembourgish lb
Macedonian mk
Malagasy mg
Malay ms
Malayalam ml
Maltese mt
Maori mi
Marathi mr
Mongolian mn
Myanmar (Burmese) my
Nepali ne
Norwegian no
Odia (Oriya) or
Pashto ps
Polish pl
Portuguese pt
Punjabi pa
Romanian ro
Russian ru
Samoan sm
Scots Gaelic gd
Serbian sr
Sesotho st
Shona sn
Sindhi sd
Sinhala si
Slovak sk
Slovenian sl
Somali so
Spanish es
Sundanese su
Swahili sw
Tajik tg
Tamil ta
Tatar tt
Telugu te
Thai th
Turkish tr
Turkmen tk
Ukrainian uk
Urdu ur
Uyghur ug
Uzbek uz
Vietnamese vi
Welsh cy
Xhosa xh
Yiddish yi
Yoruba yo
Zulu zu