Translation is one of the three Vertex AI pre-trained APIs on Google Distributed Cloud (GDC) air-gapped. With Translation, you can translate more than twenty languages into English.
This page describes the use of the following Translation methods available in Distributed Cloud:
Method | Description |
---|---|
TranslateText | Translate text from one language to another. |
DetectLanguage | Get the name of the language you are translating. |
GetSupportedLanguages | Return a list of languages supported by the Translation API. |
GetOperation | Get the status of Translation operations. |
ListOperations | Obtain a list of operations that match a filter. |
This page also contains the list of supported languages for Translation.
To optionally define and use a glossary, see Define terms to translate.
You can test the Translation API on Distributed Cloud by trying a tutorial to translate a text from Spanish to English.
Considerations
When using the Translation API, consider the following aspects:
- The maximum input length of contents for translation is 1024 characters.
- The TranslateText method doesn't support batching. Types of characters and large portions of text affect translation quality. Therefore, if you experience translation issues, follow these steps:
  - Remove all special and escape characters that might affect the processing of the request.
  - Reduce the input size by splitting the contents into smaller sentences.
  - Translate each of the smaller sentences again by sending them in separate requests to the endpoint.
For more information, see translateText.
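The splitting step described above can be sketched in Python. This is a minimal illustration, not part of the Translation API: the helper name `split_for_translation` is hypothetical, and the sentence detection is a rough heuristic based on `.`, `!`, and `?` followed by whitespace. Each returned chunk stays within the 1024-character limit and can be sent in its own request.

```python
import re

MAX_CHARS = 1024  # input limit stated in the considerations above


def split_for_translation(text, max_chars=MAX_CHARS):
    """Split text into chunks of at most max_chars, breaking at sentence ends."""
    # Split after ., ! or ? followed by whitespace; a rough heuristic only.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-wrap any single sentence that is itself longer than the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for chunk in split_for_translation("Hola. Esto es una prueba. Gracias.", max_chars=20):
        print(chunk)
```

Packing several short sentences into one chunk keeps the number of requests low while still respecting the limit; if translation quality suffers, a smaller `max_chars` value sends shorter pieces.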
Before you begin
Follow these steps before using the Translation API:
Assign the AI Translation Developer (ai-translation-developer) role to a service account. You require this role to generate a token for request authentication and authorization.
Enable the Translation pre-trained API to make the Translation service available to all projects.
View the service status and endpoint for the Translation pre-trained API.
Install the Translation client library.
Follow these steps to ensure you have the correct version of the client library:
Check if the Translation client library is installed and obtain the version number:
pip freeze | grep translation
If the client library is already installed, you obtain an output similar to the following example:
google-cloud-translation==3.8.0
The version number you obtain must match the client library at the following endpoint:
https://GDC_URL/.well-known/static/client-libraries
Replace GDC_URL with the URL of your organization in GDC.
If the version numbers don't match, uninstall the client library:
pip uninstall google-cloud-translation
If you uninstalled or haven't installed the Translation client library, you must install it by specifying the filename corresponding to your operating system.
Create a notebook to interact with the Translation pre-trained API from a Python script.
Ask your Project IAM Admin to grant you the AI Translation Developer (ai-translation-developer) role in your project namespace.
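The client library version check described above can also be done from Python instead of `pip freeze`. The following sketch uses only the standard library; the helper names are hypothetical, and the expected version is something you read from the client libraries page of your organization, not a value this code can discover.

```python
from importlib import metadata


def installed_version(package="google-cloud-translation"):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def needs_reinstall(installed, expected):
    """True when the library is missing or its version differs from expected."""
    return installed is None or installed != expected


if __name__ == "__main__":
    # "3.8.0" is only the example version shown above; use the version
    # published at your organization's client-libraries endpoint.
    current = installed_version()
    print("installed:", current, "reinstall needed:", needs_reinstall(current, "3.8.0"))
```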
Set your environment variables
Follow these steps to set the required environment variables on a Python script:
Create a Python script on a JupyterLab notebook.
Add the following code to the Python script:
import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "APPLICATION_DEFAULT_CREDENTIALS_FILENAME"
Replace APPLICATION_DEFAULT_CREDENTIALS_FILENAME with the name of the JSON file that contains the service account keys you created in the project, for example, my-service-key.json.
Save the Python script with a name, for example, translation.py.
Run the Python script to set the environment variables:
python SCRIPT_NAME
Replace SCRIPT_NAME with the name you gave to your Python script, for example, translation.py.
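A slightly more defensive variant of the step above checks that the key file exists and parses as JSON before setting the variable. This is a sketch under assumptions: the helper name `use_service_account_key` is hypothetical, and the check does not validate the contents of the key. Note that a value assigned to `os.environ` applies to the running Python process and its child processes, so the code that calls the Translation API needs to run in the same script or notebook session.

```python
import json
import os


def use_service_account_key(path):
    """Point GOOGLE_APPLICATION_CREDENTIALS at a key file after basic checks."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"No key file at {path}")
    with open(path, encoding="utf-8") as handle:
        json.load(handle)  # raises json.JSONDecodeError if the file is not JSON
    # Store an absolute path so later code does not depend on the working directory.
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = os.path.abspath(path)
    return os.environ["GOOGLE_APPLICATION_CREDENTIALS"]
```

For example, `use_service_account_key("my-service-key.json")` sets the variable only if that file is present and well formed.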
Get an authentication token
Obtain a token to authenticate the request before making a Translation API call from the CLI. This step is necessary if you use the curl tool to interact with the Translation service.
Follow these steps to get an authentication token:
Install the google-auth client library:
pip install google-auth
Add the following code to the Python script you created when setting up the environment variables:
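import google.auth
from google.auth.transport import requests
import requests as reqs

# Audience for the token: the Translation endpoint of your organization.
audience = "https://ENDPOINT"

# Load the credentials referenced by GOOGLE_APPLICATION_CREDENTIALS.
creds, project_id = google.auth.default()
print(project_id)
creds = creds.with_gdch_audience(audience)

def test_get_token():
    # Refresh the credentials over an HTTP session to obtain a token.
    sesh = reqs.Session()
    req = requests.Request(session=sesh)
    creds.refresh(req)
    print(creds.token)

if __name__=="__main__":
    test_get_token()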
Replace ENDPOINT with the Translation endpoint that you use for your organization. For more information, view service status and endpoints.
Save the Python script.
Run the Python script to fetch the token:
python SCRIPT_NAME
Replace SCRIPT_NAME with the name you gave to your Python script, for example, translation.py.
The output shows the authentication token.
Add the token to the header of the curl requests you make, as in the following example:
-H "Authorization: Bearer TOKEN"
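If you prefer to build the same headers in Python rather than curl, the standard library is enough. The sketch below only constructs the request object and doesn't send it; the helper name `authorized_request` is hypothetical, and the URL in the usage line is a placeholder.

```python
import urllib.request


def authorized_request(url, token, body=None):
    """Build (but do not send) a JSON request that carries the bearer token."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",
    }
    # A request with a body is sent as POST, otherwise as GET.
    method = "POST" if body is not None else "GET"
    return urllib.request.Request(url, data=body, headers=headers, method=method)


if __name__ == "__main__":
    req = authorized_request("https://ENDPOINT/v3/projects/PROJECT_ID/operations", "TOKEN")
    print(req.get_method(), req.full_url)
    print(req.get_header("Authorization"))
```

Passing the resulting object to `urllib.request.urlopen` would send it; whether that works without extra TLS configuration depends on your environment.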
Translate text
The TranslateText method takes input text in a particular language and returns the text translated into another language. You can enter plain or HTML text as input.
If you enter HTML text, the TranslateText method translates only the text between the HTML tags without translating the tags. However, it translates attributes in HTML5 tags, such as alt attributes. The syntax to exclude text from translation is an example of using HTML5 tags and attributes.
The output retains the untranslated HTML tags and includes the translated text between them. Differences can exist between the source and target languages. Because of word order changes in the translation, the order of HTML tags in the output might differ from that in the input text.
The following code samples show how to translate text from one language into another using the Translation API.
curl
Follow these steps to make a curl request:
Make the request:
curl -vv -X POST -H "Content-Type: application/json" -H "Authorization: Bearer TOKEN" https://ENDPOINT/v3/projects/PROJECT_ID:translateText -d '{"parent": "projects/PROJECT_ID", "source_language_code": "SOURCE_LANGUAGE", "target_language_code": "TARGET_LANGUAGE", "contents": ["INPUT_TEXT"]}'
Replace the following:
- TOKEN: the authentication token you obtained.
- ENDPOINT: the Translation endpoint that you use for your organization. For more information, view service status and endpoints.
- PROJECT_ID: your project ID.
- SOURCE_LANGUAGE: the language code of your input text. See the list of supported languages and their respective language codes.
- TARGET_LANGUAGE: the language code you want to translate your text into. See the list of supported languages and their respective language codes.
- INPUT_TEXT: your input text in the source language.
Use the mime_type field to specify a type of file. Set the mime_type field to one of the following values:
- text/plain: your input is plain text.
- text/html: your input is HTML text.
If the mime_type field is empty, text/html is the default value.
The following example uses the mime_type field:
curl -vv -X POST -H "Content-Type: application/json" -H "Authorization: Bearer TOKEN" https://ENDPOINT/v3/projects/PROJECT_ID:translateText -d '{"mime_type": "text/html", "parent": "projects/PROJECT_ID", "source_language_code": "SOURCE_LANGUAGE", "target_language_code": "TARGET_LANGUAGE", "contents": ["INPUT_TEXT"]}'
The output returns the translated text.
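Writing the JSON body by hand inside a shell command is error-prone, especially when the input contains quotes. As a minimal sketch, the body shown in the curl examples can be generated with `json.dumps`, which handles the escaping. The function name `translate_text_body` is hypothetical; the field names, the two `mime_type` values, and the 1024-character limit come from this page.

```python
import json

ALLOWED_MIME_TYPES = {"text/plain", "text/html"}
MAX_CHARS = 1024  # maximum input length stated under Considerations


def translate_text_body(project_id, source, target, text, mime_type=None):
    """Return the JSON body for a translateText request as a string."""
    if mime_type is not None and mime_type not in ALLOWED_MIME_TYPES:
        raise ValueError(f"unsupported mime_type: {mime_type}")
    if len(text) > MAX_CHARS:
        raise ValueError(f"input exceeds {MAX_CHARS} characters")
    body = {
        "parent": f"projects/{project_id}",
        "source_language_code": source,
        "target_language_code": target,
        "contents": [text],
    }
    # Omit mime_type when unset; the page states text/html is then the default.
    if mime_type is not None:
        body["mime_type"] = mime_type
    return json.dumps(body, ensure_ascii=False)


if __name__ == "__main__":
    print(translate_text_body("PROJECT_ID", "es", "en", "Hola, esto es una prueba.", "text/plain"))
```

The printed string can be saved to a file and passed to curl with `-d @FILE`, which avoids shell quoting issues.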
Python
Follow these steps to use the Translation service from a Python script:
Install the latest version of the Translation client library.
Add the following code to the Python script you created when setting up the environment variables:
from google.cloud import translate
import google.auth
from google.auth.transport import requests
from google.api_core.client_options import ClientOptions

audience = "https://ENDPOINT:443"
api_endpoint="ENDPOINT:443"

def translate_client(creds):
    # Point the client at the Translation endpoint of your organization.
    opts = ClientOptions(api_endpoint=api_endpoint)
    return translate.TranslationServiceClient(credentials=creds, client_options=opts)

def main():
    # Load the default credentials and refresh them to obtain a token.
    creds = None
    try:
        creds, project_id = google.auth.default()
        creds = creds.with_gdch_audience(audience)
        req = requests.Request()
        creds.refresh(req)
        print("Got token: ")
        print(creds.token)
    except Exception as e:
        print("Caught exception: " + str(e))
        raise
    return creds

def translate_func(creds):
    tc = translate_client(creds)
    req = {
        "parent": "projects/PROJECT_ID",
        "source_language_code": "SOURCE_LANGUAGE",
        "target_language_code": "TARGET_LANGUAGE",
        "mime_type": "text/plain",
        "contents": ["INPUT_TEXT"]
    }
    resp = tc.translate_text(req)
    print(resp)

if __name__=="__main__":
    creds = main()
    translate_func(creds)
Replace the following:
- ENDPOINT: the Translation endpoint that you use for your organization. For more information, view service status and endpoints.
- PROJECT_ID: your project ID.
- SOURCE_LANGUAGE: the language code of your input text. See the list of supported languages and their respective language codes.
- TARGET_LANGUAGE: the language code you want to translate your text into. See the list of supported languages and their respective language codes.
- INPUT_TEXT: your input text in the source language.
Use the mime_type field to specify a type of file. Set the mime_type field to one of the following values:
- text/plain: your input is plain text.
- text/html: your input is HTML text.
If the mime_type field is empty, text/html is the default value.
Save the Python script.
Run the Python script to translate the text:
python SCRIPT_NAME
Replace SCRIPT_NAME with the name you gave to your Python script, for example, translation.py.
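The script above prints the whole response object. If you only need the translated strings, a small helper can pull them out. This sketch assumes the response exposes a `translations` list whose items have a `translated_text` attribute, as in the public google-cloud-translation v3 client; the version shipped for Distributed Cloud might differ. The helper name and the stand-in response object are for illustration only.

```python
from types import SimpleNamespace


def translated_texts(response):
    """Collect the translated strings from a TranslateText response."""
    # Assumed shape: response.translations[i].translated_text
    return [item.translated_text for item in response.translations]


if __name__ == "__main__":
    # Stand-in for the object returned by tc.translate_text(req).
    fake_response = SimpleNamespace(
        translations=[SimpleNamespace(translated_text="Hello, this is a test.")]
    )
    print(translated_texts(fake_response))
```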
Exclude text from translation
Use one of the following HTML tags in the contents field of requests to exclude parts of your text from translation:
<span translate="no">"TEXT"</span>
<span class="notranslate">"TEXT"</span>
Replace TEXT with the portion of text you want to exclude from translation.
For example, if you have the following input text in Spanish:
Hola, esto es una prueba.
Then, that text translates in English to the following sentence:
Hello, this is a test.
Suppose you only want to translate the following part of the text, excluding Hola, from the input text:
esto es una prueba.
That part of the text translates in English to the following sentence:
this is a test.
Use the HTML tags to exclude text from the translation. For example, the following curl request uses the <span class="notranslate">"TEXT"</span> tag to exclude Hola, from the previous input text in Spanish when translating the text to English:
curl -vv -X POST -H "Content-Type: application/json" -H "Authorization: Bearer TOKEN" https://ENDPOINT/v3/projects/PROJECT_ID:translateText -d '{"parent": "projects/PROJECT_ID", "source_language_code": "es", "target_language_code": "en", "contents": ["<span class=\"notranslate\">Hola,</span> esto es una prueba."]}'
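Because the span tag contains double quotes, it must be escaped when it appears inside a JSON string. Generating the body with `json.dumps` takes care of that. The following is a minimal sketch with hypothetical helper names; it sets `mime_type` to `text/html` explicitly, which this page describes as the default.

```python
import html
import json


def no_translate(text):
    """Wrap text in a span that marks it as excluded from translation."""
    # Escape &, < and > so the excluded text cannot break the markup.
    return f'<span class="notranslate">{html.escape(text)}</span>'


def body_with_exclusion(project_id, excluded, translated, source="es", target="en"):
    """Build a translateText body in which `excluded` is left untranslated."""
    content = f"{no_translate(excluded)} {translated}"
    return json.dumps(
        {
            "parent": f"projects/{project_id}",
            "source_language_code": source,
            "target_language_code": target,
            "mime_type": "text/html",
            "contents": [content],
        },
        ensure_ascii=False,
    )


if __name__ == "__main__":
    print(body_with_exclusion("PROJECT_ID", "Hola,", "esto es una prueba."))
```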
Detect language
The DetectLanguage method returns the language of a text string that you send in an HTTP request.
For example, the following request detects English as the language of the input text Hello, this is a test:
curl
curl -vv -X POST -H "Content-Type: application/json" -H "Authorization: Bearer TOKEN" https://ENDPOINT/v3/projects/PROJECT_ID:detectLanguage -d '{"parent": "projects/PROJECT_ID", "contents": ["Hello, this is a test"]}'
Get supported languages
The GetSupportedLanguages method returns the list of languages that the Translation API supports.
For example, the following request returns the supported languages by specifying the Translation endpoint:
curl
curl -vv -X POST -H "Content-Type: application/json" -H "Authorization: Bearer TOKEN" https://ENDPOINT/v3/projects/PROJECT_ID:getSupportedLanguages -d "{}"
Get an operation
The GetOperation method returns the latest state of a long-running operation.
Use this method to retrieve the operation result generated by the Translation API service. To use this method, specify your project ID and the Translation endpoint.
For example, the following request returns the state of a long-running operation, such as creating a glossary, that is running in your project:
curl
curl -vv -X GET -H "Content-Type: application/json" -H "Authorization: Bearer TOKEN" https://ENDPOINT/v3/projects/PROJECT_ID/operations/PROJECT_ID
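To wait for a long-running operation to finish, you can call GetOperation repeatedly until it reports completion. The sketch below assumes the response is a JSON object with a boolean `done` field, as in the standard long-running operations format; this page doesn't show the response, so treat that field name as an assumption. The `fetch` argument is a hypothetical callable that performs the GET request and returns the parsed JSON as a dictionary.

```python
import time


def wait_for_operation(fetch, interval=2.0, timeout=60.0, sleep=time.sleep):
    """Call fetch() until the returned operation reports done, or time out."""
    deadline = time.monotonic() + timeout
    while True:
        operation = fetch()
        # Assumed field: "done" is true once the operation has finished.
        if operation.get("done"):
            return operation
        if time.monotonic() >= deadline:
            raise TimeoutError("operation did not finish in time")
        sleep(interval)


if __name__ == "__main__":
    # Simulated responses in place of real GetOperation calls.
    states = iter([{"name": "op"}, {"name": "op", "done": True}])
    print(wait_for_operation(lambda: next(states), interval=0))
```

Injecting `fetch` and `sleep` keeps the loop independent of any HTTP library and easy to exercise without network access.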
List operations
The ListOperations method returns a list of the long-running operations that match a specified filter in the request. To use this method, specify your project ID and the Translation endpoint.
For example, the following request returns the list of operations running in your project and limits the page size to ten results per page:
curl
curl -vv -X GET -H "Content-Type: application/json" -H "Authorization: Bearer TOKEN" "https://ENDPOINT/v3/projects/PROJECT_ID/operations?page_size=10"
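When you build the ListOperations URL in code, `urllib.parse.urlencode` keeps the query string correctly encoded. In this sketch the function name is hypothetical, and the optional `page_token` parameter is an assumption based on the standard long-running operations API; this page only documents `page_size`.

```python
from urllib.parse import urlencode


def list_operations_url(endpoint, project_id, page_size=10, page_token=None):
    """Return the ListOperations URL for a project."""
    params = {"page_size": page_size}
    # Assumed parameter: page_token requests the next page of results.
    if page_token:
        params["page_token"] = page_token
    return f"https://{endpoint}/v3/projects/{project_id}/operations?{urlencode(params)}"


if __name__ == "__main__":
    print(list_operations_url("ENDPOINT", "PROJECT_ID"))
```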
Supported languages
Translation lets you translate text written in a given language into English, the language code of which is en.
This section contains the list of supported languages that translate to English.
Supported languages to translate to English
The following languages translate to English (en):
Language | Language code |
---|---|
Afrikaans | af |
Albanian | sq |
Amharic | am |
Arabic | ar |
Armenian | hy |
Azerbaijani | az |
Basque | eu |
Belarusian | be |
Bengali | bn |
Bosnian | bs |
Bulgarian | bg |
Catalan | ca |
Cebuano | ceb |
Chinese (Simplified) | zh |
Chichewa | ny |
Corsican | co |
Croatian | hr |
Czech | cs |
Danish | da |
Dutch | nl |
Estonian | et |
Farsi/Persian | fa |
Filipino | fil |
Finnish | fi |
French | fr |
Frisian | fy |
Galician | gl |
Georgian | ka |
German | de |
Greek | el |
Gujarati | gu |
Haitian Creole | ht |
Hausa | ha |
Hawaiian | haw |
Hebrew | iw |
Hindi | hi |
Hmong | hmn |
Hungarian | hu |
Icelandic | is |
Igbo | ig |
Indonesian | id |
Irish | ga |
Italian | it |
Japanese | ja |
Javanese | jv |
Lao | lo |
Kannada | kn |
Kazakh | kk |
Khmer | km |
Kinyarwanda | rw |
Korean | ko |
Kurdish (Kurmanji) | ku |
Kyrgyz | ky |
Latvian | lv |
Lithuanian | lt |
Luxembourgish | lb |
Macedonian | mk |
Malagasy | mg |
Malay | ms |
Maori | mi |
Malayalam | ml |
Maltese | mt |
Marathi | mr |
Mongolian | mn |
Myanmar (Burmese) | my |
Nepali | ne |
Norwegian | no |
Odia (Oriya) | or |
Pashto | ps |
Polish | pl |
Portuguese | pt |
Punjabi | pa |
Romanian | ro |
Russian | ru |
Samoan | sm |
Scots Gaelic | gd |
Serbian | sr |
Sesotho | st |
Sindhi | sd |
Sinhala | si |
Shona | sn |
Slovak | sk |
Slovenian | sl |
Somali | so |
Spanish | es |
Sundanese | su |
Swahili | sw |
Tajik | tg |
Tamil | ta |
Tatar | tt |
Telugu | te |
Thai | th |
Turkish | tr |
Turkmen | tk |
Ukrainian | uk |
Urdu | ur |
Uyghur | ug |
Uzbek | uz |
Vietnamese | vi |
Xhosa | xh |
Yiddish | yi |
Yoruba | yo |
Welsh | cy |
Zulu | zu |