Using the Language Client
Documents
The Google Natural Language API has the following supported methods:
and each method uses a Document
for representing
text.
>>> document = language.types.Document( ... content='Google, headquartered in Mountain View, unveiled the ' ... 'new Android phone at the Consumer Electronic Show. ' ... 'Sundar Pichai said in his keynote that users love ' ... 'their new Android phones.', ... language='en', ... type='PLAIN_TEXT', ... )
The document’s language defaults to None
, which will cause the API to
auto-detect the language.
In addition, you can construct an HTML document:
>>> html_content = """\ ... <html> ... <head> ... <title>El Tiempo de las Historias</time> ... </head> ... <body> ... <p>La vaca saltó sobre la luna.</p> ... </body> ... </html> ... """ >>> document = language.types.Document( ... content=html_content, ... language='es', ... type='HTML', ... )
The language
argument can be either ISO-639-1 or BCP-47 language
codes. The API reference page contains the full list of supported languages.
In addition to supplying the text / HTML content, a document can refer to content stored in Google Cloud Storage.
>>> document = language.types.Document( ... gcs_content_uri='gs://my-text-bucket/sentiment-me.txt', ... type=language.enums.HTML, ... )
Analyze Entities
The analyze_entities()
method finds named entities (i.e. proper names) in the text. This method
returns a AnalyzeEntitiesResponse
.
>>> document = language.types.Document( ... content='Michelangelo Caravaggio, Italian painter, is ' ... 'known for "The Calling of Saint Matthew".', ... type=language.enums.Document.Type.PLAIN_TEXT, ... ) >>> response = client.analyze_entities( ... document=document, ... encoding_type='UTF32', ... ) >>> for entity in response.entities: ... print('=' * 20) ... print(' name: {0}'.format(entity.name)) ... print(' type: {0}'.format(entity.type)) ... print(' metadata: {0}'.format(entity.metadata)) ... print(' salience: {0}'.format(entity.salience)) ==================== name: Michelangelo Caravaggio type: PERSON metadata: {'wikipedia_url': 'https://en.wikipedia.org/wiki/Caravaggio'} salience: 0.7615959 ==================== name: Italian type: LOCATION metadata: {'wikipedia_url': 'https://en.wikipedia.org/wiki/Italy'} salience: 0.19960518 ==================== name: The Calling of Saint Matthew type: EVENT metadata: {'wikipedia_url': 'https://en.wikipedia.org/wiki/The_Calling_of_St_Matthew_(Caravaggio)'} salience: 0.038798928
NOTE: It is recommended to send an encoding_type
argument to Natural
Language methods, so they provide useful offsets for the data they return.
While the correct value varies by environment, in Python you usually
want UTF32
.
Analyze Sentiment
The analyze_sentiment()
method
analyzes the sentiment of the provided text. This method returns a
AnalyzeSentimentResponse
.
>>> document = language.types.Document( ... content='Jogging is not very fun.', ... type='PLAIN_TEXT', ... ) >>> response = client.analyze_sentiment( ... document=document, ... encoding_type='UTF32', ... ) >>> sentiment = response.document_sentiment >>> print(sentiment.score) -1 >>> print(sentiment.magnitude) 0.8
NOTE: It is recommended to send an encoding_type
argument to Natural
Language methods, so they provide useful offsets for the data they return.
While the correct value varies by environment, in Python you usually
want UTF32
.
Analyze Entity Sentiment
The analyze_entity_sentiment()
method is effectively the amalgamation of
analyze_entities()
and
analyze_sentiment()
.
This method returns a
AnalyzeEntitySentimentResponse
.
>>> document = language.types.Document(
... content='Mona said that jogging is very fun.',
... type='PLAIN_TEXT',
... )
>>> response = client.analyze_entity_sentiment(
... document=document,
... encoding_type='UTF32',
... )
>>> entities = response.entities
>>> entities[0].name
'Mona'
>>> entities[1].name
'jogging'
>>> entities[1].sentiment.magnitude
0.8
>>> entities[1].sentiment.score
0.8
NOTE: It is recommended to send an encoding_type
argument to Natural
Language methods, so they provide useful offsets for the data they return.
While the correct value varies by environment, in Python you usually
want UTF32
.
Annotate Text
The annotate_text()
method
analyzes a document and is intended for users who are familiar with
machine learning and need in-depth text features to build upon. This method
returns a AnnotateTextResponse
.