データソースとモニタリング

別のトレーニングデータを指定し、モニタリングを追加します。

モデルにクエリを直接実行して、返された結果をテストできます。その場合、Cloud コンソールでさまざまなパラメータ値を使用することも、Vertex AI API を直接呼び出すこともできます。

Vertex AI API の使用例については、Vertex AI API を使用したクイックスタートをご覧ください。
Cloud コンソールでこのサンプルを表示するには: Google Cloud コンソールに移動

システム指示

リクエストされたコード拡張を提供することで、ML システムを強化しようとしています。変更内容に直接関連する最適化や提案を 1 つか 2 つ、必ず簡単に言及することにしています。これは、レスポンスの最後に自然言語の箇条書きで記載します。これにより、デベロッパーは無視することも、追加情報をリクエストすることもできます。

自由形式のプロンプト

TensorFlow と Keras を使用してお客様のフィードバックを処理する感情分析プロジェクトに取り組んでいます。customer_reviews の代わりに、Hugging Face の Yelp Polarity データセットからデータをランダムにサンプリングしてください。サンプリングするのは、テストデータや検証データではなく、トレーニングデータのみです。サンプリングは、トークン化の前に行います。また、リソース使用状況のモニタリングも組み込みたいと思っています。そのための関数を追加し、各エポックの最後にコールバックで使用してください。CPU 使用率とメモリ使用率をモニタリングしてロギングする必要があります。

Yelp レビューの 500 件のランダムサンプルを使用して 1 回、Yelp レビューの 1,000 件のランダムサンプルを使用して 1 回、これを実行してください。

コードは次のとおりです。

import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense, Dropout

# Sample customer feedback data
customer_reviews = [
    {"review": "The product is fantastic! Much better than expected.", "gender": "female", "state": "CA"},
    {"review": "Terrible service, very disappointing.", "gender": "male", "state": "NY"},
    {"review": "Excellent quality, but the shipping was slow.", "gender": "female", "state": "TX"},
    {"review": "Horrible experience, will not buy again.", "gender": "male", "state": "CA"},
    {"review": "User-friendly and intuitive design.", "gender": "female", "state": "WA"},
    {"review": "I am very satisfied with my purchase, would recommend.", "gender": "female", "state": "TX"},
    {"review": "Not worth the price, feels cheaply made.", "gender": "male", "state": "NY"},
    {"review": "The app constantly crashes, very annoying.", "gender": "male", "state": "FL"},
    {"review": "Customer support was helpful and solved my problem quickly.", "gender": "female", "state": "CA"},
    {"review": "Good quality, but a bit too expensive.", "gender": "male", "state": "TX"}
]

# Extracting reviews and labels for training
reviews = [entry["review"] for entry in customer_reviews]
labels = np.array([1, 0, 1, 0, 1, 1, 0, 0, 1, 1]) # Binary labels for sentiment (1 = positive, 0 = negative)

# Tokenization and sequence padding
tokenizer = Tokenizer(num_words=10000)
tokenizer.fit_on_texts(reviews)
review_sequences = tokenizer.texts_to_sequences(reviews)
review_padded = pad_sequences(review_sequences, maxlen=100)

# Model creation
model = Sequential([
    Embedding(input_dim=10000, output_dim=64, input_length=100),
    LSTM(128, return_sequences=True),
    Dropout(0.2),
    LSTM(64),
    Dense(1, activation='sigmoid')
])

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train the model
model.fit(review_padded, labels, epochs=5, verbose=0)

レスポンス

import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense, Dropout
from datasets import load_dataset
import psutil
import os

# Function to monitor and log resource usage
def monitor_resources(logs):
    cpu_percent = psutil.cpu_percent(interval=1)
    mem = psutil.virtual_memory()
    mem_percent = mem.percent
    logs.update({'cpu_usage': cpu_percent, 'memory_usage': mem_percent})


# Load the Yelp Polarity dataset
dataset = load_dataset('yelp_polarity')

# Sample data for training
def sample_data(num_samples):
    sampled_data = dataset['train'].random_split([num_samples, len(dataset['train']) - num_samples])[0]
    reviews = [example['text'] for example in sampled_data]
    labels = np.array([example['label'] for example in sampled_data])
    return reviews, labels

# Run with 500 samples
reviews_500, labels_500 = sample_data(500)

# Tokenization and sequence padding for 500 samples
tokenizer_500 = Tokenizer(num_words=10000)
tokenizer_500.fit_on_texts(reviews_500)
review_sequences_500 = tokenizer_500.texts_to_sequences(reviews_500)
review_padded_500 = pad_sequences(review_sequences_500, maxlen=100)

# Model creation for 500 samples
model_500 = Sequential([
    Embedding(input_dim=10000, output_dim=64, input_length=100),
    LSTM(128, return_sequences=True),
    Dropout(0.2),
    LSTM(64),
    Dense(1, activation='sigmoid')
])

# Compile the model for 500 samples
model_500.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

#Resource monitoring callback
resource_callback = tf.keras.callbacks.LambdaCallback(on_epoch_end=monitor_resources)

# Train the model for 500 samples
model_500.fit(review_padded_500, labels_500, epochs=5, verbose=1, callbacks=[resource_callback])


# Run with 1000 samples
reviews_1000, labels_1000 = sample_data(1000)

# Tokenization and sequence padding for 1000 samples
tokenizer_1000 = Tokenizer(num_words=10000)
tokenizer_1000.fit_on_texts(reviews_1000)
review_sequences_1000 = tokenizer_1000.texts_to_sequences(reviews_1000)
review_padded_1000 = pad_sequences(review_sequences_1000, maxlen=100)

# Model creation for 1000 samples
model_1000 = Sequential([
    Embedding(input_dim=10000, output_dim=64, input_length=100),
    LSTM(128, return_sequences=True),
    Dropout(0.2),
    LSTM(64),
    Dense(1, activation='sigmoid')
])

# Compile the model for 1000 samples
model_1000.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train the model for 1000 samples
model_1000.fit(review_padded_1000, labels_1000, epochs=5, verbose=1, callbacks=[resource_callback])

トークン化前のデータサンプリング: リクエストに応じて、トークン化の前に Hugging Face データセットからデータをサンプリングするようになりました。これにより、トークン化プロセスはサンプリングされたデータにのみ適用され、効率が向上します。
リソースモニタリング: monitor_resources 関数をコールバックとして組み込んだことにより、トレーニング中の CPU とメモリ使用量をリアルタイムでモニタリングできます。これにより、リソース管理が改善され、潜在的なボトルネックを特定できます。より永続的に記録を保持するために、ファイルへのロギングを追加することを検討してください。

モデル:	gemini-1.5-flash-002
温度:	0.2
最大出力トークン:	8192
TopK:	40.0
TopP:	0.95

データソースとモニタリング コレクションでコンテンツを整理 必要に応じて、コンテンツの保存と分類を行います。

自由形式のプロンプト

レスポンス

データソースとモニタリング