Use Cloud Storage as a mounted file system
Cloud Storage FUSE lets you load training data to a Cloud Storage bucket and access that data from your custom training job like a mounted file system. Using Cloud Storage FUSE has the following benefits:
Training data is streamed to your training job instead of downloaded to replicas, which can make data loading and setup tasks faster when the job starts running.
Training jobs can handle input and output at scale without making API calls, handling responses, or integrating with client-side libraries.
Cloud Storage FUSE provides high throughput for sequential reads of large files and in distributed training scenarios.
Use cases
We recommend using Cloud Storage for storing training data in the following situations:
Your training data is unstructured data, such as images, text, and video.
Your training data is structured data in a format such as TFRecord.
Your training data contains large files, such as raw video.
You use distributed training.
How it works
Custom training jobs can access your Cloud Storage buckets as subdirectories of the root /gcs directory. For example, if your training data is located at gs://example-bucket/data.csv, your Python training application can read it at /gcs/example-bucket/data.csv and write other objects to the same bucket through paths under /gcs/example-bucket/.
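The read and write pattern described above can be sketched in Python as follows. The open() calls mirror the ones in the English source of this page; the helper names (gcs_uri_to_fuse_path, read_lines, append_line) and the root parameter are hypothetical, added only so the sketch can be exercised against a local directory when no /gcs mount is present.

```python
from pathlib import Path

# Mount root named in the text; override it to try the sketch locally.
FUSE_ROOT = Path("/gcs")


def gcs_uri_to_fuse_path(uri: str, root: Path = FUSE_ROOT) -> Path:
    """Map gs://bucket/object to <root>/bucket/object (hypothetical helper)."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a gs:// URI: {uri!r}")
    return root / uri[len(prefix):]


def read_lines(uri: str, root: Path = FUSE_ROOT) -> list[str]:
    """Read an object through the mounted path, as with any local file."""
    with open(gcs_uri_to_fuse_path(uri, root), "r") as f:
        return f.readlines()


def append_line(uri: str, text: str, root: Path = FUSE_ROOT) -> None:
    """Append one line of text to an object through the mounted path."""
    with open(gcs_uri_to_fuse_path(uri, root), "a") as f:
        f.write(text + "\n")
```

Inside a training job, read_lines("gs://example-bucket/data.csv") would open /gcs/example-bucket/data.csv, and append_line("gs://example-bucket/epoch3.log", "success!") would append to /gcs/example-bucket/epoch3.log.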
Bucket access permissions
By default, a custom training job can access any Cloud Storage bucket within the same Google Cloud project by using the Vertex AI Custom Code Service Agent.
To control access to buckets, you can assign a custom service account to the job. In this case, access to a Cloud Storage bucket is granted based on the permissions associated with the Cloud Storage roles of the custom service account.
For example, if you want to give the custom training job read and write access to Bucket-A but only read access to Bucket-B, you can assign a custom service account that has the following roles to the job:
roles/storage.objectAdmin for Bucket-A
roles/storage.objectViewer for Bucket-B
If the training job attempts to write to Bucket-B, a "permission denied" error is returned. For more information on Cloud Storage roles, see IAM roles for Cloud Storage.
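A training application may want to handle that error rather than crash. The following is a minimal sketch, assuming the "permission denied" error reaches Python as a PermissionError (the exception Python raises for EACCES and EPERM); the function name try_append and the example path are hypothetical.

```python
def try_append(path: str, text: str) -> bool:
    """Append text to path; return False if the write is refused.

    Assumption: a denied write on the mounted bucket surfaces as
    PermissionError. The try block wraps the whole with statement so that
    an error raised when the file is closed is caught as well.
    """
    try:
        with open(path, "a") as f:
            f.write(text)
    except PermissionError as err:
        print(f"cannot write to {path}: {err}")
        return False
    return True
```

For example, try_append("/gcs/Bucket-B/metrics.log", "epoch 1 done\n") would return False under the read-only role above, and the job could fall back to a bucket it is allowed to write to.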
Best practices
Avoid renaming directories. A renaming operation is not atomic in Cloud Storage FUSE. If the operation is interrupted, some files remain in the old directory.
Avoid unnecessarily closing (close()) or flushing (flush()) files. Closing or flushing a file pushes the file to Cloud Storage, which incurs a cost.
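One way to follow the second practice is to collect output in memory and write it in a single open/close cycle instead of reopening or flushing the file for every line. The sketch below is illustrative only; the BufferedLog class is hypothetical and not part of Cloud Storage FUSE or Vertex AI.

```python
class BufferedLog:
    """Collect log lines in memory and write them in one open/close cycle."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._lines: list[str] = []

    def add(self, line: str) -> None:
        """Buffer one line; nothing is written to the file yet."""
        self._lines.append(line.rstrip("\n") + "\n")

    def write_out(self) -> int:
        """Append all buffered lines with one open/close; return the count."""
        if not self._lines:
            return 0
        with open(self.path, "a") as f:
            f.writelines(self._lines)
        count = len(self._lines)
        self._lines.clear()
        return count
```

The trade-off is durability: lines still in the buffer are lost if the job stops before write_out() is called, so the call frequency is a choice between cost and how much output you can afford to lose.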
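Performance optimization guidelines
To get optimal read throughput when using Cloud Storage as a file system, we recommend implementing the following guidelines:
To reduce the latency introduced by looking up and opening objects in a bucket, store data in larger and fewer files.
Use distributed training to maximize bandwidth utilization.
Cache frequently accessed files to improve read performance. For details, see Overview of caching in Cloud Storage FUSE (/storage/docs/gcsfuse-cache).
Use local storage for checkpointing and logs instead of Cloud Storage.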
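As an illustration of the "larger and fewer files" guideline, many small files can be packed into one archive before upload and then read back sequentially. This is a sketch under assumptions: the page does not prescribe a container format, and tar is used here only because it is in the Python standard library; pack_directory and iter_members are hypothetical names.

```python
import tarfile
from pathlib import Path


def pack_directory(src_dir: str, archive_path: str) -> int:
    """Pack every regular file under src_dir into one uncompressed tar file.

    Returns the number of files added. Keep archive_path outside src_dir.
    """
    src = Path(src_dir)
    count = 0
    with tarfile.open(archive_path, "w") as tar:
        for path in sorted(src.rglob("*")):
            if path.is_file():
                # Store paths relative to src_dir inside the archive.
                tar.add(str(path), arcname=path.relative_to(src).as_posix())
                count += 1
    return count


def iter_members(archive_path: str):
    """Yield (name, bytes) per file, reading the archive front to back."""
    # "r|" opens the archive as a forward-only stream, with no seeking.
    with tarfile.open(archive_path, "r|") as tar:
        for member in tar:
            if member.isfile():
                yield member.name, tar.extractfile(member).read()
```

With this layout a training job opens one object under /gcs instead of one object per sample, and the stream-mode read matches the sequential access pattern that the page says Cloud Storage FUSE handles with high throughput.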
Limitations
To learn about the limitations of Cloud Storage FUSE, including the differences between Cloud Storage FUSE and POSIX file systems, see Limitations and differences from POSIX file systems (/storage/docs/gcs-fuse#differences-and-limitations).
Use Cloud Storage FUSE
To use Cloud Storage FUSE for custom training, do the following:
1. Create a Cloud Storage bucket (/storage/docs/creating-buckets). Note that dual-region and multi-region buckets are not supported for custom training.
2. Upload your training data to the bucket. For details, see Uploads (/storage/docs/uploads-downloads#uploads). To learn about other options for transferring data to Cloud Storage, see Data transfer options (/storage-transfer/docs/transfer-options).
3. Install Cloud Storage FUSE (/storage/docs/gcsfuse-install).
4. Mount the bucket in your training application.
What's next
See the Cloud Storage FUSE documentation (/storage/docs/gcs-fuse).
Learn about Cloud Storage FUSE pricing (/storage/docs/gcs-fuse#charges).
Prepare your training application (/vertex-ai/docs/training/code-requirements) for use on Vertex AI.
Last updated 2025-08-25 UTC.