Friday 23 February 2018 at 10:59

Google CoLaboratory File Persistence

By Eric Antoine Scuccimarra

It took me a while to figure out exactly what was going on with the files I was uploading and creating in Google's CoLaboratory. Each user has a VM where their notebooks run, and the VM only runs for 12 hours before it is spun down and recycled, taking with it any files you may have downloaded or created. On my second day using it, I was surprised to find that the files I had spent time downloading, unzipping and importing were no longer there, and I had already deleted the code that fetched them. So if you are using CoLab, make sure you keep the code that gets your data files!
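One way to keep that code around is a small setup cell at the top of the notebook that can be re-run whenever the VM has been recycled. The following is only a rough sketch, not what I originally ran: the function name fetch_and_extract, the archive name data.zip and the example URL are all placeholders I made up for illustration, and it assumes the data comes as a zip file.

```python
import os
import urllib.request
import zipfile


def fetch_and_extract(url, dest_dir, archive_name="data.zip"):
    """Download a zip archive into dest_dir and unpack it there.

    Hypothetical helper: the name and arguments are illustrative only.
    Returns the list of member names found in the archive.
    """
    os.makedirs(dest_dir, exist_ok=True)
    archive_path = os.path.join(dest_dir, archive_name)

    # Only download when the archive is missing, e.g. on a freshly recycled VM.
    if not os.path.exists(archive_path):
        urllib.request.urlretrieve(url, archive_path)

    # Unpack next to the archive; re-extracting simply overwrites the files.
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()


# Example call with a placeholder URL (replace with your own data source):
# fetch_and_extract("https://example.com/dataset.zip", "data")
```

Because the download is skipped when the archive is already on disk, re-running the cell within the same VM session costs very little.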

I also tried to have two notebooks running at the same time, thinking it would speed up some work I was doing, but it seems that all of a user's notebooks run in the same VM, so there really is no advantage to having multiple notebooks running.

There is an instruction notebook that explains how to save files to Google Drive, which works very well and is easy to use. To do that run:

from google.colab import auth
from googleapiclient.http import MediaFileUpload
from googleapiclient.discovery import build

auth.authenticate_user()

You will then be asked to enter a code to authenticate yourself. After that, I use this function to save files:

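# Drive API v3 client; relies on the credentials set up by auth.authenticate_user()
drive_service = build('drive', 'v3')

def save_file_to_drive(name, path):
  """Upload the local file at `path` to Google Drive under the name `name`."""
  # Metadata for the file as it will appear in Drive
  file_metadata = {
    'name': name,
    'mimeType': 'application/octet-stream'
  }

  # Wrap the local file for upload; resumable=True helps with large files
  media = MediaFileUpload(path,
                        mimetype='application/octet-stream',
                        resumable=True)

  # Create the file in Drive and ask for only its ID in the response
  created = drive_service.files().create(body=file_metadata,
                                       media_body=media,
                                       fields='id').execute()

  print('File ID: {}'.format(created.get('id')))
  return created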

The function takes two arguments, the name of the file and the path to it, and writes the file to the root of your Google Drive.
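Since the function uploads one file at a time, a directory of outputs is easier to save if it is zipped first. Here is a minimal sketch of how that might look. The helper bundle_directory and the paths in the comments are hypothetical names chosen for illustration, and the final call assumes save_file_to_drive has already been defined in the notebook as above.

```python
import shutil


def bundle_directory(src_dir, archive_base):
    """Zip the contents of src_dir into archive_base + '.zip'.

    Hypothetical helper for illustration. Returns the path of the archive.
    Keep archive_base outside src_dir so the archive does not include itself.
    """
    return shutil.make_archive(archive_base, "zip", root_dir=src_dir)


# In a CoLab cell, after defining save_file_to_drive (paths are placeholders):
# archive_path = bundle_directory("checkpoints", "/content/checkpoints_backup")
# save_file_to_drive("checkpoints_backup.zip", archive_path)
```

The first argument to save_file_to_drive is just the name the file gets in Drive, so it does not have to match the local file name.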

Note - This post was updated because my original guess as to how the VMs work was completely wrong. The VM instances exist for 12 hours and are not tied to the runtime.

Labels: coding, machine_learning, tensorflow, google
