Training Models with Mantra

Once you have some datasets and models, you can easily train your models (a) locally or (b) on the cloud.

🏃 Training Locally

From your project root:

$ mantra train my_model --dataset my_dataset --batch-size 64 --epochs 50

If you have a task that you want to train with:

$ mantra train my_model --dataset my_dataset --task my_task --batch-size 64 --epochs 50

You can also pass additional model hyperparameters as extra flags:

$ mantra train my_model --dataset my_dataset --batch-size 64 --epochs 50 --dropout 0.5

For image datasets, you can specify options such as the image dimensions:

$ mantra train my_model --dataset my_dataset --batch-size 64 --epochs 50 --image-dim 256 256

For table datasets, you can specify features and targets:

$ mantra train my_model --dataset my_dataset --batch-size 64 --epochs 50 --target my_target --features feature_1 feature_2

If you only want to save the best model weights:

$ mantra train my_model --dataset my_dataset --batch-size 64 --epochs 50 --savebestonly

🚂 Training on AWS

To train a model on AWS, first configure your AWS credentials:

$ mantra cloud

You will be asked for your AWS API keys and your preferred AWS region. Once complete, make sure you have the AWS CLI installed - this is a necessary dependency! You will also need to ensure your IAM credentials have the right permissions - e.g. the ability to create and shut down EC2 instances.
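As a quick sanity check (these are standard AWS CLI commands, not part of Mantra itself), you can install the CLI with pip and confirm that your keys resolve to a valid identity:

$ pip install awscli
$ aws sts get-caller-identity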

Check the settings.py file to make sure that the instance type and AMI you want to launch are right for you. Most of the functionality has been tested with the AWS Deep Learning AMI. Depending on the type of instance you want to launch, you may need to contact AWS and ask them to increase your instance limit.

Danger

RESERVED AWS GPU INSTANCES CAN BE VERY EXPENSIVE TO TRAIN ON. ALWAYS ENSURE YOU ARE AWARE OF WHAT INSTANCES ARE RUNNING AND WHETHER THEY HAVE BEEN PROPERLY SHUT DOWN OR TERMINATED.
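One way to verify this from the terminal (independently of Mantra) is to list running instances with the AWS CLI:

$ aws ec2 describe-instances --filters "Name=instance-state-name,Values=running" --query "Reservations[].Instances[].[InstanceId,InstanceType,State.Name]" --output table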

To train on the cloud, there are two main options. First, you can spin up a reserved instance that is terminated once training completes. To do this, just use the --cloud flag:

$ mantra train my_model --dataset my_dataset --batch-size 64 --epochs 50 --cloud

For model development, it's recommended to use the --dev flag:

$ mantra train my_model --dataset my_dataset --batch-size 64 --epochs 50 --cloud --dev

This will create a development instance that isn't terminated when training completes. You can reuse the same instance for subsequent runs, which makes setup much quicker (all the dependencies are already in place). You can still shut the instance down when you're not using it - when you next train, Mantra will automatically start the instance up again.
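If you prefer to stop the development instance manually, the standard AWS CLI command works - the instance ID below is a placeholder, so substitute your own:

$ aws ec2 stop-instances --instance-ids i-0123456789abcdef0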

You can see which Mantra GPU instances are running in the Cloud tab of the UI:

[Image: cloud_panel.png - the Cloud panel of the Mantra UI]
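If the UI isn't already running, you can launch it from your project root:

$ mantra ui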

This is no substitute for checking on AWS itself which instances are running - always stay aware!

The other thing to be aware of is S3 storage costs. Mantra uses S3 as a central storage backend for datasets, as well as for data generated during training - such as model weights. You can find your bucket name in settings.py. Keep an eye on how much you are storing and, if you are cost-conscious, remove files in S3 that you are no longer using.
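If you want to audit the bucket from the terminal, the AWS CLI can summarize its contents and remove prefixes you no longer need - the bucket name and path below are placeholders, so substitute your own from settings.py:

$ aws s3 ls s3://your-mantra-bucket --recursive --human-readable --summarize
$ aws s3 rm s3://your-mantra-bucket/path/to/old/trials --recursive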