Package installation

Use Mamba instead of conda to install packages.


import pandas as pd

Drop rows if the frequency of a class id is below n

Sometimes it can be useful to drop rows in a dataset which appear too few times; in the most extreme cases a dataset might have only one observation for a particular class.

rows = ['1', '2', '3', '2', '1']
df = pd.DataFrame({'a': rows, 'b': rows})

   a  b
0  1  1
1  2  2
2  3  3
3  2  2
4  1  1

After dropping rows whose class id appears fewer than two times:

   a  b
0  1  1
1  2  2
3  2  2
4  1  1
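A minimal sketch of one way to get this result, assuming the class id lives in column `a` and a minimum frequency `n` (the column name and threshold are just for this example):

```python
import pandas as pd

rows = ['1', '2', '3', '2', '1']
df = pd.DataFrame({'a': rows, 'b': rows})

n = 2  # minimum number of times a class id must appear to be kept

# groupby(...).transform('size') broadcasts each group's row count back to
# every row, so we can compare it against n row by row
filtered = df[df.groupby('a')['a'].transform('size') >= n]
```

`transform('size')` is handy here because it returns a Series aligned with the original index, so the boolean mask lines up with `df` directly.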


Some things related to fastai (version 2)

from fastai.vision.all import *

Train a learner so we can do some inference

path = untar_data(URLs.PETS)
files = get_image_files(path/"images")
def label_func(f): return f[0].isupper()
dls = ImageDataLoaders.from_name_func(path, files, label_func, item_tfms=Resize(32), num_workers=0)
learn = cnn_learner(dls, resnet18, metrics=error_rate)
learn.fine_tune(1)

epoch  train_loss  valid_loss  error_rate  time
0      0.999297    0.718680    0.324763    01:26

epoch  train_loss  valid_loss  error_rate  time
0      0.722174    0.548298    0.272666    01:41

Making predictions

preds, targs = learn.get_preds()

Decoding predictions

To decode the output from get_preds we can use the dls: the categorize.decode attribute turns a class index back into a human-readable value.


If we have the prediction tensor, i.e. the probability for each class:

tensor([0.2110, 0.7890])

numpy argmax can be used to get the index of the most likely prediction.


We can also access the confidence for the maximum prediction directly.
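Putting those steps together as a plain-numpy sketch (the `vocab` list here is a hypothetical stand-in for the dls vocab in the pets example, where `label_func` returns a bool):

```python
import numpy as np

# hypothetical vocab for the cats-vs-dogs labels (True = uppercase filename)
vocab = [False, True]

pred = np.array([0.2110, 0.7890])  # class probabilities for one item

idx = int(np.argmax(pred))         # index of the most likely class
label = vocab[idx]                 # decode to a human-readable value
confidence = float(pred[idx])      # confidence of the maximum prediction
```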


Test time augmentation

dls = ImageDataLoaders.from_name_func(
    path, files, label_func,
    item_tfms=Resize(32),
    batch_tfms=[*aug_transforms()],
    num_workers=0,
)
learn = cnn_learner(dls, resnet18, metrics=error_rate)

epoch  train_loss  valid_loss  error_rate  time
0      0.898633    0.700572    0.299729    01:11


Learner.tta(ds_idx=1, dl=None, n=4, item_tfms=None, batch_tfms=None, beta=0.25, use_max=False)

Return predictions on the ds_idx dataset or dl using Test Time Augmentation


preds, targs = learn.tta()
error_rate(preds, targs).item() 


Accessing input data

Accessing the input from the data loaders can be done through the items attribute. The L here (fastcore's list class) just automagically limits the number of items displayed in the notebook.

(#5912) [Path('/Users/dvanstrien/.fastai/data/oxford-iiit-pet/images/Bengal_76.jpg'),Path('/Users/dvanstrien/.fastai/data/oxford-iiit-pet/images/shiba_inu_8.jpg'),Path('/Users/dvanstrien/.fastai/data/oxford-iiit-pet/images/newfoundland_87.jpg'),Path('/Users/dvanstrien/.fastai/data/oxford-iiit-pet/images/english_cocker_spaniel_11.jpg'),Path('/Users/dvanstrien/.fastai/data/oxford-iiit-pet/images/pug_50.jpg'),Path('/Users/dvanstrien/.fastai/data/oxford-iiit-pet/images/chihuahua_58.jpg'),Path('/Users/dvanstrien/.fastai/data/oxford-iiit-pet/images/British_Shorthair_74.jpg'),Path('/Users/dvanstrien/.fastai/data/oxford-iiit-pet/images/pomeranian_75.jpg'),Path('/Users/dvanstrien/.fastai/data/oxford-iiit-pet/images/Russian_Blue_104.jpg'),Path('/Users/dvanstrien/.fastai/data/oxford-iiit-pet/images/Bengal_77.jpg')...]

Indexing dataloader items

Items can be indexed in the usual way, which can be useful to get back to the original input based on an index.
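For example, with a plain list of paths standing in for the items attribute (the file names and the misclassified index are made up for illustration):

```python
# hypothetical stand-in for the dataloader's items: a list of image paths
items = [
    "images/Bengal_76.jpg",
    "images/shiba_inu_8.jpg",
    "images/pug_50.jpg",
]

# suppose the item at index 2 was misclassified; indexing gets us back to
# the original input file so we can inspect it
mistake_idx = 2
original_input = items[mistake_idx]
```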



Filtering repositories

If you end up with a big repository and want to split a folder off into a new repo while keeping the history for that folder, you can filter the repo itself. To do this I pulled the old repo into a new folder, to make sure I didn't destroy anything by mistake. You then filter the repo down to a folder (plus any other filters you want), and once the repo has been filtered you can push it to a new repo.