Conda

Package installation

Use Mamba instead of conda to install packages; it is a drop-in replacement for the conda CLI with a much faster dependency solver.
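Because mamba mirrors the conda CLI, the usual install syntax carries over unchanged. A minimal sketch (the package and channel here are just illustrative):

```shell
# Same flags as conda: -c picks the channel, -n picks the environment
mamba install -c conda-forge pandas

# Creating a new environment works the same way
mamba create -n analysis python=3.11 pandas
```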

Pandas

import pandas as pd

Drop rows if frequency of a class id below n

Sometimes it can be useful to drop rows in a dataset whose class appears too few times; in the most extreme cases a dataset might have only one observation for a particular class.

rows = ['1', '2', '3', '2', '1']; df = pd.DataFrame({'a': rows, 'b': rows})
df
   a  b
0  1  1
1  2  2
2  3  3
3  2  2
4  1  1
df[df.groupby('a')['a'].transform('count').ge(2)]
   a  b
0  1  1
1  2  2
3  2  2
4  1  1
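A groupby().filter version gives the same result and is arguably easier to read, at the cost of being slower on large frames (a sketch; the threshold name n is my own choice):

```python
import pandas as pd

rows = ['1', '2', '3', '2', '1']
df = pd.DataFrame({'a': rows, 'b': rows})

# Keep only rows whose class in column 'a' occurs at least n times
n = 2
filtered = df.groupby('a').filter(lambda g: len(g) >= n)
# Rows with a in {'1', '2'} survive; the single observation of class '3' is dropped
```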

fastai

Some things related to fastai (version 2)

from fastai.vision.all import *

Train a learner so we can do some inference

path = untar_data(URLs.PETS)
files = get_image_files(path/"images")
def label_func(f): return f[0].isupper()
dls = ImageDataLoaders.from_name_func(path, files, label_func, item_tfms=Resize(32), num_workers=0)
dls.show_batch()
learn = cnn_learner(dls, resnet18, metrics=error_rate)
learn.fine_tune(1)
epoch  train_loss  valid_loss  error_rate  time
0      0.999297    0.718680    0.324763    01:26
epoch  train_loss  valid_loss  error_rate  time
0      0.722174    0.548298    0.272666    01:41

Making predictions

preds = learn.get_preds()

Decoding predictions

To decode the output from get_preds, we can use the dls: the categorize.decode attribute maps a class index back to a human-readable label.

dls.categorize.decode(0)
'False'

If we have the prediction tensor, i.e. the predicted probabilities for each class:

preds[0][0]
tensor([0.2110, 0.7890])

numpy's argmax can be used to get the index of the most likely prediction:

dls.categorize.decode(np.argmax(preds[0][0]))
'True'

We can also access the confidence of the maximum prediction directly:

max(preds[0][0])
tensor(0.7890)
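To get both the index and the confidence in one step, argmax plus plain indexing works too (a sketch using a dummy array standing in for preds[0][0]):

```python
import numpy as np

# Dummy class probabilities standing in for preds[0][0]
probs = np.array([0.2110, 0.7890])

idx = int(np.argmax(probs))   # index of the most likely class -> 1
conf = probs[idx]             # its confidence -> 0.789
```

idx can then be passed to dls.categorize.decode as above.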

Test time augmentation

dls = ImageDataLoaders.from_name_func(path, files, label_func, item_tfms=Resize(32), batch_tfms=[*aug_transforms()],num_workers=0)
learn = cnn_learner(dls, resnet18, metrics=error_rate)
learn.fit(1)
epoch  train_loss  valid_loss  error_rate  time
0      0.898633    0.700572    0.299729    01:11
doc(learn.tta)

Learner.tta[source]

Learner.tta(ds_idx=1, dl=None, n=4, item_tfms=None, batch_tfms=None, beta=0.25, use_max=False)

Return predictions on the ds_idx dataset or dl using Test Time Augmentation

preds, targs = learn.tta()
error_rate(preds, targs).item() 
0.28890395164489746

Dataloaders

Accessing input data

Accessing the input from the dataloaders can be done through items. Wrapping the result in L here just limits the number of items displayed in the notebook automatically.

L(dls.items)
(#5912) [Path('/Users/dvanstrien/.fastai/data/oxford-iiit-pet/images/Bengal_76.jpg'),Path('/Users/dvanstrien/.fastai/data/oxford-iiit-pet/images/shiba_inu_8.jpg'),Path('/Users/dvanstrien/.fastai/data/oxford-iiit-pet/images/newfoundland_87.jpg'),Path('/Users/dvanstrien/.fastai/data/oxford-iiit-pet/images/english_cocker_spaniel_11.jpg'),Path('/Users/dvanstrien/.fastai/data/oxford-iiit-pet/images/pug_50.jpg'),Path('/Users/dvanstrien/.fastai/data/oxford-iiit-pet/images/chihuahua_58.jpg'),Path('/Users/dvanstrien/.fastai/data/oxford-iiit-pet/images/British_Shorthair_74.jpg'),Path('/Users/dvanstrien/.fastai/data/oxford-iiit-pet/images/pomeranian_75.jpg'),Path('/Users/dvanstrien/.fastai/data/oxford-iiit-pet/images/Russian_Blue_104.jpg'),Path('/Users/dvanstrien/.fastai/data/oxford-iiit-pet/images/Bengal_77.jpg')...]

Indexing dataloader items

Items can be indexed in the usual way, which can be useful to get back to the original input based on an index:

dls.items[0]
Path('/Users/dvanstrien/.fastai/data/oxford-iiit-pet/images/german_shorthaired_54.jpg')

Git

Filtering repositories

If you end up with a big repository and want to split a folder off into a new repo while keeping that folder's history, use https://github.com/newren/git-filter-repo/ to filter the repo itself. To do this I cloned the old repo into a new folder to make sure I didn't destroy anything by mistake. You filter the repo based on a folder (plus any other filters you need); once the repo has been filtered you can push it to a new repository.
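The workflow above can be sketched as follows (the repository URLs and folder name are hypothetical; --path is a real git-filter-repo option):

```shell
# Work on a fresh clone so the original repository is untouched
git clone https://github.com/example/big-repo.git big-repo-filtered
cd big-repo-filtered

# Rewrite history to keep only commits touching the given folder
git filter-repo --path subproject/

# Point the filtered history at a new remote and push
git remote add origin https://github.com/example/subproject.git
git push -u origin main
```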