Conda

Package installation

Use Mamba instead of conda to install packages; it is a drop-in replacement for the conda CLI with a much faster dependency solver.
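Because mamba mirrors the conda CLI, the usual install syntax carries over unchanged. A minimal sketch (the package and channel here are just illustrative):

```shell
# Same flags as conda: -c picks the channel, -n picks the environment
mamba install -c conda-forge pandas

# Creating a new environment works the same way
mamba create -n analysis python=3.11 pandas
```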

Pandas

import pandas as pd

Drop rows if frequency of a class id below n

Sometimes it can be useful to drop rows in a dataset whose class appears too few times; in the most extreme cases a dataset might have only one observation for a particular class.

rows = ['1', '2', '3', '2', '1']; df = pd.DataFrame({'a': rows, 'b': rows})
df
   a  b
0  1  1
1  2  2
2  3  3
3  2  2
4  1  1
df[df.groupby('a')['a'].transform('count').ge(2)]
   a  b
0  1  1
1  2  2
3  2  2
4  1  1
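A groupby().filter version gives the same result and is arguably easier to read, at the cost of being slower on large frames (a sketch; the threshold name n is my own choice):

```python
import pandas as pd

rows = ['1', '2', '3', '2', '1']
df = pd.DataFrame({'a': rows, 'b': rows})

# Keep only rows whose class in column 'a' occurs at least n times
n = 2
filtered = df.groupby('a').filter(lambda g: len(g) >= n)
# Rows with a in {'1', '2'} survive; the single observation of class '3' is dropped
```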

fastai

Some things related to fastai (version 2)

from fastai.vision.all import *

Train a learner so we can do some inference

path = untar_data(URLs.PETS)
files = get_image_files(path/"images")
def label_func(f): return f[0].isupper()
dls = ImageDataLoaders.from_name_func(path, files, label_func, item_tfms=Resize(32), num_workers=0)
dls.show_batch()
learn = cnn_learner(dls, resnet18, metrics=error_rate)
learn.fine_tune(1)
epoch  train_loss  valid_loss  error_rate  time
0      0.999297    0.718680    0.324763    01:26
epoch  train_loss  valid_loss  error_rate  time
0      0.722174    0.548298    0.272666    01:41

Making predictions

preds = learn.get_preds()

Decoding predictions

To decode the output from get_preds, we can use the dls: the categorize.decode attribute maps a class index back to a human-readable label.

dls.categorize.decode(0)
'False'

If we have the prediction tensor, i.e. the predicted probabilities for each class:

preds[0][0]
tensor([0.2110, 0.7890])

numpy's argmax can be used to get the index of the most likely prediction:

dls.categorize.decode(np.argmax(preds[0][0]))
'True'

We can also access the confidence of the maximum prediction directly:

max(preds[0][0])
tensor(0.7890)
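To get both the index and the confidence in one step, argmax plus plain indexing works too (a sketch using a dummy array standing in for preds[0][0]):

```python
import numpy as np

# Dummy class probabilities standing in for preds[0][0]
probs = np.array([0.2110, 0.7890])

idx = int(np.argmax(probs))   # index of the most likely class -> 1
conf = probs[idx]             # its confidence -> 0.789
```

idx can then be passed to dls.categorize.decode as above.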

Test time augmentation

dls = ImageDataLoaders.from_name_func(path, files, label_func, item_tfms=Resize(32), batch_tfms=[*aug_transforms()],num_workers=0)
learn = cnn_learner(dls, resnet18, metrics=error_rate)
learn.fit(1)
epoch  train_loss  valid_loss  error_rate  time
0      0.898633    0.700572    0.299729    01:11
doc(learn.tta)

Learner.tta[source]

Learner.tta(ds_idx=1, dl=None, n=4, item_tfms=None, batch_tfms=None, beta=0.25, use_max=False)

Return predictions on the ds_idx dataset or dl using Test Time Augmentation

preds, targs = learn.tta()
error_rate(preds, targs).item() 
0.28890395164489746

Dataloaders

Accessing input data

Accessing the input from the dataloaders can be done through items. Wrapping the result in L here just limits the number of items displayed in the notebook automatically.

L(dls.items)
(#5912) [Path('/Users/dvanstrien/.fastai/data/oxford-iiit-pet/images/Bengal_76.jpg'),Path('/Users/dvanstrien/.fastai/data/oxford-iiit-pet/images/shiba_inu_8.jpg'),Path('/Users/dvanstrien/.fastai/data/oxford-iiit-pet/images/newfoundland_87.jpg'),Path('/Users/dvanstrien/.fastai/data/oxford-iiit-pet/images/english_cocker_spaniel_11.jpg'),Path('/Users/dvanstrien/.fastai/data/oxford-iiit-pet/images/pug_50.jpg'),Path('/Users/dvanstrien/.fastai/data/oxford-iiit-pet/images/chihuahua_58.jpg'),Path('/Users/dvanstrien/.fastai/data/oxford-iiit-pet/images/British_Shorthair_74.jpg'),Path('/Users/dvanstrien/.fastai/data/oxford-iiit-pet/images/pomeranian_75.jpg'),Path('/Users/dvanstrien/.fastai/data/oxford-iiit-pet/images/Russian_Blue_104.jpg'),Path('/Users/dvanstrien/.fastai/data/oxford-iiit-pet/images/Bengal_77.jpg')...]

Indexing dataloader items

Items can be indexed in the usual way, which can be useful to get back to the original input based on an index:

dls.items[0]
Path('/Users/dvanstrien/.fastai/data/oxford-iiit-pet/images/german_shorthaired_54.jpg')

Git

Filtering repositories

If you end up with a big repository and want to split a folder off into a new repo while keeping that folder's history, use https://github.com/newren/git-filter-repo/ to filter the repo itself. To do this I cloned the old repo into a new folder to make sure I didn't destroy anything by mistake. You filter the repo based on a folder (plus any other filters you need); once the repo has been filtered you can push it to a new repository.
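The workflow above can be sketched as follows (the repository URLs and folder name are hypothetical; --path is a real git-filter-repo option):

```shell
# Work on a fresh clone so the original repository is untouched
git clone https://github.com/example/big-repo.git big-repo-filtered
cd big-repo-filtered

# Rewrite history to keep only commits touching the given folder
git filter-repo --path subproject/

# Point the filtered history at a new remote and push
git remote add origin https://github.com/example/subproject.git
git push -u origin main
```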