tl;dr
It wasn’t possible to get a very good F2 score on this website classification dataset. As the UK Web Archive say:
We expect that an appropriate classifier might require more information about each site in order to produce reliable results, and are looking at augmenting this dataset with further information in the future. Options include:
For each site, make the titles of every page on that site available.
For each site, extract a set of keywords that summarise the site, via the full-text index.
I suspect that having either of these additional components would improve the performance of the classifier.
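For context on the metric: F2 is the F-beta score with beta = 2, which weights recall more heavily than precision. Here is a minimal sketch in plain Python, assuming binary 0/1 labels. The helper name `fbeta` and the toy labels are made up for illustration and are not taken from the dataset or from the code used for this post.

```python
def fbeta(y_true, y_pred, beta=2.0):
    """F-beta score for binary 0/1 labels (illustrative helper)."""
    # Count true positives, false positives and false negatives.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    # Return 0.0 when there are no positives at all, to avoid dividing by zero.
    return (1 + b2) * tp / denom if denom else 0.0


# Toy labels (assumption): 10 positive sites, 10 negative sites.
y_true = [1] * 10 + [0] * 10
# The classifier finds 6 of the 10 positives and raises 2 false alarms.
y_pred = [1] * 6 + [0] * 4 + [1] * 2 + [0] * 8

print("F1:", round(fbeta(y_true, y_pred, beta=1.0), 3))
print("F2:", round(fbeta(y_true, y_pred, beta=2.0), 3))
```

In this toy case precision (0.75) is higher than recall (0.6), so F2 comes out lower than F1: missed positives cost more than false alarms under this metric.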