Watch our short overview video below for a first impression of how data selection works with WhatToLabel.
You can filter your dataset with our web app, our command-line interface, or the Docker container. The command-line interface comes in handy if you already use a cloud server to train your deep learning models.
You can optimize your dataset for various tasks. Both the command-line interface and the web application allow coarse optimization for classification, object detection, segmentation, and GANs. The Docker container gives you more fine-grained control.
After you submit your dataset with your preferred parameters, our AI data filtering software, which we named Boris, analyzes it. Boris automatically removes corrupt files and rebalances the dataset on a feature level. Depending on your filter preference, Boris either removes near-duplicates or creates a new dataset from the most important samples. We will share more details about how exactly we filter the datasets in this blog post.
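To make the idea of duplicate removal on a feature level more concrete, here is a minimal sketch in Python. It is not Boris's actual algorithm, which is not described here. It assumes that every image has already been turned into a feature vector, and it uses a simple greedy filter with cosine similarity. The function names, the example filenames, and the 0.95 threshold are all hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def remove_near_duplicates(features, threshold=0.95):
    """Greedy filter: keep a sample only if its similarity to every
    already-kept sample is below the threshold (assumed value)."""
    kept = []
    for name, vector in features.items():
        if all(cosine_similarity(vector, features[k]) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical feature vectors; real ones would come from a trained model.
features = {
    "cat_001.jpg": [1.0, 0.0, 0.0],
    "cat_002.jpg": [0.99, 0.01, 0.0],  # almost identical to cat_001.jpg
    "dog_001.jpg": [0.0, 1.0, 0.0],
}

kept_files = remove_near_duplicates(features)
print(kept_files)  # ['cat_001.jpg', 'dog_001.jpg']
```

In this sketch the result is a list of filenames to keep. The greedy loop compares each sample against all kept samples, so it is only practical for small datasets; a production system would likely use an approximate nearest-neighbor index instead.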
You will be able to either download a list of final filenames or a clean dataset. Additionally, we provide you with a report showing you more details about how Boris processed your data.