
Single-Label Image Classification with Keras

Recently, generated.photos released a royalty-free dataset of images of human faces. But unlike most datasets, this one is completely generated by an AI. None of the faces are real!

I thought this would be a fun dataset for teaching a machine learning algorithm to classify sex. For this tutorial, I'll be using wao.ai to build my dataset and Keras to train the model. I downloaded about 2,000 faces from generated.photos. Believe it or not, none of the photos below are real!

some sample faces

Prepare the Data

To prepare the data, I resized each photo to 256x256 pixels, which makes it easier to fit all the photos in RAM. Resizing can be done quickly with ImageMagick using a command like this (note that mogrify overwrites the files in place, so run it on a copy):

mogrify -resize 256x256\! *.jpg

I repackaged all the images we'd use in this zip file so you can skip the resizing if you're following along.

Get Sex Labels

To train our algorithm, we'll need a sex label for each face. We could manually create our labels using the Universal Data Tool, but since we're dealing with more than 100 images, I'm going to use the wao.ai workforce.

Once the labeling is done, wao.ai lets us download a CSV full of the labels we'll learn from (shown below). You can download the labels.csv here to continue following along.
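
A minimal sketch of reading such a file in Python might look like this. The column names `filename` and `sex` are assumptions (the real header of labels.csv isn't reproduced in this post), so adjust them to match the download. The 0 = male, 1 = female encoding follows the model code in the "Train a Model" section.

```python
import csv
import io

# Hypothetical column names; check the header of the real labels.csv.
FILENAME_COLUMN = "filename"
LABEL_COLUMN = "sex"

# Integer encoding used by the model: 0 = male, 1 = female.
LABEL_TO_INDEX = {"male": 0, "female": 1}


def read_labels(csv_file):
    """Return a dict mapping image filename -> integer class index."""
    labels = {}
    for row in csv.DictReader(csv_file):
        label = row[LABEL_COLUMN].strip().lower()
        if label not in LABEL_TO_INDEX:
            continue  # skip rows with a missing or unexpected label
        labels[row[FILENAME_COLUMN]] = LABEL_TO_INDEX[label]
    return labels


# Tiny in-memory example standing in for open("labels.csv", newline="").
example = io.StringIO(
    "filename,sex\nface_001.jpg,male\nface_002.jpg,Female\nface_003.jpg,\n"
)
example_labels = read_labels(example)
print(example_labels)
```

Skipping rows with unexpected labels is one possible choice; for a real dataset it may be worth counting them first to see how many faces the workforce could not label.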

wao.ai csv download screenshot

Train a Model

There are many ways to choose a model. For a simple computer vision task, especially a classification task, I like to start with a simple model that performs well on the MNIST digit dataset (a common machine learning benchmark). I don't remember where I found this model, but it works well for small image datasets (1,000 to 10,000 images). Note: to use it, we'll need to convert the images to grayscale and resize them to a resolution of 64x64.
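
The grayscale-and-downscale step could look roughly like the sketch below. It assumes the images are already loaded as a NumPy array of shape (N, 256, 256, 3) with uint8 RGB values. The function name, the luminance weights (the common BT.601 ones), and the block-averaging downscale are illustrative choices, not necessarily what the notebook does.

```python
import numpy as np


def preprocess_faces(rgb_images, target_size=64):
    """Convert (N, H, W, 3) uint8 RGB images to (N, target, target, 1) floats in [0, 1]."""
    images = rgb_images.astype("float32") / 255.0

    # Weighted sum of the R, G, B channels (BT.601 luma weights).
    gray = images @ np.array([0.299, 0.587, 0.114], dtype="float32")

    n, height, width = gray.shape
    if height % target_size or width % target_size:
        raise ValueError("image size must be a multiple of target_size")
    fh, fw = height // target_size, width // target_size

    # Downscale by averaging each fh x fw block of pixels.
    small = gray.reshape(n, target_size, fh, target_size, fw).mean(axis=(2, 4))

    # Add the trailing channel axis expected by Conv2D.
    return small[..., np.newaxis]


# Random pixels standing in for real 256x256 face photos.
fake_faces = np.random.randint(0, 256, size=(5, 256, 256, 3), dtype=np.uint8)
x = preprocess_faces(fake_faces)
print(x.shape)
```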

from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten
from tensorflow.keras.layers import Conv2D, MaxPooling2D

num_output_classes = 2  # 0 = male, 1 = female
input_img_size = (64, 64, 1)  # 64x64 image with 1 color channel

model = Sequential()
# Two convolutional layers extract local image features
model.add(Conv2D(32, kernel_size=(3, 3), activation="relu", input_shape=input_img_size))
model.add(Conv2D(64, (3, 3), activation="relu"))
# Downsample, then apply dropout to reduce overfitting
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
# Flatten the feature maps and classify with a small dense network
model.add(Flatten())
model.add(Dense(64, activation="relu"))
model.add(Dropout(0.5))
# Softmax outputs one probability per class
model.add(Dense(num_output_classes, activation="softmax"))

model.compile(
    loss=keras.losses.categorical_crossentropy,
    optimizer=keras.optimizers.Adadelta(),
    metrics=["accuracy"],
)

To see the surrounding code, including data transformations and training parameters, check out all my work in this notebook.
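
Because the model ends in a two-unit softmax and is compiled with categorical cross-entropy, the integer labels need to be one-hot encoded before training. The sketch below shows that step plus a shuffled train/test split using plain NumPy. The helper name, the 80/20 split, and the seed are assumptions, and the notebook may do this differently.

```python
import numpy as np


def one_hot_and_split(x, labels, num_classes=2, test_fraction=0.2, seed=0):
    """One-hot encode integer labels and split the data into train and test sets."""
    labels = np.asarray(labels)
    # Row i of y is the one-hot vector for labels[i].
    y = np.eye(num_classes, dtype="float32")[labels]

    # Shuffle once so both splits contain a mix of classes.
    order = np.random.default_rng(seed).permutation(len(x))
    n_test = int(len(x) * test_fraction)
    test_idx, train_idx = order[:n_test], order[n_test:]
    return (x[train_idx], y[train_idx]), (x[test_idx], y[test_idx])


# Placeholder data: 10 blank 64x64 grayscale images with alternating labels.
x = np.zeros((10, 64, 64, 1), dtype="float32")
labels = [0, 1] * 5
(x_train, y_train), (x_test, y_test) = one_hot_and_split(x, labels)
print(x_train.shape, y_train.shape, x_test.shape, y_test.shape)
```

The resulting arrays can be passed to `model.fit(x_train, y_train, validation_data=(x_test, y_test))` along with a batch size and number of epochs of your choosing.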

Results

In the end, this model was able to predict sex with about 80% accuracy. Not great, but a reasonable start considering we're working with a small, 2,000-sample dataset.

Let's look at the data qualitatively. First, let's see the 25 faces our model most confidently classified as male:

most male faces

Now let's see what our model classified as female:

most female faces
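
Rankings like these can be produced from the softmax outputs. In the sketch below, `probs` stands in for the output of `model.predict` on the test images (column 0 is P(male), column 1 is P(female)); the helper name and the toy numbers are made up for illustration.

```python
import numpy as np


def most_confident(probs, class_index, k=25):
    """Return indices of the k samples with the highest probability for class_index."""
    order = np.argsort(probs[:, class_index])[::-1]  # highest probability first
    return order[:k]


# Toy predictions for 5 faces: columns are [P(male), P(female)].
probs = np.array([
    [0.90, 0.10],
    [0.20, 0.80],
    [0.55, 0.45],
    [0.05, 0.95],
    [0.70, 0.30],
])
top_male = most_confident(probs, class_index=0, k=2)
top_female = most_confident(probs, class_index=1, k=2)
print(top_male, top_female)
```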

Interesting! I see a lot of similarities in the general outline of males and the general outline of females (look at the hair!). It would also appear that this dataset has many females wearing hats.

To see where our model is struggling, we can also see where the model was most unsure, i.e. where |P(male) - P(female)| was smallest.

most ambiguous faces
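
The ambiguity score translates directly into code. In this sketch, `probs` again stands in for the softmax outputs of `model.predict` (column 0 is P(male), column 1 is P(female)); the function name and the toy numbers are hypothetical.

```python
import numpy as np


def most_ambiguous(probs, k=25):
    """Return indices of the k samples where |P(male) - P(female)| is smallest."""
    margin = np.abs(probs[:, 0] - probs[:, 1])
    return np.argsort(margin)[:k]  # smallest margin first


# Toy predictions for 5 faces: columns are [P(male), P(female)].
probs = np.array([
    [0.90, 0.10],
    [0.48, 0.52],
    [0.55, 0.45],
    [0.05, 0.95],
    [0.70, 0.30],
])
ambiguous = most_ambiguous(probs, k=2)
print(ambiguous)
```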

Children, indirect face angles, and neutral haircuts seem to confuse the model.

To enhance the effectiveness of the model, we could try a different neural network architecture, create descriptive features (e.g. features that identify children or facial hair) or label additional data for testing.

Big thanks to generated.photos for providing the excellent AI-generated faces!

Can you do better? I'd love to know. Reach out to me on Twitter!
