Image Datasets and OpenML
This page covers image datasets, the Olivetti faces dataset, OpenML fetching, vector-to-image reshaping, and subplot grids.
What you should be able to do
- Understand flat image vectors versus 2D image arrays.
- Fetch datasets by name or ID.
- Convert face data to NumPy and reshape a 4096-value vector into 64 x 64 pixels.
- Display many images with subplot grids.
Reusable patterns
- A 64 x 64 image becomes a vector of length 4096 when flattened.
- plt.subplot(rows, columns, position) uses positions starting from 1.
- enumerate is useful when a loop needs both a counter and each data row.
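A minimal sketch of these three patterns. It uses a random array as a stand-in for a real face image, so the names `image`, `vector`, `restored`, and `positions` are illustrative only:

```python
import numpy as np

# Stand-in for one 64 x 64 grayscale image (random values, not a real face)
rng = np.random.default_rng(0)
image = rng.random((64, 64))

# Flatten: 64 * 64 = 4096 values in one row-major vector
vector = image.reshape(-1)
print(vector.shape)  # (4096,)

# Reshape back: the round trip restores the original 2D array exactly
restored = vector.reshape(64, 64)
print(np.array_equal(restored, image))  # True

# enumerate(..., start=1) pairs 1-based subplot positions with each item
rows = ["row A", "row B", "row C"]
positions = [(position, row) for position, row in enumerate(rows, start=1)]
print(positions)  # [(1, 'row A'), (2, 'row B'), (3, 'row C')]
```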
Datasets that contain images: Olivetti faces
Listing 1. Fetch the olivetti_faces dataset from scikit-learn
# fetch the olivetti_faces dataset from scikit-learn
from sklearn import datasets
olivetti_dataset = datasets.fetch_olivetti_faces()
Expected text output or note
downloading Olivetti faces from https://ndownloader.figshare.com/files/5976027 to /root/scikit_learn_data
The difference between load and fetch: load_iris() loads a small dataset that ships with the library, so no download is needed. fetch_olivetti_faces() downloads the dataset from the internet on first use and saves it in a local cache (by default ~/scikit_learn_data, as the message above shows). Later calls read from that cache.
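A minimal sketch of the difference, assuming a recent scikit-learn. `load_iris()` works offline and `get_data_home()` reports the cache directory. Passing `download_if_missing=False` to the fetcher makes a missing cache raise an `OSError` instead of starting a download. The helper `olivetti_is_cached` is hypothetical.

```python
from sklearn.datasets import fetch_olivetti_faces, get_data_home, load_iris

# load_*: the data ships inside the scikit-learn package, so no download is needed
iris = load_iris()
print(iris.data.shape)  # (150, 4)

# fetch_*: data is downloaded on first use and cached under this directory
cache_dir = get_data_home()
print("scikit-learn cache directory:", cache_dir)


def olivetti_is_cached():
    """Hypothetical helper: True if the Olivetti faces are already cached.

    download_if_missing=False makes the fetcher raise OSError instead of
    downloading, so this check does not trigger a download.
    """
    try:
        fetch_olivetti_faces(download_if_missing=False)
        return True
    except OSError:
        return False


print("Olivetti faces cached:", olivetti_is_cached())
```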
Listing 2. Code listing 2
olivetti_dataset.keys()
Expected text output or note
dict_keys(['data', 'images', 'target', 'DESCR'])
- data: the images flattened into vectors, shape (400, 4096); each row has shape (4096,) because 64 x 64 = 4096.
- images: the same images in their original 2D form, shape (400, 64, 64).
- target: person labels.
- DESCR: dataset description.
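The sketch below illustrates how `data` and `images` relate. It uses random arrays with the same shapes as the real fields, so it runs offline, and it builds `target` from the 40 people x 10 images layout stated in the dataset description. None of these arrays contain real faces.

```python
import numpy as np

# Synthetic stand-ins with the Olivetti field shapes (random values, NOT real faces)
rng = np.random.default_rng(42)
images = rng.random((400, 64, 64)).astype(np.float32)

# data-style layout: each 64 x 64 grid flattened into one row of 4096 values
data = images.reshape(len(images), -1)
print(images.shape, data.shape)  # (400, 64, 64) (400, 4096)

# Row i of data and images[i] hold the same pixels
print(np.array_equal(data[7].reshape(64, 64), images[7]))  # True

# target layout from the description: 40 people, 10 consecutive images each
target = np.repeat(np.arange(40), 10)
counts = np.bincount(target)
print(len(counts), counts.min(), counts.max())  # 40 10 10
```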
Listing 3. Code listing 3
print(olivetti_dataset.data)
Expected text output or note
[[0.30991736 0.3677686 0.41735536 ... 0.15289256 0.16115703 0.1570248 ]
[0.45454547 0.47107437 0.5123967 ... 0.15289256 0.15289256 0.15289256]
[0.3181818 0.40082645 0.49173555 ... 0.14049587 0.14876033 0.15289256]
...
[0.5 0.53305787 0.607438 ... 0.17768595 0.14876033 0.19008264]
[0.21487603 0.21900827 0.21900827 ... 0.57438016 0.59090906 0.60330576]
[0.5165289 0.46280992 0.28099173 ... 0.35950413 0.3553719 0.38429752]]
Listing 4. Code listing 4
print(olivetti_dataset.DESCR)
Expected text output or note
.. _olivetti_faces_dataset:
The Olivetti faces dataset
--------------------------
`This dataset contains a set of face images`_ taken between April 1992 and
April 1994 at AT&T Laboratories Cambridge. The
:func:`sklearn.datasets.fetch_olivetti_faces` function is the data
fetching / caching function that downloads the data
archive from AT&T.
.. _This dataset contains a set of face images: https://cam-orl.co.uk/facedatabase.html
As described on the original website:
There are ten different images of each of 40 distinct subjects. For some
subjects, the images were taken at different times, varying the lighting,
facial expressions (open / closed eyes, smiling / not smiling) and facial
details (glasses / no glasses). All the images were taken against a dark
homogeneous background with the subjects in an upright, frontal position
(with tolerance for some side movement).
**Data Set Characteristics:**
================= =====================
Classes 40
Samples total 400
Dimensionality 4096
Features real, between 0 and 1
================= =====================
The image is quantized to 256 grey levels and stored as unsigned 8-bit
integers; the loader will convert these to floating point values on the
interval [0, 1], which are easier to work with for many algorithms.
The "target" for this database is an integer from 0 to 39 indicating the
identity of the person pictured; however, with only 10 examples per class, this
relatively small dataset is more interesting from an unsupervised or
semi-supervised perspective.
The original dataset consisted of 92 x 112, while the version available here
consists of 64x64 images.
When using these images, please give credit to AT&T Laboratories Cambridge.
OpenML and dataset fetching
Listing 5. Code listing 5
!pip install openml
Expected text output or note
Collecting openml
Successfully installed liac-arff-2.5.0 minio-7.2.20 openml-0.15.1 pycryptodome-3.23.0 xmltodict-1.0.4
Listing 6. Code listing 6
# delete scikit-learn's cached OpenML downloads so the next fetch downloads them again
!rm -rf ~/scikit_learn_data/openml
Listing 7. Datasets are fetched by name or numeric ID
# datasets are fetched by name or numeric ID
# scikit-learn's fetch_openml does not require the separate openml package
from sklearn.datasets import fetch_openml
face_dataset = fetch_openml(name="olivetti_faces", version=1)
# to fetch by ID instead, pass data_id with the number shown on the dataset's OpenML page;
# note that data_id=61 is the iris dataset, not the Olivetti faces
Listing 8. Fetched without OpenML because OpenML was not working here
# fallback when fetching from OpenML fails: scikit-learn's built-in Olivetti loader
from sklearn.datasets import fetch_olivetti_faces
face_dataset = fetch_olivetti_faces()
Listing 9. Code listing 9
face_dataset.keys()
Expected text output or note
dict_keys(['data', 'images', 'target', 'DESCR'])
These are the keys of the fetch_olivetti_faces() result. A Bunch returned by fetch_openml has these keys instead:
- data: input data.
- target: class labels.
- frame: tabular data, if available.
- categories: categorical information.
- feature_names: feature names.
- target_names: target-variable names.
- DESCR: description.
- details: dataset metadata.
- url: OpenML link.
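When the data does come from `fetch_openml`, recent scikit-learn versions return `data` as a pandas DataFrame by default. The `as_frame` parameter controls this, and `as_frame=False` returns NumPy arrays instead. The sketch below uses a small random DataFrame as a hypothetical stand-in so it runs offline, and shows the usual conversion with `DataFrame.to_numpy()`. The `pixel_*` column names are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for an OpenML Bunch's data field: one column per pixel
# (random values and invented column names, for illustration only)
rng = np.random.default_rng(0)
pixels = rng.random((5, 4096)).astype(np.float32)
frame = pd.DataFrame(pixels, columns=[f"pixel_{i}" for i in range(4096)])

# to_numpy() drops the column labels and returns the underlying 2D array
X = frame.to_numpy()
print(type(X).__name__, X.shape, X.dtype)  # ndarray (5, 4096) float32

# Each row can then be reshaped into a 64 x 64 image as usual
print(X[0].reshape(64, 64).shape)  # (64, 64)
```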
Listing 10. Code listing 10
face_dataset.DESCR
Expected text output or note
'.. _olivetti_faces_dataset:\n\nThe Olivetti faces dataset\n--------------------------\n\n...'
This is the same text as in Listing 4. It appears as a raw string with \n escapes because the expression is displayed without print().
Listing 11. Code listing 11
face_dataset.data.shape
Expected text output or note
(400, 4096)
Listing 12. Inspect the data (a pandas DataFrame when fetched from OpenML, a NumPy array here)
# fetch_openml returns a pandas DataFrame by default (work with pandas or convert it to NumPy);
# fetch_olivetti_faces, used here, returns a NumPy array
face_dataset.data
Expected text output or note
array([[0.30991736, 0.3677686 , 0.41735536, ..., 0.15289256, 0.16115703,
0.1570248 ],
[0.45454547, 0.47107437, 0.5123967 , ..., 0.15289256, 0.15289256,
0.15289256],
[0.3181818 , 0.40082645, 0.49173555, ..., 0.14049587, 0.14876033,
0.15289256],
...,
[0.5 , 0.53305787, 0.607438 , ..., 0.17768595, 0.14876033,
0.19008264],
[0.21487603, 0.21900827, 0.21900827, ..., 0.57438016, 0.59090906,
0.60330576],
[0.5165289 , 0.46280992, 0.28099173, ..., 0.35950413, 0.3553719 ,
0.38429752]], dtype=float32)
Listing 13. Target contains the class label for each image: 40 different people, 10 images per person
face_dataset.target # class label for each image: 40 different people, 10 images per person
Expected text output or note
array([ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6,
6, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8,
8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 10, 10,
10, 10, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11, 11, 11, 11, 11, 11,
11, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 13, 13, 13, 13, 13, 13,
13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 15, 15, 15,
15, 15, 15, 15, 15, 15, 15, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 18, 18, 18, 18, 18, 18, 18,
18, 18, 18, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 20, 20, 20, 20,
20, 20, 20, 20, 20, 20, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 22,
22, 22, 22, 22, 22, 22, 22, 22, 22, 23, 23, 23, 23, 23, 23, 23, 23,
23, 23, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 25, 25, 25, 25, 25,
25, 25, 25, 25, 25, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 27, 27,
27, 27, 27, 27, 27, 27, 27, 27, 28, 28, 28, 28, 28, 28, 28, 28, 28,
28, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 30, 30, 30, 30, 30, 30,
30, 30, 30, 30, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32,
32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 35, 35, 35, 35, 35, 35, 35,
35, 35, 35, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 37, 37, 37, 37,
37, 37, 37, 37, 37, 37, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 39,
39, 39, 39, 39, 39, 39, 39, 39, 39])
Converting data and displaying images
Listing 14. Convert to numpy
# convert to NumPy (copies the array here; converts a DataFrame if the data came from OpenML)
import numpy as np
X = np.array(face_dataset.data) # X: feature matrix, one flattened 64 x 64 image per row
y = np.array(face_dataset.target) # y: class labels (person IDs 0 to 39)
print(X)
print("----------------------------------------------------------------------")
print(y)
Expected text output or note
[[0.30991736 0.3677686 0.41735536 ... 0.15289256 0.16115703 0.1570248 ]
[0.45454547 0.47107437 0.5123967 ... 0.15289256 0.15289256 0.15289256]
[0.3181818 0.40082645 0.49173555 ... 0.14049587 0.14876033 0.15289256]
...
[0.5 0.53305787 0.607438 ... 0.17768595 0.14876033 0.19008264]
[0.21487603 0.21900827 0.21900827 ... 0.57438016 0.59090906 0.60330576]
[0.5165289 0.46280992 0.28099173 ... 0.35950413 0.3553719 0.38429752]]
----------------------------------------------------------------------
[ 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2
2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4
4 4 5 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6 6 6 6 7 7
7 7 7 7 7 7 7 7 8 8 8 8 8 8 8 8 8 8 9 9 9 9 9 9
9 9 9 9 10 10 10 10 10 10 10 10 10 10 11 11 11 11 11 11 11 11 11 11
12 12 12 12 12 12 12 12 12 12 13 13 13 13 13 13 13 13 13 13 14 14 14 14
14 14 14 14 14 14 15 15 15 15 15 15 15 15 15 15 16 16 16 16 16 16 16 16
16 16 17 17 17 17 17 17 17 17 17 17 18 18 18 18 18 18 18 18 18 18 19 19
19 19 19 19 19 19 19 19 20 20 20 20 20 20 20 20 20 20 21 21 21 21 21 21
21 21 21 21 22 22 22 22 22 22 22 22 22 22 23 23 23 23 23 23 23 23 23 23
24 24 24 24 24 24 24 24 24 24 25 25 25 25 25 25 25 25 25 25 26 26 26 26
26 26 26 26 26 26 27 27 27 27 27 27 27 27 27 27 28 28 28 28 28 28 28 28
28 28 29 29 29 29 29 29 29 29 29 29 30 30 30 30 30 30 30 30 30 30 31 31
31 31 31 31 31 31 31 31 32 32 32 32 32 32 32 32 32 32 33 33 33 33 33 33
33 33 33 33 34 34 34 34 34 34 34 34 34 34 35 35 35 35 35 35 35 35 35 35
36 36 36 36 36 36 36 36 36 36 37 37 37 37 37 37 37 37 37 37 38 38 38 38
38 38 38 38 38 38 39 39 39 39 39 39 39 39 39 39]
Listing 15. Get the first image data and reshape it into 64x64 pixels
import matplotlib.pyplot as plt
# take the first image's 4096 values and reshape them into 64 x 64 pixels
first_face_image = X[0].reshape(64, 64)
plt.imshow(first_face_image, cmap="gray") # cmap="gray": low values black, high values white
# alternatives: cmap="binary" (inverted grayscale: low values white, high values black)
# or cmap="cool" (a cyan-to-magenta colormap)
Expected text output or note
<matplotlib.image.AxesImage at 0x7d813582bcb0>
<Figure size 640x480 with 1 Axes>
[visual output omitted; run the code to display the image or chart]
Practice task. From the first 100 images, display every fifth image.
Listing 16. Subplot positions start from 1, while data indexes start from 0
i = 1 # subplot position: starts from 1, unlike array indexes
for row in X[0:100:5]: # 20 images: indexes 0, 5, ..., 95
    plt.subplot(4, 5, i) # 4 x 5 grid; select the i-th cell, counted row by row
    image = row.reshape(64, 64)
    plt.imshow(image, cmap="gray")
    i += 1
Expected text output or note
<Figure size 640x480 with 20 Axes>
[visual output omitted; run the code to display the image or chart]
Listing 17. Second method: enumerate
# second method:
for i, data in enumerate(X[0:100:5], start=1): # without start, enumerate counts from 0
    plt.subplot(4, 5, i)
    image = data.reshape(64, 64)
    plt.imshow(image, cmap="gray")
# enumerate is useful when a loop needs both a counter and each value
Expected text output or note
<Figure size 640x480 with 20 Axes>
[visual output omitted; run the code to display the image or chart]
Practice task. Display images from the 11th to the 29th, every second image, in 2 rows and 5 columns.
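The task counts images from 1, while NumPy slices count from 0 and exclude the stop index. A small sketch of that conversion, using `np.arange(400)` as a stand-in for the row indexes of `X` (the variable names are illustrative):

```python
import numpy as np

# "11th to 29th image, every second one" in 1-based counting
first_ordinal, last_ordinal, step = 11, 29, 2

# The 11th image is index 10; the 29th is index 28. The slice stop is
# exclusive, so any stop above 28 (29 here, 30 in the listing) includes it
start, stop = first_ordinal - 1, last_ordinal
indexes = np.arange(400)[start:stop:step]
print(indexes)  # [10 12 14 16 18 20 22 24 26 28]

# The grid must have at least as many cells as selected images
rows, cols = 2, 5
print(len(indexes), rows * cols)  # 10 10
assert len(indexes) <= rows * cols
```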
Listing 18. Code listing 18
i = 1 # subplot positions start from 1
for row in X[10:30:2]: # indexes 10, 12, ..., 28, i.e. the 11th to the 29th image
    plt.subplot(2, 5, i)
    image = row.reshape(64, 64)
    plt.imshow(image, cmap="gray")
    i += 1
Expected text output or note
<Figure size 640x480 with 10 Axes>
[visual output omitted; run the code to display the image or chart]
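Listings 16 to 18 repeat the same loop with different slices and grid sizes. Below is a minimal sketch of a reusable helper. It swaps in the object-oriented `plt.subplots` API instead of `plt.subplot`, and checks up front that the grid is large enough. Without that check, `plt.subplot` fails only when a position exceeds rows x columns. The name `show_face_grid` is hypothetical. The `matplotlib.use("Agg")` line only lets the sketch run without a display; drop it in a notebook, or figures will not appear inline.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; remove this line in a notebook
import matplotlib.pyplot as plt
import numpy as np


def show_face_grid(X, rows, cols, start=0, stop=None, step=1, side=64):
    """Hypothetical helper: plot X[start:stop:step] as side x side grayscale images."""
    selected = X[start:stop:step]
    if len(selected) > rows * cols:
        raise ValueError(f"{len(selected)} images do not fit in a {rows}x{cols} grid")
    fig, axes = plt.subplots(rows, cols, figsize=(2 * cols, 2 * rows), squeeze=False)
    # axes.flat walks the grid row by row, like subplot positions 1, 2, 3, ...
    for ax in axes.flat:
        ax.axis("off")  # hide ticks and frames, including unused cells
    for ax, row in zip(axes.flat, selected):
        ax.imshow(row.reshape(side, side), cmap="gray")
    return fig


# Usage with random stand-in data shaped like the Olivetti matrix (400 x 4096)
X_demo = np.random.default_rng(0).random((400, 4096))
fig = show_face_grid(X_demo, rows=2, cols=5, start=10, stop=30, step=2)
print(len(fig.axes))  # 10
```

With the real data, `show_face_grid(X, 4, 5, 0, 100, 5)` would cover the first practice task and `show_face_grid(X, 2, 5, 10, 30, 2)` the second.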