Problem-solving guide
This page summarizes the recurring operations from the documentation as tasks you should be able to perform from memory.
Core Python and files
- Use variables to store values, then combine them with expressions.
- Use a loop and accumulator when a result must be built step by step.
- Use `with open(...)` to write text files safely.
- Know which file commands are Colab-specific: `drive.mount`, `files.download`, and `files.upload`.
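The accumulator and file-writing bullets above can be sketched together as follows. This is a minimal illustration, not a prescribed solution: the price values and the file name `totals_example.txt` are made up for the example, and the file goes to the system temp directory so the snippet runs anywhere. The Colab-specific commands are left out because they only work inside Colab.

```python
import os
import tempfile

# Accumulator pattern: start from an initial value and update it in a loop.
prices = [2.50, 4.00, 1.25]  # illustrative values
total = 0.0
for price in prices:
    total += price

# `with` closes the file automatically, even if an error occurs inside the block.
path = os.path.join(tempfile.gettempdir(), "totals_example.txt")  # hypothetical file name
with open(path, "w", encoding="utf-8") as f:
    f.write(f"total={total}\n")

# Read the file back to confirm what was written.
with open(path, encoding="utf-8") as f:
    contents = f.read()

print(contents)
```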
Images
- Load with Pillow using `Image.open(path)`, or with OpenCV using `cv2.imread(path)`.
- Inspect `.size` and `.mode` on Pillow images, and `.shape` on OpenCV (NumPy) arrays.
- Crop with array slicing: `image[y1:y2, x1:x2]`.
- Convert OpenCV BGR images to RGB before displaying with Matplotlib.
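The sketch below illustrates the inspection, cropping, and channel-order points above without needing an image file. It assumes a small synthetic array standing in for what `cv2.imread(path)` would return, and it reverses the channel axis with NumPy slicing instead of calling OpenCV, so only NumPy and Pillow are required.

```python
import numpy as np
from PIL import Image

# Synthetic 4x6 (height x width) image in BGR channel order,
# standing in for the array cv2.imread(path) would return.
bgr = np.zeros((4, 6, 3), dtype=np.uint8)
bgr[..., 0] = 255  # channel 0 is blue in BGR order

# Reverse the channel axis to go from BGR to RGB.
rgb = bgr[..., ::-1]

# Crop rows 1..2 and columns 2..4: rows (y) come first, then columns (x).
crop = rgb[1:3, 2:5]

# The same pixels as a Pillow image.
pil_image = Image.fromarray(np.ascontiguousarray(rgb))

print(rgb.shape)                       # NumPy order: (height, width, channels)
print(crop.shape)
print(pil_image.size, pil_image.mode)  # Pillow order: (width, height), then the mode
```

Note that Pillow reports `.size` as (width, height), while the array `.shape` is (height, width, channels).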
Datasets and arrays
- For scikit-learn datasets, use `dataset.data` for features and `dataset.target` for labels.
- Use `dataset.target_names` and `dataset.feature_names` to interpret the numbers.
- Use NumPy slicing: `array[start:stop:step, columns]`.
- Use `np.unique(y, return_counts=True)` to count classes.
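As one possible illustration of the bullets above, the following sketch uses the Iris dataset bundled with scikit-learn. The choice of dataset and the particular slice are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import load_iris

dataset = load_iris()
X = dataset.data    # feature matrix, one row per sample
y = dataset.target  # integer class labels

# Names that explain what the numeric columns and labels mean.
print(dataset.feature_names)
print(dataset.target_names)

# Slicing: every 10th row among the first 50, keeping the first two columns.
subset = X[0:50:10, :2]
print(subset.shape)

# Count how many samples belong to each class.
classes, counts = np.unique(y, return_counts=True)
for label, count in zip(classes, counts):
    print(dataset.target_names[label], count)
```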
Train/test splitting
- Manual slicing is possible, but `train_test_split` is the standard tool.
- Always split X and y together so rows and labels remain aligned.
- Use `random_state` when you want reproducible splits.
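A small sketch of the splitting rules above, using toy data constructed so that alignment is easy to check: row i of `X` starts with the value 2*i and has label i. The `test_size` and `random_state` values are arbitrary choices for the example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: row i is [2*i, 2*i + 1] and its label is i.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Passing X and y in one call shuffles them with the same permutation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Alignment check: the first column of each row should equal 2 * its label.
aligned = np.array_equal(X_test[:, 0], 2 * y_test)

# Reproducibility check: the same random_state gives the same split.
_, X_test_again, _, _ = train_test_split(X, y, test_size=0.3, random_state=42)
reproducible = np.array_equal(X_test, X_test_again)

print(X_train.shape, X_test.shape, aligned, reproducible)
```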
Plotting
- Use `plt.plot` for line plots and `plt.scatter` for point clouds.
- Use labels and legends to distinguish multiple series.
- Use `plt.subplot(rows, columns, position)` to build grids of many images.
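The plotting bullets above might look like this in practice. The data is synthetic (sine and cosine curves, random points, random 8x8 "images"), and the `Agg` backend is selected only so the script runs without a display. In a notebook you would typically drop that line and call `plt.show()`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 2 * np.pi, 50)

# Figure 1: a line plot with two labelled series next to a scatter plot.
fig = plt.figure(figsize=(8, 3))

plt.subplot(1, 2, 1)
plt.plot(x, np.sin(x), label="sin(x)")
plt.plot(x, np.cos(x), label="cos(x)")
plt.xlabel("x")
plt.ylabel("value")
plt.legend()

plt.subplot(1, 2, 2)
points = rng.normal(size=(100, 2))
plt.scatter(points[:, 0], points[:, 1], s=10)
plt.title("Point cloud")

# Figure 2: a 2x3 grid of images; subplot positions start at 1.
images = rng.random((6, 8, 8))
grid = plt.figure(figsize=(6, 4))
for i, img in enumerate(images):
    plt.subplot(2, 3, i + 1)
    plt.imshow(img, cmap="gray")
    plt.axis("off")
```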
Clustering
- Clustering is unsupervised: it groups data without true class labels.
- Create `KMeans(n_clusters=...)`, then use `fit` and `predict`.
- Inspect the centroids in `cluster_centers_` to see where each cluster is centered.
- Use the elbow and silhouette methods to compare candidate numbers of clusters.
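A minimal sketch of the clustering workflow above, assuming two synthetic, well-separated blobs of points. The blob positions, `n_init=10`, and the range of k values tried are all choices made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Two synthetic blobs, 50 points each, centered near (0, 0) and (5, 5).
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=(0, 0), scale=0.3, size=(50, 2))
blob_b = rng.normal(loc=(5, 5), scale=0.3, size=(50, 2))
X = np.vstack([blob_a, blob_b])

# Fit the model, then assign each point to a cluster.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
model.fit(X)
labels = model.predict(X)
centroids = model.cluster_centers_
print(centroids)

# Compare candidate k values: inertia feeds the elbow method,
# and the silhouette score is higher for better-separated clusters.
inertias = {}
silhouettes = {}
for k in range(2, 5):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = km.inertia_
    silhouettes[k] = silhouette_score(X, km.labels_)
    print(k, inertias[k], silhouettes[k])
```

Inertia always decreases as k grows, so the elbow method looks for the point where the decrease levels off rather than for the minimum.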
Regression
- Regression predicts continuous numbers.
- Use `LinearRegression().fit(X_train, y_train)`.
- For one feature, reshape X with `reshape(-1, 1)`.
- Read `intercept_` and `coef_` as the intercept and slope(s) of the regression equation.
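The regression steps above can be sketched on noise-free toy data generated from y = 2x + 1, so the fitted equation is known in advance. For brevity the example fits on all five points rather than on a separate training split.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data that follows y = 2x + 1 exactly.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# scikit-learn expects a 2D feature matrix: (n_samples,) becomes (n_samples, 1).
X = x.reshape(-1, 1)

model = LinearRegression().fit(X, y)

# Regression equation: y = intercept_ + coef_[0] * x
slope = model.coef_[0]
intercept = model.intercept_
print(f"y = {intercept:.2f} + {slope:.2f} * x")

# Predict for a new x value; the input must also be 2D.
prediction = model.predict(np.array([[5.0]]))[0]
print(prediction)
```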
Classification
- Classification predicts discrete class labels.
- Train KNN with `KNeighborsClassifier(n_neighbors=...)`.
- Measure performance with accuracy, confusion matrix, precision, recall, F1-score, and `classification_report`.
- Compare models by using the same train/test split.
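One way to put the classification bullets together is sketched below. It assumes the Iris dataset and compares two KNN settings (k = 1 and k = 5, chosen arbitrarily) on the same train/test split so that the accuracy figures are comparable.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# One split, reused for every model being compared.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Train each candidate model on the same training rows
# and score it on the same test rows.
accuracies = {}
for k in (1, 5):
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    y_pred = model.predict(X_test)
    accuracies[k] = accuracy_score(y_test, y_pred)
print(accuracies)

# Detailed metrics for the last model trained (k = 5).
matrix = confusion_matrix(y_test, y_pred)  # rows: true class, columns: predicted class
report = classification_report(y_test, y_pred)  # precision, recall, F1-score per class
print(matrix)
print(report)
```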