RoastAI: Coffee Production Analysis & Classification Model

Authors: Diego Aguiar, Mariana Sambucetti, Catalina Vaz Martins

---

Project Overview

This notebook presents a comprehensive analysis of global coffee production trends and a deep learning model for coffee bean roast classification.

Contents:

1. Exploratory Data Analysis (EDA) - Global coffee production and trade patterns

2. Computer Vision Model - ResNet18-based classifier for coffee roast levels

3. Model Evaluation - Performance metrics and Grad-CAM analysis

---

Part 1: Exploratory Data Analysis

Dataset Context

This dataset is generated from USDA - United States Department of Agriculture consisting of coffee production, supply, and distribution:

Commodity: Coffee, Green
Countries: 94
Years: 1960-2023
Unit Description: 1,000 60-kg bags

1.1 Setup and Data Loading

# Install necessary packages !pip install kagglehub plotly -q print("✓ Packages installed")

✓ Packages installed

# Import libraries import pandas as pd import numpy as np import matplotlib.pyplot as plt import seaborn as sns import plotly.express as px import kagglehub import os import warnings warnings.filterwarnings('ignore') # Configure plot style plt.style.use('seaborn-v0_8-whitegrid') sns.set_palette("husl") print("✓ Libraries imported successfully")

✓ Libraries imported successfully

# Download and load dataset path = kagglehub.dataset_download("parasrupani/coffee-distribution-across-94-counties") csv_file = [f for f in os.listdir(path) if f.endswith('.csv')][0] df = pd.read_csv(os.path.join(path, csv_file)) print(f"Dataset loaded successfully") print(f"Shape: {df.shape[0]} rows × {df.shape[1]} columns") print(f"Period: {df['Year'].min()} - {df['Year'].max()}") df.head()

Warning: Looks like you're using an outdated `kagglehub` version (installed: 0.3.13), please consider upgrading to the latest version (0.4.1).
Downloading from https://www.kaggle.com/api/v1/datasets/download/parasrupani/coffee-distribution-across-94-counties?dataset_version_number=4...

100%|██████████| 220k/220k [00:00<00:00, 577kB/s]

Extracting files...

Dataset loaded successfully
Shape: 6016 rows × 21 columns
Period: 1960 - 2023

   Country  Year  Arabica Production  Bean Exports  Bean Imports  \
0  Albania  1960                   0             0             0   
1  Albania  1961                   0             0             0   
2  Albania  1962                   0             0             0   
3  Albania  1963                   0             0             0   
4  Albania  1964                   0             0             0   

   Beginning Stocks  Domestic Consumption  Ending Stocks  Exports  Imports  \
0                 0                     0              0        0        0   
1                 0                     0              0        0        0   
2                 0                     0              0        0        0   
3                 0                     0              0        0        0   
4                 0                     0              0        0        0   

   ...  Production  Roast & Ground Exports  Roast & Ground Imports  \
0  ...           0                       0                       0   
1  ...           0                       0                       0   
2  ...           0                       0                       0   
3  ...           0                       0                       0   
4  ...           0                       0                       0   

   Robusta Production  Rst,Ground Dom. Consum  Soluble Dom. Cons.  \
0                   0                       0                   0   
1                   0                       0                   0   
2                   0                       0                   0   
3                   0                       0                   0   
4                   0                       0                   0   

   Soluble Exports  Soluble Imports  Total Distribution  Total Supply  
0                0                0                   0             0  
1                0                0                   0             0  
2                0                0                   0             0  
3                0                0                   0             0  
4                0                0                   0             0  

[5 rows x 21 columns]

	Country	Year	...
0	Albania	1960	...
1	Albania	1961	...
2	Albania	1962	...
3	Albania	1963	...
4	Albania	1964	...

5 rows × 21 columns

1.2 Key Insight #1: Global Coffee Production Keeps Rising

fig, ax = plt.subplots(figsize=(14, 6)) # Calculate yearly totals for the main metric ('Production') yearly_totals = df.groupby('Year')[main_metric].sum() # Line plot ax.plot( yearly_totals.index, yearly_totals.values, linewidth=4, color='#8B4513' ) # Title (left-aligned) ax.text( 0.0, 1.08, 'Global Production Evolution', transform=ax.transAxes, fontsize=18, fontweight='bold', ha='left' ) # Subtitle (left-aligned, below title) ax.text( 0.0, 1.02, '1,000 of 60-kg bags (1960–2023)', transform=ax.transAxes, fontsize=14, ha='left' ) # Remove axis labels ax.set_xlabel('') ax.set_ylabel('') # Remove grid ax.grid(False) # Clean ticks ax.tick_params(axis='both', which='both', length=0) plt.tight_layout() plt.show()

1.3 Key Insight #2: Production Power is Concentrated in Latin America

# ============================================================================ # CONTINENT PRODUCTION – LAST YEAR (EXECUTIVE VERSION) # ============================================================================ import matplotlib.pyplot as plt # ------------------------------------------------------------------ # Map countries to continents # ------------------------------------------------------------------ continent_mapping = { 'Brazil': 'South America', 'Colombia': 'South America', 'Vietnam': 'Asia', 'Indonesia': 'Asia', 'Ethiopia': 'Africa', 'Mexico': 'North America', 'India': 'Asia', 'Uganda': 'Africa', "Cote d'Ivoire": 'Africa', 'Guatemala': 'North America', 'Honduras': 'North America', 'Peru': 'South America', 'El Salvador': 'North America', 'Costa Rica': 'North America', 'Nicaragua': 'North America', 'Kenya': 'Africa', 'Ecuador': 'South America', 'Cameroon': 'Africa', 'Congo (Kinshasa)': 'Africa', 'Venezuela': 'South America', 'Philippines': 'Asia', 'Tanzania': 'Africa', 'Madagascar': 'Africa', 'Thailand': 'Asia', 'Papua New Guinea': 'Oceania', 'Dominican Republic': 'North America', 'Burundi': 'Africa', 'Rwanda': 'Africa', 'Haiti': 'North America', 'Bolivia': 'South America', 'Angola': 'Africa', 'Central African Republic': 'Africa', 'Cuba': 'North America', 'Panama': 'North America', 'Yemen': 'Asia', 'Sierra Leone': 'Africa', 'Togo': 'Africa', 'Ghana': 'Africa', 'Paraguay': 'South America', 'Laos': 'Asia', 'Liberia': 'Africa', 'Guinea': 'Africa', 'Trinidad and Tobago': 'North America', 'Gabon': 'Africa', 'Jamaica': 'North America', 'Equatorial Guinea': 'Africa', 'Guyana': 'South America', 'Zimbabwe': 'Africa', 'Malawi': 'Africa', 'Yemen (Sanaa)': 'Asia', 'Zambia': 'Africa', 'Algeria': 'Africa', 'Argentina': 'South America', 'Australia': 'Oceania', 'Canada': 'North America', 'Chile': 'South America', 'China': 'Asia', 'Egypt': 'Africa', 'Iran': 'Asia', 'Japan': 'Asia', 'Korea, South': 'Asia', 'Morocco': 'Africa', 'New Zealand': 'Oceania', 'Saudi Arabia': 'Asia', 'Singapore': 'Asia', 'South Africa': 'Africa', 'Taiwan': 'Asia', 'United States': 'North America', 'Uruguay': 'South America', 'Sri Lanka': 'Asia', 'Congo (Brazzaville)': 'Africa' } df['Continent'] = df['Country'].map(continent_mapping) df['Continent'] = df['Continent'].fillna('Unknown') # ------------------------------------------------------------------ # Filter last available year # ------------------------------------------------------------------ last_year = df['Year'].max() df_last_year = df[df['Year'] == last_year] # ------------------------------------------------------------------ # Aggregate by continent # ------------------------------------------------------------------ continent_totals = ( df_last_year .groupby('Continent')[main_metric] .sum() ) # ------------------------------------------------------------------ # Remove unwanted / empty categories # ------------------------------------------------------------------ exclude_continents = ['Europe', 'Asia/Europe', 'Europe/Asia', 'Unknown'] continent_totals = continent_totals.drop( labels=[c for c in exclude_continents if c in continent_totals.index] ) # Sort for horizontal bar chart continent_totals = continent_totals.sort_values(ascending=True) # ------------------------------------------------------------------ # Calculate % of global production for South America # ------------------------------------------------------------------ highlight_continent = 'South America' total_global = continent_totals.sum() south_america_value = continent_totals.loc[highlight_continent] south_america_pct = (south_america_value / total_global) * 100 # ------------------------------------------------------------------ # Colors (matched to value-chain chart) # ------------------------------------------------------------------ highlight_color = '#6E8F7A' # muted green (value-added tone) neutral_color = '#B0B0B0' # muted gray (upstream tone) colors = [ highlight_color if c == highlight_continent else neutral_color for c in continent_totals.index ] # ------------------------------------------------------------------ # Plot # ------------------------------------------------------------------ fig, ax = plt.subplots(figsize=(14, 6)) ax.barh( continent_totals.index, continent_totals.values, color=colors, edgecolor='black', linewidth=1 ) # ------------------------------------------------------------------ # Title & subtitle (left-aligned) # ------------------------------------------------------------------ ax.text( 0.0, 1.08, 'Global Coffee Production by Region', transform=ax.transAxes, fontsize=18, fontweight='bold', ha='left' ) ax.text( 0.0, 1.02, f'1,000 of 60-kg bags ({last_year})', transform=ax.transAxes, fontsize=14, ha='left' ) # ------------------------------------------------------------------ # Clean axes # ------------------------------------------------------------------ ax.set_xlabel('') ax.set_ylabel('') ax.tick_params(axis='both', which='both', length=0, labelsize=12) ax.grid(False) # ------------------------------------------------------------------ # Annotation: % of global production (South America only) # ------------------------------------------------------------------ max_value = continent_totals.max() for i, (continent, value) in enumerate(continent_totals.items()): if continent == highlight_continent: ax.text( value + max_value * 0.015, i, f'{south_america_pct:.1f}%', va='center', fontsize=12, fontweight='bold', color=highlight_color ) plt.tight_layout() plt.show()

1.4 Key Insight #3: Positioning Coffee Along the Value Chain

# Export composition analysis last_year = df['Year'].max() recent = df[df['Year'] == last_year].copy() # Define continents continent_map = { 'Africa': ['Ethiopia', 'Uganda', 'Ivory Coast', "Cote d'Ivoire", 'Kenya', 'Tanzania', 'Rwanda', 'Burundi', 'Cameroon', 'Madagascar', 'Malawi', 'Zimbabwe'], 'Asia': ['Vietnam', 'Indonesia', 'India', 'China', 'Philippines', 'Thailand', 'Laos', 'Myanmar', 'Papua New Guinea', 'Yemen'], 'North America': ['Mexico', 'Guatemala', 'Honduras', 'Nicaragua', 'El Salvador', 'Costa Rica', 'Panama', 'Belize', 'United States', 'Canada'], 'South America': ['Brazil', 'Colombia', 'Peru', 'Ecuador', 'Bolivia', 'Venezuela'], 'Oceania': ['Australia', 'New Zealand', 'Papua New Guinea'], } def get_continent(country): for continent, countries in continent_map.items(): if country in countries: return continent return 'Unknown' recent['Continent'] = recent['Country'].apply(get_continent) # Aggregate exports by continent exports = recent.groupby('Continent')[[ 'Bean Exports', 'Roast & Ground Exports', 'Soluble Exports' ]].sum() # Remove unwanted categories drop_continents = ['Europe', 'Asia/Europe', 'Europe/Asia', 'Unknown'] exports = exports.drop(labels=[c for c in drop_continents if c in exports.index]) # Calculate totals and percentages exports['processed_exports'] = exports['Roast & Ground Exports'] + exports['Soluble Exports'] exports['green_exports'] = exports['Bean Exports'] exports['Total'] = exports['Bean Exports'] + exports['Roast & Ground Exports'] + exports['Soluble Exports'] exports['processed_pct'] = (exports['processed_exports'] / exports['Total']) * 100 exports['green_pct'] = (exports['green_exports'] / exports['Total']) * 100 # Sort by total exports = exports.sort_values('Total', ascending=True) # Plot fig, ax = plt.subplots(figsize=(12, 6)) ax.barh( exports.index, exports['processed_pct'], color='#6E8F7A', label='Value-Added Coffee (Roasted + Soluble)' ) ax.barh( exports.index, exports['green_pct'], left=exports['processed_pct'], color='#B0B0B0', label='Green Coffee (Raw Beans)' ) # Add percentage labels for i, continent in enumerate(exports.index): proc_pct = exports.loc[continent, 'processed_pct'] green_pct = exports.loc[continent, 'green_pct'] # Processed coffee label if proc_pct >= 1: if proc_pct >= 8: x = proc_pct / 2 ha = 'center' color = 'white' else: x = proc_pct + 1 ha = 'left' color = '#2F3E34' ax.text(x, i, f"{proc_pct:.0f}%", va='center', ha=ha, fontsize=12, fontweight='bold', color=color) # Green coffee label if green_pct >= 1: if green_pct >= 8: x = proc_pct + green_pct / 2 ha = 'center' color = 'black' else: x = proc_pct + green_pct + 1 ha = 'left' color = '#555555' ax.text(x, i, f"{green_pct:.0f}%", va='center', ha=ha, fontsize=12, fontweight='bold', color=color) # Titles ax.set_title( f'Positioning Coffee Along the Value Chain\n(% of exports by continent, {last_year})', fontsize=16, fontweight='bold', loc='left' ) ax.set_xlabel('') ax.set_ylabel('') ax.set_xlim(0, 100) ax.tick_params(axis='both', length=0, labelsize=12) ax.grid(False) ax.legend( frameon=False, fontsize=11, loc='center left', bbox_to_anchor=(1.02, 0.5) ) plt.tight_layout() plt.show() print(f"\n💎 Value Chain Analysis:") print(f" • Regions exporting more processed coffee capture higher value") print(f" • South America: {exports.loc['South America', 'processed_pct']:.1f}% processed") print(f" • Africa: {exports.loc['Africa', 'processed_pct']:.1f}% processed")

💎 Value Chain Analysis:
   • Regions exporting more processed coffee capture higher value
   • South America: 9.9% processed
   • Africa: 2.0% processed

---

Part 2: Coffee Bean Classification Model

2.1 Project Context

Building on the production analysis, we now develop a computer vision model to classify coffee beans by roast level.

Coffee Roast Levels

Green - Unroasted beans
Light - Light roast
Medium - Medium roast
Dark - Dark roast

Model Architecture

Base Model: ResNet18 (pretrained on ImageNet)
Input Size: 224×224 pixels
Output: 4 classes (roast levels)
Training Device: CPU-optimized

2.2 Environment Setup

# Install required packages for deep learning !pip install kagglehub opencv-python seaborn imagehash pillow-heif scikit-learn -q print("✓ Deep learning packages installed")

[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m296.7/296.7 kB[0m [31m4.6 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m5.6/5.6 MB[0m [31m39.5 MB/s[0m eta [36m0:00:00[0m
[?25h✓ Deep learning packages installed

# Import deep learning libraries import torch import torch.nn as nn import torch.nn.functional as F import torch.optim as optim from torch.utils.data import DataLoader import torchvision.models as models from torchvision import transforms from torchvision.datasets import ImageFolder from sklearn.metrics import confusion_matrix, classification_report from sklearn.model_selection import train_test_split import cv2 from PIL import Image import imagehash import zipfile import shutil from pathlib import Path from collections import defaultdict print("✓ Deep learning libraries imported successfully")

✓ Deep learning libraries imported successfully

# Mount Google Drive from google.colab import drive drive.mount('/content/drive')

Mounted at /content/drive

2.3 Dataset Preparation

This section checks for a pre-processed dataset in Google Drive. If found, it loads directly. Otherwise, it downloads and processes the raw dataset from Kaggle.

# Check for existing processed dataset PROCESSED_DATASET_PATH = '/content/drive/MyDrive/RoastAI/processed_dataset' print("="*70) print("CHECKING FOR PROCESSED DATASET") print("="*70) if os.path.exists(PROCESSED_DATASET_PATH): train_path = os.path.join(PROCESSED_DATASET_PATH, 'train') test_path = os.path.join(PROCESSED_DATASET_PATH, 'test') if os.path.exists(train_path) and os.path.exists(test_path): print("\n✅ Found existing processed dataset in Drive!") print(f" Location: {PROCESSED_DATASET_PATH}") print("\n📊 Dataset Statistics:") for split_name, split_path in [('TRAIN', train_path), ('TEST', test_path)]: print(f"\n {split_name}:") total = 0 for class_name in ['Dark', 'Green', 'Light', 'Medium']: class_path = os.path.join(split_path, class_name) if os.path.exists(class_path): count = len([f for f in os.listdir(class_path) if f.lower().endswith(('.jpg', '.jpeg', '.png'))]) total += count print(f" {class_name:8s}: {count} images") print(f" ─────────────────") print(f" Total: {total} images") train_dir = train_path test_dir = test_path USE_EXISTING = True print("\n💡 Using existing dataset. Skipping data preparation.") else: print("\n⚠️ Processed dataset folder exists but incomplete.") USE_EXISTING = False else: print("\n📁 No processed dataset found.") USE_EXISTING = False print("\n" + "="*70)

======================================================================
CHECKING FOR PROCESSED DATASET
======================================================================

✅ Found existing processed dataset in Drive!
   Location: /content/drive/MyDrive/RoastAI/processed_dataset

📊 Dataset Statistics:

   TRAIN:
      Dark    : 284 images
      Green   : 412 images
      Light   : 287 images
      Medium  : 430 images
      ─────────────────
      Total:    1413 images

   TEST:
      Dark    : 71 images
      Green   : 104 images
      Light   : 72 images
      Medium  : 108 images
      ─────────────────
      Total:    355 images

💡 Using existing dataset. Skipping data preparation.

======================================================================

# Data preparation (only runs if no processed dataset exists) if not USE_EXISTING: print("Starting full data preparation...") print("This may take 5-10 minutes on first run.") # Download Kaggle dataset print("\n1️⃣ Downloading Kaggle dataset...") kaggle_dataset_path = kagglehub.dataset_download("sot2542/coffee-bean-dataset-v1") kaggle_train = os.path.join(kaggle_dataset_path, 'data_image_png_3024x3024', 'coffee_dataset', 'train') # Access Drive images your_images_path = '/content/drive/MyDrive/RoastAI/Fotos' # Enable HEIC support from pillow_heif import register_heif_opener register_heif_opener() print("\n2️⃣ Processing and combining datasets...") # [Data processing code would go here - abbreviated for length] # This includes: extraction, HEIC conversion, pooling, duplicate removal, # and train/test splitting print("\n✓ Data preparation complete!") print(f" Dataset saved to: {PROCESSED_DATASET_PATH}") else: print("\n✓ Skipping data preparation - using existing processed dataset")

✓ Skipping data preparation - using existing processed dataset

2.4 Data Visualization

# Visualize class distribution from collections import Counter class_names = ['Dark', 'Green', 'Light', 'Medium'] class_counts = {} for class_name in class_names: class_path = os.path.join(train_dir, class_name) if os.path.exists(class_path): count = len([f for f in os.listdir(class_path) if f.lower().endswith(('.jpg', '.jpeg', '.png'))]) class_counts[class_name] = count values = [class_counts[c] for c in class_names] # Muted executive colors colors = [ '#5C4A3A', # Dark - dark muted brown '#6E7F73', # Green - muted olive green '#B8A996', # Light - light muted brown '#8A735C', # Medium - medium muted brown ] fig, ax = plt.subplots(figsize=(8, 5)) bars = ax.bar(class_names, values, color=colors, edgecolor='black') ax.text( 0.0, 1.08, 'Balanced Data Across Coffee Roast Levels', transform=ax.transAxes, fontsize=18, fontweight='bold', ha='left' ) ax.text( 0.0, 1.02, 'Number of images per roast level', transform=ax.transAxes, fontsize=14, ha='left' ) ax.grid(False) # Value labels for bar in bars: height = bar.get_height() ax.text( bar.get_x() + bar.get_width() / 2, height, f'{int(height)}', ha='center', va='bottom', fontsize=11 ) plt.tight_layout() plt.show()

2.5 Model Definition

We use ResNet18 as our base architecture, modified for our 4-class classification task.

# Define data transforms transform = transforms.Compose([ transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(), transforms.Normalize( mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225] ) ]) # Load datasets train_dataset = ImageFolder(train_dir, transform=transform) test_dataset = ImageFolder(test_dir, transform=transform) train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True, num_workers=2) test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False, num_workers=2) print(f"✓ Datasets loaded") print(f" Training images: {len(train_dataset)}") print(f" Test images: {len(test_dataset)}") print(f" Classes: {train_dataset.classes}")

✓ Datasets loaded
  Training images: 1413
  Test images: 355
  Classes: ['Dark', 'Green', 'Light', 'Medium']

# Initialize model device = torch.device('cpu') print(f"Using device: {device}") model = models.resnet18(pretrained=True) num_features = model.fc.in_features model.fc = nn.Linear(num_features, 4) # 4 classes model = model.to(device) # Loss and optimizer criterion = nn.CrossEntropyLoss() optimizer = optim.Adam(model.parameters(), lr=0.0001) print("\n✓ Model initialized") print(f" Architecture: ResNet18") print(f" Output classes: 4") print(f" Optimizer: Adam (lr=0.0001)")

Using device: cpu
Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /root/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth

100%|██████████| 44.7M/44.7M [00:00<00:00, 159MB/s]

✓ Model initialized
  Architecture: ResNet18
  Output classes: 4
  Optimizer: Adam (lr=0.0001)

2.6 Model Training

# Training function def train_model(model, train_loader, criterion, optimizer, device, epochs=2): model.train() history = {'train_loss': [], 'train_acc': []} for epoch in range(epochs): running_loss = 0.0 correct = 0 total = 0 print(f"\nEpoch {epoch+1}/{epochs}") print("-" * 50) for batch_idx, (inputs, labels) in enumerate(train_loader): inputs, labels = inputs.to(device), labels.to(device) optimizer.zero_grad() outputs = model(inputs) loss = criterion(outputs, labels) loss.backward() optimizer.step() running_loss += loss.item() _, predicted = outputs.max(1) total += labels.size(0) correct += predicted.eq(labels).sum().item() if (batch_idx + 1) % 10 == 0: print(f" Batch {batch_idx+1}/{len(train_loader)} - " f"Loss: {loss.item():.4f} - " f"Acc: {100.*correct/total:.2f}%") epoch_loss = running_loss / len(train_loader) epoch_acc = 100. * correct / total history['train_loss'].append(epoch_loss) history['train_acc'].append(epoch_acc) print(f"\n Epoch {epoch+1} Summary:") print(f" Loss: {epoch_loss:.4f}") print(f" Accuracy: {epoch_acc:.2f}%") return history # Train the model print("\n" + "="*70) print("STARTING MODEL TRAINING") print("="*70) history = train_model(model, train_loader, criterion, optimizer, device, epochs=2) print("\n" + "="*70) print("TRAINING COMPLETE") print("="*70)

======================================================================
STARTING MODEL TRAINING
======================================================================

Epoch 1/2
--------------------------------------------------
  Batch 10/45 - Loss: 0.1884 - Acc: 73.75%
  Batch 20/45 - Loss: 0.0851 - Acc: 85.16%
  Batch 30/45 - Loss: 0.0366 - Acc: 88.33%
  Batch 40/45 - Loss: 0.0958 - Acc: 90.39%

  Epoch 1 Summary:
    Loss: 0.2432
    Accuracy: 91.15%

Epoch 2/2
--------------------------------------------------
  Batch 10/45 - Loss: 0.0136 - Acc: 98.75%
  Batch 20/45 - Loss: 0.0043 - Acc: 98.75%
  Batch 30/45 - Loss: 0.0107 - Acc: 98.85%
  Batch 40/45 - Loss: 0.0146 - Acc: 98.98%

  Epoch 2 Summary:
    Loss: 0.0353
    Accuracy: 99.08%

======================================================================
TRAINING COMPLETE
======================================================================

2.7 Model Evaluation

# Evaluate model on test set def evaluate_model(model, test_loader, device): model.eval() correct = 0 total = 0 all_preds = [] all_labels = [] with torch.no_grad(): for inputs, labels in test_loader: inputs, labels = inputs.to(device), labels.to(device) outputs = model(inputs) _, predicted = outputs.max(1) total += labels.size(0) correct += predicted.eq(labels).sum().item() all_preds.extend(predicted.cpu().numpy()) all_labels.extend(labels.cpu().numpy()) accuracy = 100. * correct / total return accuracy, all_preds, all_labels print("\nEvaluating model on test set...") accuracy, predictions, true_labels = evaluate_model(model, test_loader, device) print(f"\n✓ Test Accuracy: {accuracy:.2f}%")

Evaluating model on test set...

✓ Test Accuracy: 98.59%

# Confusion Matrix cm = confusion_matrix(true_labels, predictions) plt.figure(figsize=(10, 8)) sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=test_dataset.classes, yticklabels=test_dataset.classes, cbar_kws={'label': 'Count'}) plt.title('Confusion Matrix', fontsize=16, fontweight='bold', pad=20) plt.ylabel('True Label', fontsize=12) plt.xlabel('Predicted Label', fontsize=12) plt.tight_layout() plt.show() # Classification Report print("\n📊 Classification Report:") print("="*70) print(classification_report(true_labels, predictions, target_names=test_dataset.classes, digits=3))

📊 Classification Report:
======================================================================
              precision    recall  f1-score   support

        Dark      0.972     0.972     0.972        71
       Green      1.000     1.000     1.000       104
       Light      0.972     0.958     0.965        72
      Medium      0.991     1.000     0.995       108

    accuracy                          0.986       355
   macro avg      0.984     0.983     0.983       355
weighted avg      0.986     0.986     0.986       355

2.8 Grad-CAM Analysis

Grad-CAM (Gradient-weighted Class Activation Mapping) helps us understand which parts of the image the model focuses on when making predictions.

# Grad-CAM implementation class GradCAM: def __init__(self, model, target_layer): self.model = model self.target_layer = target_layer self.gradients = None self.activations = None self.target_layer.register_forward_hook(self.save_activation) self.target_layer.register_backward_hook(self.save_gradient) def save_activation(self, module, input, output): self.activations = output.detach() def save_gradient(self, module, grad_input, grad_output): self.gradients = grad_output[0].detach() def generate_cam(self, input_tensor, target_class=None): self.model.eval() output = self.model(input_tensor) if target_class is None: target_class = output.argmax(dim=1) self.model.zero_grad() output[0, target_class].backward() gradients = self.gradients[0] activations = self.activations[0] weights = gradients.mean(dim=(1, 2), keepdim=True) cam = (weights * activations).sum(dim=0) cam = F.relu(cam) cam = cam - cam.min() cam = cam / (cam.max() + 1e-8) return cam.cpu().numpy() # Initialize Grad-CAM target_layer = model.layer4[-1] gradcam = GradCAM(model, target_layer) print("✓ Grad-CAM initialized")

✓ Grad-CAM initialized

# Analyze attention patterns def analyze_attention(cam, threshold=0.3): h, w = cam.shape mask = cam > threshold # Center attention center_h, center_w = h // 2, w // 2 radius = min(h, w) // 4 y, x = np.ogrid[:h, :w] center_mask = (x - center_w)**2 + (y - center_h)**2 <= radius**2 center_attention = (mask & center_mask).sum() / center_mask.sum() # Edge attention edge_width = min(h, w) // 8 edge_mask = np.zeros((h, w), dtype=bool) edge_mask[:edge_width, :] = True edge_mask[-edge_width:, :] = True edge_mask[:, :edge_width] = True edge_mask[:, -edge_width:] = True edge_attention = (mask & edge_mask).sum() / edge_mask.sum() # Entropy flat_cam = cam.flatten() flat_cam = flat_cam / (flat_cam.sum() + 1e-8) entropy = -np.sum(flat_cam * np.log(flat_cam + 1e-8)) return center_attention, edge_attention, entropy # Collect Grad-CAM results print("\nGenerating Grad-CAM visualizations...") all_results = [] model.eval() # Removed 'with torch.no_grad():' as Grad-CAM requires gradients for idx, (images, labels) in enumerate(test_loader): if idx >= 5: # Analyze first 5 batches break images = images.to(device) for i in range(images.size(0)): img_tensor = images[i:i+1] label = labels[i].item() # Ensure img_tensor requires gradients for Grad-CAM computation img_tensor.requires_grad_(True) # Generate CAM cam = gradcam.generate_cam(img_tensor, target_class=label) # Set requires_grad back to False if not needed for further computations img_tensor.requires_grad_(False) cam_resized = cv2.resize(cam, (224, 224)) # Analyze attention center_att, edge_att, entropy = analyze_attention(cam_resized) all_results.append({ 'image_tensor': images[i].cpu(), 'true_class': test_dataset.classes[label], 'cam': cam_resized, 'center_attention': center_att, 'edge_attention': edge_att, 'attention_entropy': entropy }) print(f"✓ Analyzed {len(all_results)} images")

Generating Grad-CAM visualizations...
✓ Analyzed 160 images

2.9 Comprehensive Analysis Visualization

# Create comprehensive analysis figure fig = plt.figure(figsize=(20, 16)) gs = fig.add_gridspec(4, 4, hspace=0.3, wspace=0.3) # Row 1: Confusion Matrix and Classification Metrics ax1 = fig.add_subplot(gs[0, :2]) sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', ax=ax1, xticklabels=test_dataset.classes, yticklabels=test_dataset.classes, cbar_kws={'label': 'Count'}) ax1.set_title('Confusion Matrix', fontsize=14, fontweight='bold') ax1.set_ylabel('True Label') ax1.set_xlabel('Predicted Label') # Calculate metrics per class ax2 = fig.add_subplot(gs[0, 2:]) class_metrics = [] for i, class_name in enumerate(test_dataset.classes): tp = cm[i, i] fp = cm[:, i].sum() - tp fn = cm[i, :].sum() - tp precision = tp / (tp + fp) if (tp + fp) > 0 else 0 recall = tp / (tp + fn) if (tp + fn) > 0 else 0 f1 = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0 class_metrics.append({'Class': class_name, 'Precision': precision, 'Recall': recall, 'F1-Score': f1}) metrics_df = pd.DataFrame(class_metrics) x = np.arange(len(test_dataset.classes)) width = 0.25 ax2.bar(x - width, metrics_df['Precision'], width, label='Precision', color='#2E86AB') ax2.bar(x, metrics_df['Recall'], width, label='Recall', color='#A23B72') ax2.bar(x + width, metrics_df['F1-Score'], width, label='F1-Score', color='#F18F01') ax2.set_xlabel('Class') ax2.set_ylabel('Score') ax2.set_title('Classification Metrics by Class', fontsize=14, fontweight='bold') ax2.set_xticks(x) ax2.set_xticklabels(test_dataset.classes) ax2.legend() ax2.set_ylim(0, 1.1) ax2.grid(axis='y', alpha=0.3) # Row 2: Attention Metrics avg_center = np.mean([r['center_attention'] for r in all_results]) avg_edge = np.mean([r['edge_attention'] for r in all_results]) avg_entropy = np.mean([r['attention_entropy'] for r in all_results]) ax3 = fig.add_subplot(gs[1, :2]) metrics_summary = { 'Center\nAttention': avg_center, 'Edge\nAttention': avg_edge, 'Attention\nEntropy': avg_entropy / 5 # Normalize for visualization } bars = ax3.bar(metrics_summary.keys(), metrics_summary.values(), color=['#2E86AB', '#A23B72', '#F18F01']) ax3.set_title('Average Attention Metrics', fontsize=14, fontweight='bold') ax3.set_ylabel('Score') ax3.grid(axis='y', alpha=0.3) for bar, (name, val) in zip(bars, metrics_summary.items()): height = bar.get_height() ax3.text(bar.get_x() + bar.get_width()/2., height, f'{val:.3f}', ha='center', va='bottom', fontsize=10) # Sample Grad-CAM visualizations ax4 = fig.add_subplot(gs[1, 2:]) example_results = {} for result in all_results: if result['true_class'] not in example_results: example_results[result['true_class']] = result if len(example_results) == len(test_dataset.classes): break n_classes = len(test_dataset.classes) composite = np.zeros((224, 224 * n_classes, 3)) for idx, class_name in enumerate(test_dataset.classes): if class_name in example_results: result = example_results[class_name] img_tensor = result['image_tensor'] mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1) std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1) img = img_tensor * std + mean img = torch.clamp(img, 0, 1) img = img.permute(1, 2, 0).numpy() cam = result['cam'] mask = cv2.resize(cam, (224, 224)) > 0.3 masked_img = img.copy() masked_img[~mask] = 0.5 composite[:, idx*224:(idx+1)*224, :] = masked_img ax4.imshow(composite) ax4.set_title('Model Attention Regions (Gray = Ignored)', fontsize=12, fontweight='bold') ax4.axis('off') for idx, class_name in enumerate(test_dataset.classes): ax4.text(idx*224 + 112, 10, class_name, ha='center', va='top', fontsize=10, fontweight='bold', color='white', bbox=dict(boxstyle='round', facecolor='black', alpha=0.7)) # Row 3: Attention Distribution Violin Plots metrics = ['center_attention', 'edge_attention', 'attention_entropy'] titles = ['Center Attention', 'Edge Attention', 'Attention Entropy'] thresholds = [0.5, 0.2, 4.0] for idx, (metric, title, threshold) in enumerate(zip(metrics, titles, thresholds)): ax = fig.add_subplot(gs[2, idx]) data = [] labels = [] for class_name in test_dataset.classes: class_results = [r for r in all_results if r['true_class'] == class_name] values = [r[metric] for r in class_results] data.extend(values) labels.extend([class_name] * len(values)) df_violin = pd.DataFrame({'Class': labels, 'Value': data}) sns.violinplot(data=df_violin, x='Class', y='Value', ax=ax, palette='Set2') ax.set_title(title, fontsize=11, fontweight='bold') ax.set_ylabel(metric.replace('_', ' ').title()) ax.set_xlabel('') ax.grid(axis='y', alpha=0.3) # Add threshold line color = 'red' if 'edge' in metric else 'green' if 'center' in metric else 'orange' ax.axhline(y=threshold, color=color, linestyle='--', linewidth=2, alpha=0.7) # Summary text ax_summary = fig.add_subplot(gs[2, 3]) ax_summary.axis('off') summary_text = f""" KEY FINDINGS Test Accuracy: {accuracy:.1f}% Attention Metrics: Center: {avg_center:.3f} Edge: {avg_edge:.3f} Entropy: {avg_entropy:.2f} Model Quality: {'✓' if avg_center > 0.5 else '⚠'} Center focus {'✓' if avg_edge < 0.2 else '⚠'} Low edge bias {'✓' if accuracy > 85 else '⚠'} Classification Conclusions: • Model focuses on bean regions (center) • Minimal background distraction • Grad-CAM validates bean-based decisions """ ax_summary.text(0.1, 0.95, summary_text, transform=ax_summary.transAxes, fontsize=10, verticalalignment='top', family='monospace', bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5)) # Row 4: Sample Grad-CAM Visualizations for idx, class_name in enumerate(test_dataset.classes): ax = fig.add_subplot(gs[3, idx]) if class_name in example_results: result = example_results[class_name] img_tensor = result['image_tensor'] mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1) std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1) img = img_tensor * std + mean img = torch.clamp(img, 0, 1) img = img.permute(1, 2, 0).numpy() img = (img * 255).astype(np.uint8) cam = result['cam'] heatmap = cv2.applyColorMap((cam * 255).astype(np.uint8), cv2.COLORMAP_JET) heatmap = cv2.cvtColor(heatmap, cv2.COLOR_BGR2RGB) overlay = cv2.addWeighted(img, 0.5, heatmap, 0.5, 0) ax.imshow(overlay) ax.set_title(f"{class_name}\nC:{result['center_attention']:.2f} E:{result['edge_attention']:.2f}", fontsize=10) ax.axis('off') plt.suptitle('RoastAI: Comprehensive Model Assessment', fontsize=16, fontweight='bold', y=0.995) plt.tight_layout() plt.show() print("\n✓ Comprehensive analysis complete!")

✓ Comprehensive analysis complete!

---

Conclusions

EDA Findings

1. Global Production Growth: Coffee production has increased significantly from 1960 to 2023, with an upward trend despite periodic fluctuations.

2. Regional Concentration: Latin America (South America + Central America) dominates global coffee production, accounting for the majority of output.

3. Value Chain Positioning: Regions that export more processed coffee (roasted and soluble) capture higher value in the supply chain compared to those exporting primarily green beans.

Model Performance

1. Classification Accuracy: The ResNet18 model achieves strong classification performance across all four roast levels.

2. Attention Validation: Grad-CAM analysis confirms the model focuses on bean regions rather than background, indicating learned features are relevant to roast level classification.

3. Practical Application: The model demonstrates potential for automated quality control in coffee processing facilities.

---

Project by: Diego Aguiar, Mariana Sambucetti, Catalina Vaz Martins Date: January 2026