GenAI LEGO Detective
Posted on April 20, 2025
LEGO Set Identification Using Generative AI
Inspiration: The Case of the Missing LEGO Part
Once upon a time, in a room filled with colorful bricks and half-finished constructions, a young boy named Lazarus was building an amazing robot. He was so excited to add the final touch: a sleek, white wheel. But! The wheel was nowhere to be found. Frustration mounted, threatening to derail his masterpiece. He looked everywhere — under the bed, in the kitchen, among the toys… it looked like the white wheel had disappeared. To stay calm, he showed me a part that was exactly the same and — Eureka! We had the exact same LEGO part in another set.
Two weeks later, the same thing happened with the Batman toy. Now Batman is wearing all black and yellow, but on one of the knees, he has a blue brick.
So this is how it all started. Why can’t we use AI to be our detective for LEGO and identify our LEGO parts or what other sets we can build?
Introduction
This project demonstrates the use of Generative AI to identify LEGO sets and their parts from images — a task relevant to inventory management, e-commerce, and fan communities. The solution involves using a SQLite database and a Google Gen AI model to process image URLs, identify LEGO sets, and retrieve their parts list.
Dataset & Database
The dataset is sourced from Rebrickable, a platform with detailed LEGO set and part data. The database schema includes 12 tables capturing set compositions, part details, and colors. I found this to be the fastest way to start with something as source data that can be provided to GenAI, as it does not have any info about the LEGO sets or details on which set has what data.
For simplicity, I picked SQLite as the fastest way to get up and running. The first steps were to create the tables and relationships.
Download the .csv
files
import requests
def download_file(url, file_name):
"""
Download a file from a URL and save it to the same folder as the script.
Args:
url (str): The URL of the file to download.
file_name (str): The name of the file to save.
"""
file_path = os.path.join(os.getcwd(), file_name)
try:
# Send a GET request to the URL
response = requests.get(url, stream=True)
# Check if the request was successful
if response.status_code == 200:
# Write the file content to the local folder
with open(file_path, 'wb') as file:
for chunk in response.iter_content(chunk_size=8192):
if chunk:
file.write(chunk)
print(f"File '{file_name}' downloaded successfully to '{file_path}'")
else:
print(f"Failed to download '{file_name}'. Status code: {response.status_code}")
except Exception as e:
print(f"An error occurred while downloading '{file_name}': {str(e)}")
# Example usage
if __name__ == "__main__":
# Define the base URL (replace with the actual URL where files are hosted)
base_url = "https://cdn.rebrickable.com/media/downloads/"
# List of files to download
file_urls = [
"sets.csv.gz",
"partss.csv.gz",
]
# Download each file
for file_name in file_urls:
url = base_url + file_name
download_file(url, file_name)
Setup the DB
import sqlite3
conn = sqlite3.connect('lego.db')
conn.close()
Define the Database Schema
This step creates the tables in lego.db
based on the schema. The schema includes additional columns (e.g.,part_material
in parts
, img_url
in sets
) to accommodate all data from the Rebrickable files. The code executes CREATE TABLE
statements with appropriate primary and foreign key constraints, ensuring data integrity. The schema is also very well defined in the Rebrickable schema image
Define the SQL schema as a multi-line string
sql_schema = """
CREATE TABLE part_categories (
id INTEGER PRIMARY KEY,
name VARCHAR(200)
)
"""
def create_tables(db_name):
"""Connects to an SQLite database and creates the tables defined in the schema."""
conn = None
try:
conn = sqlite3.connect(db_name)
cursor = conn.cursor()
# Execute each CREATE TABLE statement individually
for statement in sql_schema.split(';'):
statement = statement.strip()
if statement:
cursor.execute(statement)
conn.commit()
print(f"Tables created successfully in '{db_name}'")
except sqlite3.Error as e:
if conn:
conn.rollback()
print(f"Error creating tables: {e}")
finally:
if conn:
conn.close()
if __name__ == "__main__":
database_name = 'lego.db' # You can change the database name here
create_tables(database_name)
Data Import
Import the files into the database. This step will be skipped for the sake of time.
Next, we will focus on creating a function to verify database tables, read the schema, and perform queries that the AI will use to interact with the database. By creating
def list_tables() -> list[str]:
"""Retrieve the names of all tables in the database."""
# code ommited
def describe_table(table_name: str) -> list[tuple[str, str]]:
"""Look up the table schema.
Returns:
List of columns, where each entry is a tuple of (column, type).
"""
# code ommited
def execute_query(sql: str) -> list[list[str]]:
"""Execute an SQL statement, returning the results."""
# code ommited
Google AI Parameters Setup
For the best way to use it, we recommend following these steps:
- Sign up for a Kaggle account (https://www.kaggle.com/) and learn how Notebooks work (https://www.kaggle.com/docs/notebooks) . Make sure to phone verify (https://www.kaggle.com/settings) your account, it’s necessary for the course’s code labs.
- We also have a troubleshooting guide (https://www.kaggle.com/code/markishere/day-0-troubleshooting-and-faqs/) for the codelabs. Be sure to check there for solutions to common problems.
- Sign up for an AI Studio (https://aistudio.google.com/) account and ensure you can generate an API key (https://aistudio.google.com/app/apikey).
Interact with Database Elements and Implement Function Calls
At this stage, after everything above is set and the correct Python libraries are installed, you can start working on the interaction with the database. Set the appropriate context.
# These are the Python functions defined above.
db_tools = [list_tables, describe_table, execute_query]
instruction = """You are a helpful chatbot that can interact with an SQL database
for a Lego sets, parts. Lego database has full inventory of all sets, parts, relationshipes.
You will take the users questions and turn them into SQL
queries using the tools available. Once you have the information you need, you will
answer the user's question using the data returned.
Use list_tables to see what tables are present, describe_table to understand the
schema, and execute_query to issue an SQL SELECT query.
"""
client = genai.Client(api_key=GOOGLE_API_KEY)
# Start a chat with automatic function calling enabled.
chat = client.chats.create(
model="gemini-2.0-flash",
config=types.GenerateContentConfig(
system_instruction=instruction,
tools=db_tools,
),
)
Query LEGO Database for Set Parts via Gen AI
This step uses the Generative AI model to query the LEGO database (lego.db
) for the parts of set 77242-1. The model processes a prompt to execute an SQL query and returns the results as a formatted table, showcasing AI-driven database interaction.
# Interact with the LEGO database using the Gen AI model
resp = chat.send_message("What are all the parts of the set 77242-1. Please list them in a table")
print(f"\n{resp.text}")
Query LEGO Database for Formula 1 Sets in 2025 via Gen AI
This step uses the Generative AI model to query the LEGO database (lego.db
) for sets containing “Formula 1” or “F1” in their name and released in 2025. The model executes an SQL query and returns the results as a markdown table with set name, year, and set number, demonstrating AI-driven data retrieval.
# Query the LEGO database for Formula 1 sets in 2025 using the Gen AI model
resp = chat.send_message("What are all sets which are contains formula 1 or F1 in their name filed and year filed is 2025. Give me the answer out as makdown table for all the sets. Table should have set name, year and set number")
print(f"\n{resp.text}")
Identify LEGO Set from Image Using Gen AI
This step demonstrates the core Gen AI application: identifying a LEGO set from an image and listing its parts. We use a Generative AI model (assumed to be a vision-language model like Grok 3, accessible via an API) to process an image URL and extract the set number (77251-1, “McLaren F1 Team MCL38 Race Car”). The model generates a prompt that triggers a database query to retrieve the parts list, which is formatted as a table.
Before we move forward, we will change the context by adding one more note to the introduction about what to look for in the images.
# These are the Python functions defined above.
db_tools = [list_tables, describe_table, execute_query]
instruction = """You are a helpful chatbot that can interact with an SQL database
for a Lego sets, parts. Lego database has full inventory of all sets, parts, relationshipes.
You will take the users questions and turn them into SQL
queries using the tools available. Once you have the information you need, you will
answer the user's question using the data returned.
If image is send in format .jpg or .png try to detect what is the set_id and then perform SQL query depending on the question.
Use list_tables to see what tables are present, describe_table to understand the
schema, and execute_query to issue an SQL SELECT query.
"""
client = genai.Client(api_key=GOOGLE_API_KEY)
# Start a chat with automatic function calling enabled.
chat = client.chats.create(
model="gemini-2.0-flash",
config=types.GenerateContentConfig(
system_instruction=instruction,
tools=db_tools,
),
)
Now the prompt will be as below:
image_url = "https://cdn.rebrickable.com/media/sets/77251-1.jpg"
prompt = f"What are the parts of this set shown in the image at this URL: {image_url}. Please list them in a table."
resp = chat.send_message(prompt)
print(f"\n{resp.text}")
Well, for some more advanced question queries, the databases of SQLite are not the best or most efficient. Maybe experimenting with other databases like Graph or Vector will bring better performance and fewer issues.
Overall, this was a very nice and interesting learning project for educational purposes. I highly recommend anyone who wants to get started with AI to go through the self-paced5-Day Gen AI Intensive Course with Google Learn Guide
Previous article