GenAI LEGO Detective

by Vladimir Joshevski


Posted on April 20, 2025



LEGO Set Identification Using Generative AI

Inspiration: The Case of the Missing LEGO Part

Once upon a time, in a room filled with colorful bricks and half-finished constructions, a young boy named Lazarus was building an amazing robot. He was so excited to add the final touch: a sleek, white wheel. But! The wheel was nowhere to be found. Frustration mounted, threatening to derail his masterpiece. He looked everywhere — under the bed, in the kitchen, among the toys… it looked like the white wheel had disappeared. To stay calm, he showed me a part that was exactly the same and — Eureka! We had the exact same LEGO part in another set.

Two weeks later, the same thing happened with the Batman toy. Now Batman is wearing all black and yellow, but on one of the knees, he has a blue brick.

So this is how it all started. Why can’t we use AI to be our detective for LEGO and identify our LEGO parts or what other sets we can build?

Introduction

This project demonstrates the use of Generative AI to identify LEGO sets and their parts from images — a task relevant to inventory management, e-commerce, and fan communities. The solution involves using a SQLite database and a Google Gen AI model to process image URLs, identify LEGO sets, and retrieve their parts list.

Dataset & Database

The dataset is sourced from Rebrickable, a platform with detailed LEGO set and part data. The database schema includes 12 tables capturing set compositions, part details, and colors. I found this to be the fastest way to start with something as source data that can be provided to GenAI, as it does not have any info about the LEGO sets or details on which set has what data.

For simplicity, I picked SQLite as the fastest way to get up and running. The first steps were to create the tables and relationships.

Download the .csv files


import requests

def download_file(url, file_name):
    """
    Download a file from a URL and save it to the same folder as the script.
    
    Args:
        url (str): The URL of the file to download.
        file_name (str): The name of the file to save.
    """
    file_path = os.path.join(os.getcwd(), file_name)
    
    try:
        # Send a GET request to the URL
        response = requests.get(url, stream=True)
        
        # Check if the request was successful
        if response.status_code == 200:
            # Write the file content to the local folder
            with open(file_path, 'wb') as file:
                for chunk in response.iter_content(chunk_size=8192):
                    if chunk:
                        file.write(chunk)
            print(f"File '{file_name}' downloaded successfully to '{file_path}'")
        else:
            print(f"Failed to download '{file_name}'. Status code: {response.status_code}")
    except Exception as e:
        print(f"An error occurred while downloading '{file_name}': {str(e)}")

# Example usage
if __name__ == "__main__":
    # Define the base URL (replace with the actual URL where files are hosted)
    base_url = "https://cdn.rebrickable.com/media/downloads/"
    
    # List of files to download
    file_urls = [
        "sets.csv.gz",
        "partss.csv.gz",
    ]
    
    # Download each file
    for file_name in file_urls:
        url = base_url + file_name
        download_file(url, file_name)

Setup the DB


import sqlite3

conn = sqlite3.connect('lego.db')
conn.close()

Define the Database Schema

This step creates the tables in lego.db based on the schema. The schema includes additional columns (e.g.,part_material in parts, img_url in sets) to accommodate all data from the Rebrickable files. The code executes CREATE TABLE statements with appropriate primary and foreign key constraints, ensuring data integrity. The schema is also very well defined in the Rebrickable schema image

Define the SQL schema as a multi-line string

sql_schema = """
CREATE TABLE part_categories (
    id INTEGER PRIMARY KEY,
    name VARCHAR(200)
)
"""

def create_tables(db_name):
    """Connects to an SQLite database and creates the tables defined in the schema."""
    conn = None
    try:
        conn = sqlite3.connect(db_name)
        cursor = conn.cursor()

        # Execute each CREATE TABLE statement individually
        for statement in sql_schema.split(';'):
            statement = statement.strip()
            if statement:
                cursor.execute(statement)

        conn.commit()
        print(f"Tables created successfully in '{db_name}'")

    except sqlite3.Error as e:
        if conn:
            conn.rollback()
        print(f"Error creating tables: {e}")

    finally:
        if conn:
            conn.close()

if __name__ == "__main__":
    database_name = 'lego.db'  # You can change the database name here
    create_tables(database_name)

Data Import

Import the files into the database. This step will be skipped for the sake of time.

Next, we will focus on creating a function to verify database tables, read the schema, and perform queries that the AI will use to interact with the database. By creating

def list_tables() -> list[str]:
    """Retrieve the names of all tables in the database."""
    # code ommited 

def describe_table(table_name: str) -> list[tuple[str, str]]:
    """Look up the table schema.

    Returns:
      List of columns, where each entry is a tuple of (column, type).
    """
    # code ommited 

def execute_query(sql: str) -> list[list[str]]:
    """Execute an SQL statement, returning the results."""
    # code ommited 

Google AI Parameters Setup

For the best way to use it, we recommend following these steps:

  • Sign up for a Kaggle account (https://www.kaggle.com/) and learn how Notebooks work (https://www.kaggle.com/docs/notebooks) . Make sure to phone verify (https://www.kaggle.com/settings) your account, it’s necessary for the course’s code labs.
  • We also have a troubleshooting guide (https://www.kaggle.com/code/markishere/day-0-troubleshooting-and-faqs/) for the codelabs. Be sure to check there for solutions to common problems.
  • Sign up for an AI Studio (https://aistudio.google.com/) account and ensure you can generate an API key (https://aistudio.google.com/app/apikey).

Interact with Database Elements and Implement Function Calls

At this stage, after everything above is set and the correct Python libraries are installed, you can start working on the interaction with the database. Set the appropriate context.

# These are the Python functions defined above.
db_tools = [list_tables, describe_table, execute_query]

instruction = """You are a helpful chatbot that can interact with an SQL database
for a Lego sets, parts. Lego database has full inventory of all sets, parts, relationshipes. 
You will take the users questions and turn them into SQL
queries using the tools available. Once you have the information you need, you will
answer the user's question using the data returned.
Use list_tables to see what tables are present, describe_table to understand the
schema, and execute_query to issue an SQL SELECT query.

"""

client = genai.Client(api_key=GOOGLE_API_KEY)

# Start a chat with automatic function calling enabled.
chat = client.chats.create(
    model="gemini-2.0-flash",
    config=types.GenerateContentConfig(
        system_instruction=instruction,
        tools=db_tools,
    ),
)


Query LEGO Database for Set Parts via Gen AI

This step uses the Generative AI model to query the LEGO database (lego.db) for the parts of set 77242-1. The model processes a prompt to execute an SQL query and returns the results as a formatted table, showcasing AI-driven database interaction.


# Interact with the LEGO database using the Gen AI model
resp = chat.send_message("What are all the parts of the set 77242-1. Please list them in a table")
print(f"\n{resp.text}")

Query LEGO Database for Formula 1 Sets in 2025 via Gen AI

This step uses the Generative AI model to query the LEGO database (lego.db) for sets containing “Formula 1” or “F1” in their name and released in 2025. The model executes an SQL query and returns the results as a markdown table with set name, year, and set number, demonstrating AI-driven data retrieval.


# Query the LEGO database for Formula 1 sets in 2025 using the Gen AI model
resp = chat.send_message("What are all sets which are contains formula 1 or F1 in their name filed and year filed is 2025. Give me the answer out as makdown table for all the sets. Table should have set name, year and set number")
print(f"\n{resp.text}")

Identify LEGO Set from Image Using Gen AI

This step demonstrates the core Gen AI application: identifying a LEGO set from an image and listing its parts. We use a Generative AI model (assumed to be a vision-language model like Grok 3, accessible via an API) to process an image URL and extract the set number (77251-1, “McLaren F1 Team MCL38 Race Car”). The model generates a prompt that triggers a database query to retrieve the parts list, which is formatted as a table.

Before we move forward, we will change the context by adding one more note to the introduction about what to look for in the images.


# These are the Python functions defined above.
db_tools = [list_tables, describe_table, execute_query]

instruction = """You are a helpful chatbot that can interact with an SQL database
for a Lego sets, parts. Lego database has full inventory of all sets, parts, relationshipes. 
You will take the users questions and turn them into SQL
queries using the tools available. Once you have the information you need, you will
answer the user's question using the data returned.

If image is send in format .jpg or .png try to detect what is the set_id and then perform SQL query depending on the question.


Use list_tables to see what tables are present, describe_table to understand the
schema, and execute_query to issue an SQL SELECT query.

"""

client = genai.Client(api_key=GOOGLE_API_KEY)

# Start a chat with automatic function calling enabled.
chat = client.chats.create(
    model="gemini-2.0-flash",
    config=types.GenerateContentConfig(
        system_instruction=instruction,
        tools=db_tools,
    ),
)

Now the prompt will be as below:

image_url = "https://cdn.rebrickable.com/media/sets/77251-1.jpg"
prompt = f"What are the parts of this set shown in the image at this URL: {image_url}. Please list them in a table."
resp = chat.send_message(prompt)
print(f"\n{resp.text}")

Well, for some more advanced question queries, the databases of SQLite are not the best or most efficient. Maybe experimenting with other databases like Graph or Vector will bring better performance and fewer issues.

Overall, this was a very nice and interesting learning project for educational purposes. I highly recommend anyone who wants to get started with AI to go through the self-paced5-Day Gen AI Intensive Course with Google Learn Guide



Ai