Deduplicate Data Records Using JavaScript Array Methods

How It Works – Data Deduplication in n8n

This tutorial demonstrates how to remove duplicate records from a dataset using JavaScript logic inside n8n's Code nodes. It simulates real-world data cleaning by generating sample user data with intentional duplicates (keyed on email addresses) and walks you through the deduplication process step by step.

The process includes:

- Creating sample data with duplicates.
- Filtering out duplicates using filter() and findIndex() on the email field (see the sketch after this list).
- Displaying cleaned results with simple before-and-after statistics.

This is ideal for scenarios like CRM imports, ETL processes, and general data hygiene.
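
The core pattern is plain JavaScript: filter() keeps an element only when findIndex() reports that it is the first occurrence of its email. A minimal sketch, using an illustrative users array rather than the workflow's actual payload:

```javascript
// Keep only the first occurrence of each email (case-insensitive).
// The users array is illustrative sample data, not the workflow's payload.
const users = [
  { name: 'Ann', email: 'ann@example.com' },
  { name: 'Bob', email: 'bob@example.com' },
  { name: 'Ann B.', email: 'ANN@example.com' }, // duplicate email, different case
];

const unique = users.filter(
  (user, index) =>
    users.findIndex((u) => u.email.toLowerCase() === user.email.toLowerCase()) === index
);

console.log(unique.length); // 2
```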

โš™๏ธ Set-Up Steps

🔹 Step 1: Manual Trigger
Node: When clicking 'Test workflow'
Purpose: Initiates the workflow manually for testing.

🔹 Step 2: Generate Sample Data
Node: Create Sample Data (Code node)
What it does:

- Creates 6 users, including 2 intentional duplicates (by email).
- Outputs the data as usersJson with metadata (totalCount, message).
- Mimics real-world messy datasets (a sketch of the node follows this list).
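
A minimal sketch of what this node's code might contain. The specific names and emails are made up; only the usersJson, totalCount, and message fields come from this template's description:

```javascript
// Sketch of the "Create Sample Data" Code node (mode: Run Once for All Items).
// The individual user records are illustrative assumptions.
const users = [
  { id: 1, name: 'Alice', email: 'alice@example.com' },
  { id: 2, name: 'Bob', email: 'bob@example.com' },
  { id: 3, name: 'Carol', email: 'carol@example.com' },
  { id: 4, name: 'Alice Smith', email: 'alice@example.com' }, // duplicate
  { id: 5, name: 'Dan', email: 'dan@example.com' },
  { id: 6, name: 'Robert', email: 'bob@example.com' }, // duplicate
];

// n8n Code nodes return an array of items, each wrapped in a json key.
return [
  {
    json: {
      usersJson: JSON.stringify(users),
      totalCount: users.length,
      message: 'Sample user data with intentional duplicates',
    },
  },
];
```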

🔹 Step 3: Deduplicate the Data
Node: Deduplicate Users (Code node)
What it does:

- Parses usersJson.
- Uses .filter() + .findIndex() to keep only the first instance of each email.
- Logs total, unique, and removed counts.
- Outputs the clean user list as separate items (see the sketch below).
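
A sketch of the deduplication node, assuming the single-item input shape produced above:

```javascript
// Sketch of the "Deduplicate Users" Code node. Assumes the upstream node
// emits one item whose json.usersJson holds the serialized user list.
const users = JSON.parse($input.first().json.usersJson);

// filter() keeps a user only when findIndex() says this is the first
// occurrence of that email in the array.
const unique = users.filter(
  (user, index) => users.findIndex((u) => u.email === user.email) === index
);

console.log(
  `Total: ${users.length}, Unique: ${unique.length}, Removed: ${users.length - unique.length}`
);

// Emit each cleaned user as its own n8n item for downstream nodes.
return unique.map((user) => ({ json: user }));
```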

🔹 Step 4: Display Results
Node: Display Results (Code node)
What it does:

- Outputs a structured summary:
  - Unique users
  - Status
  - Timestamp
- Prepares results for review or downstream use (see the sketch below).
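
A sketch of the summary node; beyond the unique users, status, and timestamp named above, the exact field names are assumptions:

```javascript
// Sketch of the "Display Results" Code node: gathers the deduplicated items
// and emits a single summary item for review or downstream use.
const users = $input.all().map((item) => item.json);

return [
  {
    json: {
      uniqueUsers: users,
      uniqueCount: users.length,
      status: 'success',
      timestamp: new Date().toISOString(),
    },
  },
];
```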

📈 Sample Output

- Original count: 6 users
- Deduplicated count: 4 users
- Duplicates removed: 2 users

🎯 Learning Objectives

You'll learn how to:

- Use .filter() and .findIndex() in n8n Code nodes
- Clean JSON data within workflows
- Create simple, effective deduplication pipelines
- Output structured summaries for reporting or integration

🧐 Best Practices

- Validate the input format (e.g., against a JSON schema)
- Handle null or missing fields gracefully
- Use logging for visibility
- Add error handling for production use
- Use pagination or chunking for large datasets (a defensive variant is sketched below)
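
To make several of these practices concrete, here is a hedged, more defensive variant of the dedup node: it validates the JSON, skips records without a usable email, and swaps filter()/findIndex() (which is O(n²)) for a Set-based pass that scales better to large datasets:

```javascript
// Defensive variant of the dedup node (a sketch, not the template's code).
let users;
try {
  users = JSON.parse($input.first().json.usersJson);
} catch (err) {
  // Fail loudly with context instead of passing garbage downstream.
  throw new Error(`Invalid usersJson payload: ${err.message}`);
}

const seen = new Set();
const unique = [];
for (const user of users) {
  // Skip records whose email is missing or not a string.
  const email =
    typeof user?.email === 'string' ? user.email.trim().toLowerCase() : null;
  if (!email) continue;
  if (!seen.has(email)) {
    seen.add(email); // Set lookup keeps this pass O(n) on large datasets
    unique.push(user);
  }
}

console.log(`Kept ${unique.length} of ${users.length} records`);
return unique.map((user) => ({ json: user }));
```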