Data Cleaning Script

Generate robust Python or R snippets that clean, format, and prepare raw, messy datasets for analysis.

Use Case

Ideal for data scientists and analysts who spend too much time on the “janitorial” work of data cleaning and want to automate repetitive tasks.

The Prompt

I want you to act as a Data Engineer. Write a [Python/R] script using [Pandas/Tidyverse] to clean the following dataset.

Dataset Preview/Description:
- Column A: Date (formats vary, e.g., 2023/01/01 and 01-01-23)
- Column B: Price (includes currency symbols and commas)
- Column C: Category (has many typos and duplicates)

Requirements:
1. Normalize all dates to 'YYYY-MM-DD' format.
2. Convert 'Price' to a float and handle missing values by [e.g., filling with the mean].
3. Handle the typos in 'Category' by [e.g., mapping them to a standard list].
4. Remove all exact duplicate rows.
5. Provide comments for every line of code explaining the transformation.
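
For reference, here is a minimal sketch of the kind of Pandas script this prompt might produce, so you know what to check in the AI's answer. The sample rows, the CATEGORY_MAP dictionary, the DATE_FORMATS list, and the function names are all hypothetical. The sketch also assumes that 01-01-23 means month-day-year and that 'Price' arrives as text; adjust both to match your data.

```python
from datetime import datetime

import pandas as pd

# Hypothetical sample rows mirroring the dataset description in the prompt.
raw = pd.DataFrame({
    "Date": ["2023/01/01", "01-01-23", "2023/01/05", "2023/01/05"],
    "Price": ["$1,200.50", "€300", None, None],
    "Category": ["Electronics", "electronics ", "Electrnics", "Electrnics"],
})

# Assumed date layouts; "%m-%d-%y" treats 01-01-23 as month-day-year.
DATE_FORMATS = ("%Y/%m/%d", "%m-%d-%y")

# Assumed mapping from lowercased variants and typos to a standard list.
CATEGORY_MAP = {"electronics": "Electronics", "electrnics": "Electronics"}


def parse_date(value):
    """Try each known format in turn; return NaT if none matches."""
    if not isinstance(value, str):
        return pd.NaT
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt)
        except ValueError:
            continue
    return pd.NaT


def clean(df):
    out = df.copy()

    # 1. Normalize dates to 'YYYY-MM-DD' strings.
    parsed = pd.to_datetime(out["Date"].map(parse_date))
    out["Date"] = parsed.dt.strftime("%Y-%m-%d")

    # 2. Strip everything except digits, dots, and minus signs, then convert.
    stripped = out["Price"].str.replace(r"[^\d.\-]", "", regex=True)
    out["Price"] = pd.to_numeric(stripped, errors="coerce")
    # Fill missing prices with the mean of the prices that are present.
    out["Price"] = out["Price"].fillna(out["Price"].mean())

    # 3. Map known variants to the standard list; keep unknown values as-is.
    trimmed = out["Category"].str.strip()
    out["Category"] = trimmed.str.lower().map(CATEGORY_MAP).fillna(trimmed)

    # 4. Drop exact duplicate rows that remain after normalization.
    return out.drop_duplicates().reset_index(drop=True)


result = clean(raw)
print(result)
```

Deduplication runs last in this sketch because rows that differ only in formatting become exact duplicates only after the other steps have normalized them.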

Tips for Success

  • Be Explicit about Nulls: Tell the AI how you want to handle missing data (delete the row, fill with zero, or interpolate).
  • Check Formats: Mention if you need the output in a specific file format like CSV or Parquet.
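
To make the two tips concrete, the sketch below shows how the three null-handling choices differ in Pandas and how an output format might be selected. The sample series and the export helper are hypothetical, and writing Parquet assumes an optional engine such as pyarrow is installed.

```python
import pandas as pd

# Hypothetical price column with one missing value.
prices = pd.Series([10.0, None, 30.0], name="Price")

# Option 1: delete the rows that have missing values.
dropped = prices.dropna()

# Option 2: fill missing values with zero.
zero_filled = prices.fillna(0)

# Option 3: interpolate linearly between neighboring values.
interpolated = prices.interpolate()


def export(df, path):
    """Hypothetical helper: pick the writer from the file extension."""
    if path.endswith(".parquet"):
        # Requires an optional Parquet engine (for example pyarrow).
        df.to_parquet(path, index=False)
    else:
        df.to_csv(path, index=False)


print(dropped.tolist(), zero_filled.tolist(), interpolated.tolist())
```

Each option changes downstream statistics differently, which is why stating the choice in the prompt matters: dropping shrinks the dataset, zero-filling pulls averages down, and interpolation only makes sense when row order is meaningful.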


© 2026 Orush AI. All rights reserved.