47% of Data Lakes Are Disorganized — Fix It with These Databricks Naming Conventions
First, let’s address the elephant in the room: naming conventions are not glamorous. They’re the broccoli of data engineering — not exciting, but essential for a healthy ecosystem. But without them, your data lake becomes a data swamp, and no one likes wading through a swamp looking for treasure.
If you’ve ever spent hours searching for a poorly named database table, or worse, named something table_123 yourself, you know the pain of a disorganized data ecosystem. Databricks, for all its powerful data management capabilities, is no exception: without consistent naming conventions, the cost shows up in time, money, and sanity.
In this article, let’s explore how to design effective naming conventions for Databricks tables and views. These principles aren’t limited to Databricks, though: they apply equally to other data platforms such as Snowflake, BigQuery, and Redshift, and even to traditional SQL Server or Oracle databases. A well-structured naming system is universal and critical for scalable data management. To make things relatable, I’ll illustrate the approach with an example from the retail domain.
Some Quick Stats
- A 2023 survey by TDWI found that 47% of companies struggle with disorganized data lakes — and you can bet naming chaos is part of that.
- Poorly named tables can add 25–30% to the time spent on query development and debugging.