A concise, engineer-friendly walkthrough of Databricks Lakehouse, Delta Tables, commits, reads, temporary storage, and DBFS.
Databricks is a unified data and AI platform built on Apache Spark that merges the best of data lakes and data warehouses into a Lakehouse. You store data once in low-cost cloud object storage and query it like a warehouse; on the same platform you run batch pipelines, process streams, and build ML models.
Lakehouse = Object Storage + Delta (transactions) + Spark (compute) + Governance
A Delta Table is a set of Parquet files plus a _delta_log/ transaction log stored in your cloud object storage:
/path/to/table/
├─ part-0000.snappy.parquet
├─ part-0001.snappy.parquet
└─ _delta_log/
   ├─ 00000000000000000000.json
   ├─ 00000000000000000001.json
   └─ ...
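Each numbered JSON file in _delta_log/ is one commit, and each line inside it is one JSON-encoded action. A minimal Python sketch to peek at a commit (the path is illustrative; substitute your table's location):

import json

# Read one commit from the transaction log; every line is a separate action.
log_file = "/dbfs/user/hive/warehouse/sales/_delta_log/00000000000000000000.json"
with open(log_file) as f:
    for line in f:
        action = json.loads(line)
        # Commit 0 typically carries "protocol", "metaData", and "add" actions.
        print(list(action.keys()))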
CREATE TABLE sales (region STRING, amount DOUBLE);
A managed table: stored under the workspace-managed path (e.g., dbfs:/user/hive/warehouse/).
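Every successful write becomes a new numbered commit in the log. A quick sketch in a Databricks notebook (where spark is predefined); DESCRIBE DETAIL is standard Delta Lake and reveals where the managed table actually lives:

# Appends new Parquet files plus a new JSON commit to _delta_log/.
spark.sql("INSERT INTO sales VALUES ('EMEA', 1200.0)")

# Locate the managed table's storage path.
print(spark.sql("DESCRIBE DETAIL sales").first()["location"])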
CREATE TABLE sales
USING DELTA
LOCATION 'abfss://datalake@account.dfs.core.windows.net/bronze/sales';
An external table: physically stored in your ADLS/S3/GCS path; the catalog just points to it.
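Because the data lives at that path, a reader can skip the catalog entirely and load by location with the standard Spark DataFrame API:

# Read the same Delta table directly from storage, no catalog lookup needed.
df = spark.read.format("delta").load(
    "abfss://datalake@account.dfs.core.windows.net/bronze/sales"
)
df.show()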
Readers list _delta_log/, replay the actions from the JSON commits (and Parquet checkpoints) to determine the set of active data files for the requested version, then read only those Parquet files.
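Conceptually, that replay is set arithmetic over add and remove actions. A simplified Python sketch (it ignores checkpoints, protocol checks, and version pinning; the layout matches the tree above):

import glob
import json
import os

def active_files(table_path):
    # Replay commits in order: "add" activates a data file, "remove" retires it.
    live = set()
    for commit in sorted(glob.glob(os.path.join(table_path, "_delta_log", "*.json"))):
        with open(commit) as f:
            for line in f:
                action = json.loads(line)
                if "add" in action:
                    live.add(action["add"]["path"])
                elif "remove" in action:
                    live.discard(action["remove"]["path"])
    return live  # the Parquet files that make up the latest version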
DBFS is a Databricks filesystem abstraction over cloud object storage: you access it with simple paths like /dbfs/mnt/datalake/ (a FUSE mount for ordinary file I/O) or dbfs:/mnt/datalake/ (a Spark-style URI). Mounts map to S3/ADLS/GCS; managed tables live under dbfs:/user/hive/warehouse/ by default.
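Both path styles reach the same storage. A quick notebook sketch (dbutils is predefined there; the file name is hypothetical):

# Spark-style URI via dbutils.
for info in dbutils.fs.ls("dbfs:/mnt/datalake/"):
    print(info.path, info.size)

# The /dbfs FUSE mount exposes the same files to plain Python I/O.
with open("/dbfs/mnt/datalake/notes.txt") as fh:  # hypothetical file
    print(fh.read())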
Object stores are flat key–value systems. Delta uses naming conventions and prefix listings to simulate directories. Example keys:
s3://bucket/sales/_delta_log/00000000000000000000.json
s3://bucket/sales/part-0001.snappy.parquet
The slashes in keys are just delimiters used for grouping in listings and UIs.
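To see that flatness first-hand, list keys by prefix with boto3 (bucket and prefix taken from the example keys above; a sketch, not production code):

import boto3

s3 = boto3.client("s3")
# There is no directory object, only keys sharing the "sales/_delta_log/" prefix.
resp = s3.list_objects_v2(Bucket="bucket", Prefix="sales/_delta_log/")
for obj in resp.get("Contents", []):
    print(obj["Key"])  # e.g., sales/_delta_log/00000000000000000000.json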