Subsurface 2023: The open lakehouse conference

Dipankar Mazumdar
5 min read · Oct 25, 2022

The Subsurface LIVE conference is back. It is the industry’s premier cloud data lake conference, featuring some of the most exciting, innovative speakers and open-source projects propelling today’s cloud data lake ecosystem.

The call for speakers is open until 1st November, and we’re looking for members of the data lake community to share their experience and expertise in building modern cloud data lakes and key open-source technologies such as Apache Iceberg, Apache Arrow, Apache Spark, and more.

Call for Speakers

A bit of background

Over the last decade, data architectures have undergone quite a bit of evolution. First, we started with on-premises data warehouses to centrally store data from one or more disparate sources and cater to use cases such as business intelligence and reporting.

Then, the need to support more advanced analytical workloads (such as machine learning) and the search for cost-effective options led us to data lakes. However, while this two-tier architecture (warehouse + lake) has been one of the common approaches to designing data analytics platforms, it has presented some complexity and challenges (read more).

Enter Data Lakehouse!

Simply put, a lakehouse architecture combines the ideas of a data warehouse and a data lake. It is more than a mere integration of the two, though: the idea is to bring out the best of both architectures, namely the reliable transactions of a data warehouse and the scalability and low cost of a data lake.

A data lakehouse architecture can comprise various components. However, one of the critical components that sets the base for a lakehouse is the table format. Table formats such as Apache Iceberg help abstract away the complexity of the physical data structure and allow us to do data warehouse-level transactions (DML), along with critical features such as schema evolution, time travel, and compaction.
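To make the table format idea a little more concrete, here is a minimal toy sketch in Python. It is not Apache Iceberg's API or its metadata layout; the `ToyTable` and `Snapshot` names, the methods, and the file paths are all hypothetical, made up for illustration. It only shows the general idea that every change produces a new immutable snapshot, which is what makes things like schema evolution and time travel possible without rewriting the data files.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Snapshot:
    """Immutable view of the table: a schema plus the data files it points to."""
    snapshot_id: int
    schema: Tuple[str, ...]
    data_files: Tuple[str, ...]


@dataclass
class ToyTable:
    """Hypothetical stand-in for a table format's metadata layer (not Iceberg)."""
    snapshots: List[Snapshot] = field(default_factory=list)

    def _commit(self, schema, data_files) -> Snapshot:
        # Every change appends a new snapshot; older snapshots are never modified.
        snap = Snapshot(len(self.snapshots) + 1, tuple(schema), tuple(data_files))
        self.snapshots.append(snap)
        return snap

    def create(self, schema) -> Snapshot:
        return self._commit(schema, ())

    def current(self) -> Snapshot:
        return self.snapshots[-1]

    def append_file(self, path: str) -> Snapshot:
        # "Insert": register a new data file under the current schema.
        cur = self.current()
        return self._commit(cur.schema, cur.data_files + (path,))

    def add_column(self, name: str) -> Snapshot:
        # Schema evolution: a metadata-only change, existing files are untouched.
        cur = self.current()
        return self._commit(cur.schema + (name,), cur.data_files)

    def as_of(self, snapshot_id: int) -> Snapshot:
        # Time travel: read the table as it looked at an earlier snapshot.
        return self.snapshots[snapshot_id - 1]


table = ToyTable()
table.create(["id", "amount"])                # snapshot 1
table.append_file("data/part-0001.parquet")   # snapshot 2
table.add_column("currency")                  # snapshot 3
table.append_file("data/part-0002.parquet")   # snapshot 4

print(table.as_of(2))
print(table.current())
```

Reading `table.as_of(2)` returns the table as it was before the `currency` column existed, while `table.current()` reflects the evolved schema and both files. Real table formats add a lot on top of this (atomic commits, manifests, partitioning, compaction), so treat it as a mental model only.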

Okay, that was some high-level stuff on data lakehouses. However, like any new technology or approach, a lakehouse platform raises certain questions, specifically around the following:

  • How have enterprises been using it — have they been successful?
  • What are some of the challenges?
  • What technologies support and fit into such an architecture?

The word cloud below is only representational, but it aims to show some of the search terms associated with a lakehouse.

Representational word cloud

As data practitioners, we want to look beyond the jargon (ooh, we are good at it) and all the salesy stuff. We want to hear about those real-world production implementations and the lessons learned. Well, Subsurface is the perfect place to talk about and learn all of it.

The Open Lakehouse conference: Calling all speakers!

Subsurface is all about open lakehouses, and we are particularly driven by the ‘open’ aspect of it. We believe all lakehouses should be built on open data standards, since that is what enables all the exciting possibilities for analytics.

Join us for this exciting two-day conference on March 1st & 2nd, 2023, and hear from tech experts, open source innovators, data engineers, architects, etc., on the hard-fought lessons learned, innovative projects, and the trends and strategies around the lakehouse ecosystem.

While attending Subsurface is a fantastic experience, we also want to hear from data practitioners working with related technologies, as well as open-source developers and contributors. Our call for speakers is open till 1st November. So if you have an exciting story to tell, whether it is a success or something that didn’t work, share it with this incredible community. This can also be an excellent opportunity for speakers to establish credibility and build a professional brand for themselves. All the selected talks will have a spot in the resource library on our Subsurface website.

The event is 100% virtual so you can present from anywhere in the world.

We had some fantastic sessions last year 🔥🔥

Subsurface 2022 Talks

Relevant FAQs:

1. When is Subsurface 2023?

Subsurface is a completely virtual event happening on March 1st & 2nd, 2023.

2. Who should attend?

Subsurface is open to the entire data community. So, whether you are a data engineer, analyst, architect, scientist or an advocate, join us.

3. What are some of the topics to talk about?

We’re looking for members of the data lake community to share their experience and expertise on topics related to the data lakehouse ecosystem. The topics relevant to a data lakehouse can vary from storage to compute engines, from data quality to ingestion, and BI/data science-based tools. In general, the topics can fall into the following categories:

  • Use case implementations - lessons learned
  • Data architectures/infrastructure
  • Data visualization
  • Data science
  • Orchestration
  • ETL/ELT
  • Data quality
  • Data catalogs

4. What were some of the past talks?

Here are a few of the previous year’s amazing talks:

5. How to submit a talk?

The CFS page has all the submission guidelines. Go here!

6. I missed the deadline to submit an abstract but would love to speak. Can I?

We understand that deadlines are sometimes hard to meet. If you would still like to speak, reach out to me and I will see if I can make any arrangements.

Dipankar, Developer Advocate at Dremio

LinkedIn, Twitter
