Data Democratization at Button

At Button, data, engineering, revenue and finance are constantly working together to deliver the best service to our customers, and to each other.

As a modern technology company, we take pride in our sophisticated data stack - AWS for data capture and ingestion, BigQuery for data warehousing, Airflow for workflow management, dbt for data transformation, Looker for business intelligence reporting, Fivetran for data integration, and AI Notebook for data science development (see this blog post on how we rebuilt our data stack). However, these tools are just the foundation of the data powerhouse that we're building.

While our product and analytics teams deliver more insights-driven products to our partners, internally, we are also striving to become first-class data citizens, where everyone at the company is empowered to use the data available to them to guide, automate, and advance their work. In this post I'm going to outline how we've deployed four principles to reach a step-function change in data usability and accessibility:

Analyst-driven post-load data transformation
Self-service tools on a shared warehouse
Tight connection between product and financial data
Continuous feedback loop

Embracing ELT

Until the rise of cloud data warehouses, ETL (extract data from underlying systems, transform it into usable formats, and load it into a database) had been the gold standard in data pipeline design. Cloud data warehouses, however, let us push the transformation step later, after the raw data has already been stored in the warehouse. There are a few advantages to this approach: 1) business logic is removed from the extraction and loading steps, and raw data is preserved, making it much simpler to re-derive historical aggregate data when business logic changes; 2) there are more analyst-friendly options for executing the business logic, compared to transformations defined in engineer-centric languages like Python, Scala and Java.
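To make the first advantage concrete, here is a minimal sketch of the ELT pattern, not our production pipeline. It uses Python's built-in sqlite3 as a stand-in for the warehouse, and the table name, columns, sample rows, and revenue rules are all hypothetical. Raw events are loaded untouched, and the business logic lives in SQL that runs afterwards, so a changed rule can be re-applied to the full history without re-extracting anything.

```python
import sqlite3

# Hypothetical raw events, loaded as-is with no business logic applied.
RAW_EVENTS = [
    ("2020-11-01", "purchase", 1200),
    ("2020-11-01", "purchase", 300),
    ("2020-11-01", "refund", 300),
    ("2020-11-02", "purchase", 500),
]


def load_raw(conn):
    """Extract + Load: store the events untouched in a raw table."""
    conn.execute(
        "CREATE TABLE raw_events (day TEXT, kind TEXT, amount_cents INTEGER)"
    )
    conn.executemany("INSERT INTO raw_events VALUES (?, ?, ?)", RAW_EVENTS)


# Transform, version 1 (assumed rule): revenue is the sum of purchases.
GROSS_REVENUE_SQL = """
    SELECT day, SUM(amount_cents) AS revenue_cents
    FROM raw_events
    WHERE kind = 'purchase'
    GROUP BY day
    ORDER BY day
"""

# Transform, version 2 (assumed revised rule): refunds are subtracted.
NET_REVENUE_SQL = """
    SELECT day,
           SUM(CASE WHEN kind = 'refund' THEN -amount_cents
                    ELSE amount_cents END) AS revenue_cents
    FROM raw_events
    GROUP BY day
    ORDER BY day
"""


def daily_revenue(conn, sql):
    """Run a transformation inside the 'warehouse' and return its rows."""
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
load_raw(conn)
gross = daily_revenue(conn, GROSS_REVENUE_SQL)
net = daily_revenue(conn, NET_REVENUE_SQL)
print(gross)
print(net)
```

Because the refund rows were kept in the raw table, switching from the first rule to the second is only a change to the SQL. Had the refunds been filtered out before loading, the historical net figures could not be rebuilt.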

At Button, we made the switch to dbt and never looked back. dbt empowers data analysts to take ownership of the transformation step, as anyone with SQL fluency can write a dbt workflow. Transformations that used to require Python logic written by data engineers can now be expressed in SQL by the analysts themselves. This gives our analysts far more flexibility when designing and modifying the transformations for specific datasets, and in turn it shortens turnaround time for business teams.

Our path to ELT was marked by fun engineering challenges as well. For example, the seemingly straightforward task of giving dbt permission to access Google Sheets in Google Cloud Build turned out to be a nightmare, and we ended up with a solution that combined a custom Airflow operator with our existing Airflow setup in AWS.

Self Service Business Intelligence

We migrated our business intelligence reporting from Tableau to Looker this year as well. While Tableau had served our organization well with visualizing data and creating dashboards, our increased focus on scaling holistic data analyses called for a more self-service platform. That's when Looker got on our radar.

With Looker, our colleagues in Revenue can quickly access the metrics and dimensions they need. Through targeted trainings, stakeholders feel empowered to use Looker to obtain insights and create reports for their needs. Looker's centralized data model definitions reduce human error and ensure that everyone is using the same data definitions.
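The idea behind centralized definitions can be illustrated with a small sketch. This is not LookML and not how Looker works internally; the metric names, dimension names, table name, and the build_query helper are all hypothetical. The point is only that each metric's SQL is written once, and every report is generated from that single definition instead of being retyped by hand.

```python
# Hypothetical shared definitions: the one place where each metric's SQL lives.
METRICS = {
    "order_count": "COUNT(*)",
    "total_revenue_cents": "SUM(amount_cents)",
}

# Hypothetical set of columns that reports are allowed to group by.
DIMENSIONS = {"day", "partner"}


def build_query(metric, dimension, table="orders"):
    """Build a report query from the shared definitions.

    Unknown metric or dimension names are rejected, so a report cannot
    silently use an ad hoc definition.
    """
    if metric not in METRICS:
        raise ValueError(f"unknown metric: {metric}")
    if dimension not in DIMENSIONS:
        raise ValueError(f"unknown dimension: {dimension}")
    return (
        f"SELECT {dimension}, {METRICS[metric]} AS {metric} "
        f"FROM {table} GROUP BY {dimension}"
    )


query = build_query("order_count", "day")
print(query)
```

If the definition of a metric changes, it is updated in one place and every query built from it picks up the change.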

Even though we use BigQuery to ingest data in near real-time, the data that powers Looker is transformed by dbt via batch processes. Those transformations include additional data cleaning and QA steps to ensure data quality. However, there are times when we want to track instant changes in metrics to measure specific events, such as the impact of the iOS 14 release or the PlayStation 5 rollout.

For measuring specific launches and events, our data-savvy product managers have turned to Google Data Studio, which integrates seamlessly with BigQuery.

Modern Financial Data Stack

Button's technology-focused finance team incorporated Fivetran to sync financial, operational, and marketing data into BigQuery, thus unifying all data sources into one destination. This data is then transformed and loaded into Looker, and it provides invaluable insights that can be easily further explored and combined with event data. By having NetSuite and Salesforce data reliably reside in our data warehouse and Looker, we streamlined revenue tracking, reporting and forecasting.

In this process, we also designed and implemented a set of data access strategies for BigQuery and Looker that balance our security needs and development pace to ensure the safety of sensitive personal information and company financial data.

Continuous Feedback Loop

Perhaps most importantly, we recognize that behind every set of data there is a story, and every piece of analysis is conducted by and communicated to individuals with their own feelings, needs, and pride. While numbers are cold, we hope to bring some human touch to big data - building confidence and trust with stakeholders, bringing mindfulness into the data-driven decision-making process, and having some fun along the way.

The data team experimented with hosting office hours, as well as regular retros with our frequent collaborators, the product engineering team behind Reach. The honest and open dialogues helped us understand each other's points of view, priorities, and feedback, and made our partnership even more productive.

Many thanks to the data, engineering, revenue and finance teams, whose tireless efforts on the projects highlighted in this post made a big push for data democratization at Button. Stay tuned for more posts on how we designed our data warehouse, developed our data access strategy, built up our machine learning platform, and more.