The Uber India Engineering team hosted Datacon 2022, a day-long conference focused on using data to make better business decisions. Technology leaders and subject-matter experts from Uber and leading global technology companies such as LinkedIn, Rakuten, and Microsoft came together to share insights into their work on improving data quality and reliability at scale.
At Uber, data is a first-class citizen, on par with applications and services, and teams continually analyze it to make the platform better. Using this data well helps provide the best experience to Riders, Eaters, Drivers, and Delivery partners. Product teams at Uber work with data constantly so that Drivers and Delivery partners on our platform can earn more, while Riders and Eaters see ETAs calculated to the last second.
Large companies across industries generate vast amounts of data, and as companies grow, the systems that support them become more complex. Data observability has quickly become an important function that holds the key to making better decisions. At Datacon 2022, experts discussed these challenges and opportunities, with topics including:
Data-driven optimization in real-time marketplaces
Handling data at scale using Remote Shuffle Service (RSS) in Spark
Applying anomaly detection and automation to improve data reliability
Building Serverless Hive - A PetaByte Scale Data Warehouse
Consistent Metrics In Online And Offline World
Speaking at the conference, Manikandan Thangarathnam, Senior Director, Engineering, Uber, said, “The core experiences offered by Uber, from Uber Eats to Uber rides, are powered by data. With data becoming the most powerful tool to make better business decisions, there are several common questions faced by Uber and our technology peers. Datacon 2022 was our attempt to discuss the criticality of improving data quality for business and bringing diverse voices to the table. We will continue to encourage cross-learning across industries and look to build in the future.”
Some highlights from Datacon 2022:
Datacon keynote: On Data-Driven Optimization In Real-Time Marketplaces
Ritesh Madan, Senior Director at Uber, who leads pricing and incentives for Uber's Rides and Delivery businesses, talked about how Uber runs real-time marketplaces for multiple lines of business with complex matching, pricing, promotion, and payment structures. Ritesh also covered the strategies and data architectures that generate the input data and state needed by the ML models and optimization algorithms behind an efficient marketplace.
LinkedIn Sales Insight For Smarter Sales Planning
LinkedIn Sales Insights (LSI) is a self-service tool that gives you direct access to real-time data and insights on companies and employees in your target market.
During their session, Shishir Sathe and Harpreet Singh of LinkedIn went over the foundational data components built for LSI to enable flexible self-serve analytics for customers. They also covered how LinkedIn solved the data and design challenges of scaling count-distinct operations on big data. A pre-compute-and-store approach was not technically feasible, because covering all possible combinations of 15+ attributes would require too much memory and compute.
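To make the scaling problem concrete, here is a minimal sketch of the arithmetic. The attribute count of 15 comes from the talk summary; the per-attribute cardinality, the toy records, and all function names are assumptions chosen for illustration, and nothing here represents LinkedIn's actual implementation. The sketch also shows why distinct counts cannot simply be summed from smaller pre-computed results.

```python
NUM_ATTRIBUTES = 15                # lower bound mentioned in the talk summary
ASSUMED_VALUES_PER_ATTRIBUTE = 10  # hypothetical cardinality, for illustration only


def grouping_sets(num_attributes: int) -> int:
    """Number of non-empty attribute subsets a user could filter or group by."""
    return 2 ** num_attributes - 1


def precomputed_cells(num_attributes: int, values_per_attribute: int) -> int:
    """Upper bound on stored results if each attribute is either pinned to
    one of its values or left unconstrained."""
    return (values_per_attribute + 1) ** num_attributes


def distinct_employees(rows, skill=None):
    """Count distinct employees, optionally restricted to one skill."""
    return len({emp for emp, s in rows if skill is None or s == skill})


# Toy (employee, skill) records: emp_1 appears under two skills.
records = [("emp_1", "sales"), ("emp_1", "marketing"), ("emp_2", "marketing")]

per_skill_sum = sum(distinct_employees(records, s) for s in ("sales", "marketing"))
overall = distinct_employees(records)

print("grouping sets:", grouping_sets(NUM_ATTRIBUTES))
print("cells to pre-compute:", precomputed_cells(NUM_ATTRIBUTES, ASSUMED_VALUES_PER_ATTRIBUTE))
print("sum of per-skill distinct counts:", per_skill_sum)
print("true overall distinct count:", overall)
```

Because an employee can match more than one attribute value, the per-skill counts overcount when added together, so coarse-grained answers cannot be derived by summing fine-grained ones. Each combination would need its own stored result, which is what makes exhaustive pre-computation grow so quickly.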
On Future Of Data Observability
Datacon included a panel discussion on the "Future of Data Observability". The discussion started with an overview of the importance of data observability, with each panelist sharing a perspective from their own experience. It was followed by a deep dive into aspects such as data SLAs, data quality, tools and frameworks, and consumer pain points. The discussion then turned to how data observability is handled at Uber and the factors to consider at design time to make an application more observable. The panelists were Divya B, Chirag Parmar, Javed Abdulla, and Alok Srivastava, and Pallavi Rao moderated the discussion.
Data Reliability At Scale For Rakuten Data Science And ML Workflows
Traditional data quality and reliability mechanisms follow deterministic rules coded by a human programmer, but these cannot evolve or scale fast enough to keep up with data velocity and volume. Sekhar MK, VP of AI Products at Rakuten, and his team talked about how various ML techniques and tools can be leveraged to ensure data reliability at scale. They addressed four areas: automatically discovering and flagging issues, tracing and generating data lineage for critical data pipelines, flagging erroneous predictions, and auto-auditing raw as well as derived data sources for completeness.
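As a rough illustration of flagging issues from the data itself rather than from a hand-written threshold, the sketch below applies a modified z-score based on the median absolute deviation to daily row counts. This is a simple statistical baseline, not an ML model, and it is not the technique presented by Rakuten, which the summary does not specify. The function name, the sample counts, and the 3.5 cutoff are all assumptions.

```python
from statistics import median


def flag_anomalies(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds `threshold`.

    The score uses the median and the median absolute deviation (MAD),
    which are less distorted by the outliers we are trying to find than
    the mean and standard deviation would be.
    """
    med = median(values)
    abs_dev = [abs(v - med) for v in values]
    mad = median(abs_dev)
    if mad == 0:
        # At least half the values equal the median; flag anything that differs.
        return [i for i, d in enumerate(abs_dev) if d > 0]
    # 0.6745 rescales MAD so the score is comparable to a standard z-score
    # for normally distributed data.
    return [i for i, d in enumerate(abs_dev) if 0.6745 * d / mad > threshold]


# Hypothetical daily row counts for one table; day 5 is a partial load.
daily_row_counts = [1000, 1020, 990, 1010, 1005, 120, 995]
anomalous_days = flag_anomalies(daily_row_counts)
print(anomalous_days)
```

The baseline adapts to each table's own history, so the same check can run across many pipelines without a person tuning a rule for each one. A production system would also need to handle trends and seasonality, which this sketch ignores.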
Data Mesh Architecture At Microsoft 365
Microsoft 365 has a compliant experimentation platform that provides the tools, processes, and infrastructure Microsoft data scientists need to build machine learning models. Raj Srivastav and Vinit Tiwari from Microsoft talked about how the platform started off as a centralized data lake and has since evolved into a distributed architecture more closely aligned with the Data Mesh paradigm. The talk shared their experiences and insights from this journey and described the challenges of both architectures.