Keynote Session

9:30 to 9:45AM

The Trojan Grand Ballroom

9:30 to 9:45

Jason Geng

Introduction from Conference Organizer

9:50 to 10:20


Topic: Alluxio (formerly Tachyon): An Open Source Memory Speed Virtual Distributed Storage

10:25 to 10:55



  • Everyone is talking about big data, but there are some misunderstandings.
  • What is augmented intelligence? 1) Human vs. computer 2) Human with computer is more powerful than either computer or human alone 3) Augmented intelligence is human intelligence + machine learning
  • Some case studies to indicate the power of augmented intelligence

11:00 to 11:30


Topic: Privacy vs. Security in a Big Data World
Description: The jury is still out on whether Edward Snowden was a hero, traitor, or schmuck. Regardless of the scarlet letter we want to hang around his neck, we should thank him for helping bring the discussion of big data privacy and security to the public square. This session examines the issues of big data privacy and security in the context of the six-stage (big) data lifecycle: create, store, use, share, archive, and destroy. We each have a role to play in this privacy/security theater. What’s yours going to be?


11:35 to 12:05

Raymond Fu - Practice Architect

Topic:  Building Enterprise Advanced Analytics Platform
Description: This talk covers best practices for building an advanced analytics platform that helps companies apply machine learning, deep learning, and data science to their structured and unstructured data.


Session - A

14:00 to 18:00

The Trojan Grand Ballroom - A

14:00 to 14:30

Juan Sebastian Vasquez - Operations Innovation Team - Office of Mayor Eric Garcetti

Topic: Battling Bureaucracy: Keep Your Data Project Moving

14:45 to 15:15

Kyle Polich - Principal Data Scientist at DataScience, Inc

Topic: Common Errors in Machine Learning
Description: The impact of machine learning on the modern world is indisputable. Yet constructing a useful system requires careful consideration of one’s dataset and interpretation of the model output. Mistakes in the creation of a model can easily yield a model that looks good but performs poorly in practice. This talk will explore some common pitfalls and how to avoid them.

15:30 to 16:00

John Chao - Manager of Business Analytics and Data Science at LinkedIn

Topic: Elevate the Sales Process: B2B sales intelligence with LinkedIn Social Selling Data
Description: At LinkedIn, we use big data to push the envelope in the sales development process. We have developed a set of full-funnel B2B sales models that incorporate information about both companies and individuals and synthesize it to dynamically inform our sales team on how best to manage their sales development. The result is an information machine that decides which firms to target and whom to target within each firm, and that tracks each individual's change in propensity in real time during the sales process, so that the sales team is better equipped to win business deals.

16:15 to 16:45

Yves Bergquist - Project Director of Entertainment Technology Center at USC

Topic: Data Science and Entertainment Industry

17:00 to 17:30

Kent Strokery - Open Source Analytics Technical Evangelist at IBM

Topic: Leveraging IBM DSX and R for Exploring, Modeling and Visualizing

Description: Mr. Stroker demonstrates a vehicular fatalities use case that starts with multiple data sources, performs some exploratory steps, models the data, creates visualizations, and uses Shiny to present an interactive dashboard.

Session - B

14:00 to 18:00

The Trojan Grand Ballroom - B

14:00 to 14:30

Wenjing Zhang - Director of Analytics at LinkedIn

Topic: Advance analytics career from novice to sophisticated

14:45 to 15:15

Dusan Bosnjakovic - Data Scientist at Intuit

Topic: Link Analysis in Fraud Detection
Description: Find out how Intuit's Strategic Risk team uses link analysis to identify connections between accounts. This is done for a variety of reasons such as identification of fraudulent activity as well as cross-company account linking. This session will focus on the process of creating a graph structure with commonly used tools, identifying important data quality issues, several methods of analysis of the graph, and interactive graph data visualization.

15:30 to 16:00

William Wang - Assistant Professor at UC Santa Barbara

Topic: Scalable Learning and Reasoning for Large Knowledge

Description: A knowledge graph is a data structure that supports many recent AI applications, and learning to reason about and understand the world’s knowledge is a fundamental problem in AI. While it is often hypothesized that both symbolic and statistical approaches are necessary to tackle complex problems, in practice, bridging the two in a combined framework can lead to intractability. In this talk, I will describe some examples of my work in advancing the state of the art in scalable statistical relational learning.

16:15 to 16:45

Benjamin Uminsky - Executive Assistant: Data Science at Los Angeles County Registrar-Recorder/County Clerk

Topic: Min(d)ing the Voter File
Description: This talk will describe how the largest voting jurisdiction in the US (Los Angeles County) is using various data science techniques to clean the voter file (4.9 million registered voters) and use the data contained therein with other external data sources to develop a machine learning prediction algorithm to classify, for any future election, which poll workers will show up on election day and which ones will cancel or no-show.

17:00 to 17:30

Neal Fultz - Principal Data Scientist at Openmail

Topic: Going Bayesian in Ad Tech - Two Case Studies
Description: In online advertising, our data is fast and sparse, which can undermine traditional analyses. This session presents two applications of Bayesian methods: the first case is detecting changes in click-through rate (CTR) in a series of web traffic from end users; the second case is estimating market depth using a series of ad network data. In both cases, going Bayesian allowed us to use lower-level data directly to answer high-level strategic questions.

Session - C

14:00 to 18:00

THH 301

14:00 to 14:30

Oren Golan - VP Software Engineering at Sanguine

Topic: Evolving Graphs while Evolving Pokemon
Description: This talk follows our journey of trying to model and understand the Pokemon (generation 1) data and build a small web application and graph database around it.  The web application allows querying and visualization of stats, types, locations, breeding, evolution, and various other attributes.
The talk focuses on the realities of working with unfamiliar data and improving your model as you improve your understanding of the data. Rather than focusing on the end result, it focuses on all the steps and missteps it took to get there and what we learned along the way.


14:45 to 15:15

Rohan Monga - Sr. Machine Learning Engineer / Data Scientist at Whisper App

Topic: Personalization at Whisper
Description: Personalization strategies at Whisper: practical recommendation algorithms, pitfalls, A/B testing, and validation strategies.

15:30 to 16:00

Austin Clements - Associate @ TenOneTen Ventures

Topic: Raising Venture Capital for your Data Driven Startup
Description: Get an inside look into how VCs evaluate your team, market, and product before making an investment decision. Learn how to identify the right investors for your company and stand out from the crowd.

16:15 to 16:45

Min Zhang - CEO of Totumwealth

Topic: Data science applications in FinTech

Description: An overview of data science trends in FinTech, including applications in banking, credit and lending, wealth management, and insurance.

(Totum is an innovative risk analytics company that empowers digital client engagement and suitable advice for financial advisors.)

17:00 to 17:30

Emily Thompson - Product Manager at Insight Data Science

Topic: Academia to Industry: Transitioning to a career in Data Science

Description: A discussion of her transition from a career in particle physics research to becoming a Product Manager at Insight Data Science, where she helps PhDs in quantitative fields make the transition to industry themselves. She will also give a high-level overview of the field of data science and industry-wide trends, as well as the tools and skills that top data-driven companies are looking for when hiring data scientists and data engineers.

Session - D

14:00 to 18:00

THH 202

14:00 to 14:30

Dinesh Srirangpatna - Big Data Solution Architect at Microsoft

Topic: Big Data and IoT in the 4th Industrial Revolution
Description: IoT is expected to be a $1.7 trillion market by 2020, with 25 billion interconnected devices. However, with all this buzz and hype, there is confusion about what defines Big Data and IoT and how to utilize them. The good news: Big Data will result in the fourth industrial revolution by 2020. Attendees will come to understand the tools, terminology, landscape, use cases, and business value of these transformational shifts in the IT landscape.

14:45 to 15:15

Debajyoti Ray - Chief Data Officer at VideoAmp

Topic: Large-scale consumer graph analytics for cross-screen advertising with Spark
Description: Consumers now view the same TV and online content seamlessly across multiple screens. This shift in consumer behavior has come into conflict with the way advertising is sold. Each medium is sold separately in TV and online silos, creating an opportunity to bridge the gap and make advertising more effective using data and machine learning.
This talk explores technological developments at VideoAmp that bring together data from disparate mediums and across more than a billion unique device IDs. We build a large-scale consumer graph using Apache Spark for 150 million users. Our open-sourced project, Flint, is used to spin up on-demand Spark clusters for ad-hoc analysis and batch processing. Machine learning and graph analytics methods are used to build audience models for cross-screen bid optimization, frequency capping, and sequential targeting.

15:30 to 16:00

Nimbus Goehausen - Senior Software Engineer at Bloomberg LP

Topic: Spark: Tips & Tricks
Description: Spark is a general-purpose cluster compute engine with a lot of momentum and excitement behind it. It’s easier to use than Hadoop MapReduce, but it can still be difficult, even for experienced programmers, to learn how to structure code and debug problems in this environment. Endless problems can arise from running in a distributed environment, but we can avoid many of them by following a few rules. We'll cover some common pitfalls and examples of working alternatives.

16:15 to 16:45

Kenan Li - Data Scientist at Data Application Lab

Topic: GIS application with data science

17:00 to 17:30

Peyman Mohajerian - Solution Architect at Databricks

Topic: Applying Apache Spark to Data Science Challenges in Media & Entertainment Industry
Description: Media and entertainment companies are moving from a content-centric model toward a consumer-centric one, and must handle a vast number of data sources, a broad user base, and billions of data points created by millions of interactions.
Content personalization and social media analytics are used to optimize marketing and enhance the consumer’s experience. By predicting TV viewership using Nielsen ratings and transcript analytics, TV networks can improve programming and optimize marketing. Apache Spark is a unified, scalable computing framework for both streaming and machine learning. We will also briefly discuss how Spark is being used to build continuous applications in media & entertainment.