An Algorithm Conference workshop by two Twitter ML engineers
"How to curate quality datasets for machine learning" is a workshop that two machine learning engineers from Twitter, Inc. will conduct on Day 1 of Algorithm Conference, Feb. 18, 2021, at the TI Auditorium, University of Texas at Dallas.
Target audience: Developers, aspiring developers, and technical (project) managers.
Date: Feb. 18, 2021
Time: 8 a.m. – 10 a.m.
Instructors: Jigyasa Grover and Rishabh Misra
Location: TI Auditorium, University of Texas at Dallas.
In the contemporary world of machine learning algorithms, data is the new oil. And for state-of-the-art machine learning algorithms to work their magic, it’s important to have access to relevant data. Though volumes of crude data are available on the web, we still need the ability to identify and extract them into meaningful datasets.
This workshop will present the power of one of the most fundamental aspects of machine learning – dataset curation, which often does not get its due despite being central to how well models perform.
You’ll learn why dataset curation is important in specific industry use cases, and also learn, via hands-on Pythonic examples, how to construct good quality datasets.
The methods and tips shared in this workshop have come in handy for the instructors when publishing high-grade research papers, at their current employment with Twitter, Inc., and at prior engagements in the industry and academia.
A detailed class schedule will be published shortly. Stay tuned! But don’t let that stop you from getting your tickets before they sell out.
Before the workshop, perform the following actions on your computer:
1. Install or update to Google Chrome’s latest version (v79).
2. Download the ChromeDriver version that matches your Google Chrome version from here
3. Install Jupyter Notebook from here
4. Install Beautiful Soup and Selenium packages.
5. Ensure that the starter code in curating quality ML datasets works on your computer.
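As a quick sanity check for step 4, the short Python sketch below reports whether the two packages can be found by your Python installation. It is not part of the workshop's starter code. The helper name `check_packages` is hypothetical, and the script assumes the usual import names: `bs4` for Beautiful Soup and `selenium` for Selenium. It only checks that the packages are installed. It does not launch Chrome or verify that your ChromeDriver version matches.

```python
import importlib.util

# Import names for the packages in step 4 (assumed: Beautiful Soup is
# imported as "bs4" and Selenium as "selenium").
REQUIRED = ["bs4", "selenium"]


def check_packages(names):
    """Return {name: True/False}, True when the package can be located."""
    # find_spec returns None when a top-level package is not installed.
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages(REQUIRED).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If either package is reported as missing, install it and run the check again before trying the starter code.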
The 2017 Red Hat Women in Open Source Academic Award Winner and Google Summer of Code alumna, I am an ardent open source enthusiast and a budding researcher, with work experience at the San Diego Supercomputer Center; National Research Council of Canada; and the Institute of Research & Development, France. I also briefly worked on anomaly detection frameworks in the ads system at Facebook.
I was the Director of Women Who Code and Lead of Google Women Techmakers for a handful of years. Aside from teaching this workshop with my colleague Rishabh Misra, I’ll also be giving a presentation that sheds some light on an informal taxonomy of machine learning algorithms.
I live in San Francisco, California.
I am passionate about identifying and tackling novel and practical problems using my machine learning expertise. I also like messing with data. The bigger, the better. The datasets I’ve collected as part of my research have been very well received by the machine learning community, and I’m currently ranked 23rd as a dataset contributor on the Kaggle platform. My dataset on Sarcasm Detection has been used in Deeplearning.ai’s Natural Language Processing in TensorFlow course on Coursera.
I love explaining convoluted concepts in an accessible manner and have written several articles for the Towards Data Science online publication. I have a master's degree in computer science from the University of California, San Diego, and currently work as a machine learning engineer at Twitter, Inc.
Registration for the workshop and for the conference itself is now open. The workshop has a limited number of tickets, so hurry and register if you want to guarantee yourself a spot. To reserve your ticket(s), click on that big red button.
Want to register using your favorite cryptocurrency? We’re on your side. Just click that button to email us to begin the process. We’ll get back to you pronto.