Algorithm Conference

July 16 - 18, 2020

Austin, Texas

How to curate quality datasets for machine learning

An Algorithm Conference workshop by two Twitter ML engineers

General info

How to curate quality datasets for machine learning is a workshop taught by two machine learning engineers from Twitter, Inc. It takes place on Day 1 of Algorithm Conference, July 16, 2020, at the Thompson Conference Center on the campus of The University of Texas at Austin.


Target audience: Developers, aspiring developers, and technical (project) managers.

Date: July 16, 2020


Time: 8 a.m. – 10 a.m.


Instructors: Jigyasa Grover and Rishabh Misra


Location: Room 3.108, Thompson Conference Center, University of Texas at Austin.

Workshop summary

In the contemporary world of machine learning algorithms, data is the new oil. And for state-of-the-art machine learning algorithms to work their magic, it’s important to have access to relevant data. Though volumes of crude data are available on the web, we still need the ability to identify and extract them into meaningful datasets.


This workshop will present the power of one of the most fundamental aspects of machine learning: dataset curation, which is highly relevant to the field but often does not get its due.

You’ll learn why dataset curation is important in specific industry use cases, and also learn, via hands-on Pythonic examples, how to construct good quality datasets.


The methods and tips shared in this workshop have come in handy for the instructors when publishing high-grade research papers, at their current employment with Twitter, Inc., and at prior engagements in the industry and academia.

What you'll learn

Workshop schedule

A detailed class schedule will be published shortly. Stay tuned! But don’t let that stop you from getting your tickets before they sell out.

What you'll need

BYOL (Bring your own laptop)

And perform the following actions on your computer:

1. Install Google Chrome, or update it to the latest version (v79).

2. Download the ChromeDriver release whose version matches your Google Chrome version.

3. Install Jupyter Notebook.

4. Install Beautiful Soup and Selenium packages.

5. Ensure that the starter code in curating quality ML datasets works on your computer.
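Before the workshop, a quick script along these lines may help confirm that the packages from the list above are in place. It is only a sketch, not part of the official starter code. It assumes the packages were installed with pip (`pip install beautifulsoup4 selenium notebook`), that Beautiful Soup, Selenium, and Jupyter Notebook are importable as `bs4`, `selenium`, and `notebook`, and that the ChromeDriver executable is named `chromedriver` and sits on your PATH. The function name `check_workshop_setup` is made up for this example.

```python
import importlib.util
import shutil


def check_workshop_setup():
    """Return a dict mapping each prerequisite to True (found) or False."""
    # Import names, not pip names: Beautiful Soup is imported as "bs4".
    modules = {
        "Beautiful Soup": "bs4",
        "Selenium": "selenium",
        "Jupyter Notebook": "notebook",
    }
    status = {
        label: importlib.util.find_spec(name) is not None
        for label, name in modules.items()
    }
    # Assumption: ChromeDriver is a standalone executable on PATH.
    # If you saved it elsewhere, this entry will report False.
    status["ChromeDriver on PATH"] = shutil.which("chromedriver") is not None
    return status


if __name__ == "__main__":
    for label, found in check_workshop_setup().items():
        print(f"{'OK     ' if found else 'MISSING'}  {label}")
```

A line marked MISSING only means the script could not find that item where it looked. It does not check that the ChromeDriver version matches your Chrome version, so still run the starter code as described in step 5.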


Registration for this workshop and for the conference itself is now open. You may register for one or both. The workshop has a limited number of tickets, so hurry and register if you want to guarantee yourself a seat. To reserve your ticket(s), click on that big red button.

Workshop instructors

Jigyasa Grover

Machine Learning Engineer, Twitter

The 2017 Red Hat Women in Open Source Academic Award Winner and Google Summer of Code alumna, I am an ardent open source enthusiast and a budding researcher, with work experience at the San Diego Supercomputer Center; National Research Council of Canada; and the Institute of Research & Development, France. I also briefly worked on anomaly detection frameworks in the ads system at Facebook.


I was the Director of Women Who Code and Lead of Google Women Techmakers for a handful of years. Aside from teaching this workshop with my colleague Rishabh Misra, I’ll also be giving a presentation that sheds some light on an informal taxonomy of machine learning algorithms.


I live in San Francisco, California.

Rishabh Misra

Machine Learning Engineer, Twitter

I am passionate about identifying and tackling novel and practical problems using my machine learning expertise. I also like messing with data. The bigger, the better. The datasets I’ve collected as part of my research have been very well received by the machine learning community, and I’m currently ranked 23rd as a dataset contributor on the Kaggle platform. My dataset on Sarcasm Detection has been used in the Natural Language Processing in TensorFlow course on Coursera.


I love explaining convoluted concepts in an accessible manner and have written several articles for the Towards Data Science online publication. I have a master's degree in computer science from the University of California, San Diego, and currently work as a machine learning engineer at Twitter, Inc.

Additional workshop

Exploring machine learning on the edge is another workshop at Algorithm Conference, conducted by representatives from Particle. It is recommended for developers, CTOs, and technical (project) managers.

Recommended activities for developers

Day 1: Workshop 1

How to curate quality datasets for machine learning

Thursday, July 16, 2020 (8 a.m. - 10 a.m.)

Day 1: Workshop 2

Exploring machine learning on the edge

Thursday, July 16, 2020 (1:30 p.m. - 5 p.m.)

Day 2 to Day 3

General conference sessions

Friday, July 17 - Saturday, July 18, 2020


All Algorithm Conference workshops will be held in Room 3.108 inside the Thompson Conference Center on the campus of the University of Texas at Austin. The address is 2405 Robert Dedman Dr., Austin, Texas 78712.

All plenary conference sessions will be in the LBJ Auditorium of the LBJ Presidential Library, located at 2313 Red River St, Austin, Texas. The auditorium has a seating capacity of just under 1,000.

LBJ Auditorium
LBJ Presidential Library
University of Texas at Austin

Be part of the Algorithm Conference experience!

Austin is the state capital of Texas and also the tech capital of the state. It’s the home for major tech conferences in Texas.


Austin in the summer is warm and buzzing with activity. No better city to spend your summer in the southern US. And no better conference to attend next summer than Algorithm Conference.

Get your ticket


Algorithm Conference is the event where you want to get on stage and represent your company before some of the best minds active in the areas of big data, artificial intelligence, and blockchain technologies.

Complete our CFP


Becoming a sponsor (which guarantees a speaking opportunity) or an exhibitor gives you multiple opportunities to showcase your brand and services before a dedicated and influential audience.

Email for info


Media and news outlets (yes, podcasts included!), industry associations, and professional organizations: get special discounts for your members when you partner with us.

Email for info


Subscribe to our newsletter to get the latest updates about Algorithm Conference in your inbox. We won’t spam you. Just the latest Algorithm Conference news.

Subscribe to newsletter

Privacy and cookies policy

What we do

This is an event website with a focus on disruptive technologies, so the information we provide here pertains to our past and upcoming events. We don't sell you anything directly. We do, however, sell you tickets using a third-party online ticketing platform and monitor website traffic using a third-party analytics tool.


Information we collect

The information we collect directly is your email address, which you provide when you subscribe to our newsletter. To help us better understand the nature of the traffic that flows through this website, we use a third-party analytics tool, which uses cookies inserted into your browser to track your activities on this website. Other third-party services we may use also rely on cookies to function properly.


Our Privacy Policy

Our privacy policy is very simple; we do not sell any personal data that we collect from you either directly when you subscribe to our newsletter or via a third party when you, for example, purchase a ticket or tickets to this event.


How we use the information we collect

When you subscribe to our newsletter, we use your email address to send you updates about our events. We use the personal information we have access to via the third-party ticketing platform to help us understand the geographic distribution of our attendees.


How we use cookies

Cookies inserted into your Web browser via a third-party analytics tool are used to compile aggregate data about your activities while you are on this website so that we can offer better site experiences and content in the future. You can configure your browser to not store cookies, but that will severely impact your user experience while on our website. We recommend that you whitelist our website if you don't want every website you visit to set cookies in your browser.


How to contact us

To contact us, click on the Contact us button in the footer of this website.

Code of conduct

All attendees, speakers, sponsors and volunteers at our conference are required to agree with the following code of conduct. Organizers will enforce this code throughout the event. We expect cooperation from all participants to help ensure a safe environment for everybody.


Our conference is dedicated to providing a harassment-free conference experience for everyone.


Harassment includes offensive verbal comments related to gender, gender identity and expression, age, sexual orientation, disability, physical appearance, body size, race, ethnicity, religion, technology choices, sexual images in public spaces, deliberate intimidation, stalking, following, harassing photography or recording, sustained disruption of talks or other events, inappropriate physical contact, and unwelcome sexual attention.


Participants asked to stop any harassing behavior are expected to comply immediately.


Sponsors are also subject to the anti-harassment policy. In particular, sponsors should not use sexualized images, activities, or other material.


If a participant engages in harassing behavior, the conference organizers may take any action they deem appropriate, including warning the offender or expulsion from the conference with no refund.


If you are being harassed, notice that someone else is being harassed, or have any other concerns, please contact a member of conference staff immediately. You'll always find at least one staff member at the registration table.


Conference staff will be happy to help participants contact hotel/venue security or local law enforcement, provide escorts, or otherwise assist those experiencing harassment to feel safe for the duration of the conference. We value your attendance.


We expect participants to follow these rules at conference and workshop venues and conference-related social events.

© 2019 Sundiatah Ventures LLC. All rights reserved.

We bring a combined 50 years of experience as IT pros to the table, 5 of those spent organizing technology conferences.



al-Khwārizmī   ->   algoritmi  ->   algorithm


Postponed due to covid-19

Due to the raging covid-19 pandemic, Algorithm Conference has been postponed until further notice. With two vaccines already approved and a couple more expected to be approved in early 2021, many experts believe that we should begin getting back to normal by summer, assuming the vaccination programs go as planned.


So tentatively, we should be able to host an in-person conference by Fall 2021. The good news is, people want to get together like during the pre-pandemic days, so we’re sure that when the conference eventually takes place, it will be a resounding success.


All things being equal, we’ll announce a new date by March 2021. Ticket sales are still ongoing, so you can buy your tickets early.