MAS S68 F’19 | Computer Visions: Generative Machine Learning Tools for Creative Applications


Machine learning is transforming our reality. Today’s models learn deep representations of their input domains and reach beyond-human performance on a growing list of tasks. In learning to predict, many ML models construct a view of their input world that also allows them to generate new data – they are generative models. In this fast-paced class, students will use generative ML models to “paint” and “sketch”, “write” a poem or a whole (fake-)news article, transfer visual artistic style, hallucinate structure out of noise, generate 3D models, and “compose” music. The class will lightly cover topics in applied mathematics for machine learning but will focus on hands-on, practical programming methods for implementing computer visions in Python. Students will learn how machine learning models are used to generate information in multiple media (text, sound, image, 3D geometry), and by the end of the course will be able to apply these tools to their own domain of interest. The course is designed for people without prior experience in deep learning, but it can also benefit advanced students looking for an overview of the latest generative methods.


Location: E15-359
Time: Thursdays from 1-3pm, on a once-every-two-weeks schedule.
Instructors: Prof. Pattie Maes, Dr. Roy Shilkrot, Guillermo Bernal
Units: 0-6-0

Using Google Cloud: http://visions.media.mit.edu/howto-using-google-cloud/

Syllabus | Topics Covered

  • Deep learning mechanics: ML concepts (model, classification, regression, supervision, train-test, loss, overfitting, regularization), linear models, neural nets, back-propagation optimization, convolution, recurrence, attention, transformers
  • Deep learning practicalities: TensorFlow-Keras, working with data, inference environment stack (CUDA, Docker, Jupyter), retraining and transfer learning
  • Generative model patterns: encoder-decoders, autoencoders, adversarial nets
  • Significant ML Models of interest: VAEs, CNNs (VGG, Inception, ResNeXt, DenseNet, WaveNet, Pixel), RNNs (Char, DRAW, Sketch, Melody, ELMo), GANs (DC, VAE, C/Info, Cycle, Big, Style, Pro, pix2pix, Gau, Stacked, 3D, …), Transformers (BERT, GPT-1, GPT-2)
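As a taste of the mechanics in the first bullet (model, loss, optimization), here is a minimal pure-Python sketch – not course material, just an illustration – of fitting a linear model by gradient descent on a squared-error loss. The toy data, learning rate, and step count are made up for the example.

```python
# Fit a linear model y = w*x + b by gradient descent on mean squared error.

# toy dataset: y = 2x + 1, no noise
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]

w, b = 0.0, 0.0   # model parameters, randomly you might init these; zeros suffice here
lr = 0.05         # learning rate (illustrative choice)

for step in range(2000):
    # forward pass: predictions under the current parameters
    preds = [w * x + b for x in xs]
    # backward pass: gradients of the MSE loss w.r.t. w and b
    n = len(xs)
    dw = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / n
    db = sum(2 * (p - y) for p, y in zip(preds, ys)) / n
    # gradient descent update
    w -= lr * dw
    b -= lr * db

print(round(w, 2), round(b, 2))  # → 2.0 1.0
```

The same loop – forward pass, loss, gradients, parameter update – is what back-propagation automates for deep networks; frameworks like TensorFlow-Keras compute the gradients for you.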


# | Date | Topics | Assignment | Slides | Video | Assignment Link
1 | 9/5 | Intro, Machine & Deep Learning, CNNs, ImageNet | Style Transfer, Deep Dream, Neural Doodle | PDF | Link | Link
2 | 9/19 | Generative models, VAEs, GANs | VAE, DCGAN, C/Info GAN, BigGAN, Cycle | PDF | Link | Link
3 | 10/3 | Text: RNNs, Attention, Transformers | Word2Vec, BERT, GPT, ELMo, Txt2img | PDF | Link | Link
4 | 10/17 | Quick Draw!, DRAW, Sketch-RNN | Sketch, draw, paint | PDF | Link | Link
5 | 10/31 | Music, audio, representation, melody, rhythm | WaveNet, Melody RNN | PDF | Link | –
6 | 11/14 | 3D models, transfer learning | Image to 3D, 3D GAN | PDF | Link | –
7 | 12/5 | Project presentations | – | – | – | –

Assignments and Grading

After each class, a home assignment will be given in the form of a Jupyter notebook. The notebook will contain code that follows that class’s material, as well as open segments where students can run their own code and tweak parameters to generate new artifacts. Students are encouraged to post their successful creations on the class website. Assignments count towards the final grade, and feedback will be given.

The class culminates in a final project centered on the student’s domain of interest, applying the tools presented in and out of class. Instructors will provide starting points, data, and help in finding a suitable project. Projects may be done individually or in groups of two or three students.

Final grades will be given after project submissions are evaluated. Projects are graded on three criteria: creative value, technical contribution, and academic contribution.

Three special awards will be given to extraordinary teams for extra credit:

  • The Transfer-Learning award – given to students who successfully apply a model pre-trained in one domain to generate output in a different domain (e.g. a text model to create images, or vice versa).
  • The C-AI-borg award – given to students who demonstrate that their generative model produces human-level outputs or otherwise surprising capabilities.
  • The Hinton-LeCun-Bengio award – given to students who train their model from scratch and demonstrate its utility towards generation.

Class Structure

The class meets for two hours once every two weeks, with readings and home assignments between sessions.

Participation in class is encouraged, and active-learning practices will be applied.

Prerequisite knowledge

  • Programming and scripting: Python, command line scripting (linux/mac)
    If you’re already comfortable with programming, or are a quick learner, you can take this class. Working in a Jupyter environment is recommended.
  • Basic mathematics: linear algebra, statistics and probability, multivariate calculus – only cursory knowledge is required; however, the basics will not be repeated in class for lack of time.
    If interested in an in-depth understanding of Machine Learning, Deep Learning, Numeric Optimization or Statistical Modeling – consider taking a dedicated class on these subjects.

Recommended literature

  • Hands-On Machine Learning with Scikit-Learn and TensorFlow, by Aurélien Géron.
  • Deep Learning with Python, by Francois Chollet

Recommended courses

Related Classes / Initiatives