Feature-wise transformations

Many real-world problems require integrating multiple sources of information. Sometimes these problems involve multiple, distinct modalities of information — vision, language, audio, etc. — as is required to understand a scene in a movie or answer a question about an image. Other times, these problems involve multiple sources of the same kind of input, e.g. when summarizing several documents […]

The Building Blocks of Interpretability

With the growing success of neural networks, there is a corresponding need to be able to explain their decisions — including building confidence about how they will behave in the real world, detecting model bias, and satisfying scientific curiosity. In order to do so, we need to both construct deep abstractions and reify (or instantiate) them in rich […]

Using Artificial Intelligence to Augment Human Intelligence

What are computers for? Historically, different answers to this question – that is, different visions of computing – have helped inspire and determine the computing systems humanity has ultimately built. Consider the early electronic computers. ENIAC, the world’s first general-purpose electronic computer, was commissioned to compute artillery firing tables for the United States Army. Other […]

Sequence Modeling with CTC

Consider speech recognition. We have a dataset of audio clips and corresponding transcripts. Unfortunately, we don’t know how the characters in the transcript align to the audio. This makes training a speech recognizer harder than it might at first seem. Without this alignment, the simple approaches aren’t available to us. We could devise a […]
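The article's answer is Connectionist Temporal Classification (CTC), a loss that sums over every valid alignment between the audio frames and the transcript, so no per-character alignment is ever needed. As a rough illustration of training without alignments, here is a minimal sketch using PyTorch's nn.CTCLoss; the tensor shapes and sizes are illustrative, not taken from the article.

import torch
import torch.nn as nn

# Illustrative sizes: T audio frames, N clips per batch, C characters (class 0 is the blank).
T, N, C = 50, 4, 28
log_probs = torch.randn(T, N, C, requires_grad=True).log_softmax(dim=2)  # per-frame character scores
targets = torch.randint(1, C, (N, 12))                # transcripts only, no alignment given
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 12, dtype=torch.long)

# CTC marginalizes over all valid alignments between frames and characters,
# so the transcript alone is enough to define a training loss.
ctc = nn.CTCLoss(blank=0)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()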

Feature Visualization

There is a growing sense that neural networks need to be interpretable to humans. The field of neural network interpretability has formed in response to these concerns. As it matures, two major threads of research have begun to coalesce: feature visualization and attribution. Feature visualization answers questions about what a network — or parts of a network — are […]

Why Momentum Really Works

We often think of momentum as a means of dampening oscillations and speeding up the iterations, leading to faster convergence. But it has other interesting behavior. It allows a larger range of step-sizes to be used, and creates its own oscillations. What is going on? Here’s a […]
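For reference, the update the article analyzes keeps a running velocity that accumulates gradients and then steps along it. Below is a minimal sketch on a one-dimensional quadratic; the objective and the constants are illustrative, not taken from the article.

def grad(w):
    # gradient of the illustrative objective f(w) = 0.5 * w**2
    return w

# One common form of gradient descent with momentum:
#   z_{k+1} = beta * z_k + grad(w_k)
#   w_{k+1} = w_k - alpha * z_{k+1}
alpha, beta = 0.02, 0.99   # step-size and momentum, illustrative values
w, z = 5.0, 0.0
for _ in range(2000):
    z = beta * z + grad(w)
    w = w - alpha * z
print(w)   # with momentum this large, the iterate oscillates as it spirals in toward the minimum at 0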

Research Debt

Achieving a research-level understanding of most topics is like climbing a mountain. Aspiring researchers must struggle to understand vast bodies of work that came before them, to learn techniques, and to gain intuition. Upon reaching the top, the new researcher begins doing novel work, throwing new stones onto the top of the mountain and making […]

Experiments in Handwriting with a Neural Network

Neural networks are an extremely successful approach to machine learning, but it’s tricky to understand why they behave the way they do. This has sparked a lot of interest and effort around trying to understand and visualize them, which we think is so far […]

Deconvolution and Checkerboard Artifacts

When we look very closely at images generated by neural networks, we often see a strange checkerboard pattern of artifacts. It’s more obvious in some cases than others, but a large fraction of recent models exhibit this behavior. Mysteriously, the checkerboard pattern tends to be most prominent in images with strong colors. What’s going on?
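Part of the answer the article develops is uneven overlap: when a transposed convolution's kernel size is not divisible by its stride, some output pixels receive more contributions than their neighbors. The imbalance is easy to reproduce; this minimal PyTorch sketch, with illustrative layer sizes, feeds a constant image through a transposed convolution whose weights are all ones.

import torch
import torch.nn as nn

# Kernel size 3 with stride 2: the kernel size is not divisible by the stride,
# so output pixels are covered by different numbers of input pixels.
deconv = nn.ConvTranspose2d(1, 1, kernel_size=3, stride=2, bias=False)
with torch.no_grad():
    deconv.weight.fill_(1.0)

x = torch.ones(1, 1, 4, 4)      # a constant input image
y = deconv(x)[0, 0]
print(y)                        # the unequal values trace out a checkerboard-like grid

Even with a perfectly uniform input, the output alternates between high-overlap and low-overlap pixels, which is the kind of pattern the article examines.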

How to Use t-SNE Effectively

Although extremely useful for visualizing high-dimensional data, t-SNE plots can sometimes be mysterious or misleading. By exploring how it behaves in simple cases, we can learn to use it more effectively. A popular method for exploring high-dimensional data is something called t-SNE, introduced by van der Maaten and Hinton in 2008 [1].
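For readers who want to reproduce this kind of exploration outside the article's interactive figures, scikit-learn's TSNE is a convenient stand-in. One of the article's central cautions is that hyperparameters such as perplexity can change the picture dramatically, so it is worth comparing several settings rather than trusting a single plot. The sketch below is minimal; the toy data and parameter values are illustrative.

import numpy as np
from sklearn.manifold import TSNE

# Illustrative toy data: two well-separated Gaussian clusters in 50 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 50)),
               rng.normal(5.0, 1.0, (100, 50))])

# Run t-SNE at several perplexities; the 2-D embeddings can look quite different.
for perplexity in (5, 30, 50):
    Y = TSNE(n_components=2, perplexity=perplexity,
             init="pca", random_state=0).fit_transform(X)
    print(perplexity, Y.shape)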
