The most powerful machine learning models train on a lot of data. Where does this data come from? It usually comes from the internet and is then filtered, labeled, and curated by humans. Arguably, almost all the data that current machine learning models train on is available somewhere on the internet, though you might have to ask for it on a specific forum. A major way forward towards more intelligent machines is to think about ways for ML models to use the internet themselves to learn. This isn't access to the "real world", but it is much less expensive than having a robot incarnation of an ML model roll around in our 3-dimensional world. And is this 3-dimensional world as important as we think, anyway? Even we humans spend a pretty fat share of our time interacting with the internet, even though our interfaces to the internet are pretty bad: compare the data bandwidth your body has for interacting with the world to the tiny share of it that can be used to interact with the internet.

One way to think about an ML model that learns from the internet directly is to see it as a baby left to live on the internet. I would like to describe a three-stage curriculum such a model might go through, relating each stage to human babies for inspiration on how such a baby-on-the-internet model might learn. Each stage adds a new learning capability, but previously learned capabilities are not forgotten and are still used when applicable.

At first, the model will just experience whatever random things it comes across: articles, terms of service, spam, pictures of random noise, ingredient lists. Just like a small baby can't choose what it overhears its parents discussing or what it sees while being carried around.

In the second stage, the model develops preferences for what kind of data to consume. It will look for data that it does not know yet but can incorporate easily, especially data that helps it understand the world better. It might read Simple Wikipedia entries, look at simple images, or listen to repetitive music, similar to a baby that prefers looking at faces over looking into the sky.
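To make this stage-two preference concrete, here is a minimal toy sketch of my own (not from the post): "difficulty" stands in for the model's loss on a candidate sample, and the selection rule prefers samples just above the model's current level, i.e. not yet known but still easy to incorporate. The numbers and sample names are made up for illustration.

```python
def select_batch(candidates, model_level, sweet_spot=0.2, k=3):
    """Rank candidates by how close their difficulty is to a sweet spot
    just above the model's current level, and return the k best names.

    candidates: list of (name, difficulty) pairs, difficulty in [0, 1]
    model_level: what the model currently masters, in [0, 1]
    """
    target = model_level + sweet_spot
    ranked = sorted(candidates, key=lambda c: abs(c[1] - target))
    return [name for name, _ in ranked[:k]]

# Illustrative corpus: difficulties are invented, not measured.
corpus = [
    ("ingredient list", 0.05),   # trivially easy: nothing new to learn
    ("Simple Wikipedia", 0.30),  # just above the model's level
    ("repetitive music", 0.35),
    ("legal contract", 0.90),    # far too hard to incorporate yet
    ("random noise", 1.00),
]

print(select_batch(corpus, model_level=0.1))
# → ['Simple Wikipedia', 'repetitive music', 'ingredient list']
```

In a real system the difficulty score would come from the model itself, for example its loss or its recent learning progress on similar samples.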

In the last stage, the model interacts with the internet and its human users and learns from that interaction. Here I imagine the model participating in forum discussions, or tweeting. The human equivalent is interacting with the environment, at first through noises or gestures, and later through language.
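One way to frame this third stage is as an interaction loop, similar in shape to a reinforcement-learning environment: the model posts, observes the replies, and treats the exchange as new training data. The sketch below is entirely hypothetical; `Forum` and `Model` are illustrative stand-ins, not a real API, and the "learning" step just stores the interaction where a real system would perform an update.

```python
class Forum:
    """Toy environment: replies echo the post and add a pushback."""
    def reply_to(self, post):
        return f"re: {post} (have you considered the opposite?)"

class Model:
    def __init__(self):
        self.experience = []  # interactions collected as training data

    def compose_post(self, topic):
        return f"my current take on {topic}"

    def learn_from(self, post, reply):
        # A real system would update its parameters here; this sketch
        # only records the (post, reply) pair.
        self.experience.append((post, reply))

forum, model = Forum(), Model()
for topic in ["curriculum learning", "web-scale data"]:
    post = model.compose_post(topic)
    reply = forum.reply_to(post)
    model.learn_from(post, reply)

print(len(model.experience))  # → 2
```

The point of the framing is that the model chooses its actions (what to post and where), so the data it learns from is shaped by its own behavior, unlike in the first two stages.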

I see these stages not only as stages a model goes through during training, but as stages in research as well. Currently we only have stage-one models that learn from whatever they come across, and we rarely train them on random samples from the internet directly, since such samples would be noisy and low-quality compared to handcrafted datasets. The next step will be to incorporate stage-two learning capabilities, so that training on the internet directly becomes feasible for the first time: the model itself builds its high-quality dataset. Finally, in the last step, we will be able to build models that can also interact with the internet.

The purpose of this post is to draw a parallel between ML models in the online world and Sapiens models in the physical world, and to inspire more people to think about ML models as babies on the web. I really believe that progress can be made if we move beyond the current dataset-centric focus of the machine learning community.