Eureqa - Synthetic/Robotic Scientist

Human

The Living Force
There is a software package, Eureqa, that I've found very useful in my line of work. It's referred to as a Robotic or Synthetic Scientist since it reveals the underlying (mathematical) pattern in the given data, or, as stated on their website _http://www.nutonian.com/:
"Eureqa is a new technology that uncovers and explains the intrinsic relationships hidden within complex data."
Maybe some of you will also find it useful. There is a one-month free trial available for download on their website.

Regarding the forum's Historical Timeline Sorting project, it crossed my mind that Eureqa could perhaps be used as a side aid once the database is ready. I haven't checked Eureqa's applicability to that kind of data, but there are "History Building Blocks - designed specifically for time series data" and a Custom Error Metrics option, so once it is decided what "pattern" to search for, the data could be entered into Eureqa and run to see what comes out.

About the company:
_http://venturefizz.com/blog/nutonian?utm_content=3714115 said:
Nutonian - At the Cutting Edge of Technology, Science, and Data Analysis
Monday Jan 20, 2014 by Dennis Keohane - Senior Writer, VentureFizz

Nutonian is one of those companies that is way smarter than you or I.
Eureqa, Nutonian's main product, has been called a "synthetic scientist", a downloadable robot scientist, and able to "derive meaning from datasets too large or complex for humans to study."
The company developed Eureqa as a technology to simplify and automize the scientific method of research. Basically, Nutonian's products can quickly find relationships within vast quantities of data in order to "solve" some of the laws of nature, science, and mathematics that have always seemed too daunting (whether due to the sheer volume of data to sift through or the amount of time required for experiments) to find solutions for, analyze, or apply to business, robotics, or other data heavy sectors.
Nutonian was officially founded in 2011, but Eureqa and the idea of distilling large quantities of data to solve some of the most complex problems facing scientists and researchers was developed at Cornell University in 2009. Hod Lipson, a computer engineer/researcher and the director of Cornell University's Creative Machines Lab, and Michael Schmidt, who was one of Lipson's Ph.D students, created the algorithm/technology while working on a complex robotics problem.
[...]

About Eureqa:
_http://www.theguardian.com/science/2009/apr/02/eureka-laws-nature-artificial-intelligence-ai said:
'Eureka machine' puts scientists in the shade by working out laws of nature
The machine, which took only a few hours to come up with Newton's laws of motion, marks a turning point in the way science is done

Ian Sample, science correspondent
_theguardian.com, Friday 3 April 2009 08.52 BST

Scientists have created a "Eureka machine" that can work out the laws of nature by observing the world around it – a development that could dramatically speed up the discovery of new scientific truths.
The machine took only hours to come up with the basic laws of motion, a task that occupied Sir Isaac Newton for years after he was inspired by an apple falling from a tree.
Scientists at Cornell University in New York have already pointed the machine at baffling problems in biology and plan to use it to tackle questions in cosmology and social behaviour.
The work marks a turning point in the way science is done. Eureka moments, which supposedly began in Archimedes' bath more than 2,000 years ago, might soon be happening not in the minds of geniuses, but through the warm hum of electronic circuitry.
"We've reached a point in science where there's a lot of data to deal with. It's not Newton looking at an apple, or Galileo looking at heavenly bodies any more, it's more complex than that," said Hod Lipson, the computer engineer who led the project.
"This takes the grunt out of science by sifting through data and looking for the laws that govern how something behaves."
[...]

Several videos:
  • TEDxUVM 2011 - Mike Schmidt - The Robotic Scientist: Accelerating Discovery with Eureqa; _http://www.youtube.com/watch?v=6XSncCrQzwk
  • Through the Wormhole (2013); _http://fast.wistia.net/embed/iframe/x0t6owlf6k?popover=true
  • 'Eureka machine' can discover laws of nature - The machine formulates laws by observing the world and detecting patterns in the vast quantities of data it has collected (guardian video/article); _http://www.theguardian.com/science/video/2009/apr/02/eureka-machine-artificial-intelligence
 
A limited version can be used for free for personal use. I toyed around with it a while back just because "I'm a science geek", but I lack the scientific or mathematical experience to make much practical use of it. If I ever conduct a personal experiment for which I'd like to analyze the data to find a formula, I might use it.

It works through something called "symbolic regression", which is a type of genetic programming (_https://en.wikipedia.org/wiki/Genetic_programming). Basically, you have your data, and you want to see what the mathematical connection is between series A and series B (or whatever). You choose some settings to give it some rules for finding the formula (such as which mathematical operators to use when building the formulas), and set it running.

An artificial evolution simulation "evolves" the formulas so that they more accurately explain the relationship between the data sets (series A and B in this example). Basically, a very large number of formulas are created to form a "gene pool". They start out simple, and each is described by a "genetic" code. They are "reproduced" by copying and "mutated" by randomly inserting, deleting, swapping, etc. their mathematical operators. Then they are "selected" according to how well they explain the data AND (importantly) how simple they are. This way there is pressure for a simple formula to emerge that explains the relationships in the data. So you sort of have an ecosystem where these formulas compete for reproduction, which is granted according to the criteria set at the beginning of the simulation.
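To make the idea a bit more concrete, here is a toy sketch of that evolutionary loop in Python. This is not Eureqa's actual code or algorithm (which I haven't seen); every name in it (`random_expr`, `mutate`, `score`, `evolve`, the `penalty` value) is made up for illustration, mutation is simplified to "replace a random sub-formula", and the example data is generated from a formula I picked myself.

```python
import math
import random

# Operators the search is allowed to use (the "settings" chosen up front).
OPS = {"+": lambda a, b: a + b,
       "-": lambda a, b: a - b,
       "*": lambda a, b: a * b}

def random_expr(rng, depth=2):
    """Build a random formula tree over the variable 'x' and small constants."""
    if depth == 0 or rng.random() < 0.3:
        return "x" if rng.random() < 0.5 else float(rng.randint(1, 3))
    op = rng.choice(list(OPS))
    return (op, random_expr(rng, depth - 1), random_expr(rng, depth - 1))

def evaluate(expr, x):
    """Compute the value of a formula tree at a given x."""
    if isinstance(expr, tuple):
        op, left, right = expr
        return OPS[op](evaluate(left, x), evaluate(right, x))
    return x if expr == "x" else expr

def size(expr):
    """Number of nodes in the tree: a crude measure of how 'simple' it is."""
    if isinstance(expr, tuple):
        return 1 + size(expr[1]) + size(expr[2])
    return 1

def mutate(expr, rng):
    """Copy a formula, replacing one randomly chosen sub-formula."""
    if not isinstance(expr, tuple) or rng.random() < 0.3:
        return random_expr(rng, depth=2)
    op, left, right = expr
    if rng.random() < 0.5:
        return (op, mutate(left, rng), right)
    return (op, left, mutate(right, rng))

def score(expr, data, penalty=0.01):
    """Lower is better: mean squared error plus a small cost per node."""
    total = 0.0
    for x, y in data:
        diff = evaluate(expr, x) - y
        total += diff * diff
    mse = total / len(data)
    if not math.isfinite(mse):
        return float("inf")  # formulas that blow up are never selected
    return mse + penalty * size(expr)

def evolve(data, generations=50, pop_size=100, seed=0):
    """Keep the best quarter each generation and refill with mutated copies."""
    rng = random.Random(seed)
    population = [random_expr(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda e: score(e, data))
        survivors = population[: pop_size // 4]
        children = [mutate(rng.choice(survivors), rng)
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return min(population, key=lambda e: score(e, data))

if __name__ == "__main__":
    # Made-up example: series B is generated from series A by y = x*x + x.
    data = [(float(x), float(x * x + x)) for x in range(-3, 4)]
    best = evolve(data)
    print(best, score(best, data))
```

The `penalty * size(expr)` term is what creates the pressure toward simple formulas: two formulas with the same error are ranked by length, so the shorter one gets to reproduce.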

A nice aspect of the program is that it does not just return one formula, it returns a number of them, organized according to: 1.) how well they match the data, and 2.) how short/"simple" they are. So if you have the knowledge to interpret the formulas, you could choose the one(s) that seem most promising, or test them one by one on new data. Which reminds me: the program also provides the functionality to test the discovered formulas on new data that was not used during the search. If a formula can predict data it has never seen before, you can be much more confident that it is at least a useful approximation of the truth.
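As a rough illustration of that accuracy-versus-simplicity listing and of the check on unseen data, here is a small Python sketch. Again, this is only my own toy version, not how Eureqa does it internally: the candidate formulas, their "size" numbers, the function names (`mse`, `pareto_front`) and the data are all invented for the example.

```python
def mse(f, data):
    """Mean squared error of formula f over (x, y) pairs."""
    return sum((f(x) - y) ** 2 for x, y in data) / len(data)

def pareto_front(candidates, data):
    """Keep formulas that no other candidate beats on both error and size.

    candidates: list of (name, size, function) tuples.
    Returns a list of (name, size, error), simplest first.
    """
    scored = [(name, size, mse(f, data)) for name, size, f in candidates]
    front = []
    for name, size, err in scored:
        dominated = any(
            s2 <= size and e2 <= err and (s2 < size or e2 < err)
            for _, s2, e2 in scored
        )
        if not dominated:
            front.append((name, size, err))
    return sorted(front, key=lambda item: item[1])

# Invented data: the "true" relationship is y = 2*x + 1.
train = [(x, 2 * x + 1) for x in range(0, 5)]     # used to rank formulas
holdout = [(x, 2 * x + 1) for x in range(5, 8)]   # never used for ranking

# Invented candidates; 'size' is a hand-assigned complexity count.
candidates = [
    ("y = 5", 1, lambda x: 5),
    ("y = 2*x", 3, lambda x: 2 * x),
    ("y = 2*x + 1", 5, lambda x: 2 * x + 1),
    ("y = 2*x + 1 + 0*x*x", 9, lambda x: 2 * x + 1 + 0 * x * x),
]

front = pareto_front(candidates, train)
for name, size, err in front:
    print(f"size={size} train_error={err} {name}")

# Check the most accurate formula on data it has never seen.
best_name = min(front, key=lambda item: item[2])[0]
best_f = next(f for name, _, f in candidates if name == best_name)
print("holdout error:", mse(best_f, holdout))
```

The last candidate fits the training data just as well as `y = 2*x + 1` but is longer, so it is dropped from the list. The holdout step is the part that tells you whether a formula actually generalizes rather than just memorizing the data it was ranked on.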
 