An ambitious star-mapping project highlights the growing importance of big data and the cloud, writes ARTHUR GOLDSTUCK.
At an event in Berlin today, the European Space Agency (ESA) is unveiling the biggest set of data about the stars ever gathered. The positions and magnitudes of no fewer than 1.7 billion stars of our Milky Way galaxy have been gathered by the Gaia spacecraft, which took off in 2013 and began collecting data a year later.
The ship is also transmitting a vast range of additional data, with distances, motions and colours of more than 1.3 billion stars collected so far. And that is without counting temperature measures, solar system analysis and radiation sources from outside the galaxy.
“The extraordinary data collected by Gaia throughout its mission will be used to eventually build the most accurate three-dimensional map of the positions, motions, and chemical composition of stars in our Galaxy,” according to a project document. “By reconstructing the properties and past trajectories of all the stars probed by Gaia, astronomers will be able to delve deep into the history of our Galaxy’s formation and evolution.”
The entire project would be impossible were it not for advances in cloud computing storage, big data analysis and artificial intelligence systems during this decade. The storage demands alone are mind-boggling. The ESA roped in cloud data services company NetApp, which focuses on management of applications and data across cloud and on-premises environments.
NetApp was previously involved with the Rosetta space mission, which landed a spacecraft on a comet in 2016. Launched as far back as 2004, Rosetta became the first spacecraft to go into orbit around a comet ten years later, and its lander made the first successful landing on a comet.
“For the next two years Rosetta was following the comet and streaming data,” says Morne Bekker, NetApp South African country manager. “But with the comet speeding away from the sun at 120 000kph, Rosetta would soon lose solar power. Scientists seized the opportunity to attempt what no one had ever tried before — to gather unique observations through a controlled impact with the comet. Despite blistering speeds and countless unknowns, the spacecraft landed just 33m from its target point.
“It’s quite phenomenal when you think of the data and analytics harvested, and the information it can send back. Now we’re helping with the Gaia project. You can imagine how much data is being collected daily. The catalogue will probably end up at 2 Petabytes in size – that’s 2-million gigabytes. If you think of the minute points of data being extracted, obviously you have to be using AI and machine learning to analyse all of this.”
Ruben Alvarez, IT manager at the ESA, sums it up simply: “Data is everything. Our biggest challenge is processing of the data.”
Naturally, ESA required absolute reliability from data storage. It also demanded almost infinite scalability to support the massive data requirements of past, present, and future missions.
“We have a commitment to deliver data to different institutes in Europe on a daily basis,” says Alvarez. “Adding to the challenge, data from every mission must be accessible indefinitely. In the coming years, we will be launching new missions that will demand huge amounts of data. NetApp provided us with solutions that were scalable, even if we didn’t know in advance how much disk storage we were going to need.”
ESA says it expects to publish the full Gaia catalogue in 2020, making it available online to professional astronomers and the general public, with interactive, graphical interfaces.
The catalogue, says Alvarez, will unlock many mysteries of the stars.
“We call our site the Library of the Universe because we keep the science archive of all of our scientific missions. This is how we allow people to really investigate the universe. It’s all about the data.”
The mission has tremendous scientific implications, but also makes a powerful business case for big data and cloud computing.
“The capabilities for AI and machine learning in the processing of mass amounts of data are far-reaching,” says Bekker. “Not only does it equate to extreme performance, but also to massive non-disruptive scalability where scientists can scale to 20 PB and beyond, to support the largest of learning data sets. Importantly it also allows scientists to expand their data where needed.”
Across Africa, the power of the cloud and big data is only slowly being harnessed. A new research project, Cloud Africa 2018, conducted by World Wide Worx for global networking application company F5 Networks, shows that cloud uptake is now pervasive across Kenya, Nigeria and South Africa.
However, the research reveals that each country experiences the benefits of the cloud differently. Respondents in Nigeria and Kenya named business efficiency and scalability as by far the most important benefit, with 80% and 75% respectively selecting it as an advantage. Only 61% of South African respondents cited it.
The opposite happened with the most important benefit among South Africans: Time-to-market or speed of deployment came in as the most prominent, at 68% of respondents. In contrast, only 48% of companies in Kenya and 28% in Nigeria named it as a key benefit.
This appears to be a function of the infrastructure challenges in developing information technology markets like Nigeria and Kenya, where the cloud is used to overcome the obstacles that get in the way of efficiency.
In South Africa, where construction of the giant Square Kilometre Array multi radio telescope is due to begin next year, the learnings of Rosetta and Gaia will ensure that data collection, storage and analysis will no longer be a challenge.
- For the latest on project Gaia, visit http://sci.esa.int/gaia/