Editor’s take: In Colossus: The Forbin Project, an advanced supercomputer becomes sentient and enslaves mankind. Colossus is also the name of the storage platform on which nearly all of Google’s internet services reside. We don’t know whether the company took direct inspiration from the classic sci-fi film, but the parallel is hard to miss.
In a recent blog post, Google revealed some of the “secrets” behind Colossus, the massive storage infrastructure the company describes as its universal storage platform. Colossus is robust, scalable, and easy to program against, and Google says that even at this scale it still relies on tried-and-true magnetic hard disk drives.
Colossus powers many Google services, including YouTube, Gmail, Drive, and more. The platform evolved from the Google File System (GFS) project, a distributed file system designed to make large, data-intensive applications more manageable. Notably, Google supercharged Colossus with a special caching technique built on fast solid-state drives.
Google creates one Colossus file system per cluster in a data center. Many of these clusters are powerful enough to manage multiple exabytes of storage, with two file systems in particular each hosting more than 10 exabytes of data. The company claims that applications and services running on Google infrastructure should never run out of disk space within a Google Cloud zone.
Data throughput in the Colossus file system is impressive. Google claims “typical” read rates higher than 50 terabytes per second, while write rates reach up to 25 terabytes per second.
“That’s enough throughput to send more than 100 full-length movies every second,” the company said.
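A quick back-of-the-envelope check shows how the cited read rate lines up with that claim. The movie size used here is an illustrative assumption (roughly 500 GB, e.g. a long film at a very high bitrate), not a figure Google states:

```python
# Sanity-check the "100 movies per second" claim against the cited
# 50 TB/s read rate. The movie size is an illustrative assumption.
read_rate_tb_per_s = 50   # peak read throughput Google cites
movie_size_tb = 0.5       # assumed ~500 GB per full-length movie

movies_per_second = read_rate_tb_per_s / movie_size_tb
print(movies_per_second)  # → 100.0
```

Under that assumption, the arithmetic works out to exactly 100 movies per second; smaller files would push the figure even higher.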
Storing data in the right place is essential to achieving such over-the-top performance. Internal Colossus users can decide whether their files go to HDDs or SSDs, but most developers rely on an automated solution known as L4 distributed SSD caching. This technique uses machine learning algorithms to decide which placement policy to apply to specific blocks of data. The system ultimately writes new data to HDDs, however.
L4 can solve this problem over time by looking at I/O patterns, separating files into specific “categories,” and simulating various storage placements. According to Google’s documentation, these placement policies include “place on SSD for one hour,” “place on SSD for two hours,” and “don’t place on SSD.”
When the simulated file access patterns are predicted correctly, a small portion of the data is placed on SSDs to absorb most of the initial read operations. The data eventually migrates to cheaper storage (HDDs) to reduce overall hosting costs.
“As the foundation for all of Google and Google Cloud, Colossus plays a critical role in delivering reliable services to billions of users,” the company said, adding that its sophisticated SSD caching helps keep costs low and performance high by automatically adapting to changing workloads. “We are proud of the system we have built so far and are excited to keep improving its scale, sophistication, and performance.”