Structured vs Unstructured Data
Estimates of the ratio of structured to unstructured data put unstructured data at about 80% of the data found in an organization. Unstructured data takes the form of e-mail, corporate documents, news, blog articles, and Web pages, while structured data is data captured in structured form in relational databases. If you fill in a form and it is captured in a database, it becomes structured; otherwise, it is unstructured. Why does this matter? Because Fibre Channel’s block-level storage paradigm is most effective when used with structured data: as structured data growth goes, so goes Fibre Channel. Over the past ten years, Fibre Channel has become a fixture in corporations wanting to wring the best possible performance from structured databases.
Going forward, where is the growth going to be? Structured databases are certainly not going away; in fact, they will continue to grow. However, search and social networking firms such as Google and Facebook work with close to 100% unstructured data. As a result, their data centers do not use Fibre Channel, except for internal applications such as ERP. (Yes, even Google and Facebook have to produce reports for Sarbanes-Oxley.) Google and Facebook typically use only the two SATA drives in each server for storage and place at least three copies of the data on separate servers, which provides redundancy without requiring RAID, a high-cost item used in the majority of structured data implementations. Facebook uses Apache Hadoop, a software framework inspired by Google’s GFS (Google File System) that includes the open source HDFS (Hadoop Distributed File System). HDFS was designed to work with very large files; the best file size is a multiple of 64 MB, HDFS’s default block size at the time. Clearly, this is overkill for text files, but when working with audio, image, and video files, HDFS really shines. HDFS is typically used with HBase, another open source project modeled after Google’s BigTable, and/or a data warehousing framework called Hive.
What does this mean? With the huge growth in data from Web 2.0 websites, where users provide the data (as opposed to Web 1.0, where the website owner publishes it), we expect the largest growth to be in unstructured data. The Googles and Facebooks use server-based storage, where data is stored on local disk drives or on network-attached storage targets built from standard servers. We know that the search and social networking sites use homegrown storage. The question is whether traditional data centers will move in that direction on their own, or be pushed in that direction as many shift to Cloud Computing and Storage models.
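The redundancy scheme described above (files stored as 64 MB blocks, each block kept as three copies on separate servers instead of relying on RAID) can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions: the server names, the round-robin placement, and the helpers `split_into_blocks`, `place_replicas`, and `readable_after_failure` are hypothetical, and real HDFS uses its own rack-aware placement policy rather than this simple rotation.

```python
import math

BLOCK_SIZE_MB = 64   # block size cited in the text
REPLICATION = 3      # "at least three copies" per the text


def split_into_blocks(file_size_mb: float, block_size_mb: int = BLOCK_SIZE_MB) -> int:
    """Number of fixed-size blocks a file occupies; the last one may be partial."""
    return math.ceil(file_size_mb / block_size_mb)


def place_replicas(num_blocks: int, servers: list[str],
                   replication: int = REPLICATION) -> list[list[str]]:
    """Put each block on `replication` distinct servers (hypothetical round-robin policy)."""
    if replication > len(servers):
        raise ValueError("need at least as many servers as replicas")
    layout = []
    for block in range(num_blocks):
        start = block % len(servers)  # rotate the first server to spread load
        layout.append([servers[(start + i) % len(servers)] for i in range(replication)])
    return layout


def readable_after_failure(layout: list[list[str]], failed: str) -> bool:
    """True if every block still has at least one copy after one server fails."""
    return all(any(s != failed for s in replicas) for replicas in layout)


servers = ["node1", "node2", "node3", "node4"]   # hypothetical server names
blocks = split_into_blocks(200)                  # a 200 MB video file
layout = place_replicas(blocks, servers)
for i, replicas in enumerate(layout):
    print(f"block {i}: {replicas}")
print(readable_after_failure(layout, "node2"))
```

Because every block lives on three different servers, losing any single server (and its two SATA drives) leaves each block readable from the other copies, which is the job RAID would otherwise do inside one box.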
Cloud Computing and Storage
Cloud Computing differs from On-Premises (traditional) Computing in that the customer rents or leases the use of IT equipment on a per-usage basis rather than designing, purchasing, installing, and maintaining the IT equipment and software and paying operational costs such as cooling and power. Further, the customer controls the provisioning, modification, and termination of the use of this equipment. However, the key difference is multi-tenancy: the computing resources are shared by customers. One of the major downsides of On-Premises Computing is that, in designing the data center, the customer must over-provision resources to allow for the highest usage scenario, which can require several times the resources typically needed. Multi-tenancy allows multiple customers to share the same resources, mitigating the need for over-provisioning and allowing for a potentially lucrative business model.
Cloud Storage uses the same multi-tenant business model. When Google offers a free e-mail account with 7 GB of storage, it sounds like a lot, but the vast majority of people use far less than 1 GB. Some Cloud Storage business models reserve a specified amount of storage of a particular type (e.g., 1,000 GB of Fibre Channel storage); although the customer can be assured that no one else will use and possibly corrupt this reserved storage, these options will be costly because there is no sharing of storage resources.
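The over-provisioning argument can be made concrete with a small Python sketch. The hourly demand figures below are invented for illustration, and `dedicated_capacity` and `shared_capacity` are hypothetical helper names; the point is only that provisioning each tenant for its own peak costs more than provisioning a shared pool for the peak of the combined demand when tenants peak at different times.

```python
# Hypothetical hourly demand (servers needed) for three tenants whose peaks
# fall at different hours. Figures are illustrative only.
demand = {
    "tenant_a": [2, 3, 10, 4, 2, 2],
    "tenant_b": [8, 3, 2, 2, 3, 2],
    "tenant_c": [2, 2, 3, 3, 9, 4],
}


def dedicated_capacity(demand: dict[str, list[int]]) -> int:
    """On-premises model: each tenant provisions for its own peak."""
    return sum(max(series) for series in demand.values())


def shared_capacity(demand: dict[str, list[int]]) -> int:
    """Multi-tenant model: one pool provisioned for the peak of combined demand."""
    combined = [sum(hour) for hour in zip(*demand.values())]
    return max(combined)


print("dedicated:", dedicated_capacity(demand))
print("shared:   ", shared_capacity(demand))
```

With these made-up numbers, dedicated provisioning needs 27 servers while the shared pool needs 15, because the tenants' peaks do not coincide.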
Cloud Architecture vs Business Model: Who Controls the Data
The key to whether users move to Cloud Computing in droves is where the data resides and who controls and safeguards it. The Cloud architecture has been optimized for multi-tenancy. Most applications run in virtual machines, allowing multiple customers to use the same server, operating system, and applications, and storage resources are shared across customers. A company runs the risk of losing its valuable data if the hosting company has lax operational or security controls. Many customers, especially early on, will opt for Private Clouds, where the hardware and software are housed on premises but use the Cloud architecture; this allows for multi-tenancy within a company while letting the company keep control of its data. If Public Cloud vendors (companies offering Cloud Computing services, such as Amazon, Rackspace, Microsoft, and new entrants such as HP) become competitive enough, both in pricing and in safeguarding customer data, a large amount of computing could shift to the Cloud. This would relieve users of decisions about which vendor to use for computing, storage, networking, databases, etc., and could reduce demand for name-brand computing and storage suppliers. In fact, on April 7, Facebook revealed the Open Compute Project, through which it “open sourced” the design of the servers used in its data centers as well as the high-level design of the data centers themselves. Suffice it to say, no brand-name hardware, storage, or networking was mentioned in the announcement. So far, Cloud Computing forecasts have been modest, because much must be overcome (e.g., security, trust in the quality of operations, viability of Cloud providers) before Public Cloud Computing and Storage can drive a major shift in IT spending.
So why does this matter? Fibre Channel to fade a bit, Ethernet to grow with higher bandwidth
Storage data is going to grow a great deal going forward. We expect unstructured data to grow faster as more and more data is published to the Web and to internal intranet sites using tools such as Microsoft’s SharePoint. Fibre Channel is expensive and is tied to structured data. To date, Google has not moved to 10GbE to connect its servers and storage: in its model, the two SATA drives in each server do not generate enough traffic to overrun a 1GbE link. This will change as flash is more widely used as a caching mechanism in servers, probably on the order of 5 to 12 GB per server. The flash caches will deliver data faster and with lower latency, overrunning 1GbE and forcing upgrades of the switching infrastructure to 10GbE, with the switches needing 40Gb and 100Gb uplinks. The same phenomenon will occur at Public Cloud providers. Fibre Channel will continue to be used for structured data applications, and increased Wall Street regulation will inevitably drive the need for more structured data. As the industry begins to make 40Gb and 100Gb connections affordable, especially transceivers that only need to reach 1 km on single-mode fiber, and as FCoE and iSCSI mature, many new structured data applications will move to Ethernet.
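A back-of-the-envelope check of when a server's storage outruns its network link, and of why faster server links push switch uplinks to 40Gb and 100Gb, is sketched below in Python. The 500 MB/s flash throughput, the 48-port switch, and the uplink counts are hypothetical figures chosen only for illustration, the helper names are made up, and the line-rate conversion ignores encoding and protocol overhead.

```python
def line_rate_mb_per_s(link_gbps: float) -> float:
    """Raw line rate in MB/s; ignores encoding and protocol overhead."""
    return link_gbps * 1000 / 8


def saturates(device_mb_per_s: float, link_gbps: float) -> bool:
    """True if local storage can deliver data faster than the link can carry it."""
    return device_mb_per_s > line_rate_mb_per_s(link_gbps)


def oversubscription(server_ports: int, port_gbps: float,
                     uplinks: int, uplink_gbps: float) -> float:
    """Ratio of server-facing bandwidth to uplink bandwidth on one switch."""
    return (server_ports * port_gbps) / (uplinks * uplink_gbps)


# Hypothetical flash cache able to stream 500 MB/s.
flash_mb_per_s = 500
for gbps in (1, 10):
    print(f"{gbps}GbE ({line_rate_mb_per_s(gbps):.0f} MB/s) saturated: "
          f"{saturates(flash_mb_per_s, gbps)}")

# Hypothetical top-of-rack switch with 48 x 10GbE server ports.
print(f"4 x 40Gb uplinks:  {oversubscription(48, 10, 4, 40):.1f}:1")
print(f"4 x 100Gb uplinks: {oversubscription(48, 10, 4, 100):.1f}:1")
```

Under these assumptions, a single flash-equipped server overruns a 1GbE link but not a 10GbE one, and once the servers move to 10GbE, the uplink speed largely determines how oversubscribed the switch is.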
Further, we are increasingly seeing SFP+ as the connection of choice for 10GbE. Cisco announced its first rack servers last week, and their 10GbE LAN-on-motherboard (LOM) option used SFP+. With 10GBASE-T pushed out to 2014 and beyond, SFP+ has a chance to become entrenched as the standard server connection. At the OIDA conference at Stanford this week, Donn Lee of Facebook stated that he sees SFP+ DA (direct attach) cables as the cheapest way to connect at 10GbE, and that the 10Gb transceivers used by Google, Facebook, and Cloud Computing providers are not marked up, because these companies purchase transceivers directly from the manufacturers. As storage shifts to Ethernet, the BER (bit error rate) of 10GBASE-T makes it less suitable for structured storage applications. We will be watching this space as we forecast the market for connectivity between servers, switches, and storage. Look for a report later this year on storage connectivity featuring our views and forecasts of Fibre Channel, FCoE, and iSCSI.
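Why BER matters for storage traffic follows from simple arithmetic: the expected number of errors equals the bit error rate multiplied by the number of bits transferred. The Python sketch below compares two assumed BER values on a fully loaded 10 Gb/s link; they are illustrative comparison points only, not measured or specified figures for 10GBASE-T or any other physical layer, and the helper names are hypothetical.

```python
def errors_per_hour(ber: float, link_gbps: float) -> float:
    """Expected bit errors per hour on a fully loaded link: BER x bits sent."""
    bits_per_hour = link_gbps * 1e9 * 3600
    return ber * bits_per_hour


def hours_between_errors(ber: float, link_gbps: float) -> float:
    """Mean time between bit errors, in hours, at full line rate."""
    return 1 / errors_per_hour(ber, link_gbps)


# Two assumed BER values for comparison only (not specified figures for any PHY).
for ber in (1e-12, 1e-15):
    print(f"BER {ber:.0e}: {errors_per_hour(ber, 10):.3f} errors/hour, "
          f"one every {hours_between_errors(ber, 10):.2f} hours")
```

Each factor of 1,000 in BER changes the error rate by the same factor, which is why a link that is acceptable for general traffic can look very different once every error may force a storage retry.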
About Kimball Brown
Kimball Brown is the VP and Senior Datacom Analyst with LightCounting; his focus is on the connectivity of servers, switches, and storage, and the associated semiconductors in the enterprise networking market.
LightCounting is a market research company focused on in-depth study of the optical communications market. Our research covers the whole supply chain including components, modules, systems, and their applications. Most of our analysis is based on confidential sales data provided exclusively to LightCounting by leading component and module suppliers.
For more information: www.LightCounting.com or 408.962.4851.