
White Paper

Making Sense of Big Data

 

 

With the right solutions, agencies can turn federal data into actionable intelligence. 

 
Executive Summary

Federal agencies, like organizations in virtually every sector, are handling more data than ever before. According to Cisco Systems, global IP traffic is expected to more than double in the span of only a few years, growing to a monthly per-capita total of 25GB by 2020 (up from 10GB per capita in 2015).

This data boom creates obvious challenges for agencies that must find a way to process and store all of this new information. But it also presents a massive opportunity to find new efficiencies, detect previously unseen patterns and increase levels of service to citizens.  

Simplifying Data Complexity

The billions of bits of information that add up to create “Big Data” come from a variety of sources: the mobile devices that agency workers carry with them; connected IT systems; networked sensors in Internet of Things deployments; electronic forms completed by the citizens that an agency serves; and the vast number of inspection, compliance and other records kept by most agencies. Additionally, agencies can extract real-time insights from the massive amount of data generated by websites, business applications, social media platforms, application servers, hypervisors, traditional databases and open-source data stores. 

With a single cross-country airplane flight capable of generating as much as 240 terabytes of data, most organizations won’t have to hunt too hard for rich sources of information. As any IT professional who has had to expand storage, computing and networking resources in recent years can attest, most organizations will have to deal with Big Data whether they’re leveraging that data to create new insights or not.

The larger challenge is harnessing all of this data in a coherent and organized manner: processing it to arrive at new conclusions about how an organization operates and delivers services, implementing an analytics system that can distinguish relevant patterns from meaningless noise, and then devising an action plan based on these new insights.

The complexity of Big Data solutions alone presents a significant challenge. Because each organization’s data analytics needs are unique, Big Data tools must be tailored to an agency’s specific environment in order to provide the highest possible value. Technical requirements create another hurdle. While many organizations already have the storage, computing and networking equipment necessary to support a Big Data initiative, others may need to expand or upgrade their resources, and tying all of these systems into one integrated solution can challenge even savvy IT departments. Finally, government agencies face additional regulations, and they must be careful not to run afoul of rules governing how sensitive data is stored and safeguarded.

While the challenges of getting a Big Data initiative off the ground are considerable, the potential payoff is enormous. These solutions can help to deliver a central, unified view of IT operations and services; detect patterns, highlight anomalies and pinpoint areas of impact; improve security through threat intelligence; analyze behavior for predicting attacks and threats; conduct analysis in real time; and create actionable intelligence. 

To illustrate just one example: A Big Data analytics solution can automatically monitor and look for anomalies in IT user behavior, alerting administrators when rogue devices are connected to the network, when someone changes system configurations or when a user swipes an access badge two hours before he or she is supposed to report for work and then engages in a period of heavy downloading. This sort of monitoring can give IT administrators confidence in the integrity of their systems and can help to keep enterprise IT environments secure, while requiring little in the way of manual monitoring. 
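To make the logic behind this kind of monitoring concrete, the minimal sketch below flags users who badge in well before their shift and then download an unusually large volume of data. The event records, field names and thresholds are hypothetical; a commercial user behavior analytics product relies on statistical baselining and machine learning rather than a single hard-coded rule.

```python
# Minimal sketch of rule-based user behavior monitoring.
# Event records, field names and thresholds are hypothetical.
from datetime import datetime, timedelta

# Each event is a simplified record of a badge swipe and download activity.
events = [
    {"user": "jdoe", "badge_in": datetime(2024, 5, 6, 6, 0),
     "shift_start": datetime(2024, 5, 6, 8, 0), "mb_downloaded": 9200},
    {"user": "asmith", "badge_in": datetime(2024, 5, 6, 7, 55),
     "shift_start": datetime(2024, 5, 6, 8, 0), "mb_downloaded": 140},
]

EARLY_BADGE = timedelta(hours=2)   # badge swipe well before shift start
DOWNLOAD_LIMIT_MB = 5000           # unusually heavy download volume

def flag_anomalies(records):
    """Return users whose badge time and download volume both look unusual."""
    alerts = []
    for r in records:
        early = r["shift_start"] - r["badge_in"] >= EARLY_BADGE
        heavy = r["mb_downloaded"] >= DOWNLOAD_LIMIT_MB
        if early and heavy:
            alerts.append(r["user"])
    return alerts

print(flag_anomalies(events))  # ['jdoe']
```

In practice, alerts like these would feed an administrator’s dashboard or ticketing queue rather than a console printout.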

 
Real-World Applications

Several agencies are already making headway in Big Data applications. 

Performance diagnostics: Pacific Northwest National Laboratory, a Richland, Wash., laboratory managed by the Energy Department’s Office of Science, used Splunk IT Service Intelligence to develop its own dashboards that provide real-time visibility and contextual insights into key performance indicators. The laboratory is looking to use the tool not only to identify problems, but also to automate responses.

Improving fire response: The Federal Emergency Management Agency, the U.S. Fire Administration and the National Fire Incident Reporting System partnered to develop an analytical prototype and worked with four regional fire departments to use Big Data analytics to study approximately 225 million fire incidents. The effort helped the agencies to identify trends and patterns in incident types, equipment failures and firefighter casualties, and delivered new insights on how to improve training and reduce losses. 

Helping consumers: The Consumer Financial Protection Bureau uses text and log analytics from consumers’ interactions with the agency’s website, social media and online forums to study their behavior patterns. Through this work, the agency helped convince FICO, a credit scoring company, to change its credit score algorithms by reducing the weight of medical debt in scoring, which lifted scores for many consumers.

 
The Building Blocks of Big Data

Big Data analytics can’t exist in a vacuum. Because of the enormous quantities of data involved in these solutions, they must incorporate a robust infrastructure for storage, processing and networking, in addition to analytics software. While some organizations already have the capacity in place to absorb Big Data solutions, others will need to expand their resources to accommodate these new tools and maintain headroom for future growth. This truly is a situation in which the chain is only as strong as its weakest link; if storage and networking are in place but the processing power isn’t there (or vice versa), a Big Data solution simply won’t function properly.

Storage: Often, organizations already possess enough storage in-house to support a Big Data initiative. (After all, the data that will be processed and analyzed via a Big Data solution is already living somewhere.) However, agencies may decide to invest in storage solutions that are optimized for Big Data. While not necessary for all Big Data deployments, flash storage is especially attractive due to its performance advantages and high availability. 

Large users of Big Data — companies such as Google and Facebook — utilize hyperscale computing environments, which are made up of commodity servers with direct-attached storage, run frameworks like Hadoop or Cassandra and often use PCIe-based flash storage to reduce latency. Smaller organizations, meanwhile, often utilize object storage or clustered network-attached storage (NAS). 

Cloud storage is an option for disaster recovery and backups of on-premises Big Data solutions. While the cloud is also available as a primary source of storage, many organizations — especially large ones — find that the expense of constantly transporting data to the cloud makes this option less cost-effective than on-premises storage. 

Processing: Servers intended for Big Data analytics must have enough processing power to support this application. Some analytics vendors, such as Splunk, offer cloud processing options, which can be especially attractive to agencies that experience seasonal peaks. If an agency has quarterly filing deadlines, for example, that organization might securely spin up on-demand processing power in the cloud to process the wave of data that comes in around those dates, while relying on on-premises processing resources to handle the steadier, day-to-day demands. 
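As a rough illustration of this burst-to-cloud pattern, the sketch below temporarily scales out a cloud-hosted processing tier ahead of a filing deadline and scales it back down afterward. It assumes the analytics nodes run in an AWS Auto Scaling group; the group name, node counts and region are invented for illustration and are not drawn from any specific agency deployment.

```python
# Hypothetical sketch: burst processing capacity for a known seasonal peak,
# assuming the analytics tier runs in an AWS Auto Scaling group named
# "analytics-indexers" (name, sizes and region are illustrative only).
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")

def set_capacity(desired: int) -> None:
    """Adjust the number of processing nodes in the analytics tier."""
    autoscaling.set_desired_capacity(
        AutoScalingGroupName="analytics-indexers",
        DesiredCapacity=desired,
        HonorCooldown=False,
    )

# Burst to 12 nodes for the quarterly filing window, then return to baseline.
set_capacity(12)   # before the deadline
# ... run the peak workload ...
set_capacity(3)    # after the wave of filings has been processed
```

Scheduled scaling actions or autoscaling policies tied to queue depth could replace the manual calls shown here.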

Analytics software: Agencies must select Big Data analytics products based not only on what functions the software can complete, but also on factors such as data security and ease of use. One popular function of Big Data analytics software is predictive analytics — the analysis of current data to make predictions about the future. Predictive analytics are already used across a number of fields, including actuarial science, marketing and financial services. Government applications include fraud detection, capacity planning and child protection, with some child welfare agencies using the technology to flag high-risk cases. 
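To make the predictive analytics pattern concrete, the sketch below trains a simple classifier on synthetic historical cases and scores new cases for fraud risk. The features, labels and data are fabricated for illustration; production models for fraud detection or child welfare screening are built on far richer, carefully governed data and are validated extensively before use.

```python
# Minimal predictive analytics sketch: score cases for fraud risk with a
# logistic regression model. All features and data are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Hypothetical historical features: [amount_usd, claims_last_year, is_new_vendor]
X_train = rng.random((500, 3)) * [10_000, 12, 1]
# Synthetic labels: past cases marked fraudulent (1) or legitimate (0).
y_train = ((X_train[:, 0] > 7_000) & (X_train[:, 2] > 0.5)).astype(int)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Score new, unlabeled cases and surface the riskiest for human review.
X_new = np.array([[8_500, 3, 1], [120, 0, 0]])
risk = model.predict_proba(X_new)[:, 1]
for case, score in zip(X_new, risk):
    print(f"case {case.tolist()} -> fraud risk {score:.2f}")
```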

Many agencies have already begun to test Big Data applications or put them into production. In 2012, the Obama administration announced the Big Data Research and Development Initiative, which aimed to advance state-of-the-art core Big Data technologies, accelerate discovery in science and engineering, strengthen national security, transform teaching and learning, and expand the workforce needed to develop and utilize Big Data technologies. The initiative involved a number of agencies, including the White House Office of Science and Technology Policy, the National Science Foundation, the National Institutes of Health, the Defense Department, the Defense Advanced Research Projects Agency, the Energy Department, the Health and Human Services Department and the U.S. Geological Survey.

Networking: The massive quantities of information that must be shuttled back and forth in a Big Data initiative require robust networking hardware. Many organizations already operate networking hardware that supports 10-gigabit connections and may have to make only minor modifications, such as the installation of new ports, to accommodate a Big Data initiative. Securing network transports is an essential step in any upgrade, especially for traffic that crosses network boundaries.

$48.6 billion

The projected size of the market for Big Data technology and services in 2019

SOURCE: IDC, “Worldwide Big Data Technology and Services Forecast, 2015-2019,” November 2015

 
Purpose-Built Solutions

Some vendors have developed product offerings and reference architectures that are intended specifically for Big Data platforms, incorporating storage, computing and analytics into an integrated solution. 

These purpose-built solutions allow organizations to right-size resources for a Big Data deployment, often for a specific use case such as log aggregation, ensuring that an agency doesn’t over- or underprovision resources for its Big Data project. This model also makes it easy to deploy replicable, self-contained solutions across a large number of sites, which becomes useful if an agency wants to perform log aggregation at regional offices throughout the country, for example. Because these solutions are self-contained, they can be segregated from the rest of an organization’s network, helping to provide an extra layer of security.

Software vendor Splunk offers purpose-built solutions for various functions, including enterprise security, IT service intelligence and user behavior analytics.

The Value of a Big Data Partner

As is often the case with emerging IT tools, many organizations seeking to deploy Big Data solutions find that they can benefit from partnerships with third-party organizations that understand their needs, the emerging technology and how to design and implement an integrated solution within the specific context and environment of the agency. 

Among the benefits a partner can provide:

Expertise that an agency doesn’t have on staff: Many agencies simply don’t have the experts on staff necessary to deploy a Big Data solution, or they don’t have enough of them. A partner can connect data scientists, vendors, solution architects and agency IT leaders to orchestrate a comprehensive Big Data analytics solution. This model is often more efficient for agencies than hiring full-time data scientists and other specialists. Contractors with specialized expertise can also complete projects faster than in-house IT personnel, who must learn new technologies at the same time they are implementing them.

Experience and expertise in a specific industry: Federal agencies face a number of unique challenges and restrictions as they deploy IT solutions, from legacy systems to data security requirements to complicated procurement processes. Agencies seeking outside help with a Big Data initiative can look for partners that are familiar with government customers and have experience navigating complex IT projects in this specialized environment.

Tailoring a solution to carry out an agency’s specific mission: A partner like CDW can work with a software vendor such as Splunk to create a dashboard that meets an agency’s specific needs. Often, an organization rolling out a Big Data solution for the first time will have only a general idea of the ways in which the tool can be used to create value. An experienced partner will bring informed ideas about how to best leverage Big Data analytics in a way that helps agencies to better carry out their missions. 

Finding the right partner is often a key step toward turning an agency’s data into actionable intelligence that improves its ability to achieve its mission.

48% 

The percentage of organizations that invested in Big Data solutions in 2016

SOURCE: Gartner, “Gartner Survey Reveals Investment in Big Data Is Up but Fewer Organizations Plan to Invest,” October 2016


Featured Partners

Splunk® software makes it simple to collect, analyze and act upon the untapped value of the machine data generated by your technology infrastructure, security systems and business applications — giving you the insights to drive operational performance and business results. 

 

The CDW Approach


 

Call us at 888.808.4239 to set up a consultation with a federal expert.