By Adam Cohen
Of all the nebulous, ill-defined terms being thrown around in today’s technology-driven environment — including “cloud computing” and “social media” — “big data” is jockeying for first place as the most overused. It sounds scary and ambiguous, and in some ways it is. But it is hardly a novel idea. Nevertheless, big data does have implications for electronic discovery (eDiscovery).
Generally, big data has three defining characteristics: volume, variety and velocity. Volume simply refers to the ever-increasing amount of data being generated by sources like text messaging, social media, transactional data and any other form of digital communication and record keeping. The growth of volume is compounded by the trend of decreasing storage cost. This lends itself naturally to entities and individuals keeping data that they might otherwise discard because of its lack of utility.
The next characteristic of big data is variety: the sources of electronic information are multiplying at an accelerating rate. Consider the growing range of tablets and smartphones, running different operating systems and different apps, or the expanding number of social media services and the new functionality being added to them. It is safe to assume this trend will continue.
Finally, big data incorporates the concept of data “velocity.” In this context, velocity refers to the speed with which data is being generated. This phenomenon in turn requires an increase in the speed and sophistication of the technical tools applied to the data to make it useful or to effect legal compliance.
The problem of exploding volume and variety of data, as well as the need to find new tools and computing power to deal with them, poses serious challenges to compliance with eDiscovery obligations. However, this is not a sudden change in which big data burst onto the scene, but rather a steadily growing problem that began with the advent of computing. No doubt the growth of volume, variety and velocity has accelerated and will continue to do so.
Now that we have defined big data, let’s consider its implications for conducting eDiscovery by addressing its three defining characteristics in turn.
- Volume: This characteristic is the central problem in eDiscovery; it is the fundamental factor driving cost. The Electronic Discovery Reference Model depicts the volume of data decreasing as the process of electronic discovery unfolds, while simultaneously the proportion of relevant data increases. This reflects the fact that the whole process of eDiscovery is largely about ingesting the universe of potentially relevant data and applying people and technology to carve out what is needed for discovery and evidence. As data volume continues to increase, the challenges of preserving, collecting, processing, reviewing, analyzing and producing it grow in parallel.
- Variety: Another factor that complicates eDiscovery, and a major component of risk in the process, is the widespread distribution of a variety of sources of electronic information. Compliance with eDiscovery obligations requires identifying each of these sources and having a plan and a process for dealing with each one. The greater the variety of types of sources, the easier it is to overlook some of them, thereby failing to cast the preservation net wide enough and opening the door to missed opportunities. Moreover, new forms of electronic information mean that new technologies need to be developed, or old technologies modified, to collect, process, review, analyze and produce data from them.
- Velocity: eDiscovery has always represented a race between volume and variety on the one hand and technical tools for handling them properly and cost-effectively on the other. Most recently, computer-assisted review tools have been introduced that aim to automate the document review process significantly, thereby ameliorating to some degree the most expensive part of the eDiscovery process: document review, which was previously the exclusive domain of lawyers charging hourly fees. In addition, sophisticated data analytics tools are helping to funnel large volumes of data stored in multiple types of database systems quickly, with visualization features that make the data easier to understand.
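To make the funnel described under Volume concrete, here is a minimal Python sketch. The stage names and every number in it are hypothetical, chosen purely for illustration; they do not come from the Electronic Discovery Reference Model or from any real matter. It simply shows how total volume can shrink from stage to stage while the share of relevant documents rises.

```python
# Hypothetical figures for illustration only; they are not drawn from the
# EDRM or from any real matter.
# Each entry: (stage name, total documents, relevant documents)
stages = [
    ("collected", 1_000_000, 20_000),
    ("processed", 400_000, 19_000),
    ("reviewed", 100_000, 18_000),
    ("produced", 20_000, 17_000),
]


def relevance_by_stage(stages):
    """Return (stage, total volume, share of relevant documents) per stage."""
    return [(name, total, relevant / total) for name, total, relevant in stages]


for name, total, share in relevance_by_stage(stages):
    print(f"{name:<10} {total:>9,} documents  {share:.1%} relevant")
```

In this made-up example the document count falls at every stage while the relevant share climbs, which is the shape the model depicts. A larger starting volume raises the cost of every stage that follows.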
Big data is an all-encompassing term that describes the massive growth in volume and variety of data, as well as the speed of its development and the corresponding speed required of technology to handle it. Big data suggests that eDiscovery will continue to present new and difficult challenges at a rapid pace. However, the technology for addressing these challenges will also develop rapidly. Whether it will develop rapidly enough to enable eDiscovery to withstand the pace of big data remains to be seen, but if history is any indication, there is reason for optimism.
The views expressed herein are those of the author and do not necessarily reflect the views of Ernst & Young LLP.
Laney, Douglas. “3D Data Management: Controlling Data Volume, Velocity and Variety”. Gartner. http://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf. Retrieved 6 February 2001.