
big data

Sergey Brin, co-founder of Google, and General Michael Hayden, former Director of the CIA and NSA

Privacy Advocates Set Their Sights on the Wrong G-Men

In an op-ed in last Friday’s Washington Post, FTC Commissioner Julie Brill bemoaned the data-driven economy, equating the data scientists in Silicon Valley with the spooks at Fort Meade.

Unfortunately, she is not the first to do so. Since the exposure of the government’s PRISM program, veteran privacy activists have been conflating the intelligence community’s questionable, closed-door electronic surveillance program with the voluntary, open, and legitimate collection of personal data by the private sector. Chris Hoofnagle at the Berkeley Center for Law and Technology states, “What’s happening now is the logical outcome of a leave-it-to-the-market public policy agenda, which left the private sector’s hands unbound to collect data for the government.” And John Podesta at the Center for American Progress argues that after Edward Snowden’s revelations, the government “should not only examine NSA surveillance activities and the laws governing them, but also private-sector activities and telecommunications technology more generally.” Some critics have even gone so far as to blame innovation and technology. Writing in Salon, Andrew Leonard placed the blame directly on the technology: “By making it economically feasible to extract meaning from the massive streams of

Read the rest

Data Science Is Not PRISM: In Defense of Analytics

In the wake of the leaks that revealed the National Security Agency’s (NSA’s) PRISM surveillance program, several recent articles have responded with criticism of “big data.” “The advantages of big data could prove to be ephemeral,” author Andre Mouton writes in USA Today, but “the costs…will probably be sticking around.” And Andrew Leonard at Salon directly blames the technology, writing, “By making it economically feasible to extract meaning from the massive streams of data that increasingly define our online existence, [distributed processing platform] Hadoop effectively enabled the surveillance state.”

Pictured: Michael Flowers, civic data icon and Analytics Director of the City of New York’s Office of Policy and Strategic Planning. Photo: DataGotham

But criticizing “big data” itself is a curious thing. In its original form, “big data” was just a catchall term for those technologies—borrowed mostly from statistics and computer science—which still worked on data analysis problems that would overload a typical processor. The connotation of “big” as in “big tobacco” was added retroactively. Many practitioners prefer the broader term “data science” for this very reason: they aren’t members of some kind of shadowy syndicate. They aren’t even in

Read the rest

Book Review of “Big Data: A Revolution That Will Transform How We Live, Work and Think”

There have been a number of attempts to chronicle exactly what “big data” is and why anyone should care. Last year’s The Human Face of Big Data by Rick Smolan and Jennifer Erwitt focused on telling the personal stories behind big data (and accompanied these stories with some great photographs). The year before, James Gleick wrote The Information: A History, A Theory, A Flood, which chronicled how information (and not just big data) has changed our world. The latest entrant is Big Data: A Revolution That Will Transform How We Live, Work and Think by Viktor Mayer-Schönberger and Kenneth Cukier, which focuses heavily on explaining some of the more interesting impacts of living in a big data world. (Personally, I’m still not a fan of the term big data because 1) the term scares off people who think this is equivalent to “Big Oil” and 2) the term underrepresents the innovation happening around “small” data. But since this is the term used in the book, I’ll stick with it for this review.)

The first part of this book provides a fairly compelling vision of how big data is changing how

Read the rest


5 Q’s on Data Innovation with Dr. Dan Riskin

Dr. Riskin is the CEO of Health Fidelity, a leading provider of natural language processing solutions. He is also a Consulting Assistant Professor of Surgery at Stanford University and practices one day a week out of the Stanford affiliate hospitals. I recently had the opportunity to get his thoughts on how data-driven innovations are transforming the health care industry.

Castro: In what ways do you see data changing health care today?

Riskin: Data is used daily to define a new generation of healthcare. Not only do patients do research on the internet, request medical support by e-mail (in some systems), and share their own medical stories online, but the actual care delivered now includes apps and remote technologies that offer supplemental care. The most fundamental change related to healthcare is the redefinition of practice, often known as data-driven healthcare. Data-driven healthcare is a big data approach to healthcare, leveraging information learned from treating millions of patients to personalize care for the few. This turns a half century of medical practice using evidence-based medicine on its head. Instead of defining care for millions based on a randomized trial performed

Read the rest

Bill Day

5 Q’s on Data Innovation with Bill Day

Bill Day is the platform evangelist for RunKeeper, a Boston-based start-up that helps users track and attain their fitness goals. I asked Bill to share with me his thoughts on how data is changing how people exercise, work towards fitness goals, and monitor their health.

Castro: As a runner myself, I am a huge fan of RunKeeper. Can you tell me how RunKeeper got started?

Day: One of our founders, Jason Jacobs, was training for a marathon and realized that there had to be a better way to track and understand his training and performance than the very limited options available at the time. He pulled together a small team to build an iPhone app to solve that problem, and the timing was great as we were able to launch in the very early days of the App Store.

Castro: RunKeeper recently launched the Health Graph platform. Can you explain what that is? ... Read the rest

Mark Whitehorn

Five Q’s on Data Innovation with Mark Whitehorn

Mark Whitehorn is the professor of analytics at the University of Dundee’s School of Computing in Scotland and the author of ten books on business intelligence. I spoke to Mark about how higher-ed programs are adapting to new demands in the era of Big Data.

Castro: What kinds of skills do data scientists need?

Whitehorn: They need to be intelligent! Oh, I see, you want specifics! They need to be good at designing new analytical techniques and be able to code them. The job also includes general skills (e.g., excellent analytical capabilities, machine learning, data mining, statistics, math, algorithm development, writing code, data visualisation, and understanding multi-dimensional database design and implementation) and specific skills such as technologies to handle big data (e.g., Hadoop and related technologies, MapReduce and its implementation on differing software platforms, and NoSQL databases) and knowledge of languages (e.g., SQL, MDX, R, and functional and OOP languages such as Erlang and Java). ... Read the rest