My rating: 4 of 5 stars
There are so many ways the numbers may be skewed. With the right data transformation, exclusions or imputations, the numbers can be manipulated to tell the story the researcher wants the data to tell. Always check the raw data, the assumptions and methods used to transform or normalize it, and the statistical techniques chosen to analyze it.
The book gives many examples of data manipulated for some advantage. Law school deans fudge the numbers to get higher law school rankings. Groupon shows the benefit of advertising with them, but it looks at the numbers as a whole and does not separate existing customers who take advantage of the discount from those who are net new customers.
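To make the Groupon point concrete, here is a rough sketch of the arithmetic. The function name and every figure in it are hypothetical (they are mine, not from the book), and it ignores Groupon's own cut of each sale. It only shows how the same campaign can look like a win or a loss depending on the share of existing customers.

```python
def coupon_profit(redemptions, share_existing, full_price, discount_price, unit_cost):
    """Change in merchant profit from a coupon campaign (hypothetical model)."""
    existing = redemptions * share_existing
    new = redemptions - existing
    # Net new customers would not have bought otherwise, so their margin is a gain.
    gain_from_new = new * (discount_price - unit_cost)
    # Existing customers would have paid full price anyway, so the discount is lost revenue.
    loss_from_existing = existing * (full_price - discount_price)
    return gain_from_new - loss_from_existing


# Made-up numbers: 1,000 coupons, $50 full price, $25 coupon price, $20 cost per sale.
all_new = coupon_profit(1000, 0.0, 50, 25, 20)
mixed = coupon_profit(1000, 0.4, 50, 25, 20)
print(all_new, mixed)
```

With these made-up inputs the campaign earns money if every redeemer is new, and loses money once 40% of them are customers who would have come in anyway.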
The epilogue shows two data challenges I am very familiar with. Too bad he does not have a quick fix for either: How do you get one system to accept the dates from another system as a date variable, not text or numeric? How do you categorize thousands of keywords into useful groups in a reasonable amount of time, especially considering they are always changing?
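For the date problem, the workaround I usually see is a small normalizing step between the two systems. The sketch below is only that, a sketch: the list of formats and the assumption that numeric values are spreadsheet-style day serials (counted from 1899-12-30) are my own guesses about what a source system might send, not anything from the book.

```python
from datetime import date, datetime, timedelta

# Assumed text formats from the source system; extend as needed.
KNOWN_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y%m%d")
# Assumption: numeric dates are spreadsheet-style serials counted from this day.
SERIAL_EPOCH = date(1899, 12, 30)


def to_date(value):
    """Convert a text or numeric date from another system into a real date."""
    if isinstance(value, (int, float)):
        return SERIAL_EPOCH + timedelta(days=int(value))
    text = str(value).strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # try the next known format
    raise ValueError(f"unrecognized date: {value!r}")


print(to_date("2024-01-31"), to_date("01/31/2024"), to_date("20240131"))
```

This is still not a quick fix. A value like 03/04/2024 is ambiguous between month-first and day-first, and code cannot settle that without knowing which convention the source system uses.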
Fung reminds us that big data has nothing to say about causation; many things are correlated without one causing the other. He also demonstrates that statistical significance does not prove the results are important: tiny differences with little real impact can be statistically significant.
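The significance point is easy to check with a few lines of code. This is a minimal sketch using a standard two-proportion z-test. The conversion counts are invented for illustration and do not come from the book.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Return (difference in rates, z statistic, two-sided p-value)."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_b - p_a, z, p_value


# Invented example: 10.0% vs 10.1% conversion, five million visitors per group.
diff, z, p_value = two_proportion_z_test(500_000, 5_000_000, 505_000, 5_000_000)
print(f"difference={diff:.4f}  z={z:.2f}  p={p_value:.2g}")
```

With samples this large, a gap of one tenth of a percentage point comes out highly significant, yet whether that gap matters to the business is a separate question the p-value cannot answer.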
Overall, I think the book was a good read. It had great examples for social data, marketing data, economics data and fantasy football.