All good things must come to an end, and because I need to pick up family members from the airport today, this is going to have to be my last session for this conference. I'm not entirely sure why, but Smita seems to land in the closing sessions at STP events. Thus Smita Mishra will be the last speaker I'll be live blogging for this conference. My thanks to everyone involved in putting on a great event and for inviting me to participate. With that, let's close this out.
Data is both ubiquitous and mysterious. There are a lot of details we can look at and keep track of. There is literally a sea of data surrounding our every action and expectation. What is important is not so much the aggregation of data but the synthesis of that data, making sense of the collection, separating the signal from the noise.
Data scientists and software testers share a lot of traits. In many cases, the terms are interchangeable. Data and the analysis of data form a critical domain inside the classic scientific method. Without data, we don't have sufficient information to make an informed decision.
When we talk about "Big Data" what we are really looking at are the opportunities to parse and distill down all of the data that surrounds us and utilize it for special purposes. Data can have four influences:
Data flows through a number of levels, from chaos to order, from uncertainty to certainty. Smita uses the following order: Uncertainty, Awakening, Enlightenment, Wisdom, Certainty. We can also look at the levels of quality for data: Embryonic, Infancy, Adolescence, Young Adult, Maturity. To give these levels human attributes, we can use the following: Clueless, Emerging, Frenzy, Stabilizing, Controlled. In short, we move from immaturity to maturity, from not knowing to knowing for certain, etc.
Data can likewise be associated with attributes of its source: Availability, Usability, Reliability, Relevance, and the ability to present the data. Additionally, it needs to have an urgency to be seen and to be synthesized. Just having a volume of data doesn't mean that it's "Big Data". The total collection of tweets is a mass of information. It's a lot of information to be sure, but it's just that; we can consider it "matter unorganized". The process of going from unorganized matter to organized matter is the effective implementation of Big Data tools.
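To make the "unorganized to organized" idea a little more concrete, here is a minimal sketch of my own (not something shown in the talk). It assumes a handful of made-up tweets and a hypothetical helper, `hashtag_counts`, that turns the raw pile of text into a simple tally of hashtags. Real Big Data tooling works at a vastly larger scale, but the shape of the step is similar: raw matter in, organized summary out.

```python
from collections import Counter
import re

# Hypothetical sample posts, invented for illustration only.
tweets = [
    "Loving the keynote at the conference #testing",
    "Big data is everywhere #BigData #testing",
    "Coffee first, then sessions",
    "Separating signal from noise #bigdata",
]


def hashtag_counts(posts):
    """Tally hashtags across posts, ignoring case."""
    tags = []
    for post in posts:
        # Capture the word characters that follow each '#'.
        tags.extend(tag.lower() for tag in re.findall(r"#(\w+)", post))
    return Counter(tags)


counts = hashtag_counts(tweets)
for tag, n in counts.most_common():
    print(tag, n)
```

The unorganized input is just a list of strings; the organized output is a count per topic that someone could actually act on. Which summary is worth computing depends on the question being asked, which is where the relevance and urgency points above come in.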
Data can sound daunting, but it doesn't need to be scary. It all comes down to the way that we think about it and the way that we use it.