While busy learning a lot about Data Scientists, I managed to get 8 articles done on Research Orientated Data Scientist. Meanwhile, I also had a look at some advanced courses by Kirill Eremenko and V2 Maestros on Udemy.
Let us say that things got clearer and the tools to achieve my goals are getting closer to my grasp. Sharing some vocabulary with you may help you as much as it helped me to acquire the related knowledge.
What is a Data Scientist? What does a Data Scientist do?
A Data Scientist is a practitioner of Data Science.
Here you go. We should agree that this is not what I was looking for when I started my DoctoRoad thing. So, what could a good definition be?
A Data Scientist improves business outcomes with the power of data by investigating complex business problems and delivering observations and solutions, through the use of mathematics, statistics, IT, data retrieval and programming languages. To achieve this, a Data Scientist needs three things: expertise in data engineering, mastery of statistics and business domain knowledge.
Now what? Do I need to understand specific jargon?
In addition to this definition, a Data Scientist should master the basic notions of data manipulation:
- Entity: a thing that exists, which we research and make predictions about in data science.
- Characteristics: the set of unique properties that an entity is made of.
- Properties: pieces of information about an entity, framed by the business domain context.
- Environment: an ecosystem in which the entity exists or functions.
- Event: a significant business activity in which an entity participates.
- Behavior: what an entity does during an event.
- Outcome: a result of an activity deemed significant by the business.
- Observation: measurement of an event deemed significant by the business.
- Data set: a collection of observations, which can be structured, unstructured or semi-structured.
Hence, Data Scientists collect and work on data sets to learn about entities and predict their future behavior/outcomes.
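To make this vocabulary a bit more concrete, here is a minimal Python sketch of how I picture it. Everything in it is my own assumption for illustration: the `Observation` class, its fields, the `outcome_rate` helper and the customer data are all hypothetical and made up.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """One measurement of an event (hypothetical structure)."""
    entity_id: str   # which entity took part in the event
    event: str       # the business activity
    behavior: str    # what the entity did during the event
    outcome: bool    # did it end in a result the business cares about?


# A tiny, made-up structured data set: a collection of observations.
data_set = [
    Observation("customer-1", "store visit", "browsed", False),
    Observation("customer-1", "store visit", "bought shoes", True),
    Observation("customer-2", "store visit", "browsed", False),
]


def outcome_rate(observations, entity_id):
    """Share of an entity's observations that ended in a positive outcome."""
    rows = [o for o in observations if o.entity_id == entity_id]
    if not rows:
        return None  # nothing observed for this entity
    return sum(o.outcome for o in rows) / len(rows)


print(outcome_rate(data_set, "customer-1"))
```

Even a toy summary like this is a first step towards learning about an entity from its observations.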
What else? What is the relationship?
Relationships, that’s the word! Everything is relative in life, isn’t it?
Attributes in a data set exhibit relationships. These relationships “model” the real world and have a logical “explanation”. Those may be either consistent or incidental patterns.
To help you figure this out, here is a little example of two entities and related events: A occurs and B does not, A occurs and B occurs, A increases and B increases, A increases and B decreases, …
On top of that, relationships can involve multiple attributes, like: A occurs, B increases, C decreases, … We then talk about correlations when two or more attributes are linked by an effect.
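One common way to put a number on "A increases, B increases" versus "A increases, B decreases" is the Pearson correlation coefficient. The courses may well use other measures, so take this as my own choice of illustration. The sketch below uses made-up values, and the `pearson` helper is a hypothetical name.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long number lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


a = [1, 2, 3, 4, 5]
b_up = [2, 4, 6, 8, 10]     # B increases when A increases
b_down = [10, 8, 6, 4, 2]   # B decreases when A increases

print(pearson(a, b_up))     # close to +1
print(pearson(a, b_down))   # close to -1
```

A value near +1 or -1 suggests a consistent pattern, while a value near 0 suggests no linear relationship. It does not tell you, on its own, whether a pattern is merely incidental.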
Then the theme of “Learning” drops in. Learning implies finding out about relationships. This leads to “Models”, which are the outcome of learning.
Learning is all about building models that can be used to predict outcomes (outputs) using the predictors (inputs).
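As a small sketch of what "learning a model" can look like, the snippet below fits a straight line with ordinary least squares. This is just one of many possible techniques and it is my own pick, not something the definitions above prescribe. The names `fit_line` and `predict`, as well as the visit and purchase numbers, are hypothetical.

```python
def fit_line(xs, ys):
    """Learn slope and intercept of y = slope * x + intercept (least squares)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Use the learned model to predict an outcome from a predictor."""
    slope, intercept = model
    return slope * x + intercept


visits = [1, 2, 3, 4, 5]        # predictor (input), made-up data
purchases = [2, 4, 6, 8, 10]    # outcome (output), made-up data

model = fit_line(visits, purchases)   # learning: discover the relationship
print(predict(model, 6))              # predict the outcome for a new input
```

The model here is nothing more than two numbers, the slope and the intercept, which summarise the relationship found in the data set.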
In conclusion, here is the process line:
- Pick a problem in a specified domain;
- Understand the problem domain (entities and attributes);
- Collect data sets that represent entities;
- Discover relationships (learning); and
- Build models that represent relationships (predictors & outcomes).
With this in mind, I must admit that it is even more thrilling to pursue my DoctoRoad, as Data Science was cut out for me. This journey is now clearer and even more exciting.
Data Science, here I come.