SQL For Data Science (Coursera)

Overview

SQL is one of the most fundamental skills for developers, DBAs, and the whole data team (analysts, engineers, and scientists), yet it is constantly underestimated. In general, teams are reluctant to mix different languages in the same project because it makes staffing harder, and sometimes that forces the use of an ORM when the fastest option would be to run a few stored procedures. From a management perspective this is understandable: if all engineers use the same technology, there are fewer problems with work allocation and absences (when only a few people master a key technology, they can't all go on vacation at the same time).

The growth of cloud services and the introduction of new tools for querying massive amounts of data have brought back system designs that delegate a large portion of the business logic to databases and SQL-based tools. Concepts like "data lakes" and "data warehouses" are starting to be applied in smaller companies (not only big tech), and they depend on good performance.

So, in a nutshell, SQL is growing again as a necessary tool for designing modern systems, especially in data engineering.

This is the description of the course obtained in Coursera:




Pros

This course provides an environment where we can run SQL queries and see the results. This matters because a big obstacle when learning SQL is installing and configuring the database (unless you are using MySQL, of course =). Take a typical Postgres installation as an example: you install the database and then have to start it (on Windows it will probably run as a service), and then you need a tool such as pgAdmin to connect to the server. But pgAdmin is actually a web app that runs locally, so you have to use it in your browser. That is a long path before you can write your first "SELECT", right?
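For comparison, one low-setup way to try a first query outside the course environment is Python's built-in sqlite3 module with an in-memory database. This is only a sketch and not something the course itself uses; the `courses` table and its rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database: no server to install, start, or connect to.
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, invented purely for this example.
conn.execute("CREATE TABLE courses (title TEXT, platform TEXT, hours INTEGER)")
conn.executemany(
    "INSERT INTO courses VALUES (?, ?, ?)",
    [
        ("Intro to SQL", "Platform X", 14),
        ("Advanced SQL", "Platform X", 16),
        ("Spark Basics", "Platform Y", 8),
    ],
)

# A first SELECT: number of courses and total hours per platform.
rows = conn.execute(
    "SELECT platform, COUNT(*) AS n, SUM(hours) AS total_hours "
    "FROM courses GROUP BY platform ORDER BY platform"
).fetchall()

for platform, n, total_hours in rows:
    print(platform, n, total_hours)

conn.close()
```

SQLite's dialect differs from Postgres in places, so this is good for practicing the basics rather than as a replacement for a real server.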

Cons

One thing that felt a little outside the course scope is the graded project. First, I have the impression it is not mandatory for completing the course, so why have it there in the first place? Second, parts of it expect you to already know a few data science concepts. I mean, the title has "for Data Science" in it, but all the exercises up to the final project are about SQL only. This transition could be a little more gradual.

Conclusion

This is a great course, especially for grasping basic SQL concepts. The instructor has great communication skills and provides good practical examples. Also, the practical exams are awesome for really digesting the theory from the lessons.
