Posts

Presentation

When we think about the skills developers need, we never consider the soft skill of presenting an idea to others. It is not easy to train, but it can definitely be trained.

Drawing
One of the most important skills is sketching a visual representation of your idea. The most used tool (and my top recommendation) is draw.io. It has a lot of easy-to-use features, a large library of cloud provider symbols that will make your sketch shine, and, of course, it is free. It also integrates with Confluence (Atlassian's wiki, commonly used alongside Jira), so you can also use it to write documentation for your tasks. The second recommendation is another free online tool called Excalidraw. It does not have a large library of external images, but it is the simplest and most productive one. You focus only on the content, with no time wasted on cosmetics. It is my favorite for video calls.

Text editing in MD
Another great-to-have skill you have to learn is adding some Ma

Dealing With Large Features in Git Repos

There is no "correct" way to organize your servers and git repos; each company has its own problems and solutions. Some companies use test servers to run very basic verifications on their codebase before deploying new features to production. Some even have a staging server that is as similar to production as possible, where new features are tested one last time before going to prod. Those servers usually run code deployed from the development, staging, and master/main branches of the git repo. This is what the usual development flow looks like: Another good practice many companies adopt is versioning the code to keep track of what is deployed on each server. There is a lot about versioning alone that should be discussed in other posts, but let us adopt the format major.minor.build for our example.

Dealing with Bugs in Tests
Now, let us imagine that a bug was discovered in one of your features while it was being tested on the Test Server. W

FastAPI and Flask: Choosing One

Yes, you got that right: this is yet another text comparing two similar frameworks that are pretty solid, well known, and have a large fan-boy base everywhere. So, why bother? First, and most importantly, this is my page and I choose what I write here. Besides, we are eventually faced with that sort of comparison in our daily lives as software engineers, and sometimes we need solid grounds when writing technical documentation.

What are Flask and FastAPI?
Basically, both are lightweight web application frameworks for Python. You probably already knew that, right? Let's start with FastAPI. My first contact with FastAPI was a few years ago, on an existing company project that did not use all the features the framework offered. When I last tried to build something with FastAPI a few months ago, it was noticeable how much had changed in those couple of years. FastAPI has a lot to offer if you plan to build microservices that may use some famo

SQL For Data Science (Coursera)

Overview
SQL is one of the most basic skills for developers, DBAs, and the whole data team (analysts, engineers, and scientists), but it is constantly underestimated. In general, people are reluctant to keep different languages in the same project to avoid staffing issues, and sometimes that forces the use of ORMs when the fastest option would be to run some stored procedures. From a management perspective this makes sense: if all engineers can use the same technology, there are fewer problems with work allocation and absences (when only a few people master a key technology, they cannot all go on vacation at the same time). The growth of cloud services and the introduction of new tools for querying massive amounts of data have made system designs that delegate a large portion of business logic to databases and SQL-based tools popular again. Concepts like "data lakes" and "data warehouses" are starting to be applied in smal

Distributed Computing with Spark SQL (Coursera)

Overview
This course was a pleasant surprise in the way the practical exams worked. You can think of it as a great introduction to Databricks, along with Spark, plus a fact I think few people know: you can use Databricks for free if you create a community account. Spark is one of the most important tools (if not the most important) for large-scale data processing (a.k.a. Big Data). Its main advantage over traditional data processing tools is horizontal scaling, which makes scalability almost limitless. Spark is a basic tool for Data Engineers, and one of the biggest obstacles when learning Data Engineering is that the problems we are trying to solve usually belong to big corporations with a huge pile of data, so simulating such situations often involves high costs. This course is part of the SQL Basics for Data Science specialization. Here are the course details from Coursera:

Pros
As mentioned before, the best part of this course was the

Building Batch Data Pipelines on GCP (Coursera)

Overview
This is another course that Google developed with Coursera, and it has quizzes and practical lab activities. Batch processing (the "opposite" of stream processing) is a way to process huge amounts of data that are available as files, either locally or in cloud buckets. The main advantage of modern batch processing techniques is their use of parallelism, which greatly improves performance and scalability. Google Cloud has several tools to approach this problem, and I was surprised to learn that it is possible to use Spark in GCP Dataproc.

Pros
The big difference between this course and the "regular" Coursera ones is the integration with Qwiklabs, which lets the student access GCP and run all exercises there. This is definitely an advantage for students who are reluctant to provide credit card numbers (or who, depending on their age, don't even have a credit card) in order to test and learn GCP. Another great difference is that

Smart Analytics, Machine Learning, and AI on GCP (Coursera)

Overview
This is a new kind of post I have been planning to start writing for a while: short reviews of courses I take online. The first one is Smart Analytics, Machine Learning, and AI on GCP, which I took using the Financial Aid Coursera provides. If you speak Portuguese and would like to know more about how to get it, and also find peers to take courses with and motivate each other, consider joining this Facebook group! The course is pretty short, as it consists of only two weeks.

Pros
I think the major pro here is the set of practical examples of how to use Machine Learning algorithms directly in BigQuery using plain SQL. This adds a lot of value for Data Engineers who have good SQL knowledge. It is possible to train a model and even make predictions. This is one BigQuery feature I would never have dreamed existed.

Cons
Well, some of the examples are a bit outdated, and many use existing single-table databases, which does not teach one of the most important parts of handli

Foreign Visitors in Brazil - 2005 to 2015 - Part III

This is the conclusion of the study on our beloved "gringos". To better understand the big picture, we analyse the distribution of total visitors per continent and per access method. The first impression is that visitors enter Brazil by air more than twice as often as by land; however, looking at continents individually, South America has almost the same number of visitors for both access methods. One interesting point we can see in this graph is the large number of Europeans entering Brazil by land. We will further study the two continents with the most visitors: Europe and South America. The next graph shows which South American countries visit Brazil the most. Our "hermanos" from Argentina won this one. The next graph attempts to correlate South American countries with the Brazilian state they used to enter Brazil, per year, all by land. The scale had to be enlarged to make some details visible. Arrivals equal to zero have been removed from the graph, t

Foreign Visitors in Brazil - 2005 to 2015 - Part II

This is the continuation of our previous analysis of foreign visitors from 2005 to 2015. The first thing we look at is how the variable of interest (number of arrivals) behaves through the years. There has been a slight increase in the number of foreign visitors since 2013. The graph below also shows how the accumulated visits are distributed across the continents. From this graph we can see that the increase is mainly due to South America. For the other two continents with the most visitors, Europe and North America, there has been a slight decrease in arrivals. Below we have the graph of arrivals per continent in this 10-year window. South America and Europe have by far the largest numbers of visitors. Next we observe how the sum of arrivals is distributed across the months of the year. As expected, there is a larger number of arrivals during summer in Brazil. However, there is an unexplained increase in July. So far, one possib