Blog
The following are the blog posts that I have written since 2021. Most are hosted on my Medium page, and you can open any of them by clicking the link attached to each article.
-
How to set up a Python script to run in a Jupyter Notebook on the cloud
When scheduling our scripts to run automatically, we ran into the issue that the machine that runs them needs to be always…
-
Data Extraction from APIs: From ETL to Visualization
Traditionally, APIs are tools that make a website’s data digestible for a computer. Through them, a computer can view and edit data, just like a person can by loading pages and submitting forms…
-
Subway Data ETL Pipeline: Part I
Traditionally, APIs are tools that make a website’s data digestible for a computer. Through them, a computer can view and edit data, just like a person can by loading pages and submitting forms…
-
Subway Data ETL Pipeline: Part II
A brief tutorial on how to extract, transform, and load data from Wikipedia with web scraping and pandas…
-
Sending Data to Google Sheets Using Python: Data Manipulation and Querying
The motivation for writing this tutorial is to provide readers with the tools to extract, load, manipulate, and organize data into a familiar format: Google Spreadsheets…
-
Bollinger Bands: How they work and how to calculate them
Bollinger Bands® are a technical analysis tool developed by John Bollinger for generating oversold or overbought signals…
-
Postman vs. Requests
When requesting information from the web with an API call, one fundamental resource is Postman…
-
Geopy: Getting Geo Localization From Addresses
I started working with geolocation coordinates for a project request at my current job. That request consisted of getting the coordinates…
-
Nominal and Real Wages Analysis With pandas
Wages refer to the compensation paid to an individual after the successful completion of a task assigned…
-
From OLAP Data Cubes to Redshift
When computer-based data analytics began picking up in the early 90s, we began hearing about “data cubes”…
-
Setting Up a WSL and PySpark Data Engineering
This article will kick off a series of tutorials about how to set up and use several tools for data engineering…
-
Starting Data Analysis with PySpark
This is the second part of a series on setting up a data engineering environment. In case you missed the first part…
-
Building Batch and Streaming Apps with Spark
Take advantage of PySpark and PyData libraries by building an app that analyzes data with Spark…
-
Understanding Awk for Text Processing
A practical guide on pattern scanning with a text-processing language…
-
Data Wrangling in the Command Line
Learning how to obtain and manipulate data right in the command line with Unix Power Tools…
-
Loading Data from S3 to AWS Athena
One of the benefits of storing data in S3 buckets is that you can ingest it and then access it in any number of ways…
-
Constructing and Visualizing Datagrids in Kangas
A comprehensive introductory tutorial on how to create your DataGrids and then manipulate, classify, and visualize them in the Kangas UI…
-
Image Visualization with Kangas
Applying built-in functions from Kangas UI to Hugging Face DataGrids…
-
NLP for text classification
A complete tutorial on how to construct a customized text classifier based on the NLTK Python package…
-
NLP Techniques for Text Normalization. Part I
Lately, I have been interested in researching the NLP field and the different techniques used to extract knowledge from texts…
-
NLP Techniques for Text Normalization. Part II
This is the second part of the NLP tutorial on techniques for text normalization…
-
Information extraction algorithms using Named Entity Recognition
This is the third part of the NLP tutorial that includes Tokenization, Lemmatization, Stemming, Part-of-Speech (POS), Named Entity…
-
Getting Started with Airbyte: An Introductory Tutorial for Beginners
Data integration and management are essential aspects of modern business operations…
-
Amazon DynamoDB Unleashed: Complete tutorial for beginners
In this tutorial, we’ll dive deep into Amazon DynamoDB, a fast and fully managed NoSQL database service designed for seamless scalability…
-
Analyzing Git Repositories with PyDriller
Unlocking Git Insights: A Comprehensive Guide to Analyzing Repositories with PyDriller
-
BERT Language Model and Transformers
The following is a brief tutorial on how BERT and Transformers work in NLP-based analysis using the Masked Language Model (MLM)
-
Unlocking Data Insights with Rill: A Comprehensive Guide to Streamlined Data Analytics
Harness the Power of Rill for Real-time Data Processing and Analytics
-
Simplifying Database Documentation with dbdocs & dbdiagram
Streamline Data Management and Collaboration with Effective Documentation and Cataloging
-
Building Powerful Models with Rill: A Comprehensive Guide to Data Modeling
Start Data Modeling with this powerful BI-as-code tool
-
Transforming Data Engineering: A Deep Dive into dbt with DuckDB
Empowering Analytics and Data Science with Modern Data Transformation
-
Unleashing Real-Time Data Analytics with Apache Druid: A Comprehensive Tutorial
Lightning-fast, Interactive Data Exploration by harnessing the power of Apache Druid