Introduction to Libraries in Python
Introduction
Morning full of surprises! Someone with mushroom-style hair and a chubby build was sticking a scrapbook of key points about data analysis on the wall by my desk.
“Learn it in 3 days,” he said briefly, and then left. Challenges keep coming; is this what it feels like to pursue a dream? I take a deep breath. I won’t give up!
Libraries in Python are collections of open-source code that can be imported into a Python program to help with computation. The basic Python libraries used for data analysis are NumPy, pandas, SciPy, and Matplotlib, each with its own functionality. This module describes each library, along with examples of how it is used to solve real cases in the world of work.
1. NumPy
NumPy is a package for the Python programming language. NumPy is short for Numerical Python. As the name suggests, NumPy is used to process numerical and scientific data, and its core data structure is the multidimensional array.
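To make "multidimensional array" concrete, here is a minimal sketch that creates a two-dimensional NumPy array and inspects its dimensions. The variable name `matrix` and the values are only illustrative.

```python
import numpy as np

# A 2-dimensional array: 2 rows x 3 columns of integers
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

print(matrix.ndim)   # number of dimensions -> 2
print(matrix.shape)  # size of each dimension -> (2, 3)
print(matrix.dtype)  # element type; the exact integer width is platform-dependent
```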
Jim Hugunin first developed Numeric, the ancestor of NumPy. Then, in 2005, Travis Oliphant created NumPy by incorporating the features of Numarray, a competing package, into Numeric, producing a single numerical processing package.
By using NumPy, a programmer can perform various kinds of numerical processing, including:
- mathematical and logical operations on arrays,
- Fourier transforms, and
- linear algebra operations.
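A short sketch of those three kinds of processing, assuming NumPy is already installed. The arrays and the small equation system are made-up examples chosen so the results are easy to check by hand.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])

# 1. Mathematical and logical operations, applied element by element
doubled = a * 2        # [2., 4., 6., 8.]
is_large = a > 2       # [False, False, True, True]
mean_value = a.mean()  # 2.5

# 2. Discrete Fourier transform of a short signal
signal = np.array([0.0, 1.0, 0.0, -1.0])
spectrum = np.fft.fft(signal)

# 3. Linear algebra: solve the system 3x + y = 9, x + 2y = 8
coefficients = np.array([[3.0, 1.0],
                         [1.0, 2.0]])
constants = np.array([9.0, 8.0])
solution = np.linalg.solve(coefficients, constants)  # x = 2, y = 3

print(doubled, is_large, mean_value)
print(spectrum)
print(solution)
```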
Before we can use NumPy, we have to install it. Run the following command in the terminal:
pip install numpy
After that, we can import NumPy into any Python file with the import statement. Note that the module name is lowercase, because Python imports are case-sensitive.
import numpy as np
2. Pandas
Pandas is a library that makes it easy to manipulate, clean, and analyze structured data. With pandas, you can cover five main steps of data processing and analysis: loading, preparing, manipulating, modeling, and analyzing data.
Pandas builds on NumPy arrays but attaches an index (a set of labels) to them. A one-dimensional labeled array is called a Series, and a two-dimensional labeled table is called a DataFrame. A DataFrame behaves much like a dictionary of Series that share the same index, with the data typically stored in NumPy arrays underneath.
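A minimal sketch of the two structures. The fruit names, prices, and stock counts are hypothetical values used only to show how labels replace plain integer positions.

```python
import pandas as pd

# Series: a 1-dimensional array of values with a labeled index
prices = pd.Series([12000, 15000, 9000], index=["apple", "mango", "banana"])
print(prices["mango"])  # access by label -> 15000

# DataFrame: a 2-dimensional table; each column is a Series sharing one index
df = pd.DataFrame(
    {"price": [12000, 15000, 9000], "stock": [30, 12, 45]},
    index=["apple", "mango", "banana"],
)
print(df.loc["banana", "stock"])  # row label + column label -> 45
print(type(df["price"]))          # selecting one column returns a Series
```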
According to Wes McKinney, the creator of pandas and author of the book Python for Data Analysis, the name pandas comes from panel data, an econometrics term for structured multidimensional data sets, and is also a play on the phrase "Python data analysis" itself.
To get started, import the pandas library. The as keyword gives it the alias pd, so in the rest of the code we write pd instead of pandas.
import pandas as pd
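The load, prepare, manipulate, and analyze steps mentioned earlier can be sketched on a tiny made-up CSV. This is an illustration only: the data is hypothetical, `io.StringIO` stands in for a real file path, and the modeling step is left out to keep it short.

```python
import io
import pandas as pd

# Hypothetical CSV data; in practice you would usually pass a file path
csv_text = """city,month,sales
North,Jan,100
North,Feb,
South,Jan,80
South,Feb,95
"""

# Load: read the CSV into a DataFrame (the empty sales value becomes NaN)
df = pd.read_csv(io.StringIO(csv_text))

# Prepare: drop rows with a missing sales value
df = df.dropna(subset=["sales"])

# Manipulate: add a derived column with each row's share of total sales
df["sales_share"] = df["sales"] / df["sales"].sum()

# Analyze: total sales per city
totals = df.groupby("city")["sales"].sum()
print(totals)
```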
3. SciPy
SciPy is built to work with NumPy arrays and provides many user-friendly and efficient numerical routines, such as those for numerical integration, differentiation, and optimization.
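A small sketch of two of those routines, assuming SciPy is installed: `scipy.integrate.quad` for numerical integration and `scipy.optimize.minimize_scalar` for one-dimensional optimization. The functions being integrated and minimized are simple examples whose exact answers are known, so the results are easy to check.

```python
import numpy as np
from scipy import integrate, optimize

# Numerical integration: area under sin(x) from 0 to pi (exact answer is 2)
area, abs_error = integrate.quad(np.sin, 0, np.pi)

# Optimization: find the x that minimizes (x - 3)^2 + 1 (exact minimum at x = 3)
result = optimize.minimize_scalar(lambda x: (x - 3) ** 2 + 1)

print(area)      # very close to 2.0
print(result.x)  # very close to 3.0
```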
Both NumPy and SciPy run on all major operating systems, are quick to install, and are free. NumPy and SciPy are easy to use, yet powerful enough to be relied on by some of the world’s leading data scientists and researchers.
4. Matplotlib
Finally, in fourth position, there is Matplotlib, a Python library used for data visualization. Matplotlib was originally written by John D. Hunter and released in 2003. It is still actively developed, and many other visualization libraries, such as Seaborn and ggplot, build on it. Matplotlib has many features for different types of data visualization, including basic plots, plots of arrays, statistical plots, and plots for unstructured coordinates.
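As a minimal sketch, the figure below combines a basic plot (a line chart) with a statistical plot (a histogram). The data is generated on the fly, the output file name `example_plots.png` is an arbitrary choice, and the non-interactive "Agg" backend is used so the script also runs on machines without a display.

```python
import matplotlib
matplotlib.use("Agg")  # render to files only; no window is opened
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
rng = np.random.default_rng(seed=0)
samples = rng.normal(loc=0.0, scale=1.0, size=500)

fig, (ax_line, ax_hist) = plt.subplots(1, 2, figsize=(10, 4))

# Basic plot: a line chart of sin(x)
ax_line.plot(x, np.sin(x), label="sin(x)")
ax_line.set_xlabel("x")
ax_line.set_ylabel("sin(x)")
ax_line.legend()

# Statistical plot: histogram of normally distributed random samples
ax_hist.hist(samples, bins=20)
ax_hist.set_xlabel("value")
ax_hist.set_ylabel("count")

fig.tight_layout()
fig.savefig("example_plots.png")
plt.close(fig)
```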
Well, that’s all I can share in this post about Python libraries for data science.