genfromtxt() function in NumPy

NumPy is a Python library built for scientific computing. It provides an N-dimensional array type and supports basic matrix operations (matrix multiplication, addition, subtraction, etc.). The library includes many functions, such as np.array(), np.arange(), np.zeros(), and np.ones().

In this article, we will briefly explore the numpy.genfromtxt() function. We will look at several of its use cases, with examples and explanations.

What is the NumPy genfromtxt() function?

The numpy.genfromtxt() function reads data from a text file and converts it into a NumPy array. Its main strength is that it can handle data with missing or inconsistent values. This makes it useful in fields such as machine learning, where missing or inconsistent data must be filled in before any computation, as well as financial analysis, geospatial data processing, and scientific research. It makes cleaning, processing, and transforming data from a file much easier.
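As a minimal sketch of this behaviour, the snippet below reads a small comma-separated table from memory instead of a file. The sample text is hypothetical and exists only for illustration; the empty field in its second row plays the role of a missing value.

```python
import io
import numpy as np

# Hypothetical in-memory CSV; the empty field in the second row is missing data.
text = io.StringIO("1,2,3\n4,,6\n7,8,9")

# With the default dtype (float), a missing field becomes nan.
data = np.genfromtxt(text, delimiter=",")
print(data)
print(np.isnan(data[1, 1]))
```

Because the default dtype is float, the missing entry shows up as nan rather than causing an error.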

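Syntax: numpy.genfromtxt(fname, dtype=float, comments='#', delimiter=None, skip_header=0, skip_footer=0, converters=None, missing_values=None, filling_values=None, usecols=None, names=None, replace_space='_', unpack=None, max_rows=None, encoding=None, like=None)

Only the parameters discussed below are shown. The default for encoding is None in NumPy 2.x and 'bytes' in earlier releases.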

Parameters and Their Uses

This section briefly describes the most commonly used parameters of the function.

  • fname: The input source. It can be a filename or path, an open file object, a list of strings, or a generator that yields lines.
  • dtype: The data type of the resulting array. The default is float. If it is None, the type of each column is inferred from the data.
  • comments: The character that marks the start of a comment; everything after it on a line is ignored. By default, it’s #.
  • delimiter: The string that separates values. By default, any run of whitespace acts as the delimiter, but we can specify a string such as “,” or “;”. An integer or a sequence of integers is interpreted as fixed field widths.
  • skip_header: The number of lines to skip at the start of the file.
  • skip_footer: The number of lines to skip at the end of the file.
  • converters: A dictionary that maps column numbers (or names) to functions that convert the raw text of those columns into values.
  • missing_values: The strings that should be treated as missing data, for example “N/A”.
  • filling_values: The values substituted wherever data is missing.
  • usecols: Which columns to read, with 0 being the first. For example, usecols=(0, 2) reads only the first and third columns.
  • names: If True, the field names are read from the first line after the skipped header lines. It can also be a sequence of names or a single comma-separated string.
  • unpack: If True, the returned array is transposed, so that the columns can be unpacked with x, y, z = np.genfromtxt(...).
  • replace_space: The character used to replace whitespace in field names. By default, it’s “_”.
  • max_rows: The maximum number of rows to read. It cannot be combined with skip_footer.
  • encoding: The encoding used to decode the input file.
  • like: A reference object that allows the result to be created as an array type other than a NumPy array, provided that object supports the __array_function__ protocol.
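To make a few of these parameters concrete, here is a small sketch that combines delimiter, skip_header, usecols, missing_values, filling_values, and unpack. The table, its column names, and the “N/A” marker are all made up for illustration.

```python
import io
import numpy as np

# Hypothetical semicolon-separated table with a header line and "N/A" markers.
raw = (
    "id;height;weight\n"
    "1;170.5;65.0\n"
    "2;N/A;72.5\n"
    "3;182.0;N/A\n"
)

options = dict(
    delimiter=";",         # fields are separated by semicolons
    skip_header=1,         # ignore the header line
    usecols=(1, 2),        # keep only the height and weight columns
    missing_values="N/A",  # treat this string as missing data
    filling_values=0.0,    # value substituted for missing data
)

# A regular 2-D array: one row per data line, one column per selected field.
table = np.genfromtxt(io.StringIO(raw), **options)
print(table)

# unpack=True transposes the result so each column can be bound to a name.
height, weight = np.genfromtxt(io.StringIO(raw), unpack=True, **options)
print(height, weight)
```

The options are kept in a dictionary only so that both calls share exactly the same settings; passing them as ordinary keyword arguments works the same way.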

Basic Usage of Numpy.genfromtxt

In this example, we will see a very basic use case of the above function. We will read data from a dummy file and display the output.

gfg.txt (our dummy file)

GeeksforGeeks is a leading platform that provides computer science resources and coding challenges for programmers and technology enthusiasts, along with interview and exam preparations for upcoming aspirants.
With a strong emphasis on enhancing coding skills and knowledge, it has become a trusted destination for over 12 million plus registered users worldwide.
The platform offers a vast collection of tutorials, practice problems, interview tutorials, articles, and courses, covering various domains of computer science.

Python
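# Import the NumPy library
import numpy as np

# Driver code
if __name__ == "__main__":
    # delimiter="\n" keeps every line of the file as a single string field.
    # Splitting this prose on "," would give lines with different numbers
    # of fields, which genfromtxt rejects with a ValueError.
    data = np.genfromtxt("gfg.txt", dtype=str, encoding=None, delimiter="\n")
    # Display each line that was read
    for line in data:
        print(line, end=" ")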

Output

[Image: reading data from the text file]
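When reading text with dtype=str, it helps to remember that genfromtxt expects every line to split into the same number of fields. The sketch below uses two hypothetical in-memory samples: one regular, one ragged. It is only meant to illustrate that rule, not to reproduce the file above.

```python
import io
import numpy as np

# Hypothetical comma-separated text: every line has exactly three fields.
regular = io.StringIO("python,numpy,arrays\nread,text,files\n")
words = np.genfromtxt(regular, dtype=str, delimiter=",", encoding=None)
print(words.shape)  # two rows, three columns

# Hypothetical ragged text: the second line has one more field than the first.
ragged = io.StringIO("one,two\nthree,four,five\n")
try:
    np.genfromtxt(ragged, dtype=str, delimiter=",", encoding=None)
    raised = False
except ValueError as err:
    raised = True
    print("could not parse:", type(err).__name__)
```

For free-form prose, which rarely has a consistent number of commas per line, choosing a delimiter that never occurs inside a line avoids this error.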

Advanced Usage of Numpy.genfromtxt

In this example, we will fill in the missing or inconsistent values of a file while converting it into a NumPy array. After the conversion, we will display the data with a for loop.

Dummy File (data.csv)

[Image: contents of data.csv]

Example

We first read the data from our dummy file (data.csv). Notice that some values are missing: for example, the age is missing for id 3 and the coding score is missing for id 2. We will fill the missing values in integer columns with the default value 1 and those in float columns with the default value 1.0. We will also apply converters: for an ‘int’ column, the converter casts each value to int, and a value that is empty or cannot be converted is replaced by the default. Finally, we will display the resulting NumPy array with a for loop.
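The conversion-with-fallback idea can be sketched in isolation before looking at the full script. The helper to_int_or_default and the two-column sample are hypothetical, chosen only to show a converter that returns a default when a field is empty or not a number.

```python
import io
import numpy as np

def to_int_or_default(field, default=1):
    """Return int(field), or `default` if the field is empty or not a number."""
    try:
        return int(field)
    except ValueError:
        return default

# Hypothetical data: id and age, with one empty and one non-numeric age.
raw = "1,21\n2,\n3,unknown\n"

data = np.genfromtxt(
    io.StringIO(raw),
    delimiter=",",
    dtype=[("id", int), ("age", int)],
    converters={1: to_int_or_default},  # apply the helper to column 1 (age)
    encoding=None,                      # pass fields to the converter as str
)
print(data["age"])
```

Passing encoding=None matters on NumPy versions before 2.0, where converters otherwise receive bytes instead of str.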

Python
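# Import the NumPy library
import numpy as np

# Driver code
if __name__ == "__main__":

    # Read data.csv and fill in the missing values
    data = np.genfromtxt(
        'data.csv',
        delimiter=',',
        # One (field name, type) pair per column
        dtype=[
            ('id', int),
            ('name', 'U10'),
            ('age', int),
            ('codingscore', float),
            ('totalscore', float),
            ('potd', int)
        ],
        # names=True takes the field names from the header row, so the keys
        # used in missing_values and filling_values must match that header
        names=True,
        # Converters for columns 2-4: cast the value, or use a default if it is empty
        converters={
            2: lambda x: int(x) if x else 1,
            3: lambda x: float(x) if x else 1.0,
            4: lambda x: float(x) if x else 1.0
        },
        # An empty field counts as missing and is replaced by the value below
        missing_values={'age': '', 'codingscore': '', 'totalscore': '', 'potd': ''},
        filling_values={'age': 1, 'codingscore': 1.0, 'totalscore': 1.0, 'potd': 1}
    )
    # Display each record of the structured array
    for row in data:
        print(row)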

Output

[Image: filled values in the vacant spaces]
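Since data.csv itself is not reproduced here, the following self-contained sketch uses a hypothetical stand-in with the same column names. The rows and values are invented for illustration; only the technique (a structured dtype plus per-column filling_values) mirrors the example above.

```python
import io
import numpy as np

# Hypothetical stand-in for data.csv; empty fields are the missing values.
csv_text = (
    "id,name,age,codingscore,totalscore,potd\n"
    "1,Asha,21,88.5,91.0,5\n"
    "2,Ravi,23,,79.5,3\n"
    "3,Meera,,92.0,,\n"
)

data = np.genfromtxt(
    io.StringIO(csv_text),
    delimiter=",",
    names=True,  # field names come from the header line
    dtype=[("id", int), ("name", "U10"), ("age", int),
           ("codingscore", float), ("totalscore", float), ("potd", int)],
    # Empty fields are treated as missing by default; fill them per column.
    filling_values={"age": 1, "codingscore": 1.0, "totalscore": 1.0, "potd": 1},
    encoding=None,
)

for row in data:
    print(row)
```

The result is a structured array, so a whole column can be selected by name, for example data["age"].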

Conclusion

NumPy is a popular Python library for scientific and mathematical computing. It offers many functions, such as np.arange(), np.array(), np.zeros(), and np.ones(). In this article, we briefly discussed one of them, numpy.genfromtxt(), along with its parameters and their uses. It is commonly used for data preprocessing and scientific computing, among other things. We also saw how to fill in default values and apply converters where data is missing.




Referred: https://www.geeksforgeeks.org

