How to convert tab-separated file into a dataframe using Python

In this article, we will learn how to convert a TSV file into a data frame using Python and the Pandas library.

A TSV (Tab-Separated Values) file is a plain text file where data is organized in rows and columns, with each column separated by a tab character.

  • It is a type of delimiter-separated file, similar to CSV (Comma-Separated Values), but it uses a tab instead of a comma as the separator.
  • Tab-separated files are common in data manipulation and analysis. Loading one into a DataFrame lets you filter, sort, aggregate, and analyze the data with pandas instead of parsing the text by hand.
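The methods below all read a file named file.tsv whose full contents are not shown. If you want to run them yourself, you can create a small stand-in first. This sketch assumes the file has no header row and contains just the six rows that appear in the outputs of this article; the real file is likely longer.

```python
# Hypothetical stand-in for file.tsv: only the six rows visible in this
# article's outputs, one record per line, fields separated by tabs.
rows = [
    [0, 50, 5, 881250949],
    [0, 172, 5, 881250949],
    [0, 133, 1, 881250949],
    [196, 242, 3, 881250949],
    [186, 302, 3, 891717742],
    [22, 377, 1, 878887116],
]

with open("file.tsv", "w", newline="") as f:
    for row in rows:
        # Join the fields with tabs and end each record with a newline
        f.write("\t".join(str(value) for value in row) + "\n")

# Read the first line back and split it on tabs
with open("file.tsv") as f:
    print(f.readline().rstrip("\n").split("\t"))  # ['0', '50', '5', '881250949']
```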

Methods to Convert Tab-Separated Files into a Data Frame

Method 1: Using pandas ‘read_csv()’ with ‘sep’ parameter

In this method, we will use the Pandas library to read a tab-separated file into a data frame.

Look at the following code snippet.

  • We import the pandas library and define the path of the tab-separated file.
  • Then, we call ‘pd.read_csv()’ to read the contents of the file into a DataFrame, specifying that the file is tab-separated with ‘sep='\t'’.
  • By default, ‘read_csv()’ expects commas and does not detect tabs on its own, which is why sep='\t' is passed here; with it, each line is split on tab characters.
Python
import pandas as pd

file_path = "file.tsv"

# sep='\t' tells read_csv() that columns are separated by tabs
df = pd.read_csv(file_path, sep='\t')

# Show the first five rows
df.head()

Output:

     0   50  5  881250949
0    0  172  5  881250949
1    0  133  1  881250949
2  196  242  3  881250949
3  186  302  3  891717742
4   22  377  1  878887116
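In the output above, the first line of the file (0, 50, 5, 881250949) became the column labels, because read_csv() treats the first line as a header by default. If the file has no header row, as appears to be the case here, you can pass header=None and optionally supply your own labels with names. In this sketch the column names are hypothetical placeholders, and io.StringIO stands in for file.tsv so the example runs on its own.

```python
import io

import pandas as pd

# In-memory stand-in for the first lines of file.tsv (no header row)
tsv_text = "0\t50\t5\t881250949\n0\t172\t5\t881250949\n0\t133\t1\t881250949\n"

# header=None keeps the first line as data; names supplies column labels
# (these labels are hypothetical placeholders)
df = pd.read_csv(
    io.StringIO(tsv_text),
    sep="\t",
    header=None,
    names=["col1", "col2", "col3", "col4"],
)
print(df)
```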

Method 2: Using pandas ‘read_table()’ function

In the following code snippet, we again use pandas to read the contents of a tab-separated file named ‘file.tsv’ into a DataFrame named ‘df’, this time with pd.read_table(). Its default separator is already a tab (sep='\t'), so no sep argument is needed.

Python
import pandas as pd

# read_table() uses a tab as its default separator
df = pd.read_table('file.tsv')
df.head()

Output:

     0   50  5  881250949
0    0  172  5  881250949
1    0  133  1  881250949
2  196  242  3  881250949
3  186  302  3  891717742
4   22  377  1  878887116
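Since read_table() differs from read_csv() mainly in its default separator, the calls in Methods 1 and 2 should produce the same DataFrame. A quick check, using an in-memory sample in place of file.tsv, also shows what happens if you forget sep with read_csv(): the tabs are not split and each line lands in a single column.

```python
import io

import pandas as pd

tsv_text = "0\t50\t5\t881250949\n0\t172\t5\t881250949\n196\t242\t3\t881250949\n"

# read_table() defaults to sep='\t'; read_csv() defaults to sep=','
df_table = pd.read_table(io.StringIO(tsv_text))
df_csv = pd.read_csv(io.StringIO(tsv_text), sep="\t")
print(df_table.equals(df_csv))  # True

# Without sep, read_csv() looks for commas, so each line stays one field
df_wrong = pd.read_csv(io.StringIO(tsv_text))
print(df_wrong.shape)  # (2, 1)
```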

Method 3: Using csv module

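The code example begins by importing the csv module, which provides functionality for reading and writing CSV files.

  • It uses the open() function to open the file specified by file_path in read-only mode ('r'), inside a with statement so that the file is closed properly after reading.
  • It creates a CSV reader object with csv.reader(file, delimiter='\t'), specifying that the values in the file are tab-separated.
  • It passes the rows produced by the reader to pd.DataFrame(). The csv module does not treat any line as a header, so the first line of the file becomes row 0, and every value is read as a string.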
Python
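import csv

import pandas as pd

file_path = "file.tsv"

# newline='' is what the csv module docs recommend for files passed to csv.reader
with open(file_path, 'r', newline='') as file:
    reader = csv.reader(file, delimiter='\t')
    # Collect all rows while the file is still open
    df = pd.DataFrame(list(reader))
df.head()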

Output:

     0    1  2          3
0    0   50  5  881250949
1    0  172  5  881250949
2    0  133  1  881250949
3  196  242  3  881250949
4  186  302  3  891717742
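Unlike pandas' own readers, csv.reader() returns every field as a string, so the DataFrame built in Method 3 holds text even though the values look numeric. One way to convert them, assuming every field is a valid number, is to apply pd.to_numeric() column by column. The sample data is held in memory as a stand-in for file.tsv.

```python
import csv
import io

import pandas as pd

# In-memory stand-in for file.tsv
sample = io.StringIO("0\t50\t5\t881250949\n196\t242\t3\t881250949\n")

rows = list(csv.reader(sample, delimiter="\t"))
raw = pd.DataFrame(rows)
print(type(raw.iloc[0, 0]))  # the cells are Python strings

# Convert each column; pd.to_numeric raises ValueError on non-numeric text
df = raw.apply(pd.to_numeric)
print(df.dtypes)
```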

Method 4: Use ‘numpy’ to load the data and then convert to a DataFrame

This code uses NumPy’s ‘genfromtxt()’ function to load the tab-separated data from ‘file.tsv’ into a NumPy array. delimiter='\t' sets the tab separator, dtype=None lets NumPy infer each column’s type from its contents, and encoding=None decodes the file with the system default encoding. The array is then converted into a pandas DataFrame for further analysis and manipulation.

Python
import numpy as np
import pandas as pd
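# Load the tab-separated values into a NumPy array, inferring column types
data = np.genfromtxt('file.tsv', delimiter='\t', dtype=None, encoding=None)
# Wrap the array in a DataFrame
df = pd.DataFrame(data)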
df.head()

Output:

     0    1  2          3
0    0   50  5  881250949
1    0  172  5  881250949
2    0  133  1  881250949
3  196  242  3  881250949
4  186  302  3  891717742
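NumPy's documentation describes loadtxt() as a fast reader for simply formatted files and points to genfromtxt() for more sophisticated handling, such as lines with missing values. Since this file appears to contain only integers, loadtxt() is a reasonable alternative. The sketch below also passes column labels to the DataFrame constructor; the labels are hypothetical, and io.StringIO stands in for file.tsv.

```python
import io

import numpy as np
import pandas as pd

sample = io.StringIO(
    "0\t50\t5\t881250949\n196\t242\t3\t881250949\n186\t302\t3\t891717742\n"
)

# loadtxt returns a plain 2-D array and raises ValueError on fields it
# cannot parse as integers, instead of filling them in
data = np.loadtxt(sample, delimiter="\t", dtype=np.int64)

# Hypothetical column labels
df = pd.DataFrame(data, columns=["col1", "col2", "col3", "col4"])
print(df)
```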

How to convert tab-separated file into a dataframe using Python – FAQs

How to Write a DataFrame to a Tab Separated File in Python

You can write a DataFrame to a tab-separated file with the DataFrame's to_csv() method by passing the tab character ('\t') as the separator:

import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Occupation': ['Engineer', 'Doctor', 'Artist']
})

# Write DataFrame to a TSV file
df.to_csv('output.tsv', sep='\t', index=False)
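To check that the written data reads back the same way, you can write to an in-memory buffer instead of output.tsv and read it with the same separator. index=False matters here: without it, to_csv() also writes the row index as an extra first column, and the read-back DataFrame would no longer match.

```python
import io

import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Occupation': ['Engineer', 'Doctor', 'Artist']
})

# Write TSV text into a buffer instead of a file on disk
buffer = io.StringIO()
df.to_csv(buffer, sep='\t', index=False)
print(buffer.getvalue().splitlines()[0])  # header line, fields separated by tabs

# Rewind the buffer and read it back with the same separator
buffer.seek(0)
round_trip = pd.read_csv(buffer, sep='\t')
print(round_trip.equals(df))  # True
```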

How to Convert a Tab-Separated Text File to CSV in Python

To convert a TSV file to a CSV file, you can read the TSV with pandas and then write it out as a CSV:

import pandas as pd

# Read the TSV file
df = pd.read_csv('output.tsv', sep='\t')

# Write to a CSV file
df.to_csv('output.csv', index=False)
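If you only need to change the delimiter and don't need a DataFrame, the standard csv module can do the conversion one row at a time without loading the whole file into memory. This is a minimal sketch: tsv_to_csv is a hypothetical helper name, and it accepts any open text file objects, with io.StringIO used here so the example is self-contained.

```python
import csv
import io


def tsv_to_csv(src, dst):
    """Copy tab-separated rows from src to dst as comma-separated rows."""
    reader = csv.reader(src, delimiter='\t')
    writer = csv.writer(dst)  # default delimiter is ','
    for row in reader:
        writer.writerow(row)


src = io.StringIO("Name\tAge\tOccupation\nAlice\t25\tEngineer\n")
dst = io.StringIO()
tsv_to_csv(src, dst)
print(dst.getvalue())
```

Fields that contain commas are quoted automatically by csv.writer, so they survive the conversion. For real files, open both with newline='' as the csv module documentation recommends, for example `with open('output.tsv', newline='') as src, open('output.csv', 'w', newline='') as dst: tsv_to_csv(src, dst)`.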

How to Import a Tab-Separated File in Python

You can import a tab-separated file using pandas.read_csv by specifying the tab ('\t') as the delimiter:

import pandas as pd

# Import a tab-separated file
df = pd.read_csv('data.tsv', sep='\t')
print(df)

How to Convert a Tab-Delimited String to a List in Python

If you have a string that’s tab-delimited, you can convert it into a list by using the split() method:

# Example tab-delimited string
tab_string = "Alice\t25\tEngineer"

# Convert to a list by splitting on each tab
data_list = tab_string.split('\t')
print(data_list)  # ['Alice', '25', 'Engineer']
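Note that split('\t') splits on every tab, so an empty field between two consecutive tabs is kept as an empty string. Calling split() with no argument behaves differently: it splits on any run of whitespace and drops empty fields, which can shift values into the wrong column. The string below is a made-up example with the Age field missing.

```python
# "Age" is missing: two tabs in a row
line = "Alice\t\tEngineer"

print(line.split('\t'))  # ['Alice', '', 'Engineer'] (empty field kept)
print(line.split())      # ['Alice', 'Engineer'] (empty field lost)
```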

How to Save a Tab-Separated File

As covered above, df.to_csv('filename.tsv', sep='\t') saves your DataFrame as a TSV file; add index=False if you don't want the row index written as an extra column.

How to Read a File with a Custom Separator in Python

Reading a file with any custom separator can be done with pandas.read_csv by passing the appropriate delimiter as sep:

import pandas as pd

# General example for a semicolon-separated file
df_semicolon = pd.read_csv('data_semicolon.csv', sep=';')
print(df_semicolon)
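If you don't know the separator in advance, read_csv() can try to detect it. According to the pandas documentation, with sep=None the Python parsing engine detects the separator from only the first line of the file using Python's built-in csv.Sniffer. Detection is a heuristic and can guess wrong, so an explicit sep is safer when you know the format. The sample below is a made-up tab-separated file held in memory.

```python
import io

import pandas as pd

sample = io.StringIO("Name\tAge\tOccupation\nAlice\t25\tEngineer\nBob\t30\tDoctor\n")

# sep=None asks pandas to sniff the delimiter; this requires engine='python'
df = pd.read_csv(sample, sep=None, engine='python')
print(df.columns.tolist())  # ['Name', 'Age', 'Occupation']
```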



Referred: https://www.geeksforgeeks.org

