Python Program to Find the Number of Unique Words in Text File

Given a text file, write a Python program to find the number of unique words in it. Here a word counts as unique if it occurs exactly once in the file, ignoring case.

Examples:

Input: gfg.txt
Output: 18

Contents of gfg.txt: GeeksforGeeks was created with a goal in mind to
provide well written well thought and well
explained solutions for selected questions

Explanation:
The frequencies of the words in the file are {'geeksforgeeks': 1, 'was': 1, 'created': 1,
'with': 1, 'a': 1, 'goal': 1,
'in': 1, 'mind': 1, 'to': 1, 'provide': 1, 'well': 3, 'written': 1, 'thought': 1,
'and': 1, 'explained': 1,
'solutions': 1, 'for': 1, 'selected': 1, 'questions': 1}

The count of unique words is 18: every word except 'well', which appears three times, occurs exactly once.
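
The difference between "distinct words" and "words that occur exactly once" is easy to miss, so here is a minimal sketch that computes both for the sample sentence. It uses collections.Counter from the standard library instead of a hand-written dictionary; the variable names are illustrative and not part of the original program.

```python
from collections import Counter

# The sample sentence from gfg.txt, held in a string for illustration.
text = ("GeeksforGeeks was created with a goal in mind to provide well "
        "written well thought and well explained solutions for selected questions")

# Lowercase the text, split on whitespace, and count each word.
freq = Counter(text.lower().split())

# Keep only the words whose frequency is exactly 1.
once_only = [word for word, count in freq.items() if count == 1]

print(len(freq))       # number of distinct words
print(len(once_only))  # number of words that occur exactly once
```

The sentence has 19 distinct words, but 'well' appears three times, so only 18 words occur exactly once. That second figure is the one this article reports.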

Approach:

  •  Create a file object using the open function and pass the filename as a parameter.
  •  Read the contents in the file as a string using the read() function and convert the string to lowercase using the lower() function.
  •  Split the file contents into words using the split() function.
  •  Create a dictionary for counting the number of occurrences of each word.
  •  Create a counter variable to hold the number of unique words.
  •  Traverse the dictionary and increment the counter for every word whose count is exactly 1.
  •  Close the file object.
  •  Return the count.

Below is the implementation of the above approach.

Python3

# Function to count the number of unique words
# (words that occur exactly once) in the given text file.
  
  
def countUniqueWords(fileName):
    # Create a file object using open
    # function and pass filename as parameter.
    file = open(fileName, 'r')
    # Read file contents as string and convert to lowercase.
    read_file = file.read().lower()
    words_in_file = read_file.split()  
    # Creating a dictionary for counting number of occurrences.
    count_map = {}
    for i in words_in_file:
        if i in count_map:
            count_map[i] += 1  
        else:
            count_map[i] = 1
    count = 0
    # Traverse the dictionary and increment the counter
    # for every word that occurs exactly once.
    for i in count_map:
        if count_map[i] == 1:
            count += 1
    file.close()
    return count  # Return the count.
  
  
# Creating sample text file for testing
with open("gfg.txt", "w") as file:  
    file.write("GeeksforGeeks was created with "
               "a goal in mind to provide well written well "
               "thought and well explained solutions "
               "for selected questions")
  
print('Number of unique words in the file are:',
      countUniqueWords('gfg.txt'))

Output:

Number of unique words in the file are: 18
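
As a possible alternative, the same approach can be sketched more compactly with collections.Counter and a with-statement, which closes the file automatically. The function name count_words_occurring_once, the UTF-8 encoding, and the temporary directory used for the demo are assumptions made for this sketch, not part of the program above.

```python
import os
import tempfile
from collections import Counter


def count_words_occurring_once(path):
    """Return how many words appear exactly once in the file at `path`."""
    # The with-statement closes the file even if reading raises an error.
    # UTF-8 is an assumption; adjust the encoding to match the file.
    with open(path, "r", encoding="utf-8") as handle:
        words = handle.read().lower().split()
    frequencies = Counter(words)
    return sum(1 for n in frequencies.values() if n == 1)


# Demo: write the sample sentence to a temporary gfg.txt and count.
sample = ("GeeksforGeeks was created with a goal in mind to provide well "
          "written well thought and well explained solutions for selected questions")
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "gfg.txt")
    with open(sample_path, "w", encoding="utf-8") as handle:
        handle.write(sample)
    result = count_words_occurring_once(sample_path)

print("Number of unique words in the file are:", result)
```

Like the original, this sketch splits on whitespace only, so punctuation attached to a word ("well," versus "well") would be treated as part of the word.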

Complexity analysis:

Here n is the number of words in the file.

Time complexity: O(n)

Space Complexity: O(n)
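
The O(n) space bound covers both the full file contents held in memory and the dictionary. For large files, one option is to read line by line so that only the current line and the dictionary of distinct words are kept. The sketch below assumes words never span a line break; the helper name count_once_streaming is hypothetical, and io.StringIO stands in for an open file object.

```python
import io
from collections import Counter


def count_once_streaming(lines):
    """Count words that occur exactly once, reading one line at a time."""
    frequencies = Counter()
    for line in lines:
        # Only the current line is held in memory, not the whole file.
        frequencies.update(line.lower().split())
    return sum(1 for n in frequencies.values() if n == 1)


# io.StringIO behaves like an open text file for iteration purposes.
demo = io.StringIO("well written well\nthought and well explained\n")
result = count_once_streaming(demo)
print(result)
```

The time complexity stays O(n). The worst-case space is still O(n) when every word is different, but in typical text the dictionary is much smaller than the file.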




Referred: https://www.geeksforgeeks.org


