LeetCode 187: Repeated DNA Sequences in Go

Programming & Development / April 9, 2025

LeetCode 187: Repeated DNA Sequences in Go

Go Golang DNA Sequences Repeated Sequences Substring HashMap LeetCode 187

Introduction

LeetCode 187 asks you to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule. The problem asks you to return a list of all such sequences, and you need to solve it efficiently using the given constraints.

Problem Statement

Given a string s that represents a DNA sequence, you need to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule and return them.

Function Signature:

go

func findRepeatedDnaSequences(s string) []string

Example 1:

go

s := "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
result := findRepeatedDnaSequences(s)
fmt.Println(result) // Output: ["AAAAACCCCC", "CCCCCAAAAA"]

Example 2:

go

s := "AAAAAAAAAAAAA"
result := findRepeatedDnaSequences(s)
fmt.Println(result) // Output: ["AAAAAAAAAA"]

Constraints:

1 <= s.length <= 10^5
s[i] is one of 'A', 'C', 'G', or 'T'.

Approach

The problem involves detecting repeated 10-character-long substrings from the input string s. Here’s how we can efficiently solve this problem:

Steps:

Sliding Window Approach:

We need to examine every 10-character-long substring in the given string.
To do this, we can slide a window of size 10 across the string.

Use a HashMap:

A HashMap (or map in Go) can be used to track the frequency of each 10-character-long sequence.
If a sequence appears more than once, it is added to the result list.

Edge Case Considerations:

If the string length is less than 10, return an empty list as there are no 10-character sequences.

Optimization:

Since we need to check each substring in a string of length up to 10510^5105, the time complexity must be optimal. Using a sliding window combined with a HashMap ensures that we only make a constant-time operation for each sliding step, giving us a time complexity of O(n), where n is the length of the string.

Time Complexity:

O(n), where n is the length of the string s. We iterate through the string and perform constant-time operations (checking substrings, inserting in map) for each sliding window.

Space Complexity:

O(n) due to the space used by the HashMap to store substrings.

Go Implementation

go

package main

import "fmt"

func findRepeatedDnaSequences(s string) []string {
    // Map to store the frequency of each 10-letter substring
    seen := make(map[string]int)
    result := []string{}

    // Slide through the string, considering every substring of length 10
    for i := 0; i <= len(s)-10; i++ {
        substr := s[i:i+10]
        // If the substring has been seen before, add it to the result
        if seen[substr] == 1 {
            result = append(result, substr)
        }
        // Increment the count of the substring in the map
        seen[substr]++
    }

    return result
}

func main() {
    // Test case 1
    s1 := "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
    result1 := findRepeatedDnaSequences(s1)
    fmt.Println(result1) // Output: ["AAAAACCCCC", "CCCCCAAAAA"]

    // Test case 2
    s2 := "AAAAAAAAAAAAA"
    result2 := findRepeatedDnaSequences(s2)
    fmt.Println(result2) // Output: ["AAAAAAAAAA"]
}

Explanation of the Code:

1. findRepeatedDnaSequences function:

Step 1: Create a map called seen to store the frequency of each 10-character substring.
Step 2: Iterate through the string s from index 0 to len(s)-10. For each position, extract a 10-character substring and check if it has already appeared. If it has appeared exactly once before, we add it to the result list.
Step 3: If the substring is encountered for the first time, we store it in the seen map and continue.

The map[string]int stores substrings as keys and their frequencies as values. When we encounter a substring for the second time, it is added to the result list.

2. Main Function:

In the main function, we test the findRepeatedDnaSequences function with two examples to check the correctness of the implementation.

Example Walkthrough

Example 1:

Input: "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"

We iterate over the string and extract the following substrings:

"AAAAACCCCC" appears first time.
"CCCCCAAAAA" appears first time.
Repeating substrings "AAAAACCCCC" and "CCCCCAAAAA" are added to the result when encountered again.

Output: ["AAAAACCCCC", "CCCCCAAAAA"]

Example 2:

Input: "AAAAAAAAAAAAA"

We iterate and find the substring "AAAAAAAAAA" multiple times.

Output: ["AAAAAAAAAA"]

Conclusion

LeetCode 187: Repeated DNA Sequences is an efficient problem that requires detecting 10-letter-long substrings that appear more than once in a string. Using a sliding window approach combined with a map data structure, we can solve this problem in O(n) time and O(n) space.

This approach is both time and space efficient and works well for large input sizes, as required by the problem constraints.

Share this article:

Comments

No comments yet

NUHMAN.com