DNA Sequence Analyzer - Problem

You are given a DNA sequence represented as a string containing only the characters 'A', 'T', 'G', and 'C'. Find all DNA subsequences (contiguous substrings) of length k that appear more than once in the sequence. Occurrences may overlap.

For each repeated subsequence, return:

  • The subsequence string
  • Its frequency (number of occurrences)
  • All starting positions (0-indexed) where it appears

Return format: An array of objects, where each object has three properties: sequence (string), frequency (integer), and positions (array of integers).

Note: Return results sorted by frequency in descending order, then by sequence lexicographically if frequencies are equal.
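
The ordering rule can be expressed as a single sort key. A minimal Python sketch, assuming each result is a plain dict with the three properties named above; the helper name sort_results and the sample records are hypothetical and only illustrate the ordering:

```python
def sort_results(results):
    """Order result records by frequency (descending), then sequence (ascending)."""
    # Negating the frequency makes larger counts sort first, while the
    # sequence string breaks ties in ordinary lexicographic order.
    return sorted(results, key=lambda r: (-r["frequency"], r["sequence"]))


# Hypothetical records, not taken from any example in the problem.
unsorted = [
    {"sequence": "GG", "frequency": 2, "positions": [0, 5]},
    {"sequence": "AT", "frequency": 3, "positions": [1, 4, 9]},
    {"sequence": "CA", "frequency": 2, "positions": [2, 7]},
]
ordered = sort_results(unsorted)
print([r["sequence"] for r in ordered])  # ['AT', 'CA', 'GG']
```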

Input & Output

Example 1 — Basic Repeated Patterns
$ Input: dna = "AGATCGATCGA", k = 3
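Output: [{"sequence":"ATC","frequency":2,"positions":[2,6]},{"sequence":"CGA","frequency":2,"positions":[4,8]},{"sequence":"GAT","frequency":2,"positions":[1,5]},{"sequence":"TCG","frequency":2,"positions":[3,7]}]
💡 Note: The length-3 windows are 'AGA' at 0, 'GAT' at 1 and 5, 'ATC' at 2 and 6, 'TCG' at 3 and 7, and 'CGA' at 4 and 8. Four patterns have frequency 2, so they are sorted lexicographically: ATC, CGA, GAT, TCG. 'AGA' appears only once and is excluded.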
Example 2 — Single Character Repeats
$ Input: dna = "AAAAAAAAAA", k = 2
Output: [{"sequence":"AA","frequency":9,"positions":[0,1,2,3,4,5,6,7,8]}]
💡 Note: In a string of 10 A's, the pattern 'AA' appears at every position from 0 to 8, giving it a frequency of 9.
Example 3 — No Repeated Patterns
$ Input: dna = "ATCG", k = 2
Output: []
💡 Note: Patterns are 'AT', 'TC', 'CG' - each appears only once. No pattern has frequency > 1, so return empty array.
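
Example 2 depends on overlapping occurrences being counted. As a hedged illustration in Python: the built-in str.count reports only non-overlapping matches, so a window-by-window comparison is one way to get the positions the example expects. The helper name overlapping_positions is hypothetical and not part of the problem statement:

```python
def overlapping_positions(dna, pattern):
    """Return every 0-based start index where pattern occurs, overlaps included."""
    k = len(pattern)
    return [i for i in range(len(dna) - k + 1) if dna[i:i + k] == pattern]


dna = "AAAAAAAAAA"
print(dna.count("AA"))                   # 5: non-overlapping matches only
print(overlapping_positions(dna, "AA"))  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```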

Constraints

  • 1 ≤ dna.length ≤ 10^4
  • 1 ≤ k ≤ dna.length
  • dna contains only characters 'A', 'T', 'G', 'C'
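
The constraints can be checked before any processing. The sketch below is one possible guard in Python, not something the problem requires; the function name validate_input and the max_len parameter are hypothetical, and the default upper bound of 10**4 is an assumption about the intended length limit:

```python
def validate_input(dna, k, max_len=10**4):
    """Raise ValueError if the inputs violate the stated constraints."""
    # max_len defaults to 10**4, an assumed reading of the length limit.
    if not 1 <= len(dna) <= max_len:
        raise ValueError("dna length is out of range")
    if not 1 <= k <= len(dna):
        raise ValueError("k must be between 1 and len(dna)")
    # Any character outside the four nucleotides is rejected.
    if set(dna) - set("ATGC"):
        raise ValueError("dna may contain only 'A', 'T', 'G', 'C'")


validate_input("ATCG", 2)  # valid input: returns None without raising
```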

Visualization

[Figure: TutorialsPoint - DNA Sequence Analyzer | Sliding Window Approach, illustrated for dna = "AGATCGATCGA" (length 11), k = 3]

Sliding Window Algorithm

  1. Slide a window of length k through the DNA sequence.
  2. Extract the pattern at each position.
  3. Update the pattern's count in a hash map.
  4. Record the position for each pattern.

After the pass, keep the patterns with frequency > 1 and sort them by frequency, then lexicographically.

Key Insight: Using a sliding window with a hash map eliminates redundant scanning: all patterns are tracked in one pass through the DNA sequence instead of rescanning for each unique pattern.
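
The sliding window approach with a hash map can be sketched in Python as follows. This is a minimal illustration under the problem's stated rules, not a reference solution; the function name find_repeated_patterns is hypothetical, and a dict of lists stands in for the hash map so that the frequency is simply the number of recorded positions:

```python
from collections import defaultdict


def find_repeated_patterns(dna, k):
    """Return repeated length-k substrings with frequency and 0-based positions."""
    positions = defaultdict(list)  # pattern -> list of start indices

    # One pass: slide a window of length k and record where each pattern starts.
    for start in range(len(dna) - k + 1):
        positions[dna[start:start + k]].append(start)

    # Keep only patterns seen more than once.
    results = [
        {"sequence": seq, "frequency": len(pos), "positions": pos}
        for seq, pos in positions.items()
        if len(pos) > 1
    ]
    # Highest frequency first; ties broken lexicographically by sequence.
    results.sort(key=lambda r: (-r["frequency"], r["sequence"]))
    return results


print(find_repeated_patterns("AAAAAAAAAA", 2))
# [{'sequence': 'AA', 'frequency': 9, 'positions': [0, 1, 2, 3, 4, 5, 6, 7, 8]}]
print(find_repeated_patterns("ATCG", 2))  # []
```

Each slice copies k characters, so this sketch does O(n * k) character work for a sequence of length n. That is acceptable at the stated input sizes; a rolling hash would be an alternative if k were large.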