FileSystem Design

Q1. list of files that contain the inputWord ("hello")
Q2. Sorted list of files in terms of number of occurences of the inputWord

I can think of two methods. We can use simple Map also that internally used red black tree but the avg. time complexity for look up is O(logn) for Map and O(1) for HashMap.

1. **First method**

inputWord ="hello";
vector <pair< int, string> > v;
        for file[i] in listOfFiles
        {
        	HashMap<string,int> hm=new HashMap<>();
    	while(!eof)
    	{
    		string line=readLine(file);
    		for each word in line
    		{
    			if(hm.contains(word))
    			{
    				int val=getValue from hm using word as key
    				value++;
    				update hm with word as key and value as val;
    			}
    			else
    				insert in hm with value as 1
    			word="";
    		}
    			
    	}
    	string fileName="file "+ to_string(i);
    	v.push_back(make_pair( hm.get(inputWord),fileName);
    }
sort(v.begin(),v.end());
This code solves both Q1 and Q2.
For Q2, we can sort the vector. If we want non-increasing order, then we can pass greater<>() function as an argument or build our own comparator function while calling the inbuilt sort function. 
This code helps in fetching  results for different queries in constant time. The pre-processing needs to be done only once.

Question 1
Time Complexity: O(n)
Space complexity : O(m) (no need to store in vector.. we can just print/return at the end of file reading)
**Note: n=number of characters, m=number of words, k=number of files**

Question 2:
Time Complexity: O(n)+ O(klog(k))
Space complexity : O(m)+ O(k);

for Q queries, Total Complexity
Question 1: 
Time Complexity: O(n)
Question 2: 
Time Complexity: O(n)+Q*(klog(k))
Space complexity : O(m)+ Q*O(k)

2. **Second method**
This is the first thing that comes to mind after listening the question. 

inputWord ="hello";
vector <pair< int, string> > v;
for file[i] in listOfFiles
{
	int count=0;
	while(!eof)
	{
		string line=readLine(file);
		for each word in line
			if word==inputWord
			count++;
	}
	string fileName="file "+ to_string(i);
	v.push_back(make_pair(hm.get(inputWord), fileName);
}
Time Complexity: O(n)+O(klog(k))
Space complexity : O(k)
Note: n=number of characters, k=number of files

But for Q queries,
Time Complexity: Q*(O(n)+O(klog(k)))
1 Like