Google Interview Question – Print Message - 去年 Google 的面试题 - 打印消息

This is my technical phone interview question from Google (last year). Google has offices in London but the call was from Google Switzerland (+41). The interview lasts for 45 minutes. 

The engineer didn’t provide any interface (so you can come up with the function signature). We don’t need to store all the messages and the frequency for the function call could be several per second.  

00:00:01 foo
00:00:02 bar
00:00:06 bar // should not print
..
..
00:00:10 foo // should not print
00:00:11 foo // print again

USING HASHMAP

Using a hashmap to store the printed messages and their corresponding time is trivial but this will have a memory issue (Out-Of-Memory) if this function is running for weeks. Hashmap will grow, especially that all the messages are unique!  

USING QUEUE+HASHMAP

We can use queue to store the messages in last 10 seconds. We can use hashmap/set to store the messages printed, in order to speed up lookup. This method is the combination of both queue and hashmap/set. The following C++ code demonstrates the idea. We use the int type to represent the time type for simplification.

// member variables
unordered_set<int> cache;
queue<pair<time, int>> last10;
 void PrintMessage(int time, string msg) {
    // compute the string hash as a 32-bit integer
    int hash = compute_hash(msg);
    // remove invalid entries
    while (!last10.empty()) {
        auto key = last10.front();
        if (time - key.first >= 10) {
            last10.pop();
            cache.erase(key.second); // remove from hash set
        } else {
            break;
        }
    }  
    if (cache.count(hash)) {
        return; //  printed in last 10 seconds
    }
    // we can print the message now
    cout << msg << endl;
    // inserting the entry
    cache.insert(hash);
    last10.push(make_pair(time, hash));
}

I did make some mistakes in writing the code on the Google Docs without an actual Programming IDE!It would be good that someone can make testing data for this question. To adapt it for Online Judge, we can re-write the function signature to:  

bool PrintMessage(int time, string msg); // return whether to print this message

The engineer also asks me the test cases if I want to write unit tests for it. I can think of three cases:

  1. Empty Messages, what to do?
  2. 11 foos
  3. foo, bar, foo bar … after 6 times

Do you have more corner cases to test? Please comment below.

去年我参加了 Google 的初面(电话面试), 可惜没有通过. Google 瑞士的一个软件工程师打电话面试, 电话面试就考了一道算法题, 虽然我也准备了近一个月的时间, 但是我回答的并不完美.

虽然和我联系的Google 是在伦敦, 但是面试的时候手机上显示的是 +41 电话 来自 Google 瑞士, 整个面试大约45分钟.

题目是:

给了一些消息 和对应的日期和时间, 如果消息并不在最近10秒钟内打印过 那么就打印. 同时有可能多条消息到达(1秒之内).

就这么一个题目并没有指定接口,  而我们也不需要把所有消息都保存起来, 并且我们知道  这个 打印函数可能一秒内被调用多次: 

00:00:01 foo
00:00:02 bar
00:00:06 bar // should not print
..
..
00:00:10 foo // should not print
00:00:11 foo // print again

我的第一个想法就是使用哈希表 HashMap, 但是这里有一个问题就是: 如果这个系统运行了好久好久, 这么一来, 内存就会不够了, 特别是到达的消息都是唯一的话.

面试官在我给出上面这个初步的答案后并不满意, 当然他会给提示指出问题(out of memory), 我在思考后给出了一个用队列+哈希表的方案:

我的方案是: 用队列来保存最近10秒打印过的消息, 并且我们用哈希表来加速查找. 以下C++代码就是这种组合的方式: 

// member variables
unordered_set<int> cache;
queue<pair<time, int>> last10;
void PrintMessage(int time, string msg) {
    // 把消息字符串变成32位的哈希值
    int hash = compute_hash(msg);
    // 去除超过10秒的记录
    while (!last10.empty()) {
        auto key = last10.front();
        if (time - key.first >= 10) {
            last10.pop();
            cache.erase(key.second);
        } else {
            break;
        }
    }  
   if (cache.count(hash)) {
        return; //  已经在过去10秒钟打印过了
    }
    // 打印消息
    cout << msg << endl;
    // 插入消息
    cache.insert(hash);
    last10.push(make_pair(time, hash));
}

我并不是一下子就把代码写对, 有一些小细节的错误.  虽然没再进入下一轮面试, 但是还是体验了一把.

第二阶段就是问了单元测试相关知识, 比如上面的方法 可以定义接口为: 

bool PrintMessage(int time, string msg); // 返回是否打印

那么测试用例(Test Cases) 可以是:

  1. 空字符串
  2. 11 个 foo
  3. 6 次 foo, bar, foo bar

您还有什么比较刁钻的测试用例么? 在下面评论吧.

Thank you for reading my post, feel free to FOLLOW and Upvote @justyy which motivates me to create more quality posts.

非常感谢阅读, 欢迎FOLLOW和Upvote @justyy 能激励我创作更多更好的内容.        

// 根据我的中文博客英文博文整理而得.

近期热贴 Recent Popular Posts 

  1. Quick R Tutorial – How to Plot Sigmoid Function using R? 如何用R语言画Sigmoid函数?
  2. The Chess AI - Model Base or Machine Learning 浅谈棋类博弃的两种实现方式:模式化和机器学习 
  3. Software Engineer Interview Tips - Using Hashtable to Reduce Runtime Complexity 软件工程师面试技巧之 使用哈希表降复杂度 
  4. 记录那些值得回忆的精彩瞬间
  5. I wrote a Chinese Chess Program 软件分享: 智慧中国象棋 (Chinese Chess)
  6. One Day Visit to Fen Drayton Lake (Photography) - 村庄附近的 Fen Drayton 湖
  7. One Night in Luton. 回到Luton十一年前打工的酒巴Brookes喝酒
  8. One Day Travel (Photography) Fen Drayton Lake + St Ives 英国 Fen Drayton Lake 坐公交到 St Ives 游玩
  9. What is Data Overfitting in Machine Learning? 机器学习中的过拟现象

H2
H3
H4
3 columns
2 columns
1 column
9 Comments