Monday, November 3, 2014

User identification by his/her keyboard typing pattern

I would like to show my next masterpiece :-) This one is about biometrics: how to identify a user by their keyboard typing pattern. The basic approach is not very complex. First you need to record the typing durations of the symbols in a phrase which the user types on the keyboard. There are two kinds of typing duration: one is the time between the key press and key release events of a single symbol, and the second is the time between the key presses of two adjacent symbols in the typed phrase. We accumulate these durations into a list or array and get a kind of signal. This signal needs to be cleaned up a bit, because it contains noise, i.e. random fluctuations in the user's typing pattern. So before we can use the duration data there is an intermediate step: convert the typing duration signal into the frequency domain using the FFT. After the FFT we filter out the high-frequency signal components and convert the filtered signal back from the frequency domain into the time domain with the inverse FFT. Once we have typing durations that show the basic typing pattern with only a small amount of noise, we save this signal in a file and later compare it with the user's current typing duration pattern. So the user identification algorithm is this:
  1. Record the current typing durations of the symbols and the durations between symbols in a typed phrase
  2. Convert this duration signal into the frequency domain using the FFT
  3. Filter out the high-frequency signal components
  4. Convert the signal back from the frequency domain into the time domain (inverse FFT)
  5. Compare this filtered signal with the one recorded from the user previously and stored in a file.
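Steps 1-4 can be sketched roughly as below. This is only an illustration, not the code from the downloadable project: the class and method names are made up, a naive O(n²) DFT stands in for the real FFT module, and the `keepBins` cutoff is an assumed parameter (the post does not say which frequencies are dropped).

```java
import java.util.Arrays;

/**
 * Sketch of steps 1-4. All names here are hypothetical; the real project
 * uses FftBase/Helper instead of the naive DFT below.
 */
final class TypingSignalSketch {

    /** Step 1a: duration of each key = release time minus press time (ms).
     *  Assumes both arrays have the same length. */
    static double[] holdDurations(long[] pressMs, long[] releaseMs) {
        double[] d = new double[pressMs.length];
        for (int i = 0; i < d.length; i++) {
            d[i] = releaseMs[i] - pressMs[i];
        }
        return d;
    }

    /** Step 1b: time between the presses of two adjacent keys (ms). */
    static double[] pressIntervals(long[] pressMs) {
        double[] d = new double[Math.max(0, pressMs.length - 1)];
        for (int i = 0; i < d.length; i++) {
            d[i] = pressMs[i + 1] - pressMs[i];
        }
        return d;
    }

    /** Steps 2-4: transform, zero the high-frequency bins, transform back. */
    static double[] lowPass(double[] signal, int keepBins) {
        int n = signal.length;
        double[] re = new double[n];
        double[] im = new double[n];
        // Step 2: forward DFT (naive stand-in for an FFT).
        for (int k = 0; k < n; k++) {
            for (int t = 0; t < n; t++) {
                double angle = -2 * Math.PI * k * t / n;
                re[k] += signal[t] * Math.cos(angle);
                im[k] += signal[t] * Math.sin(angle);
            }
        }
        // Step 3: bins k and n-k hold the same frequency, so both are
        // dropped together when that frequency index exceeds keepBins.
        for (int k = 0; k < n; k++) {
            if (Math.min(k, n - k) > keepBins) {
                re[k] = 0;
                im[k] = 0;
            }
        }
        // Step 4: inverse DFT; only the real part is needed.
        double[] out = new double[n];
        for (int t = 0; t < n; t++) {
            double sum = 0;
            for (int k = 0; k < n; k++) {
                double angle = 2 * Math.PI * k * t / n;
                sum += re[k] * Math.cos(angle) - im[k] * Math.sin(angle);
            }
            out[t] = sum / n;
        }
        return out;
    }

    public static void main(String[] args) {
        long[] press = {0, 120, 260, 390, 540};      // made-up timestamps
        long[] release = {80, 210, 330, 480, 610};
        double[] hold = holdDurations(press, release);
        System.out.println(Arrays.toString(lowPass(hold, 1)));
    }
}
```

With `keepBins = 0` only the average duration survives, and with a cutoff of at least half the signal length nothing is removed, so a useful value lies somewhere in between.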
Some notes about step (5):
There are multiple ways to compare two signals. I chose a simple approach: count how many slopes in the two curves have the same direction, i.e. both increase or both decrease over the same interval. The number of same-direction slopes is divided by the total number of slopes, which gives the curve matching factor. When it is bigger than some tolerance level, we can say that the user who typed the phrase is the same one who typed the previously recorded phrase. I have prepared a proof-of-concept Java application which records, detects and shows the user's typing pattern. You can download it and, as always, use it as you wish. A little bit of explanation of how the Java code is structured:

  • FftBase.java

    Has functions for performing the forward and inverse FFT. I downloaded this module from this developer.

  • Helper.java

    Groups a bunch of functions which are used together with the FFT, such as "filterOutHighFrequencies" (removes high-frequency noise from the signal in the frequency domain), "normalizeFftOutput" (scales the Y axis of the FFT output into the 0..1 range), "extractTypingPattern" (converts the typing pattern into the frequency domain, removes noise, converts it back into the time domain, scales the output to the 0..1 range and returns the data to the caller), "loadTypingPattern" (loads a recorded typing pattern from a CSV file into a double array) and "generateCsvFileForTypingDurations" (saves a typing pattern into a CSV file).

  • SimpleBiometricApplication.java

    A Swing application which uses the above modules for recording and detecting the user's typing pattern. Typing patterns are also plotted in a graph using the JFreeChart Java library.

    It was my first experience with Swing. At first I thought of using JavaFX, which is also cool and more configurable than Swing, but at the time I didn't find a GUI builder for FX, and because Swing is well known and widely used in the Java industry, I decided to learn Swing a bit. It was nice to find that you can set custom code in the Swing builder which modifies a GUI component property. I just miss one feature: the custom code editor can't accept an anonymous function for setting a property. It only accepts the arguments which are accepted by the component's "set..." method. Probably the problem is that when Swing was written Java had no anonymous functions; they came later. And custom code for setting a property can be lengthy.

    It is fine when the component's set... method accepts a class, because then you can write an anonymous class in the editor and pass it to the custom code editor. But this does not always help, because the class accepted by the set method may implement an interface which you CAN'T cast back into the class accepted by the set method. For example, I needed to modify the JFrame bounds property, which accepts a Rectangle. Rectangle implements the Shape interface. So I thought I would pass a custom anonymous class made from Shape into the setBounds method accepting a Rectangle. But I couldn't do that, because a Shape can't be converted to the Rectangle class, even though Rectangle itself implements Shape. A comma operator in Java would also help in this case, but we don't have a comma operator (at least for now). Otherwise the GUI builder is very helpful and has a lot of components.
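To make the Shape/Rectangle point concrete, here is a small sketch. It is not taken from the application; the class and helper names are invented, and the anonymous `Rectangle` subclass is only one possible workaround, not something the GUI builder is known to recommend. It relies on two standard facts: `Shape.getBounds()` returns a `Rectangle`, and `java.awt.Rectangle` is not final, so it can be subclassed anonymously.

```java
import java.awt.Rectangle;
import java.awt.Shape;
import java.awt.geom.Ellipse2D;

final class ShapeVsRectangleSketch {

    /** Hypothetical stand-in for a setter that only accepts Rectangle. */
    static String describe(Rectangle r) {
        return r.x + "," + r.y + "," + r.width + "," + r.height;
    }

    /**
     * An arbitrary Shape is not a Rectangle, so describe(shape) does not
     * compile. Asking the shape for its bounding box does give a Rectangle.
     */
    static Rectangle fromShape(Shape shape) {
        return shape.getBounds();
    }

    /**
     * A single expression that still runs several statements: an anonymous
     * Rectangle subclass with an instance initializer block.
     */
    static Rectangle computed(int width, int height) {
        return new Rectangle() {{
            // Any multi-statement setup code could go here.
            setBounds(10, 20, width, height);
        }};
    }

    public static void main(String[] args) {
        Shape ellipse = new Ellipse2D.Double(0, 0, 10, 20);
        System.out.println(describe(fromShape(ellipse)));
        System.out.println(describe(computed(300, 200)));
    }
}
```

The anonymous subclass is an expression of type `Rectangle`, so in principle it fits anywhere a `Rectangle` argument is expected.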

    And finally, this is how my Swing form looks when a typing pattern is detected for the user:

    If the user is the same one who typed the original recorded message, you will see a message box like this:

    For me the most interesting coding parts were the FFT stuff and the signal comparison. You should like them too!
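The signal comparison from the notes on step (5) can be sketched like this. It is a minimal illustration under a few assumptions: the names are hypothetical, both signals are compared only over their common length, flat segments count as a mismatch, and the 0.8 tolerance in `main` is an arbitrary example value.

```java
final class SlopeMatchSketch {

    /**
     * Fraction of adjacent-sample slopes that point the same way in both
     * curves (both rising or both falling). Returns a value in 0..1.
     */
    static double matchingFactor(double[] recorded, double[] current) {
        int slopes = Math.min(recorded.length, current.length) - 1;
        if (slopes < 1) {
            return 0.0; // not enough samples to form a slope
        }
        int sameDirection = 0;
        for (int i = 0; i < slopes; i++) {
            double a = recorded[i + 1] - recorded[i];
            double b = current[i + 1] - current[i];
            // The product is positive only when both slopes have the same
            // sign; a flat segment (zero slope) is treated as a mismatch.
            if (a * b > 0) {
                sameDirection++;
            }
        }
        return (double) sameDirection / slopes;
    }

    /** The user is accepted when the factor exceeds the tolerance level. */
    static boolean isSameUser(double[] recorded, double[] current, double tolerance) {
        return matchingFactor(recorded, current) > tolerance;
    }

    public static void main(String[] args) {
        double[] recorded = {0.2, 0.5, 0.9, 0.4}; // made-up normalized durations
        double[] current = {0.1, 0.6, 0.8, 0.3};
        System.out.println(matchingFactor(recorded, current));
        System.out.println(isSameUser(recorded, current, 0.8));
    }
}
```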
    Have fun with signal processing!

    4 comments:

    1. Replies
      1. The code is already posted here. Check the article more carefully and you'll find a link to the code named "download it".

    2. Great reading. Can the typing pattern be recognised on a phrase not known in advance? Thanks

    3. Technically, yes. In the English language there are `26^2 = 676` possible bigrams, of which 7 are impossible. So you would need to measure `26+26^2-7 = 695` typing durations. But this is impractical: you would need to feed the user a very big block of text to be able to extract his/her typing speed for all English bigrams. So I think it would be enough to measure the typing speed of just the 50 most frequent English bigrams (or even fewer). You can get the most frequent bigrams from here:
      http://norvig.com/mayzner.html
      So the idea is to feed a big block of text to the user and compute the average typing duration of each of the N most frequent bigrams in English.
      Then draw a curve `duration(bigram)`, sorted along the `X` axis by bigram frequency in the English language, which would be the user's typing pattern.
