Assistant Professor, School of Electrical and Computer Engineering, Yazd University, Yazd, Iran
Healthcare providers may need to publish their operational data for consultation and to enable further research. Consequently, a large amount of highly detailed, person-specific data becomes publicly available. These data may contain time series, such as electrocardiograms (ECG). De-identifying the time series is not enough to preserve privacy: when only a few time series are published, specific anomalies appearing in them may reveal sensitive information about an individual. Privacy-preserving publication of time series has been studied to some extent, but the issues of publishing the n-grams of time series, especially n-grams extracted from a small set of time series, have not been well considered. In this paper, we address this problem and define the k-anonymity principle for n-grams. The proposed scheme aims to provide k-anonymity by repeating rare n-grams so that they are hidden in the crowd of frequent n-grams. We evaluate our method on two datasets. The experimental results show that our method provides the requested anonymity level with low information loss, measured in terms of both probability and entropy.
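The idea of repeating rare n-grams can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not state: the time series are already discretized into symbol sequences, k-anonymity is read as "every published n-gram occurs at least k times", and the helper names `count_ngrams` and `repeat_rare_ngrams` are hypothetical. It is not the scheme proposed in this paper, only a simple instance of the general idea.

```python
from collections import Counter


def count_ngrams(sequences, n):
    """Count every contiguous n-gram over a collection of symbol sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - n + 1):
            counts[tuple(seq[i:i + n])] += 1
    return counts


def repeat_rare_ngrams(counts, k):
    """Raise the count of each n-gram that occurs fewer than k times to k.

    Assumption: an n-gram is 'rare' when its count is below k, and
    repeating it until it occurs k times is enough to hide it.
    """
    return Counter({gram: max(c, k) for gram, c in counts.items()})


# Hypothetical toy input: two short, already-discretized series.
sequences = ["aabab", "abab"]
counts = count_ngrams(sequences, n=2)
anonymized = repeat_rare_ngrams(counts, k=3)
```

In this sketch the frequent n-grams keep their original counts, so the distortion is confined to the rare ones. The difference between `counts` and `anonymized` is the kind of change that a probability-based or entropy-based information loss measure would quantify.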