7th International Conference on Advanced Computational Intelligence
Mount Wuyi, Fujian, China; March 27-29, 2015

Appearance Similarity Evaluation for Android Applications

Jiawei Zhu, Zhengang Wu, Zhi Guan, and Zhong Chen

Abstract: Android, one of the most popular mobile operating systems in recent years, attracts device users with its numerous applications. Google built the Google Play Store as its official Android market for Android users to search for and install apps. However, with the growing number of apps in Android marketplaces, some undesirable apps began to turn up. Plagiarisms and malware are the two main kinds of these undesirable applications. Plagiarisms pirate other developers' work. As for malware, one of its most common ways of spreading is inducing users to install it through graphical user interface (GUI) similarity. Therefore, in this paper, we put forward a method to detect plagiarisms or malware that disguise themselves as a legitimate pre-existing app in Android marketplaces through their appearance. The main method we design is to compare the GUI similarity among Android apps and pick out apps whose appearance is highly similar. In detail, we extract some features of apps and compute their similarity from their feature vectors. At last, we evaluate our design with 2,000 apps from both official and alternative Android marketplaces to find such appearance-similar apps in our dataset.

I. INTRODUCTION

An increasing presence of smartphones in mobile device markets shows the great importance of smartphones in our daily life. Because they make it convenient to handle data everywhere, smartphones have gradually become one of the vital ways to obtain data. Android, an open source operating system distributed by Google, has played an important role in the smartphone area in recent years.
Smartphone users focus on its abundant applications while developers pay much attention to the operating system's openness.

With the rising number of applications in the smartphone area, application markets started to turn up to provide uniform places for users to search for and install applications on a given operating system. Among all the application markets, Apple's App Store and Google's Google Play (originally named Android Market) were the two most popular marketplaces. According to statistics from AppBrain, the number of apps on Android Market was over 1,300,000 as of October 8, 2014 [1]. Besides the official Android market, there also existed several third-party Android markets providing app downloads, such as eoeMarket and gfan. Though not official, third-party markets may also attract many users searching for apps in some regions.

However, the applications in Android app stores did not pass through a strict security check. In the official Android Market, an app can only be removed by Google after it has been reported as malicious or when its content violates the terms of use. Third-party markets provided even less control than Android Market. Data showed that Android apps infected with malware went from 80 apps in January to over 400 apps cumulatively in June 2011 [2]. So more research on the security of apps in Android marketplaces should be done to protect Android users from malware.

J. W. Zhu, Z. G. Wu, Z. Guan and Z. Chen are with the Institute of Software, School of EECS, Peking University, China (email: zhujw.happy@163.com). This work was supported by the National Natural Science Foundation of China under Project Code 61170263.
978-1-4799-7259-3/15/$31.00 ©2015 IEEE

In this paper, we propose a method to quickly compare the similarity between Android applications' graphical interfaces to mitigate the security problems of plagiarism and malware. In detail, we extract the features of an Android app from both its text elements and its image elements.
Then we use different distance formulas to compute the similarity scores of these different features. Finally, the similarity score of two applications is given as the average of the scores calculated in the last step. We use 2,000 applications from both the official Android Market and a third-party market to evaluate our approach and find that our method is effective in finding appearance-similar apps. Besides, we classify the selected similar apps into several types based on their purpose and provide a brief analysis of the different types of GUI-similar apps.

The rest of this paper proceeds as follows. Section 2 describes related work on Android app similarity detection and malware detection. Section 3 provides the motivation of our work. Section 4 overviews our approach. Section 5 describes the process of our experiment and Section 6 concludes.

II. RELATED WORK

Earlier work on similarity detection mainly considered the similarity of documents and software. Specific to the similarity among Android apps, only a few works have been done. Zhou et al. [3] developed a system to compare the code similarity among Android apps. This similarity comparison was used to detect repackaged applications in third-party Android markets. Specifically, they made use of fuzzy hashing techniques to generate a fingerprint of every app to save time. Compared with this work, ours targets the graphical interface of an app rather than its instructions to detect similarity. These two lines of research could be used on different occasions for mitigating the security problems of Android apps.

Similarly, Li et al. [4] focused on Android apps' code similarity. In addition, they applied this method to detecting privacy and malware problems in markets. They designed Juxtapp and DStruct (two different approaches) to output the similarity among Android apps.
Juxtapp was designed based on the feature of instruction opcode sequences while DStruct was a simple method which only compared the directory structures
of applications. Desnos et al. [5] also used similarity distance to determine whether an Android app was injected with malware and whether obfuscation was present. To the best of our knowledge, recent work has never focused on graphical interface similarity in Android OS. Much of the recent work took program code as features.

As for malware detection, both static analysis and dynamic analysis schemes have been proposed by researchers. Burguera et al. [6] designed a data mining based malware detection method based on the number of times different Linux system calls are made. Shabtai et al. [7] classified benign and malicious applications with machine learning methods and a whole dataset. Zhou et al. [8] presented a scheme for detecting both samples of known malware and unknown malware. For known malware, they compare the new application with known malware based on permissions, API calls and the app's layout structure. As for unknown malware, they design a heuristics-based filtering method to identify new families of malware. They also implemented DroidRanger based on their approach and ran an experiment on many apps from five different Android markets. On the dynamic analysis side, Enck et al. [9] designed TaintDroid to identify operations that send sensitive personal information off the device, using dynamic taint analysis. They modified the Dalvik VM to provide instruction-level taint tracking. Furthermore, to judge whether sensitive personal information flows out with the device user's awareness, Yang et al. [10] developed AppIntent, which analyzes the GUI-related event sequence leading to the transfer of the user's data.

Apart from Android OS, Jang et al. [11] presented BitShred, a system for large-scale malware analysis based on similarity. They clustered different malware into families with instruction opcode sequences as the malware's feature. To speed this up, they utilized feature hashing to decrease the dimensions of the malware feature vector.
At last, co-clustering techniques were used to visualize the relation between different malware families and instructions. Bayer et al. [12] built the profile of a program through dynamic analysis. After building the profile, they used clustering techniques to help identify similar programs. Gao et al. [13] introduced BinHunt to find differences between binary programs, based on a graph isomorphism technique between the programs' control flow graphs.

III. MOTIVATION

The explosive growth of applications has led to the increasing occurrence of malware in Android marketplaces, especially third-party marketplaces because of their weaker security control. Among all kinds of malicious apps announced by Google, several concealed themselves as popular and legitimate apps that already existed in marketplaces. The graphical user interface of the announced malware and that of the legitimate, popular app are extremely similar, so that some Android users cannot easily differentiate between them. This phenomenon has gradually become a common means of fraud for malware.

We were initially motivated by this situation and wanted to put forward this problem and its solution. For this topic, an approach for detecting graphical interface similarity between apps is crucial, and it is also a main part of this paper. After deep consideration of this problem, we believe that graphical interface similarity detection can be applied on several occasions as follows:

• Repackaged application: repackaged applications are a growing concern in Android marketplaces.
It is not difficult for some malicious developers to decompile existing Android applications, modify the code of the application to carry out some malicious activities and re-sign the code using signing tools such as auto-sign. For repackaged apps, developers usually will not modify the app's original graphical user interface, with the result that not only the graphical interface but also the code related to the graphical interface is nearly the same as in the original app.
• Similar appearance but not repackaged application: besides repackaged apps, some malicious developers may mimic a popular app as well. They re-design the code for the graphical interface and make their malware look like some popular app. In this case, the graphical interface code of the two apps is usually not the same but their appearance is similar.
• Plagiarism: some developers may pirate the graphical interface of other well-designed apps, which is also poor behavior in application development. Our graphical interface similarity detection system could be applied to solving this problem as well.

We list the three occasions above for the application of our GUI similarity detection system to show that GUI similarity detection has a place in Android malware detection. We will illustrate our design in the next section.

IV. DESIGN

Our system is designed to detect GUI-similar applications in Android marketplaces. In this section, we provide the overview of our design and give a detailed analysis in several steps.

A. Overview

The overview of our system is shown in Figure 1. Our system consists of 3 steps to compare the similarity of different Android applications: application preprocessing, feature extraction and similarity comparison. The detailed process of these steps is explained as follows.

B. Application Preprocessing

We have to preprocess the apk file if we want to extract the graphical interface related information of the app. The apk file is Android's executable file.
It is a ZIP compressed package containing the following files: (1) the dex file, which stores the Dalvik bytecode compiled from Java source code in a certain format; (2) the resource directory, which contains the picture resource directory, the application graphical interface
layout and some other config files. Most of the files in this directory are in xml format; (3) the META-INF directory, which holds the signature of the code files and the author's certificate.

Fig. 1. System Overview

From the brief introduction of the apk file above, it is obvious that we should extract the GUI-related files in the resource directory. In our paper, for simplicity, we only consider the code describing GUI elements in xml files and not in Java code, because developers rarely use Java code to design the interface when developing Android applications. However, xml files in the resource directory are compiled into binary format when packaged into the apk file. To extract information from this part, we can use aapt (stored in the Android SDK's tool directory) to restore the original xml resource files.

C. Feature Extraction

Through the preprocessing step, we can obtain the xml resource files from an application's apk file. In order to compare different apps' GUI similarity, in this subsection we extract the GUI-related features of an app and express them as a feature vector for later similarity comparison. Formally, we represent the dataset of Android applications as $(a_1, a_2, \ldots, a_n)$, denoted by $A$. Furthermore, a screen element, which corresponds to a layout.xml file in an Android app, is denoted as $s$. So we can represent the application $a$ as $(s_1, s_2, \ldots, s_m)$.

Our scheme is designed to compare the GUI similarity of two Android applications from two vital aspects: text and image. In detail, it contains a set of features related to the visual effect of a screen, which includes
• each text shown in the screen with several different attributes, denoted as symbol $t$
• each visible image with several different attributes, denoted as symbol $i$

With the symbols denoted above, a screen $s$ could be written as $(t_1, t_2, \ldots, t_p, i_1, i_2, \ldots, i_q)$.

• Text element: In this paper, we extract 5 features to represent text elements:
1) the content of the text element
2) the color of the text element
3) the background color of the text element
4) the size of the text element
5) the style of the text element
Here, to extract the content value of the text element, we scan the attribute "android:text" to get its value. If the value is not a reference into strings.xml, we can directly extract its content. Otherwise, we have to look up the strings.xml file to get the corresponding value. We denote this feature as $f_1$. To obtain the text element's color attribute, we turn to the attribute "android:textColor". If necessary, we also look up color.xml for the detailed color value, which we denote as $f_2$. Similarly, we can acquire the next three features from the attributes "android:background", "android:textSize" and "android:textStyle" respectively, which are denoted as $f_3$, $f_4$ and $f_5$.

• Image element: Similarly, to compare the similarity of two images well, we extract 3 features to represent image elements:
1) the size of the image element
2) the color histogram of the image element
3) the 2D Haar wavelet transformation of the image element [14]
To collect all images shown in the screen, we have to scan all layout.xml files to find the related image attribute. After getting the images, we can easily get each one's height, width and size ($f_6$). Next, to compare images conveniently, we normalize each image to size 32x32. With the scaled images, we compute the histogram of the RGB components with 5 histogram cells. Namely, the pixel value range is divided into 5 areas: [0, 50], [51, 101], [102, 152], [153, 203], [204, 255]. Then the red, green and blue component values are counted into these 5 regions. So the histogram can be represented as a 3x5 matrix $f_7$, and we denote the element in this matrix as $hist$. Finally, the Haar wavelet transformation is used to assist in computing the similarity between images.
The transformation is executed on the scaled, grayscale images. The output of this transformation is the 8x8 low-resolution end of the matrix, which is defined as $f_8$ in our paper.

D. Similarity between Applications

Once the features of the applications have been extracted, we can use them in computing the similarity between applications. With the features extracted in the last subsection, we compare all these features (from $f_1$ to $f_8$) to obtain a similarity score $sc$ between screens. Furthermore, we calculate the similarity score from both the text elements and the image elements, and get the similarity scores $sc_t$ and $sc_i$ respectively. Finally, we use $sc = \frac{sc_t + sc_i}{2}$ as the final result. The detailed similarity function $SIM$ is defined per feature, starting with the text element.
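
Before the per-feature similarity functions, the two computed image features from subsection C can be made concrete. The following is a minimal sketch and not the authors' implementation: it assumes an image has already been scaled to 32x32 and is available as plain Python lists, and the helper names `color_histogram` and `haar_low_res` are hypothetical. The 8x8 output is taken here to be the approximation band after two Haar averaging levels, which is only one possible reading of the "low-resolution end" of the transform.

```python
def color_histogram(pixels):
    """Count R, G and B values into the five ranges [0,50] ... [204,255].

    pixels: iterable of (r, g, b) tuples with components in 0..255.
    Returns a 3x5 list of counts (rows: R, G, B; columns: the five ranges).
    """
    hist = [[0] * 5 for _ in range(3)]
    for rgb in pixels:
        for channel, value in enumerate(rgb):
            # Each range is 51 values wide; 255 is folded into the last range.
            hist[channel][min(value // 51, 4)] += 1
    return hist


def haar_low_res(gray, levels=2):
    """Keep only the approximation band of a 2D Haar decomposition.

    gray: square 2D list of grayscale values, side divisible by 2**levels.
    Each level averages adjacent pairs along rows, then along columns,
    halving both dimensions (32x32 -> 16x16 -> 8x8 for levels=2).
    Assumption: plain averaging rather than orthonormal 1/sqrt(2) scaling.
    """
    band = [list(row) for row in gray]
    for _ in range(levels):
        # Average horizontally adjacent pairs in every row.
        band = [[(row[2 * j] + row[2 * j + 1]) / 2
                 for j in range(len(row) // 2)]
                for row in band]
        # Average vertically adjacent pairs in every column.
        band = [[(band[2 * i][j] + band[2 * i + 1][j]) / 2
                 for j in range(len(band[0]))]
                for i in range(len(band) // 2)]
    return band
```

For a 32x32 image, `color_histogram` would be called with all 1,024 pixels and `haar_low_res` with the grayscale version, giving the 3x5 and 8x8 matrices that the image similarity functions compare.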
Text element
• For the content value of the text element, which is of string type, we define
$$SIM(f_{1i}, f_{1j}) = 1 - \frac{d(f_{1i}, f_{1j})}{\max(length(f_{1i}), length(f_{1j}))}$$
in which function $d$ is the edit distance of two strings and function $length$ outputs the length of a string.
• The color feature $f_2$ is a three-tuple $(r, g, b)$, which contains the red, green and blue component values of the color. The $SIM$ function is defined as:
$$SIM(f_{2i}, f_{2j}) = 1 - \frac{|r_i - r_j| + |g_i - g_j| + |b_i - b_j|}{3 \times 255}$$
• Similarly, the background color of the text element is handled with the same function $SIM$ as above.
• For the text size $f_4$, the function is:
$$SIM(f_{4i}, f_{4j}) = 1 - \frac{|f_{4i} - f_{4j}|}{\max(f_{4i}, f_{4j})}$$
• We view the output of the $SIM$ function as 1 if the text style is the same, 0 otherwise.

It is necessary to combine all these 5 similarity scores to get the similarity degree of the two text elements. Here, we enforce different weights for these 5 results. We design the weights of these features from intuitive understanding. It is obvious that the content of the text element is of most importance in this problem. Besides, the color and the background color also influence the visual effect to some extent.

To calculate $sc_t$ (the similarity score of two applications' text elements), we have to propose an algorithm to compute the similarity among all texts shown in the two applications and analyze the most likely correspondence among text elements from the two different applications. The concrete algorithm is shown in Algorithm 1. Here, we assume two applications $a_k$ and $a_l$. The number of text elements in app $k$ is $p$ while the number of text elements in app $l$ is $q$.

In Algorithm 1, we use a greedy algorithm to calculate the text elements' similarity score. It first calculates the similarity score between each text element from app $k$ and each text element from app $l$ and produces the matrix $M$ to represent each score.
Next, the maximum value of the matrix is chosen, which represents the most similar pair of text elements in these two applications. After that, the similarity scores in the matrix related to this pair are removed. Then we repeat this operation to find the maximum value in the matrix until we get $\min(p, q)$ scores in this comparison. We view the average of these scores as the final text element similarity score.

Image element
• The similarity of the size of the image is judged as:
$$SIM(f_{6i}, f_{6j}) = 1 - \frac{|f_{6i} - f_{6j}|}{\max(f_{6i}, f_{6j})}$$
• To compute the similarity score of the color histogram, the 1-norm of the matrix is used here.
$$SIM(f_{7i}, f_{7j}) = 1 - \frac{\max_{1 \le k \le 5} \sum_{l=1}^{3} |hist_{i,l,k} - hist_{j,l,k}|}{3 \times 255}$$
• The calculation of the similarity of the Haar transformation is the same as the handling of the histogram, which uses the 1-norm of the matrix divided by the sum.

Algorithm 1 Two apps' text element similarity score calculation algorithm
Require: Two apps in $A$: $a_k$ and $a_l$;
    App k's text element set $(t_{k1}, t_{k2}, \ldots, t_{kp})$;
    App l's text element set $(t_{l1}, t_{l2}, \ldots, t_{lq})$;
Ensure: Similarity score of apps $a_k$ and $a_l$
initialize a matrix M with dimension (p × q);
for i in [1, p] do
    for j in [1, q] do
        M[i,j] = SIM(t_ki, t_lj);
initialize score = 0, maxVal = 0, maxRow = 0, maxCol = 0;
for s in [1, min(p, q)] do
    for i in [1, p] do
        for j in [1, q] do
            if M[i,j] > maxVal then
                maxVal = M[i,j];
                maxRow = i;
                maxCol = j;
    score += maxVal
    for i in [maxRow+1, p] do
        for j in [1, maxCol-1] do
            M[i,j] = M[i+1,j]
    for i in [1, maxRow-1] do
        for j in [maxCol+1, q] do
            M[i,j] = M[i,j+1]
    for i in [maxRow+1, p] do
        for j in [maxCol+1, q] do
            M[i,j] = M[i+1,j+1]
    p = p - 1, q = q - 1, maxVal = 0
score = score / min(p, q)
return score

In the computation of the similarity of image elements, the three feature scores of two image elements are likewise combined with weights. Besides, we also need to use Algorithm 1 to produce the whole result. Finally, $sc = \frac{sc_t + sc_i}{2}$ is used to output the GUI similarity score of two Android applications.

V. EVALUATION RESULTS

In this section, we make use of our design to do an experiment and analyze the appearance-similar apps occurring in Android marketplaces.
Besides, a brief analysis of the different types of these similar apps is given.

A. Dataset and Experiment

To experiment on the GUI similarity among apps, we selected two Android marketplaces, the official Android Market and eoeMarket, to evaluate our method. We collected 1,000 apps from Android Market and 1,000 apps from eoeMarket to find the most similar apps in these two groups. In detail, we calculate the similarity scores of the 1,000 applications from eoeMarket and from Google Play respectively using our system and output the app pairs based on an appropriate score threshold.
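
The greedy matching of Algorithm 1 and the threshold-based pair selection described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it uses only the text content feature (edit distance), it leaves out the weighted combination with the other text features and with the image score, and it marks matched rows and columns as used in place of the row and column removal in Algorithm 1. The function names, the app names and the threshold value in the usage note are hypothetical.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two strings (the function d in the text)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def sim_content(a, b):
    """SIM for the text content feature: 1 - d(a, b) / max(len(a), len(b))."""
    longest = max(len(a), len(b))
    # Assumption: two empty strings count as identical (not covered in the text).
    return 1.0 if longest == 0 else 1 - edit_distance(a, b) / longest


def greedy_match_score(left, right, sim):
    """Average similarity over min(p, q) greedily chosen element pairs."""
    rounds = min(len(left), len(right))
    if rounds == 0:
        return 0.0  # assumption: nothing to compare means no similarity
    m = [[sim(x, y) for y in right] for x in left]
    free_rows, free_cols = set(range(len(left))), set(range(len(right)))
    total = 0.0
    for _ in range(rounds):
        # Pick the best remaining pair, then retire its row and column.
        i, j = max(((r, c) for r in free_rows for c in free_cols),
                   key=lambda rc: m[rc[0]][rc[1]])
        total += m[i][j]
        free_rows.remove(i)
        free_cols.remove(j)
    return total / rounds


def similar_pairs(apps, sim, threshold):
    """Return the app pairs whose greedy score reaches the threshold.

    apps: dict mapping a (hypothetical) app name to its list of text contents.
    """
    return [(a, b) for a, b in combinations(sorted(apps), 2)
            if greedy_match_score(apps[a], apps[b], sim) >= threshold]
```

A call such as `similar_pairs(apps, sim_content, 0.9)` would list candidate pairs for manual inspection; 0.9 is an arbitrary illustrative value, and lowering it returns more pairs at the cost of more false positives.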
B. Results

After removing the similar apps with the same author, we pick out 25 pairs of applications among all these apps. These 25 pairs of apps (including net counter applications, guitar applications and some picture applications) all come from the third-party marketplace, which shows that third-party marketplaces contain such appearance-similar apps but that they are not so common in the official marketplace. This is not surprising because Google has much better control over such potentially problematic applications than third-party stores. Besides, after our manual check, all of these pairs of apps are repackaged apps with extremely similar appearance, which shows that most GUI-similar apps are generated by repackaging and demonstrates the effectiveness of our method.

Here we select a relatively suitable similarity threshold to pick out the appearance-similar apps. We can also decrease the similarity threshold to obtain more similar apps, which will reduce the false negatives and increase the false positives. That is, the similarity score threshold can be adjusted to our requirements. For different occasions, we can select a suitable threshold to find similar apps.

We can classify these 25 similar pairs of apps into two main classes. One is apps with an ad SDK inserted or modified; the other is apps with malicious code added to the original app. A brief analysis is given in the following paragraphs.

Apps with ad SDK inserting or modifying: Most of the 25 pairs of apps belong to this class, in which an app adds a new ad SDK to the original app or modifies the pre-existing value of the ad publisher's id. Developers can easily add an ad SDK by modifying only the AndroidManifest file and a few layout xml files. For example, in our experiment, one net counter app did not change any UI-related code but added the ad library Casee, as shown in Table I. We improve our similarity score result by making rules that do not consider advertising-related UI elements, which have been mentioned in section 4.
As for ad SDKs, the revenue generated from advertising prompts developers to repackage applications, adding an ad SDK or modifying the pre-existing publisher's id value.

TABLE I
INSERTING AD SDK

Repackaged app: com.hymobi.netcounter.apk
AndroidManifest.xml:
<manifest android:versionCode="12" android:versionName="1.2" package="com.hymobi.netcounter" ... >
  <application android:label= ... >
    <meta-data android:name="com.casee.adsdk.siteId" android:value="A90AB7BD26D20274F78D8B622DOBDB67" />
  </application>
</manifest>

Apps with malicious code inserting: As for malicious code insertion, we give an example from one of the pairs of GUI-similar apps in our result set: com.codingcaveman.Solo.apk and com.power.SuperSolo.apk. Of these two apps, the former is a benign app and the latter is a malicious one belonging to the DroidDream family. After analyzing this malware's code, we find that it invokes a service, com.android.root.Setting, which transfers the IMEI and IMSI in xml format to a remote server, as shown in Table II. The remote url can be recovered as http://184.105.XXX.XX:8080/GMServer/GMServlet after undoing a simple encryption. After that, the malware starts to look for ways to root the device with two known exploits and later automatically installs an app in the device's system directory to complete the attack. This is a kind of app which induces users to install it through the similarity of the graphical interface between two apps. Designing GUI-similar apps is usually one of the most important ways to deceive users into downloading and installing malware. Through our GUI similarity detection, we can find appearance-similar malware in advance to mitigate its influence.

TABLE II
TRANSFERRING INFORMATION TO REMOTE SERVER

<Request>
  <Protocol>1.0</Protocol>
  <Command>0</Command>
  <ClientInfo>
    <Partner>502</Partner>
    <ProductId>10023</ProductId>
    <IMEI>xx</IMEI>
    <IMSI>xx</IMSI>
    <Modle>xx</Modle>
  </ClientInfo>
</Request>

VI. CONCLUSIONS

In this paper, we provide an approach based on GUI similarity between applications to detect pirated and malicious Android apps. We believe that some malicious Android applications have a high GUI similarity with pre-existing apps in marketplaces because the malware induces users to install it through its similar appearance. Besides, plagiarisms which do not modify much code will also be similar in appearance to the original apps. To output the similarity score between apps, we design an approach with 3 steps: preprocessing, feature extraction and similarity comparison. At first, we decompile the apk file to obtain the relevant code and xml files in the preprocessing step. Furthermore, several features of text elements and image elements are extracted as the features of the app's appearance. At the end, we design different distance functions for different features and use a greedy algorithm to output the final GUI similarity score.

We evaluate our design with 2 groups of applications: 1,000 applications from the official Android Market and 1,000 applications from a third-party Android marketplace. The results show that there are 25 pairs of apps with high similarity scores and different authors. Among these selected
apps, some add or modify the ad SDK compared with the original apps to obtain revenue, and others insert malicious code to carry out malicious activity. For these kinds of apps, our approach can mitigate the privacy and malware problems effectively.

REFERENCES

[1] http://www.appbrain.com/stats/number-of-android-apps/
[2] https://www.mylookout.com/mobile-threat-report
[3] W. Zhou, Y. Zhou, X. Jiang, and P. Ning, "DroidMOSS: Detecting repackaged smartphone applications in third-party Android marketplaces," in Proceedings of the 2nd ACM Conference on Data and Application Security and Privacy, 2012.
[4] S. Li, "Detection of similarity among Android applications," Tech. Rep. UCB/EECS-2012-111, UC Berkeley, 2012.
[5] A. Desnos, "Android: Static analysis using similarity distance," in 2012 45th Hawaii International Conference on System Sciences. IEEE, 2012, pp. 5394-5403.
[6] I. Burguera, U. Zurutuza, and S. Nadjm-Tehrani, "Crowdroid: behavior-based malware detection system for Android," in Proceedings of the 1st ACM workshop on Security and privacy in smartphones and mobile devices. ACM, 2011, pp. 15-26.
[7] A. Shabtai, U. Kanonov, Y. Elovici, C. Glezer, and Y. Weiss, "Andromaly: a behavioral malware detection framework for Android devices," Journal of Intelligent Information Systems, vol. 38, no. 1, pp. 161-190, 2012.
[8] Y. Zhou, Z. Wang, W. Zhou, and X. Jiang, "Hey, you, get off of my market: Detecting malicious apps in official and alternative Android markets," in Proceedings of the 19th Annual Network and Distributed System Security Symposium, 2012.
[9] W. Enck, P. Gilbert, B. G. Chun, L. P. Cox, J. Jung, P. McDaniel, and A. N. Sheth, "TaintDroid: an information flow tracking system for real-time privacy monitoring on smartphones," Communications of the ACM, vol. 57, no. 3, pp. 99-106, 2014.
[10] Z. Yang, M. Yang, Y. Zhang, G. Gu, P. Ning, and X. S. Wang, "AppIntent: Analyzing sensitive data transmission in Android for privacy leakage detection," in Proceedings of the 2013 ACM SIGSAC conference on Computer & communications security. ACM, 2013, pp. 1043-1054.
[11] J. Jang, D. Brumley, and S. Venkataraman, "BitShred: feature hashing malware for scalable triage and semantic analysis," in Proceedings of the 18th ACM conference on Computer and communications security. ACM, 2011, pp. 309-320.
[12] U. Bayer, P. Comparetti, C. Hlauschek, C. Kruegel, and E. Kirda, "Scalable, behavior-based malware clustering," in Network and Distributed System Security Symposium (NDSS). Citeseer, 2009.
[13] D. Gao, M. Reiter, and D. Song, "BinHunt: Automatically finding semantic differences in binary programs," Information and Communications Security, pp. 238-255, 2008.
[14] A. Natsev, R. Rastogi, and K. Shim, "WALRUS: A similarity retrieval algorithm for image databases," in ACM SIGMOD Record, vol. 28, no. 2. ACM, 1999, pp. 395-406.
[15] http://www.casee.cn