Technical Transactions iss. 2. AutomaticControl iss. 1-AC · Document content will be described...

121

Transcript of Technical Transactions iss. 2. AutomaticControl iss. 1-AC · Document content will be described...

  • TECHNICALTRANSACTIONS

    AUTOMATIC CONTROL

    CZASOPISMOTECHNICZNEAUTOMATYKA

    ISSUE 1-AC (2)YEAR 2013 (110)

    ZESZYT 1-AC (2)ROK 2013 (110)

    Chairman of the Cracow University of Technology Press

    Editorial BoardJan Kazior

    Przewodniczący Kolegium Redakcyjnego Wydawnictwa Politechniki Krakowskiej

    Chairman of the Editorial Board Józef GawlikPrzewodniczący Kolegium Redakcyjnego Wydawnictw Naukowych

    Pierwotną wersją każdego Czasopisma Technicznego jest wersja on-linewww.czasopismotechniczne.pl www.technicaltransactions.com

    © Politechnika KrakowskaKraków 2013

    Automatic Control Series Editor Piotr Kulczycki Redaktor Serii Automatyka

    Scientific Council

    Jan BłachutTadeusz BurczyńskiLeszek Demkowicz

    Joseph El HayekZbigniew Florjańczyk

    Józef GawlikMarian Giżejowski

    Sławomir GzellAllan N. HayhurstMaria Kusnierova

    Krzysztof MagnuckiHerbert Mang

    Arthur E. McGarityAntonio Monestiroli

    Günter WoznyRoman Zarzycki

    Rada Naukowa

    Section Editor Dorota Sapek Sekretarz SekcjiTypesetting Anna Basista Skład i łamanie

    Cover Design Michał Graffstein Projekt okładki

  • Automatic Control series

    Editorial Board

    Editor-in-Chief: Piotr Kulczycki, Poland

    Editorial Board: Krassimir Atanassov, Bulgaria

    Valentina Bălaş, Romania Tadeusz Burczyński, Poland

    Antoni L. Dawidowicz, Poland Józef Gawlik, Poland

    Aleksander Gegov, England Abdelali El Aroudi, Spain Janusz Kacprzyk, Poland

    Jerzy Klamka, Poland Erich P. Klement, Austria László T. Kóczy, Hungary

    Heikki Koivo, Finland Józef Korbicz, Poland

    Zdzisław Kowalczuk, Poland Krzysztof Kozłowski, Poland Krzysztof Malinowski, Poland

    Radko Mesiar, Slovakia Andrian Nakonechnyy, Ukraine

    Witold Pedrycz, Canada Leszek Rutkowski, Poland

    Dominik Sankowski, Poland Václav Snášel, Czech Republic Ryszard Tadeusiewicz, Poland

    Lipo Wang, Singapore Rafael Wisniewski, Denmark

    Executive Editors: Dominika Gołuńska, Poland

    Piotr A. Kowalski, Poland Szymon Łukasik, Poland

  • * SzymonŁukasik, Ph.D.,Department ofAutomaticControl and InformationTechnology, Facultyof Electrical and Computer Engineering, Cracow University of Technology; Systems ResearchInstitute,PolishAcademyofSciences.

    ** Marcin Haręza,Marcin Kaczor, Department ofAutomatic Control and Information Technology,FacultyofElectricalandComputerEngineering,CracowUniversityofTechnology.

    SZYMONŁUKASIK*,MARCINHARĘZA**,MARCINKACZOR**

    DOCUMENTCONTENTMININGFORAUTHORS’IDENTIFICATIONTASK

    EKSPLORACJATREŚCIDOKUMENTÓWWPROBLEMIEIDENTYFIKACJIAUTORÓW

    Ab s t r a c t

    This paper dealswith automatic authorship attribution throughdocuments content analysis.Thisapproachisbasedonselectingsetsofsuitablefeaturesrelyingonspecificuseofgrammar,punctuation or vocabulary and in the next step – executing given classification algorithm.Thecontributionfirstoverviewsvarious textcharacteristicswhichcanbeemployedfor thatpurpose, then presents the results of experiments involving feature selection and examinesclassifierperformanceforauthoridentificationproblem.Thepaperconcludeswithdiscussionandproposalsforfurtherresearch.

    Keywords: author identification, feature selection, classification

    S t r e s z c z e n i e

    Przedmiotemniniejszegoartykułujestproblemidentyfikacjiautoranapodstawieanalizytreścidokumentów.Podejścietoopierasięnawyborzeodpowiednichcechzwiązanychzespecyficz-nymużyciemstrukturgramatycznych,interpunkcjiorazsłownika,anastępnie–użyciewybra-negoalgorytmuklasyfikacji.Wartykuleprzedstawiononajpierwróżnecharakterystykitekstu,któremogąbyćużytewomawianymzagadnieniu,anastępniezałączonowynikieksperymen-tówobliczeniowychobejmującychwybórcechibadanieskutecznościklasyfikacjiwproble-mieidentyfikacjiautorów.Artykułpodsumowanownioskamiorazpropozycjamidalszychpracwrozważanejtematycebadawczej.

    Słowa kluczowe: identyfikacja autora, wybór cech, klasyfikacja

    TECHNICAL TRANSACTIONSAUTOMATIC CONTROL

    1-AC/2013

    CZASOPISMO TECHNICZNEAUTOMATYKA

  • 4

    1. Introduction

    Author identification is a taskcommonlyperformed inhistorical research, archeologyandcriminology.Historicallyitwaspredominantlyconsideredinthecontextofhandwrittentexti.e.takingintoaccountauthor’swritingstyle.Nowadays,asmostofdocumentsarebeingstoredintheirelectronicversions,itisimpossibletocompletetheidentificationprocedureemploying solely graphical features of the text.Moreover, novel types of textual contentlikecomputerprogrammingsourcecode,posenewproblems,byexcludingthepossibilityofusinggraphicaltextrepresentationintheframeworkofautomaticauthorshipattribution.

    Thetaskofselectingauthorofagiventextfromaknownlistofauthorscanbeperceivedasaclassificationorpatternrecognitionproblem.Itisacommonlyknownissueindatamining[23],withabroadrangeofareaswhereittranspires,e.g.biometrics,medicaldiagnosticsorintrusiondetectionsystems [6,14].Classification isa taskofassigningelements fromsocalledtestingset,denotedbymatrixY:

    Y Y Y Yn= [ ]1 2 ... (1)

    whichncolumnsrepresentfeaturesofmtestobjectsbelongingtothisdataset,tooneoftheknownC classes.Usuallyasetofrepresentativeelementsforthoseclassesisadditionallygiven,intheformofatrainingdataset:

    X X X X n= [ ]1 2 ... (2)

    withmtrainelementshavingclasslabelsexplicitlydefined.Thetaskforaclassifieristolearnhowtopredictclassassignmentfortestingdatasetusingknowledgeacquiredfromtrainingset.Incaseoftheauthorsidentificationtasks,classlabelcorrespondstoauthor’sidentifier,trainingsettoasetofdocumentswithknownauthorsandtestingsettoagroupofdocumentswhich authorship needs to be identified. To successfully perform such tasks one shouldselectcapableclassificationalgorithmand,evenmoreimportantly,definesuitabledocumentrepresentation,intheformofndistinctivefeatures.

    Thispaperinvestigatesapossibilityofemployingvariousnon-graphicalsetsoffeaturesfor author identification task.Among otherswe take into account presence of distinctivegrammaticalstructures,quantitativeanalysisofpartsofspeech,anddiversityofwordsorpunctuation. It is usability of aforementioned features that is experimentally assessed forselectedauthorsidentificationtask.

    Thereexistsavastamountofstudiesintheareaofthiscontribution,howeverusuallynot- -involvingsuchabroadrangeofcharacteristicsbeingexaminedatthesametime.Previousworkinthisareainvolveusing:lexicalfeatures(e.g.functionalwords),characterfeatures,including alphabetic or digit characters count, uppercase and lowercase characters, letterfrequencies,etc.[5],syntacticfeatures[11],semanticfeatures[1]andapplicationspecificcharacteristics, like the use of greetings, signatures, etc. [24]. The problem of languagespecificissues isalsowidelystudied[4]. Interestingapplicationsofauthorshipattributionincludemicrobloggingpostsauthoridentification[15],genderrecognition[3]orcombiningauthor classification with opinionmining [18].Accomplished surveys of techniques andstrategiescommonlyemployedforauthorshipattributiontaskscanbefoundin[9,13,22].

  • 5

    The paper is organized as follows. First, we give a detailed description of documentscharacteristics, useful for their representation in authorship attribution task. It is followed byexperimentalsetup,employedtoassesstheefficiencyofclassificationalgorithmusedwithvariousdocumentsfeaturessets.Finally,thediscussionandproposalsforfurtherresearcharegiven.

    2. Document content representation

    Documentcontentwillbedescribedherebyfivegroupsoffeaturescorrespondingtothefollowingcharacteristicsofatext,giveninitscomputer-writtenrepresentation:a) document grammatical composition, represented by features’ groupGR, composed of

    R+9characteristicsg1,g2,...,gR+9withRbeingaparameterrelatedtothesizeofanalyzedsetofgrammaticalstructures;

    b) usedpunctuationmarks,representedbyfeaturesgroupPwhichincludes16characteristicsp1,p2,...,p16;

    c) wordslength,givenbyfeaturesgroupL : l1,l2,...,l4;d) documentformattingdefinedbyfeaturesgroupF : f1,f2,...,f5;e) wordsusagedescribedbyfeaturesgroupV : v1,v2,...,v9.

    Thesamesetoffeaturescanbeusedtocharacterizenumerousdocuments–belongingtoboth,trainingandtestingdatasets.Albeit,adocumentDdescriptionmightbeforexamplegivenbyD = {GR,V}–itwouldmeanthatitischaracterizedonlybyindicatorsrelatedtogrammaticalcompositionandusedwordsstatistics.Thestudyofapplicabilityofdifferentsetoffeaturesforauthor’sidentificationtaskwillbeconductedinthenextSectionofthepaper.Wewillnowprovideadescriptionoffeatureswhichwereassignedtogroupslistedabove.

    2.1.Documentgrammaticalcomposition

    Grammarisalinguisticconceptconcerningboththeshapeofwordsandhowwords(andphrases)canbecombined[2].Grammar,inessence,consistsoftwocomponents:morphology,i.e.studyofhowwordsareformedoutofsmallerunits(morphemes)andthesyntax,whichisasystemofrulesspecifyinghowlexicalitemsoughttobecomposedtogether[21].Ofthosetwo,whenconsideringwritingstyleanalysis,thesyntaxisofmoreimportance.

    Toanalyzesyntacticstructureofsentencesoneshoulddefineanotationtoconvenientlyrepresentcontentofagiventext.Treestructureiscommonlyusedforthatpurpose.Itsbranchnodes represent non-terminal syntactic symbols (with sentence as a root) and leaves areequivalent to lexical tokensof thesentence.Toobtainstructure in thisformtextneeds tobeparsed.This language-dependent taskisoneof themost important innatural languageprocessing.Here,weassumethatanalyzedtextcorpusisparsedwithpopularStanfordparser[12].

    Toidentifyapresenceofselectedtreecomponents,weestablished256mostcommonlyusedsyntacticstructuresinEnglish,extractedfromtheWallStreetJournalcorporacontainedbyPennTreebankproject[19].Itcanbefoundonthewebsite[16].Onthatbasis,afeatureset:

    g g gR1 2, , , (3)

  • 6

    wascreated.EachcharacteristicfromthissetindicateshowmanytimestopRstructuresfromtherankingwereusedintheanalyzeddocument.Forexampleg1correspondstothenumberof“preposition+nounphrase”structuresfoundinthetext.Ingeneral,forrankingsizeRonecanassumeanyvaluebetween1and256.

    Set of featuresmentioned above corresponds to the presence of selected grammaticalconstructs in analyzed document.Next attributegR+1 describes concentration of syntacticelements listed in the ranking. Let N gR ii

    R==∑ 1 denote occurrence of constructs from

    theselectedsetofRgrammaticalstructuresinthegiventextandletNGtoindicateoverallnumber of structures pointed out by syntactic parser for the analyzed document. Then gR+1canbewrittenas:

    gNNR

    R

    G+ =1 (4)

    SubsequentattributesgR+2,gR+3,...,gR+9describetheoccurrenceoftheindividualpartsofspeechinthetext.HeregR+2correspondstotheincidenceofnouns, gR+3 –pronouns,gR+4 –adjectives,gR+5–verbs,gR+6–adverbs,gR+7–prepositions,gR+8–conjunctionsandgR+9–interjections.

    2.2.Documentpunctuation

    Punctuation is a practice of inserting standardized signs to clarify themeaning andseparatelanguagestructuralunits[20].Amongfourteenmostcommonlyusedpunctuationmarks in English, at parsing stage the following symbols are identified here: full stop,exclamation mark, question mark, comma, semicolon, colon, apostrophe, quotationmark, ellipsis, dash, hyphen, slash plus additionally multiple exclamation marks andmultiplequestionmarks.UseofthosesignsisindicatedbyfeaturesgroupPformedof16characteristics.

    Thefirstattributep1,denotedby:

    pNN

    P

    C1 = (5)

    representsnumberofpunctuationmarksNplistedabovefoundinthetext,dividedbytotalnumberofcharactersNc.

    Secondfeaturefromthissetp2describesvarietyofpunctuationmarks,byemployingthefollowingformula:

    p NP2 =* (6)

    where NP* symbolizesoverallnumberofdifferentpunctuationsignsfoundintheanalyzed

    document.Finally, featuresp3,p4, ...,p16 refer to thenumberof timeseachofpunctuationmarks

    listedabovewasusedinthetext,scaledbytotalnumberofpunctuationmarksNp.

  • 7

    2.3.Documentwordlengthstatistics

    FeaturesgroupLaimsatcapturingauthors’tendencytouseshortwords,longwordsoringeneral–wordsofapproximatelysimilarlength.LetusdenotebySasetofallsentencesinthetext,andbyWsasetofallwordsformingagivensentences ∈ S.Consequently,bysize(w)wherew ∈ Wsweunderstandlengthofaselectedwordwwhileusingcard(S)andcard(Ws)atthesametimetorepresentcardinalitiesofbothsetsdefinedabove.

    Firstfeaturel1introducedhereisreferringtotheaveragewordlengthinthesentencesformingthetext,andcanbewrittenas:

    l

    w

    WS

    w W

    ss S

    s

    1 =

    ∑∑

    size

    cardcard

    ( )

    ( )( )

    (7)

    Secondfeaturel2describesaveragewordlengthinthedocumentandisrepresentedby:

    l

    w

    Sw Ws S

    s S

    s2 =

    ∈∈

    ∑∑∑

    size

    card

    ( )

    ( ) (8)

    Twootherfeaturesfromthisgrouprefertothenumberofshortwords(consistingoflessthan4characters)andlongwords(madeupofmorethan6characters).Bothattributesarerelativetothesumofwordsusedinthetext,andcanbewrittenasfollows:

    lW

    W

    ss S

    ss S

    3 =∈

    ∑∑

    card

    card

    short( )

    ( ) (9)

    withW w w W ws sshort size= ∈ ∧

  • 8

    However,mostoftextrepositoriescompriseonlycarriagereturnsoradditionalsoftreturns.Thefollowingsetoffeaturescaptureswriters’individualpreferencestousethoseformattingelementsintheirdocuments.

    The first feature f1 considered here, refers directly to the number of paragraphs, i.e.sectionsofthetextwithfirstlinebeingindented.Nextattributef2capturesaveragenumberofsentencesincludedinoneparagraph,whichcanbewrittenas:

    f SN2

    =card

    par

    ( ) (11)

    withNparrepresentingtotalnumberofparagraphsintheanalyzeddocument.Features f3and f4characterizeaveragenumberofwordsandcharactersperparagraph.

    Theycanbewrittenasfollows:

    fW

    N

    ss S

    3 =∈∑card

    par

    ( ) (12)

    and

    fNN

    C4 =

    par (13)

    Finally,lastfeaturefromthisgroup,f5describeswriter’stendencytoformatthetextwithemptylines.Itisrepresentedbyrelativenumberofblanklines:

    fNN

    LE

    L5 = (14)

    withNLEbeingtotalamountofemptylinesinthetextandNL–alllinesintheanalyzeddocument.

    2.5.Wordsusestatistics

    Authors usually differ in the sizes and structures of their vocabularies.Therefore theanalysis of vocabulary richness and its concentration could be useful in the context ofauthorshipattribution[7].FeaturesetVbeingintroducedhereaimstoquantitativelyexpressthosecharacteristics.

    Firstfeatureunderconsiderationv1refersdirectlytothenumberofdistinctwordsusedinthetextNv.Fivefeatureswhichfollow,employtheconceptofhapax legomenon(gr.saidonce)thatiswordsthatonlyoccuronceinthetext[10]anddis legomenon–wordsappearingtwice. Let us denote byNhl,Ndl number ofhapaxes found in the document. First five ofaforementionedfeaturesarenowdefinedasfollows:

    v

    NW

    hl

    ss S

    2 =

    ∈∑card( ) (15)

  • 9

    vNN

    hl

    v3 = (16)

    v

    NW

    dl

    ss S

    4 =

    ∈∑card( ) (17)

    vNN

    dl

    v5 = (18)

    vW

    NN

    ss S

    hl

    v

    6

    10100

    1=

    ∈∑log ( )card

    (19)

    withthelattertwoknownfromtheliteratureunderthenamesofSichel(v5)andHonore(v6)measures[3].

    Definingadditionaltextcharacteristics–Nilrepresentingnumberofwordsoccurringinthetextpreciselyi-times,allowsustoformulatethreelastfeaturesinthisset,recognizedasYule(v7),Simpson(v8)andentropy(v9)measures[3]:

    vW

    N iWs

    s S

    ili

    N

    ss S

    v

    74

    1

    2

    10 1= − +

    ∈=

    ∈∑ ∑ ∑card card( ) ( )

    (20)

    v Ni

    Wi

    Wili

    N

    ss S

    ss S

    v

    81

    11

    =−

    −=∈ ∈

    ∑ ∑ ∑card card( ) ( ) (21)

    v Ni

    Wi

    Wil ss S

    i

    N

    ss S

    v

    9 101

    = −

    ∈=

    ∈∑∑ ∑

    log( ) ( )card card

    (22)

    3. Experimental studies

    Experimentsperformedweredesignedtoevaluateusefulnessoffeaturessetslistedaboveforauthorshipattributiontasksandtostudyperformanceofclassifiersemployingcarefullyselectedgroupsofattributes.

    ExperimentalstudieswereconductedusingThomsonReutersTextResearchCollection(known as TRC2 dataset). The dataset includes almost 2 million news reports collectedbetweenJanuary2008andFebruary2009.Wehavechosenrandomlydocumentsauthored

  • 10

    by 20 writers (around 100 contributions each). The first part consisting of 10 writerscontributionswasusedforfirstpartofexperiments–involvingfeaturesetevaluation,andwillbereferredtoasthetrainingdataset.Therestofexperimentaldata(labeledasthetestingdataset)wasemployedinthesecondpartofnumericalstudies,morethoroughlyinvestigatingtheperformanceofselectedclassificationalgorithms.

    As classifiers Naïve Bayes, feed-forward Neural Network with back-propagationlearning,classick-NearestNeighbor(withk=3)andRandomForestswerechosen,withthelattertwoonlybeingemployedinthelastpartofthisstudy.Optimalvaluesofparametersforinvestigatedtechniquesweredeterminedthroughasetofpilotruns.Formoredetailsonthosealgorithmsonecouldreferto[6].

    3.1.Featuresetselection

    First we examined usability of features listed in the previous Section for authorshipattribution.ForthatpurposeSequentialForwardSearch(SFS)orSequentialBackwardSearch(SBS)algorithmwereexecuted,withk-nearestneighborclassifierbeingusedforevaluation.Bothtechniquesconstitutesimilarsupervisedfeatureselectionparadigms[17].Firstalgorithmstartswithemptycandidatefeaturesubsetandateachiterationaddsthefeaturewhichmaximizesclassifieraccuracy.SBSistheoppositestrategy–itstartswiththeentiresetofavailablefeatures,and then iterativelyremovesonefeatureat the time,as longasaforementionedmeasureofaccuracy improves,where the feature removedmaximizes this improvement [8].Here, thediscriminativepowerofagivenfeaturesetwasdeterminedthroughcross-validation.

    FeatureselectionalgorithmswereexecutedwithR=256andfivedifferentschemes: – Basic Feature Selection (BFS) – feature selection is performed on all groups ofcharacteristicslistedinSection2.

    – TwoPhaseFeatureSelection(TPFS)–featureselectionalgorithmisexecutedonnon-grammaticalcharacteristics(allbutGR).ObtainedreducedfeaturesetismergedwithGR andfeatureselectionisperformedagain.

    – ParallelFeatureSelection(PARFS)–featureselectionisconductedinparallelonnon-grammaticalandgrammaticalcharacteristics,obtainedreducedfeaturesetsaremergedafterwards.

    – Grammatical Feature Selection (GFS) – feature selection is conducted only oncharacteristicsfromthegrammaticalranking.

    – Non-Grammatical Feature Selection (NGFS) – feature selection is executed only oncharacteristicsnotlistedingrammaticalranking.Table1illustratesanumberoffeaturesobtainedfordifferentvariantsoffeatureselection.

    Itisevidentlyobservedthatforwardsearchprincipallyresultsinmorecompactreducedsetofcharacteristics,whichisnaturallycausedbylocalsearchnatureofthisalgorithm.Individualcharacteristicsmostfrequentlychoseninexperimentsinvolvingdifferentfeatureselectionstrategies, for both, features listed (top ten) and not listed (top ten) in the grammaticalrankingareshownonFig.1.Itcanbeobservedthatmostusefulfeaturesarefoundwithingrammaticalranking,especiallybetweenfeaturesg20andg100.Furthermorefeaturesdescribingincidenceofpronouns(gR+3),adverbs(gR+6)andprepositions(gR+7)appearedfrequently inthereducedfeaturesets.Amongnon-grammaticalcharacteristicsratioofpunctuationmarksandoccurrenceofexclamationmark,dashandhyphenwerefoundparticularlyimportant;howeveringeneralselectionratioforthosefeaturesseemstobesignificantlylower.

  • 11Ta b l e 1

    Obtained features set sizes

    Feature selection scheme

    Sequential feature selection

    Sequential Forward Search (SFS)

    Sequential Backward Search (SBS)

    BFS 20 222TPFS 26 194

    PARFS 37 240GFS 24 233

    NGFS 13 7

    Fig.1.Tenmostfrequentlychosencharacteristicsforallvariantsoffeatureselectionandfeatureslistedandnotlistedinthegrammaticalranking

    Rys.1.Dziesięćnajczęściejwybieranychcharakterystykdlawszystkichwariantówalgorytmuwyborucechicechzispozarankingugramatycznego

    3.2.Performanceanalysis

    Nextseriesofexperimentswasdevotedtostudyingperformanceofselectedclassifierswithdifferentfeaturesetsobtainedthroughfeatureselectionandvaryingsizeofgrammaticalranking(whereapplicable).Thetrialswereexecutedfortrainingdatasetfirst,todeterminethebest-performingclassifierandthemostappropriatefeatureset,withfiverunsinvolvingcross-validationandaverageclassificationaccuracybeingreportedhere.

    Tables2–5sumupobtainedresultsfork-nearestneighbor,NaïveBayes,feed-forwardNeuralNetworkandRandomForestsclassifiers.

  • 12Ta b l e 2

    Classification accuracy [%] for training dataset and k-Nearest Neighbor classifier

    Ranking Size & Sequential

    SelectionVariant

    Feature selection variant

    R = 32 R = 64 R = 128 R = 256

    SFS SBS SFS SBS SFS SBS SFS SBS

    BFS 42.25 22.50 37.50 32.50 55.25 22.25 56.00 61.25TPFS 33.00 19.50 46.75 39.75 43.50 26.75 53.75 39.50

    PARFS 49.75 53.00 56.25 60.75 61.00 69.25 63.00 70.50GFS 51.00 50.75 51.25 62.00 61.50 69.75 62.75 75.00

    NGFS 38.25 20.25 32.75 22.50 25.00 20.25 38.50 24.25

    T a b l e 3

    Classification accuracy [%] for training dataset and Naïve Bayes classifier

    Ranking Size & Sequential

    SelectionVariant

    Feature selection variant

    R = 32 R = 64 R = 128 R = 256

    SFS SBS SFS SBS SFS SBS SFS SBS

    BFS 37.50 24.00 27.25 22.50 40.50 18.75 39.75 58.75TPFS 27.00 20.00 40.00 29.75 30.00 23.00 41.00 56.25

    PARFS 42.25 42.25 45.75 59.00 48.75 62.25 50.50 55.75GFS 38.00 44.00 41.75 57.75 43.75 62.25 45.50 56.25

    NGFS 38.00 17.00 26.75 19.50 20.75 14.50 36.25 21.50

    T a b l e 4

    Classification accuracy for training dataset and Neural Network classifier

    Ranking Size & Sequential

    SelectionVariant

    Feature selection variant

    R = 32 R = 64 R = 128 R = 256

    SFS SBS SFS SBS SFS SBS SFS SBS

    BFS 41.75 29.25 47.25 46.25 56.75 42.50 45.00 60.50TPFS 27.75 26.25 48.25 41.25 50.00 33.25 48.75 60.50

    PARFS 53.75 42.25 45.00 46.75 50.00 62.75 63.50 60.25GFS 35.00 43.25 41.00 51.00 52.25 56.50 56.60 61.25

    NGFS 38.50 17.00 26.25 11.75 24.25 17.25 35.00 22.25

  • 13Ta b l e 5

    Classification accuracy for training dataset and Random Forests classifier

    Ranking Size & Sequential

    SelectionVariant

    Feature selection variant

    R = 32 R = 64 R = 128 R = 256

    SFS SBS SFS SBS SFS SBS SFS SBS

    BFS 66.00 33.50 57.25 51.50 69.25 58.75 68.00 82.00TPFS 37.75 28.25 61.50 44.25 57.50 47.25 62.25 83.50

    PARFS 62.75 64.75 60.25 69.00 69.25 80.50 76.00 79.25GFS 61.75 63.75 53.75 70.75 71.00 80.50 68.00 80.75

    NGFS 44.25 27.75 36.75 24.25 34.50 23.25 46.25 32.00

    It canbe seen that employingabroad rangeofgrammatical features is crucial forobtaining high classification accuracy.RandomForests classifierwas found to be thebestperformingone.Amongvariousschemesof featureselection themost successfulwereTwoPhase Feature Selection,Basic Feature Selection andGrammatical FeatureSelection.

    Finally selected best-performing classifiers based on Random Forests and k-NearestNeighborweremorethoroughlyevaluatedforbothdatasets.WeusedsetofcharacteristicswithR=256 andSequentialBackwardSearchalongwithTwoPhaseFeatureSelectionorGrammaticalFeatureSelection(fork-NearestNeighbor).Resultsobtainedfor10runsandbothtrainingandtestingdatasets,with5-foldcross-validationareshownonFig.2.

    Fig.2.Classificationaccuracyfortrainingandtestingdatasets

    Rys.2.Trafnośćklasyfikacjidlazbiorówuczącegoitestującego

  • 14

    4. Conclusion

    This contribution examined the possibility of employing various characteristics ofdocuments in computer-written form for authorship attribution.The suitability of severalfeatureswasconsideredforparsedinstancesofnewsreports,takingintoaccountapossibilityofusingfewdiscriminationalgorithmsforconclusiveclassification.Itwasestablishedhere,that for ensuring proper classifier performance the most important ones are those basedonoccurrenceofgrammatical structures.Wealso identified tree-basedclassifiersasmostpromisingintermsofaccuracy–forbothtrainingandtestingdatasets.Ingeneral,authors’classificationforselectedfeaturesandbothinstancesprovedtobereasonablyaccurate.Itcanbenotedthatpresentedapproachcanbeusedfordifferentlanguages–providedthatsuitableparsingprocedurecouldbeconductedbeforehand.

    Furtherwork in the research area of this contributionwill involve employinggeneticfeature set selection. Supplementary experimental studies on effectiveness of variousclassifiersareplannedaswell.Thepossibilityofusingauthorshipattributiontechniquesforidentifyingsourcecodecreatorconstituteslikewiseapromisingareaofupcomingresearch.

    The study is co-funded by the European Union from resources of the European Social Fund. Project PO KL “Information technologies: Research and their interdisciplinary applications”, Agreement UDA-POKL.04.01.01-00-051/10-00.

    R e f e r e n c e s

    [1] ArgamonS.,Levitan S., Measuring the Usefulness of Function Words for Authorship Attribution,Proc.JointConferenceoftheAssociationforComputersandtheHumanitiesandtheAssociationforLiteraryandLinguisticComputing,paperno162,2005.

    [2] BrocciasC.,Cognitive linguistic theories of grammar and grammar teaching,[in:]DeKnopS.,DeRyckerT.(eds.),Cognitive Approaches to Pedagogical Grammar,WalterdeGruyter,Berlin2008,67-90.

    [3] ChengN.ChandramouliR.,SubbalakshmiK.P.,Author gender identification from text,DigitalInvestigation,vol.8,2011,78-88.

    [4] EderM.,RybickiJ.,Do birds of a feather really flock together, or how to choose training samples for authorship attribution,LiteraryandLinguisticComputing/toappear.

    [5] Grieve J.,Quantitative authorship attribution: An evaluation of techniques,LiteraryandLinguisticComputing,vol.22,2007,251-270.

    [6] Hastie T., Tibshirani R., Friedman, J., The Elements of Statistical Learning: Data Mining, Inference, and Prediction,Springer,NewYork2009.

    [7] Hoover D.L., Another Perspective on Vocabulary Richness, Computers and theHumanities,vol.37,2003,151-178.

    [8] JagadevA.K,DeviS.,MallR.,Soft Computing for Feature Selection,[in:]DehuriS.,Cho,S.-B.(eds.),Knowledge Mining using Intelligent Agents,ImperialCollegePress,London2011,217-258.

    [9] JockersM.L.,Witten D.M.,A comparative study of machine learning methods for authorship attribution,LiteraryandLinguisticComputing,vol.25,2010,215-223.

  • 15

    [10] JurafskyD.,MartinJ.H.,Speech And Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition,PrenticeHall,NewJersey2009.

    [11] Karlgren J., Eriksson G., Authors, genre, and linguistic convention, Proc. SIGIRWorkshop on Plagiarism Analysis, Author ship Attribution, and Near-DuplicateDetection,2007,23-28.

    [12] Klein D.,Manning C.D.,Accurate Unlexicalized Parsing, Proceedings of the 41stMeetingoftheAssociationforComputationalLinguistics,2003,423-430.

    [13] Koppel M. Schler J., Argamon S., Authorship attribution in the wild, LanguageResourcesandEvaluation,vol.45,2011,83-94.

    [14] KowalskiP.A.,Procedure of feature extraction from face image for biometrical system (inPolish),TechnicalTransactions,vol.1-AC/2012,CracowUniversityofTechnologyPress,55-79.

    [15] Layton R., Authorship Attribution for Twitter in 140 Characters or Less, CracowUniversityofTechnologyPress,Proc.2ndCybercrimeandTrustworthyComputingWorkshop,2010,1-8.

    [16] ŁukasikS.,HaręzaM.,KaczorM.,Grammatical structures ranking (supplementary material), http://www.pk.edu.pl/~szymonl/nauka/Author_suppl.pdf (date of access:9.10.2013).

    [17] ŁukasikS.,KulczyckiP.,Using topology preservation measures for high-dimensional data analysis in a reduced feature space (in Polish), Technical Transactions, vol.1-AC/2012,CracowUniversityofTechnologyPress,5-15.

    [18] PanichevaP.,CardiffJ.,RossoP.,Personal Sense and Idiolect: Combining Authorship Attribution and Opinion Analysis, [in:]Proc. InternationalConferenceonLanguageResourcesandEvaluation,Valletta,paperno.10.491,2010.

    [19] PennTreebankProject,http://www.cis.upenn.edu/~treebank/[20] Punctuation,[in:]Merriam-Webster’sCollegiateDictionary:EleventhEdition,1009,

    Merriam-Webster,Springfield2004.[21] Radford A., Minimalist Syntax: Exploring the structure of English, Cambridge

    UniversityPress,Cambridge2004.[22] Stamatatos E., A survey of modern authorship attribution methods, Journal of the

    AmericanSocietyforInformationScienceandTechnology,vol.60,2009,538-556.[23] Wang L.P, FuX.J., Data Mining with Computational Intelligence, Springer, Berlin

    2005.[24] ZhengR.,LiJ.,ChenH.,HuangZ.,A framework for authorship identification of online

    messages: Writing style features and classification techniques,JournaloftheAmericanSocietyofInformationScienceandTechnology,vol.57,2006,378-393.

  • * MałgorzataCharytanowicz,Ph.D.,InstituteofMathematicsandComputerScience,TheJohnPaulIICatholicUniversityofLublin;SystemResearchInstitute,PolishAcademyofSciences.

    **HenrykCzachor,D.Sc.,Ph.D.,BohdanDobrzański, InstituteofAgrophysics,PolishAcademyofSciences.

    ***JerzyNiewczas,D.Sc.,Ph.D., InstituteofMathematics andComputerScience,The JohnPaul IICatholicUniversityofLublin.

    MAŁGORZATACHARYTANOWICZ*,HENRYKCZACHOR**,JERZYNIEWCZAS***

    NONPARAMETRICREGRESSIONAPPROACH:APPLICATIONSINAGRICULTURALSCIENCE

    ZASTOSOWANIEREGRESJINIEPARAMETRYCZNEJWNAUKACHROLNICZYCH

    A b s t r a c t

    In this paper, a method for determining the soil pore size distribution, constituting the subject of thepresentedinvestigations,isproposed.Aresearchstudywasconductedusingimageanalysisalgorithms,andinturn,nonparametricstatisticaltechniques.Theresultsandfurtherworkwillbediscussedinsectionfour.Thepurposeofthisinvestigationistodiscovertherelationshipbetweentheporesizeandvolumeofthecorrespondingpores.Thealgorithmpresentedhereisbasedonthetheoryofstatisticalkernelestimators.Thisfreesitofassumptionsinregardtotheformofregressionfunction.Theapproachisuniversal,andcanbesuccessfullyappliedformanytasksindatamining,wherearbitraryassumptionsconcerningtheformofregressionfunctionarenotrecommended.

    Keywords: nonparametric regression, kernel estimators, morphological image processing, closing procedure, pore size distribution, pore space, total porosity, soil structure, aggregation, soil compaction

    S t r e s z c z e n i e

    Celem niniejszego artykułu jest zaprezentowanie procedury wyznaczania rozkładu wielkości porówwagregatachglebowych.Doscharakteryzowaniazależnościpomiędzybadanymizmiennymiwykorzy-stanazostaniefunkcjaregresji.Wprzeprowadzonychbadaniachzastosowanoalgorytmyanalizyobrazówcyfrowychorazmetodykęstatystycznychestymatorówjądrowych.Przedstawionametodaumożliwiauzy-skaniewłaściwejcharakterystykirozkładuwielkościporówimożestanowićefektywnenarzędziestosowa-newwieluzagadnieniacheksploracjidanych.Jakomodelnieparametryczny,niewymagazałożeńdotyczą-cychkształtuzależnościfunkcyjnejmiędzyrozpatrywanymizmiennymi.

    Słowa kluczowe: regresja nieparametryczna, estymatory jądrowe, przekształcenia morfologiczne, operacja zamknięcia, rozkład wielkości porów, porowatość ogólna gleby, struktura gleby, agrega-cja, gęstość gleby

    TECHNICAL TRANSACTIONSAUTOMATIC CONTROL

    1-AC/2013

    CZASOPISMO TECHNICZNEAUTOMATYKA

  • 18

    1. Introduction

    Poresizedistributionisoneofmanyphysicalmeasurementscharacterizingsoilstructureasfarasplantgrowthisconcerned.Anumberofscientistshavereportedstudiesofporespaceasageneralmethodfordefiningsoilproperties.Inthisrespect,acompleteanalysisofthesoilporesizedistributionisusedforpredictingthegasdiffusivity,waterinfiltrationrates,wateravailabilitytoplants,water-storagecapacityandmovementofwater.

    Themostcommonmeasurecharacterizingthefractionoftheporespacewithinasolidisthetotalporosity,definedbytheratio:

    ϕ =VV

    P

    T

    (1)

    where:Vp – thevolumeofvoid-space,VT – thetotalvolumeofsoilmaterial,includingthesolidandvoidcomponents.

    Porosityisadimensionlessquantityandcanbereportedeitherasadecimalfractionorasapercentage.Beingsimplyafractionoftotalvolume,itcanrangebetween0and1,typicallyfallingbetween0.3and0.7formostsoils.Thisprovidesamoreusefulphysicaldescriptionofanparticular soil, suchasprovidinganestimateofcompactionand themaximumspaceavailableforwater.Moreover,anumberofscientistshavereportedthatstudiesofporesizedistributionareusefulasageneralmethodfordefiningthesoilstructure.Poresizesusuallyhavetraditionallybeendividedintomacroporesandmicropores,withthedivisionbetweenthetwobeingarbitrary.Becauseoftherelationshipbetweentheporepropertiesandthereactionofchemicalsinthesoil,amoredetailedpore-fractionanalysisseemswarranted[7,15].

    Thepurposeof this investigation is to elaborate amethodofmeasuring thepore sizedistributioninthesoil.Internaldifferenceinporositywithinanaggregatecanbevisualizedby microtomography scanning of the air-dried samples. Once scanned, the computedtomographyinformationallowsthenon-destructivevisualizationofslices,arbitrarysectionalviewsandpseudo-colorrepresentations[14].

    Fig.1.Crosssectionofatypicalsoilwithporespaceinblack

    Rys.1.Przekrójagregatuglebowego,poryzostaływyróżnionekoloremczarnym

  • 19

    Forimageprocessing,theauthorsusedaprogramforcomputerimageanalysispackage.After preprocessing methods, morphological operation closing, consisting of a dilationfollowed by erosion,was used to fill in holes and small gapswithout changing the sizeandoriginalboundary shape [5,6,12,17].Subsequently,digital imageswere segmentedintoporespaceandsolid.Thedataderivedautomaticallyfromimageswasthenstatisticallyexamined,asnonparametricstatisticalregressionallowedforthedeterminationofthesoilporedistribution.

    2. Material and Methods

    ThesestudieswereconductedusingsoilaggregatesfromthecultivatedsoillayerexploredattheInstituteofAgrophysics,ofthePolishAcademyofSciencesinLublin.Thedirectandnondestructive analyses of internal soil aggregate structureswere detected using computedtomographyequipmentNanotom800,with thevoxel-resolutionof2.5micronspervolumepixel[4,8].Threetypesofaggregates,differentintermsoffertilization,denotedasaggregates0–without fertilization,NPK–mineral fertilization, andOB–pigmanure,were studied.TomographysectionswereprocessedusingmacroswritingintheAphelion4.0.1package.

    Fig.2.Thesoilaggregatemicrotomographicimage

    Rys.2.Obraztomograficznyagregatuglebowego

    Theentireprocedurewascomposedofthefollowingsteps.Firstly,grayscaleimageswerepreprocessedbyremovingringartifactsusingtheROImethod.Afterselectingtheregionofinterest,theautomaticOtsubinarizationmethodwasthenemployed.

    Subsequently, binarymorphological closingwith increasing sizeof square structuringelement(startingwithsize2),wasprocessed.Ineachstep,thesourceimagewassubtractedfrom the target image, and the result volumes were listed, giving soil aggregate poredistribution. The operation was repeated until all pores were filled. Subtraction of thetransformedimagefromtheoriginalimagegivesthetotalporevolumeinthesamplesoilpores.Finally,theporefractionsanalysiswasperformedusingtheregressionapproach[3].

  • 20

    Fig.3.ThesoilaggregateimageafterselectingROIandbinarizationthroughusingOtsumethod

    Rys.3.ObrazagregatuglebowegopozastosowaniuselekcjiROIibinaryzacjimetodąOtsu

    Theprocedurefordeterminingthesoilporesizedistribution

    1.X-raymicrotomographicimageanalysisofaggregates.2.Ringartifactremovalusing–theROImethod.3.Imagebinarization–theOTSUmethod.4.Determiningtheporesizedistribution–binarymorphologicalclosing.5.Porefractionsanalysis–regressionapproach.

    Innextsection,theregressionapproachisshortlydescribed.Asimplelinearregressionmodel is not appropriate for the data. Therefore nonlinear estimation and nonparametricmethodswereconcerned.

    3. The regression analysis

    Thevalidationoftheefficiencyandaccuracyofregressiontechniquewasexploredbycomparingresultsonavarietyofrealdatasets.Bothaclassicalparametricregressionmodelandseveralnonparametricmethodswereexaminedasfarastheporesizedistributionwasconcerned.

    The best fit to the data in the family of nonlinear estimationwas revealed for twomodels:polynomialwiththepowerofthree;andlogarithmic,forwhichthedeterminationcoefficientswereinrange80–90%.Theproposedregressionfunctionsdidnotwelldiscoverthepropertiesof theporesizedistributions,especiallyon the left side,becauseof theirpositively skewed and unimodal character.Therefore, the nonparametric kernelmethodwasrecommended[1,9,10].

  • 21

    Fig.4.Theporesizedistribution:thedatapresentsthreetypesofaggregatesdifferentintermsoffertilization,denotedasaggregates:0–withoutfertilization,NPK–mineralfertilization,

    andOB–pigmanure

    Rys.4.Rozkładyporów,wykresyprzedstawiajątrzytypyagregatów:0–beznawożenia, NPK–nawózmineralny,OB–nawóznaturalny

  • 22

    Classical parametric methods of determining an appropriate functional relationshipbetweenthetwovariablesimposearbitraryassumptionsconcerningthefunctionalformoftheregressionfunction.Moreover,thechoiceofparametricmodeldependsverymuchonthesituation.Ifachosenparametricfamilyisnotofappropriateform,thenthereisadangerofreachingincorrectconclusionsintheregressionanalysis.Thisalsomakesitdifficulttotakeintoaccountthewholeoftheaccessibleinformation.However,therigidityofthisregressioncanbeovercomeby removing the restriction that themodel isparametric.Thisapproachleadstononparametricregression.Thisletsthedatadecidewhichfunctionfitsthembest.In this study,aclassofkernel–type regressionestimatorscalled ‘localpolynomialkernelestimators’ispresented.

    Lettherefore,m elements(xi,yi)∈ R × R,i=1,2,...,mbegiven,wherevaluesximaydesignatesomenon-randomnumbersorrealizationsoftheone-dimensionalrandomvariableX,whereasyidesignaterealizationsoftheone-dimensionalrandomvariableY.Assumingtheexistenceofthefunctionf : R→Rhavingacontinuousfirstderivativethat:

    y f xi i i= +( ) ε (2)

    whereεiareindependentrandomvariableswithzeromeanandunitfinitevariance.Letthenp ∈ Nbethedegreeofthepolynomialbeingfit.Thekernelregressionestimator obtainedbyusingweightedleastsquareswithkernelweights,isgivenbytheformula:

    (3)

    where:

    y y y ym= [ , , ... , ]1 2T (4)

    isthevectorofresponses,

    X

    x x x x x xx x x x x x

    x x x x

    p

    p

    m m

    =

    − − −− − −

    − −

    11

    1

    1 12

    1

    2 22

    2

    ( ) ( )( ) ( )

    ( )

    ��

    � � � � �22 � ( )x xm

    p−

    (5)

    isanm × (p+1)designmatrix,and:

    Wh

    Kx x

    h hK

    x xh h

    Kx x

    hm=

    diag , , ... ,1 1 11 2

    (6)

    isanm × mdiagonalmatrixofkernelweights,while:

    e = [ , , ... , ]1 0 0 T (7)

  • 23

    is the 1 × (p + 1) vector having 1 in the first entry and zero elsewhere.The coefficient h>0iscalled’abandwidth’,whilethemeasurablefunction K R: ,→ ∞[ )0 ofunitintegral,symmetricalwithrespecttozero,andhavingaweakglobalmaximuminthisplace,takesthenameofthekernel.

    Animportantproblemisthechoiceoftheparameterp.Forsufficientlysmoothregressionfunctions,theasymptoticperformanceof improvesforhighervaluesofp.However,forhigher p,thevarianceoftheestimatorbecomeslarger,andinpractice,averylargesamplemayberequired.Ontheotherhand,theevendegreepolynomialkernelestimatorhasamorecomplicatedbiasexpressionwhichdoesnotlenditselftosimpleinterpretation.Thesefactssuggeststheuseofeitherp=1orp=3.Moreover,forp=1,theconvenientexplicitformulaeexists:

    (8)

    where:

    forr=0,1,2 (9)

    Therefore,exceptinmoreadvancedstatisticalapplications,p=1ispreferred.Thechoiceofthekernelformhasnopracticalmeaningandthankstothis,itispossibleto

    takeintoaccounttheprimarilypropertiesoftheestimatorobtained.Mostoften,thestandardnormalkernelisexpressedbytheconvenientanalyticalformula:

    K x ex

    ( ) =−1

    22

    π

    2

    (10)

    isused.Thepracticalimplementationofthekernelregressionestimatorsrequiresagoodchoice

    ofbandwidth.Ifhistoosmall,aspikyroughkernelestimateisobtained,andifhistoolarge,it results inaflatkernelestimate.A frequentlyusedbandwidthselection technique is the‘cross-validationmethod’introducedbyClark[2],whichchooseshtominimize:

    (11)

    where: – theleave-one-outkernelregressionestimatorbasedondata

    x y x yi i1 1 1 1, , , ,( ) ( )− − , x y x yi i m m+ +( ) ( )1 1, , , , .Atypicalregressioncurveobtainedbythekernelregressionmethodform=200data

    pointsgeneratedfromfunction f x x e x( ) , sin ( ) ( )= ⋅ − + − −0 1 5 2 1 42 asdefinedoninterval[0,3]

    isdemonstratedonFig.5.AstandardnormalkernelKgivenbyrule(10)andbandwidthh calculatedbythecross-validationmethod(11)wereused.

  • 24

    Fig.5.Theestimatedregression:thedatapointsarerepresentedbyacross,thetruefunctioncurvebyadashedline,theregressioncurvebyasolidline,andkernelsforargumentsx=1andx=2by

    adottedline

    Rys.5.Jądrowyestymatorfunkcjiregresji:wartościpróbylosowejoznaczonokrzyżykami,estymo-wanąfunkcjęliniąprzerywaną,jądrowyestymatorfunkcjiregresjiliniąciągłą,jądradlaargumentów

    x=1ix=2liniąkropkowaną

    Thetasksconcerningthechoiceofthekernelform,thebandwidth,aswellasadditionalproceduresimprovingthequalityoftheestimatorobtained,arefoundin[13,16].Theutilityof local linearkernelestimatorshasbeen investigated in thecontextofsometypicaldataderivedfromthesoilaggregateimages.

    4. Results and discussion

    Themainaimofthisresearchis todiscovertherelationshipbetweentheporesizeand volume of the corresponding pores in soil aggregates. The data obtained by theproposedmethod,basedontheimageprocessingalgorithms,allowstheuseofregressionapproach.

    Thekernelregressionbuiltuponaweightedlocallinearregressionwasusedtoanalyzethesoilporesizedistribution.Foreaseofcomputation,thestandardnormalkernel(10)wasused.Thebandwidthwasdeterminedusingthecross–validationmethod(11).

    Thus,particle-sizedistributionwastranslatedintoanequivalentregressionmodel,whichinturn,isrelatedtocharacterizewaterretention,crucialinanymodelingstudyonwaterflowandsolute transport insoil.Thiscanbeused forcomparingandpredictingsomespecificpointsofinterestofthewaterretentioncharacteristics.

  • 25

    Fig.6.Theestimatedregression:thedatapresentsthreetypesofaggregatesdifferentintermsoffertilization,denotedasaggregates0–withoutfertilization(asolidline),NPK–mineralfertilization

    (adashedline),andOB–pigmanure(adottedline)

    Rys.6.Jądrowyestymatorfunkcjiregresjidlatrzechtypówagregatów:0–beznawożenia(liniaciągła),NPK–nawózmineralny(liniaprzerywana),OB–nawóznaturalny(liniakropkowana)

    Fractionof total aggregatevolumeoccupiedbyporeswas significantlygreater inOBaggregates.TherewasalsogreaterpercentageoflargeporesinOBaggregatesthanin0orNPKaggregates.Inthesmallestrange,however,porosityof0aggregatesexceededthatofNPKandOBaggregates.TotalporositieswerehigherforbothNPKandOBaggregates(24%and32%,correspondingly)thanthe14%for0aggregates.

    AcombinationofhighermicroporosityandhigherpercentoflargeporesinOBaggregatesmaygeneratemorefavorableconditionsformicrobialactivitythroughacombinationofhighwater-holdingcapacities,increasedaerationandgastransport.Ourcurrentaimiscomparingandrelatingspecificsofinternalporestructuresintheaggregatesfromtheirwaterstability,composition and chemical properties.Water stable aggregates are much more porous inrelationtothenonstableaggregates.

    Soilaggregationanditsmaintenanceisveryimportantforsustainableagriculturebecauseit impactsmajorityofbiological andphysical soilproperties andprocesses.Thisdirectlypromotesbettermovementof air andofwater in the soil, prevents runoff and soilwatererosion,andindirectlydeterminesplantgrowth.Howeversoilaggregatesareverysensitiveagainstwaterandmechanicalstresses.Discoveringthemainfactorsdeterminingtheirwaterstability is a great challenge forworldwide agriculturewhichwould help to prevent soildegradationandtoincreaseacarbonsequestrationinsoilsandtheirfertility.

    The aim of future research is to discover the differences between the inter aggregatepore structureofwater stableandnonstableaggregates [8].Additionalphysico-chemicalparametersobtainedbychromatographyanalysisaregoingtobeused.

  • 26

    5. Conclusions

    Recent advances in computed tomography and digital image processing algorithmsprovidetechnologicallyadvancedmeasurementtoolsforstudyingtheinternalstructuresofsoilaggregates.Thisseemsveryusefulincharacterizingtheporesizedistributionandinquantifyingthedifferencesinporestructuresoftheaggregatesfromthedifferenttypesofsoil.

    Amoredetailedanalysiscanbeobtainedbyderivingvariousmethodstoquantifytheporestructureanddevelopingaporesize-distributioncurvetopredictretentionproperties.Theproposedalgorithm,basedonimageanalysisandkernelestimatormethodology,isexpectedtobeaneffectivetechniqueforvariousagriculturalstudies.

    Nowadays,kernelregressionisacommontoolforempiricalstudies inmanyresearchareas.Thisisalsoaconsequenceofthefactthatkernelregressiontechniquesareprovidedbymanysoftwarepackages.

    Thekernelregressionapproachisalsocommoninimageprocessingandreconstructionmethods.Quiterecently,kernelregressionhasexperiencedakindofrevivalinEarthsciencesonestimatingsomecharacteristicsofthedistributioninnonparametricmodels[11].

    R e f e r e n c e s

    [1] CharytanowiczM.,KulczyckiP.,Nonparametric Regression for Analyzing Correlation between Medical Parameters,[in:]PietkaE.,KawaJ.(eds),Advances in Soft Computing – Information Technologies in Biomedicine,Springer-Verlag,Berlin,Heidelberg2008.

    [2] ClarkR.M.,Non-Parametric Estimation of a Smooth Regression Function,JournaloftheRoyalStatisticalSociety,SeriesB39,1977,107-113.

    [3] DraperN.R.,SmithH.,Applied regression analysis,JohnWileyandSons,NewYork1981.

    [4] GonetS.,CzachorH.,Organic carbon and humic substances fractions in soil aggregates 16thMeetingoftheInternationalHumicSubstancesSociety,FunctionsoftheNaturalOrganicMatterinChangingEnvironment,China,Hangzhou,9-14.09.2012,215-217,2012.

    [5] GonzalezR.C.,WoodsR.E.,Digital Image Processing,Prentice-HallInc.,NewJersey2002.

    [6] Kowalski P., Procedura ekstrakcji cech z obrazu twarzy dla potrzeb systemu biometrycznego, Technical Transactions, vol. 1-AC/2012, Cracow University ofTechnologyPress,2012,55-79.

    [7] Kravchenko,A.,ChunH.C.,MazerM.,WangW.,RoseJ.B.,Smucker,A.,RiversM., Relationships between intra-aggregate pore structures and distributions of Escherichia coli within soil macro-aggregates,AppliedSoilEcology,vol.63,2013,134-142.

    [8] KrólA.,NiewczasJ.,CharytanowiczM.,GonetS.,LichnerL.,CzachorH.,LamorskiK.,Water stable and non stable soil aggregates and their pore size distributions, 20thInternationalPosterDayand InstituteofHydrologyOpenDay“Transport ofwater,chemicalsandenergyinthesoil–plant–atmospheresystem”,2012,870-871.

    [9] KulczyckiP.,Estymatory jądrowe w analizie systemowej,WNT,Warszawa2005.

  • 27

    [10] KulczyckiP.,Kernel estimators in industrial applications, [in:] PrasadB. (ed)SoftComputingApplicationsinIndustry,Springer-Verlag,Berlin2008.

    [11] Perzanowski K.A., Wołoszyn-Gałęza A., Januszczak M., Indicative factors for European bison refuges in the Bieszczady Mountains,Ann.Zool.Fennici 45, 2008,347-352.

    [12] PrattW.K.,Digital Image Processing,JohnWileyandSons,NewYork2001.[13] SilvermanB.W.,Density Estimation for Statistics and Data Analysis,Chapmanand

    Hall,London1986.[14] Smith J.R., Chang S.F., VisualSeek: A fully automated content-based image query

    system,Proc.ACMMultimediaConf.,1996,87-98.[15] VanderWeerdenT.,KleinC.,KelliherF.,Influence of pore size distribution and soil

    water content on N2O response curves, 19thWorld Congress of Soil Science, SoilSolutionsforaChangingWorld,1–6August2010,Brisbane,Australia2010.

    [16] WandM.P.,JonesM.C.,Kernel Smoothing,ChapmanandHall,London1994.[17] WayneL.W.C.,Mathematical Morphology and Its Applications on Image Segmentation,

    MSthesis,Dept.ofComputerScienceandInformationEngineering,NationalTaiwanUniversity,Taiwan2000.

  • * DominikaGołuńska,M.Sc.,DepartmentofAutomaticControlandInformationTechnology,FacultyofElectricalandComputerEngineering,CracowUniversityofTechnology;PhDStudies,SystemsResearchInstitute,PolishAcademyofSciences.

    **MałgorzataHołda,Ph.D.,DepartmentofAutomaticControlandInformationTechnology,FacultyofElectricalandComputerEngineering,CracowUniversityofTechnology.

    DOMINIKAGOŁUŃSKA*,MAŁGORZATAHOŁDA**

    THENEEDOFFAIRNESSINGROUPCONSENSUSREACHINGPROCESSINAFUZZYENVIRONMENT

    KONCEPCJA„SPRAWIEDLIWOŚCI”WPROCESIEOSIĄGANIAKONSENSUSUWGRUPIE

    WWARUNKACHROZMYTOŚCI

    A b s t r a c t

    Inthispaperweexplaintheneedof“fairness”inthehuman-consistentcomputationalsystemwhichsupportsagroupconsensusreachingprocess.Weproposethemodelwhichcombinesmathematical approach based on fuzzy environment, i.e. “soft” consensus developed byKacprzyk and Fedrizzi and socio-psychological approach concerning fairness component.Weviewfairnessfromtwopossibleperspectives:afairdistributionandafairfinaldecision.Finally,weconfirmourassumptionsbyobservationsintheanalyzedgroupsofstudents.

    Keywords: group consensus reaching process, decision support systems, fairness, fair resource allocation, soft consensus

    S t r e s z c z e n i e

    W niniejszym artykule wyjaśniono potrzebę „sprawiedliwości” w systemie obliczeniowymwspomagającymprocesosiąganiakonsensusuwgrupiedecydentów.Autorkizaproponowałymodelłączącypodejściematematyczneopartenaśrodowiskurozmytymorazczynnikispołecz-no-psychologicznewyjaśniająceopisywanąkoncepcję.Pojęcie„sprawiedliwości”rozpatrywa-nejesttuwdwóchkategoriach:rozkładuzasobóworazdecyzjiostatecznej.Założeniapopartesąwnioskaminapodstawieobserwacjigrupstudentów.

    Słowa kluczowe: proces osiągania konsensusu w grupie, systemy wspomagania decyzji, spra-wiedliwość, sprawiedliwy rozkład zasobów, miękki konsensus

    TECHNICAL TRANSACTIONSAUTOMATIC CONTROL

    1-AC/2013

    CZASOPISMO TECHNICZNEAUTOMATYKA

  • 30

    1. Introduction

    Decision theory is an interdisciplinary domain which combines research from manydisciplines, i.e. psychology, sociology, economics, philosophy, political science, etc. Theformaldirectioncannotbetheonlycourseofdecisionmakingproblemssincealltheclassicalmethodshaveaverylimitedcapacityforexplainingempiricalchoices.

    Regardlessofitsorigin,theessenceofdecisionmakingprocessisalwaysthesame:therearesomeoptionstochoosebetweenandthechoiceismadeinagoal-directedway[6].Infact,manydifferentmodelsofdecisionmakingprocessoccurredandenrichedananalysisofhumanbehavior,socialinteractionsandothersocio-economicdescriptionsdependingonthe respectivepurpose.All of thesenovel agent-basedcomputationalmodels appeared inordertomaketheprocessmorehuman-consistentandbelievable.Thatisthereasonwhywedecidedtoapplypsychologicalandsociologicaltheoriestoinvestigateanddesignsystemsinthisresearchtopic.

    Bythesametoken,theconceptoffairnessappearstobeamultifariousissuewhichdrawsuponideasfromawholepanoplyofscientificdisciplines.Theprevalentaspect,however,istheimpossibilitytoachieveanideallyfairdistributionofpowerinthedecisionmakingprocess. Deviations of behavior from the presumed results in decision making processsuggestandalsoconfirmthatfairnessinfluencesdecisionmakingprocess.

    Groupsofindividualsareknowntobeeffectiveorgansindecisionmakingprocess.Inspiteofseveraldysfunctionsofgroupwork,therestillaremorecrucialbenefits(processgains).Namely,groupsarebetterthanindividualsatunderstandingproblems,atcatchingerrors,sotheyprovidelearning.Moreover,agrouphasmoreinformationthananymemberandcancombinethisknowledgetoderivebettersolutionsandstimulatethecreativityofparticipantsand theprocess itself.Hence, thegroup decision making processwillbe thegroundworkofourfurtherconsideration.Consideringdifferent typesofgroups[7] indecisionmakingproblem,wetookintoaccountinterpersonal orientation group.Itmeansthatoneinwhichthefinalsolutionoftheproblemisonlyaminorgoal.Here,thepriorityistoensureagoodrelationamonggroupmembersduringdecisionmakingprocessandtoachieveconsensusinthesenseofsomesatisfactoryagreementastothechosenoption.

    We want to guarantee an equal participation of all decision making members in theconsensusreachingprocess.Inmostcases,thereisalsoasmallgroupofoutsiderswhoareisolatedintheiropinionsastotherestofthegroupandareomitted.Significantly,outsidersdonotfeelthesatisfactionofdiscussion,whataffectstheeffectivenessofanentiregroup.Ofcourse,itdoesnotexcludethefinaldecisionachievement,butdecreasestheopportunityofmanyfurtheractivities,i.e.practicalimplementationofthefinaldecision,survivalofthegroupinthelongtimeperiod,etc.

    Allofthesocio-psychologicalaspectsforcedustoseekanovelapproachofconsensusdegreewhichwill consider the satisfaction of every individual throughout the consensusreachingprocess.InSectiontwowewillbrieflypresenttheoverallstructureofconsensusreachingprocess.Sectionthreeshowsanexplanationofgroupdecisionsupportsystemswithawiderdescriptionofamoderatorwhoplaysherethemainrole.Furthermore,weattempttoreducethecomplexityoftheproposedsystemwithadetaileddescriptionofrelevantaspectsonly.Inthepreviouswork[3],wemainlyfocusedonperformanceofthecoreofmultistagemodel of consensus reaching process under fuzziness. In the light of the fact that mosthumanbehaviorshavenotbeen formalizedmathematicallyyet,ourpurpose in thispaper

  • 31

    istogetabetterunderstandingofhowsocialmechanismingroupdecisionsupportsystemsworks.Therefore,inSectionfourwewillnamesomepsychologicalstudiesandsociologicalexplanationsofactualfairbehaviorandconfirmthemwithourobservationsintheanalyzedgroups of students. For comparison, we investigated the problematics of fairness in itsversatilityontheexampleofastudyoftwogroupsofstudents.Ananalysisoftheproblemoffairnessinthegroupsofstudentsundergoinganexaminationshowsthemanifoldfactorsinvolvedintheassessmentofwhatkindofbehaviorisfairandwhatkindisnot,andhowtoachievefairnessindecisionmakingprocess.Ourproposalconcerningdivisionoffairnessapproachintwopossibledirections:afairdistributionandafairfinaldecisionisdescribedinSectionfiveandsixrespectively.

    2. Group consensus reaching process

    Basically,weassumethefollowingsettings:thereisafinitesetofindividuals(experts,agents)andthefinitesetofalternatives(options).Expertsopenlyexpresstheirpreferencesbymeansofpairwisecomparisonastotheeverypairsofavailableoptions[13].Whatmattershere is that themaingoal of groupdecisionmakingprocess is tofind a solution that alldecisionmakersarewillingtosupport[2].

    Decisionmakingproblemisaniterativeandinteractiveprocesswhichincludesseveraldifferent levels, i.e. aggregating all individual preferences into one common decision,elaborating on the agreement in the spirit of consensus reaching process etc. Reachingconsensus requires: time,activeparticipatingofallmembers,creative thinkingandbeingopen-minded,activelistening,consideringideas,feelingsandsituationsofeveryparticipant.The model of consensus reaching process is manageable only if individuals are able tonegotiateandchangetheirpreferences,thustheyarewillingtosupport.

    Consensus is reachedwhen each expert in the group agrees to support the selectedfinal decision, though itmay not have been his or her first choice. It forces the groupto consider all aspects of a problem and voice objections to possible alternatives [2].Hence,themainpartofthisprocessisdiscussion,whichgivestheopportunitytoexchangeknowledge, clarify point of view, defend own preferences or to become convinced todifferentopinions.Anymembercanblockconsensus.Thatiswhythesekindsofdecisionsaremoredifficult andcomplex thanothers.Thus, toachieve themaingoal,weassumethatindividualsare“committedtoreachingconsensus”–theyareexpectedtoiterativelyupdate their testimonies, and as a result to finally attain a satisfactory agreement.Weassumethetopologicalapproachofwhereagreementismeasuredonthebasisofdistancebetweenindividualsduringeverystageoftheprocess.Initially,expertsdisagreeintheirpreferences,sotheyarefarawayfromconsensus.Theaimistominimizethisdistance,andconsequentlyleadthegroupclosertotheacceptableagreement[14].Whatmattershere,isthattheseinitialdifferencesofopinionsareastrengthofthegroupandakeytogatheradditionalinformation,clarifyissuesandforcethegrouptosearchforbettersolutionswithbiggerbenefitsforeveryone[2].

  • 32

    3. Group decision support systems for consensus reaching

    Sincethedevelopmentofmoderntechnology,computerizedsupportinmakingdecisionhas enormouslyprogressed.Today’s tools areflexible, efficient, easy inuse andallow tocreateaninteractiveuser-friendlyinterfacetoviewdata,configuremodels,etc.Thisclassof computer-based information systems including knowledge based systems that supportdecisionmakingactivitiesisdefinedbyoneterm–decision support systems.Theycombinetheintellectualreservesofindividualswiththeproficiencyofcomputertoenhancethequalityof final decision [16, p. 13]. Similarly,group decision support systemsmean interactive,intelligent,computer-basedsystemsthatfacilitatesolutiontounstructuredproblemsbyasetofdecision-makersworkingtogetherasagroup.Unstructuredproblemsare“fuzzy,complexprocessestowhichtherearenocut-and-driedsolutionmethodsandwherehumanintuitionisoftenabasis fordecisionmaking[16,p.11].”Softwareproductsprovidecollaborativesupporttogroups,i.e.supplyamechanismforteamstoshareopinions,data,information,knowledge,andotherresources.Whatmattershereisthatgroupdecisionsupportsystemisanadjuncttodecisionmakerstofacilitatetheirdecisionmakingprocessbutnottoreplacetheir judgments.Moreover, it isadynamicsystemwhichisadaptiveovertime,therefore,decisionmakersshouldbereactiveandabletochangetheiropinionsquickly.Groupdecisionsupportsystemsattempttoimprovetheeffectivenessofdecisionmaking(accuracy,quality)ratherthanitsefficiency(thecostofmakingdecisions).

    Thekey to success is tocreatemorehuman consistent andhuman centered toolsandtechniquestograspanddealwithdifficult(decisionmakingtype)problems.Thesesystemsshouldprovidecomputational tools,cognitiveaspectsandsocialdimension.In theGDSSconsideration it means that a computer asks a group to solve a problem, then collects,interpretsandintegratesthesolutionsobtainedbyhumans.

    Themainroleofthiscomputer-basedsystemplaysmoderatororfacilitatorwhotakescareofrunningthewholediscussionprocess.Moderatorconstantlymeasuresdistancesbetweenindividualsandcheckswhetherconsensusisreachedornot.Moreover,hismostimportanttask is to support thediscussion, i.e.he stimulates theexchangeof information, suggestsarguments,convincesdecisionmakerstochangetheirpreferences,focusesthediscussionontheissueswhichmayresolvetheconflictofopinionsinthegroup.Thisisrepeateduntilthegroupgetssufficientlyclosetoconsensus,i.e.untiltheindividualfuzzypreferencerelationsbecomesimilarenough,oruntilwereachsometime limit [11].Doubtless, themoderatoraffectsthegeneralsenseofsatisfactionwithinthegroupandhasadirectinfluenceonthequalityoffinaldecision.Whatmattershereisthatheonlytriestopersuadeproperexpertstochangetheiropinionsandsuggestssomerationalarguments–hedoesnotforce,argueorpushindividualstochangetheirtestimonies.

    Ourtaskistodevelopandenhancethediscussionpartandprovidethemoderatorwithaspecificknowledgeaboutgroupmembers.Brieflyspeaking,wewanttofacilitatetheworkofmoderator,providehimwithsomeusefulguidelinesandadditionalindicatorsand,asaresult,maketheconsensusreachingprocesseasier,faster,moreeffective.

  • 33

    4. Notion of fairness in the consensus reaching support systems – a socio-psychological explanations with observations in the analyzed groups of students

    One of the definition of fairness says that “fairness means the satisfaction of justifiedexpectationsofagentsthatparticipateinthesystem,accordingtorulesthatapplyinaspecificcontextbasedonreasonandprecedent[19]”.Fairnessisanintricateideathatdependsonmanyfactors,e.g..culturalvaluesorthecontextoftheproblem.Itcombinesmanydifferentresearchareassuchasmathematics,philosophy,economicsandothersocialsciences,especiallysocialpsychology.Thelastresearchareaiscrucialbecauseitgivesananswertoaquestion:“howpeopleunderstandfairbehavior[18,p.15]?”Generally,fairnessisunderstoodasequalityandbecomesanessentialelementofthenewagent-basedcomputationalmodelswhichaimatexplainingactualbehavior.Thus,anotherquestionarises:whetheronecan talkabouta simpleunifiedprinciplewhichcouldpossiblysolvethequandaryofdecisionmakingprocess?Therangeoffactorsinfluencingtheequilibriuminagroupusuallyturnsouttobeversatileanddependingongroupdynamics,psychologicalandsituational,aswellaspersonalcharacteristicsofgroupmembers.Theoverallideaisthatsubjectsmaybesimplyprejudicedintheirunderstandingoffairness.Opponentsinthedecisionmakingprocessmaybeattunedtoadifferentperceptionoffairness,whichimpedesdecisionmakingprocess,orcausesthatconsensusbecomesimpossible.Groupdecisionmakingisusuallydependableupontheactualstateofemotionspervadingthecircleofpeoplewhosetthemselvesataskofreachingconsensus.

    Ouranalysisofgroupbehaviorintwogroupsofstudentsconsistingof14and12membersrespectively (groupAandB– studentsof the thirdyear at theDepartmentofAutomaticControl and Information Technology, Faculty of Electrical and Computer Engineering,Cracow University of Technology) allowed to draw the following conclusions. It wasnoticeablethatifatleastsomepeopleinagroupcaredaboutequity,consensuswasreachedmorequicklyandthedecisionmakingprocessbecamesmoother.Thesecondoftheanalyzedgroups(B)showedaslightlymoreconsensuspronecharacter.Thedistinctionsbetweenthestandpointsof the individualagentsandsubgroupswerenot so sharpas ingroupA.ThemajorfactorinreachingconsensusinasmootherandquickerwayingroupBthaningroupAwasalowerstateofpsychologicalagitationcharacteristicofthegroup,andacalmer,lessaggressivelevelofcommunication.Initsmajorpart,thegroupconsistedofagentswithmuchless individualizedpersonalities, andabetterdeveloped senseof cooperation resulting inasubsequentfasterimplementationofthefinaldecision.

    The other crucial and decisive factor in reaching consensus in this groupwas a veryefficientcooperationinsmallsubgroups.TheindividualsingroupBusuallyexertedamuchmoreopenattitudetothepreferencesofotherindividuals,andwerelesslikelytofallpreytoaself-servingbias.Theoverallmechanismofthefunctioningofthisgroupandreachingconsensusprocesswasbasedonaveryconspicuoussupportivesystemofthepresentationofideasandtheskillfulputtingforthofthecarefullyselectedarguments.Generallyspeaking,mostofthemembersofthegroupshowedarelativelyhighlevelofemotionalintelligenceallowingtoanalyzealmostemotionlesslytheargumentsstatedbytheiropponents.WhatwasalsowellobservablewasthefactthatsomemembersofgroupBexertedastrongpositiveinfluenceontheentiregroupbyexhibitinggooddiplomaticandnegotiatingskills.Twooftheagentsinthisgroupwereextremelyflexibleinswiftchangesofthecommunicativestrategiestheydeployedtoinvigoratetheothermembersofthegrouptosharethemostexposedandmostsuccessfullysupportedopinion.

  • 34

    Theexplanations as for thebehavioralpatterns in thisgroupcanbe supportedby thedefinitionofthe cooperative game theorywhichvirtuallyisagamewhereplayerscanenforcefairbehavior.Cooperativegametheoryisconnectedwiththedistributionofbenefitsthatagroupofagentsachievesfromcooperation.Themodelassumesthatthegroupofindividualswishes tosolveacommonproblemandbycooperating theycansolve theproblemmoreefficiently [18,p.56]. In fact, research inpsychologyhasshown that ingroupsituations,decisions of individuals are influencedbymotives such as “groupperformance, sense ofresponsibilityforothers,orsocialconcerns[18,p.61]”.Itisalsoworthnoticingthatagreaterleveloffairnessandmoresuccessfulachievementofgoalsindecisionmakingprocesswereattributedtosomecharacteristicsoftheobservedgroupssuchas:aneagernesstolearntheactualdifferencesbetweentheirownopinionsandthoseoftheopponents,andanabilitytoeliminatesuchopinionswhichwere impedingagreement themost.Theopennessand themarkedly interactive character of the group relations proved to be the core of success inmeetingthepointofbalance.

    Bycomparison,thefirstgroupofstudents(A)whichblockedconsensusreachingprocesswascharacterizedveryoftenbyanoverallunwillingnesstocometoasatisfactorysolutionforallofthemembers,andaninsistenceonparticularinterestsofminigroups,orindividualswithinthegroup.Theminigroupswereveryassertiveinexpressingtheiropinionsandoftenignorantof theopinionsofopponents.Sticking to theproclaimedopinionsandnotbeingopentotheproposalsofothers,especiallythosewhichwereverydifferentendedupinanimpasseofdiscussionandan impossibilityof reachinganagreement.Quite frequentwasthesituationinwhichsomemembersofthegroupshowedatotaldisregardfortheopinionsoftheiropponents,closingthemselvesinaverylimitedmindframework.Thecreativityinreachingsatisfactorysolutionswasdecreased,whichwasnotanoptimisticforecast.Wemayassumethatthediminishedlevelofthecoherenceandcreativeunityofthegroupwillnotleadinthelongruntoanexpectedconsensus,aswellasaverymuchneededimplementationofthepossiblesolutions.

    Itcannotgounnoticedthattheentirecontextofcomingtoanagreementisfundamentallyasituational,contextual,andpsychologicalphenomenon.Suchfactorsasculturalunificationorculturalclashes,andmuchinthesamemanner,economicdifferencesorsimilareconomicstatusmaybeofsignificanceingroupconsensusreachingprocess.Theseelementsmayhaveinfluenceonwhetherthemoreegoisticallyorientedindividualsorthemorefairproneagentsoverbalancetheprocessofreachinganagreement.Theeconomicelementofgroupconsensusreaching process was thoroughly examined, for instance, by Ernest Fehr and Klaus M.Schmidt[5].Inouranalysisweignoredthesefactorsfocusingonotherprevalentingredientsofthesituationalcontextsuchas,forinstance,persuasiveandmanipulativecapabilitiesofthegroupmembers.

    Inourstudyweobservedthatagentspushedequityprincipleswhichwereadvantageousfor themmore than for other parties, in particular, thosewhichwere disadvantageous topartieswithgreatpersuasionpower.Thefirstofthetwoanalyzedgroupsofstudentsshowedacomplexityofreactionswithinthesmallgroupsorsubgroupsindecisionmakingprocess,highlydependableonpsychologicalfactors.Oneofthefirstthingswhichwerenoticedprovedthatgroupbehaviortosomeextentwasinlinewithindividualbehaviorofparticularagents.Basically,groupbehaviordependedonthedecisionrules thatagentsselectedandusedtoarriveatgroupdecisioninabiasedway.Anotherthingwhichwasverywellseenwasthatsubgroupsexpressingsimilarstandpointswithintheobservedgroupsignoredthedecision

  • 35

    ruleoftheirsocalled“opponents”–whichmeantthattheyfollowedtheself-serving,orself-interestpreferenceprinciple.What ismore,agivensubgroupsharingasimilarviewpointtypically disregarded other subgroups and thought of themselves as if they were singleagents.Aboveall,onemuststressthatexpectationsasfortheleveloffairnesswereoftennotconsistentwiththeoutcome;weexpectedtheleveloffairnesstobehigher.

    These observations bear out the importance of the self-interest preference in groupbehavioral patterns. The choices of the respective members of the group underpin ourtheoreticalassumptionsasfortheself-centerednessoftheindividualsinfluencingthefinalgroupdecisionmakingprocess.

    However,psychologicalstudieshaverevealedthatinreallife,decisionmakersarenotasselfishaswhat isshownin thesolutionsreceivedusingmechanismsofrationalchoiceapproaches,inthesenseofmaximizationofsomeutilityfunction[6].Experimentsshowedthatindividualstendtocooperateandgiveprioritytofairnessovergreedybehavior.Trust gamewilltransparentlyperformthisactivity.Inthetrust game,AhasaninitialamountofmoneyheorshecouldeitherkeeportransfertoB.IfAtransfersit,thesumistripled.Bcouldkeepthisamount,ortransferit(partiallyortotally)toA.TraditionalgametheorysuggeststhatAshouldkeepeverything,or ifAtransfersanyamount toB, thenBshouldkeepall.Experimentalstudieshaverevealedthatagentstendtotransferabout50%oftheirmoneyandthisfairnessandcooperationisrelatedtoallcultures,sexes,etc[1].

    With reference to our assumption that fairnessmeans the satisfaction of expectationsof agents, groupdecision support system should provide the sense of satisfaction amongthegroupmembersduring thediscussion andafter theprocess completion.According topsychological research, satisfaction of decision makers has a direct influence on higherqualityoffinaldecisionandseveral furtheractivities, i.e.practical implementationof thefinaldecisionorsurvivalofthegroupinthelongtimeperiod[15].

    5. Fair share of distributed resources

    Inourresearchwemainlyreflectedononeoffairnessjudgmentsidentifiedbysocialpsychology,namelydistributive fairness[17].Itisusuallyrelatedtothedistributionofresources,goodsorcosts,thustofair resource allocation problems.Resourceallocationproblemsareconcernedwiththedistributionofconstrainedresourceswithincompetingactivitiessoastoachievethebestgeneralimplementationofthesystemwithrespecttofairmanagementofalltheparticipants.Brieflyspeaking,theaimistotakeafair share of the distributed goods, thus to find such distribution that is perceived as fair by allindividuals.

    Theoverallresourceallocationproblemmightbestatedinthefollowingway.ThereisasetofI={1,2,...,m}ofmactivities.ThereisgivenasetPoflocationpatterns(locationdecisions).Foreachactivity i, i ∈ I,anonlinearfunction fi(x)of thelocationpatternx isdefined.Thisfunctionmeasurestheoutcomeyi = fi(x)ofthelocationpatternforeachactivityi.

    Togettheindividualsclosertoeachother(toobtainanagreementbetweenthem),therehavetooccursomechangesintheirinitialpreferences.Theoverallamountofchangesintheindividuals’preferencesconstitutes theresource.Thus, foragivensetofdecisionmakersmoderatorwantstoallocatefairlytheresourcewiththeobjectiveofminimizingtheoutcome(i.e.thedistancesbetweenindividuals).

  • 36

    Themaingoalofour system is to take intoaccountpreferencesofevery individualandgettheentiregroupclosertotheconsensuswithfairtreatmentofalltheparticipants.Weneglectthesituationwhenthemoderatorgetsdecisionmakersclosertotheconsensusbyargumentationandpersuasiononlyaimingatthemostpromisingdirectionsoffurtherdiscussion (thosewho reach consensus quickly), while individuals who are isolated intheiropinionareomitted.Wefoundconfirmationofthisassumptioninourobservations.Asweobservedgroupbehaviorintwoselectedgroupsthereweremanysituationswhenconsensusreachingprocesswasblockedandthedomineeringmembersof thegroup,aswellasthedominantsubgroupsexertingapowerfulinfluenceontheentiregroupcausedthat some individuals found themselves in a situation of a total isolation. One of theanalyzed groups (groupA)was likely to undergo all the distortions resulting from theself-interestprejudice.The interpersonal relations in theobservedgroupwerenotgoodenoughtoovercomeallthedifficultiescausedbythedifferencesinopinions.Someofthemembersofthegroupwereleftonthemarginandcouldnotfeelthesatisfactionensuingfromdiscussion.This,inturn,broughtaboutasituationoflittleeffectiveness.Allinall,the image of the groupwas negative, thereweremany lost opportunities of achievingconsensus.Brieflyspeaking,moderatorcannotignoretheindividualswhoareisolatedintheiropinionsastotherestofthegroupmembers,quiteonthecontrary,hehastoconvincethemtochangetheirpreviouspreferences.Thisattitudeundoubtedlyconfirmsoneofourassumptions,namely,animportanceofactiveparticipationofeveryindividualduringtheentireconsensusreachingprocess.

    Asweassumed,ourresearchshouldbedonewithrespecttofairdistribution.Thetheoryof distributive fairness can be applied whenever it is possible to precisely define a fairdistributionproblemandtofindasolutionthatisacceptedbyparticipants(orproposedbythemoderator).Ifweconsiderthedistancesoftheindividuals’opiniontothefinalopinion,naturally,thefinalopinionshouldbefairinthesensethatthedistancesoftheindividuals’preferencestothefinaldecisionshouldbefairlydistributed.

    6. Fair final decision

    Whileconsideringtheconceptoffairnesswithreferencetoconsensusreachingprocess,wedecidedtoviewthebasicideaofthisnotionfromtwopossibleperspectives.Thefirstone,presentedintheprevioussection,concernsafairdistributionofresources,whereasthesecond isdirectlyconnectedwith theoutcomeofdecisionmakingprocess,namelya fair final decision.

    Wedefineafair final decision asapossibilitytoreachafinalconsensusduringaseriesofdiscussions.However,themajorityhererefersdirectlytotheoutcomeandcanbedefinedasthesoft consensus, aconceptualhuman-consistentframeworkproposedbyKacprzykandFedrizzi[9,10],andZadrożny[4].Thedevelopedideaismeantbasicallyasanagreementofaconsiderablemajorityofindividualswithregardtoaconsiderablemajorityofalternatives.Thisoperationaldefinitionofconsensuscanbe,forinstance,expressedbyalinguisticallyquantifiedpreposition:most of the individuals agree in their preferences to almost all of the options,andtheconsensusdegree(intherange[0,1]) iscomputed.Itmeansthat,exceptnoneortotalagreementbetweenagentsastothechosensolution,thisapproachallowssomepartial,acceptableconsistency.

  • 37

    Noticethattodefineafuzzymajorityformeasuringadegreeofconsensustheapplicationoffuzzy linguistic quantifiers (most,almostalletc.)hasbeenperformed.ThecomputationsofthisrelativetypeoflinguisticquantitycanbehandledviaZadeh’sclassiccalculus[20].Regardlessofthewayofimplementation,themainconditionofthisnovelapproachisthatitdefinitelyovercomestheconventionalconceptinwhichconsensuswasunderstoodasa“fulland unanimous agreement”,whichmeans that the preferences of all the decisionmakersshouldbeexactly thesame.Obviously, thisscenario isutopianandunrealistic inpracticebecause individuals usually expose relevant differences in their standpoints, flexibility,tendencytochangeopinions,etc.Allofthesefactorsgenerallyblockthegroupfromgainingafullandunanimousagreement.

    7. Conclusions

    Inthisarticleweproposedanewconceptofsupportinggroupconsensusreachingprocessenhanced by the concept of “fairness”.We showed that this notion should be definitelytakenintoaccountwhilecreatinghuman-consistentsupportsystemsbecauseit isstronglyconnectedwithpsychology,economics,gametheory,etc.and,asaresult,takescognizanceofsocio-psychologicalaspectsofgroupbehavior.Infact,ithelpsusunderstandthetypicalhumanbehaviorwithinagroupofindividualsandtodevelopmoreintelligent,human-centricandhuman-consistentsystemsforsupportingconsensusreachinginthefuturedevelopment.

    InourresearchwecametoaconclusionthatadegreeofconsensusobtainedbyincludingaspectoffairnesswouldbehigherthanthepreviousapproachbasedsolelyonsoftconsensuswiththeuseoffuzzylogicproposedandsuccessfullyimplementedbyKacprzykanZadrożny[12].Weenrichedourconceptbythenovelfairnesscomponent.Hence,wetakelibertyofputtingforthahypothesisthattheconceptofnovelapproachaffectsdirectlytheeffectivenessofdecisionmakingprocess,satisfactionamonggroupmembersand,astheresult,thequalityofthefinaldecision,whichbecomeshighlyjustified.Theultimategoalofourfurtherresearchisthemathematicalformalizationofthefairgroupconsensusreachingprocess(buildingamodelwithregardtorealeventsandpsychologicalfactors)inordertoverify–confirmorreject–ourassumptions.

    Thus,ourcomprehensiveapproachproposedforsupportingconsensusreachingprocessunderfuzzinessrefersparticularlytoadegreeofagreementinagroup,individualpreferencesof group members and the moderation of the discussion in order to gain satisfactorysolutionsinthemoreeffectiveandefficientway.Whatmattershere,isthatwerespectthefair distribution in the sense of contribution of all decisionmembers to choose the finalsolution.Hence,thesituationwhenminoritymustobeymajorityandchangetheiropinionsaccordinglyisintheproposedsystemignored.

    Dominika Gołuńska’s contribution is partially supported by the Foundation for Polish Science under International PhD Projects in Intelligent Computing. Project financed from The European Union within the Innovative Economy Operational Programme 2007–2013 and European Regional Development Fund.

  • 38

    R e f e r e n c e s

    [1]Camerer C.F., Psychology and economics. Strategizing in the brain, Science, 300,2003,1673-1675.

    [2]CenterforExcellenceinGovernment,Facilitator’sToolbox,1996,http://www.employeesu.com/docs/EmployeeResources/Creating%20Consensus.pdf,[dateofaccess:23.03.13].

    [3]FalkiewiczD.,KacprzykJ.,Different Aspects of Supporting Group Consensus Reaching Process Under Fuzzines,TechnicalTransactions,vol.1-AC/2012,CracowUniversityofTechnologyPress,17-27.

    [4]Fedrizzi M., Kacprzyk J., Zadrożny S., An interactive multi-user decision support system for consensus reaching process using fuzzy logic with linguistic quantifiers,DecisionSupportSystems,vol.4,no.3,1988,313-327.

    [5]FehrE.,SchmidtK.,A Theory of Fairness, Competition and Cooperation,TheQuarterlyJournalofEconomics,MITPress,no.3,vol.114,1999,817-868.

    [6]HansonS.O.,Decision Theory: A Brief Introduction,Philosophy,vol.23,RoyalInstituteofTechnology,Stockholm,1994,1-94.

    [7]Jennings H.H., Sociometric differentiation of the psychegroup and sociogroup,Sociometry,vol.10,1947,71-79.

    [8]KacprzykJ.,Neuroeconomics: yet another field where rough sets can be useful?, ChacC.Cetal.(Eds.):RSCTC2008,LNCS,vol.5306,2008,1-12.

    [9]Kacprzyk J., FedrizziM.,A ‘human-consistent’ degree of consensus based on fuzzy logic with linguistic quantifiers,MathematicalSocialSciences,vol.18,1989,275-290.

    [10]KacprzykJ.,FedrizziM., A ‘soft’ measure of consensus in the setting of partial (fuzzy) preferences,EuropeanJournalofOperationalResearch,vol.34,1988,315-325.

    [11] KacprzykJ.,ZadrożnyS.,On a concept of a consensus reaching process support system based on the use of soft computing and Web techniques,[in:]D.Ruan,J.Montero,J.Lu,L.Martínez,P.D’hondt,E.E.Kerre(Eds.):ComputationalIntelligenceinDecisionandControl.WorldScientific,2008,859-864.

    [12]Kacprzyk J., Zadrożny S., Soft computing and Web intelligence for supporting consensus reaching,SoftComputing,vol.14,no.8,2010,833-846.

    [13]Kacprzyk J., Zadrożny S., Supporting Consensus Reaching Processes under Fuzzy Preferences and a Fuzzy Majority via Linguistic Summaries,PreferencesandDecisions StudiesinFuzzinessandSoftComputing,Springer,vol.257,2010,261-279.

    [14]Kacprzyk J., Zadrożny S., Raś Z., How to Support Consensus Reaching Using Action Rules: a Novel Approach, InternationalJournalofUncertainty,FuzzinessandKnowledge-BasedSystems,WorldScientific,vol.18,no.4,2010,451-470.

    [15] PriemR.L.,HarrisonD.A., MuirN.K.,Structured Conflict and Consensus Outcomes in Group Decision Making, JournalofManagement,1995,vol.21,no.4,691-710.

    [16]TurbanE.,AronsonJ.E.,LiangT.P.,Decision Support Systems and IntelligentSystems,6thEdition,PrenticeHall,11-19,2005,94-101.

    [17]TylerT.R.,SmithH.J.,The handbook of social psychology,vol.2, Social justice and social movements,McGraw-Hill,Boston,1998,595-629.

    [18]WierzbickiA.,Trust and Fairness in Open, Distributed Systems,Springer,2010,11-19,56-70.

    [19]YoungH.P.,Equity: In Theory and Practice,PrincetonUniversityPress,1994.[20]Zadeh, L, A computational approach to fuzzy quantifiers in natural languages,

    ComputersandMathematicswithApplications,no.9,1983,149-184.

  • * KrzysztofSchiff,Ph.D.,DepartmentofAutomaticControlandInformationTechnology,FacultyofElectricalandComputerEngineering,CracowUniversityofTechnology.

    KRZYSZTOFSCHIFF*

    ANTCOLONYOPTIMIZATIONALGORITHMFORTHESETCOVERINGPROBLEM

    ALGORYTMMRÓWKOWYDLAZAGADNIENIAPOKRYCIAZBIORU

    A b s t r a c t

    This articledescribes anewhybrid ant colonyoptimizationalgorithms for the set coveringproblem.Theproblemismodeledbymeansofabipartitegraph.Newheuristicpatterns,whichare used in order to choose a vertex to a created covering set have been incorporated intomodifiedhybridalgorithms.Resultsoftestsoninvestigatedalgorithmsarediscussed.

    Keywords: set covering problem, ant colony optimization, heuristic rule

    S t r e s z c z e n i e

    Wartykuleprzedstawiononowyhybrydowyalgorytmmrówkowydlaproblemuzagadnieniapokryciazbioruominimalnymkoszcie.Problemjestzamodelowanyzapomocągrafudwu-dzielnego.Wmodyfikowanymalgorytmiewprowadzononowąheurystykęwyboruwierzchoł-kówdopodzbioruwierzchołkówpokrywających.Opracowanyalgorytmprzetestowanoipo-równano,awynikitychbadańomówiono.

    Słowa kluczowe: zagadnienie pokrycia zbioru, algorytm mrówkowy, heurystyka

    TECHNICAL TRANSACTIONSAUTOMATIC CONTROL

    1-AC/2013

    CZASOPISMO TECHNICZNEAUTOMATYKA

  • 40

    1. Introduction

    Many practical optimization problems such as for example facility location problem,airlinecrewscheduling,nursescheduling,vehicleroutingandresourceallocationproblemcanbedescribedandmodeledas theSetCoveringProblem(SCP) [1,2,10–12]andmanyothercombinatorial problems can bemodeled in such a way. There aremany kinds of computeralgorithmsthathavebeendesignedsofartosolveSCPsuchasexactalgorithms[13,14],heuristicalgorithms[15–18],meta-heuristicsalgorithms[19–21],alsoalgorithms,whicharebasedonantcolonyoptimizationstrategy[3,6,22,23]andhybridofAntColonyOptimizationAlgorithm(ACO)withsocalledConstraintProgramming(CP)[26,27].Antalgorithmsweredesignedtosolvemanycombinatorialproblems[5,8,24,25]. Thispaperpresentsnewalgorithms,whicharebasedonanantcolonyoptimizationstrategy,forthesetcoveringproblemwithaminimumcoveringcostandanewheuristicinformationpatterns,whichhavebeenusedinthesealgorithms.TransitionprobabilityruleandpheromoneupdateruleandalsoamechanismcheckingconstraintsconsistencyareusedalltogetherinordertosolvetheSCPandtominimizethetotalcoveringcost.Theremainderofthispaperisstructuredasfollows:insection2theSCPisintroduced,insection3thestructureofACOalgorithmisdescribed,insection4pseudo-codeofantalgorithmswithnewheuristicpatternsandtransitionprobabilityrules,whichisusedinnewelaboratedantalgorithms,arediscussedandinsection5resultsoftheconductedcomputationalexperimentsonaspecialkindofagraphwithanalmostequaldensityandinsection6conclusionsarepresented.

    2. Set covering problem

    ThesetcoveringproblemcanbemodeledasabipartitegraphnetworkG(V1 + V2, E, w)withweightswijassignedtoedgeseij,suchthateij = (vi ∈ V1, vj ∈ V2), eij ∈ EasitisshowninFig.1andinthesamewayasitispresentedin[9].Thedegreeofvertexiisthesumofedgesadjacenttothisvertexi.Allverticesv1i canbegroupedintosubsetsinsuchawaythatallverticesfromthesetV2arecoveredbyverticesfromthesomesubsetVsofverticesV1.Avertexv2i iscoveredbyavertexv1jifanedgeeij existsbetweenavertexi andavertexjinabipartitegraph,forexamplesubsetVs ⊂ V1whichconsistsofverticesv11,v13,v14andv16 coversallverticesfromthesetV2andtheanotherexampleofsetVsissubsetwhichconsistsofverticesv14andv17.IngeneralifthecardinalitynumberofsetVsislowerthanthetotalsetcoveringcostishigher.InthesetcoveringproblemwithminimumthecardinalitynumberofsetVsandthecostthetotalcostofsetcovering,thismeansthesumofweightsassignedtoallgraphedges,whichareparticipatinginthesetcovering,hastobeminimized.

    TheobjectivefunctionFistofindacoversetwithaminimumcostandisdescribedin(1)and(2):

    min , ,F x w i V j Vij ijjn

    i

    n= ( ) ∈ ∈== ∑∑ 11 1 2 (1)

    andsubjecttoconstraintsuchthat:

    x i V j Viji Vsn

    ∈∑ ≥ ∈ ∈1 1 2, , (2)

  • 41

    where:xij = 1,whenavertexjiscoveredbyavertexi,xij = 0,whenavertexjisnotcoveredbyavertexi,wij–thisisaweightassociatedtoedgeeij (acostofcoveringavertexjbyavertexi),Vs – isasubsetofverticesV1whichcoverallverticesV2.

    Fig.1.Thesetcoveringproblemmodeledbyabipartitegraph

    Rys.1.Problempokryciazbiorumodelowanygrafemdwudzielnym

    3. Structure of ACO algorithm

    Inantalgorithmsacolonyofartificialantsislookingforagoodqualitysolutionoftheinvestigatedproblem.Thepseudo-codeofACOprocedureispresentedasalgorithm1.Eachartificialantconstructsanentire solutionof theproblem insomenumberof steps,calledintermediate solutions.Anyof intermediate solutionsare referred toas solution states. Ineachstepmofthealgorithmeachantk goesfromaonestateitoananotherstatejandthusconstructsanew intermediatesolutioncalled laterapartial solutionof theproblemsincetheentiresolutionisreceivedinsomenumberofstepsandateachofthesestepsthereisanintermediatesolutioncalledapartialsolutionorasolutionunderconstruction.Ateachstepeachantkcomputesasetoffeasibleexpansionstoitscurrentstateandmovestooneoftheseinprobability.Thissetoffeasibleexpansionsiscalledaneighborhoodofcurrentstate.Inpresentedalgorithmsconcerningprocessingofbipartitegraph,thismeansworkingonagraphmodelofthesetcoveringproblemateachstateeachantchoosesavertexfromthesetV1andaddsittoapartialsolutioninordertoconstructfinallyattheendofalgorithmactiontheentiresolutiontotheSCPproblem.AttheendofalgorithmactionthesetofverticesVsconstituteasolutiontotheSCPproblem.Eachantk startswithanemptysetVsandsuccessivelyaddstothissetVsavertexchosenoneaftertheotherfromthesetV1withprobabilityp

    kijmovingfrom

    aonetoanotherstate.Ateachstatei therearesomeverticesinthesetVsandtheseverticesfromthesetVsconstituteapartialsolutionofaproblematstepm,thismeansatstatem.Eachantinordertoconstructasolutionusescommoninformationwhichisencodedinpheromonetrails,thismeansthetraillevelofthemove,indicatinghowproficientithasbeeninthepasttomakethatparticularmove.Eachantalsodepositsapheromoneonatrailwhenasolutionhasbeenfoundandaquantityofthepheromonedepositeddependonaqualityofthissolution.Themoveofeachantalsodependsonsocalledtheattractivenessofthemove,ascomputedbysomeheuristicindicatingtheaprioridesirabilityofthatmove.Inordertoavoidaveryfast

  • 42

    convergencetoalocallyoptimalsolutionanevaporationmechanismisused,thismeansthatoverthetimethepheromonetrailevaporates,thusreducingitsattractivestrength.

    Algorithm1ACOprocedure

    begin while(existcycle)do while(existanyant,whichhasnotworked)do while(asolutionhasnotbeencompleted)do chooseanextvertextoaconstructedsolutionwithaprobabilitypkij ; updateneighborhoodofcurrentstate; end updateabestsolutionifabettersolutionhasbeenfound; end updateaglobalbestsolutionifabettersolutionhasbeenfound; useanevaporationmechanism; updateapheromonetrailsτ(i) = τ(i) + ∆τ; end end.

    Eachantkmovesfromonestateitoanotherstatejwithatransitionprobabilityrulepkij(t),whichisdescribedbytheformula:

    pj N

    j N

    ij

    ij ij

    ij ijj N

    i

    i

    =

    ( )( )

    ∈∑τ µ

    τ µ

    α β

    α β, for

    , for which not0 1

    (3)

    usingthepheromonetrailτijandtheattractivenessµijofthemove.Thepheromonetrailτij is theuseful information,which isdepositedbyothersants, for eachantduring itsworkon construction of solution, about the usage of vertex j in the past by others ants. TheattractivenessµijisadesireofchoosingavertexjfromtheneighborhoodNi ofcurrentstate when there isapartial solutionyetconstructed instate i and theattractivenessµij canbyexpressedbyasomeheuristicformula.Theattractivenessµijallowstobetterchooseasomevertexfromallvertices,fromtheneighborhoodNi ofcurrentstate,tobeaddedtoasolutionunderconstructiontakinganobjectivefunctionintoaconsideration.TheneighborhoodNi of state i is constituted by verticeswhich can be added to a constructed partial solution.At the start all vertices can be added to a partial solution of the problem, thismeans toasolutionofaproblemunderconstructionandthenumberoftheseverticesisreducednotonlybecauseoftheirinclusionintothesolution,whichisundertheconstruction,butalsobecausesomeoftheseverticescannotbeyetaddedtoasolution,whichisunderconstruction,since these vertices does not satisfied solution constraints and only these vertices can beadded to constructed partial solutionwhich still satisfied solution constraints.The partialsolutionoftheproblemisapartofsolutionandthepartialsolutionisasubsetofvertices,whichconstituteasolutionoftheproblem.Parametersαandβwhichisusedinthetransition

  • 43

    probabilityrulepkij(t)expressedby(3), indicateabout this,howimportant thepheromonetrailτijandtheattractivenessµijareduringtransitionfromonetoanotherstate.Valuesoftheseparametersαandβ shouldbesetbyexperimentandtunedtothesetcoveringproblemwithminimumcoveringcost.

    Afterasolutionhasbeenfoundeachantdepositsapheromonewithaquantity∆τonallvertices,whichconstitutethesolutionVs,inaccordancewiththepattern:

    τ τ τij ijt t( ) ( )= + ∆ (4)

    Thus these vertices which were included into a solution have received an a