<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-6447849434617576469</id><updated>2012-02-16T12:44:16.271-08:00</updated><category term='matplotlib'/><category term='manifesto'/><category term='codecereal'/><category term='gsoc'/><category term='operator'/><category term='cryptography'/><category term='javascript'/><category term='ai'/><category term='qtquick'/><category term='cache'/><category term='decorators'/><category term='diagnosticar'/><category term='hash'/><category term='dynamic programming'/><category term='pyside'/><category term='plasma'/><category term='cin'/><category term='prolog'/><category term='classification'/><category term='wsl'/><category term='shell'/><category term='python'/><category term='cinlug'/><category term='portugues'/><category term='cpg'/><category term='greasemonkey'/><category term='database'/><category term='brasil'/><category term='qml'/><category term='aes'/><category term='logic'/><category term='security'/><category term='markov'/><category term='cypher'/><category term='abduction'/><category term='ghmm'/><category term='qtcomponents'/><category term='reasoning'/><category term='mongodb'/><category term='kde'/><category term='dna'/><category term='c'/><category term='protein'/><category term='bio'/><category term='swi'/><category term='twitter'/><category term='ufpe'/><category term='maximum likelihood'/><category term='slide'/><category term='qt'/><category term='baum welch'/><category term='hmm'/><category term='citi'/><category term='sha1'/><category term='epassport'/><category term='profile'/><title type='text'>CodeCereal</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://codecereal.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6447849434617576469/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://codecereal.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Daker Fernandes Pinheiro</name><uri>http://www.blogger.com/profile/13246366811991245999</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/-5WD2HNXE45Q/TdWfTSkSEXI/AAAAAAAAACg/wL6O3AwG3tk/s220/isimp.jpg'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>21</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-6447849434617576469.post-2567818768240711885</id><published>2012-01-04T03:37:00.000-08:00</published><updated>2012-01-04T03:37:01.844-08:00</updated><title type='text'>Cropping mp3 files with FFmpeg</title><content type='html'>&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://www.madaar.net/images/ffmpeg-logo.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="78" src="http://www.madaar.net/images/ffmpeg-logo.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;Today, I was trying to find a free app for cropping a mp3 sound file. And I found a simple one with CLI. The &lt;a href="http://ffmpeg.org/"&gt;FFmpeg&lt;/a&gt; is a multimedia files handler and it is pretty complex. But to do this task we will use the following parameters:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;b&gt;-t &lt;seconds&gt; &lt;/seconds&gt;&lt;/b&gt;chop after specified number of seconds&lt;/li&gt;&lt;li&gt;&lt;b&gt;-ss &lt;seconds&gt; &lt;/seconds&gt;&lt;/b&gt;chop until specified number of seconds &lt;/li&gt;&lt;li&gt;&lt;b&gt;-acodec copy &lt;/b&gt;to maintain encoding and sampling rate&lt;/li&gt;&lt;li&gt;&lt;b&gt;-i &lt;file&gt;&lt;/file&gt;&lt;/b&gt; use file as input file&lt;/li&gt;&lt;/ul&gt;And the final command was something like this:&lt;br /&gt;&lt;code&gt;&lt;br /&gt;ffmpeg -acodec copy -y -t $start_at -ss $ends-at -i $inputfile.mp3 $outputfile.mp3&lt;br /&gt;&lt;/code&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6447849434617576469-2567818768240711885?l=codecereal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://codecereal.blogspot.com/feeds/2567818768240711885/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://codecereal.blogspot.com/2012/01/cropping-mp3-files-with-ffmpeg.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6447849434617576469/posts/default/2567818768240711885'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6447849434617576469/posts/default/2567818768240711885'/><link rel='alternate' type='text/html' href='http://codecereal.blogspot.com/2012/01/cropping-mp3-files-with-ffmpeg.html' title='Cropping mp3 files with FFmpeg'/><author><name>Daker Fernandes Pinheiro</name><uri>http://www.blogger.com/profile/13246366811991245999</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/-5WD2HNXE45Q/TdWfTSkSEXI/AAAAAAAAACg/wL6O3AwG3tk/s220/isimp.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6447849434617576469.post-5205293191580317411</id><published>2011-10-06T18:37:00.000-07:00</published><updated>2011-10-06T18:37:53.700-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='codecereal'/><category scheme='http://www.blogger.com/atom/ns#' term='manifesto'/><category scheme='http://www.blogger.com/atom/ns#' term='portugues'/><category scheme='http://www.blogger.com/atom/ns#' term='brasil'/><title type='text'>Manifesto CodeCereal</title><content type='html'>&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;img border="0" height="256" src="http://1.bp.blogspot.com/-ycXK5utiDyw/To4yBy-ungI/AAAAAAAAAEo/BI9oK6G4YjY/s400/cereal.jpg" width="400" /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;Venho  recebendo diversas críticas, de várias pessoas, que deixam de ler meus  posts porquê são em inglês, além de tratar de "assuntos complicados". O  fato é que eu estou deixando de divulgar muita coisa que faço e está em  português e não tenho tempo para fazer a tradução de tudo, devido meu  recurso finito de tempo. Também gosto de escrever em inglês como forma  de exercício, além de dar uma maior dispersão dos posts. Portanto  decidí tomara a decisão de tornar este blog híbrido onde quando possível  farei posts em inglês, mas aproveitarei o meu material escrito em  português, que caso contrário ficaria arquivado. Também continuarei a compartilhar o que for relativo a inteligência artificial no &lt;a href="http://aimotion.blogspot.com/"&gt;AIMotion&lt;/a&gt; e começarei a divulgar os posts relativos ao Qt aqui e no &lt;a href="http://blog.qtlabs.org.br/"&gt;Qt Labs Brasil&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Espero que dê certo.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6447849434617576469-5205293191580317411?l=codecereal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://codecereal.blogspot.com/feeds/5205293191580317411/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://codecereal.blogspot.com/2011/10/manifesto-codecereal.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6447849434617576469/posts/default/5205293191580317411'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6447849434617576469/posts/default/5205293191580317411'/><link rel='alternate' type='text/html' href='http://codecereal.blogspot.com/2011/10/manifesto-codecereal.html' title='Manifesto CodeCereal'/><author><name>Daker Fernandes Pinheiro</name><uri>http://www.blogger.com/profile/13246366811991245999</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/-5WD2HNXE45Q/TdWfTSkSEXI/AAAAAAAAACg/wL6O3AwG3tk/s220/isimp.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/-ycXK5utiDyw/To4yBy-ungI/AAAAAAAAAEo/BI9oK6G4YjY/s72-c/cereal.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6447849434617576469.post-8212717900644766739</id><published>2011-07-18T13:13:00.000-07:00</published><updated>2011-07-18T13:13:17.179-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='gsoc'/><category scheme='http://www.blogger.com/atom/ns#' term='qt'/><category scheme='http://www.blogger.com/atom/ns#' term='kde'/><category scheme='http://www.blogger.com/atom/ns#' term='qtcomponents'/><category scheme='http://www.blogger.com/atom/ns#' term='plasma'/><title type='text'>Back to Plasma Components</title><content type='html'>&lt;div style="text-align: justify;"&gt;This last month I've been working slowly on my GSoC project due to the university activities (due to Brazilian academic calendar). And my thesis, which I shall talk in other post. But at least now I'm undergraduated!&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;As I explained in my last post about&amp;nbsp;&lt;a href="http://codecereal.blogspot.com/2011/05/plasma-components.html"&gt;Plasma Components&lt;/a&gt;, my GSoC project. I'm building graphics components for developers to build plasmoids in QML using non trivial common components such as Sliders, ScrollBars and RadioButtons. After the break I've done these components following the &lt;a href="http://bugreports.qt.nokia.com/browse/QTCOMPONENTS-200"&gt;Qt Components common API&lt;/a&gt;:&lt;/div&gt;&lt;ul&gt;&lt;li style="text-align: justify;"&gt;Switch&lt;/li&gt;&lt;li&gt; ButtonRow&lt;/li&gt;&lt;li&gt;ButtonColumn&lt;/li&gt;&lt;li&gt;ScrollBar&lt;/li&gt;&lt;li&gt; ScrollDecorator&lt;/li&gt;&lt;/ul&gt;&lt;div style="text-align: justify;"&gt;Keyboard event handling and focus policy for the new and old components were added in this sprint. I also spent a lot of time refactoring some components. I think their code is much better now.&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;I'm also adding more complex use cases at the components gallery (at kde-runtime/plasma/declarativeimports/test/gallery). By the way, this is screenshot of the new gallery:&lt;br /&gt;&lt;br /&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-M0oJY2T1HMU/TiSSCr_ESyI/AAAAAAAAAEg/wn6fXtnFeXg/s1600/gsoc-gallery.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"&gt;&lt;img border="0" height="215" src="http://3.bp.blogspot.com/-M0oJY2T1HMU/TiSSCr_ESyI/AAAAAAAAAEg/wn6fXtnFeXg/s400/gsoc-gallery.png" width="400" /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;Plasma Components Gallery&lt;/td&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;&amp;nbsp;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;Part of the work was just straightforward, but there are some doubts I would like to ask which you think is the best, because some of the components behaviour are not defined.&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;/div&gt;&lt;ol&gt;&lt;li&gt;Should ScrollDecorator appear only when it's flickableItem is flicked, like in the mobile use case?&lt;/li&gt;&lt;li&gt;ScrollDecorator must should look like ScrollBar or have it's own appearence?&lt;/li&gt;&lt;li&gt;There are no SVG graphics for CheckBox, RadioButton and Switch. Currently there are placeholders. What can I do?&lt;/li&gt;&lt;li&gt;Currently, when you click a component, it gains the focus. This logic must be in the components as it is? Or should left it external to the button.&lt;/li&gt;&lt;li&gt;The Qt Components doesn't define any enabled property, for any components. I think it's important to have such a property in all interactive components. What do you think about it?&lt;/li&gt;&lt;/ol&gt;&lt;div&gt;Any other tip/suggestion is highly appreciated.&lt;br /&gt;&lt;br /&gt;I expect to give other update as soon as I have something more to report.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6447849434617576469-8212717900644766739?l=codecereal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://codecereal.blogspot.com/feeds/8212717900644766739/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://codecereal.blogspot.com/2011/07/back-to-plasma-components.html#comment-form' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6447849434617576469/posts/default/8212717900644766739'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6447849434617576469/posts/default/8212717900644766739'/><link rel='alternate' type='text/html' href='http://codecereal.blogspot.com/2011/07/back-to-plasma-components.html' title='Back to Plasma Components'/><author><name>Daker Fernandes Pinheiro</name><uri>http://www.blogger.com/profile/13246366811991245999</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/-5WD2HNXE45Q/TdWfTSkSEXI/AAAAAAAAACg/wL6O3AwG3tk/s220/isimp.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/-M0oJY2T1HMU/TiSSCr_ESyI/AAAAAAAAAEg/wn6fXtnFeXg/s72-c/gsoc-gallery.png' height='72' width='72'/><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6447849434617576469.post-795152447697714197</id><published>2011-07-15T11:10:00.000-07:00</published><updated>2011-07-15T11:10:36.097-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='pyside'/><category scheme='http://www.blogger.com/atom/ns#' term='slide'/><category scheme='http://www.blogger.com/atom/ns#' term='citi'/><category scheme='http://www.blogger.com/atom/ns#' term='qt'/><title type='text'>PySide on CITi</title><content type='html'>Here is the &lt;a href="http://www.pyside.org/"&gt;PySide&lt;/a&gt; slides from the python course in &lt;a href="http://www.citi.org.br/"&gt;CITi&lt;/a&gt;. Have fun!&lt;br /&gt;&lt;br /&gt;&lt;div id="__ss_8607466" style="width: 425px;"&gt;&lt;b style="display: block; margin: 12px 0 4px;"&gt;&lt;a href="http://www.slideshare.net/dakerfp/citi-pyside" title="CITi - PySide"&gt;CITi - PySide&lt;/a&gt;&lt;/b&gt;&lt;object height="355" id="__sse8607466" width="425"&gt;&lt;param name="movie" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=aulapyside-110715130719-phpapp01&amp;stripped_title=citi-pyside&amp;userName=dakerfp" /&gt;&lt;param name="allowFullScreen" value="true"/&gt;&lt;param name="allowScriptAccess" value="always"/&gt;&lt;embed name="__sse8607466" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=aulapyside-110715130719-phpapp01&amp;stripped_title=citi-pyside&amp;userName=dakerfp" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="355"&gt;&lt;/embed&gt;&lt;/object&gt;&lt;br /&gt;&lt;div style="padding: 5px 0 12px;"&gt;View more &lt;a href="http://www.slideshare.net/"&gt;presentations&lt;/a&gt; from &lt;a href="http://www.slideshare.net/dakerfp"&gt;Daker Fernandes&lt;/a&gt;.&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6447849434617576469-795152447697714197?l=codecereal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://codecereal.blogspot.com/feeds/795152447697714197/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://codecereal.blogspot.com/2011/07/pyside-on-citi.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6447849434617576469/posts/default/795152447697714197'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6447849434617576469/posts/default/795152447697714197'/><link rel='alternate' type='text/html' href='http://codecereal.blogspot.com/2011/07/pyside-on-citi.html' title='PySide on CITi'/><author><name>Daker Fernandes Pinheiro</name><uri>http://www.blogger.com/profile/13246366811991245999</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/-5WD2HNXE45Q/TdWfTSkSEXI/AAAAAAAAACg/wL6O3AwG3tk/s220/isimp.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6447849434617576469.post-1529468622511071178</id><published>2011-07-07T15:14:00.000-07:00</published><updated>2011-07-07T16:02:31.464-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='ai'/><category scheme='http://www.blogger.com/atom/ns#' term='protein'/><category scheme='http://www.blogger.com/atom/ns#' term='profile'/><category scheme='http://www.blogger.com/atom/ns#' term='hmm'/><category scheme='http://www.blogger.com/atom/ns#' term='python'/><category scheme='http://www.blogger.com/atom/ns#' term='ghmm'/><category scheme='http://www.blogger.com/atom/ns#' term='classification'/><category scheme='http://www.blogger.com/atom/ns#' term='markov'/><category scheme='http://www.blogger.com/atom/ns#' term='bio'/><title type='text'>Protein Profile with HMM</title><content type='html'>&lt;div style="text-align: justify;"&gt;&lt;a href="https://duckduckgo.com/k/?u=https%3A%2F%2Fsecure.wikimedia.org%2Fwikipedia%2Fen%2Fwiki%2FProfiling_practices"&gt;Profiling&lt;/a&gt; is the action of summarizing a set of data in a smaller mathematical model. One of the practical usages of profiling techniques is the classification of sequences. With a data set profile, you may calculate the distance of an instance to the model, and classify the instance trough this value.&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;Profiling proteins is a more specific case of profiling sequences. As we know from the previous ṕost about &lt;a href="http://codecereal.blogspot.com/2011/05/hidden-markov-models.html"&gt;Hidden Markov Models&lt;/a&gt; (HMMs) is a very robust mathematical model to represent probabilistically sequences. This post will detail how to profile sets of proteins with a HMM model.&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;The HMM model for this problem must be a HMM which when we calculate a probability of generation of a protein by the model, must give higher probabilities for proteins similar for the ones used to create the profile against the others.&lt;/div&gt;&lt;br /&gt;&lt;div style="text-align: justify;"&gt;To build the model, I've started with a &lt;a href="https://secure.wikimedia.org/wikipedia/en/wiki/Multiple_sequence_alignment"&gt;multiple sequence alignment&lt;/a&gt; to see how is the structure of the sequences. That means that our HMM will be a probabilistic representation of a multiple alignment. With the alignment we see that some columns are complete and some are almost, and the others have few data. These most common matches can be used as a common matches for our model and the deletions and insertions can be modelled as other states. Here is an example of a few sequences aligned and the core columns of the alignment (marked with an *):&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;code&gt;A C - - - - A T G&lt;br /&gt;T C A A C T A T C&lt;br /&gt;A C A C - - A G C&lt;br /&gt;A G A - - - A T C&lt;br /&gt;A C C G - - A T C&lt;br /&gt;* * * &amp;nbsp; &amp;nbsp; &amp;nbsp;       * * *&lt;br /&gt;&lt;/code&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;With this sample of alignment we have examples of insertions and deletions between the core alignments, which may have a state on the HMM to represent them. Note that insertions may occur on arbitrary times between the matching states. And the deletion states always replaces some matching. One possible HMM template for building the model is presented in the following picture:&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;table cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: left;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-U_L5ppibdy8/ThYPmKivksI/AAAAAAAAAEQ/m4IxdghiG5c/s1600/profile_hmm.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"&gt;&lt;img border="0" height="157" src="http://2.bp.blogspot.com/-U_L5ppibdy8/ThYPmKivksI/AAAAAAAAAEQ/m4IxdghiG5c/s400/profile_hmm.png" width="400" /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;HMM protein profile model&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;div style="text-align: justify;"&gt;Each matching state (M&lt;sub&gt;j&lt;/sub&gt;) is related to the matching on the j&lt;sup&gt;th&lt;/sup&gt; core alignment. The same applies for deletion state (D&lt;sub&gt;j&lt;/sub&gt;). The insertion is slightly different, because it represents the insertions between the core alignments, that's why it has one extra state, and this enable to represent states before the first and after the last alignment.&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;Our model will be like the one in the picture with the same length as the core alignments of a multiple alignment for a given set of sequences. However we should use &lt;a href="https://secure.wikimedia.org/wikipedia/en/wiki/Maximum_likelihood"&gt;maximum likelihood&lt;/a&gt; to estimate the transitions and emission for each state. The easiest way to count the total emissions/transitions, is to thread each sequence to be profiled in the model. For each symbol in the aligned sequence you must check if the symbol is in the core alignment. If it is, then increment the count of that state to the next match state, otherwise, you go to the insertion state. If it is a deletion, go to the deletion state and increment the transition. Finally, to calculate the probability for the transitions, just divide the count of each transition and divide it by all the states leaving the same state. It's important to notice that we have a stopping state, and this is a special state that has no transitions.&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;Note that is important to initialize the model with pseudo counts at each possible transitions. Adding this pseudo count, we let our model less rigid and avoid overfitting to the train set. Otherwise we could have some zero probabilities for some sequences.&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;To create the emissions probabilities, at each match state, you also have to count which symbol was emitted and then increment them. To calculate the probability, just divide it by the total symbols matched in the threading process.&amp;nbsp;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;The similar process could be done with the insertion. However, the insertion states are characterized by having a low occurrence rate and this may lead us directly to a overfitting problem because the number of emissions must be too small. To avoid this we should use the background distribution as the emissions probabilities of each insertion state. The background distribution is the probability of the occurrence of a  given amino acid in the entire protein set. To calculate this, count each  amino acid type for all the train set sequence and then divided by the total count.&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;For the deletion states, it's important to notice that it is a silent state. It doesn't emit any symbol at all. To signalize it in ghmm, just let all the emissions probabilities with 0. Note that the end state is also a silent state once we have no emission associated to it.&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;span style="color: red;"&gt;ALERT:&lt;/span&gt; the loglikelihood is the only function in the ghmm library which handles silent states correctly.&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;At this point, the model is ready for use. However we have a problem of how to classify a new protein sequence. A threshold could be used to divide the two classes. However, the most common alternative is to use a null model to compare with it. The null model is a model which aims represents any protein with similar probability as any other. With this two models we could take a sequence and compare if it is more similar to a general definition of a protein or to a specific family. This model should model a sequence with average size equal to the aligned sequences being handled, and should emit any kind of symbol at each position. A common alternative for creating the null model, is using a single insertion state, which goes to a stopping state with probability of 1 divided by the average length of&amp;nbsp; sequences in the train set. For the emissions probability, we should use the background distribution, because this is related to the general amino acid distribution. At the end, the model should be like this:&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;table cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: left;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-G3QMK2udvhY/ThYhT6KSdtI/AAAAAAAAAEU/SiWmZ4a4-yE/s1600/null_model.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"&gt;&lt;img border="0" height="182" src="http://3.bp.blogspot.com/-G3QMK2udvhY/ThYhT6KSdtI/AAAAAAAAAEU/SiWmZ4a4-yE/s400/null_model.png" width="400" /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;Null Model&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;For testing the proposed model, I used a set of 100 &lt;a href="https://secure.wikimedia.org/wikipedia/en/wiki/Globin"&gt;globin&lt;/a&gt; proteins from the &lt;a href="http://www.ncbi.nlm.nih.gov/"&gt;NCBI&lt;/a&gt; protein repository as a train set to build a profile model, and used &lt;a href="http://ghmm.sourceforge.net/"&gt;ghmm&lt;/a&gt; to build and test the model.&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;To test if our model corresponds to our expectations, 100 globins different from the ones in the trains set were used with 1800 other random proteins with similar length. The loglikelihood function from the ghmm library to calculate the similarity index. The classification of the globins versus non globins, was a comparison between the likelihood of the protein with the globins profile hmm, and the null model. This test gave us 100% of accuracy! To display this result graph, each sequence was pointed out in a graph where the horizontal axis displays the length of the sequence and the vertical the log of globins / null models likelihood (or globins - null model loglikelihood). The globins are plotted in red an the others in blue. Proteins over the zero line are classified as globins, and below means they aren't globins. The graphs show us a clear distinction between the classes, and that our model is very precise for this problem.&lt;/div&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-CJnHWh4Ph-g/ThYjznzwy0I/AAAAAAAAAEY/Z9AbmK3hjOo/s1600/diff_profile.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"&gt;&lt;img border="0" height="300" src="http://4.bp.blogspot.com/-CJnHWh4Ph-g/ThYjznzwy0I/AAAAAAAAAEY/Z9AbmK3hjOo/s400/diff_profile.png" width="400" /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;Length x log(Globin's Profile Model Likelihood/Null Model Likelihood)&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6447849434617576469-1529468622511071178?l=codecereal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://codecereal.blogspot.com/feeds/1529468622511071178/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://codecereal.blogspot.com/2011/07/protein-profile-with-hmm.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6447849434617576469/posts/default/1529468622511071178'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6447849434617576469/posts/default/1529468622511071178'/><link rel='alternate' type='text/html' href='http://codecereal.blogspot.com/2011/07/protein-profile-with-hmm.html' title='Protein Profile with HMM'/><author><name>Daker Fernandes Pinheiro</name><uri>http://www.blogger.com/profile/13246366811991245999</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/-5WD2HNXE45Q/TdWfTSkSEXI/AAAAAAAAACg/wL6O3AwG3tk/s220/isimp.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/-U_L5ppibdy8/ThYPmKivksI/AAAAAAAAAEQ/m4IxdghiG5c/s72-c/profile_hmm.png' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6447849434617576469.post-3901326781520091962</id><published>2011-06-22T16:48:00.000-07:00</published><updated>2011-06-22T16:51:04.035-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='javascript'/><category scheme='http://www.blogger.com/atom/ns#' term='sha1'/><category scheme='http://www.blogger.com/atom/ns#' term='cryptography'/><category scheme='http://www.blogger.com/atom/ns#' term='greasemonkey'/><category scheme='http://www.blogger.com/atom/ns#' term='hash'/><category scheme='http://www.blogger.com/atom/ns#' term='cypher'/><category scheme='http://www.blogger.com/atom/ns#' term='twitter'/><category scheme='http://www.blogger.com/atom/ns#' term='security'/><category scheme='http://www.blogger.com/atom/ns#' term='aes'/><title type='text'>Encrypted Tweets</title><content type='html'>&lt;div style="text-align: justify;"&gt;As a cryptography course project, me and &lt;a href="http://twitter.com/laissandrade"&gt;@laisandrade&lt;/a&gt; developed a twitter extension which allows us to tweet in a way that only a group of followers can decrypt and understand my tweet. The goal is to secure broadcasting tweets to a group of followers and avoid anyone else to understand it's content. This definition of security is safe against a &lt;a href="https://secure.wikimedia.org/wikipedia/en/wiki/Eavesdropping"&gt;eavesdropping attack&lt;/a&gt;, which is considered secure enough for the common sense for cryptography, what is not true. Another concerns emerges when we consider other attacks or ways of extracting informations from the cyphertext, as an example the &lt;a href="https://secure.wikimedia.org/wikipedia/en/wiki/Known-plaintext_attack"&gt;Known-Plaintext Attack&lt;/a&gt;.&lt;br /&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-jwhEXN9bnRk/TgG1xvPi7KI/AAAAAAAAAD0/MElSjV1dYLM/s1600/scheme.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"&gt;&lt;img border="0" height="496" src="http://3.bp.blogspot.com/-jwhEXN9bnRk/TgG1xvPi7KI/AAAAAAAAAD0/MElSjV1dYLM/s640/scheme.png" width="640" /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;Configuration of the Encrypted Tweets problem - Alice securely broadcasting messages to a specific group&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;In this project we should also consider other security issues such as forgery attacks and secure information storage. The only real world problem outside the project's scope was the private key distribution. We assume that the user who wants to share it's keys to his followers could use other mechanisms such as a Diffie-Hellman key exchange or using a trusted key broker. The full project's specification can be found &lt;a href="http://crypto.stanford.edu/%7Edabo/cs255/hw_and_proj/proj1.pdf"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;For doing the task we used the &lt;a href="https://addons.mozilla.org/en-US/firefox/addon/greasemonkey/"&gt;Grease Monkey&lt;/a&gt; Firefox's plugin. This plugin allow us to run custom javascript scripts in specifics web pages, so we can add functionalities to known web pages. In fact, we used a &lt;a href="http://crypto.stanford.edu/%7Edabo/cs255/hw_and_proj/cs255.user.js"&gt;script stub&lt;/a&gt; for doing it. This script adds a encrypt tweet functionality as well as a key manager at password settings page. It also gives a function to store (unsafely) the user information. Was also our job to make it secure. We assumed that the script couldn't be cracked, the focus where we were concerned was about the information safety outside the script execution memory.&lt;br /&gt;&lt;br /&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-Qdk5ZpgWxFk/TgFCju_gjWI/AAAAAAAAADs/LmNMRFR0ziU/s1600/twitter.png" style="margin-left: auto; margin-right: auto;"&gt;&lt;img border="0" height="153" src="http://2.bp.blogspot.com/-Qdk5ZpgWxFk/TgFCju_gjWI/AAAAAAAAADs/LmNMRFR0ziU/s400/twitter.png" width="400" /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;Encrypt button and Group selection close to Tweet butto&lt;code&gt;n&lt;/code&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;code&gt;&lt;br /&gt;============================== Encryption Specification ==============================&lt;br /&gt;&lt;br /&gt;Encripted Message:&lt;br /&gt;+--------+-----+--------------------------------------+&lt;br /&gt;| "aes:" | Ctr | Extended-AES(Content, Ctr, GroupKey) |&lt;br /&gt;&lt;/code&gt;&lt;code&gt;+--------&lt;/code&gt;&lt;code&gt;+-----+--------------------------------------+&lt;br /&gt;* The Ctr is the base counter for the CTR block chaining cypher. Every message picks a random Ctr.&lt;br /&gt;* The group key is the key to the group whom members are the only who should understand the message.&lt;br /&gt;* The Extended-AES function will be detailed after.&lt;/code&gt;&lt;br /&gt;&lt;code&gt;* the "aes:" indicator is not a security threat due to&amp;nbsp;&lt;a href="https://secure.wikimedia.org/wikipedia/en/wiki/Kerckhoffs%27s_Principle"&gt;Kerckhoffs's Principle&lt;/a&gt;.&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;&lt;code&gt; Content:&lt;br /&gt;+---------------+---------+&lt;br /&gt;| sha1(Message) | Message |&lt;br /&gt;+---------------+---------+&lt;br /&gt;* The sha1 cryptographic hash is used to check the integrity of the stored keys and avoid modifications on the massage bypass as a real message.&lt;br /&gt;* The sha1 produces a 160 bits length message, which is convertes into a base64 string to send it over twitter.&lt;/code&gt;&lt;br /&gt;&lt;code&gt;* This &lt;a href="https://secure.wikimedia.org/wikipedia/en/wiki/Message_authentication_code"&gt;Message Authentication Code&lt;/a&gt; (MAC) is necessary for &lt;a href="https://secure.wikimedia.org/wikipedia/en/wiki/Chosen-ciphertext_attack"&gt;Chosen Ciphertext Security&lt;/a&gt; (CCA Security).&lt;br /&gt;&lt;br /&gt;Message: the original message&lt;br /&gt;======================================================================================&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;For our Extended-AES implementation, we used a &lt;a href="https://secure.wikimedia.org/wikipedia/en/wiki/Cipher_block_chaining"&gt;Cypher Block Chaining&lt;/a&gt; to extend the &lt;a href="https://secure.wikimedia.org/wikipedia/en/wiki/Advanced_Encryption_Standard"&gt;AES&lt;/a&gt;  block cypher, that we use as a &lt;a href="https://secure.wikimedia.org/wikipedia/en/wiki/Block_cypher"&gt;block cypher&lt;/a&gt;. To do this we used a Counter (CTR) chaining to extend the  cypher. Note that the CTR mode uses a base counter (Ctr) as an &lt;a href="https://secure.wikimedia.org/wikipedia/en/wiki/Initialization_vector"&gt;Initialization Vector&lt;/a&gt; (IV) which also must be transmitted to enable to be decrypted. This IV is chosen at random by our &lt;a href="https://secure.wikimedia.org/wikipedia/en/wiki/Pseudorandom_generator"&gt;pseudorandom generator&lt;/a&gt;. If the IV is not truly random, an attacker may predict the IV sequence and get more information from the message. A real problem faced by pseudorandom generators are when the seed selection are not truly random. To handle this, the script stub creates a seed using the mouse positions at the first seconds of running (an entropy test is also used to check the randomness of the seed).&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-0hINPdoLd04/TgHHRz3Z0FI/AAAAAAAAAD4/eSwDbHUVknc/s1600/ctr_encryption.png" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="160" src="http://1.bp.blogspot.com/-0hINPdoLd04/TgHHRz3Z0FI/AAAAAAAAAD4/eSwDbHUVknc/s400/ctr_encryption.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;When someone receives a "aes:" tagged message, the script checks the author of the message and get the key provided by him. To decrypt, we also use the Extended-AES since it is symmetric (due to XOR reversibility property). At last the SHA-1 hash is confronted with the message to check if the key is correct or the message suffered a forgery attempt.&lt;br /&gt;&lt;br /&gt;To store the groups keys we used the grease monkey's key value storage (GM_setValue and GM_getValue functions). We store and recover the values from the key &lt;code&gt;"twit-key-" + login&lt;/code&gt;, which is stored as plaintext in the hard drive. To avoid the stealth of this information from other users, virus or worm we added an encryption layer over it. The stored data representation developed was this one:&lt;br /&gt;&lt;br /&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-jGfA8gy_YW4/TgFDmT2K-8I/AAAAAAAAADw/mIk7kHAkEh4/s1600/twitter2.png" style="margin-left: auto; margin-right: auto;"&gt;&lt;img border="0" height="265" src="http://4.bp.blogspot.com/-jGfA8gy_YW4/TgFDmT2K-8I/AAAAAAAAADw/mIk7kHAkEh4/s400/twitter2.png" width="400" /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;Keys management screen&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;code&gt;&lt;br /&gt;==============================&amp;nbsp;&amp;nbsp;   Safe Storaging Specification &amp;nbsp; ==============================&lt;br /&gt;&lt;br /&gt;Stored Data:&lt;br /&gt;AES(Data, MasterKey)&lt;br /&gt;&lt;br /&gt;MasterKey:&lt;br /&gt;User secret key to access all the other keys. This key requires user input every time it logs in. &lt;br /&gt;&lt;br /&gt;Data:&lt;br /&gt;+---------------------+-----------------+&lt;br /&gt;|&amp;nbsp;&amp;nbsp; sha1(GroupKeys)&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp; Groups Keys&amp;nbsp;&amp;nbsp; |&lt;br /&gt;+---------------------+-----------------+&lt;br /&gt;* This cryptographic hash is used to check the integrity of the stored keys.&lt;br /&gt;* A malicious agent may override the values on the keys and may set the keys he can decrypt.&lt;br /&gt;&lt;br /&gt;Groups Keys:&lt;br /&gt;+----------+----------+----------+----------+---------+----------+----------+&lt;br /&gt;|&amp;nbsp;&amp;nbsp; Item &amp;nbsp; |&amp;nbsp;&amp;nbsp; "$$" &amp;nbsp; |&amp;nbsp;&amp;nbsp; Item &amp;nbsp; | &amp;nbsp; "$$" &amp;nbsp; |&amp;nbsp;&amp;nbsp; ... &amp;nbsp; | &amp;nbsp; "$$" &amp;nbsp; | &amp;nbsp; Item &amp;nbsp; |&lt;br /&gt;+----------+----------+----------+----------+---------+----------+----------+&lt;br /&gt;&lt;br /&gt;Item:&lt;br /&gt;+----------+---------+-----------+---------+---------+&lt;br /&gt;|&amp;nbsp;&amp;nbsp; User &amp;nbsp; |&amp;nbsp;&amp;nbsp; "$"&amp;nbsp;&amp;nbsp; | &amp;nbsp; Group&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp; "$"&amp;nbsp;&amp;nbsp; | &amp;nbsp; Key&amp;nbsp;&amp;nbsp; |&lt;br /&gt;+----------+---------+-----------+---------+---------+&lt;br /&gt;&lt;br /&gt;User: base64 string&lt;br /&gt;Group: base64 string&lt;br /&gt;Key: base64 string&lt;br /&gt;&lt;br /&gt;==============================================================================================&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;It's important to notice that we implemented the &lt;a href="https://secure.wikimedia.org/wikipedia/en/wiki/Sha1"&gt;SHA-1&lt;/a&gt; hashing algorithm to ensure the data wasn't modified while stored or transmitted. We trust it wasn't modified due to the difficulty to crack the AES cypher to forge a new MAC. And if the attacker try to change bits from the stored data, we trust that SHA-1 hash collision difficulty will be enough to ensure us that someone will be able to produce valid message, hash pairs without looking at the plaintext. If the pair is corrupted, then the user is alerted that someone modified his keys. By doing this, we may disable an attacker to modify bits from our keys and have better chances to crack the scheme. The SHA-1 also has some theoretical weakness, however, in practice it still is a very strong cryptographic hash function.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6447849434617576469-3901326781520091962?l=codecereal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://codecereal.blogspot.com/feeds/3901326781520091962/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://codecereal.blogspot.com/2011/06/encrypted-tweets.html#comment-form' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6447849434617576469/posts/default/3901326781520091962'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6447849434617576469/posts/default/3901326781520091962'/><link rel='alternate' type='text/html' href='http://codecereal.blogspot.com/2011/06/encrypted-tweets.html' title='Encrypted Tweets'/><author><name>Daker Fernandes Pinheiro</name><uri>http://www.blogger.com/profile/13246366811991245999</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/-5WD2HNXE45Q/TdWfTSkSEXI/AAAAAAAAACg/wL6O3AwG3tk/s220/isimp.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/-jwhEXN9bnRk/TgG1xvPi7KI/AAAAAAAAAD0/MElSjV1dYLM/s72-c/scheme.png' height='72' width='72'/><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6447849434617576469.post-4433850595377808139</id><published>2011-06-15T14:13:00.000-07:00</published><updated>2011-06-21T17:39:14.380-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='epassport'/><title type='text'>The ePassport Standard and it's Cryptographic Scheme</title><content type='html'>&lt;div style="text-align: justify;"&gt;ePassport is the name given for the new passport specification standardized by the &lt;a href="http://www.icao.int/"&gt;International Civil Aviation Organization&lt;/a&gt; (ICAO). It was created to speed up  and standardize the immigration procedures. This standard specifies &lt;a href="http://en.wikipedia.org/wiki/Radio-frequency_identification"&gt;RFID&lt;/a&gt; as the communication channel between the passport and the reader. The main feature present on it is the insertion of biometric data into the passport for biometric checks, including facial recognition, fingerprint and iris information. To keep this important data protected, an interesting cryptographic layer was added in addition to the usual document falsification mechanisms such as watermarks.&lt;/div&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;a href="http://www.auriverde.am.br/site/img/upload/noticias/passaporte-com-chip-ja-e-emitido-em-todo-o-brasil1297706262.jpg" style="margin-left: auto; margin-right: auto;"&gt;&lt;img border="0" height="320" src="http://www.auriverde.am.br/site/img/upload/noticias/passaporte-com-chip-ja-e-emitido-em-todo-o-brasil1297706262.jpg" width="232" /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;Brazilian ePassport. One of few which has full compatibility with the ICAO specification.&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;div style="text-align: justify;"&gt;The name, nationality, birth date and other personal information as well as the biometric informations are held on a contactless chip inside the passport. This set of information are securely handled and is called the &lt;b&gt;Machine Readable Zone&lt;/b&gt; (MRZ) and the chip asserts their security. These informations are transmitted to a ePassport reader at immigration process. As the embedded chip has low processing power, the biometric data is transmitted to a reader so it can be checked against the porter of the passport.&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-pQaH6Psj-3k/TfkgFxRaQUI/AAAAAAAAADo/EIFaz2ZEmrA/s1600/epass.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"&gt;&lt;img border="0" height="640" src="http://3.bp.blogspot.com/-pQaH6Psj-3k/TfkgFxRaQUI/AAAAAAAAADo/EIFaz2ZEmrA/s640/epass.jpg" width="538" /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;Anatomy of an ePassport&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;As the physical communication layer of the ePassport broadcasts the messages, even with the low range (less than a meter) RFID communication. But as we know an eavesdropper can be very determined to listen the communication ;-). I also has an electromagnetic shield on the passport cover, however it doesn't assure to block the communication between the chip and the outside world when it's closed, however it's an extra obstacle. Another issue to prevent is the passport and digital data forgery. Otherwise a falsary could get someone else passport and insert it's biometrics information and easily bypass the immigration. As we may see, the ePassport is a complete meal for cryptography specialists.&lt;/div&gt;&lt;br /&gt;&lt;div style="text-align: justify;"&gt;Starring at this cryptographic layer we have:&amp;nbsp;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;A &lt;b&gt;unique id&lt;/b&gt; used as a private key. This key is hardcoded in the chip manufacturing process and it's non traceable. Doing &lt;a href="https://secure.wikimedia.org/wikipedia/en/wiki/Reverse_engineering#Reverse_engineering_of_integrated_circuits/smart_cards"&gt;reverse engineering&lt;/a&gt; on this kind of chip it's really difficult because the equipment to do it is extremely expensive and hard to obtain, it is only available to large chip manufacturers.&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;An access control to prevent an attacker to force the ePassport to send information in the MRZ to readers unless the passport owner authorizes it. Otherwise an attacker could use a reader and simply request the biometric information to the passport. It uses a protocol called &lt;b&gt;Basic Access Control (BAC)&lt;/b&gt; to give access for the reader. It consists on a &lt;a href="https://secure.wikimedia.org/wikipedia/en/wiki/Optical_character_recognition"&gt;OCR&lt;/a&gt; readable code inside the passport. With this code, the reader can say to the passport it is an authorized reader, and then transmit data to it. The BAC code is also used as key for the encrypted transmission of the data, making an eavesdropping attack more difficult. It's important to note that the BAC security entirely relies on the physical access to the passport. The BAC is inside the ICAO standard but doesn't obligates it's support.&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-WTteakM_i-I/TfkWHWWC86I/AAAAAAAAADY/8sQYhZSYlE8/s1600/bac.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"&gt;&lt;img border="0" height="381" src="http://2.bp.blogspot.com/-WTteakM_i-I/TfkWHWWC86I/AAAAAAAAADY/8sQYhZSYlE8/s400/bac.png" width="400" /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;BAC happy path's sequence diagram&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;The ICAO also standardizes a authentication for readers called &lt;b&gt;Extended Access Control   (EAC)&lt;/b&gt;. It consists of sets of signatures of terminals stored in the passport's chip. These signatures are changed periodically by the countries (which signs them) on the terminals to avoid keys stolen from equipments being used for much time. Some MRZ sensitive data, such as iris and fingerprint, are just transmitted to readers if it is EAC authenticated. Other informations are optional.&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-JdHDUqrcNZE/TfkbvWBNgBI/AAAAAAAAADk/2nHAvURFOzQ/s1600/eac.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"&gt;&lt;img border="0" height="263" src="http://3.bp.blogspot.com/-JdHDUqrcNZE/TfkbvWBNgBI/AAAAAAAAADk/2nHAvURFOzQ/s400/eac.png" width="400" /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;EAC happy path's sequence diagram&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;ul style="text-align: justify;"&gt;&lt;/ul&gt;&lt;div style="text-align: justify;"&gt;The &lt;b&gt;Passive Authentication (PA)&lt;/b&gt; is a cryptographic mechanism&amp;nbsp; to ensure the data in the passport is valid by the authority emitting the passports. The PA is a kind of digital signature where the country who emits passports ensures the veracity of the data with a asymmetric Country Verification Certificate Authority (CVCA), a kind of &lt;a href="https://secure.wikimedia.org/wikipedia/en/wiki/Message_authentication_code"&gt;message authentication code&lt;/a&gt; (MAC). This process is known as &lt;a href="https://secure.wikimedia.org/wikipedia/en/wiki/Asymmetric_cryptography"&gt;asymmetric cryptographic&lt;/a&gt; hashing or &lt;a href="https://secure.wikimedia.org/wikipedia/en/wiki/Digital_signature"&gt;digital signing&lt;/a&gt;. The PA MAC is created by the country when emits a passport, which must keep a signing key safe, which is used to create the MAC code. Then it publish the public key (all readers must know the countries public keys) which can be used to check if the MAC is valid and it was really generated by the organization which possess the signing key. The PA scheme relies on mathematical functions which are easy to encrypt a MAC with the private key, but computationally hard without them. The public key can easily check if the MAC was generated by the ones who posses the private key, and detect forged data. This process makes the forgery of digital data a really hard computational problem, and ensures with negligible chance of failure that the data is valid. All the MRZ data is signed using such process. Using PA is mandatory by ICAO standards.&lt;/div&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-O0UJv0Zxb8M/TfkWv1z3GZI/AAAAAAAAADc/fthS6pLy2Qc/s1600/pa.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"&gt;&lt;img border="0" height="262" src="http://4.bp.blogspot.com/-O0UJv0Zxb8M/TfkWv1z3GZI/AAAAAAAAADc/fthS6pLy2Qc/s400/pa.png" width="400" /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;PA happy path's sequence diagram&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;div style="text-align: justify;"&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;The &lt;b&gt;Active Authentication (AA)&lt;/b&gt; is a kind of &lt;a href="https://secure.wikimedia.org/wikipedia/en/wiki/Challenge-response_authentication"&gt;challenge response authentication&lt;/a&gt; mechanism to prevent the cloning of passports. It relies on the untraceable chip key. Cloning the MRZ data is a simple task to be done, however you can't get a passport private key due to the difficulty of doing reverse engineering to obtain it. The AA works as a challenge to the passport, so you can verify if it really is. The reader encrypts a random message that only him knows it's content with the passport's public key. The message is then sent to the passport and it must decrypt and response to the reader what was the original message. This process works because of the duality principle of asymmetric cryptography. This concept of challenge is the same concept behind &lt;a href="https://secure.wikimedia.org/wikipedia/en/wiki/CAPTCHA"&gt;CAPTCHAs&lt;/a&gt; and the &lt;a href="https://secure.wikimedia.org/wikipedia/en/wiki/Turing_test"&gt;Turing Test&lt;/a&gt;.&lt;/div&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-wgEg8ANTfvU/TfkYAzu6PPI/AAAAAAAAADg/TPfbXj_VoZg/s1600/aa.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"&gt;&lt;img border="0" height="400" src="http://1.bp.blogspot.com/-wgEg8ANTfvU/TfkYAzu6PPI/AAAAAAAAADg/TPfbXj_VoZg/s400/aa.png" width="377" /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;AA hapy path's sequence diagram&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6447849434617576469-4433850595377808139?l=codecereal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://codecereal.blogspot.com/feeds/4433850595377808139/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://codecereal.blogspot.com/2011/06/epassport-standard-and-its.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6447849434617576469/posts/default/4433850595377808139'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6447849434617576469/posts/default/4433850595377808139'/><link rel='alternate' type='text/html' href='http://codecereal.blogspot.com/2011/06/epassport-standard-and-its.html' title='The ePassport Standard and it&apos;s Cryptographic Scheme'/><author><name>Daker Fernandes Pinheiro</name><uri>http://www.blogger.com/profile/13246366811991245999</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/-5WD2HNXE45Q/TdWfTSkSEXI/AAAAAAAAACg/wL6O3AwG3tk/s220/isimp.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/-pQaH6Psj-3k/TfkgFxRaQUI/AAAAAAAAADo/EIFaz2ZEmrA/s72-c/epass.jpg' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6447849434617576469.post-1021140542982002017</id><published>2011-06-11T08:28:00.001-07:00</published><updated>2011-07-14T04:42:52.184-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='decorators'/><category scheme='http://www.blogger.com/atom/ns#' term='dynamic programming'/><category scheme='http://www.blogger.com/atom/ns#' term='python'/><category scheme='http://www.blogger.com/atom/ns#' term='cache'/><category scheme='http://www.blogger.com/atom/ns#' term='c'/><title type='text'>Optimizing Functions with Python Caching Decorators</title><content type='html'>&lt;div style="text-align: justify;"&gt;On these last months I've been solving some problems (such as some &lt;a href="http://en.wikipedia.org/wiki/Hidden_Markov_model"&gt;HMMs&lt;/a&gt; algorithms) which the best solutions involves some kind of &lt;a href="http://en.wikipedia.org/wiki/Dynamic_programming"&gt;dynamic programming&lt;/a&gt;.  Some of them are quite simple to implement, but their recursive  formulation are far more intuitive. The problem is that even in  functional languages, the recursive functions aren't well handled unless  you some mechanism like &lt;a href="http://en.wikipedia.org/wiki/Tail_call"&gt;tail call&lt;/a&gt;,  which aren't intuitive as we would like to. The simplest example that  comes in my mind is the fibonacci function which is usually defined as:&lt;/div&gt;&lt;code&gt;&lt;br /&gt;fib(0) = 1&lt;br /&gt;fib(1) = 1&lt;br /&gt;fib(n) = fib(n-1) + fib(n-2)&lt;/code&gt;&lt;br /&gt;&lt;code&gt;&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;div style="text-align: justify;"&gt;As we know, almost all the languages compilers and interpreters use the &lt;a href="http://en.wikipedia.org/wiki/Call_stack"&gt;call stack&lt;/a&gt; to call the recursives cases on functions being executed. We can analyze the following C fibonacci version:&lt;br /&gt;&lt;code&gt;&lt;br /&gt;int fib(n)&lt;br /&gt;{&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; if (n == 0 || n == 1)&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; return 1;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; else&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; return fib(n-1) + fib(n-2);&lt;br /&gt;}&lt;/code&gt;&lt;br /&gt;&lt;code&gt;&lt;br /&gt;&lt;/code&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;It is really  simple to understand when contrasted with the definition. But, if we  make a trace of the the program (even with a small input value) we'll  have something like the following evaluation order:&lt;br /&gt;&lt;code&gt;&lt;br /&gt;fib(6) = fib(5) + fib(4)&lt;br /&gt;fib(5) = fib(4) + fib(3)&lt;br /&gt;fib(4) = fib(3) + fib(2)&lt;br /&gt;fib(3) = fib(2) + fib(1)&lt;br /&gt;fib(2) = fib(1) + fib(0)&lt;br /&gt;fib(3) = fib(2) + fib(1)&lt;br /&gt;fib(2) = fib(1) + fib(0)&lt;br /&gt;fib(4) = fib(3) + fib(2)&lt;br /&gt;fib(3) = fib(2) + fib(1)&lt;br /&gt;fib(2) = fib(1) + fib(0)&lt;br /&gt;fib(3) = fib(2) + fib(1)&lt;br /&gt;fib(2) = fib(1) + fib(0)&lt;br /&gt;&lt;/code&gt;&lt;/div&gt;&lt;br /&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;a href="http://black.goucher.edu/%7Ekelliher/cs18/feb23fibonacci.gif" style="margin-left: auto; margin-right: auto;"&gt;&lt;img border="0" height="320" src="http://black.goucher.edu/%7Ekelliher/cs18/feb23fibonacci.gif" width="285" /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;Call stack for fib(6)&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;div style="text-align: justify;"&gt;As we can see,  there is a repetition of the calculation of fib 4 to 1, many times and  is something we can avoid. In fact, the complexity of this solution has a  exponencial computational complexity because for each n from input we  branch it in 2 until it reachs 0 or 1 approximately n times, leading us  into a O(2&lt;sup&gt;n&lt;/sup&gt;) complexity. A simple way to avoid it, is converting into a interactive form:&lt;br /&gt;&lt;code&gt;&lt;br /&gt;int fib(int n)&lt;br /&gt;{&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; int current = 1;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; int previous = 1;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; int i;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; for (i = 1; i &amp;lt; n; i++) {&lt;/code&gt;&lt;/div&gt;&lt;code&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; int temp = current; // XXX: nasty&lt;/code&gt;&lt;br /&gt;&lt;code&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; current += previous;&lt;/code&gt;&lt;br /&gt;&lt;code&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; previous = temp;&lt;/code&gt;&lt;br /&gt;&lt;code&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;&lt;code&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; return current;&lt;/code&gt;&lt;br /&gt;&lt;code&gt;}&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;&lt;code&gt; &lt;/code&gt;&lt;br /&gt;The same result is achieved by using &lt;a href="https://secure.wikimedia.org/wikipedia/en/wiki/Tail_call"&gt;tail call&lt;/a&gt; for functional languages.&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: justify;"&gt;As  you can see, it obfuscates the intuitive definition given in as the  recursive formulation. But we still have a problem whenever we calculate  fib(n), we have to recalculate it's previous results even if they was  previously calculated. If this function is used many times in our  program it will take a lot of processing re-computing many of the  values. We can avoid this by using the dynamic programming, which keeps  the record of previously calculated results. The drawback of this  technique is the memory usage, which for large entries can become a  bottleneck. However, processing usually is a more rare computer  resource. A C implementation (not the most elegant) for it is:&lt;/div&gt;&lt;code&gt;&lt;br /&gt;// XXX: kids, don't do this at home&lt;br /&gt;int fib_results[10000];&lt;br /&gt;int last_fib;&lt;br /&gt;&lt;br /&gt;int fib(int n)&lt;br /&gt;{&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; if (n &amp;lt;= last_fib &lt;br /&gt;&lt;code&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; return fib_results[n];&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;&lt;code&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; int current = fib_results[last_fib-1];&lt;/code&gt;&lt;br /&gt;&lt;code&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; int previous = fib_results[last_fib-2];&lt;/code&gt;&lt;br /&gt;&lt;code&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; for (; last_fib &amp;lt; n; last_fib++) {&lt;/code&gt;&lt;br /&gt;&lt;code&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; int temp = current;&lt;/code&gt;&lt;br /&gt;&lt;code&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; current += previous;&lt;/code&gt;&lt;br /&gt;&lt;code&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; fib_results[last_fib] = current;&lt;/code&gt;&lt;br /&gt;&lt;code&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; previous = temp;&lt;/code&gt;&lt;br /&gt;&lt;code&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;&lt;code&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; return current;&lt;/code&gt;&lt;br /&gt;&lt;code&gt;}&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;&lt;code&gt;int main()&lt;/code&gt;&lt;br /&gt;&lt;code&gt;{&lt;/code&gt;&lt;br /&gt;&lt;code&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; fib_results[0] = 1;&lt;/code&gt;&lt;br /&gt;&lt;code&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; fib_results[1] = 1;&lt;/code&gt;&lt;br /&gt;&lt;code&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; last_fib = 1;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;&lt;code&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; // ... other stuff ...&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;&lt;code&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; return 0;&lt;/code&gt;&lt;br /&gt;&lt;code&gt;}&amp;nbsp;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;div style="text-align: justify;"&gt;As we can see, dynamic  programming isn't too hard to implement. On the other hand, reading a  code it's a though task to do unless you are already familiar with the  algorithm.&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;div style="text-align: justify;"&gt;If  we extract what is dynamic programming fundamental concept, which is  "store pre-computed results", we find a regularity in every recursive  function which we can be transformed into a dynamic programming one. One  of the reasons I love python because it's easy to use meta-programming  concepts, and that's what I will use to transform recursive functions  into it's dynamic form in a ridiculous easy way using function  decorators.&lt;/div&gt;&lt;div style="text-align: justify;"&gt;Function  decorators (or annotations in Java) are a form of meta-programming for  functions. It extends functions with some functionalities, such as  debugging, tracing, adding meta-data to the function, synchronization or  &lt;a href="https://secure.wikimedia.org/wikipedia/en/wiki/Memoization"&gt;memoization&lt;/a&gt;  (not memorization) of values, which is a great way of optimizing  recursive functions by caching their results (if you have enough memory  available). One possible implementation of memoitized decorator in  python is the following:&lt;/div&gt;&lt;code&gt;&lt;br /&gt;def cached(function):&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; cache = {}&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; def wrapper(*args):&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if args in cache:&lt;br /&gt;&lt;/code&gt;&lt;code&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/code&gt;&lt;code&gt;return cache[args]&lt;/code&gt;&lt;br /&gt;&lt;code&gt;&lt;/code&gt;&lt;code&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; else:&lt;/code&gt;&lt;br /&gt;&lt;code&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;/code&gt;&lt;code&gt;             result = function(*args)&lt;/code&gt;&lt;br /&gt;&lt;code&gt;&amp;nbsp;&lt;/code&gt;&lt;code&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;/code&gt;&lt;code&gt;             cache[args] = result&lt;/code&gt;&lt;code&gt;&amp;nbsp;&lt;/code&gt;&lt;br /&gt;&lt;code&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;/code&gt;&lt;code&gt;             return result&lt;br /&gt;&lt;br /&gt;&lt;/code&gt;&lt;code&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/code&gt;&lt;code&gt;return wrapper&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: justify;"&gt;Note  that I'm not using kwargs because they're not hashable, such as the  tuple args, and will add a few complexity in the example. See also that  we a have a function that returns another function, which uses a given  one to calculate results and store them in a cache. To cache our fib  function we may use the following code:&lt;code&gt;&lt;/code&gt;&lt;br /&gt;&lt;code&gt;&lt;br /&gt;@cached&lt;br /&gt;def fib(n):&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; if n == 0 or n == 1:&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; return 1&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; else:&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; return fib(n-1) + fib(n-2)&lt;br /&gt;&lt;br /&gt;# or in a not so clean version:&lt;br /&gt;def normal_fib(n):&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; if n == 0 or n == 1:&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; return 1&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; else:&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; return fib(n-1) + fib(n-2)&lt;br /&gt;&lt;br /&gt;fib = cached(normal_fib)&lt;/code&gt;&lt;br /&gt;&lt;code&gt;&lt;br /&gt;&lt;/code&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;This technique is  really useful to improve your code performance in a really easy. On the  other hand, it isn't the best solution for almost all the cases. Many  times code a dynamic programming method (if performance is crucial) will  be necessary. Is also important to notice that I didn't used any cache  memory management policy, which is important to economize memory. Most  appropriate cache data structures (such as numpy arrays for integer  arguments) also are welcome. The python 3.2 version added the lru_cache  decorator into the functools module to make this cache mechanism. If you  are already using this version, it's smarter to use it instead of  implementing your one. Here is how it must be used:&lt;code&gt;&lt;/code&gt;&lt;br /&gt;&lt;code&gt;&lt;br /&gt;# Python &amp;gt; 3.2&lt;br /&gt;import functools&lt;br /&gt;@functools.lru_cached(max_size=500) # uses a fixed size cache to avoid memory usage explosion&lt;br /&gt;def fib(n)&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; ...&lt;/code&gt;&lt;br /&gt;&lt;code&gt;&lt;br /&gt;&lt;/code&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;div style="text-align: justify;"&gt;This  technique is very useful not only for economize the CPU resources but  also network (such as caching SQL query results), other IO operations  (such as disk reading) and even user interaction input in a  straightforward way.&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6447849434617576469-1021140542982002017?l=codecereal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://codecereal.blogspot.com/feeds/1021140542982002017/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://codecereal.blogspot.com/2011/06/optimizing-functions-with-python.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6447849434617576469/posts/default/1021140542982002017'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6447849434617576469/posts/default/1021140542982002017'/><link rel='alternate' type='text/html' href='http://codecereal.blogspot.com/2011/06/optimizing-functions-with-python.html' title='Optimizing Functions with Python Caching Decorators'/><author><name>Daker Fernandes Pinheiro</name><uri>http://www.blogger.com/profile/13246366811991245999</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/-5WD2HNXE45Q/TdWfTSkSEXI/AAAAAAAAACg/wL6O3AwG3tk/s220/isimp.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6447849434617576469.post-1425390479655244187</id><published>2011-06-05T08:23:00.000-07:00</published><updated>2011-06-06T19:38:07.511-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='ai'/><category scheme='http://www.blogger.com/atom/ns#' term='maximum likelihood'/><category scheme='http://www.blogger.com/atom/ns#' term='hmm'/><category scheme='http://www.blogger.com/atom/ns#' term='cpg'/><category scheme='http://www.blogger.com/atom/ns#' term='matplotlib'/><category scheme='http://www.blogger.com/atom/ns#' term='python'/><category scheme='http://www.blogger.com/atom/ns#' term='ghmm'/><category scheme='http://www.blogger.com/atom/ns#' term='baum welch'/><category scheme='http://www.blogger.com/atom/ns#' term='dna'/><category scheme='http://www.blogger.com/atom/ns#' term='markov'/><category scheme='http://www.blogger.com/atom/ns#' term='bio'/><title type='text'>CpG Islands (3) - Model Evaluation</title><content type='html'>&lt;div style="text-align: justify;"&gt;Following the &lt;a href="http://codecereal.blogspot.com/2011/06/cpg-islands-2.html"&gt;post&lt;/a&gt; we've built a Hidden Markov Model (HMM) for the &lt;a href="http://codecereal.blogspot.com/2011/05/cpg-islands-1.html"&gt;CpG islands problem&lt;/a&gt;,  using the training set. Now we should evaluate if our model is adequate  to predict things about CpG islands. For evaluate we may use a tagged  sequence and see if the HMM we built can predict the islands and what is  it's accuracy.&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;Using the viterbi path and a  tagged sequence (out ouf the training set), enable us to compare if the  estimative given by the model is coherent with the real data. Using the  HMM and the training and test sets of the last post, we can compare the  results using a &lt;a href="http://en.wikipedia.org/wiki/Confusion_matrix"&gt;confusion matrix&lt;/a&gt;. For example, at the following snippet of the viterbi path paired with the tagged sequence:&lt;/div&gt;&lt;code&gt;&lt;br /&gt;[ ... 0,0,0,0,0,1,1,1,1,1, ... ]&lt;br /&gt;[ ... a t t g t c C G A C  ... ]&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;We may calculate the following confusion matrix:&lt;br /&gt;&lt;code&gt;&lt;br /&gt;&amp;nbsp; +----------- P R E D I C T E D --------+&lt;br /&gt;A | &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;           | &amp;nbsp;   Island&amp;nbsp;&amp;nbsp;   | Non Island |&lt;br /&gt;C +------------+------------+------------+&lt;br /&gt;T | &amp;nbsp;   Island&amp;nbsp;&amp;nbsp;   |&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;      4&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;     |&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 0 &amp;nbsp; &amp;nbsp;     |&lt;br /&gt;U +------------+------------+------------+&lt;br /&gt;A | Non Island |&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;      1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;      5 &amp;nbsp; &amp;nbsp;     |&lt;br /&gt;L +------------+------------+------------+&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: justify;"&gt;Note that the bigger are the main  diagonal values, the more accurate our model is. Making this matrix with  the experiment made (using &lt;a href="http://codecereal.blogspot.com/2011/06/cpg-islands-2.html"&gt;HMM with maximum likelihood&lt;/a&gt;) I got the following matrix:&lt;/div&gt;&lt;code&gt;&lt;br /&gt;&amp;nbsp; +----------- P R E D I C T E D --------+&lt;br /&gt;A | &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;           | &amp;nbsp;   Island&amp;nbsp;&amp;nbsp;   | Non Island |&lt;br /&gt;C +------------+------------+------------+&lt;br /&gt;T | &amp;nbsp;   Island&amp;nbsp;&amp;nbsp;   |&amp;nbsp;&amp;nbsp; 328815 &amp;nbsp;     |&amp;nbsp;&amp;nbsp;&amp;nbsp; 61411 &amp;nbsp;     |&lt;br /&gt;U +------------+------------+------------+&lt;br /&gt;A | Non Island |&amp;nbsp;&amp;nbsp; 1381515&amp;nbsp; |&amp;nbsp;&amp;nbsp; 1889819&amp;nbsp; |&lt;br /&gt;L +------------+------------+------------+&lt;/code&gt;&lt;br /&gt;&lt;code&gt;&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;div style="text-align: justify;"&gt;We may see that a good amount of  islands was correctly predicted (84,26% of the islands), but  57,76% of the non islands regions was classified as islands, which  is a bad indicator. A experiment with the HMM after running the Baum Welch was also evaluated. The results was better from the obtained by the maximum likelihood when predicting islands (96,56% of accuracy). But also obtained a higher miss rate (70,70%). Here is the confusion matrix for this HMM:&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;code&gt; &amp;nbsp; +----------- P R E D I C T E D --------+&lt;br /&gt;A | &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;           | &amp;nbsp;   Island&amp;nbsp;&amp;nbsp;   | Non Island |&lt;br /&gt;C +------------+------------+------------+&lt;br /&gt;T | &amp;nbsp;   Island&amp;nbsp;&amp;nbsp;   |&amp;nbsp;&amp;nbsp; 376839 &amp;nbsp;     |&amp;nbsp;&amp;nbsp;&amp;nbsp; 13387 &amp;nbsp;     |&lt;br /&gt;U +------------+------------+------------+&lt;br /&gt;A | Non Island |&amp;nbsp;&amp;nbsp; 2313138&amp;nbsp; |&amp;nbsp;&amp;nbsp; 958196 &amp;nbsp;     |&lt;br /&gt;L +------------+------------+------------+&lt;/code&gt;&lt;br /&gt;&lt;code&gt;&lt;/code&gt;&lt;/div&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-jc72-jwH_FM/TeubpJtuuNI/AAAAAAAAADU/4mwwuE1v3X4/s1600/cpgs_labels.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"&gt;&lt;img border="0" height="344" src="http://1.bp.blogspot.com/-jc72-jwH_FM/TeubpJtuuNI/AAAAAAAAADU/4mwwuE1v3X4/s640/cpgs_labels.png" width="640" /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;Matching of Real Sequence, Viterbi and Posterior results&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;&lt;div style="text-align: justify;"&gt;The result obtained was not very accurate. It may happened because we had few data to train  and to evaluate our  model. We could also build a HMM with other indicators  that identify  the CpG islands for model it better. Remember we simplified the CpG problem to the frequencies of occurring Cs and Gs, but a better model could also use the CG pairs. Using a more complicate HMM we could have more than 2 states and then associate a set of states to the CpG and non CpG islands sites. This would allow to use better the posterior result to classify the sequence. But I keep this as future work.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6447849434617576469-1425390479655244187?l=codecereal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://codecereal.blogspot.com/feeds/1425390479655244187/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://codecereal.blogspot.com/2011/06/cpg-islands-3.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6447849434617576469/posts/default/1425390479655244187'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6447849434617576469/posts/default/1425390479655244187'/><link rel='alternate' type='text/html' href='http://codecereal.blogspot.com/2011/06/cpg-islands-3.html' title='CpG Islands (3) - Model Evaluation'/><author><name>Daker Fernandes Pinheiro</name><uri>http://www.blogger.com/profile/13246366811991245999</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/-5WD2HNXE45Q/TdWfTSkSEXI/AAAAAAAAACg/wL6O3AwG3tk/s220/isimp.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/-jc72-jwH_FM/TeubpJtuuNI/AAAAAAAAADU/4mwwuE1v3X4/s72-c/cpgs_labels.png' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6447849434617576469.post-4247644986483006747</id><published>2011-06-02T02:43:00.000-07:00</published><updated>2011-06-06T19:37:47.277-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='ai'/><category scheme='http://www.blogger.com/atom/ns#' term='maximum likelihood'/><category scheme='http://www.blogger.com/atom/ns#' term='hmm'/><category scheme='http://www.blogger.com/atom/ns#' term='python'/><category scheme='http://www.blogger.com/atom/ns#' term='ghmm'/><category scheme='http://www.blogger.com/atom/ns#' term='dna'/><category scheme='http://www.blogger.com/atom/ns#' term='markov'/><category scheme='http://www.blogger.com/atom/ns#' term='bio'/><title type='text'>CpG Islands (2) - Building a Hidden Markov Model</title><content type='html'>&lt;div style="text-align: justify;"&gt;By the definition of the CpG islands in the &lt;a href="http://codecereal.blogspot.com/2011/05/cpg-islands-1.html"&gt;previous post&lt;/a&gt; and the &lt;a href="http://codecereal.blogspot.com/2011/05/hidden-markov-models.html"&gt;Hidden Markov Models&lt;/a&gt; (HMMs) short introduction, we now can model a HMM for finding CpG islands. We can create a model very similar to the "Fair Bet Casino" problem.&lt;/div&gt;&lt;br /&gt;&lt;div style="text-align: justify;"&gt;When we are in a nucleotide of given DNA sequence there are two possibilities, that nucleotide belongs to CpG island (lets denote state S1) or do not (S0). If you analyse a sibling nucleotide it can stay in the same state or to change with complementary probabilities. We can view it as this Markov chain:&lt;/div&gt;&lt;br /&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-XDFtKcIOjuk/TeWhDOgAn8I/AAAAAAAAADM/QWShClt3NzQ/s1600/cpg.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"&gt;&lt;img border="0" height="125" src="http://4.bp.blogspot.com/-XDFtKcIOjuk/TeWhDOgAn8I/AAAAAAAAADM/QWShClt3NzQ/s400/cpg.png" width="400" /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;CpG states Markov chain&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;&lt;div style="text-align: justify;"&gt;As the emissions of the states S0 and S1, we may have the associated probabilities of the emissions of ACGT symbols. However we can't know for now how much are these probabilities. However if we have a dataset of sequences with the attached information of where the CpG islands occurs, we can estimate those parameters. I used these DNA &lt;a href="http://www.cin.ufpe.br/%7Eigcf"&gt;sequences&lt;/a&gt; tagged by biologists (used &lt;a href="http://biopython.org/wiki/Main_Page"&gt;biopython&lt;/a&gt; library to parse the &lt;a href="http://en.wikipedia.org/wiki/FASTA_format"&gt;.fasta&lt;/a&gt; files) as the training and test sets to build the model and evaluate the technique. This dataset consists of nucleotides in upper and lower cases. When we have a given nucleotide in upper case, it denotes that it is a CpG site and the lower case nucleotides means they are not. I used a statistical technique called &lt;a href="http://en.wikipedia.org/wiki/Maximum_likelihood"&gt;maximum likelihood&lt;/a&gt; to build the HMM.&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;Using the maximum likelihood to calculate the HMM emissions probabilities, I count each char frequency in the dataset and divide by the total amount of nucleotides in the same state:&lt;br /&gt;&lt;br /&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-fHq11ISW8xU/TedZPgK2TSI/AAAAAAAAADQ/NT6sa23R8Yo/s1600/cpgemissions.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"&gt;&lt;img border="0" height="235" src="http://1.bp.blogspot.com/-fHq11ISW8xU/TedZPgK2TSI/AAAAAAAAADQ/NT6sa23R8Yo/s400/cpgemissions.png" width="400" /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;Emissions probability diagram&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;code&gt;P(A|S0) = a / (a+c+g+t)&lt;br /&gt;P(C|S0) = c / (a+c+g+t)&lt;br /&gt;P(G|S0) = g / (a+c+g+t)&lt;br /&gt;P(T|S0) = t / (a+c+g+t)&lt;br /&gt;P(A|S1) = A / (A+C+G+T)&lt;br /&gt;P(C|S1) = C / (A+C+G+T)&lt;br /&gt;P(G|S1) = G / (A+C+G+T)&lt;br /&gt;P(T|S1) = T / (A+C+G+T)&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;div style="text-align: justify;"&gt;And to calculate the transition probabilities of each state, count the number of transitions between a CpG to a non CpG site &lt;code&gt;&lt;/code&gt;(out_of) and the reversal transitions (into). Then divide each of them by the state frequency which they're coming from. Note that the permanence probability of each state is the complement of the probability of transiting between states since there is no other state to go:&lt;/div&gt;&lt;code&gt;&lt;br /&gt;P(S0|S1) = out_of / (A+C+G+T)&lt;br /&gt;P(S1|S0) = into / (a+c+g+t)&lt;br /&gt;P(S0|S0) = 1 - P(S1|S0)&lt;br /&gt;P(S1|S1) = 1 - P(S0|S1)&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;div style="text-align: justify;"&gt;Our model is almost done, it just lacks of the initial transitions which can be supposed to be the frequency of each state over all nucleotides:&lt;/div&gt;&lt;code&gt;&lt;br /&gt;P(S0) = a+c+g+t/(a+c+g+t+A+C+G+T)&lt;br /&gt;P(S1) = A+C+G+T/(a+c+g+t+A+C+G+T)&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;div style="text-align: justify;"&gt;I used the &lt;a href="http://ghmm.org/"&gt;ghmm&lt;/a&gt; to create my HMM with the given dataset and it generated this model:&lt;/div&gt;&lt;code&gt;&lt;br /&gt;DiscreteEmissionHMM(N=2, M=4)&lt;br /&gt;&amp;nbsp; state 0 (initial=0.97)&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Emissions: 0.30, 0.21, 0.20, 0.29&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Transitions: -&amp;gt;0 (0.00), -&amp;gt;1 (1.00)&lt;br /&gt;&amp;nbsp; state 1 (initial=0.03)&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Emissions: 0.22, 0.29, 0.29, 0.20&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Transitions: -&amp;gt;0 (1.00), -&amp;gt;1 (0.00)&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: justify;"&gt;It's important to notice that when we print the HMM it says that has the probability of 1 to stay in the same state. This occurs due to float formatted print. The reality is that in that value is very close to 1 but still has a chance to hop between states. It's also interesting to see the probability of emission of C and G when we contrast the states. In S1 the probability of getting a C or a G is significantly higher than in S0.&lt;br /&gt;&lt;br /&gt;Another manner of creating the HMM, is by initializing a model with the same structure we modelled the problem and then apply some unsupervised learning algorithms, such as the &lt;a href="http://en.wikipedia.org/wiki/Baum%E2%80%93Welch_algorithm"&gt;Baum Welch&lt;/a&gt; algorithm. The Baum Welch uses randomized initial weights in the HMM and once you feed untagged sequences, it tries to infer a model. However, usually is far better initialize your HMM with biased data about the problem (for example using the model we created) and let the Baum Welch calibrate the probabilities. With ghmm do the following:&lt;/div&gt;&lt;code&gt;&lt;br /&gt;sigma = ghmm.IntegerRange(0,4) # our emission alphabet&lt;br /&gt;emission_seq = ghmm.EmissionSequence(sigma, [0, 4, 5, ... ]) # create a emission sequence&lt;br /&gt;hmm.baumWelch(sigma, emission_seq) # use a template hmm to calibrate the probablities&lt;/code&gt;&lt;br /&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;In the next post I will show how is the evaluation process of the HMM in the CpG island.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6447849434617576469-4247644986483006747?l=codecereal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://codecereal.blogspot.com/feeds/4247644986483006747/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://codecereal.blogspot.com/2011/06/cpg-islands-2.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6447849434617576469/posts/default/4247644986483006747'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6447849434617576469/posts/default/4247644986483006747'/><link rel='alternate' type='text/html' href='http://codecereal.blogspot.com/2011/06/cpg-islands-2.html' title='CpG Islands (2) - Building a Hidden Markov Model'/><author><name>Daker Fernandes Pinheiro</name><uri>http://www.blogger.com/profile/13246366811991245999</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/-5WD2HNXE45Q/TdWfTSkSEXI/AAAAAAAAACg/wL6O3AwG3tk/s220/isimp.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/-XDFtKcIOjuk/TeWhDOgAn8I/AAAAAAAAADM/QWShClt3NzQ/s72-c/cpg.png' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6447849434617576469.post-281202682215387837</id><published>2011-05-30T17:40:00.000-07:00</published><updated>2011-06-28T11:31:46.845-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='ai'/><category scheme='http://www.blogger.com/atom/ns#' term='hmm'/><category scheme='http://www.blogger.com/atom/ns#' term='python'/><category scheme='http://www.blogger.com/atom/ns#' term='ghmm'/><category scheme='http://www.blogger.com/atom/ns#' term='markov'/><title type='text'>Hidden Markov Models</title><content type='html'>&lt;div style="text-align: justify;"&gt;Nowadays, many applications use &lt;a href="http://en.wikipedia.org/wiki/Hidden_Markov_model"&gt;Hidden Markov Models&lt;/a&gt; (HMMs) to solve crucial issues such as  bioinformatics, speech recognition, musical analysis, digital signal  processing, data mining, financial applications, time series  analysis and many others. HMMs are probabilistic models which are very useful to model sequence behaviours or discrete time series events. Formally it models &lt;a href="http://en.wikipedia.org/wiki/Markov_process"&gt;Markov processes&lt;/a&gt; with hidden states, like an extension for &lt;a href="http://en.wikipedia.org/wiki/Markov_chain"&gt;Markov Chains&lt;/a&gt;. For computer scientists, is a state machine with probabilistic transitions where each state can emit a value with a given probability. &lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;For better understanding HMMs, I will illustrate how it works with "The Fair Bet Casino" problem. Imagine you are in a casino where you can bet on coins tosses, tossed by a dealer. A coin toss can have two outcomes: head (H) or tail (T). Now suppose that the coin dealer has two coins, a fair (F) which outputs both H and T with 1/2 probabilities and a biased coin (B) which outputs H with probability 3/4 and T with 1/4. Using probability language we say:&lt;/div&gt;&lt;ul&gt;&lt;li&gt;P(H&lt;sub&gt;i+1&lt;/sub&gt;|F&lt;sub&gt;i&lt;/sub&gt;) = 1/2&lt;/li&gt;&lt;li&gt;P(T&lt;sub&gt;i+1&lt;/sub&gt;|F&lt;sub&gt;i&lt;/sub&gt;) = 1/2&lt;/li&gt;&lt;li&gt;P(H&lt;sub&gt;i+1&lt;/sub&gt;|B&lt;sub&gt;i&lt;/sub&gt;) = 3/4&lt;/li&gt;&lt;li&gt;P(T&lt;sub&gt;i+1&lt;/sub&gt;|B&lt;sub&gt;i&lt;/sub&gt;) = 1/4&lt;/li&gt;&lt;/ul&gt;&lt;div style="text-align: justify;"&gt;Now imagine that the dealer changes the coin in a way you can't see, but you know that he does it with a 1/10 probability. So thinking the coin tosses as a sequence of events we can say:&lt;/div&gt;&lt;ul&gt;&lt;li&gt;P(F&lt;sub&gt;i+1&lt;/sub&gt;|F&lt;sub&gt;i&lt;/sub&gt;) = 9/10&lt;/li&gt;&lt;li&gt;P(B&lt;sub&gt;i+1&lt;/sub&gt;|F&lt;sub&gt;i&lt;/sub&gt;) = 1/10&lt;/li&gt;&lt;li&gt;P(B&lt;sub&gt;i+1&lt;/sub&gt;|B&lt;sub&gt;i&lt;/sub&gt;) = 9/10&lt;/li&gt;&lt;li&gt;P(F&lt;sub&gt;i+1&lt;/sub&gt;|B&lt;sub&gt;i&lt;/sub&gt;) = 1/10&amp;nbsp; &lt;/li&gt;&lt;/ul&gt;&lt;div style="text-align: justify;"&gt;We can model it using a graph to illustrate the process:&lt;br /&gt;&lt;br /&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-Lgf7NJe-MkY/TeK1zpDQ97I/AAAAAAAAADA/egoF0eM_UvQ/s1600/fairbet.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"&gt;&lt;img border="0" height="313" src="http://2.bp.blogspot.com/-Lgf7NJe-MkY/TeK1zpDQ97I/AAAAAAAAADA/egoF0eM_UvQ/s320/fairbet.png" width="320" /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;HMM for "The Fair Bet Casino" problem&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;That's a HMM! It isn't any rocket science. Is just important to add a few remarks. We call the set of all possible emissions of the Markov process as the alphabet Σ ({H, T} in our problem). For many of computational method involving HMMs you will also need a initial state distribution π. For our problem we may assume that the we have equal probability for each coin.&lt;br /&gt;&lt;br /&gt;Now comes in our mind what we can do with the model in our hands. There are lot's of stuff to do with it, such as: given a sequence of results, when the dealer used the biased coin or even generate a random sequence with a coherent behaviour when compared to the model.&lt;br /&gt;&lt;br /&gt;There is a nice library called &lt;a href="http://ghmm.org/"&gt;ghmm&lt;/a&gt; (available for C and Python) which handles HMMs and already gives us the most famous and important HMM algorithms. Unfortunately the python wrapper is not pythonic. Let's model our problem in python to have some fun:&lt;code&gt;&lt;br /&gt;&lt;br /&gt;import ghmm&lt;br /&gt;&lt;br /&gt;# setting 0 for Heads and 1 for Tails as our Alphabet &lt;br /&gt;sigma = ghmm.IntegerRange(0, 2)&lt;br /&gt;&lt;br /&gt;# transition matrix: rows and columns means origin and destiny states&lt;br /&gt;transitions_probabilities = [&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; [0.9, 0.1], # 0: fair state&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; [0.1, 0.9], # 1: biased state&lt;br /&gt;]&lt;br /&gt;&lt;br /&gt;# emission matrix: rows and columns means states and symbols respectively&lt;br /&gt;emissions_probabilities = [&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; [0.5, 0.5], # 0: fair state emissions probabilities&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; [0.75, 0.25], # 1: biased state emissions probabilities&lt;br /&gt;]&lt;br /&gt;&lt;br /&gt;# probability of initial states&lt;br /&gt;pi = [0.5, 0.5] # equal probabilities for 0 and 1&lt;br /&gt;&lt;br /&gt;hmm = ghmm.HMMFromMatrices(&lt;/code&gt;&lt;br /&gt;&lt;code&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; sigma,&lt;/code&gt;&lt;br /&gt;&lt;code&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; # you can model HMMs with others emission probability distributions&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; ghmm.DiscreteDistribution(sigma),&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/code&gt;&lt;/div&gt;&lt;code&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; transitions_probabilities,&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; emissions_probabilities,&lt;br /&gt;&amp;nbsp; &amp;nbsp; pi&lt;br /&gt;)&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; print hmm&lt;/code&gt;&lt;br /&gt;&lt;code&gt;DiscreteEmissionHMM(N=2, M=2)&lt;/code&gt;&lt;br /&gt;&lt;code&gt;&lt;br /&gt;&amp;nbsp; state 0 (initial=0.50)&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Emissions: 0.50, 0.50&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Transitions: -&amp;gt;0 (0.90), -&amp;gt;1 (0.10)&lt;br /&gt;&lt;br /&gt;&amp;nbsp; state 1 (initial=0.50)&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Emissions: 0.75, 0.25&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Transitions: -&amp;gt;0 (0.10), -&amp;gt;1 (0.90)&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: justify;"&gt;Now that we have our HMM object on the hand we can play with it. Suppose you have the given sequence of coin tosses and you would like to distinguish which coin was being used at a given state:&lt;/div&gt;&lt;code&gt;&lt;br /&gt;tosses = [1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0,  1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1]&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: justify;"&gt;The &lt;a href="http://en.wikipedia.org/wiki/Viterbi_algorithm"&gt;viterbi algorithm&lt;/a&gt; can be used to trace the most probable states at each coin toss according to the HMM distribution:&lt;/div&gt;&lt;code&gt;&lt;br /&gt;# not as pythonic is could be :-/&lt;br /&gt;sequence = ghmm.EmissionSequence(sigma, tosses)&lt;br /&gt;&lt;br /&gt;viterbi_path, _ = hmm.viterbi(sequence)&lt;br /&gt;&lt;br /&gt;&amp;gt;&amp;gt;&amp;gt; print viterbi_path&lt;br /&gt;[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: justify;"&gt;Nice! But sometimes is interesting to have the probability of each state on the point instead of only the most probable one. To have that, you must use the &lt;a href="http://en.wikipedia.org/wiki/Posterior_mode"&gt;posterior&lt;/a&gt; or &lt;a href="http://en.wikipedia.org/wiki/Forward_algorithm"&gt;forward&lt;/a&gt; algorithms to have more detailed information.&lt;/div&gt;&lt;code&gt;&lt;br /&gt;states_probabilities = hmm.posterior(sequence)&lt;br /&gt;&lt;br /&gt;&amp;gt;&amp;gt;&amp;gt; print &lt;/code&gt;&lt;code&gt; states_probabilities&lt;/code&gt;&lt;br /&gt;&lt;code&gt; [[0.8407944139086141, 0.1592055860913865], [0.860787703168127, 0.13921229683187356], ... ]&lt;/code&gt;&lt;br /&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;The posterior method result, returns the list of probabilities at each state, for example, in the first index we have &lt;code&gt;[0.8407944139086141, 0.1592055860913865]&lt;/code&gt;. That means that we have ~0.84 probability of chance that the dealer is using the fair coin and ~0.16 for the biased coin. We also can plot a graph to show the behaviour of the curve of the probability of the dealer being using the fair coin (I used matplotlib for the graphs).&lt;/div&gt;&lt;br /&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-MkJ2Jo70248/TeMJmgJcJII/AAAAAAAAADE/ZVf9J-nL0zc/s1600/viterbi.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"&gt;&lt;img border="0" height="152" src="http://1.bp.blogspot.com/-MkJ2Jo70248/TeMJmgJcJII/AAAAAAAAADE/ZVf9J-nL0zc/s640/viterbi.png" width="640" /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;Probability of being a fair coin over time&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;div style="text-align: justify;"&gt;This is only a superficial example of what can HMMs do. It's worthy give a look at it if you want do some sequence or time series analysis in any domain. I hope this post presented and cleared what are HMM and how they can be used to analyse data.&lt;/div&gt;&lt;div style="clear: both;"&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6447849434617576469-281202682215387837?l=codecereal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://codecereal.blogspot.com/feeds/281202682215387837/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://codecereal.blogspot.com/2011/05/hidden-markov-models.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6447849434617576469/posts/default/281202682215387837'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6447849434617576469/posts/default/281202682215387837'/><link rel='alternate' type='text/html' href='http://codecereal.blogspot.com/2011/05/hidden-markov-models.html' title='Hidden Markov Models'/><author><name>Daker Fernandes Pinheiro</name><uri>http://www.blogger.com/profile/13246366811991245999</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/-5WD2HNXE45Q/TdWfTSkSEXI/AAAAAAAAACg/wL6O3AwG3tk/s220/isimp.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/-Lgf7NJe-MkY/TeK1zpDQ97I/AAAAAAAAADA/egoF0eM_UvQ/s72-c/fairbet.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6447849434617576469.post-1809075304702018988</id><published>2011-05-29T05:55:00.000-07:00</published><updated>2011-06-06T19:36:58.774-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='hmm'/><category scheme='http://www.blogger.com/atom/ns#' term='python'/><category scheme='http://www.blogger.com/atom/ns#' term='ghmm'/><category scheme='http://www.blogger.com/atom/ns#' term='dna'/><category scheme='http://www.blogger.com/atom/ns#' term='bio'/><title type='text'>CpG Islands (1) - Problem Motivation &amp; Definitions</title><content type='html'>&lt;div style="text-align: justify;"&gt;This semester I'm attending the course &lt;a href="http://www.cin.ufpe.br/%7Eigcf/wiki/pmwiki.php?n=Disciplinas.ProcessamentoDeCadeias2011"&gt;Processing of Sequences and Methods and Algorithm in Computational Biology&lt;/a&gt; (basically DNA and proteins). One of the focus of it is the use of the Hidden Markov Models to solve many of it's problems. One well studied problems is how to find codifying sequences (snippets that can be translated into proteins) in a given sequence of DNA, which has both codifying and non codifying regions. One of the most used techniques is called CpG islands.&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;In a random DNA sequence we have a lower frequency of two ((G)uanine and (C)ytosine) than the other two nucleotide types ((A)denine and (T)hymine) due to a phenomenon called &lt;a href="https://secure.wikimedia.org/wikipedia/en/wiki/Dna_methylation"&gt;DNA methylation&lt;/a&gt;. Due to this methylation, it's more probable to occur a deletion of a C and G nucleotides than with A or T&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;a href="http://library.thinkquest.org/C0123260/basic%20knowledge/images/basic%20knowledge/DNA/DNA%20model%201.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"&gt;&lt;img border="0" height="133" src="http://library.thinkquest.org/C0123260/basic%20knowledge/images/basic%20knowledge/DNA/DNA%20model%201.jpg" width="320" /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;DNA double helix structure&lt;br /&gt;&lt;br /&gt;&lt;/td&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;&lt;br /&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;div style="text-align: justify;"&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;In the species evolution, occurs that genes which are essential for it's survival cannot mutate much, otherwise the organism can't live and reproduce to transmit it's gene to their descendants. Because of this, is very unlikely that even with lots of generations, those genes doesn't suffer much alteration. It's important to notice that non codifying regions can mutate without affecting the organism because it doesn't affect a vital part of the genetic code.&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;With these two facts in hand we can see that mutation of the DNA can occur more frequently on non codifying regions. As the methylation of C is a very common mutation cause, we can observe that non codifying regions will have less C and G due to the methylation which it doesn't interfere (much) in the organism. The opposite occurs with codifying regions because it has less frequent mutations.&lt;br /&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;a href="http://www.mad-cow.org/exon1_CpG.gif" imageanchor="1" style="margin-left: auto; margin-right: auto;"&gt;&lt;img border="0" height="320" src="http://www.mad-cow.org/exon1_CpG.gif" width="315" /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;CpG island around an important human DNA regulation site&lt;/td&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;&lt;br /&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;With this information in our hands we can use it to discover regions which are likely to be a codifying region. The sites which contains greater concentrations of C-phosphate-G (called CpG islands) are a strong indicator that it hasn't suffered much mutations through the generations. So it's very probable to be an area of interest for biologists. I'll describe in the next posts how can we identify these CpG using HMM and the ghmm python library.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6447849434617576469-1809075304702018988?l=codecereal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://codecereal.blogspot.com/feeds/1809075304702018988/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://codecereal.blogspot.com/2011/05/cpg-islands-1.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6447849434617576469/posts/default/1809075304702018988'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6447849434617576469/posts/default/1809075304702018988'/><link rel='alternate' type='text/html' href='http://codecereal.blogspot.com/2011/05/cpg-islands-1.html' title='CpG Islands (1) - Problem Motivation &amp; Definitions'/><author><name>Daker Fernandes Pinheiro</name><uri>http://www.blogger.com/profile/13246366811991245999</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/-5WD2HNXE45Q/TdWfTSkSEXI/AAAAAAAAACg/wL6O3AwG3tk/s220/isimp.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6447849434617576469.post-984177687687411526</id><published>2011-05-27T13:58:00.000-07:00</published><updated>2011-05-27T14:00:16.742-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='qml'/><category scheme='http://www.blogger.com/atom/ns#' term='wsl'/><category scheme='http://www.blogger.com/atom/ns#' term='qtquick'/><category scheme='http://www.blogger.com/atom/ns#' term='qt'/><title type='text'>QtQuick - WSL II</title><content type='html'>&lt;div style="text-align: justify;"&gt;&lt;div style="text-align: justify;"&gt;&lt;a href="http://qt.nokia.com/qtquick/"&gt;QtQuick&lt;/a&gt;  uses QML, a powerful tool for creating fluid interfaces in a clean,  easy and fast way. It uses a declarative language (Javascript based) and  it has a really good performance, which makes it suitable to make apps  for any kind of platform. It's also really easy to extend QML with  Qt/C++, enbling developers to speedup the application logic or something  else needed. Is also possible to use OpenGL to render QtQuick without  any change in the code, sending the painting job to the video card.  IMHO, it's greatest advantage, is because suitable for designer who  already know HTML and CSS to build awesome interfaces, being easier than  Flash and ActionScript. This is really welcome  because the QtQuick prototypes are more straightforward, because can be  directly integrated in the final application code.&lt;br /&gt;&lt;br /&gt;These are some videos showing what can be done with QtQuick:&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;object style="height: 390px; width: 640px;"&gt;&lt;param name="movie" value="http://www.youtube.com/v/rqt7vM_vP3o?version=3"&gt;&lt;param name="allowFullScreen" value="true"&gt;&lt;param name="allowScriptAccess" value="always"&gt;&lt;embed src="http://www.youtube.com/v/rqt7vM_vP3o?version=3" type="application/x-shockwave-flash" allowfullscreen="true" allowScriptAccess="always" width="640" height="390"&gt;&lt;/object&gt;&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;object style="height: 390px; width: 640px;"&gt;&lt;param name="movie" value="http://www.youtube.com/v/GuAxYgOjOVA?version=3"&gt;&lt;param name="allowFullScreen" value="true"&gt;&lt;param name="allowScriptAccess" value="always"&gt;&lt;embed src="http://www.youtube.com/v/GuAxYgOjOVA?version=3" type="application/x-shockwave-flash" allowfullscreen="true" allowScriptAccess="always" width="640" height="390"&gt;&lt;/object&gt;&lt;br /&gt;&lt;br /&gt;&lt;object style="height: 390px; width: 640px;"&gt;&lt;param name="movie" value="http://www.youtube.com/v/n3W5O2biSPU?version=3"&gt;&lt;param name="allowFullScreen" value="true"&gt;&lt;param name="allowScriptAccess" value="always"&gt;&lt;embed src="http://www.youtube.com/v/n3W5O2biSPU?version=3" type="application/x-shockwave-flash" allowfullscreen="true" allowScriptAccess="always" width="640" height="390"&gt;&lt;/object&gt;&lt;br /&gt;&lt;br /&gt;&lt;object style="height: 390px; width: 640px;"&gt;&lt;param name="movie" value="http://www.youtube.com/v/KLQD2jYS-XU?version=3"&gt;&lt;param name="allowFullScreen" value="true"&gt;&lt;param name="allowScriptAccess" value="always"&gt;&lt;embed src="http://www.youtube.com/v/KLQD2jYS-XU?version=3" type="application/x-shockwave-flash" allowfullscreen="true" allowScriptAccess="always" width="640" height="390"&gt;&lt;/object&gt;&lt;br /&gt;&lt;br /&gt;I've been developing with QtQuick (and some extensions such as the MeeGo and the &lt;a href="http://codecereal.blogspot.com/2011/05/plasma-components.html"&gt;Plasma Components&lt;/a&gt;) for quite a long time (for almost a year from now). So I'm sharing my QtQuick presentation on the II Workshop of Free Software at  CIn-UFPE at 28/03/2011. The workshop was organized by the University Linux User Group (&lt;a href="http://cinlug-br.org/"&gt;CIn-LUG&lt;/a&gt;) with the intent to show new free technologies and show how to start using them.&lt;br /&gt;&lt;div id="__ss_8128784" style="width: 425px;"&gt;&lt;b style="display: block; margin: 12px 0 4px;"&gt;&lt;a href="http://www.slideshare.net/dakerfp/qtquick-wsl-ii" title="QtQuick - WSL II"&gt;QtQuick - WSL II&lt;/a&gt;&lt;/b&gt;&lt;object height="355" id="__sse8128784" width="425"&gt;&lt;param name="movie" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=wsl-qtquick-110527144153-phpapp02&amp;stripped_title=qtquick-wsl-ii&amp;userName=dakerfp" /&gt;&lt;param name="allowFullScreen" value="true"/&gt;&lt;param name="allowScriptAccess" value="always"/&gt;&lt;embed name="__sse8128784" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=wsl-qtquick-110527144153-phpapp02&amp;stripped_title=qtquick-wsl-ii&amp;userName=dakerfp" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="355"&gt;&lt;/embed&gt;&lt;/object&gt;&lt;br /&gt;&lt;div style="padding: 5px 0 12px;"&gt;View more &lt;a href="http://www.slideshare.net/"&gt;presentations&lt;/a&gt; from &lt;a href="http://www.slideshare.net/dakerfp"&gt;Daker Fernandes&lt;/a&gt;.&lt;/div&gt;&lt;/div&gt;The exercises resources and examples can be found &lt;a href="http://dl.dropbox.com/u/8800422/qtquick-wsl-exercises.zip"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;Happy hacking!&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6447849434617576469-984177687687411526?l=codecereal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://codecereal.blogspot.com/feeds/984177687687411526/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://codecereal.blogspot.com/2011/05/qtquick-wsl-ii.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6447849434617576469/posts/default/984177687687411526'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6447849434617576469/posts/default/984177687687411526'/><link rel='alternate' type='text/html' href='http://codecereal.blogspot.com/2011/05/qtquick-wsl-ii.html' title='QtQuick - WSL II'/><author><name>Daker Fernandes Pinheiro</name><uri>http://www.blogger.com/profile/13246366811991245999</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/-5WD2HNXE45Q/TdWfTSkSEXI/AAAAAAAAACg/wL6O3AwG3tk/s220/isimp.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6447849434617576469.post-8161996012226830759</id><published>2011-05-19T13:58:00.000-07:00</published><updated>2011-05-19T13:59:45.731-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='gsoc'/><category scheme='http://www.blogger.com/atom/ns#' term='qtquick'/><category scheme='http://www.blogger.com/atom/ns#' term='kde'/><category scheme='http://www.blogger.com/atom/ns#' term='qtcomponents'/><category scheme='http://www.blogger.com/atom/ns#' term='plasma'/><title type='text'>Plasma Components</title><content type='html'>Hi,&lt;br /&gt;&lt;br /&gt;I'm here to talk about my GSoC project (\m/), the plasma components. As you may know, QML is a declarative language to build rich interfaces introduced in Qt 4.7 by providing simple primitives. As it is a powerful way to develop interfaces and it's the future of UI development for Qt was necessary to make the plasma support it.&lt;br /&gt;&lt;br /&gt;Create interfaces in QML is really easy and fast but sometimes we need common widgets and may be boring to reimplement and replicate them in every application we create (e.g. Button, Slider, ScrollBar). For avoid this code replication, the Qt Components project was created to unify an API for a set of components.&lt;br /&gt;&lt;br /&gt;The current (often updated) defined components and it's API can be found at &lt;a href="http://bugreports.qt.nokia.com/browse/QTCOMPONENTS-200"&gt;QTCOMPONENTS-200&lt;/a&gt;. The Qt Components intends to be a cross platform, but sometimes we need to have a closer integration of the component and the platform. In plasma desktop we want to show tooltips or use the theme svg images for example. The common API defines a set of properties a component must have, but doesn't disallow to have extra properties and functionalities (suggestions ?). Creating the plasma components with the theme integration and adding plasma behaviour is what my GSoC project is about :-).&lt;br /&gt;&lt;br /&gt;I already started working on the plasma components. They can be found in the kde-runtime repository at the plasma/declarative branch. And more speciffically in the plasma/declarativeimports/plasmacomponents directory. The components which already there are not yet done, they lack of tooltips, keyboard events handling, focus policy, and some (CheckBox and RadioButton) hasn't images yet. However they're fully functional and they cover all the properties and behaviours defined in the common API. Here is the list of the components in this state mentioned above (until this post publication):&lt;br /&gt;&lt;ul&gt;&lt;li&gt;BusyIndicator&lt;/li&gt;&lt;li&gt;Button&lt;/li&gt;&lt;li&gt;CheckBox&lt;/li&gt;&lt;li&gt;RadioButton&lt;/li&gt;&lt;li&gt;Slider&lt;/li&gt;&lt;/ul&gt;These are components not present in the common API but they're highly wanted in plasma:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;ScrollBar&lt;/li&gt;&lt;li&gt;ListItem&lt;/li&gt;&lt;li&gt;ListItemView&lt;/li&gt;&lt;li&gt;ListHighlight&lt;/li&gt;&lt;/ul&gt;To test these components, its also kept a components gallery (plasma/declarativeimports/test/Gallery.qml) that you may view it with qmlviewer (after installing the components).&lt;br /&gt;&lt;br /&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-JIE5jKZJz4k/TdQ7b-38cHI/AAAAAAAAACY/iFknoH7f4wA/s1600/gallery.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"&gt;&lt;img border="0" height="233" src="http://2.bp.blogspot.com/-JIE5jKZJz4k/TdQ7b-38cHI/AAAAAAAAACY/iFknoH7f4wA/s640/gallery.png" width="640" /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;Gallery screenshot&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;Feedback is highly appreciated. ;-)&lt;br /&gt;&lt;br /&gt;Cheers!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6447849434617576469-8161996012226830759?l=codecereal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://codecereal.blogspot.com/feeds/8161996012226830759/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://codecereal.blogspot.com/2011/05/plasma-components.html#comment-form' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6447849434617576469/posts/default/8161996012226830759'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6447849434617576469/posts/default/8161996012226830759'/><link rel='alternate' type='text/html' href='http://codecereal.blogspot.com/2011/05/plasma-components.html' title='Plasma Components'/><author><name>Daker Fernandes Pinheiro</name><uri>http://www.blogger.com/profile/13246366811991245999</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/-5WD2HNXE45Q/TdWfTSkSEXI/AAAAAAAAACg/wL6O3AwG3tk/s220/isimp.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/-JIE5jKZJz4k/TdQ7b-38cHI/AAAAAAAAACY/iFknoH7f4wA/s72-c/gallery.png' height='72' width='72'/><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6447849434617576469.post-7369139865823969899</id><published>2010-09-13T17:53:00.000-07:00</published><updated>2011-05-19T14:22:02.135-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='prolog'/><category scheme='http://www.blogger.com/atom/ns#' term='swi'/><title type='text'>I knew it!</title><content type='html'>&lt;a href="http://www.bbspot.com/News/2006/08/language_quiz.php"&gt;&lt;/a&gt;&lt;br /&gt;&lt;a href="http://www.bbspot.com/News/2006/08/language_quiz.php"&gt;&lt;img alt="You are Prolog. You enjoy looking for different ways to solve a problem.  You take longer to solve them, but usually come up with more than one solution." border="0" height="90" src="http://www.bbspot.com/Images/News_Features/2006/08/language/prolog.jpg" width="300" /&gt;&lt;br /&gt;Which Programming Language are You?&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6447849434617576469-7369139865823969899?l=codecereal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://codecereal.blogspot.com/feeds/7369139865823969899/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://codecereal.blogspot.com/2010/09/i-knew-it.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6447849434617576469/posts/default/7369139865823969899'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6447849434617576469/posts/default/7369139865823969899'/><link rel='alternate' type='text/html' href='http://codecereal.blogspot.com/2010/09/i-knew-it.html' title='I knew it!'/><author><name>Daker Fernandes Pinheiro</name><uri>http://www.blogger.com/profile/13246366811991245999</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/-5WD2HNXE45Q/TdWfTSkSEXI/AAAAAAAAACg/wL6O3AwG3tk/s220/isimp.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6447849434617576469.post-5127763677664366271</id><published>2010-08-24T17:07:00.000-07:00</published><updated>2011-05-19T14:21:30.756-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='wsl'/><category scheme='http://www.blogger.com/atom/ns#' term='cinlug'/><category scheme='http://www.blogger.com/atom/ns#' term='ufpe'/><category scheme='http://www.blogger.com/atom/ns#' term='cin'/><category scheme='http://www.blogger.com/atom/ns#' term='mongodb'/><title type='text'>MongoDB Presentation</title><content type='html'>Hi,&lt;br /&gt;&lt;br /&gt;today I had the honor to make a presentation with Brunno Gommes (&lt;a href="http://twitter.com/brunnogomes"&gt;@brunnogomes&lt;/a&gt;) about MongoDB on a Workshop (actually the second) of Free Software at CIn-UFPE. The workshop was organized by the University Linux User Group (&lt;a href="http://cinlug-br.org/"&gt;CIn-LUG&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;The content of the presentation was to introduce people to MongoDB basic operations and give a snapshot of the advanced features. To those who don't know MongoDB I strongly recommend to start to reading the presentation. Probably I will talk more about it here on the Codecereal.&lt;br /&gt;&lt;br /&gt;Anyway, I'm publishing my presentation here, so enjoy it:&lt;br /&gt;&lt;br /&gt;&lt;div id="__ss_5049381" style="width: 425px;"&gt;&lt;b style="display: block; margin: 12px 0pt 4px;"&gt;&lt;a href="http://www.slideshare.net/dakerfp/mongodb-workshop-cinlug" title="Mongodb workshop cinlug"&gt;Mongodb workshop cinlug&lt;/a&gt;&lt;/b&gt;&lt;object height="355" id="__sse5049381" width="425"&gt;&lt;param name="movie" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=mongodbworkshopcinlug-100824175819-phpapp01&amp;amp;stripped_title=mongodb-workshop-cinlug" /&gt;&lt;param name="allowFullScreen" value="true"/&gt;&lt;param name="allowScriptAccess" value="always"/&gt;&lt;embed name="__sse5049381" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=mongodbworkshopcinlug-100824175819-phpapp01&amp;amp;stripped_title=mongodb-workshop-cinlug" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="355"&gt;&lt;/embed&gt;&lt;/object&gt;&lt;br /&gt;&lt;div style="padding: 5px 0pt 12px;"&gt;View more &lt;a href="http://www.slideshare.net/"&gt;presentations&lt;/a&gt; from &lt;a href="http://www.slideshare.net/dakerfp"&gt;Daker Fernandes&lt;/a&gt;.&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6447849434617576469-5127763677664366271?l=codecereal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://codecereal.blogspot.com/feeds/5127763677664366271/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://codecereal.blogspot.com/2010/08/mongodb-presentation.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6447849434617576469/posts/default/5127763677664366271'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6447849434617576469/posts/default/5127763677664366271'/><link rel='alternate' type='text/html' href='http://codecereal.blogspot.com/2010/08/mongodb-presentation.html' title='MongoDB Presentation'/><author><name>Daker Fernandes Pinheiro</name><uri>http://www.blogger.com/profile/13246366811991245999</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/-5WD2HNXE45Q/TdWfTSkSEXI/AAAAAAAAACg/wL6O3AwG3tk/s220/isimp.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6447849434617576469.post-7013528578147849207</id><published>2010-07-29T11:53:00.000-07:00</published><updated>2011-05-19T14:19:44.890-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='reasoning'/><category scheme='http://www.blogger.com/atom/ns#' term='abduction'/><category scheme='http://www.blogger.com/atom/ns#' term='prolog'/><category scheme='http://www.blogger.com/atom/ns#' term='swi'/><category scheme='http://www.blogger.com/atom/ns#' term='logic'/><category scheme='http://www.blogger.com/atom/ns#' term='diagnosticar'/><title type='text'>DiagnostiCar (4) - Abductive Reasoning</title><content type='html'>Hi,&lt;br /&gt;&lt;br /&gt;finally I had some time to write about abductive reasoning, the core of the DiagnostiCar expert system. As you may know from my &lt;a href="http://codecereal.blogspot.com/2010/04/diagnosticar-introduction.html"&gt;previous posts&lt;/a&gt; (means you're tough and reached this point :P), the abductive reasoning tries to infer what facts implies a given situation.&lt;br /&gt;&lt;br /&gt;To build such a reasoning we must start with the conclusion and try to prove it. Let's say that we have the following scenario:&lt;br /&gt;&lt;code&gt;&lt;/code&gt;&lt;br /&gt;&lt;pre&gt;&lt;code&gt;known mortal(X) &amp;lt;- human(X).&lt;br /&gt;&lt;br /&gt;known human(socrates).&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;How abductive reasoning behaves when tries to find out what things are mortal? &lt;br /&gt;&lt;br /&gt;The first step is to check if &lt;b&gt;mortal(X)&lt;/b&gt; is a tautology such as:&lt;br /&gt;&lt;code&gt;&lt;/code&gt;&lt;br /&gt;&lt;pre&gt;&lt;code&gt;known mortal(xerxes).&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;As in our example there isn't such tautology the reasoning agent must try to find a rule which implies that something is mortal. In our case proceeds that &lt;b&gt;mortal(X) &amp;lt;- human(X)&lt;/b&gt; so to prove that something is mortal we have to prove that it is human. The first step is then repeated and we reach the tautology &lt;b&gt;human(socrates)&lt;/b&gt; which asserts that Socrates is mortal.&lt;br /&gt;&lt;br /&gt;Note that we have no difference with we try to use Prolog directly:&lt;br /&gt;&lt;code&gt;&lt;/code&gt;&lt;br /&gt;&lt;pre&gt;&lt;code&gt;mortal(X) :-&lt;br /&gt;    human(X).&lt;br /&gt;&lt;br /&gt;human(socrates).&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;Now consider a non trivial example. Imagine that we want to prove that &lt;b&gt;mortal(aristoteles)&lt;/b&gt; but we may let the reasoner conclude that such affirmation is only possible if Aristoteles is a human. Pretty smart huh? That's what abductive reasoning is capable of. To build such a predicate we will need to know not only what are proving but also the user id (to ask questions when possible) and a list to yield what is being assumed.&lt;br /&gt;&lt;br /&gt;Now lets try to prove a goal using the following rules (together with the Prolog code):&lt;br /&gt;&lt;br /&gt;1 - Check if the goal is a tautology:&lt;br /&gt;&lt;code&gt;&lt;/code&gt;&lt;br /&gt;&lt;pre&gt;&lt;code&gt;abduction(_, Goal, _) :-&lt;br /&gt;    known(Goal).&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;2 - Check equalities:&lt;br /&gt;&lt;code&gt;&lt;/code&gt;&lt;br /&gt;&lt;pre&gt;&lt;code&gt;abduction(_, Value == Value, _) :-&lt;br /&gt;    Value = Value.&lt;br /&gt;&lt;br /&gt;abduction(_, Value \= Value, _) :-&lt;br /&gt;    Value \= Value.&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;3 - Check if the goal is implied by something else:&lt;br /&gt;&lt;code&gt;&lt;/code&gt;&lt;br /&gt;&lt;pre&gt;&lt;code&gt;abduction(Uid, Goal, Assuming) :-&lt;br /&gt;    known(Goal &amp;lt;- Cause),&lt;br /&gt;    abduction(Uid, Cause, Assuming).&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;4 - If the goal is an or or an and conjunction:&lt;br /&gt;&lt;code&gt;&lt;/code&gt;&lt;br /&gt;&lt;pre&gt;&lt;code&gt;abduction(Uid, Left &amp;amp; Right, Assuming) :-&lt;br /&gt;    abduction(Uid, Left, Assuming),&lt;br /&gt;    abduction(Uid, Right, Assuming).&lt;br /&gt;&lt;br /&gt;abduction(Uid, Left v _, Assuming) :-&lt;br /&gt;    abduction(Uid, Left, Assuming).&lt;br /&gt;&lt;br /&gt;abduction(Uid,_ v Right,Assuming) :-&lt;br /&gt;    abduction(Uid, Right, Assuming).&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;5 - Ask the user:&lt;br /&gt;&lt;code&gt;&lt;/code&gt;&lt;br /&gt;&lt;pre&gt;&lt;code&gt;abduction(Uid, Goal, _) :-&lt;br /&gt;    functor(Goal,Question,1), &lt;br /&gt;    askable(Question),&lt;br /&gt;    arg(1,Goal,Answer),&lt;br /&gt;    ask(Uid,Question,Answer). % Defined in a previous post&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;6 - Make an assumption:&lt;br /&gt;&lt;code&gt;&lt;/code&gt;&lt;br /&gt;&lt;pre&gt;&lt;code&gt;abduction(_, Goal, Assuming) :-&lt;br /&gt;    assumable(Goal),&lt;br /&gt;    member(Goal, Assuming).&lt;br /&gt;    % Says that the goal must be assumed to reach the goal.&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;This does it's task but we still have some problems. The first one is the lack of constraints. This reasoning doesn't consider if you assume something that implies in a contradiction. e.g.: &lt;b&gt;known false &amp;lt;- mortal(X) &amp;amp; immortal(X)&lt;/b&gt;. Another problem we can observe is that the system doesn't follow the Occam's razor principle, which affirms that between two valid explanation the simplest tends to be true. Finally we have a crucial problem is a infinite loop occurence when you try to prove the negation of something and the goal is a tautology.&lt;br /&gt;&lt;br /&gt;Let's solve those problems by parts. The first can be solved with a predicate which proves the goal and doesn't reach a contradiction at the same time:&lt;br /&gt;&lt;code&gt;&lt;/code&gt;&lt;br /&gt;&lt;pre&gt;&lt;code&gt;reason(Uid, Goal, Assuming) :-&lt;br /&gt;    abductive_reasoning(Uid, Goal, Assuming),&lt;br /&gt;    not(abductive_reasoning(Uid, false, Assuming)).&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;But that predicate falls directly to the third problem. To solve this problem, the must intuitive solution is do a search starting with a empty list of assumptions and try incrementing the list size until it reachs a cutoff that you may assume that there is no explanation bigger than that. A simple and clear implementation of that would be:&lt;br /&gt;&lt;code&gt;&lt;/code&gt;&lt;br /&gt;&lt;pre&gt;&lt;code&gt;max_assumptions(5).&lt;br /&gt;abductive_reasoning(Uid, Goal, Assuming) :-&lt;br /&gt;    max_assumptions(Max),&lt;br /&gt;    abduction(Uid, Goal, Assuming, Max).&lt;br /&gt;&lt;br /&gt;abduction(Uid, Goal, Assuming, Depth) :-&lt;br /&gt;    Depth &amp;gt;= 0,&lt;br /&gt;    max_assumptions(Max),&lt;br /&gt;    Len is Max - Depth,&lt;br /&gt;    length(Assuming, Len),&lt;br /&gt;    abduction(Uid, Goal, Assuming).&lt;br /&gt;&lt;br /&gt;abduction(Uid, Goal, Assuming, Depth) :-&lt;br /&gt;    NextStep is Depth - 1,&lt;br /&gt;    NextStep &amp;gt;= 0,&lt;br /&gt;    abduction(Uid, Goal, Assuming, NextStep).&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;It is not the most efficient code to handle code but works fine. You can avoid process a search path again by creating a dynamic predicate and asserting when a branch of the search is truth or false. It may use a lot of memory but it helps to execute faster.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6447849434617576469-7013528578147849207?l=codecereal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://codecereal.blogspot.com/feeds/7013528578147849207/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://codecereal.blogspot.com/2010/07/diagnosticar-4-abductive-reasoning.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6447849434617576469/posts/default/7013528578147849207'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6447849434617576469/posts/default/7013528578147849207'/><link rel='alternate' type='text/html' href='http://codecereal.blogspot.com/2010/07/diagnosticar-4-abductive-reasoning.html' title='DiagnostiCar (4) - Abductive Reasoning'/><author><name>Daker Fernandes Pinheiro</name><uri>http://www.blogger.com/profile/13246366811991245999</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/-5WD2HNXE45Q/TdWfTSkSEXI/AAAAAAAAACg/wL6O3AwG3tk/s220/isimp.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6447849434617576469.post-9211036891154446722</id><published>2010-04-11T13:23:00.000-07:00</published><updated>2011-06-26T15:10:35.253-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='reasoning'/><category scheme='http://www.blogger.com/atom/ns#' term='abduction'/><category scheme='http://www.blogger.com/atom/ns#' term='prolog'/><category scheme='http://www.blogger.com/atom/ns#' term='swi'/><category scheme='http://www.blogger.com/atom/ns#' term='shell'/><category scheme='http://www.blogger.com/atom/ns#' term='database'/><category scheme='http://www.blogger.com/atom/ns#' term='logic'/><category scheme='http://www.blogger.com/atom/ns#' term='diagnosticar'/><title type='text'>DiagnostiCar (3) - Question &amp; Answer Interface</title><content type='html'>This post aims to explain how to create an interactive shell so that users may answer questions when the reasoning system need user information. The reasoning engine will access the ask engine by an predicate (I will call it &lt;b&gt;ask&lt;/b&gt;). An naive implementation of the &lt;b&gt;ask&lt;/b&gt; predicate could be:&lt;br /&gt;&lt;div style="text-align: justify;"&gt;&lt;code&gt;&lt;/code&gt;&lt;br /&gt;&lt;pre&gt;ask(X) :-&lt;br /&gt; write(X), write(' is true?'), nl,&lt;br /&gt; read(yes).&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;It works, but askable predicates will be asked every time the reasoner will need an information, even if it was previously answered. Another specific problem from DiagnostiCar constraints, was how to separate which information is from which user, considering we're at an web application environment. The simultaneous users access to the reasoning engine couldn't simply be locked while an user was giving an answer back, so we must find a smarter way to ask.&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;To remember info in Prolog, we will need to use dynamic relations to store the asked questions, for those who don't know about dynamic relations I will give a short explanation. If you want to know more, read the "Building Expert Systems in Prolog"&lt;a href="http://www.amzi.com/ExpertSystemsInProlog/xsipfrtop.htm"&gt;[1]&lt;/a&gt; book.&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;Dynamic relations works similarly as a relational table in a relational database &lt;a href="http://www.swi-prolog.org/pldoc/doc_for?object=section%282%2c%274.12%27%2cswi%28%27%2fdoc%2fManual%2fdb.html%27%29%29"&gt;[2]&lt;/a&gt;, but without a type scheme (although it's strongly recommended to make a convention). The informations required to create a dynamic relation are just it's name and arity. To declare a dynamic predicate you just need to run dynamic(Predicate). Once the relation is a dynamic one, you can store information (&lt;b&gt;assert/1&lt;/b&gt; or &lt;b&gt;asserta/1&lt;/b&gt;) or delete information (use &lt;b&gt;retract/1&lt;/b&gt; and &lt;b&gt;retractall/1&lt;/b&gt;) eg.: &lt;b&gt;assert(foo(bar))&lt;/b&gt;. To query a dynamic predicate you just have to call it (it allows unification!) eg.: &lt;b&gt;?- foo(bar).&amp;nbsp; -&amp;gt; true&lt;/b&gt;. I made the following basic but illustrative example about Prolog DB:&lt;br /&gt;&lt;code&gt;&lt;/code&gt;&lt;br /&gt;&lt;pre&gt;create_account(Id, Ammount) :-&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; assert(account(Id, Ammount)).&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;br /&gt;deposit(Id, Ammount) :-&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; retract(account(Id, PrevAmmount)),&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; NewAmmount is Ammount + PrevAmmount,&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; assert(account(Id, NewAmmount)).&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;br /&gt;withdraw(Id, Ammount) :-&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; account(Id, PrevAmmount),&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; PrevAmmount &amp;gt;= Ammount,&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; retract(account(Id, PrevAmmount)),&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; NewAmmount is PrevAmmount - Ammount,&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; assert(account(Id, NewAmmount)).&lt;br /&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: justify;"&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;To test the predicates do some queries:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;?- create_account(1, 1000).&lt;br /&gt;true.&lt;br /&gt;&lt;br /&gt;?- deposit(1, 200).&lt;br /&gt;true.&lt;br /&gt;&lt;br /&gt;?- account(Id, Ammount).&lt;br /&gt;Id = 1,&lt;br /&gt;Ammount = 1200.&lt;br /&gt;&lt;br /&gt;?- withdraw(1, 400).&lt;br /&gt;true.&lt;br /&gt;&lt;br /&gt;?- withdraw(1, 20000).&lt;br /&gt;false.&lt;br /&gt;&lt;br /&gt;?- account(Id, Ammount).&lt;br /&gt;Id = 1,&lt;br /&gt;Ammount = 800.&lt;br /&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: justify;"&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;The best solution found, to solve the concurrence and the question tracking, was to create predicates to read and write, so the webapp could obtain such information from the expert system. To avoid repeating the same questions, we will have to maintain a table with questions made, one with the answered questions and another to keep the answers from the users. Note that each relation needs to keep an user id to know from which user each information belongs to. The expert system question interface is the following:&lt;br /&gt;&lt;code&gt;&lt;/code&gt;&lt;br /&gt;&lt;pre&gt;% asked: UserId | Question&lt;br /&gt;:- dynamic asked/2.&lt;br /&gt;% asked: UserId | Question&lt;br /&gt;:- dynamic askfor/2.&lt;br /&gt;% known: UserId | Question | Fact&lt;br /&gt;:- dynamic known/3.&lt;br /&gt;&lt;br /&gt;% Predicate used by the expert system to ask.&lt;br /&gt;ask(Uid, Question, Answer) :-&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; known(Uid, Question, Answer).&lt;br /&gt;ask(Uid, Question, _ ) :-&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; not(asked(Uid,Question)),&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; not(askfor(Uid,Question)),&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; assert(askfor(Uid, Question)),&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; fail.&lt;br /&gt;&lt;br /&gt;% Predicate used by the web app to input users answers for a given qeustion.&lt;br /&gt;answer(Uid, Question, [Answer | Answers]) :-&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; assert(known(Uid, Question, Answer)),&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; answer(Uid, Question, Answers).&lt;br /&gt;answer(Uid, Question, []) :-&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; assert(asked(Uid, Question)),&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; retractall(askfor(Uid,Question)).&lt;br /&gt;&lt;br /&gt;% Routine called by the webapp to clean the data from a user whose &lt;br /&gt;% session was expired.&lt;br /&gt;cleanup(Uid) :-&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; retractall(askfor(Uid,_)),&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; retractall(known(Uid,_,_)),&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; retractall(asked(Uid,_)).&lt;/pre&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: justify;"&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;Not that hard! I will try to tell all about the abductive reasoning on the next post. &lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;[1] &lt;a href="http://www.amzi.com/ExpertSystemsInProlog/xsipfrtop.htm"&gt; "Building Expert Systems in Prolog" &lt;/a&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;[2] &lt;a href="http://www.swi-prolog.org/pldoc/doc_for?object=section%282%2c%274.12%27%2cswi%28%27%2fdoc%2fManual%2fdb.html%27%29%29"&gt; SWI Database &lt;/a&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6447849434617576469-9211036891154446722?l=codecereal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://codecereal.blogspot.com/feeds/9211036891154446722/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://codecereal.blogspot.com/2010/04/diagnosticar-3-question-answer.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6447849434617576469/posts/default/9211036891154446722'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6447849434617576469/posts/default/9211036891154446722'/><link rel='alternate' type='text/html' href='http://codecereal.blogspot.com/2010/04/diagnosticar-3-question-answer.html' title='DiagnostiCar (3) - Question &amp; Answer Interface'/><author><name>Daker Fernandes Pinheiro</name><uri>http://www.blogger.com/profile/13246366811991245999</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/-5WD2HNXE45Q/TdWfTSkSEXI/AAAAAAAAACg/wL6O3AwG3tk/s220/isimp.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6447849434617576469.post-8284160187589071126</id><published>2010-04-11T04:44:00.000-07:00</published><updated>2011-06-18T09:32:43.871-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='operator'/><category scheme='http://www.blogger.com/atom/ns#' term='reasoning'/><category scheme='http://www.blogger.com/atom/ns#' term='abduction'/><category scheme='http://www.blogger.com/atom/ns#' term='prolog'/><category scheme='http://www.blogger.com/atom/ns#' term='swi'/><category scheme='http://www.blogger.com/atom/ns#' term='shell'/><category scheme='http://www.blogger.com/atom/ns#' term='logic'/><category scheme='http://www.blogger.com/atom/ns#' term='diagnosticar'/><title type='text'>DiagnostiCar (2) - Knowledge Representation Language</title><content type='html'>&lt;div class="separator" style="clear: both; text-align: justify;"&gt;&lt;/div&gt;I'm giving sequence to the last post about the DiagnostiCar expert system. Today I'm going to specify the predicates to represent some domain specific knowledge.&lt;br /&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;Prolog has a helpful and clean syntax &lt;a href="http://www.csci.csusb.edu/dick/samples/prolog.syntax.html"&gt;[1]&lt;/a&gt; (at least when you're used to :P) and it's easy to define new operators syntax in SWI &lt;a href="http://www.swi-prolog.org/pldoc/doc_for?object=section%282%2c%274.23%27%2cswi%28%27%2fdoc%2fManual%2foperators.html%27%29%29"&gt;[2]&lt;/a&gt;. Such feature improves the readability of the knowledge base. &lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;To make an abductive reasoning we will need at least the logic operators 'and','or' and 'implies' to represent knowledge and chain it.&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;Our simple language will have the following syntax:&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;ul style="text-align: justify;"&gt;&lt;li&gt;&lt;b&gt;known X&lt;/b&gt;. - Says that the sentence X must be true;&lt;/li&gt;&lt;ul&gt;&lt;li&gt;&lt;b&gt;X &amp;lt;- Y - X&lt;/b&gt; is caused by Y;&lt;/li&gt;&lt;li&gt;&lt;b&gt;X &amp;amp; Y&lt;/b&gt; - Is true if X and Y are true;&lt;/li&gt;&lt;li&gt;&lt;b&gt;X v Y&lt;/b&gt; - Is true if X or Y are true ('|' is already used by Prolog);&lt;/li&gt;&lt;/ul&gt;&lt;li&gt;&lt;b&gt;assumable X&lt;/b&gt; - X can be assumed to prove something;&lt;/li&gt;&lt;li&gt;&lt;b&gt;askable X&lt;/b&gt; - X can be asked to the user;&lt;/li&gt;&lt;/ul&gt;&lt;div style="text-align: justify;"&gt;Note that I'm not using the negation, because some dangerous particularities of Prolog negation as failure.&lt;/div&gt;&lt;div style="text-align: justify;"&gt;It's interesting to note the similarity if we represented with the defaul prefixed notation of Prolog predicades:&lt;/div&gt;&lt;code&gt;&lt;/code&gt;&lt;br /&gt;&lt;pre&gt;&lt;code&gt;- known(X)&lt;br /&gt;- &amp;lt;-(X,Y)&lt;br /&gt;- &amp;amp;(X,Y)&lt;br /&gt;- v(X,Y)&lt;br /&gt;- assumable(X)&lt;br /&gt;- askable(X)&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;&lt;div style="text-align: justify;"&gt;Defining opertors increases the readabilty of the knowledge representation significantly. To redefine operators we will use the SWI predicate &lt;b&gt;op/3&lt;/b&gt; &lt;a href="http://www.swi-prolog.org/pldoc/doc_for?object=section%282%2c%274.23%27%2cswi%28%27%2fdoc%2fManual%2foperators.html%27%29%29"&gt;[2]&lt;/a&gt;. We can only run this predicate like:&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;code&gt;&lt;/code&gt;&lt;br /&gt;&lt;pre&gt;&lt;/pre&gt;&lt;/div&gt;&lt;code&gt;&lt;/code&gt;&lt;br /&gt;&lt;pre&gt;&lt;code&gt;:- op(910, xfy, &amp;amp;).&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;&lt;div style="text-align: justify;"&gt;The &lt;b&gt;op&lt;/b&gt; predicate has three fields &lt;b&gt;op(+Precedence, +Type, :Name)&lt;/b&gt;. The first one says the precendence of the operator (a number between 0 and 1200). The second is how the functor will be placed within it's "children nodes", for example: &lt;b&gt;xfy&lt;/b&gt; let the functor between the arguments while &lt;b&gt;xf&lt;/b&gt; let the functor after it's argument (remembering that you can only create unary or binary operators). For our representation language we will have:&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: justify;"&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;code&gt;&lt;/code&gt;&lt;br /&gt;&lt;pre&gt;:- discontiguous(&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; known/1 ). % The discontiguous predicate tells Prolog&lt;br /&gt;:- discontiguous( assumable/1 ). % that the given predicate can be declared&lt;br /&gt;:- discontiguous(&amp;nbsp;&amp;nbsp; askable/1 ). % unsorted.&lt;br /&gt;&lt;br /&gt;:- op(910, xfy,&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;amp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ). % a higher priority with an infix notation&lt;br /&gt;:- op(920, xfy,&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; v&amp;nbsp;&amp;nbsp;&amp;nbsp; ). % infix notation and lower priority than &amp;amp;&lt;br /&gt;:- op(930, xfy,&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;-&amp;nbsp;&amp;nbsp;&amp;nbsp; ). % infix notation&lt;br /&gt;:- op(940,&amp;nbsp; fx,&amp;nbsp;&amp;nbsp; known&amp;nbsp;&amp;nbsp; ). % prefixed notation and lowest priority&lt;br /&gt;:- op(940,&amp;nbsp; fx, assumable ). % prefixed notation and lowest priority&lt;br /&gt;:- op(940,&amp;nbsp; fx,&amp;nbsp;&amp;nbsp; askable ). % prefixed notation and lowest priority&lt;br /&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;Now let's model the car problems knowledge into our language (remember that Upper names are variables):&lt;br /&gt;&lt;code&gt;&lt;/code&gt;&lt;br /&gt;&lt;pre&gt;%&lt;br /&gt;% DiagnostiCar - Knowledge Base&lt;br /&gt;%&lt;br /&gt;&lt;br /&gt;% Observe that implication is seen as the real sense of consequence &amp;lt;- cause.&lt;br /&gt;known problem(crank_case_damaged) &amp;lt;- crank_case_damaged.&lt;br /&gt;known problem(hydraulic_res_damaged) &amp;lt;- hydraulic_res_damaged.&lt;br /&gt;known problem(brakes_res_damaged) &amp;lt;- brakes_res_damaged.&lt;br /&gt;known problem(old_engine_oil) &amp;lt;- old_engine_oil &amp;amp; leak_color(black).&lt;br /&gt;&lt;br /&gt;known leak(engine_oil) &amp;lt;- crank_case_damaged.&lt;br /&gt;known leak(hydraulic_oil) &amp;lt;- hydraulic_res_damaged.&lt;br /&gt;known leak(brakes_oil) &amp;lt;- brakes_res_damaged.&lt;br /&gt;&lt;br /&gt;% Implications in false can be seen as constraints.&lt;br /&gt;known oil(Oil) &amp;lt;- exists_leak(true) &amp;amp; leak(Oil).&lt;br /&gt;known false &amp;lt;- oil(brakes_oil) &amp;amp; leak_color(Color) &amp;amp; Color \= green. &lt;br /&gt;known false &amp;lt;- oil(engine_oil) &amp;amp; leak_color(Color) &amp;amp; Color \= brown &amp;amp; Color \= black.&lt;br /&gt;known false &amp;lt;- oil(hydraulic_oil) &amp;amp; leak_color(Color) &amp;amp; Color \= red.&lt;br /&gt;&lt;br /&gt;% The assumable predicate asserts what will be can be suposed &lt;br /&gt;assumable crank_case_damaged.&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;br /&gt;assumable hydraulic_res_damaged. % to prove something e else.&lt;br /&gt;assumable brakes_res_damaged.&lt;br /&gt;assumable old_engine_oil.&lt;br /&gt;&lt;br /&gt;askable leak_color(Color). % Askable predicate asserts which information will be asked for the user.&lt;br /&gt;askable exists_leak(TrueOrFalse). % The argument of the predicate will be unified with the answer.&lt;br /&gt;&lt;br /&gt;known goal(X) &amp;lt;- problem(X). % The goal predicate is what we want to prove with the expert system.&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: justify;"&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: justify;"&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;That's all for now. On the next post I will explain the question and answer interface using the dynamic Prolog predicates.&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;[1] &lt;a href="http://www.csci.csusb.edu/dick/samples/prolog.syntax.html"&gt; Prolog Syntax &lt;/a&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;[2] &lt;a href="http://www.swi-prolog.org/pldoc/doc_for?object=section%282%2c%274.23%27%2cswi%28%27%2fdoc%2fManual%2foperators.html%27%29%29"&gt; Prolog 'operator/3' documentation &lt;/a&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6447849434617576469-8284160187589071126?l=codecereal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://codecereal.blogspot.com/feeds/8284160187589071126/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://codecereal.blogspot.com/2010/04/diagnosticar-2-knowledge-representation.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6447849434617576469/posts/default/8284160187589071126'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6447849434617576469/posts/default/8284160187589071126'/><link rel='alternate' type='text/html' href='http://codecereal.blogspot.com/2010/04/diagnosticar-2-knowledge-representation.html' title='DiagnostiCar (2) - Knowledge Representation Language'/><author><name>Daker Fernandes Pinheiro</name><uri>http://www.blogger.com/profile/13246366811991245999</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/-5WD2HNXE45Q/TdWfTSkSEXI/AAAAAAAAACg/wL6O3AwG3tk/s220/isimp.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6447849434617576469.post-3696267258211466760</id><published>2010-04-07T16:53:00.000-07:00</published><updated>2011-06-06T19:55:47.602-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='reasoning'/><category scheme='http://www.blogger.com/atom/ns#' term='abduction'/><category scheme='http://www.blogger.com/atom/ns#' term='prolog'/><category scheme='http://www.blogger.com/atom/ns#' term='swi'/><category scheme='http://www.blogger.com/atom/ns#' term='logic'/><category scheme='http://www.blogger.com/atom/ns#' term='diagnosticar'/><title type='text'>DiagnostiCar (1) - Introduction</title><content type='html'>&lt;div style="text-align: justify;"&gt;I'm here to tell my expirience with logic programming on a graduation project - DiagnostiCar.&lt;/div&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/_wlCqyPb9gG8/S70VeX-KeaI/AAAAAAAAABQ/1tb70R2dUQo/s1600/diagnosticar.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://3.bp.blogspot.com/_wlCqyPb9gG8/S70VeX-KeaI/AAAAAAAAABQ/1tb70R2dUQo/s320/diagnosticar.jpg" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;div style="text-align: justify;"&gt;The DiagnostiCar core feature was an expert system for giving hints to laypeople about their cars symptoms and their causes and consequences to avoid being deceived by mechanics (an habit of the job). Communication between the system and the users was conceived to happen with multiple choice questions forms and at the end the system gives the possible causes for the observed car characteristics.&lt;br /&gt;&lt;br /&gt;This is a video of the DiagnostiCar expert system interface working:&lt;br /&gt;&lt;br /&gt;&lt;object style="height: 390px; width: 640px;"&gt;&lt;param name="movie" value="http://www.youtube.com/v/OxsAWwmGL88?version=3"&gt;&lt;param name="allowFullScreen" value="true"&gt;&lt;param name="allowScriptAccess" value="always"&gt;&lt;embed src="http://www.youtube.com/v/OxsAWwmGL88?version=3" type="application/x-shockwave-flash" allowfullscreen="true" allowScriptAccess="always" width="640" height="390"&gt;&lt;/object&gt;&lt;br /&gt;&lt;br /&gt;Although there are many other interesting aspects about the DiagnostiCar, let's focus on the shell system. The expert system reasoning system used was abductive reasoning which is a kind of reasoning that tries to find a plausible cause (or a set of them) for a given fact occurrence [1][2]. Find the causes for a given observation is exactly what we need for diagnose car problems (the cause of the observable symptoms).&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;I choose to use Prolog not only because it's default support to first-order logic (although some logic limitations on negation [3]), but it has a nice support for pattern matching and syntax operator definition (most of the Prolog implementations). From the Prolog implementations I've picked SWI Prolog due to it's multi-thread support (necessary for a system running on a multiple user environment), stability and speed (not as fast as the YAP) [4].&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;To illustrate the process of development of the expert system we will restrict our diagnosis to the problems consisting of some oil leaks problem. Here are some of the knowledge about car diagnosis I will use as example in the next sections:&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;ul style="text-align: justify;"&gt;&lt;li&gt;A crank case, hydraulic oil reservoir and brakes fluid reservoir damaged are problems;&lt;/li&gt;&lt;li&gt;The crank case damaged provokes leak of engine oil;&lt;/li&gt;&lt;li&gt;The hydraulic reservoir damaged provokes leak of hydraulic system oil;&lt;/li&gt;&lt;li&gt;The brakes fluid reservoir damaged provokes leak of engine oil;&amp;nbsp;&lt;/li&gt;&lt;li&gt;If an oil is leaking it dirts the garage's floor.&lt;/li&gt;&lt;li&gt;The brakes fluid is green.&lt;/li&gt;&lt;li&gt;The hydraulic oil is red.&lt;/li&gt;&lt;li&gt;The engine oil is black or brown.&lt;/li&gt;&lt;li&gt;If the engine oil is black, then it needs to be changed.&lt;/li&gt;&lt;/ul&gt;&lt;div style="text-align: justify;"&gt;For the next parts, it's good to know the basic about Prolog. A VERY good material about Prolog, it's a the book "Building Expert Systems in Prolog"[5] which guided through the realms of logic programming and the nasty api's documentations. It is also very pratical and full of examples to follow. It's also good see the SWI reference to check some SWI particularities[6].&lt;/div&gt;&lt;br /&gt;[1] &lt;a href="http://en.wikipedia.org/wiki/Abductive_logic_programming"&gt; Abductive Logic Programming &lt;/a&gt;&lt;br /&gt;[2] &lt;a href="http://en.wikipedia.org/wiki/Abductive_reasoning"&gt; Abducive Reasoning &lt;/a&gt;&lt;br /&gt;[3] &lt;a href="http://en.wikipedia.org/wiki/Negation_as_failure"&gt; Prolog Negation as Failure &lt;/a&gt;&lt;br /&gt;[4] &lt;a href="http://www.david-reitter.com/compling/prolog/compare.html"&gt; Prologs Implementation Comparisons &lt;/a&gt;&lt;br /&gt;[5] &lt;a href="http://www.amzi.com/ExpertSystemsInProlog/xsipfrtop.htm"&gt; "Building Expert Systems in Prolog" &lt;/a&gt;&lt;br /&gt;[6] &lt;a href="http://www.swi-prolog.org/pldoc/refman/"&gt; SWI Manual &lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6447849434617576469-3696267258211466760?l=codecereal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://codecereal.blogspot.com/feeds/3696267258211466760/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://codecereal.blogspot.com/2010/04/diagnosticar-introduction.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6447849434617576469/posts/default/3696267258211466760'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6447849434617576469/posts/default/3696267258211466760'/><link rel='alternate' type='text/html' href='http://codecereal.blogspot.com/2010/04/diagnosticar-introduction.html' title='DiagnostiCar (1) - Introduction'/><author><name>Daker Fernandes Pinheiro</name><uri>http://www.blogger.com/profile/13246366811991245999</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/-5WD2HNXE45Q/TdWfTSkSEXI/AAAAAAAAACg/wL6O3AwG3tk/s220/isimp.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_wlCqyPb9gG8/S70VeX-KeaI/AAAAAAAAABQ/1tb70R2dUQo/s72-c/diagnosticar.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6447849434617576469.post-26211698888921233</id><published>2010-02-25T15:14:00.001-08:00</published><updated>2010-04-11T04:54:03.850-07:00</updated><title type='text'>First Post</title><content type='html'>&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;Spock says lambda, lambda, lambda to CodeCereal:&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/_wlCqyPb9gG8/S4cLBwExXLI/AAAAAAAAAAk/oR6cPmBGYB8/s1600-h/startrek_cereal_big.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://2.bp.blogspot.com/_wlCqyPb9gG8/S4cLBwExXLI/AAAAAAAAAAk/oR6cPmBGYB8/s320/startrek_cereal_big.jpg" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6447849434617576469-26211698888921233?l=codecereal.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://codecereal.blogspot.com/feeds/26211698888921233/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://codecereal.blogspot.com/2010/02/first-post.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6447849434617576469/posts/default/26211698888921233'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6447849434617576469/posts/default/26211698888921233'/><link rel='alternate' type='text/html' href='http://codecereal.blogspot.com/2010/02/first-post.html' title='First Post'/><author><name>Daker Fernandes Pinheiro</name><uri>http://www.blogger.com/profile/13246366811991245999</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/-5WD2HNXE45Q/TdWfTSkSEXI/AAAAAAAAACg/wL6O3AwG3tk/s220/isimp.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_wlCqyPb9gG8/S4cLBwExXLI/AAAAAAAAAAk/oR6cPmBGYB8/s72-c/startrek_cereal_big.jpg' height='72' width='72'/><thr:total>1</thr:total></entry></feed>
