<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-30125952</id><updated>2011-10-10T10:09:52.569+01:00</updated><title type='text'>Nature's Numbers</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://naturesnumbers.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/30125952/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://naturesnumbers.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Coalescent</name><uri>http://www.blogger.com/profile/00951149322490275133</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>29</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-30125952.post-115486866681078601</id><published>2006-08-06T12:25:00.000+01:00</published><updated>2006-08-06T14:04:09.753+01:00</updated><title type='text'>Mr Bayes plays silly blighters</title><content type='html'>&lt;i&gt;(Statistics)&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;I've just noticed that a fellow blogger is having &lt;a href="http://rrresearch.blogspot.com/2006/08/mr-bayes-comes-to-lab.html"&gt;a bit of trouble&lt;/a&gt; getting her head round Bayesian analysis. 
Actually, it seems she's got the hang of it, but it occurs to me that some background on why Bayesian analysis works might be useful. Being a maths geek and all-round nosey person, I am of course going to get involved with attempting to explain the situation.&lt;br /&gt;&lt;br /&gt;Bayesian analysis consists, quite simply, of one method to determine how probabilities behave when our model is accurate plus another method that uses this to refine our model. I'll briefly cover both.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;The way things are&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;The result I'm going to discuss here is called &lt;a href="http://en.wikipedia.org/wiki/Bayes%27_theorem"&gt;Bayes' Theorem&lt;/a&gt;. It's most easily explained using a visual technique called Venn diagrams, but that involves lots of irritating drawing and uploading so I'll skip it for the moment. Instead I'll briefly run you through the maths.&lt;br /&gt;&lt;br /&gt;The first thing you need to know is conditional probability. Conditional probability is a way of figuring out how likely it is that one thing happens given that another thing has already happened. The important result here is: the probability that A happens given B happens = the probability that A happens and B happens divided by the probability that B happens. I'll write this as P(A|B) = P(AnB)/P(B). Stare at this for a while and it'll hopefully start to make sense.&lt;br /&gt;&lt;blockquote&gt;&lt;u&gt;P(A|B) = P(AnB)/P(B)&lt;/u&gt;&lt;/blockquote&gt;&lt;br /&gt;This allows us to produce an interesting little equality - Bayes' Theorem. Since P(A|B)P(B) = P(AnB) = P(BnA) = P(B|A)P(A), we can write P(A|B) = P(B|A)P(A)/P(B). This is important - memorise it.&lt;br /&gt;&lt;blockquote&gt;&lt;u&gt;P(A|B) = P(B|A)P(A)/P(B)&lt;/u&gt;&lt;/blockquote&gt;&lt;br /&gt;Let's consider an example. Say we're picking black and white balls out of a pot, with replacement. We know there are 10 balls in the pot; what we don't know is the number that are black. 
Our task is to figure out the number of black balls. I'll need to cover some other topics, but I'd like to point out an interesting phenomenon here: if we guess correctly, the data will match our guess. No amount of data will change our mind. The &lt;i&gt;posterior&lt;/i&gt; probability P(# black | data) will equal the &lt;i&gt;prior&lt;/i&gt; probability P(# black).&lt;br /&gt;&lt;blockquote&gt;&lt;u&gt;If our distribution is accurate, P(distribution|data) = P(distribution)&lt;/u&gt;&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;The way things should be&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;I'm going to leave our concrete example for the moment and briefly cover an important numerical approximation technique. The actual proof thereof takes &lt;a href="http://en.wikipedia.org/wiki/Contraction_mapping"&gt;advanced mathematical skills&lt;/a&gt;, but the basic idea is simple. It is this: if an equation can be formulated as x = f(x), then setting x&lt;sub&gt;n+1&lt;/sub&gt; = f(x&lt;sub&gt;n&lt;/sub&gt;) will, provided f is a contraction near the solution (roughly, it pulls nearby points closer together), result in x&lt;sub&gt;n&lt;/sub&gt; converging on a solution of the equation as n increases.&lt;br /&gt;&lt;blockquote&gt;&lt;u&gt;If x=f(x) and f is a contraction, then setting x&lt;sub&gt;n+1&lt;/sub&gt; = f(x&lt;sub&gt;n&lt;/sub&gt;) will cause x&lt;sub&gt;n&lt;/sub&gt; to converge to x&lt;/u&gt;&lt;/blockquote&gt;&lt;br /&gt;Let's take an example. The other day my dad asked me to work out a fairly complicated problem in compound interest for him. It turned out that the answer I was after was a solution to the equation 46.40*x&lt;sup&gt;300&lt;/sup&gt; - 23446.40*x - 23400 = 0. Remembering the above mathematical technique, I rewrote the equation as x = ((23446.40*x + 23400)/46.40)&lt;sup&gt;1/300&lt;/sup&gt;. I then plugged in an initial value of 1.05 and iterated a few times. The algorithm very quickly converged on a correct solution of 1.0233653112365164.&lt;br /&gt;&lt;br /&gt;How does this apply to our example? 
Well, if you look at the equations we derived above, you'll notice something: if our distribution is accurate, we have the equation P(distribution) = P(distribution|data) = P(data|distribution)*P(distribution)/P(data). This is precisely the sort of equation that this algorithm is designed to find solutions for.&lt;br /&gt;&lt;br /&gt;Let's get back to the balls. We have eleven possible values for the number of black balls - anything from 0 to 10 - which gives rise to eleven possible binomial distributions that could be giving rise to the data. To start with, let's assume that these distributions are equally likely - P(# black = k) = 1/11 for all possible k. This distribution of distributions will act as our prior probabilities.&lt;br /&gt;&lt;br /&gt;Now let's go out and collect some data. We pick 40 balls and find that 21 are black and 19 are white. What are the associated posterior probabilities? &lt;br /&gt;&lt;br /&gt;Well, first we need to work out two things: the probability of getting that data given a value k for the number of black balls, and the overall probability of getting that data.  The first is fairly standard mathematics, so I won't go into it here - just google on "Binomial Distribution" if you're confused. The answer turns out to be 40!/(21!*19!)*(k/10)&lt;sup&gt;21&lt;/sup&gt;(1 - k/10)&lt;sup&gt;19&lt;/sup&gt;. So, for example, with k=5 this probability is 0.119.&lt;br /&gt;&lt;br /&gt;Now we can work out the overall probability of getting that data - note that P(data) = sum over distributions of P(data n distribution) = sum( P(data|distribution)*P(distribution) ). In the specific example, this turns out to be 1/11*(0 + 1.77*10&lt;sup&gt;-11&lt;/sup&gt; + 3.97*10&lt;sup&gt;-6&lt;/sup&gt; + 0.00157 + 0.0352 + 0.119 + 0.0792 + 0.00852 + 6.35*10&lt;sup&gt;-5&lt;/sup&gt; + 1.44*10&lt;sup&gt;-9&lt;/sup&gt; + 0) = 0.244/11. Cool, huh?&lt;br /&gt;&lt;br /&gt;OK, we now have all the parts we need to generate posterior probabilities. 
Here goes nothing...&lt;br /&gt;&lt;br /&gt;P(# black = k | 21 black, 19 white) = P(21 black, 19 white | # black = k)*P(# black = k)/P(21 black, 19 white)&lt;br /&gt;= 40!/(21!*19!)*(k/10)&lt;sup&gt;21&lt;/sup&gt;(1 - k/10)&lt;sup&gt;19&lt;/sup&gt; * 1/11 / (0.244/11)&lt;br /&gt;= 40!/(21!*19!)*(k/10)&lt;sup&gt;21&lt;/sup&gt;(1 - k/10)&lt;sup&gt;19&lt;/sup&gt; / 0.244&lt;br /&gt;&lt;br /&gt;Which gives us the following posterior probabilities:&lt;br /&gt;P(# black = 0) = 0.0&lt;br /&gt;P(# black = 1) = 7.27105875099e-11&lt;br /&gt;P(# black = 2) = 1.62678304084e-05&lt;br /&gt;P(# black = 3) = 0.0064179907059&lt;br /&gt;P(# black = 4) = 0.144252566798&lt;br /&gt;P(# black = 5) = 0.489542214277&lt;br /&gt;P(# black = 6) = 0.324568275296&lt;br /&gt;P(# black = 7) = 0.0349423938432&lt;br /&gt;P(# black = 8) = 0.000260285286534&lt;br /&gt;P(# black = 9) = 5.8895575883e-09&lt;br /&gt;P(# black = 10) = 0.0&lt;br /&gt;&lt;br /&gt;If you sum these up you'll find they add up to 1, as necessary. You'll note that our distribution has already started converging on a value for k of either 5 or 6, which is more or less what we'd expect. 
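&lt;br /&gt;&lt;br /&gt;If you'd like to check these numbers for yourself, here's a minimal sketch in Python. It assumes Python 3.8 or later (for math.comb), and the names N_BALLS, likelihood, posterior and flat_prior are my own inventions for illustration rather than anything standard:&lt;br /&gt;
&lt;pre&gt;
```python
from math import comb

N_BALLS = 10  # balls in the pot, as in the example above


def likelihood(k, black, white):
    """P(data | k black balls): binomial, drawing with replacement."""
    p = k / N_BALLS
    return comb(black + white, black) * p**black * (1 - p)**white


def posterior(prior, black, white):
    """Bayes' Theorem applied to each candidate value of k in turn."""
    joint = [likelihood(k, black, white) * prior[k] for k in range(N_BALLS + 1)]
    p_data = sum(joint)  # P(data), summed over all eleven distributions
    return [j / p_data for j in joint]


flat_prior = [1 / (N_BALLS + 1)] * (N_BALLS + 1)  # P(# black = k) = 1/11
post = posterior(flat_prior, black=21, white=19)

for k, p in enumerate(post):
    print("P(# black = %d) = %g" % (k, p))
```
&lt;/pre&gt;
With the flat prior this should come out at roughly 0.49 for k=5 and 0.32 for k=6, in line with the list above. To fold in a second batch of draws, you'd pass post back in as the new prior along with the new counts.&lt;br /&gt;&lt;br /&gt;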
With sufficient data, we can expect that one distribution will emerge as the clear winner - adding more data is directly equivalent to doing another pass of the approximation algorithm, and could even be formulated in the same way if we were complete masochists.&lt;br /&gt;&lt;br /&gt;Hope that helps everyone else as much as it just helped me.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/30125952-115486866681078601?l=naturesnumbers.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://rrresearch.blogspot.com/2006/08/mr-bayes-comes-to-lab.html' title='Mr Bayes plays silly blighters'/><link rel='replies' type='application/atom+xml' href='http://naturesnumbers.blogspot.com/feeds/115486866681078601/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=30125952&amp;postID=115486866681078601' title='16 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/30125952/posts/default/115486866681078601'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/30125952/posts/default/115486866681078601'/><link rel='alternate' type='text/html' href='http://naturesnumbers.blogspot.com/2006/08/mr-bayes-plays-silly-blighters.html' title='Mr Bayes plays silly blighters'/><author><name>Coalescent</name><uri>http://www.blogger.com/profile/00951149322490275133</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>16</thr:total></entry><entry><id>tag:blogger.com,1999:blog-30125952.post-115366488822709023</id><published>2006-07-23T14:23:00.000+01:00</published><updated>2006-07-23T15:28:08.290+01:00</updated><title type='text'>Background Page: Genetic Algorithms</title><content type='html'>&lt;b&gt;Description&lt;/b&gt;&lt;br 
/&gt;&lt;br /&gt;Ever since Darwin, we've had a fairly good idea of how the complexity of the world's organisms has appeared. It's the result of a stochastic optimisation process known as natural selection, which turns out to be very good at producing efficient solutions to problems.&lt;br /&gt;&lt;br /&gt;This raises an interesting question: can we harness this effect to produce better designs? What problems is this approach good at solving? How do we tailor the approach to be as effective as possible? These questions are encapsulated and answered in the study of genetic algorithms.&lt;br /&gt;&lt;br /&gt;This isn't &lt;i&gt;strictly&lt;/i&gt; a part of computational biology, but it's a technique that's widely used in this field (if only because CBists have generally come across GAs). See, for example, &lt;a href="http://bioinformatics.oxfordjournals.org/cgi/content/short/22/13/1577"&gt;this paper&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Problem solved&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;How can we use evolutionary processes to solve optimisation problems?&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Books/Resources&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;"Evolutionary Computing" - Kenneth A. De Jong. 
Barely started reading.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Posts&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;None so far.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/30125952-115366488822709023?l=naturesnumbers.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://naturesnumbers.blogspot.com/feeds/115366488822709023/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=30125952&amp;postID=115366488822709023' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/30125952/posts/default/115366488822709023'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/30125952/posts/default/115366488822709023'/><link rel='alternate' type='text/html' href='http://naturesnumbers.blogspot.com/2006/07/background-page-genetic-algorithms.html' title='Background Page: Genetic Algorithms'/><author><name>Coalescent</name><uri>http://www.blogger.com/profile/00951149322490275133</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-30125952.post-115360465122711165</id><published>2006-07-22T21:25:00.000+01:00</published><updated>2006-07-22T23:04:13.110+01:00</updated><title type='text'>Sequence comparison</title><content type='html'>&lt;i&gt;(Bioinformatics)&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;OK, so I got word through from the guy in charge of the computer course about my little &lt;a href="http://naturesnumbers.blogspot.com/2006/06/real-life-academic-ethics-dilemma.html"&gt;ethical crisis&lt;/a&gt;. He suggested that, instead of posting actual code, I instead merely include pseudocode for the bits covered by the course. 
Now why the heck didn't I think of that?&lt;br /&gt;&lt;br /&gt;On with the show. A major part of bioinformatics involves the comparison of different genetic sequences. Say you have a working gene for vitamin C production, plus something that you think might be a broken copy of same. How do you figure out whether they are actually related?&lt;br /&gt;&lt;br /&gt;The basic technique is to figure out how many mutations (insertions, deletions, substitutions) it takes to get from one to the other. There are many, many variants on this basic principle - tailor-made modifications designed to account for a variety of changeable factors such as:&lt;br /&gt;&lt;br /&gt;1) are certain substitutions more common than others?&lt;br /&gt;2) how likely are insertions and deletions of various sizes? (cough&lt;a href="http://naturesnumbers.blogspot.com/2006/06/eyes-on-future-dawg.html"&gt;Zipf distribution&lt;/a&gt;cough)&lt;br /&gt;3) do certain subsections match better than others?&lt;br /&gt;&lt;br /&gt;For this post, I'll only cover the simplest case: all three types of mutation are equally likely (so each one costs the same), and indels (insertions and deletions) are always of length 1.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Cutting it down to size&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;OK, so we have a couple of strings (referred to as S and T) to align - we'll think of this as finding a way to mutate S so it ends up looking like T. There's no immediately obvious way of solving this, so let's try a standard approach: see if we can reduce it to a simpler problem.&lt;br /&gt;&lt;br /&gt;I'm going to look at a few cases and show how they reduce to various easier problems, with examples of situations where each reductive approach can be ideal. Hope people can follow! 
I will be using the Python &lt;a href="http://docs.python.org/tut/node5.html#SECTION005120000000000000000"&gt;slice notation&lt;/a&gt; extensively, so you may want to familiarise yourself with that first.&lt;br /&gt;&lt;br /&gt;&lt;i&gt;Matching&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;OK, so say our two strings are "CAAAT&lt;span style="color: rgb(255, 0, 0);"&gt;AGCAA&lt;/span&gt;" and "CCGCG&lt;span style="color: rgb(255, 0, 0);"&gt;AGCAA&lt;/span&gt;". There's something you should notice here: we don't actually need to do anything with the last 5 letters! We can just say "hey, they match", and stop worrying about them. The problem then reduces itself to trying to align "CAAAT" with "CCGCG". Much simpler.&lt;br /&gt;&lt;br /&gt;Formally: if S[-1] and T[-1] are identical, S and T may be replaced by S[:-1] and T[:-1] at no cost.&lt;br /&gt;&lt;br /&gt;&lt;i&gt;Substitution&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Let's try a different set of strings. Say we have "AAATCCAGC&lt;span style="color: rgb(255, 0, 0);"&gt;A&lt;/span&gt;" and "AAATCCAGC&lt;span style="color: rgb(255, 0, 0);"&gt;T&lt;/span&gt;". Now, if you look at these, you'll see they differ only in the last letter. Time for a substitution! If we transform the first string by substituting a T for the A, we'll then just be able to match the rest. Home and dry! This does "cost" us one mutation, but in this case that's a small price to pay.&lt;br /&gt;&lt;br /&gt;Formally: S and T may be replaced by S[:-1] and T[:-1] at a cost of 1 mutation.&lt;br /&gt;&lt;br /&gt;&lt;i&gt;Deletion&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Our next two example strings are "GGGTCATCTT&lt;span style="color: rgb(255, 0, 0);"&gt;A&lt;/span&gt;" and "GGGTCATCTT". Now, it should be fairly obvious what you need to do to the first string to get it to match the second: just delete the "A". 
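&lt;br /&gt;&lt;br /&gt;If slice notation is new to you, here's that deletion spelled out in Python. This is only a sketch of the notation, using the two example strings above:&lt;br /&gt;
&lt;pre&gt;
```python
S = "GGGTCATCTTA"
T = "GGGTCATCTT"

print(S[-1])        # the last letter of S
print(S[:-1])       # S with its last letter dropped
print(S[:-1] == T)  # True: one deletion turns S into T
```
&lt;/pre&gt;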
&lt;br /&gt;&lt;br /&gt;Formally: S may be replaced with S[:-1] at a cost of 1 mutation.&lt;br /&gt;&lt;br /&gt;&lt;i&gt;Insertion&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Finally, consider the strings "CCGGAGACCT" and "CCGGAGACCT&lt;span style="color: rgb(255, 0, 0);"&gt;G&lt;/span&gt;". Our instinctive response here is to stick a G on the end of the first string. However, that presents us with a problem: if we're going to start adding stuff to strings, we're going to end up with longer strings. That ruins the whole idea of trying to simplify the problem.&lt;br /&gt;&lt;br /&gt;There's a simple workaround - we just need to combine two operations. If we add a "G" and then immediately match the two "G"s away, we end up with the same result as if we'd simply dropped the "G" off the end of the second string. That's a lot more tractable, and incidentally is pleasingly symmetrical with the aforementioned operation of deletion.&lt;br /&gt;&lt;br /&gt;Formally: T may be replaced by T[:-1] at a cost of 1 mutation.&lt;br /&gt;&lt;br /&gt;These are the only options for this model!&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Shake it all about&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;OK, so let's combine these four approaches into a single algorithm. What we're going to do is take our pair of strings and apply each of the four abovementioned alterations to them (using matching or substitution as applicable). This will provide us with three new pairs of strings, each with at least one string shorter than before, to which we can then apply the same method. Eventually both strings will sink to size zero in every single case. We'll then select the sequence of operations that gets us to this point via the fewest expensive mutations.&lt;br /&gt;&lt;br /&gt;We can write this more formally. Refer to the "distance" (minimum number of mutations) between S and T as D(S,T). 
Then we have the following:&lt;br /&gt;&lt;br /&gt;D(S,T) = min{ &lt;br /&gt; D(S[:-1],T[:-1]) + s(S[-1],T[-1]), &lt;br /&gt; D(S[:-1],T) + 1,&lt;br /&gt; D(S,T[:-1]) + 1&lt;br /&gt; }&lt;br /&gt;&lt;br /&gt;where s(A,B) = 1 if A and B are different letters and 0 if A and B are the same (this represents the cost of either substituting B for A or matching them)&lt;br /&gt;&lt;br /&gt;If we implement this as a function that takes two strings and returns the distance between them, we can simply design it so that it &lt;a href="http://en.wikipedia.org/wiki/Recursion#Recursion_in_computer_science"&gt;recurses&lt;/a&gt; through all the shorter versions of the two strings. Puzzle solved!&lt;br /&gt;&lt;br /&gt;&lt;b&gt;A problem&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;We have, however, missed something. What happens if, say, we're comparing "ATGGCTA" with ""? The last letter T[-1] will now no longer exist, and the program will crash horribly when it tries to evaluate s(S[-1],T[-1]). Doh.&lt;br /&gt;&lt;br /&gt;Fortunately, there's an easy solution. Since the cheapest way of getting from "ATGGCTA" to "" is by deleting all the letters, we can just say that this has a cost of 7 mutations. A similar approach applies if the first string is empty. We'll need to write this into the code:&lt;br /&gt;&lt;br /&gt;if S=="": D(S,T) = len(T)&lt;br /&gt;&lt;br /&gt;elif T=="": D(S,T) = len(S) &lt;br /&gt;&lt;br /&gt;else:&lt;br /&gt; D(S,T) = min{ &lt;br /&gt;  D(S[:-1],T[:-1]) + s(S[-1],T[-1]), &lt;br /&gt;  D(S[:-1],T) + 1,&lt;br /&gt;  D(S,T[:-1]) + 1&lt;br /&gt;  }&lt;br /&gt;&lt;br /&gt;&lt;b&gt;A computational note&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;If you decide to implement this alignment algorithm for yourself - and I strongly recommend you try it - you'll quickly notice a major problem with the standard recursive implementation of the algorithm. In short, it's damn slow for long strings. 
More precisely, since each distance calculation spawns three other distance calculations, the time required to run the thing will be somewhere on the order of 3 to the power of the length of the strings&lt;sup&gt;[1]&lt;/sup&gt;. This sucks.&lt;br /&gt;&lt;br /&gt;Fortunately, there's a much nicer approach, albeit one that requires a modicum more memory (try saying that three times fast...). The problem with the recursion is that it's wasting time by repeatedly calculating the same distances. Why not simply precalculate these values, and refer to them as necessary? One way of implementing this is by creating a grid such that element (i,j) of the grid is equal to D(S[:i],T[:j]). This can be populated by filling in the sides adjacent to D("","") and then working outwards from that corner, which takes time proportional to the product of the two string lengths rather than exponential in them.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;A rather fuzzy mathematical note&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;It turns out, if I understand the book I'm currently reading correctly, that the problem solved here is part of a wider class of mathematical problems, which also includes things like determining RNA secondary structure. In particular, they're ones that in some rather abstract sense involve the concept of &lt;a href=""&gt;context-free grammars&lt;/a&gt;, and it turns out that you can solve them all in a similar fashion. &lt;br /&gt;&lt;br /&gt;At present, I have sod-all idea how this works. However, as an incurable maths student, I'm very interested in exploring the theoretical underpinnings involved, and will probably rant on about this in some detail at a later date. Consider yourselves warned :)&lt;br /&gt;&lt;br /&gt;&lt;b&gt;The next step&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;In this post I've provided a fairly solid discussion of the basic alignment algorithm. 
The next post on this topic will be fairly brief, and will cover a phenomenon known as BLOSUM matrices that are used to fine-tune the alignment of two protein (not DNA) sequences.&lt;br /&gt;&lt;br /&gt;[1] At some point it will actually be beneficial to have a detailed discussion of calculations of this sort. They come up in the real world a heck of a lot - for example, in cryptography, mathematicians can spend years of their lives trying to reduce an algorithm's runtime by a comparatively tiny-looking amount. The reason they do this is because that "tiny amount" can often be disproportionately important as the codes get harder to crack.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/30125952-115360465122711165?l=naturesnumbers.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://naturesnumbers.blogspot.com/feeds/115360465122711165/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=30125952&amp;postID=115360465122711165' title='21 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/30125952/posts/default/115360465122711165'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/30125952/posts/default/115360465122711165'/><link rel='alternate' type='text/html' href='http://naturesnumbers.blogspot.com/2006/07/sequence-comparison.html' title='Sequence comparison'/><author><name>Coalescent</name><uri>http://www.blogger.com/profile/00951149322490275133</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>21</thr:total></entry><entry><id>tag:blogger.com,1999:blog-30125952.post-115332820478160535</id><published>2006-07-19T17:50:00.000+01:00</published><updated>2006-07-19T17:56:44.790+01:00</updated><title type='text'>Back in business</title><content type='html'>I've been away for a while.&lt;br /&gt;&lt;br /&gt;I have an excuse: my computer hates me.&lt;br /&gt;&lt;br /&gt;More precisely, a standard upgrade caused it to crap its pants in at least 6 different ways and counting. X.org died in 4 different ways, 3 of which related to the scourge upon the earth that is NVidia's godawful policy regards open-sourcing their drivers. Xfce (the window manager) died in two ways, one of which involved apparently losing every single tweak I've ever made to its appearance. I'd just got it how I wanted it too.&lt;br /&gt;&lt;br /&gt;This sucks.&lt;br /&gt;&lt;br /&gt;On the other hand, at least I have a (relatively) functional command line now, rather than having to use my dad's old laptop. 
Time to get back to work.&lt;br /&gt;&lt;br /&gt;I suspect my computer will shortly find a more serious way to die - the unseasonal heatwave is hitting it like a sledgehammer - but let's not contemplate that.&lt;br /&gt;&lt;br /&gt;Think happy thoughts...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/30125952-115332820478160535?l=naturesnumbers.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://naturesnumbers.blogspot.com/feeds/115332820478160535/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=30125952&amp;postID=115332820478160535' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/30125952/posts/default/115332820478160535'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/30125952/posts/default/115332820478160535'/><link rel='alternate' type='text/html' href='http://naturesnumbers.blogspot.com/2006/07/back-in-business.html' title='Back in business'/><author><name>Coalescent</name><uri>http://www.blogger.com/profile/00951149322490275133</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-30125952.post-115204984152611478</id><published>2006-07-04T22:47:00.000+01:00</published><updated>2006-07-04T22:50:41.536+01:00</updated><title type='text'>Surprisingly non-scary</title><content type='html'>The first edition of the newly-formed Bio::Blogs&lt;sup&gt;[1]&lt;/sup&gt; blog carnival is up at &lt;a href="http://pbeltrao.blogspot.com/2006/07/bioblogs-1-editorial-of-sorts-welcome.html"&gt;Public Rambling&lt;/a&gt;. 
It's fairly short compared to many carnivals, but that's to be expected for a first edition. &lt;br /&gt;&lt;br /&gt;For anyone who's wondering what I'm talking about: a blog carnival is a synopsis of interesting and relevant entries from different blogs, usually organised round a common theme. In this case, bioinformatics. Very cool.&lt;br /&gt;&lt;br /&gt;[1] This is an amusing name if and only if you're a Perl geek&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/30125952-115204984152611478?l=naturesnumbers.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://naturesnumbers.blogspot.com/feeds/115204984152611478/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=30125952&amp;postID=115204984152611478' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/30125952/posts/default/115204984152611478'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/30125952/posts/default/115204984152611478'/><link rel='alternate' type='text/html' href='http://naturesnumbers.blogspot.com/2006/07/surprisingly-non-scary.html' title='Surprisingly non-scary'/><author><name>Coalescent</name><uri>http://www.blogger.com/profile/00951149322490275133</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-30125952.post-115183940657990699</id><published>2006-07-02T11:50:00.000+01:00</published><updated>2006-07-02T12:23:26.626+01:00</updated><title type='text'>Personal update</title><content type='html'>Sadly I didn't get into the MPhil program. This was precisely what I'd expected, so it's not exactly a shock to the system. 
&lt;br /&gt;&lt;br /&gt;I now have one major goal: to make them wish they &lt;i&gt;had&lt;/i&gt; taken me. To this end, I'll be posting about more advanced topics, possibly skipping chunks of background material in the process. If I have time, or if anyone has special requests, I'll go back and post about this missing material, but the end goal is to bring myself up to scratch on the various elements of computational biology as quickly as humanly possible.&lt;br /&gt;&lt;br /&gt;Assistance in achieving this goal is greatly appreciated. The small number of people who have found this blog have already been incredibly helpful. Advice on any of the following is gratefully received:&lt;br /&gt;&lt;br /&gt;1) Suggestions for material to look at, for example:&lt;br /&gt;a) books&lt;br /&gt;b) papers&lt;br /&gt;c) articles&lt;br /&gt;d) programs&lt;br /&gt;&lt;br /&gt;2) critical analysis of posts[1], in particular:&lt;br /&gt;a) readability&lt;br /&gt;b) accuracy&lt;br /&gt;&lt;br /&gt;3) Absolutely anything else that you feel would improve either the blog or my understanding&lt;br /&gt;&lt;br /&gt;And once I'm up to scratch on all fronts, and my blog is getting hits from &lt;strike&gt;millio&lt;/strike&gt; &lt;strike&gt;thousa&lt;/strike&gt; hundreds of people who are interested in learning more about CB from the educational resource I've produced, maybe someone from the Cambridge CB dept will wander across it, and read for a bit, and absentmindedly think "shame we didn't accept this guy's offer". 
&lt;br /&gt;&lt;br /&gt;And I'll be satisfied.&lt;br /&gt;&lt;br /&gt;[1] Please consider this an open invitation to be as bitchy as you please - I won't take it personally :)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/30125952-115183940657990699?l=naturesnumbers.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://naturesnumbers.blogspot.com/feeds/115183940657990699/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=30125952&amp;postID=115183940657990699' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/30125952/posts/default/115183940657990699'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/30125952/posts/default/115183940657990699'/><link rel='alternate' type='text/html' href='http://naturesnumbers.blogspot.com/2006/07/personal-update.html' title='Personal update'/><author><name>Coalescent</name><uri>http://www.blogger.com/profile/00951149322490275133</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-30125952.post-115161367133166789</id><published>2006-06-29T21:36:00.000+01:00</published><updated>2006-06-29T21:55:29.273+01:00</updated><title type='text'>A last gift</title><content type='html'>Tomorrow I graduate. I also get kicked out of uni accommodation, with the consequence that I won't be around much over the next week or so. 
&lt;br /&gt;&lt;br /&gt;As a final pressie for my readers before I vanish, here is the simulation of speciation by genetic drift that I was talking about: http://coalescent.freewebpage.org/popgen/speciation1.py&lt;br /&gt;&lt;br /&gt;This is the longest program I've produced yet, so when I get back online I will probably end up:&lt;br /&gt;1) producing decent documentation&lt;br /&gt;2) possibly converting it to an object-oriented approach to organisms&lt;br /&gt;3) writing several posts discussing its behaviour in various circumstances&lt;br /&gt;4) upgrading the program in a range of interesting ways, such as by adding a graphical interface&lt;br /&gt;&lt;br /&gt;Until then, have fun :)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/30125952-115161367133166789?l=naturesnumbers.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://naturesnumbers.blogspot.com/feeds/115161367133166789/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=30125952&amp;postID=115161367133166789' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/30125952/posts/default/115161367133166789'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/30125952/posts/default/115161367133166789'/><link rel='alternate' type='text/html' href='http://naturesnumbers.blogspot.com/2006/06/last-gift.html' title='A last gift'/><author><name>Coalescent</name><uri>http://www.blogger.com/profile/00951149322490275133</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-30125952.post-115157781185368925</id><published>2006-06-29T11:32:00.000+01:00</published><updated>2006-06-29T11:43:31.860+01:00</updated><title type='text'>A cry for help</title><content type='html'>Could someone help me here? I'm planning to write a program to model drift-induced speciation. However, to do this I need two statistics that I don't have.&lt;br /&gt;&lt;br /&gt;Firstly, I need to know how the chance of two organisms mating is likely to behave as a function of the distances between their birth locations. Obviously this is going to vary heavily from species to species - eagles will cover more ground than sloths. I would expect the distribution to be an exponential decay, with the parameter representing the organism's migratory ability. Can anyone confirm or correct?&lt;br /&gt;&lt;br /&gt;Secondly, I need to know how the ability of two organisms to mate varies as a function of the accumulated genetic differences between them. On the face of it, this is a daft question - in the real world, some differences will have an impact so much greater than others as to make the distribution virtually meaningless. However, some kind of average would still be handy. &lt;br /&gt;&lt;br /&gt;Again, my inclination would be to model this as an exponential curve, with the parameter being something that a Real Scientist would determine through experiment. It's presumably affected by genome compactness, though, so I can set it to pretty much any value I want for experimental purposes.&lt;br /&gt;&lt;br /&gt;Can anyone help? 
I appreciate that this is research I should really be doing for myself, but frankly I don't have the first clue as to where to look.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/30125952-115157781185368925?l=naturesnumbers.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://naturesnumbers.blogspot.com/feeds/115157781185368925/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=30125952&amp;postID=115157781185368925' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/30125952/posts/default/115157781185368925'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/30125952/posts/default/115157781185368925'/><link rel='alternate' type='text/html' href='http://naturesnumbers.blogspot.com/2006/06/cry-for-help.html' title='A cry for help'/><author><name>Coalescent</name><uri>http://www.blogger.com/profile/00951149322490275133</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-30125952.post-115136249155588944</id><published>2006-06-26T23:36:00.000+01:00</published><updated>2006-06-27T12:08:31.690+01:00</updated><title type='text'>How much is too much</title><content type='html'>&lt;i&gt;(Population Genetics)&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Apropos of my &lt;a href=""&gt;earlier post&lt;/a&gt; about the problems that the basic neutral model hits wrt mutation, I rigged up a script&lt;sup&gt;[1]&lt;/sup&gt; to figure out: exactly how much mutation is required to mess things about?&lt;br /&gt;&lt;br /&gt;Here is the output data taken over 100000 generations&lt;sup&gt;[2]&lt;/sup&gt; with a population size 
of 100. Anything longer or larger would have been computationally problematic.&lt;br /&gt;&lt;div style="font-family:monospace"&gt;Mutation rate | Fixations&lt;br /&gt;       0.0001 | 12&lt;br /&gt;       0.0002 | 23&lt;br /&gt;       0.0005 | 51&lt;br /&gt;        0.001 | 68&lt;br /&gt;        0.002 | 111&lt;br /&gt;        0.005 | 108&lt;br /&gt;         0.01 | 23&lt;br /&gt;         0.02 | 2&lt;br /&gt;         0.05 | 1&lt;br /&gt;          0.1 | 1&lt;br /&gt;          0.2 | 1&lt;br /&gt;          0.5 | 1&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;I have of course graphed this out&lt;sup&gt;[3]&lt;/sup&gt;:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://photos1.blogger.com/blogger/7798/3224/1600/gd4a.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://photos1.blogger.com/blogger/7798/3224/320/gd4a.png" border="0" alt="" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;And a close-up of the left hand side:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://photos1.blogger.com/blogger/7798/3224/1600/gd4b.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://photos1.blogger.com/blogger/7798/3224/320/gd4b.png" border="0" alt="" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Now, from this graph it looks very much like the rate of fixation increases linearly then tails away. We've already explained the second phenomenon - it's because the high rate of mutation breaks the otherwise reasonable assumption that the system will tend to homogeneity. The first phenomenon, however, is very interesting. 
Let's act like proper scientists and see what's going on.&lt;br /&gt;&lt;br /&gt;First we'll try to figure out what the actual rate of fixation (fixations per generation) is in these cases (note: I've trimmed the boring tail off):&lt;br /&gt;&lt;div style="font-family:monospace"&gt;Mutation rate | Fixation rate&lt;br /&gt;       0.0001 | 0.00012&lt;br /&gt;       0.0002 | 0.00023&lt;br /&gt;       0.0005 | 0.00051&lt;br /&gt;        0.001 | 0.00068&lt;br /&gt;        0.002 | 0.00111&lt;br /&gt;        0.005 | 0.00108&lt;br /&gt;         0.01 | 0.00023&lt;br /&gt;         0.02 | 0.00002&lt;br /&gt;         0.05 | 0.00001&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;Now, what's odd about those numbers? Oh, right, for the first few the mutation rate is almost identical to the fixation rate. Cool!&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Warning: Empiricism at work&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;This equality would appear to hold very well for this case. But there's always the possibility that it's a trick of the light - there could be more factors that just happen to equal 1 in this case. So we're going to have to try more experiments. &lt;br /&gt;&lt;br /&gt;In fact, due to time constraints, I'm only going to try three more. I'm going to change each of the two remaining variables (population size, number of generations - I've already covered mutation rate) in turn.&lt;br /&gt;&lt;br /&gt;&lt;i&gt;Population size&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;I'll raise the size by a factor of 10, to 1000 (setting mutation at 0.0002 and generations at 100000). If our hypothesis that the mutation rate = the fixation rate is true, I should get around 20 fixations.&lt;br /&gt;&lt;br /&gt;The result: 11. Rate of fixation is therefore 0.00011.&lt;br /&gt;Close, but no cigar.&lt;br /&gt;&lt;br /&gt;On reflection, though, it occurs to me that this could be a consequence of the fact that, with a bigger population, the whack-a-mole factor of new mutations appearing is going to be more of a problem. 
As such, I repeated the experiment with a population size of just 10. &lt;br /&gt;&lt;br /&gt;Result: 25. Rate of fixation is therefore 0.00025, which is acceptably close to the mutation rate of 0.0002. At some point I'll have to do a more detailed examination to determine whether all this is valid (at the moment I'm effectively reasoning from anecdotal evidence) but for the moment I'll say that population size apparently does not affect fixation rate and I'll move on.&lt;br /&gt;&lt;br /&gt;&lt;i&gt;Number of generations&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;I'll return the size to its original value of 100, and raise the number of generations to 1000000, leaving mutation rate at 0.0002.&lt;br /&gt;&lt;br /&gt;Result (after much waiting): 177. Should be 200. What the hell, close enough. (Like I say, when I have lots of computer time to play with I'll redo a lot of this)&lt;br /&gt;&lt;br /&gt;&lt;i&gt;Conclusion and explanation&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;OK, so the results, whilst decidedly flaky, appear to broadly support the conclusion that the rate of mutation equals the rate of fixation (for low rates of mutation). But why should this be so?&lt;br /&gt;&lt;br /&gt;Turns out the reasoning is fairly obvious. Recall that, if N is the population size, the probability of any given allele in a generation eventually fixing is 1/N. Now note that, if the rate of mutation is u, there will be on average u.N mutated alleles per generation. If we assume that the average time taken for an allele to fix is not dependent on the generation it appears in&lt;sup&gt;[6]&lt;/sup&gt;, it follows directly that the rate of fixation will be u.N.(1/N) = u. QED.&lt;br /&gt;&lt;br /&gt;This post has raised three further questions which I'll have to explore at some point:&lt;br /&gt;&lt;br /&gt;1) How does the fixation curve vary with rate of mutation? In particular, at what point does the negative whack-a-mole effect start to overwhelm the positive effect described in the last paragraph? 
(this will probably be a mostly theoretical discussion)&lt;br /&gt;&lt;br /&gt;2) Apropos of the last paragraph, under what conditions does the assumption of constant average fixation time break down? Can we destroy it by, for example, messing around with the population size?&lt;br /&gt;&lt;br /&gt;3) How exactly are we assessing whether an experimental result is "close enough" to the theoretical result? Here I'm going to need to discuss some undergrad-level statistics.&lt;br /&gt;&lt;br /&gt;[1] http://coalescent.freewebpage.org/popgen/gendrift4.py&lt;br /&gt;[2] This sort of thing is why population geneticists don't do much fieldwork...&lt;br /&gt;[3] After two years of the scariest computer projects on God's green earth, it is actually psychologically impossible for me to see data like this &lt;i&gt;without&lt;/i&gt; trying to graph it out. I hold out hope that some day the scars will fade&lt;sup&gt;[4]&lt;/sup&gt;&lt;br /&gt;[4] If the computer course supervisor happens to be reading this, please note that the above was strictly humorous - the projects were great fun&lt;sup&gt;[5]&lt;/sup&gt;&lt;br /&gt;[5] Apart from the extremely dodgy function libraries you gave us, which in two cases made experienced programmers burst into laughter. But let's not go there&lt;br /&gt;[6] This assumption may break down if, for example, the size of the population is increasing. 
I'll have to do more experiments at some point.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/30125952-115136249155588944?l=naturesnumbers.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://naturesnumbers.blogspot.com/feeds/115136249155588944/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=30125952&amp;postID=115136249155588944' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/30125952/posts/default/115136249155588944'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/30125952/posts/default/115136249155588944'/><link rel='alternate' type='text/html' href='http://naturesnumbers.blogspot.com/2006/06/how-much-is-too-much.html' title='How much is too much'/><author><name>Coalescent</name><uri>http://www.blogger.com/profile/00951149322490275133</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-30125952.post-115132879512119319</id><published>2006-06-26T12:13:00.000+01:00</published><updated>2006-06-26T14:56:06.036+01:00</updated><title type='text'>The merits of models</title><content type='html'>&lt;i&gt;(Population Genetics)&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;I've already written a couple of posts as to the relative merits of various models. For example, I talked about the assumptions of the &lt;a href="http://naturesnumbers.blogspot.com/2006/06/hardy-weinberg-law.html"&gt;Hardy-Weinberg model&lt;/a&gt; and what happens if they're broken. 
I also talked briefly about how &lt;a href="http://naturesnumbers.blogspot.com/2006/06/eyes-on-future-dawg.html"&gt;Dawg&lt;/a&gt; models genetic mutation more realistically than most other programs.&lt;br /&gt;&lt;br /&gt;The specific model that I've been looking at most lately is the neutral model of genetic drift. As such, it seems useful to have a look at some of the major problems with it.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Selection&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Firstly, the obvious: no selection. To introduce the concept of selection, we're going to need to:&lt;br /&gt;a) simulate the generation of phenotypes from genotypes (in particular, we'll need to consider dominance issues relating to competing alleles)&lt;br /&gt;b) assign fitness values to those phenotypes&lt;br /&gt;&lt;br /&gt;There are some difficulties with both of these. The issue of dominance can be and has been effectively solved by sufficient real-world research, so I won't discuss that. The assignment of fitness, however, has major conceptual problems. In the real world, organisms don't wander around with a big neon sign over their heads proclaiming "this organism has fitness value 98" or whatever. Fitness is usually an emergent property of the organism's interactions with its environment - if you drop a pale-coloured moth into the middle of an industrial revolution, it instantly becomes less fit.&lt;br /&gt;&lt;br /&gt;I'll leave off discussion of this for the moment, because I intend to go on about it at great length when discussing genetic algorithms. Just be aware that assigning fitness in any meaningful sense is a major pain in the unmentionables. 
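&lt;br /&gt;&lt;br /&gt;To make steps a) and b) a bit more concrete, here's a minimal sketch in Python. Everything specific in it is an assumption made up for illustration: the allele names, the rule that dark ("D") is dominant over pale ("d"), the two environments and the fitness numbers. The only point is that fitness gets looked up from the phenotype &lt;i&gt;and&lt;/i&gt; the environment, not from the organism alone.&lt;pre&gt;
```python
# Hypothetical sketch: phenotype from genotype, then fitness from environment.
# Allele names, dominance rule and fitness numbers are illustrative assumptions.

def phenotype(genotype):
    """Map a pair of alleles to a phenotype, with 'D' (dark) dominant over 'd' (pale)."""
    return "dark" if "D" in genotype else "pale"

# Fitness is not a property of the organism alone: it depends on the environment.
FITNESS = {
    ("pale", "clean"): 1.0,
    ("dark", "clean"): 0.6,
    ("pale", "sooty"): 0.5,
    ("dark", "sooty"): 1.0,
}

def fitness(genotype, environment):
    """Look up the (made-up) fitness of a genotype in a given environment."""
    return FITNESS[(phenotype(genotype), environment)]

for env in ("clean", "sooty"):
    print(env, [(g, fitness(g, env)) for g in ("DD", "Dd", "dd")])
```
&lt;/pre&gt;Swap the environment from "clean" to "sooty" and the same pale moth goes from the fittest to the least fit without a single allele changing.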
&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Fixing&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;There's a second flaw with many of the conclusions of the neutral model, and it lies in an assumption I &lt;a href="http://naturesnumbers.blogspot.com/2006/06/genetic-drift-and-neutral-theory.html"&gt;briefly mentioned&lt;/a&gt; earlier - that systems tend to fixation, with one allele type ruling the roost. This assumption works fine if you have a system that proceeds from the first generation by reproduction alone. However, it breaks horribly if random mutation is thrown into the mix.&lt;br /&gt;&lt;br /&gt;Just think about how the poor allele type must feel. It's finally approaching fixation when boom! a mutated variant of itself appears out of nowhere. Then, just when it's got that mutation beaten, another appears. And another. And another. It must be like playing whack-a-mole!&lt;br /&gt;&lt;br /&gt;OK, so anthropomorphising allele types is silly. But the point remains. This assumption about fixation - which is so central to our conclusions - breaks down horribly when the mutation rate gets too high&lt;sup&gt;[1]&lt;/sup&gt;.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;A new model&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;But that can't be right surely? After all, large alleles go to fixation all the time in the human population. Don't they?&lt;br /&gt;&lt;br /&gt;No they don't. Well, a few do, but they're usually the ones that, when tinkered with, cause people to do an award-winning impression of &lt;a href="http://www.rendermania.com/images/HDRI/XmenD2.jpg"&gt;Senator Kelly&lt;/a&gt; from the first XMen film. Selection is responsible there. However, what can and does happen under genetic drift is for specific DNA bases to go to fixation. Alleles mutate too fast to fix, but the DNA itself mutates fairly slowly. 
So a better model is actually to say that each allele contains a large number of bases, with a slow rate of mutation per base.&lt;br /&gt;&lt;br /&gt;Now, this new model actually has very interesting consequences. See, before this, we had no way of determining an allele's history. Say we had a system with just two allele types, A and B, and one of them mutated into a C - we'd have no way of knowing which one it was. But that's just changed - since new alleles will be generally very similar to the allele from which they mutated (usually only differing by a single base), we can make inferences about the historical relationships of the various extant alleles - their family tree.&lt;br /&gt;&lt;br /&gt;This family tree is called the Coalescent, and really deserves a post of its own. See you next time for more :)&lt;br /&gt;&lt;br /&gt;[1] At some point I'll write a script to determine: how high is too high?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/30125952-115132879512119319?l=naturesnumbers.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://naturesnumbers.blogspot.com/feeds/115132879512119319/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=30125952&amp;postID=115132879512119319' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/30125952/posts/default/115132879512119319'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/30125952/posts/default/115132879512119319'/><link rel='alternate' type='text/html' href='http://naturesnumbers.blogspot.com/2006/06/merits-of-models.html' title='The merits of models'/><author><name>Coalescent</name><uri>http://www.blogger.com/profile/00951149322490275133</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-30125952.post-115126911251644543</id><published>2006-06-25T21:54:00.000+01:00</published><updated>2006-06-25T21:58:32.526+01:00</updated><title type='text'>Well that's the last you'll see of me for a week...</title><content type='html'>Courtesy of Pedro Beltrão in the comments of &lt;a href="http://naturesnumbers.blogspot.com/2006/06/little-background.html"&gt;one of my posts&lt;/a&gt;, I've become aware of the &lt;a href="http://www.nodalpoint.org/"&gt;Nodalpoint&lt;/a&gt; group blog. In particular, their &lt;a href="http://www.nodalpoint.org/node/1623"&gt;list of scientific weblogs&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Lured by the mention of Python in one of the link descriptions, I came across the site of one &lt;a href="http://www.dalkescientific.com/writings/diary/index.html"&gt;Andrew Dalke&lt;/a&gt;, who appears to be everything I ever want to be as far as Computational Biology goes. And for added joy, he linked to a bunch of &lt;a href="http://www.dalkescientific.com/writings/NBN/"&gt;lecture notes&lt;/a&gt;. &lt;br /&gt;&lt;br /&gt;I'm going for a read. 
I may be some time.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/30125952-115126911251644543?l=naturesnumbers.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://naturesnumbers.blogspot.com/feeds/115126911251644543/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=30125952&amp;postID=115126911251644543' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/30125952/posts/default/115126911251644543'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/30125952/posts/default/115126911251644543'/><link rel='alternate' type='text/html' href='http://naturesnumbers.blogspot.com/2006/06/well-thats-last-youll-see-of-me-for.html' title='Well that&apos;s the last you&apos;ll see of me for a week...'/><author><name>Coalescent</name><uri>http://www.blogger.com/profile/00951149322490275133</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-30125952.post-115126792852860826</id><published>2006-06-25T21:36:00.000+01:00</published><updated>2006-06-25T21:38:48.533+01:00</updated><title type='text'>Ribolicious</title><content type='html'>&lt;i&gt;(Functional Genomics)&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;I've been meaning to do this for years. It's a fairly obvious step if you have any interest in both genomics and proteomics, but for some reason I never got round to it.&lt;br /&gt;&lt;br /&gt;But that's in the past! Behold, the ribosome - in Python!!! Bwahahahaha!!!!!&lt;br /&gt;&lt;br /&gt;&lt;i&gt;http://coalescent.freewebpage.org/funcgen/synthesis1.py&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;OK, that was an anticlimax. 
Maybe I need to work on my mad-scientist laugh some more.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/30125952-115126792852860826?l=naturesnumbers.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://naturesnumbers.blogspot.com/feeds/115126792852860826/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=30125952&amp;postID=115126792852860826' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/30125952/posts/default/115126792852860826'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/30125952/posts/default/115126792852860826'/><link rel='alternate' type='text/html' href='http://naturesnumbers.blogspot.com/2006/06/ribolicious.html' title='Ribolicious'/><author><name>Coalescent</name><uri>http://www.blogger.com/profile/00951149322490275133</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-30125952.post-115126080062517472</id><published>2006-06-25T18:48:00.000+01:00</published><updated>2006-06-25T19:57:52.103+01:00</updated><title type='text'>Microwhatnows?</title><content type='html'>&lt;i&gt;(Functional Genomics)&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;It's actually quite hard to read around genomics for any length of time without coming across the concept of microarrays. The problem is, no-one ever seems to explain what they are - you might just as well write &lt;a href="http://www.webamused.com/blogosophy/archives/002064.html"&gt;"and then a miracle occurs"&lt;/a&gt;. Now, after two Google searches, six websites and a textbook, I've finally got my head round what they do. 
Read on, MacDuff!&lt;br /&gt;&lt;br /&gt;Microarrays can be used for sequencing DNA (an approach known as sequencing by hybridisation). As I hope to God you already know, DNA bases come in four flavours, which pair up - an adenine molecule is always linked to a thymine molecule, a cytosine to a guanine, and vice versa. The resulting quaternary (four-element) code is copied by messenger RNA, transported to ribosomes, and there decoded as a protein molecule. It is, of course, fundamental to bioinformatics to know what these sequences of bases look like. However, these aren't exactly things that you can examine with the naked eye, so some scientific wizardry is necessary.&lt;br /&gt;&lt;br /&gt;The way that microarrays sequence DNA is fairly simple. First, you'll need a tray with lots of holes in it. Then you'll need to synthesise a bunch of different oligonucleotides. Oligonucleotides are short chains of nucleotides that can "pair up" with a strand of DNA as if they were a second strand. Their critical feature, though, is that it is possible to detect whether or not one has paired up in this fashion with a DNA sequence. Your tray will need to contain lots of different oligonucleotide chains, and you'll need to keep track of which variety you put in which hole (mass production is your friend here).&lt;br /&gt;&lt;br /&gt;Next, take your chunk of DNA. You'll need to make many many copies of it, but fortunately that's fairly easy to achieve (after all, cells do it all the time). Drop some into each hole. &lt;br /&gt;&lt;br /&gt;This is the clever part. If the oligonucleotide in a given hole pairs up perfectly with some of the DNA you just dropped in, it will stick to it. If it doesn't, it won't stick under any circumstances. 
So, by using the detection technique alluded to earlier, you can tell whether the DNA sequence contains the subsequence that the oligonucleotide pairs to.&lt;br /&gt;&lt;br /&gt;Once you've done all this, you'll have hundreds of bits of data - a long list of oligonucleotides that did stick, and a far longer list of ones that didn't. Most of these subsequences will overlap with each other, so you can "jigsaw" the complete sequence together.&lt;br /&gt;&lt;br /&gt;For illustrative purposes, here's an example. Say you're using oligonucleotides of length 10 bases, and you want to sequence the following bit of DNA: TATACTTACGACCAG. Of course you don't actually know that this is its sequence, but when you drop it into the wells of the microarray the following pairings will occur:&lt;br /&gt;&lt;br /&gt;&lt;div style="font-family:monospace"&gt;TATACTTACGACCAG&lt;br /&gt;ATATGAATGC&lt;/div&gt;&lt;br /&gt;&lt;div style="font-family:monospace"&gt;TATACTTACGACCAG&lt;br /&gt;&amp;nbsp;TATGAATGCT&lt;/div&gt;&lt;br /&gt;&lt;div style="font-family:monospace"&gt;TATACTTACGACCAG&lt;br /&gt;&amp;nbsp;&amp;nbsp;ATGAATGCTG&lt;/div&gt;&lt;br /&gt;&lt;div style="font-family:monospace"&gt;TATACTTACGACCAG&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;TGAATGCTGG&lt;/div&gt;&lt;br /&gt;&lt;div style="font-family:monospace"&gt;TATACTTACGACCAG&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;GAATGCTGGT&lt;/div&gt;&lt;br /&gt;&lt;div style="font-family:monospace"&gt;TATACTTACGACCAG&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;AATGCTGGTC&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;The wells with those oligonucleotides in will light up (literally - the detection process is usually based on fluorescence), and all the others will stay dark. 
Thus, you now know that the chunk of DNA contained the following subsequences and no others:&lt;br /&gt;&lt;ul&gt;TATACTTACG&lt;br /&gt;ATACTTACGA&lt;br /&gt;TACTTACGAC&lt;br /&gt;ACTTACGACC&lt;br /&gt;CTTACGACCA&lt;br /&gt;TTACGACCAG&lt;/ul&gt;&lt;br /&gt;The alignment of all these chunks is generally done by computer (in fact the entire process is generally automated) but, in this simple case, it doesn't take a genius to figure out what the overall sequence looks like.&lt;br /&gt;&lt;br /&gt;&lt;i&gt;Note: I'm in a fairly incoherent mood this evening, and I feel that may have fed over onto my blogging. If anyone has trouble figuring out what I'm on about in the above post, I'll make a special effort to fix it. &lt;br /&gt;&lt;br /&gt;The rest of the time, you're on your own :P&lt;/i&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/30125952-115126080062517472?l=naturesnumbers.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://naturesnumbers.blogspot.com/feeds/115126080062517472/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=30125952&amp;postID=115126080062517472' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/30125952/posts/default/115126080062517472'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/30125952/posts/default/115126080062517472'/><link rel='alternate' type='text/html' href='http://naturesnumbers.blogspot.com/2006/06/microwhatnows.html' title='Microwhatnows?'/><author><name>Coalescent</name><uri>http://www.blogger.com/profile/00951149322490275133</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-30125952.post-115125377879560282</id><published>2006-06-25T17:28:00.000+01:00</published><updated>2006-06-25T19:40:34.446+01:00</updated><title type='text'>A real life academic ethics dilemma</title><content type='html'>&lt;i&gt;(Bioinformatics)&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Yesterday, when I was &lt;a href="http://naturesnumbers.blogspot.com/2006/06/eyes-on-future-dawg.html"&gt;talking about Dawg&lt;/a&gt;, I mentioned possibly introducing concepts from a computer project I did for my course. On reflection, this presented a problem.&lt;br /&gt;&lt;br /&gt;You see, my uni reuses the same courses between years. For this reason, there are explicit rules against sharing your project work, either mathematical or computational, with the lower years - if we have to slog through it, why should they get off scot free? As such, I could be causing the university (and myself, if they decide to spread it around) some serious problems. I don't want to do that - it's a nice university.&lt;br /&gt;&lt;br /&gt;So I have a dilemma - how do I cover the material without giving the game away to the next generation? My preferred solution here is to shift the intellectual load onto the course administrator, and I have every intention of slavishly following his directions, but this is actually quite an interesting question in its own right. Does anyone have an opinion on this?&lt;br /&gt;&lt;br /&gt;Possible compromises that I've considered (assuming that the university is actually bothered):&lt;br /&gt;&lt;br /&gt;1) Maintain strict anonymity. Give people no information on which uni I studied at. Avoid becoming famous enough that people who know me personally come across my blog (that part at least should be easy :P).&lt;br /&gt;&lt;br /&gt;2) Attempt a clean-room reimplementation of sorts. 
Invest in books on the subject and scrupulously avoid covering the questions raised by the computer project unless and until they intersect that syllabus. Reuse no code or mathematics - do it completely from scratch. That wouldn't necessarily solve the uni's dilemma, but would at least mean that I don't have any sort of unfair advantage as far as stuffing up their courses goes.&lt;br /&gt;&lt;br /&gt;3) Give up on covering this topic. That option sucks somewhat, however - sequence alignment is a major cornerstone of bioinformatics.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/30125952-115125377879560282?l=naturesnumbers.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://naturesnumbers.blogspot.com/feeds/115125377879560282/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=30125952&amp;postID=115125377879560282' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/30125952/posts/default/115125377879560282'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/30125952/posts/default/115125377879560282'/><link rel='alternate' type='text/html' href='http://naturesnumbers.blogspot.com/2006/06/real-life-academic-ethics-dilemma.html' title='A real life academic ethics dilemma'/><author><name>Coalescent</name><uri>http://www.blogger.com/profile/00951149322490275133</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-30125952.post-115125043319238849</id><published>2006-06-25T16:32:00.000+01:00</published><updated>2006-06-25T16:47:13.200+01:00</updated><title type='text'>Hardy-Weinberg under 
stress</title><content type='html'>&lt;i&gt;(Population Genetics)&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;You may recall that, &lt;a href="http://naturesnumbers.blogspot.com/2006/06/genetic-drift-simulation.html"&gt;a few posts back&lt;/a&gt;, I mentioned checking to see whether the Hardy-Weinberg frequencies still functioned in a genetically-drifting system. The answer would appear to be an unequivocal "yes".&lt;br /&gt;&lt;br /&gt;I finally got round to rigging up a script &lt;sup&gt;[1]&lt;/sup&gt; to calculate the actual and Hardy-Weinberg probabilities for a drifting system and export them to .csv (the most easily-manipulated spreadsheet format). When I plotted them (after working all the damn bugs out of the script, anyway), I was actually seriously impressed at what I saw:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://photos1.blogger.com/blogger/7798/3224/1600/gendriftHW.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://photos1.blogger.com/blogger/7798/3224/320/gendriftHW.png" border="0" alt="" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;This is with two allele types, 100 "organisms" (so 100 pairs of alleles), and 100 generations. I started off the two allele types on even frequencies - 50*AA and 50*BB genomes - which accounts for the spike at the beginning but, subsequently, the Hardy-Weinberg values followed the actual values almost perfectly. 
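&lt;br /&gt;&lt;br /&gt;A rough sketch of that kind of comparison is below. It isn't gendrift3.py - the function names and the mating rule (each child takes one random allele from each of two randomly-picked parents, with selfing allowed) are assumptions made purely for illustration, and it assumes Python 3.&lt;br /&gt;&lt;br /&gt;

```python
import random

def next_generation(pop, rng):
    """Each child gets one random allele from each of two random parents."""
    # Assumption: parents are drawn with replacement, so selfing can happen.
    return [(rng.choice(rng.choice(pop)), rng.choice(rng.choice(pop)))
            for _ in pop]

def genotype_summary(pop):
    """Return (actual heterozygote frequency, Hardy-Weinberg prediction 2pq)."""
    n = len(pop)
    p = sum(pair.count("A") for pair in pop) / (2 * n)  # frequency of allele A
    actual_het = sum(1 for a, b in pop if a != b) / n
    return actual_het, 2 * p * (1 - p)

def simulate(n_organisms=100, generations=100, seed=0):
    """Start from half AA and half BB organisms and record one row per generation."""
    rng = random.Random(seed)
    half = n_organisms // 2
    pop = [("A", "A")] * half + [("B", "B")] * (n_organisms - half)
    rows = []
    for gen in range(generations + 1):
        rows.append((gen,) + genotype_summary(pop))
        pop = next_generation(pop, rng)
    return rows

if __name__ == "__main__":
    # Print in a .csv-friendly layout so the rows can be pasted into a spreadsheet.
    print("generation,actual_het,hw_het")
    for gen, actual, expected in simulate(generations=10):
        print("%d,%.3f,%.3f" % (gen, actual, expected))
```

&lt;br /&gt;&lt;br /&gt;The first row shows the starting spike: no heterozygotes at all against a Hardy-Weinberg prediction of 0.5. After that the two columns should stay close to each other.&lt;br /&gt;&lt;br /&gt;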
I'd expected that this would be the case, but there's a difference between expecting something and actually seeing it on a pretty graph.&lt;br /&gt;&lt;br /&gt;Aren't computers wonderful?&lt;br /&gt;&lt;br /&gt;[1] http://coalescent.freewebpage.org/popgen/gendrift3.py - this one was a complete bastard to get right so I hope you like it...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/30125952-115125043319238849?l=naturesnumbers.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://naturesnumbers.blogspot.com/feeds/115125043319238849/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=30125952&amp;postID=115125043319238849' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/30125952/posts/default/115125043319238849'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/30125952/posts/default/115125043319238849'/><link rel='alternate' type='text/html' href='http://naturesnumbers.blogspot.com/2006/06/hardy-weinberg-under-stress.html' title='Hardy-Weinberg under stress'/><author><name>Coalescent</name><uri>http://www.blogger.com/profile/00951149322490275133</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-30125952.post-115115473626908517</id><published>2006-06-24T13:20:00.000+01:00</published><updated>2006-06-24T14:12:16.363+01:00</updated><title type='text'>Eyes on the future: Dawg</title><content type='html'>&lt;i&gt;(Population Genetics)&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;It's all too easy to become bogged down in the slow grind of learning and lose sight of the prize: scientific competence. 
As such, in addition to as many references to scientific papers as I can cram into my posts, I'll be sticking up a number of posts on things that are way too advanced for me to even seriously think about. Hey, if I was gonna let that stop me, I'd have quit learning after GCSEs ;)&lt;br /&gt;&lt;br /&gt;Today, I'm going to look at a program called Dawg. This was created as part of the dissertation of the newly-doctorated &lt;a href="http://www.pandasthumb.org/archives/2006/06/thats_dr_reed_c.html"&gt;Dr Reed Cartwright&lt;/a&gt;. According to the comments on that page, the dissertation is unlikely to be made publicly available, but a chunk of it has been &lt;a href="http://bioinformatics.oxfordjournals.org/cgi/reprint/21/Suppl_3/iii31?ijkey=quj1SRFSdPjQzdH&amp;keytype=ref"&gt;published&lt;/a&gt; in the journal Bioinformatics.&lt;br /&gt;&lt;br /&gt;Why am I looking at this program? Because conceptually it functions in a very similar fashion to my pathetic little genetic drift simulation. Obviously it's a heck of a lot more powerful, and infinitely more biologically plausible.&lt;br /&gt;&lt;br /&gt;So why are smart folk like Dr Cartwright working on stuff like this? Well, I think he says it very well himself in the "Motivation" section of the paper's abstract:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;Relationships amongst taxa are inferred from biological data using phylogenetic methods and procedures. Very few known phylogenies exist against which to test the accuracy of our inferences. Therefore, in the absence of biological data, simulated data must be used to test the accuracy of methods which produce these inferences. Researchers have limited or non-existent options for simulations useful for studying the impact of insertions, deletions and alignments on phylogenetic accuracy.&lt;/ul&gt;&lt;br /&gt;&lt;br /&gt;In short, it's hard to tell how well our methods work in biologically-realistic situations without biologically-realistic data to feed them. 
&lt;br /&gt;&lt;br /&gt;The killer feature of Dawg, as I understand it, is that, unlike most such programs, it models insertion and deletion properly. To wit: rather than just inserting or deleting single bases (or ignoring so-called "indels" completely), it handles them in clusters whose sizes follow a biologically-realistic distribution. A &lt;a href="http://www.cs.unc.edu/~vivek/home/stenopedia/zipf/"&gt;Zipf distribution&lt;/a&gt;, to be precise. And I didn't know what one of those was until 2 minutes ago, but I'm sure Dawg models them wonderfully :)&lt;br /&gt;&lt;br /&gt;I'll end here for the moment, because beyond going "ooh, shiny" there's not a lot more I can say about Dawg. However, I have a plan. As part of a computer project for my maths course, I was required to produce a program that would determine the best sequence alignment of two mutated strings (for example, the same gene in different species). Actually, that was part of what got me so interested in CB again. I still have that program, and it makes a wonderful test case for this stuff, so over the coming weeks I'll talk you through the theory behind it - and then test it on Dawg to see how well it does.&lt;br /&gt;&lt;br /&gt;Watch this space, folks!&lt;br /&gt;&lt;br /&gt;&lt;hr&gt;&lt;br /&gt;&lt;br /&gt;This is off-topic but, if any of you have trouble compiling Dawg, you're not alone. After consultation with the local compscis, I determined that one such installation problem could be fixed as follows:&lt;br /&gt;&lt;br /&gt;In the file var.h, on lines 116 and 159 (or thereabouts) you'll see a function being used that's called "min". 
Replace each instance of the word "min" with "std::min".&lt;br /&gt;&lt;br /&gt;I've just noticed that this is actually fixed in the current release (coulda sworn that wasn't the case a week ago - they must have noticed since then).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/30125952-115115473626908517?l=naturesnumbers.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://naturesnumbers.blogspot.com/feeds/115115473626908517/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=30125952&amp;postID=115115473626908517' title='10 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/30125952/posts/default/115115473626908517'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/30125952/posts/default/115115473626908517'/><link rel='alternate' type='text/html' href='http://naturesnumbers.blogspot.com/2006/06/eyes-on-future-dawg.html' title='Eyes on the future: Dawg'/><author><name>Coalescent</name><uri>http://www.blogger.com/profile/00951149322490275133</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>10</thr:total></entry><entry><id>tag:blogger.com,1999:blog-30125952.post-115114831136671728</id><published>2006-06-24T11:12:00.000+01:00</published><updated>2006-06-24T17:52:14.383+01:00</updated><title type='text'>Genetic Drift and the Neutral Theory</title><content type='html'>&lt;i&gt;(Population Genetics)&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;You'll hopefully recall that, in the &lt;a href="http://naturesnumbers.blogspot.com/2006/06/genetic-drift-simulation.html"&gt;last post&lt;/a&gt; on Population Genetics, I provided a short script for simulation of genetic drift. 
Hopefully you've all been happily playing away with it, trying to figure out the rule for how likely it is that the lone B allele will be fixed instead of one of the A alleles.&lt;br /&gt;&lt;br /&gt;The maths is actually depressingly easy for this one.&lt;br /&gt;&lt;br /&gt;First, assume that one of the allele types in the population will eventually go to fixation. This is not a crazy assumption - you can show quite easily that the average heterozygosity decays exponentially. This of course is equivalent to the assumption that at some point one of the original alleles will be the great-granddaddy of the entire population.&lt;br /&gt;&lt;br /&gt;Now note that each allele has an equal chance of granddaddy-hood. So the chance for any one of them must be 1/(population size). In the case with a population size of 100 and one B allele, the chance of the B type becoming fixed is therefore 0.01. QED.&lt;br /&gt;&lt;br /&gt;I rigged a Python script &lt;sup&gt;[1]&lt;/sup&gt; to estimate the fixation probabilities for certain population sizes (I'll publish it when I get round to sorting out webspace). It takes a ridiculous length of time for large populations, but its first three results were:&lt;br /&gt;&lt;ul&gt;Pop. Size: 10&lt;br /&gt;P(B fixed): 0.090000&lt;br /&gt;Pop. Size: 100&lt;br /&gt;P(B fixed): 0.010000&lt;br /&gt;Pop. Size: 1000&lt;br /&gt;P(B fixed): 0.001300&lt;/ul&gt;&lt;br /&gt;Magic.&lt;br /&gt;&lt;br /&gt;We can see that, in small populations, genetic drift will change the group's allele frequencies at a high rate &lt;sup&gt;[2]&lt;/sup&gt;, whereas, with larger populations, its pace is glacial. 
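&lt;br /&gt;&lt;br /&gt;For anyone who wants to try the estimate themselves, here is a minimal sketch of how such a fixation-probability estimate might look. It is not the script from footnote [1]; the function names are made up for illustration, it assumes Python 3, and it sticks to small populations so it finishes quickly.&lt;br /&gt;&lt;br /&gt;

```python
import random

def b_becomes_fixed(popsize, rng):
    """Simulate drift from one B allele among popsize alleles.

    Returns True if B is the allele type that ends up fixed.
    """
    alleles = ["A"] * (popsize - 1) + ["B"]
    # Keep resampling until only one allele type is left.
    while len(set(alleles)) != 1:
        alleles = [rng.choice(alleles) for _ in range(popsize)]
    return alleles[0] == "B"

def estimate_fixation(popsize, trials, seed=0):
    """Fraction of trials in which the lone B allele reaches fixation."""
    rng = random.Random(seed)
    fixed = sum(b_becomes_fixed(popsize, rng) for _ in range(trials))
    return fixed / trials

if __name__ == "__main__":
    for popsize in (10, 20):
        print("Pop. Size:", popsize)
        print("P(B fixed): %f" % estimate_fixation(popsize, 2000))
```

&lt;br /&gt;&lt;br /&gt;With enough trials the estimates should settle near 1/(population size), though individual runs will wobble a bit.&lt;br /&gt;&lt;br /&gt;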
This pretty much matches our intuition - for the effects of too small a population size, just consider Cletus the Slack-Jawed Yokel :)&lt;br /&gt;&lt;br /&gt;[1] See http://coalescent.freewebpage.org/popgen/gendrift2.py - I'd link to it except that the crappy free service I'm using dislikes what it sees as hotlinking.&lt;br /&gt;&lt;br /&gt;[2] It turns out, as a corollary of the heterozygosity calculations alluded to earlier, that the time to fixation is also proportional to population size, further reinforcing the point.&lt;br /&gt;&lt;br /&gt;&lt;hr&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;The Neutral Theory&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;The Neutral Theory states that the majority of genetic change is caused by this genetic drift. Now, anyone who's ever debated a creationist will spot the obvious problem here - if this were the case, their accusation that "evolution is random" would actually be accurate. Obviously natural selection plays some role. However, the Neutral Theory acts as a brilliant null hypothesis for seeing whether natural selection has occurred.&lt;br /&gt;&lt;br /&gt;And actually it turns out to be fairly accurate in a variety of circumstances. One particular case where it's easy to test the neutral hypothesis is that of silent mutations - changes in DNA that don't actually affect the expressed protein. Long story short, the predictions of the Neutral Theory regarding the number of mutations that will be accumulated in a given time period fit a wide number of cases pretty much perfectly. &lt;br /&gt;&lt;br /&gt;One &lt;i&gt;really interesting&lt;/i&gt; consequence of all this is that we can actually make good guesses as to which DNA sequences in, say, humans are likely to have some unknown purpose. We can do this by asking the palaeontologists when our last common ancestor was with, say, rats and figure out which chunks of DNA have been better conserved than we'd expect. 
See the paper &lt;a href="http://www.soe.ucsc.edu/~jill/papers/science04.pdf"&gt;"Ultraconserved elements in the human genome"&lt;/a&gt; for more detail.&lt;br /&gt;&lt;br /&gt;Pretty damn cool.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/30125952-115114831136671728?l=naturesnumbers.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://naturesnumbers.blogspot.com/feeds/115114831136671728/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=30125952&amp;postID=115114831136671728' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/30125952/posts/default/115114831136671728'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/30125952/posts/default/115114831136671728'/><link rel='alternate' type='text/html' href='http://naturesnumbers.blogspot.com/2006/06/genetic-drift-and-neutral-theory.html' title='Genetic Drift and the Neutral Theory'/><author><name>Coalescent</name><uri>http://www.blogger.com/profile/00951149322490275133</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-30125952.post-115114050029559647</id><published>2006-06-24T09:55:00.000+01:00</published><updated>2006-06-24T12:09:00.763+01:00</updated><title type='text'>Ooh, pretty pictures</title><content type='html'>&lt;i&gt;(Structural Biology)&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Well, I just went out and bought a couple of textbooks. One of which styles itself as an introduction to proteomics (like genomics except for proteins). 
As such, the point where I have interesting things to post about in the Structural Biology dept is drawing closer.&lt;br /&gt;&lt;br /&gt;There's one thing I need to cover first, though. Those pretty pictures of molecules that you see in textbooks? I need to be able to produce them. Fortunately there's an easy way to do this.&lt;br /&gt;&lt;br /&gt;See, all these efforts to determine the shapes of various molecules can be summarised in a single file, generally a .pdb file (although there are many other formats). So you can go to, for example, Georgia State University's &lt;a href="http://chemistry.gsu.edu/glactone/PDB/pdb.html"&gt;PDB page&lt;/a&gt;, download a few examples, and muck about with them using a suitable image viewer.&lt;br /&gt;&lt;br /&gt;I haven't experimented much with the variety of viewers on offer, but the one I've been using so far seems good. It's called &lt;a href="http://pymol.sourceforge.net/"&gt;PyMOL&lt;/a&gt; (you can tell I'm a Python nut, can't you...). It's probably not as polished as a commercial viewer would be, but it's open source and pretty damn good at what it does. It also has lots of little extensions and scripting handles that I can't for the life of me figure out how to use. You can do &lt;i&gt;movies&lt;/i&gt; with this thing.&lt;br /&gt;&lt;br /&gt;Have a pretty picture.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://photos1.blogger.com/blogger/7798/3224/1600/haemoglobin.1.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://photos1.blogger.com/blogger/7798/3224/320/haemoglobin.1.png" border="0" alt="" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Update&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;The above picture is of a molecule of haemoglobin - I picked the PDB file pretty much at random from the GSU site mentioned above. 
&lt;br /&gt;&lt;br /&gt;I currently have the book "Introduction to Protein Science" sitting on my desk next to me. This time, when I glanced at the cover, I thought "hmm... that molecule looks vaguely familiar". Then I took a &lt;a href="http://www.amazon.co.uk/gp/reader/0199265119/ref=sib_dp_pt/203-4692592-9933529#reader-link"&gt;closer look&lt;/a&gt;...&lt;br /&gt;&lt;br /&gt;In the words of Nelson Muntz: haha!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/30125952-115114050029559647?l=naturesnumbers.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://naturesnumbers.blogspot.com/feeds/115114050029559647/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=30125952&amp;postID=115114050029559647' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/30125952/posts/default/115114050029559647'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/30125952/posts/default/115114050029559647'/><link rel='alternate' type='text/html' href='http://naturesnumbers.blogspot.com/2006/06/ooh-pretty-pictures.html' title='Ooh, pretty pictures'/><author><name>Coalescent</name><uri>http://www.blogger.com/profile/00951149322490275133</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-30125952.post-115110231156174693</id><published>2006-06-23T23:06:00.000+01:00</published><updated>2006-06-24T17:56:48.573+01:00</updated><title type='text'>Genetic drift simulation</title><content type='html'>&lt;i&gt;(Population Genetics)&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;OK, I know I promised an examination of the Neutral Theory next, but 
on reflection there's something I need to cover first. The Neutral Theory examines the behaviour of a genetic drift model of allele distribution, so it would be helpful to provide a short program to simulate this.&lt;br /&gt;&lt;br /&gt;&lt;i&gt;(Note that we've moved from examining genotype frequencies to examining allele frequencies. That's because this requires marginally less thought. Once you've got your allele frequencies, I imagine you could use the Hardy-Weinberg law to determine the corresponding genotype frequencies. &lt;br /&gt;&lt;br /&gt;Actually, that's something I'll have to check at some point...)&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;The following program is written in the free, open-source programming language &lt;a href="http://www.python.org"&gt;Python&lt;/a&gt;. I love this language. It's like QBasic would be if QBasic was cool, or like Lisp would be if Lisp was useable. &lt;br /&gt;&lt;br /&gt;The program is very simple - it just randomly selects a new generation from the old generation, and repeats. This doesn't sound terribly meaningful, but it already raises a few interesting questions. For example, if all the alleles in a generation are the same, that allele type is said to be fixed - within this model it's impossible for the next generation to contain any other allele type. But given a starting population containing a mix of allele types, what's the probability of any one of them becoming fixed?&lt;br /&gt;&lt;br /&gt;I'll give the answer to this next time, but in the meantime why not play around with the program and see if you can figure out the answer yourself?&lt;br /&gt;&lt;br /&gt;The program:&lt;br /&gt;&lt;br /&gt;&lt;hr&gt;&lt;br /&gt;#!/usr/bin/env python&lt;br /&gt;&lt;br /&gt;import random&lt;br /&gt;&lt;br /&gt;# Select a starting population. 
I'm going to go for one that's homozygous except for a single differing allele (a mutation?)&lt;br /&gt;&lt;br /&gt;popsize = 100&lt;br /&gt;alleles = ["A" for ii in range(popsize-1)] + ["B"]&lt;br /&gt;&lt;br /&gt;generations = 1000 # number of times to iterate&lt;br /&gt;&lt;br /&gt;for ii in range(generations):&lt;br /&gt;&lt;ul&gt;newalleles = []&lt;br /&gt;    for jj in range(len(alleles)):&lt;br /&gt;    &lt;ul&gt;newalleles.append(random.choice(alleles))&lt;/ul&gt;&lt;br /&gt;    alleles = newalleles&lt;/ul&gt;&lt;br /&gt;&lt;br /&gt;print(alleles)&lt;br /&gt;&lt;br /&gt;&lt;hr&gt;&lt;br /&gt;&lt;br /&gt;I'll probably do a better version at some point, with decent input and output control. However, that's way too long-winded for a blog. When I've done it I'll stick it on some webspace somewhere and link to it. I also intend to produce some decent graphs of how the allele frequencies vary over time.&lt;br /&gt;&lt;br /&gt;Note: if you're on Windows, you'll want to strip out the first line ("#!/usr/bin/env python" etc). 
That's a Unix-only thing.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Update&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Marginally better version available here: http://coalescent.freewebpage.org/popgen/gendrift.py&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/30125952-115110231156174693?l=naturesnumbers.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://naturesnumbers.blogspot.com/feeds/115110231156174693/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=30125952&amp;postID=115110231156174693' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/30125952/posts/default/115110231156174693'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/30125952/posts/default/115110231156174693'/><link rel='alternate' type='text/html' href='http://naturesnumbers.blogspot.com/2006/06/genetic-drift-simulation.html' title='Genetic drift simulation'/><author><name>Coalescent</name><uri>http://www.blogger.com/profile/00951149322490275133</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-30125952.post-115109795974264000</id><published>2006-06-23T18:29:00.000+01:00</published><updated>2006-06-24T10:18:03.843+01:00</updated><title type='text'>The Hardy-Weinberg law</title><content type='html'>&lt;i&gt;(Population Genetics)&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;One of the earliest, and simplest, discoveries of population genetics was the Hardy-Weinberg law. 
This discusses the behaviour of a population under the following circumstances:&lt;br /&gt;&lt;ul&gt;   &lt;li&gt;The population is effectively infinite&lt;/li&gt;   &lt;li&gt;Mating is completely random&lt;/li&gt;   &lt;li&gt;All the organisms have the same survival chance&lt;/li&gt;   &lt;li&gt;Chromosomes come in pairs, both elements of which behave identically&lt;/li&gt;   &lt;li&gt;There are two alleles competing for the same spot, both of which behave identically&lt;/li&gt; &lt;/ul&gt; Call these alleles A&lt;sub&gt;1&lt;/sub&gt; and A&lt;sub&gt;2&lt;/sub&gt;. Then the three possible genotypes are A&lt;sub&gt;1&lt;/sub&gt;A&lt;sub&gt;1&lt;/sub&gt;, A&lt;sub&gt;1&lt;/sub&gt;A&lt;sub&gt;2&lt;/sub&gt; and A&lt;sub&gt;2&lt;/sub&gt;A&lt;sub&gt;2&lt;/sub&gt;, where for example A&lt;sub&gt;1&lt;/sub&gt;A&lt;sub&gt;2&lt;/sub&gt; means you have an A&lt;sub&gt;1&lt;/sub&gt; allele on one chromosome and an A&lt;sub&gt;2&lt;/sub&gt; allele on the other. What we're aiming to do is to figure out how the ratios of these genotypes to each other vary over time.&lt;br /&gt;&lt;br /&gt;So, say the starting genotype frequencies are x&lt;sub&gt;11&lt;/sub&gt; for A&lt;sub&gt;1&lt;/sub&gt;A&lt;sub&gt;1&lt;/sub&gt;, x&lt;sub&gt;12&lt;/sub&gt; for A&lt;sub&gt;1&lt;/sub&gt;A&lt;sub&gt;2&lt;/sub&gt; and x&lt;sub&gt;22&lt;/sub&gt; for A&lt;sub&gt;2&lt;/sub&gt;A&lt;sub&gt;2&lt;/sub&gt;, with x&lt;sub&gt;11&lt;/sub&gt;+x&lt;sub&gt;12&lt;/sub&gt;+x&lt;sub&gt;22&lt;/sub&gt;=1. We now have to do a lengthy calculation to figure out the probability of a randomly-chosen member of the next generation having each genotype. Call these frequencies y&lt;sub&gt;11&lt;/sub&gt;, y&lt;sub&gt;12&lt;/sub&gt; and y&lt;sub&gt;22&lt;/sub&gt;.&lt;br /&gt;&lt;br /&gt;For a child having genotype A&lt;sub&gt;1&lt;/sub&gt;A&lt;sub&gt;1&lt;/sub&gt;, the only way to produce it is with an A&lt;sub&gt;1&lt;/sub&gt; allele from both parents. 
The chance of getting an A&lt;sub&gt;1&lt;/sub&gt; allele from an A&lt;sub&gt;1&lt;/sub&gt;A&lt;sub&gt;1&lt;/sub&gt; parent is 1, the chance of getting it from an A&lt;sub&gt;1&lt;/sub&gt;A&lt;sub&gt;2&lt;/sub&gt; parent is 1/2 and the chance from an A&lt;sub&gt;2&lt;/sub&gt;A&lt;sub&gt;2&lt;/sub&gt; parent is 0. The probabilities work out as follows:&lt;br /&gt;&lt;ul&gt;Parent 1 is A&lt;sub&gt;1&lt;/sub&gt;A&lt;sub&gt;1&lt;/sub&gt;, parent 2 is A&lt;sub&gt;1&lt;/sub&gt;A&lt;sub&gt;1&lt;/sub&gt;: child is always A&lt;sub&gt;1&lt;/sub&gt;A&lt;sub&gt;1&lt;/sub&gt;&lt;br /&gt;Parent 1 is A&lt;sub&gt;1&lt;/sub&gt;A&lt;sub&gt;1&lt;/sub&gt;, parent 2 is A&lt;sub&gt;1&lt;/sub&gt;A&lt;sub&gt;2&lt;/sub&gt;: child is A&lt;sub&gt;1&lt;/sub&gt;A&lt;sub&gt;1&lt;/sub&gt; half the time&lt;br /&gt;Parent 1 is A&lt;sub&gt;1&lt;/sub&gt;A&lt;sub&gt;2&lt;/sub&gt;, parent 2 is A&lt;sub&gt;1&lt;/sub&gt;A&lt;sub&gt;1&lt;/sub&gt;: child is A&lt;sub&gt;1&lt;/sub&gt;A&lt;sub&gt;1&lt;/sub&gt; half the time&lt;br /&gt;Parent 1 is A&lt;sub&gt;1&lt;/sub&gt;A&lt;sub&gt;2&lt;/sub&gt;, parent 2 is A&lt;sub&gt;1&lt;/sub&gt;A&lt;sub&gt;2&lt;/sub&gt;: child is A&lt;sub&gt;1&lt;/sub&gt;A&lt;sub&gt;1&lt;/sub&gt; a quarter of the time&lt;br /&gt;Either parent is A&lt;sub&gt;2&lt;/sub&gt;A&lt;sub&gt;2&lt;/sub&gt;: child is never A&lt;sub&gt;1&lt;/sub&gt;A&lt;sub&gt;1&lt;/sub&gt;&lt;br /&gt;&lt;/ul&gt;&lt;br /&gt;So, totalling these up, the probability that a given child is A&lt;sub&gt;1&lt;/sub&gt;A&lt;sub&gt;1&lt;/sub&gt; is going to be:&lt;br /&gt;&lt;ul&gt;y&lt;sub&gt;11&lt;/sub&gt; = 1.x&lt;sub&gt;11&lt;/sub&gt;.x&lt;sub&gt;11&lt;/sub&gt; + 1/2.x&lt;sub&gt;11&lt;/sub&gt;.x&lt;sub&gt;12&lt;/sub&gt; + 1/2.x&lt;sub&gt;11&lt;/sub&gt;.x&lt;sub&gt;12&lt;/sub&gt; + 1/4.x&lt;sub&gt;12&lt;/sub&gt;.x&lt;sub&gt;12&lt;/sub&gt;&lt;br /&gt;&lt;ul&gt;= (x&lt;sub&gt;11&lt;/sub&gt; + 1/2.x&lt;sub&gt;12&lt;/sub&gt;)&lt;sup&gt;2&lt;/sup&gt;&lt;/ul&gt;&lt;/ul&gt;&lt;br /&gt;&lt;br /&gt;Now, one important thing to note 
is that x&lt;sub&gt;11&lt;/sub&gt; + 1/2.x&lt;sub&gt;12&lt;/sub&gt; is actually just the probability of getting an A&lt;sub&gt;1&lt;/sub&gt; allele if you select randomly from the population of alleles. So if you call this probability p, you get the result: y&lt;sub&gt;11&lt;/sub&gt; = p&lt;sup&gt;2&lt;/sup&gt;. Similarly, y&lt;sub&gt;22&lt;/sub&gt; = q&lt;sup&gt;2&lt;/sup&gt; where q=1-p. And since (p+q)&lt;sup&gt;2&lt;/sup&gt; = p&lt;sup&gt;2&lt;/sup&gt; + 2.p.q + q&lt;sup&gt;2&lt;/sup&gt; = 1, we have y&lt;sub&gt;12&lt;/sub&gt; = 2.p.q.&lt;br /&gt;&lt;br /&gt;So we have our new frequencies p&lt;sup&gt;2&lt;/sup&gt;, 2.p.q and q&lt;sup&gt;2&lt;/sup&gt;. The obvious question is: what happens in the next generation? Well, that's the cool bit. These probabilities are only dependent on the frequencies of the alleles in the previous generation, &lt;i&gt;not of the genotypes&lt;/i&gt;. And since the frequency of each allele hasn't changed a whit (check if you like), &lt;i&gt;every subsequent generation&lt;/i&gt; will also have this exact same distribution of genotypes.&lt;br /&gt;&lt;br /&gt;That is the Hardy-Weinberg law.&lt;br /&gt;&lt;br /&gt;&lt;hr /&gt;&lt;br /&gt;&lt;br /&gt;Of course, the Hardy-Weinberg law is talking about a very idealised situation. It's interesting to look at what happens if we discard each assumption in turn:&lt;br /&gt;&lt;ul&gt;   &lt;li&gt;If the population is not infinite, random fluctuations will start to come into play. The result is genetic drift, the basis of the &lt;u&gt;neutral theory&lt;/u&gt; of evolution, which I'll hopefully cover next time.&lt;/li&gt;   &lt;li&gt;If the mating is not completely random, the equilibrium is generally still reached - but it takes longer. 
For example, if there are two sexes with different initial genotype frequencies, the HW frequencies take &lt;i&gt;two&lt;/i&gt; generations to reach (proof left as an exercise to the reader).&lt;/li&gt;   &lt;li&gt;If not all the organisms have the same survival/reproduction chance, and especially if that chance is linked to the organism's choice of alleles, an allele can be selected out of existence. This is the basis of evolutionary theory, and will be covered at some point.&lt;/li&gt;   &lt;li&gt;If the chromosomes come in threes or above, or there are more than two alleles to choose from, equations similar to the HW law can easily be formed (again, an exercise for the reader).&lt;br /&gt; &lt;/li&gt; &lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/30125952-115109795974264000?l=naturesnumbers.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://naturesnumbers.blogspot.com/feeds/115109795974264000/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=30125952&amp;postID=115109795974264000' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/30125952/posts/default/115109795974264000'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/30125952/posts/default/115109795974264000'/><link rel='alternate' type='text/html' href='http://naturesnumbers.blogspot.com/2006/06/hardy-weinberg-law.html' title='The Hardy-Weinberg law'/><author><name>Coalescent</name><uri>http://www.blogger.com/profile/00951149322490275133</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-30125952.post-115106240605091113</id><published>2006-06-23T14:16:00.000+01:00</published><updated>2006-06-23T18:28:35.930+01:00</updated><title type='text'>A little background</title><content type='html'>I thought I'd give a little background on where my interest in CB comes from. Without further ado:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Why I'm interested in &lt;/span&gt;&lt;i style="font-weight: bold;"&gt;CB&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;CB is an incredibly cool field at the moment. Data is turning up by the truckload. New techniques and cool tricks are being developed at a rate that AFAICT is actually accelerating. There are thousands of new ideas and unanswered questions floating around to play with. More will no doubt emerge as computers continue to get more and more powerful. This is all extremely neat stuff, and I want in.&lt;br /&gt;&lt;br /&gt;Even if I were to eventually lose interest in academia, that wouldn't be the end of my involvement - the field of bioinformatics is booming. There's a reason why the Cambridge course syllabus includes three months in industry. This wouldn't be the case if I stuck with my other love of pure maths - pure mathematicians who actually want to do maths are basically stuck in academia.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Why &lt;/span&gt;&lt;i style="font-weight: bold;"&gt;I'm&lt;/i&gt;&lt;span style="font-weight: bold;"&gt; interested in CB&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;It all started off when I was about 13. A bunch of my friends were attending a Christian youth club, and I went along for the ride. Lovely people. Staunch Young Earth Creationists. Being the sort who asks perpetual irritating questions, I had wonderful fun arguing with them over the years.&lt;br /&gt;&lt;br /&gt;When I hit uni, that stopped for a time.
However, a couple of years later, I came across a link to the Kitzmiller vs. Dover trial, which involved a form of creationism called Intelligent Design. ID claims to be mainly mathematical, so I figured that as a mathematics student and a happy believer in evolution it was my bounden duty to read up on it. I did so and discovered that it was as terrible as I'd expected and then some, but in the course of arguing with ID proponents I discovered something that I'd never really realised before: biology is in fact extremely cool.&lt;br /&gt;&lt;br /&gt;At school, they don't tell you much about the quantifiable, rigorous parts of biology - population genetics, evolutionary genetic algorithms, neural networks, protein folding, etc. Where they did tell you about it, the information got delivered in a droning monotone, heavily watered down by shedloads of factoids. As a mathematician by inclination, the substantial absence of patterns in what we were doing really put me off, and I went for Chemistry instead (out of the frying pan...). So all this cool stuff you could do was something of a shock.&lt;br /&gt;&lt;br /&gt;As I read, my mathematics background kept coming to the fore - I'd come up with interesting mathematical questions to puzzle over. I then kept discovering that the majority of them had already been answered - mostly by computational biologists.
I think it was when I found myself absent-mindedly trying to compose an atom-by-atom image of a complete bacterial flagellum that I realised CB was for me.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/30125952-115106240605091113?l=naturesnumbers.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://naturesnumbers.blogspot.com/feeds/115106240605091113/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=30125952&amp;postID=115106240605091113' title='19 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/30125952/posts/default/115106240605091113'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/30125952/posts/default/115106240605091113'/><link rel='alternate' type='text/html' href='http://naturesnumbers.blogspot.com/2006/06/little-background.html' title='A little background'/><author><name>Coalescent</name><uri>http://www.blogger.com/profile/00951149322490275133</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>19</thr:total></entry><entry><id>tag:blogger.com,1999:blog-30125952.post-115106728842206300</id><published>2006-06-23T13:36:00.000+01:00</published><updated>2006-06-23T13:54:48.430+01:00</updated><title type='text'>Field page: Systems Biology</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Description&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Our cells contain some of the most complicated chains of chemical reactions known to man. These are used to produce necessary building blocks, to control motion, and to power the cell, among other things. 
It would be extremely cool to know how these complex cycles worked.&lt;br /&gt;&lt;br /&gt;That's where Systems Biology comes in. SBists study the reactions that make up the cell's normal behaviour and examine how they interact and interfere. The upshot is a wealth of insight into how cells actually function, with a corresponding ton of ideas for us humans to use in our chemical techniques.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Problem Solved&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;So how do all those cool proteins and chemicals translate into an actual working cell?&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Books/Resources&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;None yet&lt;br /&gt;&lt;br /&gt; &lt;span style="font-weight: bold;"&gt;Posts&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;None yet&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/30125952-115106728842206300?l=naturesnumbers.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://naturesnumbers.blogspot.com/feeds/115106728842206300/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=30125952&amp;postID=115106728842206300' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/30125952/posts/default/115106728842206300'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/30125952/posts/default/115106728842206300'/><link rel='alternate' type='text/html' href='http://naturesnumbers.blogspot.com/2006/06/field-page-systems-biology.html' title='Field page: Systems Biology'/><author><name>Coalescent</name><uri>http://www.blogger.com/profile/00951149322490275133</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16'
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-30125952.post-115106616438902049</id><published>2006-06-23T13:17:00.000+01:00</published><updated>2006-08-06T22:06:52.076+01:00</updated><title type='text'>Field page: Population Genetics</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Description&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Population genetics concerns the application of mathematical and statistical techniques to the DNA of groups of organisms. As with bioinformatics, there's no real interest in what the sequence does; rather, the focus is on what the distribution of sequences in a taxon signifies about that taxon's evolutionary past.&lt;br /&gt;&lt;br /&gt;The primary outcome of population genetics is to produce what is known as a coalescent - a diagram of how the taxon evolved based on its DNA. Ideally, this would correlate well with any fossil evidence of said evolution. Achieving this is surprisingly difficult, and a vast number of different models have been proposed for each and every aspect of the process.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Problem Solved&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;So, we've got sequence information from a number of organisms - now how do they relate to each other?&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Books/Resources&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;"Population Genetics: a concise guide" - John H. Gillespie (very well-written). 
Only partially read.&lt;br /&gt;&lt;a href="http://scienceblogs.com/gnxp/2006/08/how_long_until_fixation.php"&gt;This thread&lt;/a&gt; at Gene Expression contains a lot of useful book suggestions that I'll have to try out.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Posts&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://naturesnumbers.blogspot.com/2006/06/hardy-weinberg-law.html"&gt;The Hardy-Weinberg law&lt;/a&gt;&lt;br /&gt;&lt;a href="http://naturesnumbers.blogspot.com/2006/06/genetic-drift-simulation.html"&gt;Genetic Drift simulation&lt;/a&gt;&lt;br /&gt;&lt;a href="http://naturesnumbers.blogspot.com/2006/06/genetic-drift-and-neutral-theory.html"&gt;Genetic drift and the Neutral Theory&lt;/a&gt;&lt;br /&gt;&lt;a href="http://naturesnumbers.blogspot.com/2006/06/eyes-on-future-dawg.html"&gt;Eyes on the future: Dawg&lt;/a&gt;&lt;br /&gt;&lt;a href="http://naturesnumbers.blogspot.com/2006/06/hardy-weinberg-under-stress.html"&gt;Hardy-Weinberg under stress&lt;/a&gt;&lt;br /&gt;&lt;a href="http://naturesnumbers.blogspot.com/2006/06/merits-of-models.html"&gt;The merits of models&lt;/a&gt;&lt;br /&gt;&lt;a href="http://naturesnumbers.blogspot.com/2006/06/how-much-is-too-much.html"&gt;How much is too much?&lt;/a&gt;&lt;br /&gt;&lt;a href="http://naturesnumbers.blogspot.com/2006/06/cry-for-help.html"&gt;A cry for help&lt;/a&gt;&lt;br /&gt;&lt;a href="http://naturesnumbers.blogspot.com/2006/06/last-gift.html"&gt;A last gift&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/30125952-115106616438902049?l=naturesnumbers.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://naturesnumbers.blogspot.com/feeds/115106616438902049/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=30125952&amp;postID=115106616438902049' title='0 Comments'/><link 
rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/30125952/posts/default/115106616438902049'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/30125952/posts/default/115106616438902049'/><link rel='alternate' type='text/html' href='http://naturesnumbers.blogspot.com/2006/06/field-page-population-genetics.html' title='Field page: Population Genetics'/><author><name>Coalescent</name><uri>http://www.blogger.com/profile/00951149322490275133</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-30125952.post-115106498925769929</id><published>2006-06-23T13:11:00.000+01:00</published><updated>2006-06-23T13:16:29.266+01:00</updated><title type='text'>Field page: Computational Neuroscience</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Description&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Computational Neuroscience is concerned with an age-old question - how does intelligence work? It approaches the issue from a physiological perspective, by examining how neurons behave individually and seeing what effects develop when you wire a load of them up into a neural net. 
This produces some very cool emergent effects, which are extremely useful in the field of artificial intelligence.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Problem Solved&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;How do brains work, and what can we do with that knowledge?&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Books/Resources&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;None yet (I have a copy of Feynman on Computation, but it's many miles away and largely unread)&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Posts&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;None yet&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/30125952-115106498925769929?l=naturesnumbers.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://naturesnumbers.blogspot.com/feeds/115106498925769929/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=30125952&amp;postID=115106498925769929' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/30125952/posts/default/115106498925769929'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/30125952/posts/default/115106498925769929'/><link rel='alternate' type='text/html' href='http://naturesnumbers.blogspot.com/2006/06/field-page-computational-neuroscience.html' title='Field page: Computational Neuroscience'/><author><name>Coalescent</name><uri>http://www.blogger.com/profile/00951149322490275133</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-30125952.post-115102819638723648</id><published>2006-06-23T02:56:00.000+01:00</published><updated>2006-06-25T21:39:37.373+01:00</updated><title type='text'>Field page: Functional Genomics</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Description&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Functional Genomics is the down-and-dirty field of the CB family. It involves the experimental analysis of DNA molecules, in terms of both sequence and function. As with most areas of biotech, Functional Genomics has taken off in leaps and bounds recently.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Problem Solved&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;What can we experimentally determine about these DNA molecules?&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Books/Resources&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;None yet&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Posts&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://naturesnumbers.blogspot.com/2006/06/microwhatnows.html"&gt;Microwhatnows?&lt;/a&gt;&lt;br /&gt;&lt;a href="http://naturesnumbers.blogspot.com/2006/06/ribolicious.html"&gt;Ribolicious&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/30125952-115102819638723648?l=naturesnumbers.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://naturesnumbers.blogspot.com/feeds/115102819638723648/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=30125952&amp;postID=115102819638723648' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/30125952/posts/default/115102819638723648'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/30125952/posts/default/115102819638723648'/><link rel='alternate' type='text/html' href='http://naturesnumbers.blogspot.com/2006/06/field-page-functional-genomics.html' title='Field page: Functional Genomics'/><author><name>Coalescent</name><uri>http://www.blogger.com/profile/00951149322490275133</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-30125952.post-115102714115917700</id><published>2006-06-23T02:27:00.000+01:00</published><updated>2006-07-23T16:11:40.913+01:00</updated><title type='text'>Field page: Bioinformatics</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Description&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Ever since Watson and Crick, we've been working to more fully understand the genetic code that is inside each of us. This has led to massive sequencing efforts such as the Human Genome Project. There's a problem, though. The amount of data is actually too big to do much with. It would be somewhat implausible to, for example, test every single bit of DNA to find out what it did.&lt;br /&gt;&lt;br /&gt;That's where bioinformatics steps in. As with mathematical Information Theory, it more or less ignores the actual function of the DNA in question, instead working solely with ideas and concepts that either can be deduced de novo from sequence data or take very little extra knowledge.&lt;br /&gt;&lt;br /&gt;Aside: apparently, depending on one's definition, bioinformatics can be used as a synonym for CB. 
Obviously that's not how I'm using it here.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Problem Solved&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;OK, so we've got all this sequence data - now what can we do with it without actually having to triple our departmental budget for labwork?&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Books/Resources&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;None yet&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Posts&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://naturesnumbers.blogspot.com/2006/06/real-life-academic-ethics-dilemma.html#links"&gt;A real-life academic ethics dilemma&lt;/a&gt;&lt;br /&gt;&lt;a href="http://naturesnumbers.blogspot.com/2006/07/surprisingly-non-scary.html"&gt;Surprisingly non-scary&lt;/a&gt;&lt;br /&gt;&lt;a href="http://naturesnumbers.blogspot.com/2006/07/sequence-comparison.html"&gt;Sequence comparison&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/30125952-115102714115917700?l=naturesnumbers.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://naturesnumbers.blogspot.com/feeds/115102714115917700/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=30125952&amp;postID=115102714115917700' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/30125952/posts/default/115102714115917700'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/30125952/posts/default/115102714115917700'/><link rel='alternate' type='text/html' href='http://naturesnumbers.blogspot.com/2006/06/field-page-bioinformatics.html' title='Field page: Bioinformatics'/><author><name>Coalescent</name><uri>http://www.blogger.com/profile/00951149322490275133</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-30125952.post-115102597910890329</id><published>2006-06-23T02:14:00.000+01:00</published><updated>2006-07-02T12:34:23.776+01:00</updated><title type='text'>Field page: Structural Biology</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Description&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The chemical behaviour of proteins is very strongly determined by their physical structure. For example, there are many enzymes that cease to function if their shape is distorted (Wikipedia has a rather nice &lt;a href="http://en.wikipedia.org/wiki/Image:Comp_inhib3.png"&gt;diagram&lt;/a&gt; of this). As such, it's vitally important that we be able to figure out what shape proteins are.&lt;br /&gt;&lt;br /&gt;Unfortunately, this turns out to be horrendously tricky. Proteins fold up in a fashion which is (usually) effectively deterministic, but which is so complicated that it takes the fastest computers and massive amounts of research to figure out what the damn things are going to look like. This research is collectively known as Structural Biology.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Problem Solved&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Given a protein sequence, how will it fold?&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Books/Resources&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;"Introduction to Protein Science" - Arthur M. Lesk. Actually, this covers most CB fields, but it's the Structural Biology part I'm interested in.
Barely started reading.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Posts&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://naturesnumbers.blogspot.com/2006/06/ooh-pretty-pictures.html"&gt;Ooh, pretty pictures&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/30125952-115102597910890329?l=naturesnumbers.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://naturesnumbers.blogspot.com/feeds/115102597910890329/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=30125952&amp;postID=115102597910890329' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/30125952/posts/default/115102597910890329'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/30125952/posts/default/115102597910890329'/><link rel='alternate' type='text/html' href='http://naturesnumbers.blogspot.com/2006/06/field-page-structural-biology.html' title='Field page: Structural Biology'/><author><name>Coalescent</name><uri>http://www.blogger.com/profile/00951149322490275133</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-30125952.post-115102426939562393</id><published>2006-06-23T01:38:00.000+01:00</published><updated>2006-07-23T16:12:38.380+01:00</updated><title type='text'>Syllabus</title><content type='html'>As I mentioned in the last post, my course of learning will be mostly guided by the &lt;a href="http://www.ccbi.cam.ac.uk/Education/MPhil/structure.php"&gt;course&lt;/a&gt; I was applying to. 
They cover the following major areas of CB:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://naturesnumbers.blogspot.com/2006/06/field-page-structural-biology.html"&gt;Structural biology&lt;/a&gt;&lt;/li&gt;  &lt;li&gt;&lt;a href="http://naturesnumbers.blogspot.com/2006/06/field-page-bioinformatics.html"&gt;Bioinformatics&lt;/a&gt;&lt;/li&gt;    &lt;li&gt;&lt;a href="http://naturesnumbers.blogspot.com/2006/06/field-page-functional-genomics.html"&gt;Functional genomics&lt;/a&gt;&lt;/li&gt;  &lt;li&gt;&lt;a href="http://naturesnumbers.blogspot.com/2006/06/field-page-computational-neuroscience.html"&gt;Computational neuroscience&lt;/a&gt;&lt;/li&gt;  &lt;li&gt;&lt;a href="http://naturesnumbers.blogspot.com/2006/06/field-page-population-genetics.html"&gt;Population genetics&lt;/a&gt;&lt;/li&gt;  &lt;li&gt;&lt;a href="http://naturesnumbers.blogspot.com/2006/06/field-page-systems-biology.html"&gt;Systems biology&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt; The discerning reader may note that Computational Neuroscience has apparently nothing to do with the other subjects, and the relevance of Systems Biology is somewhat tenuous. That's the great thing about CB - it's got bits in from all over :)&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Other Areas&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;There's some CB-related stuff I want to discuss that for one reason or another isn't mentioned in the syllabus. I'll list it here:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Statistics&lt;/li&gt;  &lt;li&gt;Genetic Algorithms&lt;/li&gt; &lt;/ul&gt; &lt;span style="font-weight: bold;"&gt;Field pages&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I'll be creating a page for each course/field*, which will briefly summarise what these fields are about as I understand them. 
These summaries will contain (at a minimum) the following information:&lt;br /&gt;&lt;ul&gt;   &lt;li&gt;What problem the field is intended to address&lt;/li&gt;   &lt;li&gt;A brief description of how the field accomplishes this&lt;/li&gt;   &lt;li&gt;A list of the books I have read or am reading on the subject&lt;/li&gt;   &lt;li&gt;A list of the posts I've written on the subject&lt;br /&gt; &lt;/li&gt; &lt;/ul&gt; The field pages will be heavily edited as time goes by. In particular, the lists of books and posts will necessarily be extremely short or nonexistent to start with.&lt;br /&gt;&lt;br /&gt;* On reflection, since I'm not going to actually be exploring these subjects in the context of a degree course, it's more appropriate to describe them as fields than as courses.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Uncategorised&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Some posts do not fit into the above structure. In particular, personal posts have no home to call their own. 
They'll be listed here, at least until there's too many for this to be convenient.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://naturesnumbers.blogspot.com/2006/06/welcome-to-my-world.html"&gt;Welcome to my world&lt;/a&gt;&lt;br /&gt;&lt;a href="http://naturesnumbers.blogspot.com/2006/06/little-background.html"&gt;A little background&lt;/a&gt;&lt;br /&gt;&lt;a href="http://naturesnumbers.blogspot.com/2006/06/real-life-academic-ethics-dilemma.html"&gt;A real-life academic ethics dilemma&lt;/a&gt;&lt;br /&gt;&lt;a href="http://naturesnumbers.blogspot.com/2006/06/well-thats-last-youll-see-of-me-for.html"&gt;Well, that's the last you'll see of me for a week&lt;/a&gt;&lt;br /&gt;&lt;a href="http://naturesnumbers.blogspot.com/2006/07/personal-update.html"&gt;Personal update&lt;/a&gt;&lt;br /&gt;&lt;a href="http://naturesnumbers.blogspot.com/2006/07/back-in-business.html"&gt;Back in business&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/30125952-115102426939562393?l=naturesnumbers.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://naturesnumbers.blogspot.com/feeds/115102426939562393/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=30125952&amp;postID=115102426939562393' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/30125952/posts/default/115102426939562393'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/30125952/posts/default/115102426939562393'/><link rel='alternate' type='text/html' href='http://naturesnumbers.blogspot.com/2006/06/syllabus.html' title='Syllabus'/><author><name>Coalescent</name><uri>http://www.blogger.com/profile/00951149322490275133</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-30125952.post-115102195955747542</id><published>2006-06-23T01:05:00.000+01:00</published><updated>2006-06-23T01:36:49.383+01:00</updated><title type='text'>Welcome to my world</title><content type='html'>This is just a quick introductory post to let anyone reading this know what the point of this blog is.&lt;br /&gt;&lt;br /&gt;The long version: I'm a maths student who recently developed a fetish for computational biology. The problem here is that, whilst I love the concept, I know very little about the actual details of the subject. To rectify the situation, I need to put myself through some kind of vaguely focused course of study. &lt;br /&gt;&lt;br /&gt;Option A was to take a course in the subject. I had my eye set on one in particular. However, today I got my finals results and discovered that I probably don't have a good enough class of degree to get into the course. Hence option B.&lt;br /&gt;&lt;br /&gt;Option B is to voraciously read everything in sight, to do as much experimentation as possible (the advantage of CB being that this can often be done at a standard desktop), and to write it all up. That's what this blog is for. It will be a horribly-unstructured mass of subjects that strike me as complex, interesting or otherwise blog-worthy. I guarantee the accuracy of absolutely nothing in here, and any errors are certainly the fault of myself, rather than my sources.&lt;br /&gt;&lt;br /&gt;The short version: ooh! Cool computerey stuff!&lt;br /&gt;&lt;br /&gt;I'll wait for a few posts before I sort out the blogroll, start trackbacking stuff, and otherwise make my presence felt. This is because I have an occasional regrettable tendency to start something then baulk when I realise that it's going to require actual effort. 
If that happens here, I'd rather people not know about it in the first place :)&lt;br /&gt;&lt;br /&gt;Disclaimer: I do have other web identities, as well as a real-world one. I will attempt to keep this identity as separate as possible from the others. The limiting factor here will be the fact that my starting point in study of CB is going to be the syllabus of the &lt;a href="http://www.ccbi.cam.ac.uk/Education/MPhil/"&gt;course&lt;/a&gt; I was applying to.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/30125952-115102195955747542?l=naturesnumbers.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://naturesnumbers.blogspot.com/feeds/115102195955747542/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=30125952&amp;postID=115102195955747542' title='9 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/30125952/posts/default/115102195955747542'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/30125952/posts/default/115102195955747542'/><link rel='alternate' type='text/html' href='http://naturesnumbers.blogspot.com/2006/06/welcome-to-my-world.html' title='Welcome to my world'/><author><name>Coalescent</name><uri>http://www.blogger.com/profile/00951149322490275133</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>9</thr:total></entry></feed>
