Friday, December 5, 2008

Polya-style write-up: representing a language with binary strings with even 0s and odd 1s

This question came up in class during the lecture where we were introduced to DFSA: Find a regular expression that denotes the language of binary strings with an even number of zeros and an odd number of ones, and prove that it denotes that language. My Polya-style write-up will attempt the latter part, using the regex:

R = (00+11+(01+10)(00+11)*(10+01))*(1+(01+10)(11+00)*0)

This seemed like a good problem to apply the Polya process to, since at the time I was working on it, it was the first such proof I had done. Hence, I was working a priori in a way, and it wasn't just a case of following a familiar, mechanical process, and so I made more interesting mistakes.

Understanding the Problem:


My first insight into this type of proof was that I would have to prove two things to show that R denotes the language in question (I'll call it L' from now on, for convenience). I would have to show that all strings generated by R are in L', and that all strings in L' can be generated by R. The way I like to think of this now, that makes it even more clear, is that to prove L(R) = L', I can show that L(R) is both a subset and a superset of L'.

An easy way to convince myself that neither criterion alone is enough would be to consider regexes like 001 which satisfies the former, or (0+1)* which satisfies the latter. It's abundantly clear that neither regex denotes the language L' though.

Making a Plan 1.0:


My first attempt at showing that L' is a subset of L(R), involved taking an arbitrary string from L' and then breaking it up into two smaller strings based on finding the rightmost 1. I think my thinking at the time was that those two substrins would be easier to deal with, since I would be left with a substring containing only 0's, and a substring where the final symbol is known.

Carrying out the Plan 1.0:

Recorded below is my initial attempt at solving the problem, warts and all. I've put it in small font so as not to clutter things up too much, so feel free to zoom in if reading my mistakes in detail interests you):

To prove that all strings, s, in L' can be generated by R. For an arbitrary s:

Find the rightmost 1. We know at least one must exist, since 1 is the smallest odd number.

Case 1: the rightmost one is followed by an odd number of zeros.
Then s = a0(00)*, where a is a string with odd 1s and odd 0s.
a is a member of L((00+11+(01+10)(00+11)*(10+01))*)
a is an arbitrary string with even zeros and ones.
let a_0 be the first pair of letters in a (if a is empty, then by the kleene star, the regex can form it, otherwise it must have even length)
Subcase 1: a_0 is 00 or 11. (00+11)* is a subset of the regex, so a_0 can be formed.
Subcase 1: a_0 is 01 or 10.
Then subsequent pairs of letters are either in (00+11) or (01+10). If we eventually reach one in the latter set, then it is formed by (01+10)(00+11)*(10+01) which is a subset of the regex.
We know that eventually it does reach one in that set though, since by our assumption a has an even number of zeros and ones. We started with an odd number of both, and every time a pair in (00+11) is encountered, the parity does not change.

So if a is not of length zero, it can either be expressed as (00+11)b or (01+10)(00+11)*(01+10)b where b is some string. Since b is preceded by a string with even zeros and ones, and therefore must have the same property (if j is even, and j + k is even, then k must be even, for natural j and k).

We can apply the same process to substring b, and continue the process until we get the empty string (which we must since the lengths of the suffix strings are a set of decreasing natural numbers).

Case 2: The rightmost 1 is followed by an even number of zeros.

Then s can be expressed as a1(00)*, where a is some string with an even number of zeros and ones.
Since L(1(00)*) is a subset of L(1+(01+10)(11+00)*0)
Subcase 1: 1 is followed by zero zeros. L(1) is a subset of L(1+(01+10)(11+00)*0) so it can be formed.
Subcase2: 1 is followed by a non-zero, even number of zeros. Then it is in L(10(00*)0), which is a subset of L(1+(01+10)(11+00)*0)

By the proof above, a can be expressed by the regex (00+11+(01+10)(00+11)*(10+01))
*.

You'll notice that the proof is certainly incomplete. After some time I gave up, realizing that I wasn't getting anywhere with my approach. Specifically, I didn't really have enough information about the prefix I had formed, so I realized that the place I had chosen to divide the string probably wasn't appropriate.

After this, I went back to look at the regular expression in question, and tried to better understand it, and perhaps understand the author's motivation in writing the regex that way (since I state without proof that there are many other equivalent REs).

Forming a Plan 2.0:

Reading the RE again, I thought about the division of the string into a substring with parity of 0s and 1s, and a substring with parity of only 0s. That division really is the essence of the RE, and in retrospect, it was probably foolish of me to ignore that, and think that my own division was more clever.

My second attempt at proving L' to be a subset of L(R) was much more in accord with the spirit of the regular expression.

Carrying out the Plan 2.0:


Preamble:
Find the smallest substring that is a suffix to x and has an odd number of ones and an even number of zeros. We know that one such substring exists, since s itself has this property by assumption. s is finite and so has a finite number of substrings - a subset of s's substrings is therefore also finite and has a largest member (and a unique one, since only one substring is a suffix to x and has length n for natural n).

call this substring b, and let a be the substring that when concatenated with b forms s.

Left side
Then a has an even number of zeros and ones. (Since s must have an odd number of ones and an even number of zeros. b posses this property, and we will maintain it, since adding an even number preserves parity). a must also have an even length, since its length is the sum of its zeros (even) and the sum of its 1s (even).

If a is not the empty string, let a_0, a_1, a_2... be the smallest non-empty substrings of a, each having an even number of 1s and 0s, such that:
a_0a_1a_2.... = a

then for each a_n, either:

a_n is 00 or 11
or a_n begins with one of 01 or 10. (If it began with 11 or 00 and had length greater than 2, it could be "split" into 11 or 00 followed by a shorter string)
then a_n ends with either 01 or 10 (by the same logic as before - if it ended in one of 11 or 00, the end could be split off)
and between the opening and closing (01+10) there may be some number of 00 or 11 pairs, since these preserve a_n's indivisibility.

the first case corresponds to the regular expression (00+11) and the second to (01+10)(00+11)*(10+01).
So a is in their union, or L((00+11+(01+10)(00+11)*(10+01))*

Right side:
Turning our attention to b:
Case 1: b is 1
Case 2: b is not 1.
then b begins with either 01 or 10. This is true, since if it began with 00 or 11, then those two characters could be "split" from b and we would be left with a smaller suffix with even 0s and odd 1s. Also, we know that b is of length at least 3, since its length is larger than 1 and odd.
then b ends with a zero. If b ended with a 1, then a smaller suffix of s with odd 1s and even 0s would exist (that being 1).
so b is equal to one of 01 or 10, followed by an even number of characters which we divide into pairs (only those pairs whose first member has an even index - that is to say, each character appears in only one pair), followed by a zero.
assume at least one of the above mentioned pairs is 01 or 10.
then if we locate the rightmost one of those pairs, we see that there exists some substring 01(00+11)*0, with even zeros and odd 1s, which is smaller than b
this contradicts our assumption, so the pairs must consist only of 00 and 11.

So we may describe b as either 1, or one of 01 or 10, followed by some (possibly zero) number of pairs of zeros or ones, followed by a zero
The above description is described by the regular expression (1+(01+10)(11+00)*0)

Dénouement
Since s is the concatentation of a and b, it can be described by the concatenation of regular expressions for a and b. So

(00+11+(01+10)(00+11)*(10+01))*(1+(01+10)(11+00)*0)


Proving L' to be a subset of L(R) was the main hurdle I faced in this proof, and showing L' to be a superset I found to be fairly simple, and so I won't devote much time, or even a very formal proof, to it.

My plan in showing that R generates good strings was to divide it into subsets (which I think is a natural thing to do, given all the "+" operators that are used. I state that if L(a) L(b) and L(c) all have some common property, then L(a+b+c) has that property (call this Lemma A). I also state that if the number of some symbols in L(a) has parity, then the same is true for L(a*) (Lemma B). I state these without proof (mostly because I don't consider a proof to be necessary here, and for both statements to be self-evident, not because writing a proof would be difficult). These will be the driving force behind my proof.

First I wish to show that L(00+11+(01+10)(00+11)*(10+01)) always generates strings with even 1s and 0s. This is true of L(00) and L(11) by observation, so I only need to prove it for L((01+10)(00+11)*(01+10)). (00+11)* generates strings with parity of 0s and 1s by Lemma B, and adding 2n 0s and 2k 1s to a string does not change the parity, so consider the language L((01+10)(01+10)). Each of the two strings in the concatenation has exactly 1 one and 1 zero, so their concetantion has exactly 2 ones and 2 zeros.

So, by Lemmas A and B, (00+11+(10+01)(00+11)*(10+01))* generates strings with even 1s and 0s.

Next I wish to show that L(1+(01+10)(11+00)*0) contains strings with odd 1s and even 0s.

This is true of L(1) by observation, so I only need to show it for L((01+10)(11+00)*0) by Lemma A. By the reasoning earlier, (11+00)* can be ignored since it won't change the parity of the string, so we consider L((01+10)0). In call cases, this generates a string with 1 one, 2 zeros. So, the property holds.

So an arbitrary string generated by the regular expression contains a prefix (possibly empty) with even 0s and even 1s, and the remaining string has odd ones and even zeros. So, the total number of ones in any string is odd, and the total zeros is even.

Looking Back:

My biggest mistake in my first approach was not paying close enough attention to (or rather, not attributing enough importance to) the regular expression itself (rather than just the language), and perhaps the authors intentions in writing it the way they did.

Another stumbling block I encountered while writing this proof was formalizing some of my observations. At many turns I found myself having to state things informally. I was able to improve the expressiveness of my proof somewhat by speaking more of the language denoted by a regular expression, rather than the regular expression itself. For example, I think it's more convincing to say something like L(01+10)1 = {011, 101} which contains only elements with an even number of ones and an odd number of zeros, rather than saying simply that (01+10)1 always generates strings with that property because it "chooses between" 01 and 10 which both have odd zeros and odd ones, and then appends a 1 (maybe I was committing the pathetic fallacy with my regular expressions).

Ahem. Sorry, that ended up being an ungodly length. I don't know if Blogger has functionality for any of that "after the jump" trickiness, but if it does I can't find it. I apologize to the people who read this blog (the empty set?). I think "The Empty Set" would be a great name for a band. Just saying.

Friday, November 21, 2008

Happy Birthday, Magritte!

At the end of the lecture we were left with a tantalizing little puzzle to think about (which is great for me, because it gives me something to think about during the walk home - I feel much more efficient that way). Prof. Heap, near the end of class, told us that the language with an equal number of zeros and ones couldn't be expressed by any regular expression, so I took it upon myself to think of a proof on the way home.

Let L be said language, and let Q be a DFSA that expresses L. Let n be the number of states in Q (there must be a finite number, by the 'F' in DFSA).

Let x be the string consisting of n+1 0's concatenated.
I assert that there must be at least n+1 states between the state at x, and the accepting state. This string is able to reach the accepting state, we know, if n+1 1's are appended to it.

We can prove this result by contradiction. Let j and k be strings consisting entirely of 1's, both of whose lengths are greater than 0 and less than n+1, and such that |j| < |k|, and sigma(xj1) = sigma(xk1) = q0. Then let q1 = sigma(q0, 1), q2 = sigma(q1, 1), q(m+1) = sigma(qm, 1)

let p (running out of letters here) equal (n+1) - |k|. then qp must be the accepting state, since it represents n+1 0's, concatenated with |k| + [(n+1) - |k|] = n+1 1's. However, this state also accepts the string that is n+1 0's concatenated with |j| + [(n+1) - |k|] < n+1 1's. Here, the number of 1's and 0's are obviously not equal, so we've arrived at a contradiction.

Then the DFSA passes through at least n+1 states. However, this contradicts our assumption that the DFSA has only n states. Therefore, L cannot be represented by a DFSA, and by extension, cannot be represented by a regular expression.

Man, that was a kinda ugly proof, in no small part because of the alphabet soup. Maybe it would just be easier to explain my insights informally. My idea was simply that, to represent the language would require an infinite number of states (I found it easier to envisage as a DFSA than a regex), since there would need to be at least one state for each difference in 0's and 1's (intuitively, anyways).

If there were such a thing as a deterministic infinite state automata (hey, it doesn't sound as crazy as being nondeterministic), it seems like it would be pretty easy to write. It could just be a ladder climbing infinitely in both directions with one symbol taking you up and another taking you down, and the accepting state being right in the middle.

Friday, November 7, 2008

Test 2

Everything's just been hunky-dorey.


...

I guess that wouldn't be an acceptably meaty Slog post, would it? Really though, I've just been sailing through. I love the material, I love the assignments and the problem sets, I have nothing to complain about.

I just came back from term test 2, so I suppose I'll talk about that briefly, for lack of anything better to pontificate on. It went pretty well, except for the last question, which was about proving the termination of a loop. I think my logic was correct, but I leapt into the proof without looking first, and so I know the approach I took was inelegant (and probably had some gaps). For some reason, I decided to make a function that takes valid loop indices as its domain, and spits out the value of n at that index. Then I proved that it was decreasing, and always natural. I only mentioned loop invariants in passing, which was kind of silly.

It's a mistake that I've made a lot in tests, of getting antsy about the time limit, and feeling obliged to start writing stuff down as soon as I see the question, not wanting to "waste time" planning out a proof. For that reason, I often end up writing out base cases that turn out to be redundant, or make some claims without the appropriate intermediate steps, and turn my paper into a roadmap of arrows, inserting lines in the cracks where they're needed. So I suppose something to work on in the future would be remembering to take time to make a quick mental plan of where I want the proof to go, rather than blindly filling in a predicate and some base cases, and hoping that they'll lead me to inferences, that will lead me to inferences, that will lead me to the conclusion I want.

Friday, October 17, 2008

Midterm, A1, etc.

As I was writing up a solution to Problem Set 3, I realized I'd been neglecting my SLoG so I should give an update on my adventures in 236.

The mid-term went well. I was surprised that it seemed to be all on induction, but that was fine with me since I feel very comfortable with it by now (which I wouldn't have thought possible a year ago). I just now saw the grades posted, and am pretty happy that I'm one of the five who aced it (we should form a posse), and am delighted that the number of people enrolled in the class is a power of 2. I decree that no-one is allowed to drop the course now (unless it's an exodus of 64 people).

I got my A1 back, and was pretty happy with how it turned out. I'd be lying if I said I wasn't saddened when some cruel TA who sipped the wine of life and found it to be bitter said that my cherished procedure for question 2 was "not the simplest", but "okay". Ouch. On the other hand, they were quite generous in giving me full marks for question 3, when my part a was needlessly long and circuitous and I made a small error in reasoning in part b. So I guess I came out even?

(Actually, after going back and re-reading the assignment handout, I realise that I probably came out ahead. I talked with Prof. Heap briefly about question 2 last week and he mentioned that at least one TA was taking off marks for not starting with a base case of 0, which I did, while he thought that 2 was the only appropriate base case. I had thought that the premise was that the menus could differ by no more than one meal, when in fact the handout said "exactly one" meal. So really, I should have lost a mark there, if not for my merely "ok" procedure)

I'd still be curious to know how many different approaches people used in question 2, and how they were distributed. I've enjoyed reading the TAs' posts on the message boards where they do something similar, going over the solutions to in-class problems or problem sets, along with an analysis of how many students solved it, common mistakes that were made, and alternate solutions.

I think that's all there is to report. It's a sad irony that the course that is, by far, my favourite this semester, is the only one where I have a platform to complain. If I had a SLoG for, say, STA247, let me tell you...

Tuesday, September 30, 2008

"Interesting" being a relative term

The deadline for A1 has passed, and I feel pretty confident about the answers I submitted. I did notice a few interesting discrepancies between the sample solutions and mine though:

1) The sample solutions consisted of fewer than half as many lines as mine, which reaffirms my fear that my proofs are too long. I think this was partially - in question 3 specifically - from my proof taking a circuitous path. In general though, I may have tried to be more precise than was necessary. I think switching to a more, um, prosey (my proofs are anything but prosaic) style cut down on their size a good bit, but I'm still working on cutting out more redundant or obvious stuff.

2) My solution to question 2 was completely different from the sample solution, which I thought was pretty cool. The technique in the sample solution is a bit easier to follow, but I like mine (which consisted of splitting the nth cycle into 2^n-1 pairs, and inserting a pair of new values in the middle of each one) just fine. I'd be really curious to know how many people used my solution, and how many used the other, and if there are even more techniques that work that people submitted.

3) The solution to question 3b made me realized that I had made the careless mistake of conflating "cannot be expressed as a ratio of natural numbers" and "irrational". This didn't actually meaningfully affect my proof, which would apply if I went back in time and replaced the latter term with the former, so I hope I won't be punished too harshly for my slip-up.

Also, no matter how many times I look at the solution to question 3b, I can't make sense of it. I can't decide whether it's more likely that this is the result of a mistake or transcription error on the part of the author, or if I'm just missing something stunningly obvious.

Speaking of missing the stunningly obvious, I almost fell for the pseudo-proof of the six-sidedness of hexagons during yesterday's lecture. It wasn't until after a minute or two that I thought to myself "it sure is neat that he proved that without using any properties of hexagons", and then the inevitable "wait a minute...". I'm embarrassed to admit that I briefly fell for a proof that could easily be adapted to "prove" that all horses are the same colour.

Monday, September 22, 2008

Rethinking my proof style

I was surprised by the sample solutions that were posted recently for Problem Set 1. I had assumed that the proofs we were doing would follow the format from 165 (Assume, indent... then... then... assume, indent... then... then... dedent... then... dedent... then for all...). However, the sample solutions posted seemed much less formal than those from 165, and made much use of prose over symbols. I think I like the idea of writing a proof in paragraph form, given the difficulties I've been having with Assignment 1. In some ways, I've been having more problems expressing myself clearly than in actually understanding and finding a solution for the problems. My proofs to the first two questions look like someone spilled a bowl of alphabet soup on my monitor - way too many subscripts and variables. I think one of the reasons for that was that, if I wanted to discuss a subset of a certain set, without doing anything to the set itself, I would still have to declare and justify the existence of both, whereas if I were doing a more prosaic proof, I could describe the properties of the set and then move on to what interested me, only giving an explicit representation to the things that I was going to use.

If that makes sense.

I'm almost tempted to rewrite my proofs to the first two questions in the style of the sample solutions. I'm loath to erase the <100 lines I've already committed to them, though. I know I'll definitely be adopting a less formal technique for the last two questions, and the second problem set though.

Friday, September 19, 2008

Weeks 1 and 2

"Slog", "Blog", such ugly guttural words. Anyways.

I'm glad to be back. I've been looking forward to CSC 236, because I so enjoyed 165 last year. It's strange - I did pretty awfully in MAT137 last year, but when I'm doing logic and proofs I just feel right at home. Maybe it's because every math class I've taken in my life has focused on "calculations of the grocery-bill variety" (a phrase cribbed from Uncle Petros and Goldbach's Conjecture which I reread over the summer), whereas 165 and 236 have felt like they required a lot more independent thought, and not just memorization and repeated application of formulas and procedures. For that reason, I found solving proofs in 165, and now 236, a lot more satisfying. But maybe that's just me explaining away my incompetence at calculus.

I feel like I've done pretty well with the material during these first two weeks. It's taken me a while to warm up to induction since last year, but at this point I feel quite comfortable with it. The explanation the book gives in section 1.2, reconciling the principles of simple and complete induction with induction as proof technique by expressing the natural numbers for which a predicate is true as a subset of the natural numbers, and then using the principle of induction to show that that set is a superset of the natural numbers (and hence, exactly equal to them).

My only difficulty in solving the first problem set came in the second question, trying to decide the degree of rigour with which I would need to justify a certain statement. Specifically, I had partitioned the set of pairs for {0, 1... n+1}, Dn+1 = {{1, 2}, {1, 3}, {2,3}... {n, n+1}} into Y: {s \in Dn+1, n+1 \in s} and N: {s \in Dn+1, n+1 ~\in s}, and was trying to prove that the cardinality of Y was equal to n, which seemed obvious to me, but somewhat messy to prove. My solution was just to add an appendix to the end of the proof, with what ended up being about a page of writing, elucidating the reasoning behind those two lines, with a note to the marker that they could refer to it if they found the lines alone insufficient.

Wasn't that an interesting story? There are more where that came from. Stay tuned for next week.