<?xml version="1.0" encoding="ISO-8859-1"?>
<feed version="0.3" xmlns="http://purl.org/atom/ns#" xml:lang="en-US">
	<title>Reflections in the water</title>
	<link rel="alternate" type="text/html" href="http://www.ai-projects.info/SphpBlog/index.php" />
	<modified>2012-02-07T02:41:49Z</modified>
	<author>
		<name>Aishwar</name>
	</author>
	<copyright>Copyright 2012, Aishwar</copyright>
	<generator url="http://www.sourceforge.net/projects/sphpblog" version="0.4.8">SPHPBLOG</generator>
	<entry>
		<title>New Site</title>
		<link rel="alternate" type="text/html" href="http://www.ai-projects.info/SphpBlog/index.php?entry=entry080605-113825" />
		<content type="text/html" mode="escaped"><![CDATA[To any reader of this blog, I have changed the main site and have created 2 new blogs - personal and tech (and I am <b>discontinuing</b> this one). Please visit the home page at <a href="http://www.ai-projects.info/" >http://www.ai-projects.info/</a>. The links to the <b>new</b> blogs may be found there.<br /><br />For anyone lazy :), here are the direct links to the 2 blogs (though I really recommend you visit the main page at least once):<br /><br />Personal Blog: <a href="http://www.ai-projects.info/blog/" >http://www.ai-projects.info/blog/</a><br />Tech Blog: <a href="http://www.ai-projects.info/tblog/" >http://www.ai-projects.info/tblog/</a>]]></content>
		<id>http://www.ai-projects.info/SphpBlog/index.php?entry=entry080605-113825</id>
		<issued>2008-06-05T00:00:00Z</issued>
		<modified>2008-06-05T00:00:00Z</modified>
	</entry>
	<entry>
		<title>Wow! I am surprised.</title>
		<link rel="alternate" type="text/html" href="http://www.ai-projects.info/SphpBlog/index.php?entry=entry080405-204437" />
		<content type="text/html" mode="escaped"><![CDATA[I am pretty happy and surprised that a lot of the blog posts are being rated now (hence read, I believe!). I would appreciate it if anyone reading also posted a comment in addition to the rating (this way I know how I can improve post quality).<br /><br />It has been a little busy. But after April 12, 2008, I should have more time to blog. :)<br /><br />I have some interesting things in mind to post about - I am doing some research into forecasting methods (including tarot cards! NO, I am just kidding ;). I am talking about the mathematical type of forecasting).<br /><br />Also, on the side I am developing &quot;Business Insight - a tracking and forecasting system&quot;. It is under development and there is a working version, but I am not planning to put up a link until it is more mature. I am working with a small company to test the system (it's win-win: they get tracking and forecasting, I get system testing)! yay!]]></content>
		<id>http://www.ai-projects.info/SphpBlog/index.php?entry=entry080405-204437</id>
		<issued>2008-04-05T00:00:00Z</issued>
		<modified>2008-04-05T00:00:00Z</modified>
	</entry>
	<entry>
		<title>Fuzzy String Search</title>
		<link rel="alternate" type="text/html" href="http://www.ai-projects.info/SphpBlog/index.php?entry=entry080314-190714" />
		<content type="text/html" mode="escaped"><![CDATA[From now on, I am going to drop the prefix AONTW, since I think it has served its purpose - to make me post regularly :)<br /><br />Recently, I was thinking of building an issue tracker. One of the features I thought would be cool to have was a way to see previous issues related to the current issue (it may give us ideas for the fix, or it may show us the cause of the present problem). I was thinking of implementing this as a kind of search (searching against the previous issues).<br /><br />Doing exact word matches is quite easy in any database. I wanted this to be a little more flexible, being able to pick up words that may be related, misspelled etc. As a starting point, one algorithm worth talking about is the Levenshtein distance. This algorithm computes the minimum number of insertions, substitutions, and deletions required to convert one string into another. For example, misspelling &quot;String&quot; as &quot;tring&quot; would cause the algorithm to give a distance of 1, as you would expect (add &#039;S&#039; to get string 1 from string 2).<br /><br />This algorithm may not do as good a job of finding related words, since the different forms of a word (like tense, tension, tensed etc) would get scores based on the number of letters that differ from the base word. There may be unrelated words like &quot;hence&quot; (for the case above) which would get a lower distance score (and hence would be considered a closer match). Other methods or changes to the algorithm could help solve this.<br /><br />But moving on, I will describe the algorithm here. Let's say you have 2 strings A and B. Let string A be TALLY and let string B be TOLL. Let's take a guess at the minimum number of changes needed - I would guess 2 - replace the O with an A, and insert a Y at the end, to transform TOLL into TALLY (the same count works in the other direction, since each change can be undone by one change). 
Now I will walk through the algorithm:<br /><br />We will create a 6 x 5 matrix D - 6 is the (length of String A + 1) and 5 is the (length of String B + 1).<br /><br />We will populate column 0 of each row i with the value i, and row 0 of each column j with the value j.<br /><br /><pre>for i = 0 to length(A) // Note size of Array is 1 + Length(A)<br />  D[i,0] = i;<br />for j = 0 to length(B) // Note size of Array is 1 + Length(B)<br />  D[0,j] = j;</pre><br /><br />After this we will walk through the matrix using nested for loops (lines 1 and 2) and determine the value for each cell:<br /><br /><br /><pre>1 for i = 1 to length(A) {<br />2   for j = 1 to length(B) {<br />3     if A[i-1] = B[j-1] then cost = 0; Else cost = 1;<br />4     D[i,j] = min(D[i-1,j]+1,D[i-1,j-1]+cost,D[i,j-1]+1);<br />5   }<br />6 }<br />7 return D[length(A),length(B)]; // This is the minimum number of steps needed</pre><br /><br />The cost is computed in line 3. As may be seen, the cost is 0 when letter i of A is the same as letter j of B. In our example, one case where this occurs is position [1,1], since the first letter of A (&#039;T&#039;) matches the first letter of B (&#039;T&#039;).<br /><br />D[i-1,j]+1 represents a deletion operation.<br />D[i-1,j-1]+cost represents the substitution operation (if the letters match, the cost of substitution is 0 - see line 3).<br />D[i,j-1]+1 represents the insertion operation.<br /><br />From my interpretation, the value at D[x,y] represents how many operations are needed to transform the first x characters of String A into the first y characters of String B. Let me illustrate this with the following:<br /><pre><br />    T O L L<br />  0 1 2 3 4<br />T 1 0<br />A 2<br />L 3<br />L 4<br />Y 5<br /></pre><br />At location 1,1 we can get from the T in TALLY to the T in TOLL doing 0 changes. 
Moving on,<br /><pre><br />    T O L L<br />  0 1 2 3 4<br />T 1 0<br />A 2 1<br />L 3<br />L 4<br />Y 5<br /></pre><br />At location 2,1 we can get from the TA in TALLY to the T in TOLL doing 1 change, that is, leaving the first T as is and then deleting the A. As you may observe, D[i-1,j]+1 represents deletion.<br /><br />Similarly D[i,j-1]+1 represents insertion.<br /><br />When calculating a new D[i,j] we want to pick the optimal solution at each point; this way we end up with an overall optimal solution. Doing a quick fill-up of the table above,<br /><pre><br />    T O L L<br />  0 1 2 3 4<br />T 1 0 1 2 3<br />A 2 1 1 2 3<br />L 3 2 2 1 2<br />L 4 3 3 2 1<br />Y 5 4 4 3 2 &lt;-- Number of changes needed to change all of TALLY into TOLL<br /></pre><br />The lower right corner represents the cost of transforming the whole of String A into String B (consistent with our interpretation from before). From this we can determine that the minimum number of changes needed is 2, which confirms what we had guessed at the start of this post!]]></content>
		<id>http://www.ai-projects.info/SphpBlog/index.php?entry=entry080314-190714</id>
		<issued>2008-03-15T00:00:00Z</issued>
		<modified>2008-03-15T00:00:00Z</modified>
	</entry>
	<entry>
		<title>AONTW: Block Ciphers</title>
		<link rel="alternate" type="text/html" href="http://www.ai-projects.info/SphpBlog/index.php?entry=entry080304-194606" />
		<content type="text/html" mode="escaped"><![CDATA[I was looking into the TEA (Tiny Encryption Algorithm) article in Wikipedia. It mentioned Feistel ciphers, which I have found out are a type of block cipher. Here I will share my findings:<br /><br />Source: <a href="http://imps.mcmaster.ca/courses/SE-4C03-07/wiki/mccombi/blockciphers.html" target="_blank" >Block Ciphers</a><br /><br /><b>Block Ciphers</b><br /><br />Block cipher means that the algorithm encrypts block by block. A block is some specified number of bits. Common sizes are 64 and 128 bits, though others exist. A block cipher produces the same cipher text for the same block of plain text when using the same key (this is not the case with stream ciphers, that is why this note is here!). Also, since block ciphers have a 1 to 1 mapping, the decryption algorithm is the inverse of the encryption algorithm, i.e. E(x) = y; E^(-1)(y) = x; given some key k used.<br /><br />There are 3 variations of the block cipher. They are:<br /><br /><b>Iteration Block Ciphers:</b> Apply the same algorithm over and over again on the resulting cipher text, thus encrypting it further and further. This is what the often-mentioned number of rounds refers to. In each round, a new key may be chosen or the same key may be used.<br /><br /><b>Feistel Ciphers:</b> The plain text is split into two halves. During each round, the right half and the key are passed into the encryption algorithm. The result is then XORed with the left half. The two halves are then swapped. The same process is repeated.<br /><br /><b>Cipher Block Chaining:</b> The cipher text is not only a function of the key and the encryption algorithm but also of the blocks previously encoded. In this method, a block of cipher text is XORed with the next block of plain text that is to be encoded. For the first block there is no previous cipher text, so an initialization vector (a block of dummy data) is used instead. 
Overall, this means that to decrypt some block, the cipher text of the prior block is needed as well. This carries the danger that if some cipher text block is lost or corrupted, then that block and the one following it cannot be recovered (later blocks still decrypt correctly, since each one depends only on its own cipher text and the one just before it).<br /><br />The following images are from Wikipedia; they do an excellent job of illustrating Cipher Block Chaining.<br /><br /><img src="images/Cbc_encryption.png" width="512" height="207" border="0" alt="" /><br /><img src="images/Cbc_decryption.png" width="512" height="189" border="0" alt="" />]]></content>
		<id>http://www.ai-projects.info/SphpBlog/index.php?entry=entry080304-194606</id>
		<issued>2008-03-05T00:00:00Z</issued>
		<modified>2008-03-05T00:00:00Z</modified>
	</entry>
	<entry>
		<title>AONTW: Encryption: The Basics</title>
		<link rel="alternate" type="text/html" href="http://www.ai-projects.info/SphpBlog/index.php?entry=entry080302-183023" />
		<content type="text/html" mode="escaped"><![CDATA[Missed a week, but moving on.. :P<br /><br />I have had an interest in cryptography, and have lightly scratched its surface many times in the past. So, I thought I&#039;d be a bit more focussed and do something this time (the inspiration to do this again probably came from reading Digital Fortress - note, I do realize it does not describe the most sophisticated/complex theories; nevertheless, it was entertaining :)).<br /><br />Source for the info below: <a href="http://download.pgp.com/pdfs/Intro_to_Crypto_040600_F.pdf" target="_blank" >An Introduction to Cryptography</a><br /><br /><b>The Basics</b><br /><br />As most would realize, encryption is the process of &quot;coding&quot; some plain text message into some unreadable &#039;cipher text&#039;. This cipher text is the encrypted message; it is later decrypted to get back the plain text message.<br /><br /><b>Symmetric Cipher</b><br /><br />It means that the &quot;key&quot; used for encryption is the same as the key used for decryption.<br /><br /><b>Asymmetric Cryptography (Public-key Cryptography)</b><br /><br />It means the keys for encryption and decryption are not the same. In public key cryptography, you publish your encryption key to the public. Anyone can use this key to send encrypted messages to you. But only you hold the private key, which you use to decrypt the messages.<br /><br />The good thing about this setup is that the need to share a secret key between the sender and receiver is eliminated. All communication involves only the public key, and no private key is ever transmitted or shared - I guess what this means is there is no longer a need to ensure a secure medium for exchanging keys; any public infrastructure will do.<br /><br />As you may realize, keys are what create a specific cipher text for a given plain text. Generally, the size of the key is a good indicator of the security of the text. 
Of course, it also depends on the algorithm (more so).<br /><br />One interesting thing is that public key cryptography key sizes and conventional cryptography key sizes are not directly comparable. An 80-bit conventional key has roughly the equivalent strength of a 1024-bit public key.<br /><br />With that I&#039;ll end this session of AONTW!<br />Wish everyone a happy work week ;)]]></content>
		<id>http://www.ai-projects.info/SphpBlog/index.php?entry=entry080302-183023</id>
		<issued>2008-03-03T00:00:00Z</issued>
		<modified>2008-03-03T00:00:00Z</modified>
	</entry>
	<entry>
		<title>At Least One New Thing a Week (AONTW): Video Encoding</title>
		<link rel="alternate" type="text/html" href="http://www.ai-projects.info/SphpBlog/index.php?entry=entry080218-183413" />
		<content type="text/html" mode="escaped"><![CDATA[Let's see how long I keep this up... :)<br />I think I&#039;ll post something new I learned every week (mostly weekends). Here goes the first. This is about Video Encoding. Here are some things I learned in the process of research.<br /><br />Sources:<br /><a href="http://en.wikipedia.org/wiki/Quantization_%28image_processing%29" target="_blank" >Quantization (Image Processing)</a><br /><a href="http://gentoo-wiki.com/HOWTO_Mencoder_Introduction_Guide#The_Basics" target="_blank" >HOWTO Mencoder Introduction Guide</a><br /><a href="http://www.webopedia.com/" target="_blank" >Webopedia</a><br /><br />Video codecs and formats are not the same thing. MPEG-4, for example, is a video format, while Xvid is a codec. Codecs create the actual encoded video.<br /><br />Then there are multimedia containers. The container is what will contain the encoded video and audio. You can put anything into the container format (as long as it supports it - e.g. video and audio). One example of a container format is AVI.<br /><br />Some programs to do video encoding:<br /><br /> - VirtualDub (gui + commandline)<br /> - mencoder (commandline)<br /><br />Some terminology/concepts useful when using these programs:<br /><br /> - Quantization: a lossy compression technique achieved by compressing a range of values to a single quantum value.<br /><br /> - I-frame: also known as the key frame, these frames contain complete frame information without reference to any other frames (think of it as 1 snapshot in a movie; this will make more sense as you read P-frame and B-frame(s) below). Hence I-frames take the most bits to store, but improve the video quality.<br /><br /> - P-frame: P-frames follow I-frames and contain information that has changed since that I-frame (such as color information and content change). Hence, they depend on the I-frame to fill in their data. 
P-frames are also aptly called delta-frames.<br /><br /> - B-frame: B-frames or bi-directional predictive frames rely on the frames preceding and following them. They contain data of what has changed between the 2 frames.<br /><br /> - GOP: stands for Group of Pictures<br /><br />The [I/P/B]-frame quantization values range from 1 to 31. The higher the number, the more compression (hence more loss of information --&gt; smaller file size --&gt; lower quality).<br /><br />Bitrate is how many bits are used to store each second of video (higher means more bits are used per second of data, meaning more information is stored).<br /><br />Based on the above 2 factors (at least at a basic level), the quality and file size of videos can be controlled. The challenge is to find the right values that would optimize the quality to fit your space needs (possibly time needs as well, i.e. how long you have to encode).<br /><br />Of course, one thing to remember is that the encode&#039;s quality depends on the source&#039;s quality. It wouldn&#039;t matter if you had quantization values of 1 and a bitrate of 2000Kbps if the source is very poor.<br /><br />Also, there are other options you can play with that would have an effect on the quality and time taken, like motion detection etc., but think of this as a basic starting point :)]]></content>
		<id>http://www.ai-projects.info/SphpBlog/index.php?entry=entry080218-183413</id>
		<issued>2008-02-19T00:00:00Z</issued>
		<modified>2008-02-19T00:00:00Z</modified>
	</entry>
	<entry>
		<title>Time to go Version 4</title>
		<link rel="alternate" type="text/html" href="http://www.ai-projects.info/SphpBlog/index.php?entry=entry080203-105941" />
		<content type="text/html" mode="escaped"><![CDATA[Time to upgrade]]></content>
		<id>http://www.ai-projects.info/SphpBlog/index.php?entry=entry080203-105941</id>
		<issued>2008-02-03T00:00:00Z</issued>
		<modified>2008-02-03T00:00:00Z</modified>
	</entry>
	<entry>
		<title>Terms of the trade: in Business Intelligence</title>
		<link rel="alternate" type="text/html" href="http://www.ai-projects.info/SphpBlog/index.php?entry=entry080115-204932" />
		<content type="text/html" mode="escaped"><![CDATA[I started searching about Business Intelligence...this slowly led to other things...one of which, was data warehousing. I have quoted one part from Wikipedia, which I found easy to understand:<br /><br />Source: <a href="http://en.wikipedia.org/wiki/Data_warehouse" target="_blank" >http://en.wikipedia.org/wiki/Data_warehouse</a><br /><br /><blockquote>In OLTP — online transaction processing systems relational database design use the discipline of data modeling and generally follow the Codd rules of data normalization in order to ensure absolute data integrity. In this approach, each of the more complex information items is resolved into a set of records in multiple tables, each of which satisfies the normalization rules. Codd defines 5 increasingly stringent rules of normalization and typically OLTP systems achieve a 3rd level normalization. Fully normalized OLTP database designs often result in having information from a business transaction stored in dozens to hundreds of tables. Relational database managers are efficient at managing the relationships between tables and result in very fast insert/update performance because only a little bit of data is affected in each relational transaction.<br /><br />OLTP databases are efficient because they are typically only dealing with the information around a single transaction. In reporting and analysis, thousands to billions of transactions may need to be reassembled imposing a huge workload on the relational database. Given enough time the software can usually return the requested results, but because of the negative performance impact on the machine and all of its hosted applications, data warehousing professionals recommend that reporting databases be physically separated from the OLTP database.<br /><br />In addition, data warehousing suggests that data be restructured and reformatted to facilitate query and analysis by novice users. 
OLTP databases are designed to provide good performance by rigidly defined applications built by programmers fluent in the constraints and conventions of the technology. Add in frequent enhancements, and too many a database is just a collection of cryptic names, seemingly unrelated and obscure structures that store data using incomprehensible coding schemes; all factors that while improving performance, complicate use by untrained people. Lastly, the data warehouse needs to support high volumes of data gathered over extended periods of time and are subject to complex queries and need to accommodate formats and definitions inherited from independently designed package and legacy systems.<br /><br />Designing the data warehouse data Architecture synergy is the realm of Data Warehouse Architects. The goal of a data warehouse is to bring data together from a variety of existing databases to support management and reporting needs. The generally accepted principle is that data should be stored at its most elemental level because this provides for the most useful and flexible basis for use in reporting and information analysis.</blockquote><br /><br /><br />Also this is on data mining,<br />Source: <a href="http://searchsqlserver.techtarget.com/sDefinition/0,,sid87_gci211901,00.html" target="_blank" >http://searchsqlserver.techtarget.com/s ... 
01,00.html</a><br /><br /><blockquote><br />Data mining is sorting through data to identify patterns and establish relationships.<br /><br />Data mining parameters include:<br /><br />    * Association - looking for patterns where one event is connected to another event<br />    * Sequence or path analysis - looking for patterns where one event leads to another later event<br />    * Classification - looking for new patterns (May result in a change in the way the data is organized but that&#039;s ok)<br />    * Clustering - finding and visually documenting groups of facts not previously known<br />    * Forecasting - discovering patterns in data that can lead to reasonable predictions about the future (This area of data mining is known as predictive analytics.)</blockquote><br /><br />Now, flowing down to the real deal - Business Intelligence:<br />Source: <a href="http://searchdatamanagement.techtarget.com/sDefinition/0,,sid91_gci213571,00.html" target="_blank" >http://searchdatamanagement.techtarget. ... 71,00.html</a><br /><br /><blockquote>Business intelligence (BI) is a broad category of applications and technologies for gathering, storing, analyzing, and providing access to data to help enterprise users make better business decisions. BI applications include the activities of decision support systems, query and reporting, online analytical processing (OLAP), statistical analysis, forecasting, and data mining.</blockquote>]]></content>
		<id>http://www.ai-projects.info/SphpBlog/index.php?entry=entry080115-204932</id>
		<issued>2008-01-16T00:00:00Z</issued>
		<modified>2008-01-16T00:00:00Z</modified>
	</entry>
	<entry>
		<title>Mainframe SAS Keywords: in Post Later :)</title>
		<link rel="alternate" type="text/html" href="http://www.ai-projects.info/SphpBlog/index.php?entry=entry080114-202951" />
		<content type="text/html" mode="escaped"><![CDATA[Keywords:<br /><br />* TSO<br />* JCL<br />* ESP<br />* SAS<br />* Data set(IBM Mainframe)<br />* BIA<br />* Mainframe]]></content>
		<id>http://www.ai-projects.info/SphpBlog/index.php?entry=entry080114-202951</id>
		<issued>2008-01-15T00:00:00Z</issued>
		<modified>2008-01-15T00:00:00Z</modified>
	</entry>
	<entry>
		<title>IP Tables: in Linux</title>
		<link rel="alternate" type="text/html" href="http://www.ai-projects.info/SphpBlog/index.php?entry=entry080113-222530" />
		<content type="text/html" mode="escaped"><![CDATA[<strong>Saving this for a later read</strong><br /><br />Seems like a good article, I feel too tired to read at the moment :P, I&#039;ll just post the link here for now:<br /><br /><a href="http://iptables-tutorial.frozentux.net/iptables-tutorial.html" target="_blank" >http://iptables-tutorial.frozentux.net/ ... orial.html</a>]]></content>
		<id>http://www.ai-projects.info/SphpBlog/index.php?entry=entry080113-222530</id>
		<issued>2008-01-14T00:00:00Z</issued>
		<modified>2008-01-14T00:00:00Z</modified>
	</entry>
</feed>


