Biology DNA 1.01

Sunday 18/5/2014, 23:45

I am not sure whether I should be more worried about GMOs or about this new freak technology! It is an attempt at increasing the number of letters in the DNA alphabet, and it was published this month in Nature: A semi-synthetic organism with an expanded genetic alphabet.

Does an increase in alphabet count really make a difference? I am a native speaker of the Sindhi language, spoken by about 17 million people in present-day Pakistan and 2.5 million in present-day India. Sindh was divided in two parts in 1947 during the partition of India on religious and political grounds, which resulted in violence and cross-border migration, with the majority of Hindu Sindhis migrating to India and Muslim Sindhis to the new Pakistan. The Sindhi language has been written in several different scripts in different periods.

Sindhis in Pakistan now write in the Perso-Arabic script, which has a varying number of letters in the alphabet, depending on how one counts them. Children in schools are taught that the alphabet has 52 letters, but in practice they are taught more than 52: a small letter, a phonetic combination of two letters, and 6 accents bring it to 60. The language has a large number of consonants that are not present in other languages, and which in general cannot be pronounced properly by non-natives who learn them as adults.

Sindhis in India write in the Devanagari script (based on Sanskrit script), which has 51 letters in the alphabet, and some accents/vowels that I cannot count as I'm not fluent in it.

Sindhi was also formerly written in the Sindhi script, which is now called the Khudawadi script and has 61 letters.

Sindhi can be written in any of these scripts without losing or gaining any phonetic information or, more importantly, the meaning of any words or sentences, i.e. if someone were reading it out loud, I would not be able to tell which script they were reading. Perhaps I could tell if I could see their eyes, because the Perso-Arabic script is written right-to-left and Devanagari is written left-to-right like English, but that does not change the meaning of what is being read whatsoever.

Many years ago, I developed a keyboard layout for Sindhi (and Urdu), and strictly speaking, the Perso-Arabic alphabet is much more extensive as some letters must have three different forms, depending on their position in a word (initial, terminal, or in the middle), and if they are adjacent to certain other characters.

None of that affects what can be said in Sindhi. Mathematically, a lot more words can be created in Sindhi than in English, but the truth is that none of the known languages is anywhere near saturation point in terms of possible words.

Theoretically, an infinite number of words can be created in any language, even one with only a single letter in its alphabet plus the possibility of inserting a space. It would just not be useful for us humans, who struggle with counting and processing large numbers of symbols. Binary code is essentially a two-letter alphabet (0 and 1), and computers have no problem processing it, as they are designed to do so. We have managed to capture all our information in this binary alphabet and to process all the commands on our digital computers (this includes traditional computers, tablets, phones, and the processors in an amazing variety of electronic products such as cameras, cars, printers, watches, boiler controllers, etc.).
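To make the point about alphabet size concrete, here is a minimal sketch in Python that writes the same word first in a two-letter alphabet (0 and 1) and then in a four-letter alphabet. The pairing of bits with the letters A, C, G, T is an arbitrary assumption made for illustration only; it is not how DNA stores anything, and the function names are made up for this example.

```python
# Toy illustration: one message, two alphabets, no information lost.
# ASSUMPTION: the bit-pair-to-letter mapping (00=A, 01=C, 10=G, 11=T) is arbitrary.
BASES = "ACGT"


def to_binary(text):
    """Write text in a 2-letter alphabet: 8 bits per UTF-8 byte."""
    return "".join(format(byte, "08b") for byte in text.encode("utf-8"))


def binary_to_bases(bits):
    """Rewrite a bit string in a 4-letter alphabet, two bits per letter."""
    return "".join(BASES[int(bits[i:i + 2], 2)] for i in range(0, len(bits), 2))


def bases_to_binary(letters):
    """Go back from the 4-letter alphabet to the 2-letter one."""
    return "".join(format(BASES.index(ch), "02b") for ch in letters)


def from_binary(bits):
    """Recover the original text from the bit string."""
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


message = "Sindhi"
bits = to_binary(message)
bases = binary_to_bases(bits)
recovered = from_binary(bases_to_binary(bases))

print(bits)       # 48 symbols from a 2-letter alphabet
print(bases)      # 24 symbols from a 4-letter alphabet
print(recovered)  # the same word again
```

The four-letter version is simply half as long; what can be said is unchanged. A bigger alphabet buys shorter words, not new meanings.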

Nature has chosen an alphabet of 4 letters, which we have transcoded as ATCG, based on their chemical names Adenine, Thymine, Cytosine, and Guanine. I might be completely wrong on this one, but as far as I know, all life on earth is based on these 4 letters in its DNA, and I have been under the impression that even the DNA fragments that rain on earth from outer space are based on the same 4 letters, making it the universal alphabet of life. With these 4 letters, nature has created all the astounding variety of life: the life that we know exists, the life that we do not know exists, the life that has existed in the past, and the life that is to come in the future. Nature has created life in all its different forms, as it deemed necessary, and we as humans have not beaten any of nature's technology in any dimension yet. For example, the human brain is accepted to be the most complex system in the known universe, and we are nowhere near understanding it, let alone beating it. Our navigation systems do not beat bird brains. Our best cameras do not beat natural eyes. Our best sensing technologies do not beat natural electroreception and electrolocation. Our radar technologies do not beat the echolocation and sensing of bats. By nature's standards, our entire technology is extremely crude, and our entire knowledge, all combined, is extremely limited.

I strongly believe that we have yet to learn the number system that nature uses; all the number systems that we have devised so far are just crude approximations of it. This brings us back to the letters in the alphabet: we also use different symbols to represent numbers, and many other symbols for punctuation, which strictly speaking increases the number of characters used in a language, for example to close to 80 in Sindhi.

The closest thing to DNA encoding and processing that we humans have achieved is acquiring and storing information in digital format, processing it, and giving some output, whether to provide us with some information or to control some physical device. I was very lucky to learn assembly language as my first programming language when I was in high school, without even knowing that it was assembly language, disguised as the RPN programming language for HP calculators. On the PC I learned to program in GW Basic and C in the late 80s / early 90s, and in many more languages as they emerged later on. What many people do not realise is that all those high level languages are just an interface to make programming easier for humans; in the end, it all gets translated to assembly language, which is very close to the machine code that a given processor understands. It may come as a surprise to many that machine code is very different from one processor to another, and that the assembly language is also different for each processor. The compiler for a high level language can break down an easy to understand language like C (ok, relatively easy) into the specific language of the target processor, and the result can only run on that particular processor. But C code asking the processor to add 2 to 3 does return 5 on each of them.
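As a toy sketch of that last point, the Python below "compiles" the same request (add two numbers) for two made-up processors. Both instruction sets, and every function name here, are hypothetical and invented for this illustration; real processors and real compilers are far more involved.

```python
# Two made-up "processors" with incompatible instruction sets.
# ASSUMPTION: both instruction sets are invented purely for illustration.

def compile_for_stack_cpu(a, b):
    """Hypothetical stack machine: push both numbers, then add the top two."""
    return [("PUSH", a), ("PUSH", b), ("ADD",)]


def compile_for_register_cpu(a, b):
    """Hypothetical register machine: load r0 and r1, then add r1 into r0."""
    return [("LOAD", "r0", a), ("LOAD", "r1", b), ("ADDR", "r0", "r1")]


def run_stack_cpu(program):
    stack = []
    for instr in program:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        elif instr[0] == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        else:
            raise ValueError(f"stack CPU does not understand {instr!r}")
    return stack.pop()


def run_register_cpu(program):
    regs = {"r0": 0, "r1": 0}
    for instr in program:
        if instr[0] == "LOAD":
            regs[instr[1]] = instr[2]
        elif instr[0] == "ADDR":
            regs[instr[1]] += regs[instr[2]]
        else:
            raise ValueError(f"register CPU does not understand {instr!r}")
    return regs["r0"]


stack_code = compile_for_stack_cpu(2, 3)
register_code = compile_for_register_cpu(2, 3)

print(stack_code, "->", run_stack_cpu(stack_code))
print(register_code, "->", run_register_cpu(register_code))
```

The two programs share no instructions, and feeding one processor the other's program raises an error, yet each gives the same answer to the same question.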

Now for understanding a given segment of machine code, theoretically, we do have all the tools we need. We know exactly what every single instruction does; there's a manual for it. One would assume that by spending enough time looking up every instruction in the manual, we could understand what a given segment of machine code does. Unfortunately it is not that simple, even though there are tools such as disassemblers, which translate the machine code back to assembly code (understandable by humans), i.e. we do not have to look up every instruction manually, we can just ask the computer to do it for us. But those who have tinkered with this process know how painful it can be... often not even possible in the time one is willing to spend on it. This is because some of the data is stored in the same way as the executable instruction code, so this data also gets translated back as instructions, and on top of that the variables lose the sensible, human-understandable names we gave them. This all happens for something that theoretically we should know everything about, as we invented it, and for which, technically speaking, everything is available if we really want to know it. And we are speaking about extremely minuscule lengths of code compared to what's in the DNA.
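The data-versus-instructions problem can be shown with a toy disassembler. The four-instruction machine below is entirely made up for this sketch (the opcode numbers and names are assumptions, not any real processor); the disassembler simply walks the bytes from start to finish, much as a naive tool would.

```python
# Toy linear disassembler for a made-up 4-instruction machine.
# ASSUMPTION: opcode numbers and mnemonics are invented for illustration.
OPCODES = {
    0x00: ("HALT", 0),  # no operand
    0x01: ("LOAD", 1),  # one operand byte
    0x02: ("ADD", 1),   # one operand byte
    0x03: ("JMP", 1),   # one operand byte: address to jump to
}


def disassemble(code):
    """Translate bytes to text by reading them strictly in order."""
    lines = []
    pos = 0
    while pos < len(code):
        name, n_operands = OPCODES.get(code[pos], ("???", 0))
        operands = code[pos + 1:pos + 1 + n_operands]
        text = " ".join([name] + [str(b) for b in operands])
        lines.append(f"{pos:02d}: {text}")
        pos += 1 + n_operands
    return lines


program = bytes([
    0x03, 0x04,  # 00: JMP 4   (skip over the data that follows)
    0x01, 0x02,  # 02: DATA    (the stored values 1 and 2, never executed)
    0x01, 0x05,  # 04: LOAD 5
    0x02, 0x03,  # 06: ADD 3
    0x00,        # 08: HALT
])

listing = disassemble(program)
for line in listing:
    print(line)
```

The listing shows a "LOAD 2" at address 02 that the processor never executes: those two bytes are data, but nothing in the bytes themselves says so. Only by following the jump can one tell, and in real code the jumps are not always this easy to follow.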

Now for the DNA, we don't actually know anything, e.g. what it actually does, how it represents the information, how it represents the instructions, how they get processed... i.e. we DO NOT HAVE THE MANUAL, and we are not even close to decoding it. In fact, we don't even know if it stores data and instructions like the machine code does in a digital device. This is probably hard to understand for the general population, as they have been hearing terms like genetic engineering, and seeing fantasy fiction movies that pretend to actually understand all this. Even many of the scientists who work in these fields do not realise how little they actually know, because they do not have a good background in computer programming and digital hardware. To draw a comparison, there is a very small number of individuals worldwide who understand programming at every level, from processes in the microchips to high level programming languages; then we have a large number of high level coders. Next we have script kiddies, people who know how to use code developed by others, to the extent that they can cause serious damage to even large corporate systems. Then we have power users, who know how to configure operating systems, install various software, etc. And finally, just before the complete IT illiterates, we have the common users, who know where to find a given piece of software and how to use it. In my view, the whole of genetic engineering is currently at a stage somewhere between the IT illiterates and the common users. They may have found some programs, learned how to click them in some cases, and how to install or uninstall them if it's as easy as clicking on the install or uninstall icon, nothing more.

Now while we are at such a stage that we do not really understand at all how the biological computer works, and we do not know what the 4 letters in nature's alphabet do, where the words are, where the sentences are, where the data is, or where the instructions are, do we really need 2 extra letters? There would be many scientists and professionals in the bioengineering fields who would argue that they know a lot. Here's a test question for them: do any of them know what smallest change in the DNA (without years of trial and error tinkering with the DNA) will make someone's eyes purple? Or someone's toes a centimeter longer? Or make the skin colour green? These are all very trivial tasks for someone (fictional at the moment) who really understands how the DNA works, compared to making someone more intelligent, for example. And all that, and much more, can be achieved by just using those 4 letters. So why 2 extra letters, then?

The only valid reason that I can agree with at this point is that it may help decode the workings of the DNA by using these 2 extra letters as markers - however, it does pose the problem that you are not really learning the properties of the unmodified DNA; only time will tell how helpful these 2 extra letters can be in this regard.

However, the most easily achievable target is commercial monopoly, which will happen, just like it has happened for GMOs. The GM crops were initially marketed as being of superior quality in one way or another from a commercial point of view, e.g. having a higher yield. That part of the research wasn't as high tech as one would like it to be; many improvements came by refining the seed selection to produce healthier varieties of crops, or by introducing DNA changes on a trial and error basis. But that was the extent of the research: a bigger tomato was the end result, and nobody bothered to see the effects on the people consuming those bigger tomatoes, which is indeed the main problem with all GMOs being introduced in the food chain. What many do not realise is that the improvements were not the only property these GM crops had. The companies worked hard to make them sterile in natural environments, i.e. they had to make the farmers dependent on purchasing the seeds. Seeds for many natural crops have been lost from the public domain as a direct result of this. When they found that not all the farmers had fallen for this, they used their influence on the governments to make it illegal to grow natural crops!

So where do the 2 letters fit in? Even though the organisms seem to survive the introduction of these two new letters, they are not able to produce the artificial letters themselves like they can produce the natural four letters. The artificial letters need to be manufactured and supplied externally through the immediate environment. Commercially, this means that you not only have to pay the monopolised industry for the seeds, but you also have to pay for the chemicals that enable these organisms to grow.

How bad is it? To me, it's very bad on multiple levels. First, it's going to be a very strong monopoly on the food resources, and it would be very difficult to break those chains. Even worse, if people are forced to eat these GMOs, they may become completely dependent on them and may not be able to survive without these chemical cocktails. Worst of all, what effects will it have on humans and other life in the long run? Nobody is even equipped to do any research on this, and yet, for the sake of some people trying to get rich, this is likely to happen at some point. I can imagine these companies claiming cures for diseases if people's DNA gets injected with these artificial codes, which would make them completely dependent on chemicals forever, like air, water, food... Are we entering the age of mutants?



Information Technology Up in the cloud

Friday 16/5/2014, 1:45am

Adobe Outage

When I first heard of Adobe only renting their software out and hosting it on their cloud, I saw many problems associated with this approach. They have produced some great software over a long period, and one could purchase and own it. I have always wanted to buy Photoshop, but left the idea on the shelf due to its high cost. But when I heard about this software renting idea, I felt quite disappointed and wished I had purchased the software while I could, as Adobe wasn't selling it anymore. A Google search revealed that some vendors such as Amazon were still selling the older Photoshop CS6 (the last version that you can own). I didn't want to have to use Photoshop CC (Creative Cloud, the rental software), so I quickly purchased a copy.

For the last couple of days, I haven't been able to log in to Adobe's website for a download. On the first day, I thought maybe their site was down and would be back up soon, as one would expect from such a large company. But I couldn't log in today either, so I thought maybe something was wrong with my account. So I searched for a generic username and password (you can find these on the BugMeNot website if you want to bypass the annoying account registration forms on a website where they aren't really necessary), but they didn't work either. So I did a search to see if other users were having similar issues, as Adobe's own website didn't say anything. What I found is quite interesting, and I'm happy that I purchased a copy of Photoshop that I can own, not rent. The "Is It Down Right Now" website showed that Adobe's login page is indeed down. But the user comments on this page are really interesting, as what I didn't realise earlier is that it's not just the login on Adobe's main website which is not working; their whole cloud system is down, which means there are lots of frustrated users who can't use their rented software/services, many of whom use it professionally and are losing business and missing deadlines. PCWorld has also published a brief report on this outage. The Creative Cloud's status page shows that basically everything is down!

Overview of services not working

This shows how unstable cloud based services and software can be, even when they are maintained by world class engineers for large corporations like Adobe. This is not an isolated instance, and such service outages are more common on cloud based systems for smaller organisations. Despite the well known issues with such systems, many organisations and government bodies are moving to cloud based IT infrastructures. The decisions are made by managers who are not made aware of the issues and are only told how much (theoretical) savings can be made, but all these savings are then offset by work time lost in system outages, during which the staff cannot do any work because everything these days is computer based. In a previous organisation that I worked for, the staff were getting regular breaks ranging from half an hour to several hours due to such outages, which proved to be very frustrating if there was any urgent work to be done, but on the bright side, the staff got regular paid breaks to socialise, which may reduce stress levels at work.

Last updated on Saturday 16/8/2014, 1:45am
