

Rethinking the Transcription of Ottoman Texts:

The Case for Reversible Transcription


Copyright 2008 Walter G. Andrews, Murat Inan, Sevim Kebeli, Stacy Waters. All rights reserved.

The “digital revolution” has changed everything we do from graphic arts and publishing to cinema and video games, from “text messaging” on cell phones and email to “surfing” the internet and on-line research. The possibilities for employing digital technologies have been proliferating at a dazzling rate. The effects of this revolution have even seeped slowly into the world where highly trained Ottomanist scholars struggle to preserve, make available, and interpret the often hand-written texts of the past. However, even though most Ottomanist scholars have adopted computers whole-heartedly for writing and research, not much thought has been given to reassessing the fundamentals of what we scholars do in the light of the many possibilities presented by the existence of digital texts and tools.

In Ottoman textual studies and especially in the preparation and publication of texts much of what we do has been determined by historical forces and the tools available to us. For example, the Turkish cultural revolution and language reform brought with it the need to convert the cultural record of the Ottoman past into a form accessible to people who were unfamiliar with the Arabic script. As a result, a wide variety of transcription schemes were developed to represent the Arabic script letters used by the Ottomans and some form of Turkish pronunciation in a subset of the new Turkish alphabet. When publication in Turkish began to use a Latin script exclusively, typesetting in the Arabic script became a lost art, publishing in the Arabic script was officially discouraged, and only a tiny cadre of experts retained the ability to read Arabic script documents with any ease.1

One consequence of this situation is that the “authoritative” text of Ottoman documents, and especially of literary works, has grown increasingly distant from the original Ottoman manuscripts. Because the Arabic/Ottoman [A/O] script does not indicate some vowels, because Turkish vowels written as “long” Arabic vowels are transcribed as “short”, and because several A/O script vowels and consonants have multiple possible readings, there is no way to create a readable (letter for letter) transliteration of an Ottoman text. Every Ottoman text transcribed into a Latin alphabet contains major interpretive interventions by the transcriber. There is no possibility of reading back from any current form of transcription to the original text. As a result, there is no straightforward way for either the transcriber or later readers to verify the accuracy of a transcription other than by tedious and often prohibitively difficult reference to the original manuscript, and scholars using edited and transcribed texts are forced to rely heavily, perhaps even excessively, on the judgments of transcribers. Moreover, in the case of a literary culture where resemblances in the shapes of words and letters were a significant part of the rhetorical universe, the inability to reference the A/O script text in any way risks a substantial loss of information.

Prior to the digital age, employing the traditional tools of print publication, there was obviously no practical way to create and publish a transcribed edition, an A/O script edition, and, perhaps, a more broadly accessible modern Turkish edition of the same manuscript. Even the publication of a Latin script edition alone has become relatively expensive and is usually heavily subsidized by the Turkish Government.

But the world has changed. New possibilities abound and it is time to re-think the future of textual scholarship.


1. Text edition in a digital world: reversible transcription

Imagine this: you are editing or just transcribing an Ottoman text. When you are finished you can, in a matter of minutes, produce a version of the text in your favorite Latin transcription scheme [or in several schemes]. But you can also produce an A/O script version and a version transcribed into modern Turkish. The A/O script version is created directly from your Latin transcription and follows the original text line by line and letter by letter. It can be laid alongside the original to see if your transcribed text matches the original exactly. People using your edition or transcription can access both the Latin and A/O script versions. Any of these versions can easily be output as camera-ready copy for printing.

In addition, all versions of the text are digital and so are searchable in a variety of ways. For example, one could easily locate words and phrases and identify where they occur by page and line in all versions including the original manuscript (or manuscripts). If the transcription process included editing, tagging, and manuscript comparison details, one could display transcriptions of multiple significant manuscripts of an individual text both in transcription and in the A/O script. The digital text we display could also account for marginal notes and be keyed to a digital image of the original manuscript(s) for immediate display of the manuscript source for any line. And these are just the kinds of things we can conceive of now. The only thing we can say for sure today is that in the digital future scholars will manipulate texts and textual data in ways that we cannot even imagine.

The Ottoman Texts Archive Project (OTAP) group has been discussing the possibility of producing digital A/O script texts directly from transcriptions of manuscript sources since 2002-2003. The issue of “reversible” transcription was raised publicly at an OTAP seminar for faculty and advanced students in Ottoman textual studies held at Bilkent University in 2006. Since then, an OTAP team has been working to develop prototypes for a reversible transcription scheme that could be applied to Ottoman texts (and documents) of every kind from any period in Ottoman history. The general procedure we are now proposing for applying a reversible transcription scheme is quite simple and straightforward.


1. The first step is the creation of a transcription by an expert transcriber in much the same way that transcriptions have been made for many years. The only difference is that the expert would use a reversible transcription alphabet which will be more detailed and will conform to an absolute standard (although this does not mean that the final outputs will need to be similarly standardized).

2. The second step will be to employ a simple conversion file [a SED file or MS WORD macro in the OTAP scheme] to convert the transcription back to a digital A/O script text that can be compared to the original manuscript for verification and correction.

3. The third step is the correction process. If the reversed text does not exactly match the manuscript, the transcription must be changed until an exact match is achieved.

4. The fourth step is the production of desired outputs which could include the usual Ottoman transcription, a modern Turkish transcription, and an A/O text.


Reversible Transcription: Flow Chart



The only steps in this scheme that involve substantial time and effort on the part of the transcriber/editor are the first transcription and the correction/verification (steps 1 and 3). Everything else is done mechanically through the use of conversion files that can be adapted to produce any desired output.
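Steps 2 and 3 amount to a compare-and-correct loop. The following is a minimal Python sketch of the comparison only, under the assumption that a reference reading of each manuscript line has been keyed in as A/O text; the article itself envisages laying the reversed output beside the manuscript by eye. The function names are hypothetical and are not part of the OTAP files.

```python
from itertools import zip_longest

def first_mismatch(reversed_line, reference_line):
    """Return the index of the first differing character, or None if the lines match."""
    pairs = zip_longest(reversed_line, reference_line)
    for index, (got, expected) in enumerate(pairs):
        if got != expected:
            return index
    return None

def check_lines(reversed_lines, reference_lines):
    """Return (line number, character index) for every line that does not match."""
    problems = []
    pairs = zip_longest(reversed_lines, reference_lines, fillvalue="")
    for number, (got, expected) in enumerate(pairs, start=1):
        index = first_mismatch(got, expected)
        if index is not None:
            problems.append((number, index))
    return problems

# Illustrative data: line 2 differs only in its final ya
# (dotted U+064A in the reversed text, dotless U+0649 in the reference).
reversed_lines = ["قصر خياله", "اولد\u064a"]
reference_lines = ["قصر خياله", "اولد\u0649"]
print(check_lines(reversed_lines, reference_lines))  # [(2, 4)]
```

Each reported position points the transcriber to the place in the base transcription that needs correcting before the text is reversed again.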


2. A prototype scheme for reversible transcription

During the past year (2007-2008), University of Washington graduate students Murat Umut İnan and Sevim Kebeli have been working with the Ottoman Texts Archive Project under the guidance of Prof. Walter G. Andrews and Digital Humanities Specialist Stacy Waters to develop a prototype scheme for reversible transcription. İnan has been experimenting with the text of a seventeenth-century manuscript copy of a mesnevi [narrative poem in rhyming couplets] entitled Şāh u Dervīş composed by the poet Güftī-i Edrenevī (d. 1677) in 1650 and copied by a scribe named Yahyā. Kebeli is part of a group guided by Prof. Selim S. Kuru, which is preparing a modern edition of Namık Kemal’s pioneering 19th-century novel İntibāh. By focusing on texts widely separated by time, genre, and literary sensibility the OTAP group hopes eventually to encounter (and solve) many of the problems that reversible transcription presents.

In this paper we are going to describe the development process we have gone through up to this point. At this stage, we do not claim to have created and tested a comprehensive system nor do we have at our disposal all of the tools necessary to make reversible transcription as functional as we believe it will be in the near future. Our hope is that among the readers of this article we will find Ottomanist scholars willing to join us in the project by applying reversible transcription to their texts and by helping us to test, refine, and develop the scheme. For now we would like to demonstrate the steps by which we have created a prototype for approaches to reversible transcription.


a. The Güftī manuscript and transcription


At the time we started thinking about reversible transcription, İnan had already transcribed the Güftī text for his Master’s thesis and had subsequently converted it to Unicode [UTF 8] characters and tagged it for display on the web.2 Below is a sample of the final web version of the transcribed text:



65  a  Ḳaṣr˖ı ẖayāle dil˖i bī-iẖtiyār
    b  Oldı bu āyīn ile süllem-şümār

66  a  Çarẖ-ile min-baʿd idüp āştī
    b  İtmeye ḳaṣd˖ı ʿalem-efrāştī

67  a  Āẖir olup gūşe-nişīn˖i ferāġ
    b  Urmaya ḥırmān ile cismine dāġ

68  a  Olmaz ise şemʿ˖i emel reh-nümūn
    b  ʿĀşıḳa bes şuʿle˖i dāġ˖ı derūn

69  a  Olmasa gül zīver˖i destār ger
    b  ʿĀşıḳa yetmez mi olan dāġ˖ı ser

In order to test a reversible transcription system on this text, İnan returned to the manuscript and retyped the passage above using MS Word in our first prototype of a reversible alphabet.


THE MANUSCRIPT [couplets 65-69]3

[Manuscript images of couplets 65, 66, 67, 68, and 69]



The resulting reversibly transcribed text typed in a version of OTAP’s lower ASCII digraph character set looks like this:


{K!}a+{s!}r{..}{i.}+ {h-}a+y{a=}le* di+l{..}i+ b{i}{-}!i{h-}ti+y{a=}r++

{!O}ld{i} bu {a==}y{i=}n !iile* s{u:}+le+m{-}{s,}{u:}+m{a=}r++


{C,}a+r{h-}i+le* mi+n{-}ba+{@}d {!i}d{u:}{p} {a==}{s,}t{i}++

{!I.}tme+ye* {k!}a+{s!}d{..}{i.}+ {@}a+le+m{-}!efr{a=}{s,}t{i}++


{A==}{h-}i+r {!o}lu{p} {g}{u=}{s,}e*{-}ni+{s,}{i=}n{..}i+ fe+r{a=}{g.}++

{!U}rma+ya* {h!}{i.}+rm{a=}n {!i}le* ci+smine* d{a=}{g.}++


{!O}lma+z {!i}se* {s,}e+m{@}{..}i+ !eme+l re+h{--}n{u:}+m{u=}n++

{@}{A=}{s,}{i.}+{k!}a* be+s {s,}u+{@}le*{..}i+ d{a=}{g.} {..}{i.}+ de+r{u=}n++


{!O}lma+sa* {g}{u:}+l z{i=}ve+r{..}i+ de+st{a=}r {g}e+r++

{@}{A=}{s,}{i.}+{k!}a* ye+tme+z m{i} {!o}la+n d{a=}{g.}{..}{i.}+ se+r++

The digraph codes clearly make the result look formidably complex. The reason for using them is to provide a stable base (always the same) from which any output is possible and to use only characters that are common to every platform and that are accessible by any standard program. One can generally interpret the characters by noting, for example, that {k!} => ḳ or {s,} => ş and so on. What differentiates reversible coding from a “normal” transcription is the set of special transcription characters that identify the exact nature of the A/O script character being transcribed. In order to clarify this, let us look at some of the base principles that will apply to any reversible transcription system.


1. The system must identify any character information added by the transcriber/editor. For example, in the A/O script many vowels do not appear and are added to Latin alphabet transcriptions by the transcriber. In the transcription above all such additions (which we refer to as “conjectural vowels”) are followed by a “+” sign. When the transcription is reversed, these will be stripped out. Thus, the word transcribed as {k!}a+{s!}r{..}{i.}+ [ḳaṣr˖ı] will reverse to {k!}{s!}r [قصر].


2. The system must accurately distinguish between vowels and consonants that appear to be similar either in the transcription or in the original A/O script text. For example, we must be able to distinguish between “ol” and “evvel”, which means that there must be a special symbol for the initial “o/ö” in Ottoman Turkish words (in the Güftī case {!o} which reverses to او as in the second mısra above). Given that Ottoman Turkish incorporates Turkish, Persian, and Arabic elements this need for specificity requires a much larger set of alphabet choices than the normal transcription alphabet provides.
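The first principle can be illustrated with a short Python sketch of the reversal of a single word. It is a toy, not the OTAP conversion file: the table holds only the three consonants of the example word, the treatment of {..} as a transcriber-added marker that leaves no trace in the A/O text is inferred from the sample, and the line-final “++” seen in the sample is not handled. The names are hypothetical.

```python
import re

# A token is either a braced digraph such as {k!} or a single character.
TOKEN = re.compile(r"\{[^{}]*\}|.")

# Toy subset of the alphabet, taken from the example word in the text.
TO_AO = {"{k!}": "ق", "{s!}": "ص", "r": "ر"}

# Inferred from the sample: {..} corresponds to the izafet marker and is
# absent from the A/O text.
DROPPED = {"{..}"}

def reverse_word(word):
    """Reverse one transcribed word to A/O script, deleting conjectural vowels."""
    tokens = TOKEN.findall(word)
    out = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if i + 1 < len(tokens) and tokens[i + 1] == "+":
            i += 2  # skip the conjectural vowel together with its "+" marker
            continue
        if token not in DROPPED:
            out.append(TO_AO[token])  # an unknown token raises KeyError
        i += 1
    return "".join(out)

print(reverse_word("{k!}a+{s!}r{..}{i.}+"))  # قصر
```

Letting an unknown token raise an error rather than pass through silently suits the verification purpose: anything the table cannot account for is flagged at once.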


In the case of İnan’s sample, the number of special characters was kept to a minimum:

1. The “vocalic he [ه]” is represented as a*/e*.


2. There is a special “i” {i} used for Turkish endings represented by a “long” Arabic script “yā” (as in the question particle “mi” in the fifth couplet above [69], mısra 2). This anticipates special characters for the vowels of Turkish words when they are represented by A/O script characters. (We will elaborate on this in a later section.)


3. Special hyphens: {-} for when elements are combined that are written separately in the A/O script [bī-iẖtiyār / b{i=}{-}i{h-}ti+y{a=}r / بي اختيار]; {--} when the elements are written together [reh-nümūn / re+h{--}n{u:}+m{u=}n / رهنمون].


4. A special {p} that reverses to “b/ب” when it is final as in “olub/p”. This anticipates the possibility of a set of special characters that accurately reverse the “stopped or final” unvoiced consonants in Turkish pronunciation where the transcriber wants to reflect pronunciation. [For example, Mehmet/Mehmed, olub/olup, reng/renk, etc.]


After transcribing the text, İnan saved his Word document as a plain-text file (in order to eliminate the invisible formatting commands that Word adds to a document). He then created two SED files: one to remove the “tags” that directed the formatting of his file and one to convert the transcribed file back to the A/O script.

SED is a special “stream editor” that automatically modifies files. It takes text input, performs designated operations on it and outputs the modified text. For example, İnan’s SED file for removing the formatting tags in his Güftī file [gr.sed] looks like this:


s/<couplet>\(.*\)//g

s/<\/hem>\(.*\)//g

s/<hem>\(.*\)//g

s/<hemno>\(.*\)//g
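For readers without SED, the four rules can be mirrored in Python. This is a sketch for illustration only: the tagged input lines are hypothetical, since the layout of İnan's tagged file is not reproduced in this article, and `strip_tags` is an assumed name.

```python
import re

# Python equivalents of the four rules in gr.sed. Each pattern matches a tag
# and everything after it on the same line, so the whole remainder is deleted.
RULES = [r"<couplet>(.*)", r"</hem>(.*)", r"<hem>(.*)", r"<hemno>(.*)"]

def strip_tags(line):
    """Apply every rule in order to one line of text."""
    for pattern in RULES:
        line = re.sub(pattern, "", line)
    return line

# Hypothetical tagged input in which every tag sits on a line of its own.
sample = ["<couplet>65", "<hemno>a", "<hem>", "Oldı bu āyīn ile", "</hem>"]
cleaned = [strip_tags(line) for line in sample]
print(cleaned)  # ['', '', '', 'Oldı bu āyīn ile', '']
```

Because each pattern consumes the rest of the line, a tag that shared a line with verse text would take that text with it; `strip_tags("<hem>Oldı bu")` returns an empty string. Rules of this shape therefore suit only files in which tags and text occupy separate lines.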


Below are a few lines from the much longer file (see appendix ??? for the whole file “intrev_trns.sed”) that actually reverses the transcription characters:


[deleting the vowels added by the transcriber]

s/e+//g

s/a+//g

s/i+//g

[reversing the transcription characters to A/O script]4

s/{h!}/\&#1581;/g

s/{h-}/\&#1582;/g

s/h/\&#1607;/g

s/H/\&#1607;/g

[The format is “replace/symbol 1/with symbol 2/globally”. In this case the second set of symbols are HTML numeric character references, which give the decimal Unicode (“big numbers”) code of each Arabic script character. In the deletion case, note that the symbol is being replaced by nothing.]
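The order of the rules matters: the braced digraphs have to be replaced before the plain “h”, or the “h” inside {h!} would be consumed first. A minimal Python sketch of the same excerpt makes this visible. It uses plain string replacement rather than SED, the rule list covers only the lines quoted above, and `apply_rules` is a hypothetical name. (In the SED file the ampersand is written `\&` because a bare `&` in a SED replacement stands for the matched text; Python's `str.replace` needs no such escape.)

```python
import html

# Ordered (search, replacement) pairs copied from the excerpt above.
RULES = [
    ("e+", ""), ("a+", ""), ("i+", ""),  # conjectural vowels are deleted
    ("{h!}", "&#1581;"),
    ("{h-}", "&#1582;"),
    ("h", "&#1607;"),
    ("H", "&#1607;"),
]

def apply_rules(text, rules=RULES):
    """Apply each literal replacement in turn, like successive s///g commands."""
    for search, replacement in rules:
        text = text.replace(search, replacement)
    return text

encoded = apply_rules("{h!}{h-}h")
print(encoded)                 # &#1581;&#1582;&#1607;
print(html.unescape(encoded))  # حخه

# With the plain "h" rule first, the digraph can no longer be recognised.
wrong_order = [("h", "&#1607;"), ("{h!}", "&#1581;")]
print(apply_rules("{h!}", wrong_order))  # {&#1607;!}
```

A browser performs the final step (turning the numeric references into letters) by itself; `html.unescape` does the same when the output is wanted as plain text.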


İnan then ran his text through both SED files to produce the following html output that could be displayed as a browser page or saved as a text document.5


قصر خياله دل بى اختيار

اولدى بو آيين ايله سلم شمار


چرخله من بعد ايدوب آشتى

ايتميه قصد علم افراشتى


آخر اولوب کوشه نشين فراغ

اورميه حرمان ايله جسمينه داغ


اولمز ايسه شمع امل رهنمون

عاشقه بس شعله داغ درون


اولمسه کل زيور دستار کر

عاشقه يتمز مى اولن داغ سر



b. The İntibāh Text and Transcription


In order to broaden the historical range of our experiment, Sevim Kebeli, at the same time, engaged in a preliminary exploration of a reversible transcription of Namık Kemal’s novel İntibāh first published in 1876. Like the Güftī text, the İntibāh text had already been transcribed into a Latin Unicode font by a University of Washington team preparing a new edition of the text under the direction of Prof. Selim S. Kuru. Although some (but not all) of the first standard transcription could have been converted automatically to reversible characters, the test required retyping the text to create a fully reversible base transcription. The first (Unicode) transcription of the test passage was as follows:

Eger bedr ise eṭrāfına bir ṣarı hāle daġıdır ki bizim gibi mehtābıñ da bir ʿālem olduġunu bilenler daẖī felekde aḳ beñizli bir ḳız pencereden aşaġı sarḳmış. Ṣırma ṣaçlarını çehresiniñ eṭrāfına daġıtmış. Zemīniñ ārāyiş-i rengā u rengini temāşā ediyor ẓann eylese taʿyyīb olunmaz.


The retyping of the text required a return to the text and a close analysis of the disparities between the visible (A/O script) original text and the phonetics of the transcribed text. The original text of the passage is the following:6



The following is the reversibly transcribed text typed in a combination of the OTAP lower ASCII digraphs and Unicode transcription characters:7


!Ege+r be+dr {!i}se* eṭrāfı+na* bi+r ṣarı hāle* {d!}aġıdı+r ki* bi+zi+m

gi+bi me+htābı+ñ da* bi+r ʿāle+m {!o}ld{u::}ġu+n{u::} bilenle+r da+ẖī fe+le+kde* aḳ be+ñi+zli bi+r ḳız pe+nce+re*de+n aşaġı ṣarḳmı+ş. Ṣı+rma*

ṣaçla+rını çe+hre*si+ni+ñ eṭrāfı+na* {d!}aġı+tmı+ş. Ze+mīni+ñ

ārāyi+ş{--}i+ re+ngā {u:} re+ngini te+māşā {!e}diyor ẓa+nn+ eyle+se* ta+ʿyy+īb {!o}lu+nma+z.

The transcription was then reversed to the following output using a SED file as in the case of the Güftī transcription.


اگر بدر ايسه اطرآفنه بر صاري هآله طاغيدر كه بزم

گبي مهتآبک ده بر عآلم اولديغني بيلانلر دخي فلكده اق

بکزلي بر قيز پنجره دن اشاغي صارقمش. صرمه

صاچلريني چهره سنک اطرآفنه طاغتمش. زمينک

آرآيش رنگآ و رنگيني تمآشآ ايدييور ظن ايلسه

تعييب اولنمز.


In this case, we have left the reversed transcription as it appeared in an early stage before the SED files for Güftī (which actually represent a later stage) and İntibāh could be reconciled. One can observe, in this text, the kinds of problems that we encountered at every stage of the project. For example, the usual form of a final “yā” in Arabic (ي ) has two dots below, which is not proper for final “yā” in Ottoman, and the final and medial “long” elifs came out with “medd” above them. These are only problems because there is no special Unicode font that applies directly to Ottoman Turkish. They are also the kind of problems that we have so far been able to solve at every stage.

The İntibāh text, however, raises issues that are far more serious because they directly involve the history of Turkish phonology. This is not our area of expertise, although like anyone else involved in Ottoman literature, we are aware that over the long period during which Western Turkish was represented in the A/O script there were significant shifts in the vowels (and some consonants) in Turkish words. Because these shifts in pronunciation were not generally accompanied by shifts in the spelling of Turkish words, by the time we get to the İntibāh text (and actually quite some time before that), what is visible on the written (or printed) page and the way it was pronounced are often quite different. As a result, we began to see that we would need special characters that could output both phonologically accurate transcriptions and still reverse to match the accepted orthography of the original texts.

This was a significant issue that Kebeli encountered in reversing the İntibāh text. Her solution was to create special characters for vowels and consonants whose pronunciation (and hence transcription) varied from their A/O script representation. For example, the word اولديغنى in the second line is written ōldīġ[ı]nī ([ı] is a conjectural vowel) but pronounced olduğunu. In this case, she suggested a special vowel {u::} that would convert to “u” in transcription and reverse to “yā” (ي). Another example involved the letter “ṭā” (ط), which would retain its usual pronunciation in Arabic words such as “eṭrāfına” (اطرافنه) in lines one and four but would be pronounced as “d” in Turkish words such as “ṭāġ => dağ” (طاغ) in line four. Her solution was a special “d” character ({d!}) that would reverse to “ṭā” (ط).
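The idea of one code with two outputs can be sketched in Python as a table whose entries carry both a transcription value and an A/O value. The table below is a small subset inferred from the two example words, not Kebeli's conversion file, and the function and variable names are hypothetical.

```python
import re
import unicodedata

TOKEN = re.compile(r"\{[^{}]*\}|.")

def nfc(text):
    """Normalise so that letters such as ġ are single code points."""
    return unicodedata.normalize("NFC", text)

# token -> (transcription output, A/O output); subset inferred from the examples
RAW_TABLE = {
    "{!o}": ("o", "او"),
    "{u::}": ("u", "\u064a"),  # pronounced u, written with ya
    "{d!}": ("d", "ط"),        # pronounced d, written with ta
    "a": ("a", "ا"),
    "l": ("l", "ل"),
    "d": ("d", "د"),
    "n": ("n", "ن"),
    "ġ": ("ġ", "غ"),
}
TABLE = {nfc(key): (nfc(latin), ao) for key, (latin, ao) in RAW_TABLE.items()}

def convert(word, target):
    """Render a reversible transcription as 'latin' or as 'ao' script."""
    column = 0 if target == "latin" else 1
    tokens = TOKEN.findall(nfc(word))
    out = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if i + 1 < len(tokens) and tokens[i + 1] == "+":
            # conjectural vowel: kept in the transcription, dropped in A/O
            if target == "latin":
                out.append(TABLE[token][0] if token in TABLE else token)
            i += 2
            continue
        out.append(TABLE[token][column])
        i += 1
    return "".join(out)

word = "{!o}ld{u::}ġu+n{u::}"
print(convert(word, "latin"))  # olduġunu
print(convert(word, "ao"))     # اولديغني
print(convert("{d!}aġ", "latin"), convert("{d!}aġ", "ao"))  # daġ طاغ
```

Because both outputs are read from the same base text, the pronunciation-based transcription and the orthography-based A/O text cannot drift apart.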

c. Toward a “complete” scheme:


The extensive work done by Kebeli and İnan isolated for us a set of special problem-types and solutions that a more complete reversible transcription scheme would need to take into account. As far as we can see at the moment, these include:

1. Identification of elements added by the transcriber [e.g. the “conjectural vowels”, dashes, apostrophes, etc.].

2. Initial Turkish vowels indicated by two letters in the A/O script [e.g. the initial vowels in “olmak” اولمق or “itmek” ايتمك ].

3. Initial “short” Arabic vowels that are indicated only by “elif” [e.g. the initial vowel in “uṣūl” اصول].

4. Turkish vowels represented by “long” vowel letters in the A/O script.

5. A/O script letters that are used in special ways in Ottoman [e.g. the vocalic “he” (ه) which is always “final”, the final “ya” without the two dots, the “ṭā” (ط) when it is pronounced “d”].

6. Final voiced consonants that stop to unvoiced [e.g. b => p, d => t, c => ç].

7. Turkish vowels that are not pronounced as they are written in the A/O script [e.g. “ya” when it vowel harmonizes to “ı” or “u” or “ü”].

8. Reversible transcription of passages in Arabic, which requires the ability to reverse transcriptions containing Arabic grammar as well as to output acceptable standard Ottoman transcriptions of Arabic.

9. Arabic and Persian diacritics and special characters [e.g. the “dagger elif”, “elif maksura”, “tā marbuta”, “shedda”, hamza when it indicates the Persian izafet over “vocalic he” or “ya”, the vowels following the Persian digraph “ẖw” as in “ẖǐş” خويش].

10. Punctuation marks in both the earlier manuscripts and later (19th c.) printed texts were not explored in this study but they are certainly something that a complete scheme should eventually account for.

11. The scheme should account also for such elements as marginal notes, crossed-out words, multiple spellings of words, editorial marks, etc.


A more general concern involved questions about how we would represent the extensive character set that would be necessary to address all of these points. Because the base text needs to be in lower ASCII symbols (which eliminates both some Modern Turkish characters [e.g. ı, ü, ö, ğ, ç, ş, etc.] and all of the extended Ottoman transcription characters [e.g. ġ]), we began by using the lower ASCII digraphs that we use for preserving the base texts in our OTAP archive. As can be seen from the examples above, the digraphs rapidly become unwieldy and unreadable when they are expanded to represent all of the characters that reversible transcription requires. Moreover, entering each digraph involves correctly combining at least three or four characters, which is time-consuming and can result in many simple errors.

The solution to this problem turned out to be close at hand. By chance, OTAP researchers were also in the process of exploring the use of a variety of text-analysis tools for the study of Ottoman texts. What we found was that many useful tools are not, at present, Unicode-compatible. This drove us to a simple solution that employed numbers to differentiate the non-ASCII characters [for example: ç => c2, => h2]. This scheme had significant advantages. The characters were easy to convert, they did not contain any characters that might serve as control characters in other programs (which is the case with the digraphs) and they were readable in their raw form. For example, one of the programs we were working with was a web-based program (Tag Cloud) that creates word lists with numbers of occurrences from lower ASCII texts and then demonstrates relative frequencies visually by representing the vocabulary in different font sizes in a tableau (Tag Crowd).8 The following is an example of a Tag Crowd image generated from Necātī’s gazels.



As can be seen, the text is readable in this form with a few exceptions (e.g. ‛işḳ => 6i2k2, which is a little obscure because of the use of the number “6” for “ayin”). Our present reversible transcription “numbers” scheme is quite different and vastly more complex than the numbers version we created for text-analysis but the underlying principle is the same.
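The principle behind the text-analysis version can be sketched in a few lines of Python. Only the two pairs attested in this article (ç => c2 and “ayin” => 6) are included, so the table is deliberately incomplete; the function name and the reporting of uncoded characters are assumptions made for the sketch.

```python
import unicodedata

# Only these two pairs are attested in the text; a working table would need
# the project's full chart.
NUMBER_CODES = {
    "\u00e7": "c2",  # ç
    "\u02bf": "6",   # ʿ (ayin)
}

def to_numbered_ascii(text, codes=NUMBER_CODES):
    """Return the coded text and the set of non-ASCII characters left uncoded."""
    text = unicodedata.normalize("NFC", text)
    out, uncoded = [], set()
    for ch in text:
        if ch in codes:
            out.append(codes[ch])
        else:
            out.append(ch)
            if not ch.isascii():
                uncoded.add(ch)
    return "".join(out), uncoded

coded, uncoded = to_numbered_ascii("çarẖ")
print(coded, sorted(uncoded))
```

Reporting the characters that remain uncoded gives a quick check that a text is fully lower ASCII before it is handed to a tool that is not Unicode-compatible.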

We began our experiment with the “numbers scheme” by converting a rough modern Turkish transcription of a passage from the Dream Book of Murad III done by Dr. Özgen Felek as part of her dissertation and forthcoming edition. We chose it because the manuscript was both very readable and fully voweled. The original transcription was as follows (with some expansion of abbreviations used by Dr. Felek and reformatting to make the lines match the lines of the original text):


Seadetlü pederüm beyne’l-yakaza müşahede oldı kim

bir at gelür idi ve cemi’ alatı mükemmel yalnız gelür yanında kimesne

yoktur amma bir atdur ki görülmüş değüldür nida geldi kim bizim sana

bahşişimizdür biz dahi binerüz andan nida gelür ki bunun yiyeceği

duadur tesbihdür zikirdür benüm seadetüm alup bizi kıbleden yana

götürür Ferman sultanumundur


We then consulted an image of the manuscript and added numbers, first converting all upper ASCII characters (including the upper ASCII Modern Turkish characters) to numbered lower ASCII characters. The next stage was to correctly number all of the vowels represented by A/O script letters. The conjectural vowels were left un-numbered, which marked them for deletion in the reversal process. All of this was done according to the charts found in Appendix ???. The image of the original text we used is the following:










The reversible transcription is the following:


Se'2a1detlu6 pederu0m beyne '4l-3yak3az3a0da0 mu0s2a1hede0 o2ldu8kim

bir a3t gelu7r i2di6 ve cemi1'2 a2la1ti6 mu0kemw2el ya7ln6i0z gelu7r ya7ni0nda0 kimesne0

yo6k3t7ur a5mw2a1 bir a3tdur ki8 go7ru0lmu0s2 degu0l nida1 geldi6kim bizim san6a7

bah5s2i1s2 mizdu0r biz da7h5i6 bineru0z a5ndan nida1 gelu7r ki8 bu6nun6 yiyecegi6

du'2a1dur tesbi1h3du0r z5ikirdu0r benu0m se'2a1detu0m a3t a3lu6b7 bizi6 k3i0ble0den yan2a7

getu6ru0r ferma1n sult3a1numun6 dur

The reversible transcription text was then reversed using a SED file with the following result:


سعا دتلو پدرم بين اليقظه ده مشا هده اولديكم

بر آت كلور ايدى و جميع آ لا تى مكمّل يالکز كلور ياننده كمسنه

يوقدر ا مّا بر آتدر كه كورلمش دكل ندا كلديكم بزم سکا

بخشيش مزدر بز داخى بنرز ا ندن ندا كلور كه بونک ييجكى

دعا در تسبيحدر ذكردر بنم سعا دتم آت آلوب بزى قبله دن يكا

كتورر فرما ن سلطا نمک در


In order to complete the experiment, we also created a small SED file that automatically generated the standard Ottoman transcription text below. A similar SED file could be made to produce a modern Turkish version as well.

Seʿādetlu pederüm beyne 'l-yaḳaẓada müşāhede oldukim

bir at gelür idi ve cemīʿ ālāti mükemmel yalñız gelür yanında kimesne

yoḳtur ammā bir atdur ki görülmüş degül nidā geldikim bizim saña

baẖşīş mizdür biz daẖi binerüz andan nidā gelür ki bunuñ yiyecegi

duʿādur tesbīḥdür ẕikirdür benüm seʿādetüm at alup bizi ḳıbleden

yaña getürür fermān sulṭānumuñ dur
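How one numbered base text yields both outputs can be sketched in Python. The table is a small subset: the vowel codes come from the chart in Appendix 1, while the values for “t” and “z” are inferred from the sample output above. The three words are taken from different lines of the sample, the tokenizer is far simpler than the full scheme would need, and all names are hypothetical.

```python
import re

# A letter or apostrophe, optionally followed by one digit, forms one code.
TOKEN = re.compile(r"[A-Za-z'][0-9]?|.")
VOWELS = set("aeiou")

# code -> (Ottoman transcription, A/O script)
CODES = {
    "a1": ("ā", "ا"),  # long a
    "a3": ("a", "آ"),  # initial Turkish a
    "b": ("b", "ب"),
    "d": ("d", "د"),
    "m": ("m", "م"),
    "n": ("n", "ن"),
    "t": ("t", "ت"),
    "z": ("z", "ز"),
}

def render(text, target):
    """Render numbered reversible text as 'latin' or as 'ao' script."""
    column = 0 if target == "latin" else 1
    out = []
    for token in TOKEN.findall(text):
        if token in CODES:
            out.append(CODES[token][column])
        elif token.lower() in VOWELS:
            # un-numbered vowel: conjectural, kept in Latin and dropped in A/O
            if target == "latin":
                out.append(token)
        else:
            out.append(token)  # spaces and anything outside this subset
    return "".join(out)

words = "a3t bizim nida1"
print(render(words, "latin"))  # at bizim nidā
print(render(words, "ao"))     # آت بزم ندا
```

The same loop serves every output; producing a Modern Turkish version would only require a third column in the table.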


Our experimental work shows that it is quite possible to create and employ a reversible transcription system that will serve as a base text from which an A/O script text, an Ottoman transcription text, and a Modern Turkish text can be generated automatically. This base text will also preserve significant linguistic and orthographic data that is lost in the present system of transcribing and will result in more accurate and reliable texts. Also, until the developers of text-analysis and concordancing tools create Unicode compliant versions, the base reversibly transcribed text can be used universally and the results can automatically be converted to a standard transcription. Moreover, the reversible transcription alphabet is entirely in lower ASCII characters and needs no special fonts or characters, which means that, in theory at least, a text could be transcribed on any standard keyboard including those found on devices such as mobile phones and PDAs.

The only negative aspect of the reversible transcription scheme—other than asking scholars to transcribe in a new way—appears to be the large number of characters it requires and the daunting complexity that first meets the eye. It will undoubtedly take a transcriber more time to create a reversible transcription because it will require that the transcriber think about and identify the function of every letter transcribed.

Our view is that these are not as formidable barriers as they might seem for a number of reasons:

  1. We have organized the alphabet codes into logical categories that make it easier to remember commonly used characters. [See Appendix ??? below.]

  2. Our experience is that one quickly becomes familiar with the system and speed in entering the text increases with practice. The number of characters might be large but for scholars who have already mastered the Latin, A/O, Arabic, and Persian Alphabets this should not be an insurmountable challenge.

  3. We have already produced a prototype MS Word macro that will convert Unicode Transcription and Modern Turkish characters automatically, so it would be possible to transcribe in two passes: one that creates a rough transcription (essentially Modern Turkish with A/O transcription characters for the consonants) that is then converted, and a second pass that adds the coding for the vowels and special characters.

  4. The pay-off is directly proportional to the time spent on the transcription. More time is spent because more data is encoded and that data is all recoverable. The more complete transcription allows not only multiple outputs but kinds of textual studies that have hitherto been impossible or prohibitively difficult.


As we mentioned at the outset, the work described in this article is a first step, a study to determine if it is possible and practical to do reversible transcription. We believe, after having come this far, that it certainly is. However, we are not in a position to develop, test, and refine a reversible transcription scheme without the participation of a much larger team of scholars who would be willing to use and test the scheme. Larger-scale projects will undoubtedly discover problems with the present scheme but (also undoubtedly) a larger group of expert transcribers will be able to come up with innovative solutions that our small group could never discover. We welcome and encourage the participation of scholars who will use our scheme and coding to produce reversible texts or sample parts of texts. We will work with them to solve any problems they might encounter and will gladly convert their texts for them. We also intend to make our conversion files [SED files] and charts available for downloading so that scholars can work with them on their own.

The digital revolution makes it possible for us to work together from anywhere on the globe to create projects that none of us could do on our own. So we ask our colleagues to consider this article an invitation to a joint project from which we all can benefit. In order to facilitate experimentation on the part of other scholars, we will provide access to the macros and SED files we are using on the OTAP website.

APPENDICES









Appendix 1:

Number Coded Reversible Transcription Charts



A. The Characters and Codes

OTAP

Reversible Transcription Character Chart

CHARACTER

DESCRIPTION

CHARACTER/

UTF 8 NUMBER

REV. NOS.

CODE

A/O SCRIPT

UTF 8 NUMBER

INITIAL VOWELS
| [a] | | | |
| [initial a (short)] | a/A | a5/A5 | ا +1575 |
| [initial ā (Arabic)] | +257 / 256 | a2/A2 | آ +1570 |
| [initial a (Turkish)] | a/A | a3/A3 | ا +1570 |
| [e] | | | |
| [initial ė (double)] | +117/+116 | e2 | اى +1575; +1610 |
| [initial e (short)] | e | e5/E5 | ا +1575 |
| [ i ] | | | |
| [initial i (double)] | i | i2/I2 | اي +1575+1610 |
| [initial i (short)] | i | i4/I4 | ا +1575 |
| [ ı ] | | | |
| [initial ı (double)] | +305 / 304 | i3/I3 | اي +1575+1610 |
| [initial ı (short)] | +305 / 304 | i5/I5 | ا +1575 |
| [o] | | | |
| [initial o (double)] | o | o2/O2 | او +1575+1608 |
| [initial o (short)] | o | o4/O4 | ا +1575 |
| [ö] | | | |
| [initial ö (double)] | +246 / 214 | o3/O3 | او +1575+1608 |
| [initial ö (short)] | +246 / 214 | o5/O5 | ا +1575 |
| [u] | | | |
| [initial u (double)] | u | u2/U2 | او +1575+1608 |
| [initial u (short)] | u | u4/U4 | ا +1575 |
| [ü] | | | |
| [initial ü (double)] | +252 / 220 | u3/U3 | او +1575+1608 |
| [initial ü (short)] | +252 / 220 | u5/U5 | ا +1575 |

FINAL/MEDIAL VOWELS
| [a] | | | |
| ā [long] | +257 / 256 | a1/A1 | ا +1575 |
| a [A/O script character] | a | a7 / A7 | ا +1575 |
| ǎ [as in خواب] | +462/+103 | a8 | وا +1575 +1608 |
| [vocalic he] | h | a0 | ه +1607 |
| [dagger elif] | | a6 | ٰ +1648 |
| [elif maksur] | | a9 | ى +1610 |
| [e] | | | |
| ė | +117 | e2 | ى +1610 |
| [vocalic he] | h | e0 | ه +1607 |
| [ i ] | | | |
| i [long] | +299 / 298 | i1/I1 | ي +1610 |
| i [Turkish] | i | i6/I6 | ي +1610 |
| i [final] | i | i6/i6[space] | ى +1609 |
| [“ki” he] | h | i8 | ه +1607 |
| ǐ [as in خويش] | 01DO/012D | i9 | وى +1608 +1610 |
| [ ı ] | | | |
| ı | +305 / 304 | i0 | |
| ı [Turkish] | +305 / 304 | i7/I7 | ي +1610 |
| ı [final] | +305 / 304 | i7/I7[space] | ى +1609 |
| [o] | | | |
| o [long] | +333 / 212 | o1/O1 | و +1608 |
| o [Turkish] | o | o6 | و +1608 |
| [ö] | | | |
| ö | +246 / 214 | o0 | |
| ö [Turkish] | +246 / 214 | o7 | و +1608 |
| [u] | | | |
| u [long] | +363 / 362 | u1 | و +1608 |
| u [Turkish] | u | u6 | و +1608 |
| u [final as yā] | u | u8 | ى +1609 |
| [ü] | | | |
| ü | +252 / 220 | u0 | |
| ü [Turkish] | +252 / 220 | u7 | و +1608 |
| ü [final as yā] | +252 / 220 | u9 | ى +1609 |

CONSONANTS
| b | b | b | ب +1576 |
| b [stops to “p”] | b | b7 | ب +1576 |
| c | c | c | ج +1580 |
| c [stops to “ç”] | c | c7 | ج +1580 |
| ç | +231 / 199 | c2 / C2 | چ +1670 |
| d | d | d | د +1583 |
| d [stops to “t”] | d | d7 | د +1583 |
| d [reverse to ṭ] | d | d6 | ط +1591 |
| ḍ | +7693 / 7692 | d3 / D3 | ض +1590 |
| f | f | f | ف +1601 |
| g | g | g | ك +1603 |
| g [stops to “k”] | g | g7 | ك +1603 |
| ğ /{g=} | +287 / 286 ? | g6 / G6 | ك +1603 [REV. ONLY] |
| ġ /{g.} | +289 / 288 | g4 / G4 | غ +1594 |
| h | h | h | ه +1607 |
| ḥ | +7717 / 7716 | h3 / H3 | ح +1581 |
| ẖ | +7830 / 7722 | h5 / H5 | خ +1582 |
| j | j | j | ژ +1688 |
| k | k | k | ك +1603 |
| ḳ | +7731 / 7730 | k3 / K3 | ق +1602 |
| l | l | l | ل +1604 |
| m | m | m | م +1605 |
| n | n | n | ن +1606 |
| ñ | +241 | n6 / N6 | ک +1705 |
| p | p | p | پ +1662 |
| r | r | r | ر +1585 |
| s | s | s | س +1587 |
| ş | +351 / 350 | s2 / S2 | ش +1588 |
| ṣ | +7779 / 7778 | s3 / S3 | ص +1589 |
| ŝ [under-bar] | +818 | s5 / S5 | ث +1579 |
| t | t | t | ت +1578 |
| [tā marbūta] | | t7 | ة [0629] ? |
| ṭ | +7789 / 7788 | t3 / T3 | ط +1591 |
| v | v | v | و +1608 |
| y | y | y | ي +1610 |
| z | z | z | ز +1586 |
| ẓ | +7827 / 7826 | z3 / Z3 | ظ +1592 |
| ż | +380 / 379 | z4 / Z4 | ض +1590 |
| ẕ | +7829 / 7828 | z5 / Z5 | ذ +1584 |
| ʻ [‘ayin] | +703 / 8217 / 39? | '2 | ع +1593 |
| ˀ [hamza] | +704 | '3 | ء +1569 |

SPECIAL CHARACTERS
| separation dash | - | -1 | |
| connection dash | - | -2 | |
| Arabic article dash | - | -3 | |
| izafet | +903 / 726 | -4 | none |
| hamza with “he/yā” | | '4 | ٔ 0654 |
| shedda | | w2 | ّ +1617 |








B. The Consonants

Number Code: reversible Characters Chart 3 [suggested consonants]

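CONSONANTS
| CLASS | CEDILLA | DOT UNDER | DOT OVER | BAR UNDER | TILDE | OTHER |
| NUMBER | 2 | 3 | 4 | 5 | 6 | 7 |
| b | | | | | | [stops to “p”] |
| c | ç | | | | | [stops to “ç”] |
| d | | ḍ | | | | [stops to “t”] |
| f | | | | | | |
| g | | | ġ | | ğ [reverse ک] | [stops to “k”] |
| h | | ḥ | | ẖ | | |
| j | | | | | | |
| k | | ḳ | | | | |
| l | | | | | | |
| m | | | | | | |
| n | | | | | ñ | |
| p | | | | | | |
| r | | | | | | |
| s | ş | ṣ | | ŝ | | |
| t | | ṭ | | | [reverse to ṭ] d | [tā marbuta] |
| v | | | | | | |
| y | | | | | | |
| z | | ẓ | ż | ẕ | | |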

C. The Vowels

Number Code: reversible Characters Chart 2 [suggested vowels]

VOWELS
| CLASS | UMLAUT | LONG | INITIAL [double character] | INITIAL [double character] | INITIAL [short single character] | INITIAL [short single character] | TURKISH VOWEL [medial, final] | TURKISH VOWEL [medial, final] | FINAL and other | FINAL and other |
| NUMBER | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| o/O | ö/Ö | ō/Ō | o/O | ö/Ö | o/O | ö/Ö | o | ö | | |
| u/U | ü/Ü | ū/Ū | u/U | ü/Ü | u/U | ü/Ü | u | ü | u | ü |
| i | ı | ī/Ī | i/İ | ı/I | i/İ | ı/I | i | ı | “ki” he | ī [خويش] |
| a/A | [voc. he] | ā/Ā | ā/Ā | a/A | | a/A | [dagger elif] | a | ā [خواب] | [elif maksura] |
| e/E | [voc. he] | | ė | | | e/E | | | | |
Notes:


Examples of vowels



Note that the large proliferation of vowels is required by the fact that Ottoman Turkish represents vowels by letters in cases where the vowels are not necessarily “long” by prosodic standards. By giving these vowels special representation, the scheme allows transformation into “normal” Ottoman transcription systems while still retaining the exact A/O script spelling of the words. Otherwise a reversible transcription would require Turkish words to be spelled with “long” vowels, for example, āldī for aldı or ōldī for oldu. Attending carefully to the A/O script vowels of Turkish words will eventually allow a study of the spelling (and the history of spelling) of Turkish in the A/O script.
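The way a single coded text can yield both a conventional transcription and the A/O script spelling can be illustrated with a minimal sketch in Python. This is not the project's SED code, which is not reproduced here. The table is a hypothetical excerpt of a few Chart A entries, and the names CODES, SHEDDA, and convert are introduced only for illustration. The sketch assumes that plain, un-numbered vowels stand for vowels that are not written in the A/O script (as the sample transcriptions in Appendix 2 suggest) and that the izafet code -4 is displayed as an ordinary hyphen.

```python
import re

SHEDDA = "w2"  # shedda (+1617): doubles the preceding consonant

# code -> (Latin display form, A/O script form); a small excerpt of Chart A.
# ASSUMPTION: plain vowels (a, e, i) are unwritten in the A/O script.
CODES = {
    "s2": ("ş", "\u0634"),  # ش +1588
    "g4": ("ġ", "\u063a"),  # غ +1594
    "u0": ("ü", ""),        # ü with no A/O character
    "-4": ("-", ""),        # izafet: nothing written in the A/O script
    "b": ("b", "\u0628"),   # ب +1576
    "c": ("c", "\u062c"),   # ج +1580
    "d": ("d", "\u062f"),   # د +1583
    "m": ("m", "\u0645"),   # م +1605
    "r": ("r", "\u0631"),   # ر +1585
    "t": ("t", "\u062a"),   # ت +1578
    "a": ("a", ""),
    "e": ("e", ""),
    "i": ("i", ""),
}

# Try the longest codes first so that "u0" is not read as "u" followed by "0".
_TOKEN = re.compile(
    "|".join(re.escape(k) for k in sorted([*CODES, SHEDDA], key=len, reverse=True))
    + "|.",
    re.DOTALL,
)

def convert(coded, to_script):
    """Turn a number-coded string into A/O script or into a Latin transcription."""
    out = []
    for token in _TOKEN.findall(coded):
        if token == SHEDDA:
            # Script: the shedda sign; Latin: repeat the previous letter.
            out.append("\u0651" if to_script else (out[-1] if out else ""))
        elif token in CODES:
            out.append(CODES[token][1 if to_script else 0])
        else:
            out.append(token)  # spaces and unknown characters pass through
    return "".join(out)
```

With this excerpt, convert("tecerw2u0d", to_script=False) gives “tecerrüd” and convert("s2eb-4i g4am", to_script=False) gives “şeb-i ġam”, while the same inputs with to_script=True give the corresponding A/O script spellings. A full table would need every row of Chart A.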


Appendix 2:

Special Cases

A. Coding Arabic

Transcribing Arabic in an Ottoman Text

Example 1:

Ottoman Transcription:

Sübḥāne men tevaḥḥade bi’l-kevni ve’l-baḳā

Sübḥāne men tecerrede ʻan vaṣmeti’l-fenā

Reversible Transcription:

Su0bh3a1ne men tevah3w2ade bi'4l-3kevni ve '4l-3bak3a7

Su0bh3a1ne men tecerw2ede '2n vas3met7i '4l-3fena7

Reversed:

سبحان من توحّد بالكون و البقا

سبحان من تجرّد عن وصمة الفنا


Example 2:

Transcription:

bismi’llāhi’r-raḥmāni’r-raḥīm

Rev. Trans.:

bismi '4llahi '4r-3rah3ma6ni '4r-3rah3i1m

Reversed:

بسم الله الرحمٰن الرحيم

Notes:

4 => elif/ ا

-3 => omit character

4*- => [elif lam ( ال ) with “shems” letters where * = any “shems” letter: d, ḍ(d3), r, s, ş(s2), ṣ(s3), ŝ(s5), t, ṭ(t3), z, ẕ(z5), ż(z4), ẓ(z3), n]
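The article rule in these notes can be sketched as a single pattern substitution. The following Python fragment is an illustration under stated assumptions, not the project's SED rule: the names ARTICLE, ELIF_LAM, and article_to_script are hypothetical, the list of “shems” codes is taken from the note above, and plain “l” is added for the unassimilated article as in “bi'4l-3kevni”. Only the article is converted; the rest of the string is left in code for a general table to handle.

```python
import re

# '4 + (a "shems" letter code, or plain l) + -3  =>  elif lam
# Shems codes from the note: d, d3, r, s, s2, s3, s5, t, t3, z, z5, z4, z3, n.
ARTICLE = re.compile(r"'4(?:d3?|r|s[235]?|t3?|z[345]?|n|l)-3")

ELIF_LAM = "\u0627\u0644"  # ا (+1575) followed by ل (+1604)

def article_to_script(coded: str) -> str:
    """Replace each coded Arabic definite article with elif lam."""
    return ARTICLE.sub(ELIF_LAM, coded)
```

Applied to “'4r-3rah3i1m”, the function replaces “'4r-3” with ال and leaves “rah3i1m” untouched; a sequence such as “'4llahi”, which has no -3 dash, is not changed by this rule.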




B. Special Characters

Among the special characters required for a reversible transcription scheme are the following:


1. Separation dashes:

-1 = a dash that indicates compounds that are written as separate elements in the A/O script. For example: “bī-çāre” when written as “بى چاره”.

-2 = a dash that indicates compounds that are written as connected elements in the A/O script. For example: “‛ālem-penāh” when written as “عالمپناه”.

-3 = the dash in the Arabic definite article “al-”.


2. The izafet dash:

-4 = the izafet as in “şeb-i ġam” (s2eb-4i g4am) (شب غم).


3. Arabic diacriticals:

'4 = hamza with “he” and “ya” (ه and ى), as in “sīne-yi pāk” (si1ne0'4 pa1k) (سينهٔ پاك).

w2 = shedde as in “tecerrüd” (tecerw2u0d) (تجرّد).

[Note: there are many more possibilities for Arabic diacriticals that could be added if desired.]


4. Other:

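The “zero-width non-joiner” (the HTML entity &zwnj;, Unicode U+200C) is a character that follows the “vocalic he” (a0 and e0) to ensure that it is always reversed to a final-form “he”. It is an invisible character that has no width: it takes up no space on the line but prevents the “he” from joining to the letter that follows it.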
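How the special characters above behave when a text is reversed can be sketched as follows. This is a minimal Python illustration, not the project's SED code. The names ZWNJ, HE, SPECIAL_TO_SCRIPT, and special_to_script are hypothetical, and the choice of a plain space for the separation dash -1 is an assumption (the text says only that the elements are written separately). Only the special codes are converted here; letters stay in code.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
HE = "\u0647"    # ه (+1607)

SPECIAL_TO_SCRIPT = {
    "-1": " ",        # ASSUMPTION: separately written elements get a space
    "-2": "",         # connected compound: nothing between the elements
    "-3": "",         # Arabic article dash: omitted
    "-4": "",         # izafet: not written in the A/O script
    "a0": HE + ZWNJ,  # vocalic he, kept in final form by the non-joiner
    "e0": HE + ZWNJ,
}

def special_to_script(coded: str) -> str:
    """Replace only the special codes, leaving all other codes as they are."""
    for code, script in SPECIAL_TO_SCRIPT.items():
        coded = coded.replace(code, script)
    return coded
```

For example, special_to_script("s2eb-4i g4am") drops the izafet dash and returns “s2ebi g4am”, and every a0 or e0 becomes a “he” followed by the invisible non-joiner, so the result for “e0” is two characters long even though only one is visible.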



NOTES:

1 One notable exception to this is the posthumous publication in the A/O script of İbrahim Kutluk’s edition of the Tezkiretü’ş-Şuarā of Kınalızade Hasan Çelebi (TTK Yay. XVIII:4, 1989).

2 The sample couplets transcribed are taken from an MA thesis entitled Edirneli Güfti Ali ve Şah u Derviş Mesnevisi (Istanbul: Boğaziçi University, Institute for Graduate Studies in Social Sciences, 2006) prepared by Murat Umut İnan.

3 The manuscript, number 599/1, is held in the ‘Yapı Kredi Sermet Çifter’ Research Library in Istanbul, Turkey. A copy of the full manuscript is also appended to İnan’s thesis (see note 2 above).

4 The numbers, for example #1581, are the Unicode “big numbers” codes for A/O script letters.

5 The syntax for the SED program, which is run from the Windows command prompt, is as follows: THE INPUT (the Güftī text) + THE SED MODIFIERS (gr.sed and intrev_trns.sed) => THE OUTPUT (the reversed text). The actual command used was “sed -f gr.sed grset__utf.txt | sed -f intrev_trns.sed > grset_utf.html”.

6 The “vav” (و) marked in red is a typographical error in the printed text, which remains in the draft of the transcription but would be removed in a further editing process. [The phrase should clearly be “reng-ā-rengini”.] This does, however, raise the issue of how transcribers should deal with obvious orthographic or printing mistakes. In a digital environment, such mistakes can be recorded and kept for study without requiring that they be included in any output of the text.

7 This is different from the transcription of the Güftī text, which used digraphs for all letters.

8 Tag Cloud, Tag Crowd.