Tag Archives: FLEx

Copying between fields in two writing systems in FLEx

Last month I taught the FLEx II course at CoLang, held at UTA. It was very interesting trying to teach about 45 students, coming from a great variety of backgrounds, but I think we all learned something. There was one thing that I taught, which I thought deserved further write-up, so I’ll do that here.
The problem is not immediately obvious, unless you spend lots of time thinking about how FLEx does what it does, in particular how writing systems work. But when you want to copy data from a field that is encoded in a particular writing system, into a field that is encoded in another writing system, you can’t just bulk copy and get the results you might expect. The reason is that the data itself is tagged for which writing system it is in, and not just the field. So you can, theoretically, have Spanish data in a field that is supposed to have English. But this is actually a strength, as it allows you to tag one word or sentence in its correct language, even if it is surrounded by another language, all within the same field (like if you write a note in English, but include in the note words in another language). All data is tracked by writing system, and you don’t lose that information when you copy from one field to another.
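To make that concrete, here is a toy model in Python. It is only a sketch of the idea, not of how FLEx stores anything internally: `TaggedText`, `bulk_copy`, `retag`, the field names, and the writing system codes "mbo" and "mbo-ipa" are all made up for illustration. It shows that a copy carries the writing system tag along with the text, and that fixing the tag is a separate operation.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TaggedText:
    text: str
    ws: str  # writing system tag stored with the data, not with the field


def bulk_copy(entry, source, target):
    """Copy a value to another field; its writing system tag comes along."""
    updated = dict(entry)
    updated[target] = entry[source]
    return updated


def retag(entry, field, old_ws, new_ws):
    """Change only the tag on a field's data, leaving the text untouched."""
    updated = dict(entry)
    if entry[field].ws == old_ws:
        updated[field] = replace(entry[field], ws=new_ws)
    return updated


# Hypothetical entry: the word and both writing system codes are invented.
entry = {"lexeme_mbo": TaggedText("pha", "mbo")}
copied = bulk_copy(entry, "lexeme_mbo", "lexeme_ipa")
fixed = retag(copied, "lexeme_ipa", "mbo", "mbo-ipa")

print(copied["lexeme_ipa"])  # still tagged "mbo" after the copy
print(fixed["lexeme_ipa"])   # tagged "mbo-ipa", text unchanged
```

The copy alone leaves the "IPA" field holding data that still claims to be in the practical orthography, which is exactly the situation the rest of this post works around.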
So, the task we were working on, which you may need to do some day, was copying data from one writing system, to use as a base for another. For instance, if you have data in a practical working orthography, and you want to also have an IPA field, you may notice (as I hope is true) that much of your orthography transfers directly over to the IPA. And for those things that don’t, there should be regular changes (like substituting [ɸ] for ‘ph’). This task is just begging to be done through bulk editing. Why type all that over again, just to change a few phonemes here and there? Why not make most of the systematic changes systematically? But we can’t just copy a practical orthography field into an IPA field, since the data would still be encoded as the practical orthography, even if the field should contain IPA data. So here’s what you should do.
First off, I assume you have your two writing systems set up; I’m using Mbo and the IPA variant in these screenshots:

Writing systems

Go to the Lexicon pane:

To Lexicon

Then Select Bulk Edit Entries:

To Bulk Edit entries

To be able to operate on both fields, you need to make them both visible. Click on the “Configure which columns to display” button:

To Show Columns

Then click on More Column Choices (unless your IPA field is in the list, in which case you just select it):

To More column choices

In the dialog that comes up, select the Lexeme form field (or whatever field you’re copying to). Yes, it is already on the right; we want to display the Lexeme form field twice, once in each of two writing systems:

Select Lexeme Form

Click Add:

Click Add

Initially you will probably have the same writing system for each of the two fields:

Both fields in same WS

Change one to the IPA variant:

Pick IPA WS

Then I like to move that second field up, so it will display next to the other one. While the field is selected, click on the up arrow:

Move WS field up

Then keep clicking until it is in place:

Move WS field up 2

Now that you have everything situated, click on OK:

Click OK on WS

That should take you back to the bulk edit pane, where you should see your IPA field. Assuming you’re just starting to work in this field, it should be empty:

IPA Empty

Then you Bulk edit, like normal, by selecting the Source Field (the one with data in it):

From LF

and the Target Field:

To LF-IPA

As always, you want to preview your bulk edit, to make sure it’s doing what you expect:

Preview Bulk Copy

And you should see blue arrows going from nothing to data, which matches the field next to it:

Preview Bulk Copy results

If you don’t like what you see, just click clear, and fix whatever was wrong. But if you like it, click Apply:

Apply Bulk Copy

And then you’ll have both fields filled with the same data (and no more blue arrows):

Apply Bulk Copy results

But if you select data in the IPA field, the indicator above will show that the data is NOT in IPA, but still in the other writing system:

Wrong writing system

So this is the problem we need to fix. To do this, we’re going to Bulk Replace:

Select Bulk Replace

Select the target field (just the one we want to change writing systems on, the IPA one in this case):

Select WS_IPA

Then click on Setup…:

Setup Bulk Replace

This will give you the Bulk Replace Setup dialog:

Bulk Replace Setup dialog

In this dialog, click in the “Find what:” box, then (clicking “More” if you have to) select Format/Writing System/xyz, where xyz is whatever writing system you copied your data from:

Select From WS

My experience is that at this point, FLEx will figure out what you’re trying to do, and set the other field for you. You can verify this by seeing “Format: <Writing system name>” under each field:

Bulk Replace Setup w Langs

You don’t need to add anything to the empty fields in this box; you want to find everything. So you can just click OK. Unfortunately, I don’t see anything when I click “Preview” here, so we just trust that we’ve set it up correctly (you backed up your data before starting this, right? If not, stop and do it now.), and hit Apply.
Back in the bulk replace field, we can verify that the data in the IPA field is now indicated as IPA:

Mbo_IPA

While the orthographic field is still in the orthographic writing system:

Mbo

At this point, you can go through your IPA field and convert orthographic letters to IPA equivalents, either systematically through bulk replace (if appropriate) or manually. Then you can enjoy your dictionary database with both orthography and IPA in your entries!
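For the systematic part of that conversion, the logic of a set of regular replacements can be sketched in Python. This is only an illustration of the idea, not something FLEx runs: the rule table is hypothetical apart from ‘ph’ to [ɸ], which is the example given earlier, and `to_ipa` is a made-up helper name.

```python
import re

# Hypothetical orthography-to-IPA rules; only 'ph' -> ɸ comes from the post.
RULES = {"ph": "ɸ", "ny": "ɲ", "sh": "ʃ"}


def to_ipa(word, rules=RULES):
    """Apply every rule in one pass, so output is never re-substituted."""
    # Longest keys first, so digraphs win over any single-letter rules.
    keys = sorted(rules, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: rules[m.group(0)], word)


print(to_ipa("phanya"))  # ɸaɲa
```

Doing all the substitutions in a single pass avoids one rule accidentally feeding another, which is something to watch for when running several bulk replaces one after the other.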

Creating Tone fields in Fieldworks 7.0.6~beta7 (not useful for WeSay 0.9.28.0+)

Creating Tone Fields by the Method Native to FLEx –The Better Way

(N.B.: this entry started with FW7.05~b5 and WS0.9.28, though I’m finishing it on FW7.06~b7 and WS1.1.11. Some of the screenshots may look different between these versions, but I haven’t noticed any difference in functionality with regard to these fields.)
After creating custom fields in this way for tone and plural forms, I found that tone fields are already accounted for in FLEx, though not particularly transparently. There is a set of pronunciation fields, which can be inserted here:

This option puts the set of pronunciation fields in the record you’re editing, not the whole database. It gives tone, as well as a couple other fields. It looks like this in FLEx:

What’s nice about this is that you can do this a number of times, for the same entry. This gives you the chance to have a number of pronunciations, in different contexts –which is important in phonology, especially with regard to tone. The “Location” field is an empty, customizable field, so I presume we could put things like “Before a High Tone” or “phrase finally” or whatever there, then know that that pronunciation is valid for that context. Filling in some bogus data, we see the following in FLEx:

Under the Hood

The above results in the following in the appropriate entry of the LIFT file:

<pronunciation>
<form lang="gey"><text>ba</text></form>
<field type="cv-pattern"><form lang="en"><text>CV</text></form>
</field>
<field type="tone"><form lang="en"><text>?H</text></form>
</field>
</pronunciation>
<pronunciation>
<form lang="gey"><text>bad</text></form>
<field type="cv-pattern"><form lang="en"><text>CVC</text></form>
</field>
<field type="tone"><form lang="en"><text>?HF</text></form>
</field>
</pronunciation>

So each pronunciation has a form/text set of nodes, and fields with type attributes for each of the visible fields with data in FLEx. Note that these fields are formatted exactly the same as the fields we created earlier here and here, that is

<field type="NameofFieldinFLEx">
<form lang="LanguageCode">
<text>Field Contents</text>
</form>
</field>

The only difference here is that the fields are under a <pronunciation> node, and not directly under the entry itself. But the fact that these fields are grouped together under repeatable pronunciation nodes should mean that we can organize contextually dependent pronunciation (tone or segmental) fields.

Sorting on Pronunciation Fields

I tried sorting on individual pronunciation nodes in FLEx, but wasn’t immediately impressed. I tried sorting the above fields for those with CVC in the cv-pattern, and this is what I got:

One can see that the entry is filtered, not the set of pronunciation fields. When working with Toolbox, it was possible to filter on either of a repeated field within an entry. Recalling that this was only when sorting on that field (therefore producing a record for each of the multiple fields), I tried that in FLEx, and it worked:

Note that there is only one pronunciation field listed, and the pronunciation form and tone fields listed are those that correspond to the CV field that was selected in the filter.
This data structure would also allow one to select only particular tone patterns, such as with an XPath expression like pronunciation[field[@type='cv-pattern']/form/text = 'CVC']/field[@type='tone']/form/text to get the information in the tone field under only those pronunciation nodes that also have CV fields with ‘CVC’ in them.
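The same selection can be sketched with Python's standard library, using the two pronunciation nodes shown earlier. This is a minimal sketch under a few assumptions: the `<entry>` wrapper and the function name `tones_for_cv` are added here for illustration, and because ElementTree only supports a subset of XPath (no nested path inside a predicate), the filter is written as a loop rather than as one expression.

```python
import xml.etree.ElementTree as ET

# The two pronunciation nodes from the LIFT sample, wrapped in an <entry>.
LIFT_FRAGMENT = """
<entry>
  <pronunciation>
    <form lang="gey"><text>ba</text></form>
    <field type="cv-pattern"><form lang="en"><text>CV</text></form></field>
    <field type="tone"><form lang="en"><text>?H</text></form></field>
  </pronunciation>
  <pronunciation>
    <form lang="gey"><text>bad</text></form>
    <field type="cv-pattern"><form lang="en"><text>CVC</text></form></field>
    <field type="tone"><form lang="en"><text>?HF</text></form></field>
  </pronunciation>
</entry>
"""


def tones_for_cv(entry, cv):
    """Collect tone values from pronunciations whose cv-pattern equals cv."""
    tones = []
    for pron in entry.findall("pronunciation"):
        patterns = [
            t.text for t in pron.findall("field[@type='cv-pattern']/form/text")
        ]
        if cv in patterns:
            tones.extend(
                t.text for t in pron.findall("field[@type='tone']/form/text")
            )
    return tones


entry = ET.fromstring(LIFT_FRAGMENT)
print(tones_for_cv(entry, "CVC"))  # ['?HF']
```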
Unfortunately, I haven’t been able to see these fields in WeSay (yet, I hope: see this bug report). Which is sad, because this is otherwise the best way to indicate tone in FLEx.

===Poetic Interlude===
I wrote most of the above several months ago, and had forgotten that I had worked this much out, until I ran into the problem of bulk editing on these fields. A quick Email to <Flex_Errors at sil.org>, and a fairly rapid response later, and I was back in business. When I went to write it up, I found the above in my drafts folder…
===End of Interlude===

So I’ve been doing a lot of data collection in the last couple months using the above paradigm, keeping different tone fields separate by their sibling location fields. I have XSL transforms to add this data to a LIFT file, and some reports to pull it out later, but how to mess with it in the meantime, should I need to? To get bulk editing on these fields to work, I needed two things:

  1. to sort on ‘pronunciation’ or one of its children (this I had apparently already figured out, but forgotten)
  2. to select the right columns for viewing in the bulk edit view.

Selecting the right columns for viewing in the bulk edit view

In case it isn’t obvious, the visible columns in the bulk edit view determine what fields you can act on. If “Lexeme” isn’t visible, you can’t copy to or from it, or modify it with a regular expression. So first, you need to make the fields you’re looking for visible, which is done through a dialog you can access by clicking in the upper right corner, with tooltip “Configure which columns to display”:

When you click on this, you get a menu of a number of (recently selected?) fields. To access other fields, to change column ordering, or to select language options, select “More column choices…” at the bottom:

This gives you access to the following dialog, where you can find fields not on the above list, select which of a number of writing systems you want to see (and therefore Bulk Edit). The Arrows on the right allow you to move the fields up and down (moving columns left and right on the Bulk Edit screen):

One trick that may not be obvious is that the ‘Tone’ field under ‘Pronunciation’ is available here as ‘Tones’. I presume this is because there are potentially a number of different Tone fields (as in my case). This is the same for ‘Location’ > ‘Locations’ and ‘CV Pattern’ > ‘CV Patterns’.

Sorting on Pronunciation Columns

Once all the fields you’re interested in are in the “current columns” (right) side of that dialog, you can select a column to sort on (showing light blue triangle). Selecting ‘Pronunciations’ gives three lines for this entry, and proclaims “Pronunciation” at the top of the page for slower ones like me.

If you’re in a context where you want to sort on two of these fields (if one doesn’t uniquely sort them, as the screenshot above), you can select one, then shift-select another, which will give a secondary sort (and a smaller triangle) as in the following:

Here the location is the first sort, then the tone. Note that the pronunciation form isn’t sorted (a…z…k…a), though the duplicate HAfter-sg field for titi is (correctly) showing up as another pronunciation/tone field (with pronunciation/form atíti nɛ) –showing that sorting by any of the pronunciation fields gives this layout.
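The layout described here (one row per pronunciation, sorted first by location and then by tone) can be modelled in a few lines of Python. This is only a sketch of the sorting logic, not of FLEx itself: the entries, forms, tones, and the "Isolation" location are invented for illustration, and `pronunciation_rows` is a made-up helper.

```python
# Hypothetical entries; each may carry several pronunciations.
entries = [
    {"lexeme": "titi", "pronunciations": [
        {"form": "titi", "location": "Isolation", "tone": "HL"},
        {"form": "atiti", "location": "HAfter-sg", "tone": "LHL"},
    ]},
    {"lexeme": "ba", "pronunciations": [
        {"form": "ba", "location": "Isolation", "tone": "H"},
    ]},
]


def pronunciation_rows(entries):
    """Flatten entries into one row per pronunciation."""
    return [
        {"lexeme": e["lexeme"], **p}
        for e in entries
        for p in e["pronunciations"]
    ]


# Primary sort on location, secondary sort on tone.
rows = sorted(pronunciation_rows(entries), key=lambda r: (r["location"], r["tone"]))
for row in rows:
    print(row["location"], row["tone"], row["form"], sep="\t")
```

As in the screenshot, the forms themselves end up in no particular order, and an entry with two pronunciations shows up as two rows.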

Bulk Editing Pronunciation Fields

Getting back to the point of it all (for me, anyway), with this configuration it is now possible to bulk copy to/from these fields:

Locations didn’t show up for me under “Bulk Replace”; I’m not sure why, though that sounds familiar –perhaps I didn’t configure it right, or maybe that’s a bug.

Summary

Though tone fields created under pronunciation fields are not currently helpful for WeSay collaboration, this seems a much more principled way of treating tone data in FLEx, since it natively allows for varied contexts, CV patterns, and segmental morphophonemics impacting the frame (since each pronunciation field has a form field, which can include the lexeme, frame, and any segmental interactions between them). In addition these fields are accessible to FLEx filtering and sorting, including bulk edit operations.
Given the complexity of this configuration, I would not recommend what I have described to the computer non-savvy (e.g., users more comfortable in WeSay). But for those comfortable manipulating these configurations, FLEx can be a powerful tool for manipulating tone data.

Round-tripping LIFT data through XLingpaper

Rationale

The LIFT specification allows for interchange between lexical databases we use, such as in FLEx and WeSay. As an XML specification, it is also subject to XSL transformation, and can be converted to XML documents that conform to other specifications, such as XLingPaper, an XML specification for writing linguistics papers. I described before a means to get data out of FLEx into XlingPaper, but that required a script generating regular expressions which were then put into a FLEx filter by hand (metaphorically speaking). Computers should be able to automate this, and so (following my “If computers can do a particular task, they should” motto) I developed a script to take that regular expression generator, and feed those expressions to an XSL stylesheet to produce XlingPaper XML from the LIFT XML automatically.
The other half of the rationale is that I hate exporting data from a database to a paper or report, seeing an error, and not being able to fix it once. Either I fix it in the paper and the database, or else in the database, then re-export to the paper. So a way to get data from LIFT to XlingPaper and back seemed helpful for drafting linguistics papers, even if one wasn’t dealing with the volume of reports I’m looking at generating.

Tools

One major caveat for this work is that these tools (FLEx, WeSay, and XLingpaper) are in active development, so functionality may vary over time. The tests in this post were run with the following:

  1. FLEx 7.0.6.40863 (for Linux)
  2. WeSay 1.1.9 (for Linux) –This doesn’t enter directly into these tests, but the LIFT files used often sync back and forth between these two programs.
  3. xsltproc from a standard Ubuntu Linux install (i.e., compiled against libxml 20706, libxslt 10126 and libexslt 815)
  4. GNU bash, also from standard Ubuntu Linux (i.e., version 4.1.5)
  5. GNU diffutils, also from standard Ubuntu Linux (i.e., version 2.8.1)
  6. XMLMind Xml Editor, version 5.1.0
  7. XLingPaper, version 2.18.0_3

All of these tools are free (or have a free version) and available online from their respective sources, and most are open source.
The scripts I’ve written (to generate reports and call the XSL transforms) are not yet publicly available; I hope to have them cleaned up and more broadly tested before long.

Test Goals

I want to see if I can

  1. Get data from LIFT to XLingPaper format,
  2. Modify the XLingPaper document in XXE (which keeps it in conformity to the XLingPaper DTD),
  3. Get it back into LIFT and imported to FLEx,
  4. Show that the FLEx import made all and only the changes made by modifying the XLingPaper document (i.e., no other data loss)

To do this I will be using an output of diff between two versions of the XLingPaper document (original and modified), and another diff between two versions of the LIFT file (originally exported, and exported after input). To achieve #4, I will show that the two diffs show all and only the same changes to data entries (the modifications to the XLingPaper doc are the same as the changes to the FLEx database, as evidenced by its export to LIFT). Fyi, this LIFT file has 2033 entries, and takes up almost 2MB (plain text), so we’re not talking about a trivial amount of data.

Test procedure

  1. Backup Wesay folder (this is real [gey] data I’m working with, after all…)
  2. Export “Full Lexicon” from FLEx, and copy it to gey.ori.lift
  3. Run report (vowel inventory) on exported gey.lift (This creates Report_VowelInventory.gey.xml)
  4. Open created report in XXE
  5. Modify and save (because XXE changes format –this helps diff see real changes, not those irrelevant to xml)
  6. Save as Report_VowelInventory.gey.mod.xml, and modify one example of each field we’re interested in, including @root (at this point both files have been saved by XXE, for easier comparison).
  7. Run `diff Report_VowelInventory.gey.{,mod.}xml` (results below)
  8. Run `xlp-extract2lift Report_VowelInventory.gey.mod.xml .` (This creates Report_VowelInventory.gey.mod.compiledfromXLP.lift)
  9. Backup FLEx project (just in case, as there’s real data here, too)
  10. Import Report_VowelInventory.gey.mod.compiledfromXLP.lift to FLEx project, selecting “import the conflicting data and overwrite the current data (importing data overrules my work).” and unticking “Trust entry modification times” (This is important because if that box is selected entries won’t import unless you have also changed the ‘dateModified’ attribute on an entry –which I generally don’t).
  11. Export again, producing a second LIFT file exported by FLEx (one before, and one after the import)
  12. Run `diff gey{,.ori}.lift`
  13. Compare diffs to see fidelity of the process.
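The comparison in the last step was done by eye on the output of GNU diff. Purely as an illustration of the "all and only" check, here is how the same idea might be sketched with Python's difflib; the line lists are tiny made-up stand-ins for the two LIFT exports, and `changed_lines` is a hypothetical helper, not part of the scripts used in this test.

```python
import difflib


def changed_lines(before, after):
    """List (old_lines, new_lines) for every non-matching stretch."""
    matcher = difflib.SequenceMatcher(a=before, b=after, autojunk=False)
    return [
        (tuple(before[i1:i2]), tuple(after[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Hypothetical stand-ins for the two LIFT exports, one line per item.
original = ["<text>mbata</text>", "<text>paka</text>", "<text>orange</text>"]
reexported = ["<text>mbataMOD</text>", "<text>paka</text>", "<text>orangeMOD</text>"]

# The edits made on the XLingPaper side, expressed as LIFT lines.
expected = {
    (("<text>mbata</text>",), ("<text>mbataMOD</text>",)),
    (("<text>orange</text>",), ("<text>orangeMOD</text>",)),
}

observed = set(changed_lines(original, reexported))
print(observed == expected)  # True only if all and only the expected lines changed
```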

Test results

Here is the diff showing the changes between the original report and the modifications:

$ diff Report_VowelInventory.gey.{,mod.}xml
11c11
< >Rapport de l’Inventaire des Voyelles de [gey]</title
---
> >Rapport de l’Inventaire des Voyelles de [gey]MOD</title
23c23
< >Kent Rasmussen</author
---
> >Kent RasmussenMOD</author
42c42
< >Voyelles</secTitle
---
> >VoyellesMOD</secTitle
65c65
< >mbata</langData
---
> >mbataMOD</langData
89c89
< >pl: mabata</langData
---
> >pl: mabataMOD</langData
113c113
< >fissure, fente</gloss
---
> >fissure, fenteMOD</gloss
137c137
< >mke / wake</gloss
---
> >mke / wakeMOD</gloss
155c155
< externalID="ps='Noun'|senseid='hand_0d9c81ef-b052-4f61-bc6a-02840db4a49e'|senseorder=''|definition-swh='mkono / mikono'"
---
> externalID="ps='Noun'|senseid='hand_0d9c81ef-b052-4f61-bc6a-02840db4a49e'|senseorder=''|definition-swh='mkono / mikonoMOD'"
171c171
< externalID="ps='Noun'|senseid='orange_2924ca57-f722-44e1-b444-2a30d8674126'|senseorder=''|definition-fr='orange'"
---
> externalID="ps='Noun'|senseid='orange_2924ca57-f722-44e1-b444-2a30d8674126'|senseorder=''|definition-fr='orangeMOD'"
180c180
< externalID="root='paka'|entrydateCreated='2011-08-05T10:57:05Z'|entrydateModified='2011-09-27T11:24:32Z'|entryguid='44dcf55e-9cd7-47a9-ac66-1713a3769708'|entryid='mopaka_44dcf55e-9cd7-47a9-ac66-1713a3769708'"
---
> externalID="root='pakaMOD'|entrydateCreated='2011-08-05T10:57:05Z'|entrydateModified='2011-09-27T11:24:32Z'|entryguid='44dcf55e-9cd7-47a9-ac66-1713a3769708'|entryid='mopaka_44dcf55e-9cd7-47a9-ac66-1713a3769708'"

As you can see from this diff output, I changed data in a number of different types of fields, including the report title, author, sectionTitle, langData (from citation), langData (from Plural), glosses in each of French and Swahili, and the last three are root and definitions, which are not visible in the printed report, but stored in an ExternalID attribute (recently added to XLingPaper to be able to store this kind of info, without having to put it elsewhere in the structure of the doc).
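The externalID values in the diff above look like a pipe-separated list of key='value' pairs. As a rough sketch of how such a value could be unpacked (this is not the code the transforms actually use), the following assumes straight single quotes around each value, which the blog software appears to have curled, and assumes no value contains an apostrophe; `parse_external_id` is a made-up name.

```python
import re

# One key='value' pair; keys may contain hyphens (e.g. definition-fr).
PAIR = re.compile(r"([\w-]+)='([^']*)'")


def parse_external_id(value):
    """Split a pipe-separated externalID string into a dict."""
    return dict(PAIR.findall(value))


example = (
    "ps='Noun'|senseid='orange_2924ca57-f722-44e1-b444-2a30d8674126'"
    "|senseorder=''|definition-fr='orange'"
)
fields = parse_external_id(example)
print(fields["ps"], fields["definition-fr"])  # Noun orange
```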

And here is the diff showing the changes between the original LIFT export and the one exported after importing the LIFT file with modifications:

$ diff gey{,.ori}.lift
2601c2601
< <form lang="swh"><text>mkono / mikonoMOD</text></form>
---
> <form lang="swh"><text>mkono / mikono</text></form>
10776c10776
< <gloss lang="swh"><text>mke / wakeMOD</text></gloss>
---
> <gloss lang="swh"><text>mke / wake</text></gloss>
15871c15871
< <form lang="gey"><text>pakaMOD</text></form>
---
> <form lang="gey"><text>paka</text></form>
23529c23529
< <form lang="gey"><text>mbataMOD</text></form>
---
> <form lang="gey"><text>mbata</text></form>
27587c27587
< <field type="Plural"><form lang="gey"><text>mabataMOD</text></form>
---
> <field type="Plural"><form lang="gey"><text>mabata</text></form>
31657c31657
< <form lang="fr"><text>orangeMOD</text></form>
---
> <form lang="fr"><text>orange</text></form>
32416c32416
< <gloss lang="fr"><text>fissure, fenteMOD</text></gloss>
---
> <gloss lang="fr"><text>fissure, fente</text></gloss>

Summary

  1. The first several MOD’s to the paper (to titles, etc.) are not in the second diff, since only example data is extracted into the LIFT file to import (this is what we want, right?).
  2. The other mods –root, citation, plural, gloss-swahili, gloss-french, definition-french and definition-swahili– all survived.
  3. No other changes existed between the exported LIFT files.

Discussion

Because FLEx exported essentially the same LIFT file (of 2033 entries and almost 2MB, remember), with all and only the changes made in XXE, I presume that there were no destructive changes to the underlying FLEx database, and this procedure is safe for further testing. I did not go so far as to diff the underlying fwdata file, as I probably wouldn’t understand its format anyway, and I wouldn’t know how to distinguish between differences in formatting and content (while it is also XML, I don’t understand its specification or how it is used in the program –which is not a bad thing).
Speaking of what I don’t know, I should be clear that my formal training is in Linguistics (M.A. Oregon 2002), not in IT. I’m doing this because there is a massive amount of linguistic data to collect, organize, analyze and verify, and I want to do that efficiently (the fact that this is fun is just a nice byproduct). In any case, I have certainly not followed best practices in my bash or XSL scripting. So if you read the attachments and think “this guy doesn’t know how to code efficiently or elegantly,” then we’re already in agreement on that. And you’d also be welcome to contribute on improvements. 🙂

Acknowledgements

I wouldn’t have gotten anywhere on this project without the work of many others, particularly including those that are giving of their own time and resources (which surely could have been spent elsewhere) on FLEx, WeSay, and the LIFT specification itself. Of particular note is Andy Black, who encouraged me to take another stab at XSLT (after telling him I’d tried and given up a few years ago), and who has provided invaluable and innumerable helps, both in the development of the XLingPaper specification, and in particular issues related to these transforms. Most of what is good here has roots in his work, though I hope no one holds him responsible for my errors and inelegance.

Problem adding custom fields in WeSay 0.9.28.0 for import to Fieldworks 7.0.5~beta5

I thought I had a system for making fields in WeSay, which would then be automatically imported into FLEx, as described here. But just today, I noted that the inability to configure those fields in FLEx is more serious than I had thought. Looking in the FLEx help, one sees:

You cannot change the location or writing system after the custom field is created.

Since custom fields created first in WeSay then imported into FLEx are created in FLEx on import, there is no way to set these options once imported. Complicating this situation, apparently FLEx isn’t taking that (at least writing system) info from WeSay during import. I noticed this when I was moving some data around, and had plural data in an English language field (and this is not an English dictionary…) I went back to WeSay to check the config there; here is the WeSay config for the plural field, above the plural field display in FLEx:

So even though I told WeSay that I just want [nlj] data in this field, on import FLEx set the field as “all analysis, then all vernacular” (which doesn’t seem to match this screenshot, but the point is that I have five languages to choose from when inputting data, when I should have just one).
The reason the data is in the ‘en’ field is my fault — I bulk copied to the plural field without checking which writing system I was copying into. But I made this error because I presumed that there was only one language field (this is data, not analysis!) for it to go to. So it is reasonable to imagine others might do so as well, with a bunch of junk language fields that can’t be easily gotten rid of.
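One way to catch this kind of slip after the fact is to scan an exported LIFT file for forms in a custom field that carry an unexpected language code. The following is a minimal sketch using Python's standard library; the sample entries, their ids, and the text values are invented, and `misplaced_forms` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Made-up LIFT sample: one Plural in [nlj], one mistakenly tagged 'en'.
LIFT_SAMPLE = """
<lift>
  <entry id="a">
    <field type="Plural"><form lang="nlj"><text>plural-a</text></form></field>
  </entry>
  <entry id="b">
    <field type="Plural"><form lang="en"><text>plural-b</text></form></field>
  </entry>
</lift>
"""


def misplaced_forms(root, field_type, allowed):
    """Return (entry id, lang, text) for forms outside the allowed languages."""
    hits = []
    for entry in root.iter("entry"):
        for form in entry.findall(f"field[@type='{field_type}']/form"):
            if form.get("lang") not in allowed:
                hits.append((entry.get("id"), form.get("lang"), form.findtext("text")))
    return hits


root = ET.fromstring(LIFT_SAMPLE)
print(misplaced_forms(root, "Plural", {"nlj"}))  # [('b', 'en', 'plural-b')]
```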

Summary

  1. This isn’t the end of the world. One can always create the fields again in FLEx, with the right options, then move all the data over with bulk edit. Needing to do so just negates the value of creating the custom fields on import, unless you don’t care what languages will be available to that field.
  2. Since WeSay clearly has the correct info, hopefully the FLEx team will see this as a bug, and correct the import to take language choice from WeSay. This assumes that FLEx can understand that [nlj], in this case, is a vernacular writing system, and that the systems for categorizing writing systems in WeSay (which selects on a per system basis) and FLEx (which groups languages in categories of first/all vernacular/analysis only/then the other) can be harmonized and/or made to understand one another.
  3. In the meantime, I would advise against importing custom fields from WeSay. We’ll need to take the extra steps to create the fields in each program, and hopefully get it documented clearly enough that each will see the other’s fields the first time around.

Creating a Custom Field II: in Fieldworks 7.0.5~beta5 for WeSay 0.9.28.0

Today I’m going to walk through creating a custom field in Fieldworks, and see how it looks in LIFT and in WeSay.

Fieldworks’ ‘Custom Fields’ Dialog

Creating custom fields in Fieldworks is easy, if you know where to look. I created a Tone field via Tools/Configure/Custom fields:

Clicking there produces the Custom Fields dialog box, where one can set up the new field:

Here I have already added Tone and Plural fields. As far as I can tell, there are pros and cons to this method:

  1. Fields added to every record in the database (though I don’t think they take up space, at least in LIFT, until there is data in the field).
  2. Only one of these can appear in a record. I didn’t even notice this until I tried another kind of field (to come), but this may or may not be important to what you’re doing. If you want a couple tone fields for different environments (syntactic, tonal, or whatever), you would need to make them each here, or use another method (description to come).

This is what they look like in FLEx before they have been filled in (Note that I selected different options for the language of these fields):

These fields from the entry in the above screenshot didn’t show up in the LIFT file, since they were empty, but another took the following form (between lexical-unit and senses):

<field type="Plural">
<form lang="gey">
<text>baadisi</text>
</form>
</field>

And here it is in WeSay:

I saw it immediately on opening WeSay this time, since I had the field already configured earlier, like this:

Note that “Name in file” and “Name for display” are both “Plural”. This makes it a bit easier on the config, since you don’t have to keep track of one name for the WeSay user to see and a different one in the LIFT file (which is what you see in FLEx).
In the WeSayConfig file, you see this:

<field>
<className>LexEntry</className>
<dataType>MultiText</dataType>
<displayName>Plural</displayName>
<enabled>True</enabled>
<fieldName>Plural</fieldName>
<multiParagraph>False</multiParagraph>
<spellCheckingEnabled>False</spellCheckingEnabled>
<multiplicity>ZeroOr1</multiplicity>
<optionsListFile></optionsListFile>
<visibility>Visible</visibility>
<writingSystems>
<id>gey</id>
</writingSystems>
</field>

Note that the fieldName and displayName values are each ‘Plural.’
When adding (and therefore naming) a new field in FLEx, that name would show in the same place as Plural in <field type="Plural"> (the ‘type’ attribute of the field node) for that field in the LIFT file. That would be what you would need to put in the “Name in file” field of the Configuration Tool/Fields dialog above (or in the fieldName field of the WeSayConfig file), in order to see it in WeSay.
A couple caveats for creating custom fields for collaboration between FLEx and WeSay in this manner:

  1. You can’t use spaces. One of the first custom fields I made in FLEx was “Noun Class of Plural.” When I tried to create the corresponding field in WeSay, I got something like this:

    I recall FLEx being perfectly happy writing the field ‘type’ attribute with spaces into the LIFT file, but there was no way to get such a WeSay field, either through the config tool, or through editing the config file by hand. Not that I could find, anyway; perhaps a developer can contradict me here if there is.
  2. A related point is that when creating the field in FLEx first, one is obligated to then create the field in WeSay, or you won’t see it there (the data should still be preserved, but that’s not the kind of collaboration I’m looking for).

But when creating a custom field in WeSay first (As I described here), FLEx creates the field that you created in WeSay automatically. There was a limitation on the options (relative to creating a custom field in FLEx), but going in that direction removes one configuration step for each custom field. So that would depend on the kind of flexibility you need (I haven’t needed those options, yet).
Probably the first issue where I would want those grayed out options would be for fields with option lists. Even in FLEx, the instructions say to set up the options (or at least the list) first, then the field that references them. When trying to collaborate with such a field in WeSay, that would all need to be done first. But I haven’t figured out yet how to get such a field into WeSay, or if the option list fields from WeSay (e.g., POS and SemDom) can go into FLEx, or if they are incompatible data types. If someone figures that one out, please let us all know; if I get time to work on it, I’ll post here.

Notes for creating fields in FLEx’s ‘Custom Fields’ dialog to be used in WeSay

  1. Don’t use spaces in the field name.
  2. Plan on also creating the custom field name in WeSay, with the FLEx field name in the WeSay Configuration Tool’s “Name in file” field.
  3. Don’t use this method for fields that might need to appear more than once per sense/entry, or else make one for each possible iteration you need.
  4. Use this method if you need broader configuration of FLEx custom fields.
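The first note can be checked mechanically against a LIFT export. Here is a small sketch in Python (standard library only) that lists any field ‘type’ values containing spaces; the sample LIFT content is made up, and `field_types_with_spaces` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Made-up LIFT sample with one well-named and one problematic custom field.
LIFT_SAMPLE = """
<lift>
  <entry id="a">
    <field type="Plural"><form lang="gey"><text>baadisi</text></form></field>
    <field type="Noun Class of Plural"><form lang="en"><text>2</text></form></field>
  </entry>
</lift>
"""


def field_types_with_spaces(root):
    """Return the sorted set of field type names that contain a space."""
    return sorted(
        {f.get("type") for f in root.iter("field") if " " in (f.get("type") or "")}
    )


root = ET.fromstring(LIFT_SAMPLE)
print(field_types_with_spaces(root))  # ['Noun Class of Plural']
```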

Creating Custom Fields in WeSay 0.9.28.0 for Fieldworks 7.0.5~beta5

I’ve been working with custom fields in FLEx and WeSay enough to feel the need to figure out what is really going on. The goal is to be able to straightforwardly create custom fields in one or the other that are editable and round-trip-able in the other. To do this, I’m going to look into the interface of each program, and see what impact adding fields has on the LIFT (and config, for WeSay) file. Today I’m making a field in WeSay, and seeing what it looks like there, and then in FLEx.

The WeSay Configuration Tool

The WeSay config tool looks like this (once you click on ‘Fields’ then ‘New Field’):

Once you save and exit, you get a section under the <fields> node in the WeSayConfig file that looks like this:

<field>
<className>LexEntry</className>
<dataType>MultiText</dataType>
<displayName>*newField</displayName>
<enabled>True</enabled>
<fieldName>newField</fieldName>
<multiParagraph>False</multiParagraph>
<spellCheckingEnabled>False</spellCheckingEnabled>
<multiplicity>ZeroOr1</multiplicity>
<optionsListFile></optionsListFile>
<visibility>Visible</visibility>
<writingSystems>
<id>en</id>
<id>fr</id>
<id>hav</id>
</writingSystems>
</field>

Adding Data in WeSay

Returning to WeSay, one can add some bogus info to this field in one of the records:

Closing out WeSay and looking at the LIFT file, we see the following under this entry (between <lexical-unit> and the first <sense>):

<field type="newField">
<form lang="fr">
<text>BogusNewfield</text>
</form>
</field>

What this Means

Putting this all together, we see that

  1. The ‘Name in file’ from the WeSay Config Tool corresponds to the field/fieldName node in the WeSayConfig file.
  2. Both of the above correspond to the LIFT entry/field ‘type’ attribute (once data is entered):
    ‘Name in file’ = (xyz.WeSayConfig)/configuration/components/viewTemplate/fields/field/fieldName = (xyz.lift)/lift/entry/field/@type
  3. ‘Name for display’ from the WeSay Config Tool is the label the WeSay user sees on the field, which corresponds to the contents of the field/displayName node, i.e., (.WeSayConfig)/configuration/components/viewTemplate/fields/field/displayName
  4. Therefore, the name a WeSay user sees for a field will not necessarily relate to anything in FLEx. This is because the WeSay label is related to the proper LIFT field in the WeSayConfig file (which FLEx doesn’t see), and not in the LIFT file, which is what FLEx imports. So in setting up custom fields, we need to pay attention to what the config tool says for the ‘Name in file’, not the ‘Name for display’ (Note that it is ‘*newField,’ and not ‘newField,’ in the WeSay user interface. The asterisk, which is visible in WeSay, is only present in displayName in the WeSayConfig, not in either of fieldName from the WeSayConfig or field/@type from the LIFT file.)
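
The correspondence in the list above can be checked with a short Python sketch. It uses trimmed copies of the two snippets shown earlier (with plain ASCII quotes), and the bare `<entry>` wrapper is only a stand-in for a real LIFT entry, which carries more attributes and children. The function name `describe_field` is made up for this illustration; nothing here is part of WeSay or FLEx.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <field> node from the WeSayConfig file.
CONFIG_FIELD = """
<field>
  <className>LexEntry</className>
  <displayName>*newField</displayName>
  <fieldName>newField</fieldName>
</field>
"""

# Simplified stand-in for one entry in the LIFT file.
LIFT_ENTRY = """
<entry>
  <field type="newField">
    <form lang="fr"><text>BogusNewfield</text></form>
  </field>
</entry>
"""

def describe_field(config_xml, lift_xml):
    """Pair the WeSayConfig names with the matching LIFT field data."""
    field = ET.fromstring(config_xml)
    file_name = field.findtext("fieldName")       # 'Name in file'
    display_name = field.findtext("displayName")  # 'Name for display'
    entry = ET.fromstring(lift_xml)
    forms = {}
    for lift_field in entry.findall("field"):
        # Only fieldName is matched against @type; displayName never
        # appears in the LIFT file at all.
        if lift_field.get("type") == file_name:
            for form in lift_field.findall("form"):
                forms[form.get("lang")] = form.findtext("text")
    return {"name_in_file": file_name,
            "display_name": display_name,
            "forms": forms}

result = describe_field(CONFIG_FIELD, LIFT_ENTRY)
print(result)
```

The data is found through `fieldName` alone, which is why the asterisk in the display name never reaches FLEx.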

Importing to FLEx

I was happy to see that the field created in WeSay shows up under FLEx custom fields (after importing the WeSay LIFT file):

Note that Location, Type, and Writing System(s) are all grayed out. There may be some way of modifying these settings in FLEx once they have been set in WeSay, but it isn’t obvious at first glance. Here is the field in the lexicon editor:

I had to select ‘Show Hidden Fields’ to be able to see it the first time for some reason. But then I deselected it, and the field remained visible.
Note that the label in FLEx is ‘newField,’ without the asterisk, which comes from the type attribute of the field in the LIFT file. As far as I can see, there is no distinction between file and display names in FLEx. This is appropriate for at least the following two reasons:

  1. FLEx seems to deal fine with spaces in field names (I’ve had problems with this in WeSay).
  2. FLEx users should be able to handle whatever complexity the field names throw at them. WeSay, on the other hand, needs to control carefully what the user sees, and its relationship to the LIFT field in question. For instance, the form in lexical-unit in a LIFT file is displayed as “Word” by default in WeSay, since people are putting words into it. But when I analyze those words into roots, it is nice to be able to change that field’s display name to “Root” in WeSay, without having to change the underlying LIFT structure. This flexibility of the display name can help keep the WeSay user from getting confused without unnecessarily complicating the database.

Notes for Creating fields in WeSay to be imported to FLEx

  1. Pay attention to ‘Name in file’ in the WeSay Config Tool, since that will be what the field will be called in the LIFT file, and in FLEx (and presumably in other programs that would use LIFT).
  2. You may need to click on ‘Show Hidden Fields’ to see the field in FLEx.
  3. There doesn’t seem to be a way to put fields anywhere other than in the ‘Custom Fields’ section of FLEx, so I hope that’s where you want it (if not, stay tuned for the next installment, going the other way).

Getting Fieldworks lexical data into XLingpaper

I’ve written before about using WeSay to collect language data, and WeSay LIFT files can be fairly easily imported into Fieldworks Language Explorer (FLEx) for analysis. Recently, I’ve been working on getting data from the FLEx lexicon into XLingpaper, to facilitate the writing of reports and papers that can be full of data (which is the way I like them…:-)).
I start with a lexicon (basically just a word list) in FLEx, that has been parsed for root forms (go through noun class categorization and obligatory morphology with a speaker of the language). I figure out the canonical root syllable profile (e.g., around here, usually CVCV), and look for complementary distribution and contrast within that type, both (though separately) for nouns and verbs.
I have a script that puts out regular expressions based on what graphs we expect to use (those in Swahili, plus those we have added to orthographies in the past; since we start with data encoded in the Swahili/Lingala orthography, this covers most of the data that we work with). This script puts out expressions like

^bu([mn]{0,1})([[ptjfvmlryh]|[bdgkcsznw][hpby]{0,1}])([aiɨuʉeɛoɔʌ]{1,2})([́̀̌̂]{0,1})$

which means that the whole word/lexeme form (between ^ and $) is just b, u, some consonant (one of m or n, or not, then a consonant letter that appears alone, or the first and second of a digraph), then some vowel (any of ten basic ones, long or short, plus diacritics, or not). In other notation, it is giving buCV, or canonical structures with root initial [bu]. This data is paired with data from other regular expression filters giving [b] before other vowels to show a complete distribution of [b] before all vowels (presumably…).
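
To try the buCV idea outside of FLEx, here is a minimal Python sketch. It is not the script mentioned above, and the pattern is rewritten for Python’s `re` module rather than copied: the consonant alternation is an ordinary group, and `?` stands in for `{0,1}`. It assumes the four tone marks are the combining acute, grave, caron and circumflex, and it decomposes each form (NFD) so that those marks are separate characters. The word list is invented for illustration.

```python
import re
import unicodedata

# Assumed tone marks: combining acute, grave, caron, circumflex.
TONES = "\u0301\u0300\u030c\u0302"
# A consonant letter that stands alone, or the first letter of a
# digraph followed by an optional second letter.
CONSONANT = r"(?:[ptjfvmlryh]|[bdgkcsznw][hpby]?)"
VOWELS = "aiɨuʉeɛoɔʌ"

# b, u, optional nasal, consonant, one or two vowels, optional tone.
BU_CV = re.compile(
    rf"^bu([mn]?)({CONSONANT})([{VOWELS}]{{1,2}})([{TONES}]?)$"
)

def is_bu_cv(form):
    """Return True if the form fits the buCV profile."""
    decomposed = unicodedata.normalize("NFD", form)
    return BU_CV.match(decomposed) is not None

# Invented forms; "bup\u00e1" has a precomposed a-acute.
words = ["bupa", "bumba", "bana", "bup\u00e1", "bupaka"]
matches = [w for w in words if is_bu_cv(w)]
print(matches)
```

Only the forms that begin with bu and then have exactly one consonant slot and one vowel slot pass the filter; `bana` and `bupaka` are left out.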

The script puts out another expression,

^([mn]{0,1})([[ptjfvmlryh]|[bdkgcsznw][hpby]{0,1}])(a)([mn]{0,1})([[ptjfvmlryh]|[bdkgcsznw][hpby]{0,1}])\3([́̀̌̂]{0,1})$

which gives me CaCa, as in the following screenshot:

(The \3 is a backreference to the third set of parentheses, (a), so that changing (a) to (i) gives you CiCi.) The data from these filters gives evidence of the independent identity of a vowel, as opposed to vowels created through harmony rules.
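
The reference back to the third set of parentheses can be tried in Python, where it is written `\3`. This is a sketch under the same assumptions as any port to Python’s `re` module: the consonant alternation is an ordinary group, and the tone marks are taken to be the combining acute, grave, caron and circumflex. The function name `same_vowel_pattern` and the test words are made up for illustration.

```python
import re

# Assumed tone marks: combining acute, grave, caron, circumflex.
TONES = "\u0301\u0300\u030c\u0302"
# Non-capturing inside, so the numbered groups stay easy to count.
CONSONANT = r"(?:[ptjfvmlryh]|[bdkgcsznw][hpby]?)"

def same_vowel_pattern(vowel):
    """Build a CVCV pattern whose second vowel repeats the first."""
    # Groups: 1 = optional nasal, 2 = consonant, 3 = the vowel.
    # \3 then requires the same text that group 3 matched.
    return re.compile(
        rf"^([mn]?)({CONSONANT})({re.escape(vowel)})"
        rf"([mn]?)({CONSONANT})\3([{TONES}]?)$"
    )

caca = same_vowel_pattern("a")
cici = same_vowel_pattern("i")

# Invented forms, just to exercise the two patterns.
words = ["bada", "badi", "mbanda", "bidi"]
print([w for w in words if caca.match(w)])
print([w for w in words if cici.match(w)])
```

Because the second vowel is tied to the first through the backreference, only one spot in the expression has to change to move from CaCa to CiCi.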
So these regular expressions allow filtering of data in the FLEx lexicon to show just the data that you need to prove a particular point you’re trying to make (in my case, why just these letters should be in the alphabet). But then, how to get the data out of FLEx, and into a document you’re writing?
FLEx has a number of export options out of the box, but none of them seem designed for outputting words with their glosses, based on a particular filter/sort of the lexicon. In particular, I’m looking for export into a format that can be validated against an XLingpaper DTD, since I use XLingpaper XML for most of my writing, both for archivability and longevity of my data, as well as for cross-compatibility in differing environments (there are also developed stylesheets to make XLingpaper docs into html, pdf, and usually word processor docs, too). The basic XML export of the data on the above sort starts like this:

<?xml version="1.0" encoding="utf-8"?>
<ExportedDictionary>
<LexEntry id="hvo16380">
<LexEntry_HeadWord>
<AStr ws="gey">
<Run ws="gey">bana</Run>
</AStr>
</LexEntry_HeadWord>
<LexEntry_Senses>
<LexSense number="1" id="hvo16382">
<MoMorphSynAnalysisLink_MLPartOfSpeech>
<AStr ws="en">
<Run ws="en">num</Run>
</AStr>
</MoMorphSynAnalysisLink_MLPartOfSpeech>
<LexSense_Definition>
<AStr ws="en">
<Run ws="en">four (4)</Run>
</AStr>
</LexSense_Definition>
<LexSense_Definition>
<AStr ws="fr">
<Run ws="fr">quatre (4)s</Run>
</AStr>
</LexSense_Definition>
<LexSense_Definition>
<AStr ws="swh">
<Run ws="swh">nne</Run>
</AStr>
</LexSense_Definition>
<LexSense_Definition>
<AStr ws="pt">
<Run ws="pt">quatro (4)</Run>
</AStr>
</LexSense_Definition>
<LexSense_Definition>
<AStr ws="es">
<Run ws="es">cuatro</Run>
</AStr>
</LexSense_Definition>
</LexSense>
</LexEntry_Senses>
</LexEntry>
<LexEntry id="hvo11542">
<LexEntry_HeadWord>… and so on…

But this is way more information than I need (I got most of these glosses for free using the CAWL to elicit the data), and in the wrong form. The cool thing about XML is that you can take structured information and put it in another structure/form, to get the form you need. To do this, I needed to look (again) at XSL, the extensible stylesheet language, which had successfully intimidated me a number of times already. But with a little time, energy, and desperation, I got a working stylesheet. And with some help from Andy Black, I made it simpler and more straightforward, so that it looks like XLPMultipleFormGlosses.xsl does today. Put it, and an xml file describing it, into /usr/share/fieldworks/Language Explorer/Export Templates, and this is now a new export process from within FLEx. To see the power of this stylesheet, the above data is now exported from FLEx as

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE xlingpaper PUBLIC "-//XMLmind//DTD XLingPap//EN" "XLingPap.dtd">
<xlingpaper version="2.8.0">
<lingPaper>
<section1 id="DontCopyThisSection">
<secTitle>Click the [+] on the left, copy the example that appears below, then paste it into your XLingpaper document wherever an example in allowed.</secTitle>
<example num="examples-hvo16380">
<listWord letter="hvo16380">
<langData lang="gey">bana</langData>
<gloss lang="en">four (4)</gloss>
<gloss lang="fr">quatre (4)s</gloss>
</listWord>
<listWord letter="hvo11542">

which contains just enough header/footer to validate against the XLingpaper DTD (so you don’t get errors opening it in XMLmind), and the word forms and just the glosses I want (English and French, but that is easily customizable in the stylesheet). The example node can be copied and pasted into an existing XLingpaper document, which then can eventually be transformed into other formats, like the pdf from this screenshot:

which I think is a pretty cool thing to be able to do, and an advance for the documentation of languages we are working with.
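
For readers who don’t use XSL, the reshaping described above can be sketched in Python. This is not XLPMultipleFormGlosses.xsl and it does not plug into FLEx; it only illustrates the same idea on a shortened copy of the export shown earlier. The function name `to_example` is made up, and taking the example’s `num` from the first entry’s id is my guess from the sample output.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the FLEx export shown earlier (ASCII quotes).
SAMPLE_EXPORT = """
<ExportedDictionary>
<LexEntry id="hvo16380">
<LexEntry_HeadWord>
<AStr ws="gey"><Run ws="gey">bana</Run></AStr>
</LexEntry_HeadWord>
<LexEntry_Senses>
<LexSense number="1" id="hvo16382">
<LexSense_Definition>
<AStr ws="en"><Run ws="en">four (4)</Run></AStr>
</LexSense_Definition>
<LexSense_Definition>
<AStr ws="fr"><Run ws="fr">quatre (4)s</Run></AStr>
</LexSense_Definition>
<LexSense_Definition>
<AStr ws="swh"><Run ws="swh">nne</Run></AStr>
</LexSense_Definition>
</LexSense>
</LexEntry_Senses>
</LexEntry>
</ExportedDictionary>
"""

def run_text(astr):
    """Join the text of all <Run> children of an <AStr> node."""
    return "".join(run.text or "" for run in astr.findall("Run"))

def to_example(export_xml, gloss_langs=("en", "fr")):
    """Build an <example> of <listWord> nodes with selected glosses."""
    root = ET.fromstring(export_xml)
    entries = root.findall("LexEntry")
    if not entries:
        raise ValueError("no LexEntry nodes in export")
    # Assumption: the example number is based on the first entry's id.
    example = ET.Element("example", num="examples-" + entries[0].get("id"))
    for entry in entries:
        word = ET.SubElement(example, "listWord", letter=entry.get("id"))
        head = entry.find("LexEntry_HeadWord/AStr")
        lang_data = ET.SubElement(word, "langData", lang=head.get("ws"))
        lang_data.text = run_text(head)
        # Keep only the definitions in the requested languages.
        for astr in entry.findall(".//LexSense_Definition/AStr"):
            if astr.get("ws") in gloss_langs:
                gloss = ET.SubElement(word, "gloss", lang=astr.get("ws"))
                gloss.text = run_text(astr)
    return example

example = to_example(SAMPLE_EXPORT)
print(ET.tostring(example, encoding="unicode"))
```

Changing `gloss_langs` plays the same role as customizing the gloss languages in the stylesheet.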