Parsing .ovr

shenyihong

Maintainer of ovr2shp

Hierarchical File Architecture (HFA) is Hexagon Geospatial's proprietary file format. It is the format behind ERDAS Imagine .img files and ERDAS Imagine Annotation Layer .ovr files. It is not exactly the most "open" file standard out there, but luckily GDAL implements a raster driver for .img files, which offers a glimpse into the internals of HFA.

Now that we know an open-source driver that can parse HFA files exists, we can use it as our starting point.

My Goal: Convert ERDAS Imagine Annotation Layer (.ovr) to Shapefile (.shp)

The problem, however, is that GDAL is bulky and I have limited experience dealing with big C++ projects. Since only a handful of source files are needed to parse HFA-compliant files, I went digging for something more lightweight. I managed to find a web archive page that contained an early version of a .img to .tif converter. You can find the page here.

This page also turned out to be the holy grail, as it referenced detailed documentation of the .img file format and the internals of HFA.

img2tif#

The source code included drivers for HFA and GeoTIFF, but there is just one caveat: The HFA driver included is not a general implementation but rather one that is specifically made for .img. It is the same case for the HFA driver in GDAL.

Fortunately, it includes all the low-level code needed to parse raw bytes into generic, higher-level HFA data classes.

+---hfa
| hfa.h
| hfaband.cpp # class to represent a raster band in a .img
| hfacompress.cpp
| hfadictionary.cpp
| hfaentry.cpp
| hfafield.cpp
| hfaopen.cpp
| hfatype.cpp
| hfa_p.h

What's left for me is to fill in the gaps by implementing high level representations of data types found in a .ovr HFA structure. With a few changes, I was able to integrate this driver to successfully parse a .ovr file.

Hierarchical File Architecture (HFA)#

Hidden within the detailed documentation lie the secrets of HFA.

The hierarchical file architecture maintains an object-oriented representation of data in an ERDAS IMAGINE disk file through use of a tree structure. Each object is called an entry and occupies one node in the tree. Each object has a name and a type. The type refers to a description of the data contained by that object. Additionally each object may contain a pointer to a subtree of more nodes.

HFA File Structure

Header Tag#

  • dtype: Ehfa_HeaderTag
  • size: 20 bytes
    • The first 16 bytes contain the unique signature of an ERDAS IMAGINE HFA file: EHFA_HEADER_TAG
    • The remaining 4 bytes contain the file pointer to the header record

The header tag does not correspond to the header node shown in the diagram above! It is just a tag that contains a reference to the header node.

The value of a file pointer is simply the number of bytes from the start of the file.

/* Referenced from HFA Driver */
// Parsing the first 16 bytes to verify if it is a HFA file
if( VSIFReadL( szHeader, 16, 1, fp ) < 1 )
{
    CPLError( CE_Failure, CPLE_AppDefined,
              "Attempt to read 16 byte header failed for\n%s.",
              pszFilename );
    return NULL;
}
if( !EQUALN(szHeader,"EHFA_HEADER_TAG",15) )
{
    CPLError( CE_Failure, CPLE_AppDefined,
              "File %s is not an Imagine HFA file ... header wrong.",
              pszFilename );
    return NULL;
}
// Parsing the subsequent 4 bytes to get header record/node
VSIFReadL( &nHeaderPos, sizeof(GInt32), 1, fp );
HFAStandard( 4, &nHeaderPos );
VSIFSeekL( fp, nHeaderPos, SEEK_SET );
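For readers without the GDAL helpers at hand, the same check can be sketched in plain C++ over an in-memory copy of the file. This is only an illustration, not the driver's code: the names readLE32 and parseHeaderTag are made up for this example, and it assumes the 4-byte file pointer is stored little-endian on disk.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Read a 32-bit unsigned integer stored little-endian at `offset`.
// Assumption: HFA integers are little-endian on disk.
std::uint32_t readLE32(const std::vector<std::uint8_t>& buf, std::size_t offset)
{
    assert(offset + 4 <= buf.size());  // callers are expected to check bounds first
    return  static_cast<std::uint32_t>(buf[offset])
         | (static_cast<std::uint32_t>(buf[offset + 1]) << 8)
         | (static_cast<std::uint32_t>(buf[offset + 2]) << 16)
         | (static_cast<std::uint32_t>(buf[offset + 3]) << 24);
}

// Hypothetical helper: validates the 20-byte Ehfa_HeaderTag at the start of
// `file`. On success, stores the file pointer to the header record in
// `headerPos` and returns true.
bool parseHeaderTag(const std::vector<std::uint8_t>& file, std::uint32_t& headerPos)
{
    static const char kSignature[] = "EHFA_HEADER_TAG";  // 15 characters
    if (file.size() < 20)
        return false;                                    // too short to hold the tag
    if (std::memcmp(file.data(), kSignature, 15) != 0)
        return false;                                    // not an HFA file
    headerPos = readLE32(file, 16);                      // bytes 16..19
    return true;
}
```

Calling parseHeaderTag on the bytes of a .ovr or .img file should yield the offset at which the header record starts.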

Header Record#

  • dtype: Ehfa_File
  • The header record contains file pointers to the Root node of the HFA Tree and the MIF (Machine Independent Format) Dictionary.
    • MIF Dictionary stores all the type information for each kind of node in the HFA Tree.
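As a rough sketch, the header record can be decoded field by field in the order given by the Ehfa_File definition in the MIF Dictionary (lversion, LfreeList, LrootEntryPtr, sentryHeaderLength, LdictionaryPtr). The struct and function names below are hypothetical, and the code assumes the fields are packed back to back with no padding and stored little-endian.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Read an unsigned little-endian integer of nBytes (1..4) at `offset`.
std::uint32_t readLE(const std::vector<std::uint8_t>& buf, std::size_t offset, int nBytes)
{
    assert(nBytes >= 1 && nBytes <= 4);
    std::uint32_t value = 0;
    for (int i = 0; i < nBytes; ++i)
        value |= static_cast<std::uint32_t>(buf[offset + i]) << (8 * i);
    return value;
}

// Hypothetical in-memory view of an Ehfa_File header record.
struct EhfaFile {
    std::uint32_t version;
    std::uint32_t freeList;
    std::uint32_t rootEntryPtr;       // file pointer to the root node
    std::uint16_t entryHeaderLength;
    std::uint32_t dictionaryPtr;      // file pointer to the MIF Dictionary
};

// Decode the header record found at file offset `headerPos`.
// Assumed packed size: 4 + 4 + 4 + 2 + 4 = 18 bytes.
bool parseHeaderRecord(const std::vector<std::uint8_t>& file,
                       std::size_t headerPos, EhfaFile& out)
{
    if (headerPos > file.size() || file.size() - headerPos < 18)
        return false;
    out.version           = readLE(file, headerPos, 4);
    out.freeList          = readLE(file, headerPos + 4, 4);
    out.rootEntryPtr      = readLE(file, headerPos + 8, 4);
    out.entryHeaderLength = static_cast<std::uint16_t>(readLE(file, headerPos + 12, 2));
    out.dictionaryPtr     = readLE(file, headerPos + 14, 4);
    return true;
}
```

The two values that matter for the rest of the walk are rootEntryPtr and dictionaryPtr.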

MIF Dictionary#

  • dtype: char*
  • The MIF Dictionary contains the type information for every kind of node in the HFA Tree. Hence, the dictionary must be read and decoded before any of the other objects in the file can be decoded.
# Sample dtypes seen in MIF Dictionary
{1:lversion,1:LfreeList,1:LrootEntryPtr,1:sentryHeaderLength,1:LdictionaryPtr,}Ehfa_File,
{1:Lnext,1:Lprev,1:Lparent,1:Lchild,1:Ldata,1:ldataSize,64:cname,32:ctype,1:tmodTime,}Ehfa_Entry,
{16:clabel,1:LheaderPtr,}Ehfa_HeaderTag,

Type information is encoded as follows:

{<num_element>:<dtype_char><attr_name>}<dtype_identifier>
  • num_element: Number of elements of type dtype_char

    • Example: In the Ehfa_HeaderTag data type, the signature occupies the first 16 bytes, and each char element takes 1 byte. Encoding the entire signature therefore needs 16 chars, hence {16:clabel}
  • dtype_char: MIF data type identifier

    MIF ItemType

  • attr_name: Name of attribute under type

  • dtype_identifier: data type of node
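To make the encoding concrete, here is a small sketch of a parser for one definition string. It only handles the simple case where every field is a primitive type; definitions with pointers (p, *), nested objects (o) or enums (e) are rejected. The names are hypothetical, and the byte sizes in primitiveSize are my assumption of the usual MIF item sizes (char 1, short 2, long/time/float 4, double 8), so check them against the MIF ItemType table.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct MifField { int count; char type; std::string name; };
struct MifType  { std::string name; std::vector<MifField> fields; };

// Assumed size in bytes of a primitive MIF item type, or -1 if unsupported.
int primitiveSize(char t)
{
    switch (t) {
        case 'c': case 'C': return 1;
        case 's': case 'S': return 2;
        case 'l': case 'L': case 't': case 'f': return 4;
        case 'd': return 8;
        default:  return -1;
    }
}

// Parse "{n:Tname,n:Tname,}TypeName," where every T is a primitive type.
bool parseSimpleMifType(const std::string& def, MifType& out)
{
    out = MifType();
    if (def.empty() || def[0] != '{') return false;
    std::size_t pos = 1;
    while (pos < def.size() && def[pos] != '}') {
        const std::size_t colon = def.find(':', pos);
        const std::size_t comma = def.find(',', pos);
        if (colon == std::string::npos || comma == std::string::npos) return false;
        if (colon == pos || comma < colon + 3) return false;  // need count, type, name
        MifField field;
        field.count = 0;
        for (std::size_t i = pos; i < colon; ++i) {           // decimal element count
            if (def[i] < '0' || def[i] > '9') return false;
            field.count = field.count * 10 + (def[i] - '0');
        }
        field.type = def[colon + 1];
        if (primitiveSize(field.type) < 0) return false;      // not a simple field
        field.name = def.substr(colon + 2, comma - colon - 2);
        out.fields.push_back(field);
        pos = comma + 1;
    }
    if (pos >= def.size()) return false;                      // missing '}'
    const std::size_t end = def.find(',', pos + 1);
    out.name = def.substr(pos + 1, end == std::string::npos ? end : end - pos - 1);
    return !out.name.empty();
}

// Total size in bytes implied by the field list.
int fixedSize(const MifType& type)
{
    int total = 0;
    for (std::size_t i = 0; i < type.fields.size(); ++i) {
        assert(type.fields[i].count >= 0);
        total += type.fields[i].count * primitiveSize(type.fields[i].type);
    }
    return total;
}
```

Fed the Ehfa_HeaderTag definition from the sample above, this yields two fields (label and headerPtr) whose sizes add up to the 20 bytes of the header tag.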

Root Node#

  • dtype: Ehfa_Entry

  • The root node is the entry point into the main HFA tree structure. By traversing down the tree from the root, the data nodes (where the "gold" is) can be extracted.

  • An Ehfa_Entry contains file pointers to the next node and the child node. We use these pointers to traverse down the HFA Tree.

    • Next Node: The node "right" of the current node on the same level
    • Child Node: "Left-Most" child node
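The next/child layout describes a "first child, next sibling" tree, which can be walked depth-first with a loop over siblings and a recursive call for children. The sketch below uses an in-memory map from file offset to node instead of a real file; the Node struct and walk function are hypothetical, and it assumes a pointer value of 0 means "no node".

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical, already-decoded node: just the fields needed for traversal.
struct Node {
    std::string   name;
    std::uint32_t next;   // file pointer to the sibling on the right, 0 if none
    std::uint32_t child;  // file pointer to the left-most child, 0 if none
};

// Depth-first walk starting at file offset `pos`. Each visited node is
// appended to `out`, indented by two spaces per tree level.
// Note: a corrupt file with cyclic pointers would make this loop forever.
void walk(const std::map<std::uint32_t, Node>& nodes, std::uint32_t pos,
          int depth, std::vector<std::string>& out)
{
    assert(depth >= 0);
    while (pos != 0) {
        std::map<std::uint32_t, Node>::const_iterator it = nodes.find(pos);
        if (it == nodes.end())
            return;                                   // dangling pointer, stop here
        out.push_back(std::string(2 * depth, ' ') + it->second.name);
        walk(nodes, it->second.child, depth + 1, out); // descend first
        pos = it->second.next;                         // then move right
    }
}
```

In a real parser the map lookup would be replaced by seeking to pos in the file and decoding an Ehfa_Entry there.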

Ehfa_Entry#

Ehfa_Entry is the data type representing the nodes of the HFA tree structure. A single Ehfa_Entry contains "tree"-level attributes (file pointer to the child node, name of the node, etc.) and a file pointer to its data block.

Put simply, an Ehfa_Entry has two sections.

To dive deeper into the two sections, let's look at the type definition of Ehfa_Entry found in the MIF Dictionary:

{1:Lnext,1:Lprev,1:Lparent,1:Lchild,1:Ldata,1:ldataSize,64:cname,32:ctype,1:tmodTime,}Ehfa_Entry,

  • Base section

    • Tree level attributes/metadata (attributes in the type definition above are all tree level attributes)

      Tree level attributes can be interpreted as meta attributes that allow traversal of the tree structure without ever touching the data residing in it

  • Data section

    • Data within the node (annotation name, coordinates etc.)
    • The data block can be accessed using the file pointer value found in the data attribute

      {1:Ldata} contains the file pointer to the data block

    • The data that resides in the node has a corresponding data type that can be found in the MIF Dictionary. The data type is identifiable by the type attribute seen above.

      {32:ctype} contains the data type identifier

To extract the data we are interested in, we seek to the file position given by the data attribute and parse the bytes found there into the higher-level data structure named by the type attribute.
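A minimal sketch of that step, working on an in-memory copy of the file, might look like the following. The field offsets follow the order of the Ehfa_Entry definition above; the struct and function names are hypothetical, and the code assumes little-endian integers, packed fields, and NUL-padded name and type strings.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

typedef std::vector<std::uint8_t> Bytes;

std::uint32_t readLE32(const Bytes& buf, std::size_t offset)
{
    assert(offset + 4 <= buf.size());  // callers check bounds first
    std::uint32_t value = 0;
    for (int i = 0; i < 4; ++i)
        value |= static_cast<std::uint32_t>(buf[offset + i]) << (8 * i);
    return value;
}

// Read a fixed-width, NUL-padded character field.
std::string readFixedString(const Bytes& buf, std::size_t offset, std::size_t width)
{
    std::string s;
    for (std::size_t i = 0; i < width && buf[offset + i] != 0; ++i)
        s.push_back(static_cast<char>(buf[offset + i]));
    return s;
}

// Hypothetical in-memory view of the base section of an Ehfa_Entry.
struct EhfaEntry {
    std::uint32_t next, prev, parent, child;  // tree-level file pointers
    std::uint32_t data, dataSize;             // location and size of the data block
    std::string   name, type;                 // node name and data type identifier
    std::uint32_t modTime;
};

// Decode the base section (124 bytes per the definition) at offset `pos`.
bool parseEntry(const Bytes& file, std::size_t pos, EhfaEntry& out)
{
    if (pos > file.size() || file.size() - pos < 124) return false;
    out.next     = readLE32(file, pos);
    out.prev     = readLE32(file, pos + 4);
    out.parent   = readLE32(file, pos + 8);
    out.child    = readLE32(file, pos + 12);
    out.data     = readLE32(file, pos + 16);
    out.dataSize = readLE32(file, pos + 20);
    out.name     = readFixedString(file, pos + 24, 64);
    out.type     = readFixedString(file, pos + 88, 32);
    out.modTime  = readLE32(file, pos + 120);
    return true;
}

// Copy the raw data block of `entry`; decoding it is up to the type in entry.type.
bool readDataBlock(const Bytes& file, const EhfaEntry& entry, Bytes& out)
{
    if (entry.data > file.size() || file.size() - entry.data < entry.dataSize)
        return false;
    out.assign(file.begin() + entry.data, file.begin() + entry.data + entry.dataSize);
    return true;
}
```

The bytes returned by readDataBlock still need to be interpreted using the MIF Dictionary definition whose identifier matches entry.type.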

The following are examples of data type identifiers that can be found in an Ehfa_Entry:

  • Rectangle2: Contains geometric definitions of ERDAS Rectangle Annotations

    {1:Lflags,1:lfillStyle,1:*oEevg_Coord,center,1:dwidth,1:dheight,1:dorientation,}Rectangle2
  • Eprj_MapInfo: Contains image map coordinates

    {0:pcproName,1:*oEprj_Coordinate,upperLeftCenter,1:*oEprj_Coordinate,lowerRightCenter,1:*oEprj_Size,pixelSize,0:pcunits,}Eprj_MapInfo
  • Eprj_ProParameters: Contains projection parameters

    {1:e2:EPRJ_INTERNAL,EPRJ_EXTERNAL,proType,1:lproNumber,0:pcproExeName,0:pcproName,1:lproZone,0:pdproParams,1:*oEprj_Spheroid,proSpheroid,}Eprj_ProParameters
  • Eprj_Datum: Contains datum information

    {0:pcdatumname,1:e3:EPRJ_DATUM_PARAMETRIC,EPRJ_DATUM_GRID,EPRJ_DATUM_REGRESSION,type,0:pdparams,0:pcgridname,}Eprj_Datum