# Symbols, Tables and Catalogs
```
+---------+
| Catalog |
+----+----+-------------------------------+
|    |                                    |
|    |    +-------------+                 |
|    +--->| SymbolTable |                 |
|    |    +---+---------+---------------+ |
|    |    |   |                         | |
|    |    |   |     +-------------------+ |
|    |    |   |     | Symbol (ID, Text) | |
|    |    |   +---->| Symbol (ID, Text) | |
|    |    +---------+-------------------+ |
|    |                                    |
|    |    +-------------+                 |
|    +--->| SymbolTable |                 |
|    |    +---+---------+---------------+ |
|    |    |   |                         | |
|    .    |   |     +-------------------+ |
|    .    |   +---->| Symbol (ID, Text) | |
|    .    +---------+-------------------+ |
|                                         |
+-----------------------------------------+
```
The Catalog holds a collection of ion\Symbol\Table instances that ion\Reader and ion\Writer instances query to resolve symbols.

See also [the Ion spec's symbol guide chapter on catalogs](https://amzn.github.io/ion-docs/docs/symbols.html#the-catalog).
```php
$catalog = new ion\Catalog;
$symtab = ion\Symbol\PHP::asTable();
$catalog->add($symtab);
```
There are three types of symbol tables:

- Local
- Shared
- System (a special shared symbol table)

Local symbol tables do not have names, while shared symbol tables require them; only shared symbol tables may be added to a catalog or to a writer’s list of imports.

Local symbol tables are managed internally by Ion readers and writers. No application configuration is required to tell Ion readers or writers that local symbol tables should be used.
### Using a shared symbol table
Using local symbol tables requires the local symbol table (including all of its symbols) to be written at the beginning of the value stream. Consider an Ion stream that represents CSV data with many columns. Although local symbol tables will optimize writing and reading each value, including the entire symbol table itself in the value stream adds overhead that increases with the number of columns.

If it is feasible for the writers and readers of the stream to agree on a pre-defined shared symbol table, this overhead can be reduced.

Consider the following CSV in a file called `test.csv`.
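A `test.csv` matching the column names and the row values used in the complete example further below would look like this (reconstructed from that example, so treat the exact rows as an assumption):

```
id,type,state
1,foo,false
2,bar,true
3,baz,true
```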
An application that wishes to convert this data into the Ion format can generate a symbol table containing the column names. This reduces encoding size and improves read efficiency.

Consider the following shared symbol table that declares the column names of `test.csv` as symbols. Note that the shared symbol table may have been generated by hand or programmatically.
```
$ion_shared_symbol_table::{
  name: "test.csv.columns",
  version: 1,
  symbols: ["id", "type", "state"],
}
```
This shared symbol table can be stored in a file (or in a database, etc.) and loaded into a symbol table at runtime.
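Generating such a table programmatically can be as simple as reading the CSV header. The following is a minimal sketch in plain PHP, not part of the ion extension: `sharedSymbolTableFromCsvHeader()` is a hypothetical helper, and it assumes plain column names so that JSON string quoting is also valid Ion text.

```php
<?php
/**
 * Build the Ion text of a shared symbol table from a CSV header line.
 * Hypothetical helper; assumes simple column names without exotic escapes.
 */
function sharedSymbolTableFromCsvHeader(string $header, string $name, int $version = 1): string {
    // Pass delimiter, enclosure and an empty escape explicitly instead of relying on defaults.
    $columns = array_map('trim', str_getcsv($header, ",", "\"", ""));
    $quote = fn(string $s): string => json_encode($s, JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES);
    $quotedName = $quote($name);
    $symbols = implode(", ", array_map($quote, $columns));

    return <<<ION
    \$ion_shared_symbol_table::{
      name: {$quotedName},
      version: {$version},
      symbols: [{$symbols}],
    }
    ION;
}

// The header line of test.csv yields the table shown above.
echo sharedSymbolTableFromCsvHeader("id,type,state", "test.csv.columns"), "\n";
```

The resulting text can then be stored and later turned into a symbol table in the same way as the literal table in the complete example.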
Because the value stream written using the shared symbol table does not contain the symbol mappings, a reader of the stream needs to access the shared symbol table using a catalog.
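To make the role of the catalog concrete, here is a toy model in plain PHP of how an import is turned into symbol IDs. It does not use the ion extension: `resolveImports()`, the `name@version` keys and the array shapes are made up for illustration, and only exact version matches are looked up. The one fact taken from the Ion specification is that the Ion 1.0 system symbol table occupies IDs 1 to 9, so imported symbols start at 10.

```php
<?php
// Ion 1.0 reserves symbol IDs 1-9 for the system symbol table.
const FIRST_IMPORTED_SID = 10;

/**
 * Toy model of import resolution (not the ion extension's API).
 *
 * @param array<int, array{name: string, version: int, max_id: int}> $imports
 * @param array<string, array<int, string>> $catalog symbol texts keyed by "name@version"
 * @return array<int, ?string> symbol ID => text, or null when the text is unknown
 */
function resolveImports(array $imports, array $catalog): array {
    $sids = [];
    $next = FIRST_IMPORTED_SID;
    foreach ($imports as $import) {
        $symbols = $catalog[$import['name'] . '@' . $import['version']] ?? [];
        // max_id IDs are reserved whether or not the table is available.
        for ($i = 0; $i < $import['max_id']; $i++) {
            $sids[$next++] = $symbols[$i] ?? null;
        }
    }
    return $sids;
}

$imports = [['name' => 'test.csv.columns', 'version' => 1, 'max_id' => 3]];
$catalog = ['test.csv.columns@1' => ['id', 'type', 'state']];

$known   = resolveImports($imports, $catalog); // IDs 10, 11, 12 map to id, type, state
$unknown = resolveImports($imports, []);       // same IDs, but no text is available
```

Without the table, the IDs still exist but carry no text, which is why a reader lacking the catalog can only report them by number.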
Consider the following complete example:
```php
/**
 * Representing a CSV row
 */
class Row {
  public function __construct(
    public readonly int $id,
    public readonly string $type,
    public readonly bool $state = true
  ) {}
}

/* Fetch the shared symbol table from file, db, etc. */
$symtab = ion\unserialize(<<<'SymbolTable'
$ion_shared_symbol_table::{
  name: "test.csv.columns",
  version: 1,
  symbols: ["id", "type", "state"],
}
SymbolTable
);

/* Add the shared symbol table to a catalog */
$catalog = new ion\Catalog;
$catalog->add($symtab);

/* Use the catalog when writing the data */
$writer = new class(options: new ion\Writer\Options(
  catalog: $catalog,
  outputBinary: true
)) extends ion\Writer\Buffer\Writer {
  public function writeRow(Row $row) : void {
    $this->startContainer(ion\Type::Struct);

    $this->writeFieldName("id");
    $this->writeInt($row->id);

    $this->writeFieldName("type");
    $this->writeString($row->type);

    $this->writeFieldName("state");
    $this->writeBool($row->state);

    $this->finishContainer();
  }
};

$writer->writeRow(new Row(1, "foo", false));
$writer->writeRow(new Row(2, "bar"));
$writer->writeRow(new Row(3, "baz"));
/* Flush pending output so that getBuffer() returns the complete stream */
$writer->flush();
```
Let's inspect the binary Ion stream and verify that the column names are actually replaced by symbol IDs:
```php
foreach (str_split($writer->getBuffer(), 8) as $line) {
  printf("%-26s", chunk_split(bin2hex($line), 2, " "));
  foreach (str_split($line) as $byte) {
    echo $byte >= ' ' && $byte <= '~' ? $byte : ".";
  }
  echo "\n";
}
```
```
e0 01 00 ea ee a2 81 83   ........  \
de 9e 86 be 9b de 99 84   ........   |
8e 90 74 65 73 74 2e 63   ..test.c    >  here's Ion symbol table metadata
73 76 2e 63 6f 6c 75 6d   sv.colum   |
6e 73 85 21 01 88 21 03   ns.!..!.  <
da 8a 21 01 8b 83 66 6f   ..!...fo   |
6f 8c 11 da 8a 21 02 8b   o....!..    >  here's the actual data
83 62 61 72 8c 11 da 8a   .bar....   |
21 03 8b 83 62 61 7a 8c   !...baz.  /
```
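The field names in the data part of the dump can be checked by hand. As a rough sketch in plain PHP (not part of the ion extension; `decodeSmallStruct()` is a hypothetical helper that only handles short structs of bools, non-negative ints and strings with single-byte symbol IDs), the second row decodes as follows:

```php
<?php
/**
 * Decode one small Ion binary struct. Illustrative only: supports bools,
 * non-negative ints and strings with lengths below 14 and symbol IDs below 128.
 *
 * @param array<int, string> $symbols symbol ID => text
 */
function decodeSmallStruct(string $bytes, array $symbols = []): array {
    $descriptor = ord($bytes[0]);
    if (($descriptor >> 4) !== 0xD) {
        throw new InvalidArgumentException("expected a struct");
    }
    $end = 1 + ($descriptor & 0x0F); // low nibble: body length in bytes
    $pos = 1;
    $row = [];
    while ($pos < $end) {
        // Field name: a VarUInt symbol ID; the high bit marks its last byte.
        $sid = ord($bytes[$pos++]) & 0x7F;
        $descriptor = ord($bytes[$pos++]);
        $type = $descriptor >> 4;
        $length = $descriptor & 0x0F;
        switch ($type) {
            case 0x1: // bool: the low nibble is the value, there is no body
                $value = $length === 1;
                $length = 0;
                break;
            case 0x2: // non-negative int: big-endian magnitude
                $value = 0;
                for ($i = 0; $i < $length; $i++) {
                    $value = ($value << 8) | ord($bytes[$pos + $i]);
                }
                break;
            case 0x8: // string: UTF-8 bytes
                $value = substr($bytes, $pos, $length);
                break;
            default:
                throw new UnexpectedValueException("unsupported type $type");
        }
        $pos += $length;
        // Fall back to the $<SID> notation when no text is known.
        $row[$symbols[$sid] ?? "\$$sid"] = $value;
    }
    return $row;
}

// Second row of the dump: da 8a 21 02 8b 83 62 61 72 8c 11
$row = hex2bin("da8a21028b836261728c11");
print_r(decodeSmallStruct($row)); // keys are $10, $11, $12
print_r(decodeSmallStruct($row, [10 => "id", 11 => "type", 12 => "state"])); // keys are id, type, state
```

The bytes `8a`, `8b` and `8c` are the symbol IDs 10, 11 and 12, that is, the three imported column names following the nine system symbols.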
When unserializing without knowledge of the symbols used, the column names are returned as bare symbol IDs of the form `$<SID>`:
```php
var_dump(ion\unserialize($writer->getBuffer(), [
  "multiSequence" => true,
]));
```
When unserializing with known symbols, the symbol IDs are resolved, provided the reader uses a catalog containing the appropriate symbol tables:
```php
$reader = new \ion\Reader\Buffer\Reader($writer->getBuffer(), [
  "catalog" => $catalog
]);
$unser = new ion\Unserializer\Unserializer(multiSequence: true);
var_dump($unser->unserialize($reader));
```