# Page Metadata

The SDK automatically collects page metadata from the DOM on every `track()` call. By default, it reads JSON-LD structured data from `<script type="application/ld+json">` blocks embedded in the page.

## Default Field Mappings

| SDK Field      | JSON-LD Key           | Notes                                       |
| -------------- | --------------------- | ------------------------------------------- |
| `page_title`   | `headline` or `name`  | Falls back to `document.title` if not found |
| `content_type` | `@type`               |                                             |
| `content_id`   | `identifier` or `@id` | Tries `identifier` first                    |
| `publisher`    | `publisher.name`      | Nested object path                          |
| `object_type`  | `@type`               |                                             |
| `object_id`    | `identifier` or `@id` | Tries `identifier` first                    |
| `context`      | `articleSection`      |                                             |

`url` and `referer` are always collected automatically from `window.location.href` and `document.referrer`, respectively.

## JSON-LD Examples

### NewsArticle

For article pages, include a `NewsArticle` block with all relevant fields:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "NewsArticle",
  "headline": "City Council Approves New Housing Plan",
  "identifier": "story-12345",
  "publisher": {
    "@type": "Organization",
    "name": "Example News"
  },
  "articleSection": "Local Government"
}
</script>
```

### WebPage

For non-article pages, use a `WebPage` block:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "WebPage",
  "name": "About Us",
  "@id": "https://example.com/about",
  "publisher": {
    "@type": "Organization",
    "name": "Example News"
  }
}
</script>
```

For a `WebPage` block like this one, the `context` field is `null` because there is no `articleSection`.

### @graph format

Some SEO plugins (such as Yoast SEO) emit a single JSON-LD block using the `@graph` format rather than one block per type:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "NewsArticle",
      "headline": "City Council Approves New Housing Plan",
      "identifier": "story-12345",
      "publisher": { "@type": "Organization", "name": "Example News" },
      "articleSection": "Local Government"
    },
    {
      "@type": "WebSite",
      "name": "Example News",
      "url": "https://example.com/"
    }
  ]
}
</script>
```

The SDK handles this automatically: it searches the items inside `@graph` and picks the best one using the type priority below.

**Type priority:** If the page contains multiple JSON-LD blocks (or multiple items within `@graph`), the SDK picks the one with the highest-priority type: `NewsArticle` > `Article` > `WebPage` > any other type.
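The selection rule can be sketched in a few lines of JavaScript. This is an illustration only, not the SDK's actual code: `TYPE_PRIORITY`, `typeRank`, and `pickBestItem` are hypothetical names, and the handling of array-valued `@type` and of ties (first item wins) are assumptions.

```js
// Hypothetical sketch of the type-priority rule; not the SDK's implementation.
const TYPE_PRIORITY = ['NewsArticle', 'Article', 'WebPage'];

// Lower rank means higher priority; unknown types rank last.
function typeRank(item) {
  // Assumption: "@type" may be a single string or an array of strings.
  const types = [].concat(item['@type'] || []);
  const ranks = types.map((type) => {
    const index = TYPE_PRIORITY.indexOf(type);
    return index === -1 ? TYPE_PRIORITY.length : index;
  });
  return ranks.length ? Math.min(...ranks) : TYPE_PRIORITY.length;
}

// `blocks` holds the parsed contents of every JSON-LD block on the page.
function pickBestItem(blocks) {
  // Expand "@graph" containers into their individual items.
  const items = blocks.flatMap((block) =>
    Array.isArray(block['@graph']) ? block['@graph'] : [block]
  );
  let best = null;
  for (const item of items) {
    // Assumption: on a tie, the item that appears first is kept.
    if (best === null || typeRank(item) < typeRank(best)) best = item;
  }
  return best;
}

const best = pickBestItem([
  { '@type': 'WebSite', name: 'Example News' },
  {
    '@graph': [
      { '@type': 'WebPage', name: 'About Us' },
      { '@type': 'NewsArticle', headline: 'City Council Approves New Housing Plan' },
    ],
  },
]);
console.log(best['@type']); // NewsArticle
```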

## Overriding Defaults

Any field mapping can be overridden via the `pageData` property of `window.__allegro`, which must be set before `client.js` loads. This is useful for publishers that use a GTM data layer or other data sources instead of JSON-LD.

### Data layer source

Publishers using a GTM data layer (`<meta name="gtm-dataLayer" content='{"key":"value"}'>`) can map fields like this:

```html
<script>
window.__allegro = {
    pageData: {
        page_title:   { source: 'dataLayer', key: 'gtmStoryTitle' },
        content_type: { source: 'dataLayer', key: 'gtmPageType' },
        content_id:   { source: 'dataLayer', key: 'gtmBspStoryId' },
        publisher:    { source: 'dataLayer', key: 'gtmSiteName' },
        object_type:  { source: 'dataLayer', key: 'gtmPageType' },
        object_id:    { source: 'dataLayer', key: 'gtmBspStoryId' },
        context:      { source: 'dataLayer', key: 'gtmCategory' },
    },
};
</script>
<script src="https://your-allegro-instance.com/client.js"></script>
```
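To make the lookup concrete, here is a rough sketch of how a `dataLayer` mapping might be resolved against the `gtm-dataLayer` meta tag. The helpers `readDataLayer` and `resolveDataLayerField` are hypothetical, and the error handling (an empty object for a missing tag or malformed JSON, `null` for a missing key) is an assumption rather than documented SDK behavior.

```js
// Hypothetical sketch; the SDK's real lookup may differ.
// `doc` is a Document (pass `document` in the browser).
function readDataLayer(doc) {
  const tag = doc.querySelector('meta[name="gtm-dataLayer"]');
  if (!tag) return {};
  try {
    // The data layer is stored as JSON in the tag's content attribute.
    const parsed = JSON.parse(tag.getAttribute('content') || '');
    return parsed !== null && typeof parsed === 'object' ? parsed : {};
  } catch (err) {
    // Assumption: malformed JSON is treated as an empty data layer.
    return {};
  }
}

// Resolve one pageData entry, e.g. { source: 'dataLayer', key: 'gtmStoryTitle' }.
function resolveDataLayerField(doc, mapping) {
  const value = readDataLayer(doc)[mapping.key];
  return value === undefined ? null : value;
}
```

In a browser, `resolveDataLayerField(document, { source: 'dataLayer', key: 'gtmStoryTitle' })` would return the `gtmStoryTitle` property of the data layer, or `null` if it is absent.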

### Meta tag source

Individual fields can also be pulled from `<meta>` tags:

```html
<script>
window.__allegro = {
    pageData: {
        content_id: { source: 'meta', key: 'article:id' },
        publisher:  { source: 'meta', key: 'og:site_name' },
    },
};
</script>
<script src="https://your-allegro-instance.com/client.js"></script>
```

## All Source Types

| Source      | How it reads                                | `key` format                                                                   |
| ----------- | ------------------------------------------- | ------------------------------------------------------------------------------ |
| `jsonLd`    | `<script type="application/ld+json">`       | Dot-path like `publisher.name`; pipe-separated fallbacks like `headline\|name` |
| `meta`      | `<meta name="..." content="...">`           | The `name` attribute value                                                     |
| `dataLayer` | `<meta name="gtm-dataLayer" content='...'>` | Property name in the JSON object                                               |
| `selector`  | CSS selector → `textContent`                | Any valid CSS selector                                                         |
| `attribute` | CSS selector → element attribute            | `selector@attribute` e.g. `body@data-publisher`                                |
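The two compound `key` formats in the table can be illustrated with a short sketch. The function names (`getPath`, `resolveJsonLdKey`, `parseAttributeKey`) are hypothetical and the code is not taken from the SDK. It assumes the fallback separator is a plain `|` (the backslash in the table is only Markdown escaping), that empty values fall through to the next alternative, and that `selector@attribute` is split on the last `@`.

```js
// Hypothetical sketch of the key formats; not the SDK's implementation.

// Follow a dot-path such as "publisher.name" through nested objects.
function getPath(obj, path) {
  return path.split('.').reduce(
    (current, part) =>
      current !== null && typeof current === 'object' ? current[part] : undefined,
    obj
  );
}

// Try each pipe-separated alternative in order; return null if none is set.
function resolveJsonLdKey(item, key) {
  for (const path of key.split('|')) {
    const value = getPath(item, path.trim());
    if (value !== undefined && value !== null && value !== '') return value;
  }
  return null;
}

// Split "selector@attribute" on the last "@" (assumed convention).
function parseAttributeKey(key) {
  const at = key.lastIndexOf('@');
  if (at <= 0) return null;
  return { selector: key.slice(0, at), attribute: key.slice(at + 1) };
}

const page = {
  '@type': 'WebPage',
  name: 'About Us',
  '@id': 'https://example.com/about',
  publisher: { '@type': 'Organization', name: 'Example News' },
};
console.log(resolveJsonLdKey(page, 'headline|name'));  // About Us
console.log(resolveJsonLdKey(page, 'identifier|@id')); // https://example.com/about
console.log(resolveJsonLdKey(page, 'publisher.name')); // Example News
console.log(resolveJsonLdKey(page, 'articleSection')); // null
console.log(parseAttributeKey('body@data-publisher'));
// { selector: 'body', attribute: 'data-publisher' }
```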

## Per-Call Overrides

Any field can also be passed directly to `track()`. Per-call values take precedence over auto-collected page data:

```js
allegro.track('article_read', {
  content_id: '12345',
  content_type: 'article',
});
```
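The precedence rule amounts to a shallow merge in which per-call values are applied last. A minimal sketch, assuming a hypothetical `mergeFields` helper and assuming that `undefined` per-call values are ignored (the SDK may treat them differently):

```js
// Hypothetical sketch of the precedence rule; not the SDK's implementation.
function mergeFields(pageData, perCall = {}) {
  const merged = { ...pageData };
  for (const [field, value] of Object.entries(perCall)) {
    // Assumption: an undefined per-call value does not erase collected data.
    if (value !== undefined) merged[field] = value;
  }
  return merged;
}

// Values as they might be auto-collected from the NewsArticle example.
const autoCollected = {
  content_id: 'story-12345',
  content_type: 'NewsArticle',
  publisher: 'Example News',
};

console.log(mergeFields(autoCollected, { content_id: '12345', content_type: 'article' }));
// { content_id: '12345', content_type: 'article', publisher: 'Example News' }
```

Fields not passed to `track()`, such as `publisher` here, keep their auto-collected values.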
