Translate a Notion page using Azure Cognitive Services Translator (Part 1)

In Part 1 of this tutorial, we’ll start by translating the title of a Notion page using the Azure Cognitive Services Translator API.

In this tutorial, we’ll work toward a non-trivial example of connecting Notion to another API, the Azure Cognitive Services Translator API, and build a script that automatically translates a Notion page.

In Part 1 (this post), we’ll set up a script that prints out the translated title. In Part 2, we’ll expand on this by translating some common block types from the page content into a new page.

Setup

You’ll need to create a Microsoft Azure account for this tutorial if you don’t already have one. This may require a credit card, but this tutorial can be completed with free resources.

Set up Python Environment

  1. Create a folder for your code.
  2. It’s good practice to create a separate Python environment for each project. The instructions below use venv, but you can use your preferred approach.
    1. Create: python3 -m venv .venv makes a venv in the folder .venv (or python -m venv .venv, depending on your system).
    2. Activate: source .venv/bin/activate (Linux or Mac) or .venv\Scripts\activate (Windows).
  3. Install requests and python-dotenv
    1. pip install requests python-dotenv
  4. Create a .env file. This tutorial stores API keys in environment variables and uses python-dotenv to load them from the .env file. If you prefer, you can set the environment variables directly. Don’t share the .env file, since it stores secret data (add it to your .gitignore if you’re making a git repository).
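Once the .env file exists, it can help to fail early with a clear message if a setting is missing, rather than sending a request with an empty key. The sketch below is a hypothetical helper (`require_settings` is not part of the tutorial’s code). It uses only the standard library and is meant to be called after `load_dotenv()` has run.

```python
import os

# Settings used later in this tutorial.
REQUIRED_SETTINGS = ("COG_SERVICE_KEY", "COG_SERVICE_REGION", "NOTION_KEY")


def require_settings(names=REQUIRED_SETTINGS, environ=os.environ):
    """Return a dict of the named settings, raising if any are missing or empty.

    Hypothetical helper: call it after load_dotenv() so values from .env
    are already in os.environ.
    """
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing settings: " + ", ".join(missing))
    return {name: environ[name] for name in names}
```

Passing `environ` explicitly, as the test-style call `require_settings(environ={...})` would, makes the helper easy to check without touching the real environment.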

Create Microsoft Translator Resource

  1. Create a Translator resource in the Azure Portal. You can use the Search bar at the top of the portal to search for Translator.
  2. Select a valid subscription (yours will probably be Pay-As-You-Go), and create a new resource group. You can call it something like translator-resources.
  3. Select a region near you.
  4. Choose a unique name. I used notion-translate-.
  5. Choose the “Free F0” pricing tier to avoid being charged. If this isn’t an option, try a different region.

Screenshot of Create Translator page in Azure

After you create the resource, deployment takes around 30 seconds, and then you’ll see the deployment complete screen. You can click the “Go to resource” button from there, or find the resource later using the Search bar at the top.

Screenshot of deployment complete page

Once you’ve created the resource, you’ll need to go to the Keys and Endpoint tab. From there, copy one of the keys and set the key and region in your environment or a .env file as COG_SERVICE_KEY and COG_SERVICE_REGION.

COG_SERVICE_KEY=c42...
COG_SERVICE_REGION=eastus

Screenshot of Keys and Endpoint page

Create Notion Integration

  1. Create a new Notion integration. For more information on setting up an integration and connecting a page, see Getting Started with the Notion API Using Python.

  2. Add the Internal Integration Token to your .env file as NOTION_KEY. Your .env file should now look something like:

    COG_SERVICE_KEY=a12...
    COG_SERVICE_REGION=eastus
    NOTION_KEY=secret_AB1...
    
  3. Create a test page and connect it to the integration.

  4. Optionally, you can also create a separate parent page for the translated output (otherwise, the original page will be used).

Code Breakdown

To translate a page title from Notion, there are three main steps:

  1. Get the text of the title from Notion
  2. Send the text to the translation API and parse the response
  3. Do something with the translated title (in this case, we’ll just print it out).

This code is organized into three classes. NotionClient handles requests to the Notion API and turns the responses into convenient formats for the other classes. TranslatorClient handles requests to the Microsoft Translator API. NotionTranslator glues these two pieces together. A main function then wires the classes up, and a short block of code handles running everything as a script. With this structure, TranslatorClient knows nothing about Notion and NotionClient knows nothing about translation. That should make it fairly straightforward to substitute another translation API, or to reuse the translation and Notion clients for other purposes in a larger application.

NotionClient Class

As mentioned, NotionClient handles requests to the Notion API. In the __init__ method, we set up our standard headers and create a session that we’ll use for all requests from this class. Reusing a session like this is good for performance, because the underlying connection can be reused across requests. Notion versions its API with the Notion-Version header and authenticates requests with a bearer token in the Authorization header.

```python
import requests


class NotionClient:
    def __init__(self, notion_key):
        self.notion_key = notion_key
        self.default_headers = {'Authorization': f"Bearer {self.notion_key}",
                                'Content-Type': 'application/json',
                                'Notion-Version': '2022-06-28'}
        # One session for all Notion requests, so connections are reused
        self.session = requests.Session()
        self.session.headers.update(self.default_headers)
```

To keep things simple for Part 1, the only API call we need is one that gets the page title. Every page has a title property, so we can retrieve it from the page properties endpoint. The title property has the ID title. get_property is a general method that can retrieve any kind of property from any page.

```python
    def get_property(self, page_id, property_id):
        # Retrieve a page property item, e.g. property_id="title"
        url = f"https://api.notion.com/v1/pages/{page_id}/properties/{property_id}"
        response = self.session.get(url)
        return response.json()
```

Different properties return different kinds of objects in the response. The title is its own kind of property, so I’ve made a method that calls the API and extracts the plain text from the title property.

```python
    def get_title_text(self, page_id):
        # The title endpoint returns a list of property items; use the first one
        title_property = self.get_property(page_id, "title")["results"][0]
        return title_property["title"]["plain_text"]
```

With this version of NotionClient, we can conveniently get a page’s title as text, or retrieve the response for any kind of page property.
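One caveat applies to get_title_text. The title property endpoint returns a paginated list with one item per rich text segment. A title that mixes formatting (for example, one bold word) or contains a mention should therefore come back as several results, and get_title_text reads only the first. A possible alternative is sketched below as a hypothetical standalone function, `title_text_from_property_items`, which joins the `plain_text` of every item. It assumes all segments arrive in the first page of results; a very long title would also need to follow `next_cursor`. The sample response is illustrative data trimmed to the fields the function reads.

```python
def title_text_from_property_items(response_json):
    """Join the plain text of every title segment in a property-item response.

    Assumes the list shape returned by GET /v1/pages/{page_id}/properties/title:
    {"object": "list", "results": [{"type": "title", "title": {...}}, ...]}
    """
    return "".join(
        item["title"]["plain_text"] for item in response_json["results"]
    )


# Illustrative response for a title split into two segments.
sample = {
    "object": "list",
    "results": [
        {"type": "title", "title": {"plain_text": "Translation "}},
        {"type": "title", "title": {"plain_text": "Demo"}},
    ],
}
print(title_text_from_property_items(sample))  # Translation Demo
```

Inside NotionClient, get_title_text could then return `title_text_from_property_items(self.get_property(page_id, "title"))`.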

TranslatorClient Class

TranslatorClient handles requests to the Translator API. In the __init__ method, we set up the authentication headers for the Translator API and create a reusable session. Microsoft uses the oddly named Ocp-Apim-Subscription-Key header for its API key. The Ocp-Apim-Subscription-Region header can be omitted only for a Translator resource in the Global region. It’s required for a regional Translator resource, like the one created above, and for a multi-service Cognitive Services resource, so including it keeps this code working with either.

```python
class TranslatorClient:
    def __init__(self, cog_service_key, cog_service_region):
        self.cog_key = cog_service_key
        self.cog_region = cog_service_region
        self.translator_endpoint = 'https://api.cognitive.microsofttranslator.com'
        self.default_headers = {
            'Ocp-Apim-Subscription-Key': self.cog_key,
            'Ocp-Apim-Subscription-Region': self.cog_region,
            'Content-type': 'application/json'
        }

        # One session for all Translator requests, so connections are reused
        self.session = requests.Session()
        self.session.headers.update(self.default_headers)
```
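Sometimes you won’t have a region to send, for example with a Translator resource in the Global region. For that case, a hedged variation is sketched below as a hypothetical helper, `build_translator_headers`, which adds Ocp-Apim-Subscription-Region only when a region is provided. The header names are the same ones used in `__init__` above.

```python
def build_translator_headers(key, region=None):
    """Build Translator request headers, adding the region header only if given.

    Hypothetical helper; the header names match the TranslatorClient above.
    """
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-type": "application/json",
    }
    if region:  # leave the region out when none is configured
        headers["Ocp-Apim-Subscription-Region"] = region
    return headers
```

In `__init__`, this would replace the literal dict: `self.session.headers.update(build_translator_headers(cog_service_key, cog_service_region))`.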

The translate method takes the text to translate and calls the API to translate from the source language to the target language, and then parses the response to return just the translated text.

```python
    def translate(self, text, source_language, target_language):
        url = self.translator_endpoint + '/translate'

        # Specify query parameters
        params = {
            'api-version': '3.0',  # Required
            'from': source_language,  # Optional; auto-detected if omitted
            'to': target_language  # Required
        }

        # The body is a list, so several texts could be sent in one request
        body = [{
            'text': text
        }]

        # Send the request and get the response
        response = self.session.post(url, params=params, json=body)

        # Parse the JSON response: first input text, first translation
        results = response.json()
        translation = results[0]["translations"][0]["text"]

        # Return the translation
        return translation
```

This API is versioned with a query parameter, api-version, rather than a header like the Notion API. The target language (to) is also a required query parameter. The source language (from) is optional, but if you know it, it’s best to include it rather than rely on auto-detection.
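To see what these parameters look like in the request URL, the standard library’s `urllib.parse.urlencode` produces the same kind of query string that `requests` builds from a `params` dict. Passing a list with `doseq=True` repeats the `to` key, which is how several target languages can be requested in one call.

```python
from urllib.parse import urlencode

# Same parameters as the translate() method, for a single target language.
single = urlencode({"api-version": "3.0", "from": "en", "to": "fr"})
print(single)  # api-version=3.0&from=en&to=fr

# Several target languages: doseq=True repeats the "to" key once per value.
multiple = urlencode({"api-version": "3.0", "from": "en", "to": ["fr", "de"]}, doseq=True)
print(multiple)  # api-version=3.0&from=en&to=fr&to=de
```

With `requests`, passing `params={"api-version": "3.0", "to": ["fr", "de"]}` should encode the list the same way.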

The response format is a little tricky for a simple request of translating one chunk of text. It looks something like this:

```json
[
    {
        "translations": [
            {"text": "Bonjour!", "to": "fr"}
        ]
    }
]
```

There’s an outer array of objects, each containing an inner array of translations. That’s because the API can translate multiple pieces of text (the array in the body) into multiple target languages in a single request, since the to parameter can be given several target languages. Here, though, we’re translating one piece of text into one language. So response[0] gets the translations for the first piece of text, and ["translations"][0]["text"] gets the text of the first translation.
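If you do send several texts or target languages, a more general parser than `[0]["translations"][0]["text"]` can keep every result. The sketch below is a hypothetical helper, `parse_translations`, that turns the response into one dict per input text, keyed by the `to` language code. The sample data is made up, but it follows the response shape shown above.

```python
def parse_translations(response_json):
    """Return one {language_code: translated_text} dict per input text.

    Assumes the Translator v3.0 /translate response shape:
    [{"translations": [{"text": ..., "to": ...}, ...]}, ...]
    """
    return [
        {t["to"]: t["text"] for t in item["translations"]}
        for item in response_json
    ]


# Illustrative response for two input texts, each translated into fr and de.
sample = [
    {"translations": [{"text": "Bonjour!", "to": "fr"}, {"text": "Hallo!", "to": "de"}]},
    {"translations": [{"text": "Au revoir", "to": "fr"}, {"text": "Auf Wiedersehen", "to": "de"}]},
]
results = parse_translations(sample)
print(results[0]["de"])  # Hallo!
```

Any extra keys in each item, such as the detected source language when from is omitted, are simply ignored.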

NotionTranslator Class

This class handles the logic that sits between Notion and the Translator API. We initialize it with a NotionClient and a TranslatorClient, along with our source and target languages. To translate between several language pairs, we could create multiple instances of NotionTranslator that share the same NotionClient and TranslatorClient.

```python
class NotionTranslator:
    def __init__(self, notion_client, translate_client, source_language, target_language):
        self.notion_client = notion_client
        self.translate_client = translate_client
        self.source_language = source_language
        self.target_language = target_language
```

Since we’ve already done most of the hard work in the client classes, translating the title takes just a couple of lines: get the title with the NotionClient, then translate it with the TranslatorClient.

```python
    def translate_title(self, source_page_id):
        title = self.notion_client.get_title_text(source_page_id)

        translated_title = self.translate_client.translate(
            title, self.source_language, self.target_language)

        return translated_title
```
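NotionTranslator only calls `get_title_text` on one client and `translate` on the other, so it can be exercised without network access by passing in stand-ins. In the sketch below, `FakeNotionClient` and `FakeTranslatorClient` are hypothetical test doubles that return canned values (not real API output). NotionTranslator is repeated from above so the block runs on its own. The sketch also shows two translators, for two language pairs, sharing the same client instances.

```python
class NotionTranslator:
    # Same as the class above, repeated so this block is self-contained.
    def __init__(self, notion_client, translate_client, source_language, target_language):
        self.notion_client = notion_client
        self.translate_client = translate_client
        self.source_language = source_language
        self.target_language = target_language

    def translate_title(self, source_page_id):
        title = self.notion_client.get_title_text(source_page_id)
        return self.translate_client.translate(
            title, self.source_language, self.target_language)


class FakeNotionClient:
    """Hypothetical stand-in for NotionClient that returns a fixed title."""

    def get_title_text(self, page_id):
        return "Translation Demo"


class FakeTranslatorClient:
    """Hypothetical stand-in for TranslatorClient with canned translations."""

    CANNED = {
        ("Translation Demo", "en", "fr"): "Démo de traduction",
        ("Translation Demo", "en", "es"): "Demo de traducción",
    }

    def translate(self, text, source_language, target_language):
        return self.CANNED[(text, source_language, target_language)]


notion = FakeNotionClient()
translator_client = FakeTranslatorClient()

# Two language pairs sharing the same client instances.
to_french = NotionTranslator(notion, translator_client, "en", "fr")
to_spanish = NotionTranslator(notion, translator_client, "en", "es")
print(to_french.translate_title("any-page-id"))   # Démo de traduction
print(to_spanish.translate_title("any-page-id"))  # Demo de traducción
```

This is the practical payoff of keeping the two clients separate: the glue logic can be checked in isolation.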

Putting it all together

The last step is to set everything up together and call it as a script with some arguments.

The main function loads the settings from environment variables, creates instances of the classes to do the translation and prints out the translated title.

```python
import os


def main(notion_page_id, source_language="en", target_language="fr"):
    # Read the API keys from the environment (loaded from .env in __main__)
    notion_client = NotionClient(os.getenv('NOTION_KEY'))

    translate_client = TranslatorClient(
        os.getenv('COG_SERVICE_KEY'), os.getenv('COG_SERVICE_REGION'))

    translator = NotionTranslator(notion_client, translate_client, source_language, target_language)

    translated_title = translator.translate_title(notion_page_id)

    print(translated_title)
```

Finally, there’s the code that handles argument parsing when we run the file as a script. We use load_dotenv to load the variables from the .env file. With override=True, values in .env take precedence over any that also exist in the system environment. We then parse the command-line arguments with argparse, which gives us a flexible script with a bit of built-in documentation, and finally call the main function from above.

If you’re new to Python, if __name__ == "__main__": is a funny bit of Python that basically means “if this file is being run as a script”. __name__ holds the name of the current module, and when a file is run directly as a script, Python sets it to "__main__". So this code runs only when the file is executed as a script, not when the classes are imported as a module.

```python
if __name__ == "__main__":
    import argparse
    from dotenv import load_dotenv

    # Load .env values, overriding any matching system environment variables
    load_dotenv(override=True)

    parser = argparse.ArgumentParser(
        description="Translate a Notion page's title. Supported language codes are listed at https://learn.microsoft.com/en-us/azure/cognitive-services/translator/language-support")
    parser.add_argument('page_id', type=str,
                        help='A Notion page ID to translate')
    parser.add_argument('target',
                        help='language code to translate the page to')
    parser.add_argument('--source', default="en",
                        help="language code for the original language of the page")

    args = parser.parse_args()

    main(args.page_id, args.source, args.target)
```
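You can check how the arguments map onto main without calling any API by passing an explicit list to `parse_args` instead of reading the real command line. The sketch below rebuilds the same parser, with a shortened description, inside a hypothetical `build_parser` function. Note that `--source` falls back to `en` when it is omitted.

```python
import argparse


def build_parser():
    # Same arguments as the script above; description shortened for brevity.
    parser = argparse.ArgumentParser(description="Translate a Notion page's title.")
    parser.add_argument('page_id', type=str, help='A Notion page ID to translate')
    parser.add_argument('target', help='language code to translate the page to')
    parser.add_argument('--source', default="en",
                        help="language code for the original language of the page")
    return parser


args = build_parser().parse_args(["f4be3aa4fe9c45989e44067effbbc7f9", "fr"])
print(args.page_id, args.source, args.target)  # f4be3aa4fe9c45989e44067effbbc7f9 en fr

args = build_parser().parse_args(["f4be3aa4fe9c45989e44067effbbc7f9", "de", "--source", "fr"])
print(args.source, args.target)  # fr de
```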

Running the script

To run the script, you’ll need the ID of a page that’s shared with your integration, and the code for a supported language to translate into. Most languages use a two-letter code, like en (English), fr (French), es (Spanish), de (German), or ar (Arabic), but some use different codes, like fr-ca for French (Canada), fil for Filipino, and zh-Hans for Chinese Simplified. You can get a page’s ID from the end of its URL: https://www.notion.so/Translation-Demo-f4be3aa4fe9c45989e44067effbbc7f9 has the ID f4be3aa4fe9c45989e44067effbbc7f9.
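If you’d rather paste the whole URL, a small hypothetical helper can extract the ID for you. The sketch below assumes, as in the example URL, that the page ID is the 32 hexadecimal characters at the end of the URL path. It ignores any query string or fragment appended to the URL.

```python
import re
from urllib.parse import urlparse


def page_id_from_url(url):
    """Return the 32-character page ID at the end of a Notion page URL.

    Hypothetical helper: assumes the URL path ends with the undashed page ID.
    """
    path = urlparse(url).path  # drops any ?query or #fragment
    match = re.search(r"([0-9a-fA-F]{32})$", path)
    if match is None:
        raise ValueError(f"No page ID found in {url!r}")
    return match.group(1).lower()


print(page_id_from_url(
    "https://www.notion.so/Translation-Demo-f4be3aa4fe9c45989e44067effbbc7f9"))
# f4be3aa4fe9c45989e44067effbbc7f9
```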

python translate-notion-title.py f4be3aa4fe9c45989e44067effbbc7f9 fr

You should get the translation of the title. My page is titled Translation Demo, and I get “Démo de traduction” in French.

You should also be able to see the documentation we set up with argparse by running:

python translate-notion-title.py --help

Full Code

The full code is available as a GitHub Gist. Make sure to copy the correct values into your .env file!

Additional References

Microsoft’s Quickstart