Facebook Insights


Description

Extractor for fetching data from Facebook Graph API.
- Documentation: https://developers.facebook.com/docs/reference/api/
- Graph API Explorer: https://developers.facebook.com/tools/explorer
- each Graph API call can be performed in browser as standard GET request with url in form:https://graph.facebook.com/{query}?access_token= with optional parameters &since= and &until= for Insights and list type queries, eg: https://graph.facebook.com/59663463879/posts?access_token=XXX&since=2012-12-02&until=2012-12-20

SLA

The service and it's API is provided free of charge and offers no warranty nor guaranteed support.

QA

Token Retrieve

Facebook access token is required for access to Facebook API. 
Visit https://syrup.keboola.com/ex-facebook/token for token retrieve. You need to copy & paste the token and give it to the consultant.

API

https://syrup.keboola.com/ex-facebook/run

configurationId (optional) table configuration specification
since (optional) start date for data fetching, if not set then it starts from last timestamp attribute (set automatically to each data table by extractor) or -14 days by default
until (optional) end date for data fetching, today by default
accountsOffset (optional) offset of accounts configuration table - to extract a subset from accounts
accountsLimit (optional) limit of accounts configuration table - to extract a subset from accounts
- each request must be accompanied by Storage API token in header X-StorageApi-Token
- accepts X-StorageApi-Url header
- accepts X-KBC-RunId header
- example of parameters definition:

{"since":"2012-12-01","until":"2012-12-03"}

Extractor Configuration

Accounts:

Configuration bucket should have name sys.c-ex-facebook and must contain table accounts:

"order","id","name","token","valid","user"

with primary key on column order. This table contains pages to fetch - their Facebook id, name, access token and Facebook id of user whose credentials are used.  You can use the following steps to obtain the user id

  1. Obtain token from user.
  2. Obtain <PAGE ID> using findmyfbid.com and verify it by going to https://www.facebook.com/<pageid>
  3. Verify <TOKEN> by going to https://developers.facebook.com/tools/explorer/ and pasting token
  4. Verify that user authorized for Keboola Connection Extractor App (!)
  5. Get <USER ID>:


Configurations:

Each project can have several extractor configurations each stored in its own table in sys.c-ex-facebook bucket. Name of the table can be freely chosen by project's administrator and must be unique within the project.

Table must contain these columns:

"type","query","columns","table","primaryColumn","timestampColumn"
  • type - may contain one of these values:

    • insightsPagesinsightsPostsinsightsPages_pivotedinsightsPosts_pivoted - processes data from API call as Insights stats
    • list - processes data from API call as list of objects
    • paginated - similar to list type but uses Graph API pagination instead of temporal pagination
    • object - processes data from API call as single object
    • value - fetch a single value from a request (allows to grab a value not visible to other requests)
  • query - API call query

    • may contain [accountId] placeholder used for looping throug accounts configuration table and using account id and token for calls, eg. [accountId]/insights
    • may contain at most one reference placeholder to data table and column, e.g. {posts.id}/insights goes toposts table in data bucket and for each value from column id calls defined API call query using the value instead of it's placeholder in braces
    • reference placeholder can contain restriction of number of days if table should not be fetched whole but only rows added in last X days, e.g. {posts.id,created_time:14} will fetch only posts created in last 14 days, created_time is name of column with timestamp dates
  • columns - definition of columns to save (relevant only for list type)

    • parses each object in API call result and gets values of defined properties
    • dot means entering to nested object but there has to be no loop in it
    • there are two possible placeholders:

      • #objectId creates column objectId where it puts actual value from column referenced in query via {}construct
      • #timestamp creates column timestamp with actual timestamp in ISO 8601 format
      • #datestamp creates column datestamp with actual date in ISO 8601 format
    • e.g. id,from.id,created_time
    • insights_pivoted and insightsLifetime_pivoted use a different syntax, see Insights chapter
  • table - name of target data table
  • primaryColumn - name of column which will be used as primary key for incremental saves (irrelevant forinsights* and value types )

    • may contain + for concatenation of more columns together, resulting column will be named id__composite
  • timestampColumn - name of column which contains time (used for determining from where to start fetching data in next extractor run (irrelevant for insightsobject and value types)

Example:


Data are saved to bucket in.c-facebook-{configTableName}.

API Call Types

Insights

Insights are divided into 4 parts

  • insightsPages - stores insights data one row per metric, key and period combination; this type allows you to backfill data from history (specifically designed for pages Insights)
  • insightsPosts - used for metrics, that have only lifetime period (designed for posts Insights),
  • insightsPages_pivotedinsightsPosts_pivoted - same as insightsPages and insightsPosts, but stores just one row for each object (Page or Post) and defines, which metric goes into which column

InsightsPages

Insights calls are similar for pages and posts, but there is a slight difference in handling dates and ranges, so there is a specific type for posts and pages. An example for insightsPages:

"insightsPages","[accountId]/insights","","insights","",""

Structure of target data table is fixed and cannot be changed. It is:

"ex__primary","objectId","metric","period","endTime","key","value"
  • ex__primary is identificator for incremental saves - but API call response contains no identificators so it is created from concatenated and hashed (MD5) values of columns fixed for changing values (account . pageId . metric . period . endTime . key)
  • ex__acount is identificator of accounts from configuration table accounts
  • ex__object is Facebook identificator of the object
  • metric is name of Insights metric
  • period is time period of metric (dayweekdays_28lifetime)
  • endTime is time in which the metric is effective (contains actual timestamp of data fetch for lifetime metrics); uses ISO 8601 format of date with specified time zone, eg. 2012-12-21T00:00:00-0700endTime is always rounded to the whole day in America/Los_Angeles time zone.
  • key is optional identificator of another dimension if data are drilled (e.g. name of city for metric drilled by cities); it is empty for non-drilled metric
  • value is actual value of the metric

This type allows to go to past and backfill data using eg. "days": 5 parameter in the request.

InsightsPosts

For querying insights for posts, you need to download list of posts (using /feed or /posts):

"list","[accountId]/feed","id,from.id","posts","id","created_time"
"insightsPosts","{posts.id}/insights","","post_insights","",""

Structure of target data table is fixed and cannot be changed. It is:

"ex__primary","objectId","metric","endTime","key","value"
  • ex__primary is identificator for incremental saves - but API call response contains no identificators so it is created from concatenated and hashed (MD5) values of columns fixed for changing values (account . postsId . metric . period . endTime . key)
  • ex__acount is identificator of accounts from configuration table accounts
  • ex__object is Facebook identificator of the object
  • metric is name of Insights metric
  • endTime is time in which the metric is effective (contains actual timestamp of data fetch for lifetime metrics); uses ISO 8601 format of date with specified time zone, eg. 2012-12-21T00:00:00-0700endTime is always rounded to the prevoous day in America/Los_Angeles time zone, so running the extractor today (2013-08-29T13:34:00-0700) will result in endTime being yesterday 12am (2013-08-28T00:00:00-0700).
  • key is optional identificator of another dimension if data are drilled (e.g. name of city for metric drilled by cities); it is empty for non-drilled metric
  • value is actual value of the metric

Pivoted Insights

Note: Due to a bug in the pivoted configurations all dates are shifted one day backward, eg. metrics for 2015-12-06 are displayed on 2015-12-05. We won't fix this bug as it might cause inconsistency in incremental data loads.

This applies to both insightsPages_pivoted and insightsPosts_pivoted. This type was created to use columns instead of rows for storing metrics, so for every object (page or post) you can have a single row with as many metrics as you want. This saves storage space in both GD and KBC. The common columns are:

  • ex__primary
  • ex__acount
  • ex__object
  • endTime
  • … (metric columns)

The source insights data looks like this:

...
{
    "id": "123456_123456/insights/post_story_adds/lifetime",
    "name": "post_story_adds",
    "period": "lifetime",
    "values": [
        {
            "value": 1561
        }
    ],
    "title": "Lifetime Post Stories",
    "description": "Lifetime The number of stories generated about your Page post. (Total Count)"
},
{
    "id": "123456_123456/insights/post_story_adds_by_action_type_unique/lifetime",
    "name": "post_story_adds_by_action_type",
    "period": "lifetime",
    "values": [
        {
            "value": {
                "like": 1444,
                "comment": 32,
                "share": 24
            }
        }
    ],
    "title": "Lifetime Talking About This (Post) by action type",
    "description": "Lifetime The number of unique people who created a story about your Page post by interacting with it. (Unique Users)"
},
...

The configuration of the columns is in JSON format, eg:

[
    {
        "column": "story_adds",
        "name": "post_story_adds",
        "period": "lifetime"
    },
    {
        "column": "story_adds_like",
        "name": "post_story_adds_by_action_type",
        "period": "lifetime",
        "key": "like"
    }
] 

You need to define parameters to track a single value of a metric to be stored in one column.

  • column - target column in Storage API
  • name - name of the metric in Graph API
  • period - period value (lifetime, day, week, days_28)
  • key (where available) - specifies the submetric value

    This configuration will result in 2 columns containing the correct values:

    "ex__primary","ex_account","ex_object","endTime","story_adds","story_adds_like"
    "3e09…","123456","123456_123456","2013-08-20…","1561","1444"

If the extractor is unable to find a specified value for a cell, it returns a 0.

Remember to escape this configuration JSON before you save it to the extractor's columns column.


Date & Time

We store all dates and times in Palo Alto timezone (America/Los_Angeles). Facebook internally switches new days in this timezone, so does GoodData. All InsightsPosts metrics are stored with endDate being midnight of the previous day, eg. downloading the data on 2013-08-29T13:34:00-0700 saves endTime 2013-08-28T00:00:00-0700. This is due to that you want to schedule the extraction as close to midnight as possible and then the date of current day is misleading. Saving number of likes on 2013-08-29T01:11:00-0700 does not tell you how many likes you had on 2013-08-29 but rather how many likes you had on 2013-08-28.

List of objects

This type can be used to fetch list of any objects, e.g. posts, post comments, post likes, etc.

Example of output from list of posts may look like this:


And example of fetched properties may be:

id,from.id,from.name,message,created_time

We have to specify property which will serve as primary column for data table (id in this example) and property with timestamp of the data to tell the extractor from which timestamp to continue fetching data in next run (it will be created_time)

Paginated list of objects

Similar to list type but does not paginate using start and end dates, uses Graph API pagination instead. It is intended for non-temporal data lists (e.g. likes and account administrators).

Single object

Fetches properties from API response as single object, configuration is very similar to list of objects above.

Single value

Using a value type you can extract just a single value from Graph API. Useful eg for getting likes of a post in ahttps://graph.facebook.com/POST_ID/likes?summary=1 request (summary.total_count). This request does not perform limiting the request (using fields query parameter), that's why using any other request will fail to grab the value . In the columns value you can specify a JSONPath query to extract the single value.

value type does not require setting up a primary key. Primary key gets assigned automatically using all available values of

  • accountId
  • objectId (for lists)
  • timestamp
  • datestamp

Also the column objectId is added whenever it makes sense. Using a combination of timestamp or datestamp you can automatically create snapshots of the value.

Examples of Configuration

Posts

"list","[accountId]/posts","id,from.id,from.name,type,message,link,created_time,likes.count","posts","id","created_time"

Snapshotting of Posts

"list","[accountId]/posts","id,from.id,from.name,type,message,link,created_time,likes.count","posts","id+created_time","created_time"

Posts Insights

This will save Insights for all posts newer than 28 days.

"insights","{posts.id,created_time:28}/insights","","post_insights","",""

Post Comments

"list","{posts.id,created_time:28}/comments","id,from.id,from.name,message,created_time,like_count","post_comments","id","created_time"

Competitive pages

You can create a table containing list of pages identificators in input backet (e.g. in.c-facebook-fb1.competitive_pages_list) and then use this configuration:

"object","{competitive_pages_list.id}","id,name,category,talking_about_count,likes,#timestamp","competitive_pages","id+#timestamp","timestamp"

Data table must contain indexed column called ex__account containing account id from accounts configuration table whose access token will be used for queries.

Or fetch just one competitive page:

"object","cocacola","id,name,category,talking_about_count,likes,#timestamp","competitive_pages","id+#timestamp","timestamp"

Snapshotting album (post) likes

We usually run this query with request parameter since as far back into past as required - to download all albums (override the 14 day default).

"list","[accountId]/albums","id,name,from.id,from.name,link,created_time,updated_time","albums","id","created_time"
"value","{albums.id}/likes?summary=1","summary.total_count,#datestamp","albums_likes","",""

Snapshotting post likes is very similar.

Comments