We have added support for the optional query parameter noredirects to the Diffbot API. This parameter prevents the Diffbot API from automatically following HTTP redirects for the submitted URL, giving you more control over the extraction process.

Usage

To use this parameter, append noredirects to your API call URL.

Example Request: http://api.diffbot.com<YOUR_TOKEN>&url=<ARTICLE_URL>&noredirects
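As a rough illustration of appending the flag, the sketch below builds a call URL in Python. The helper name `build_noredirects_url`, the `token` parameter name, and the endpoint passed in the example are assumptions made for illustration; substitute the endpoint and credentials from your own integration.

```python
from urllib.parse import urlencode


def build_noredirects_url(endpoint: str, token: str, page_url: str) -> str:
    """Return an API call URL with the valueless `noredirects` flag appended.

    `endpoint` is supplied by the caller; this hypothetical helper does not
    assume a particular API path.
    """
    # urlencode percent-encodes the target URL so its own "?" and "&"
    # characters are not mistaken for API parameters.
    query = urlencode({"token": token, "url": page_url})
    # `noredirects` takes no value, so it is appended as a bare flag.
    return f"{endpoint}?{query}&noredirects"


example = build_noredirects_url(
    "https://api.diffbot.com/v3/article",  # assumed endpoint, illustration only
    "YOUR_TOKEN",
    "https://example.com/news?id=42",
)
print(example)
```

The flag is appended by hand because `urlencode` would otherwise emit it with a trailing `=`.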

Error Handling

When the noredirects parameter is used, the API will not follow a redirect. Instead, if a redirect is required to access the page content, the API will return an HTTP 500 Internal Server Error with a specific JSON response body. The final, redirected URL is not included in the response.

Example Error Response (HTTP 500):

{
  "errorCode": 500,
  "error": "This page requires a redirect. Please retry with redirects enabled if this url needs to be extracted."
}
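Because a generic HTTP 500 can also signal unrelated failures, a client may want to tell the redirect case apart. The following is a minimal sketch, assuming the response body matches the example above; the function name `is_redirect_refusal` is hypothetical, and matching on the message text is an assumption that could break if the wording changes.

```python
import json

# Substring taken from the example error message; matching on it is an assumption.
REDIRECT_HINT = "requires a redirect"


def is_redirect_refusal(status_code: int, body: str) -> bool:
    """Return True if a response looks like the noredirects error."""
    if status_code != 500:
        return False
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        # A 500 without a JSON body is some other failure.
        return False
    if not isinstance(payload, dict):
        return False
    message = str(payload.get("error", ""))
    return payload.get("errorCode") == 500 and REDIRECT_HINT in message


sample_body = (
    '{"errorCode": 500, "error": "This page requires a redirect. '
    'Please retry with redirects enabled if this url needs to be extracted."}'
)
print(is_redirect_refusal(500, sample_body))
```

A caller could use the result to decide whether to skip the URL or retry without noredirects.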

Primary Use Cases & Benefits

This parameter is most beneficial when using specific extraction APIs, such as the Article API or Product API, rather than the Diffbot Analyze API. Use it to:

  • Prevent undesired extractions: for example, keep an outdated article or product offer URL from silently redirecting to a general index page or homepage when the original content is no longer available. This avoids mistakenly extracting the first item from a list on the index page.
  • Control the extraction source: ensure that the extraction is performed only on the exact URL submitted, giving developers certainty about the data source.

NACE Code Rev 2.1 Updates

by Kris Negulescu

KG DATA CHANGE NOTIFICATION - Organization.naceClassification

We will be updating Organization.naceClassification to NACE Rev. 2.1 in build v437 of the Diffbot Knowledge Graph, targeted to go live in about two weeks. Please read on for more details.

Ordinarily, we take extraordinary measures to avoid breaking changes in the Diffbot Knowledge Graph ontologies. However, in some cases, there is no benefit in retaining a prior version of the data, so we replace an existing attribute with a new data format. The Organization.naceClassification field is one such case. The current version of the NACE codes in the KG lacks level information, meaningful isPrimary values, and ancestor codes. In addition, some of the codes are no longer valid in NACE Rev. 2.1.

In Rev 2.1 of the NACE codes:

  • NACE codes are no longer strictly 4-digit numbers.
  • NACE codes are structured into:
        Sections (letters A–V, level 1) →
        Divisions (2 digits, level 2) →
        Groups (3 digits with dot, level 3) →
        Classes (4 digits with dot, level 4).
  • Codes are unique. For example, 28 and 29 share the same parent C, but C is not repeated after 28 because it already appears earlier, in the primary chain.
  • There is at most one primary code per level.
  • Primary codes are listed before non-primaries.
  • Specific codes (e.g., 29.10) are listed before broader ones (e.g., 29.1).
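The level structure above can be read directly off the shape of a code string. The sketch below is one way to do that in Python, assuming codes follow exactly the formats listed (a single letter A–V, two digits, or dotted three- and four-digit forms); the function names `nace_level` and `numeric_ancestors` are hypothetical. Note that the section letter cannot be derived from a numeric code without a lookup table, so only group and division ancestors are computed.

```python
import re

# One pattern per level, following the formats described in the list above.
_LEVEL_PATTERNS = [
    (re.compile(r"[A-V]"), 1),          # Section, e.g. "C"
    (re.compile(r"\d{2}"), 2),          # Division, e.g. "29"
    (re.compile(r"\d{2}\.\d"), 3),      # Group, e.g. "29.1"
    (re.compile(r"\d{2}\.\d{2}"), 4),   # Class, e.g. "29.10"
]


def nace_level(code: str) -> int:
    """Infer the hierarchy level (1 to 4) from the format of a code."""
    for pattern, level in _LEVEL_PATTERNS:
        if pattern.fullmatch(code):
            return level
    raise ValueError(f"Unrecognized NACE code format: {code!r}")


def numeric_ancestors(code: str) -> list[str]:
    """Return the group and division ancestors, most specific first."""
    level = nace_level(code)
    ancestors = []
    if level == 4:
        ancestors.append(code[:-1])  # "29.10" -> "29.1"
    if level >= 3:
        ancestors.append(code[:2])   # "29.1" -> "29"
    return ancestors


print(nace_level("29.10"), numeric_ancestors("29.10"))
```

A legacy undotted code such as "2910" is rejected by `nace_level`, which can help flag values that predate the new format.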

For a comparison of the existing code format versus the new Rev 2.1 format, see below.

CURRENT DATA FORMAT: NACE codes - Organization.naceClassification

Volkswagen's current NACE classification in the KG appears as follows:

[  
  {  
    "code": "2910",  
    "isPrimary": false,  
    "name": "Manufacture of motor vehicles"  
  },  
  {  
    "code": "7022",  
    "isPrimary": false,  
    "name": "Business and other management consultancy activities"  
  },  
  {  
    "code": "7021",  
    "isPrimary": false,  
    "name": "Public relations and communication activities"  
  }  
]

Issues with this data:

  • Missing level information
  • All codes are marked as non-primary
  • Parent codes are missing
  • Codes 7022 and 7021 are no longer valid in the Rev. 2.1 version of the codes
  • Volkswagen should not be classified under those industries in 7022 and 7021.

NEW DATA FORMAT: NACE Rev 2.1 Codes

When the updates deploy, Volkswagen's Organization.naceClassification NACE codes will look like this:

[  
  {  
    "code": "29.10",  
    "level": 4,  
    "isPrimary": true,  
    "name": "Manufacture of motor vehicles",  
    "version": "Rev 2.1"  
  },  
  {  
    "code": "29.1",  
    "level": 3,  
    "isPrimary": true,  
    "name": "Manufacture of motor vehicles",  
    "version": "Rev 2.1"  
  },  
  {  
    "code": "29",  
    "level": 2,  
    "isPrimary": true,  
    "name": "Manufacture of motor vehicles, trailers and semi-trailers",  
    "version": "Rev 2.1"  
  },  
  {  
    "code": "C",  
    "level": 1,  
    "isPrimary": true,  
    "name": "MANUFACTURING",  
    "version": "Rev 2.1"  
  },  
  {  
    "code": "28.11",  
    "level": 4,  
    "isPrimary": false,  
    "name": "Manufacture of engines and turbines, except aircraft, vehicle and cycle engines",  
    "version": "Rev 2.1"  
  },  
  {  
    "code": "28.1",  
    "level": 3,  
    "isPrimary": false,  
    "name": "Manufacture of general-purpose machinery",  
    "version": "Rev 2.1"  
  },  
  {  
    "code": "28",  
    "level": 2,  
    "isPrimary": false,  
    "name": "Manufacture of machinery and equipment n.e.c.",  
    "version": "Rev 2.1"  
  }  
]
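Consumers of the new format will often want the most specific primary code, and may want to sanity-check the ordering rules described earlier. The following is a minimal sketch using an abbreviated copy of the Volkswagen data above (name and version omitted); the helper names are hypothetical, and only three of the rules are checked (uniqueness, one primary per level, primaries first).

```python
from collections import Counter

# Abbreviated copy of the example above: code, level, and isPrimary only.
sample = [
    {"code": "29.10", "level": 4, "isPrimary": True},
    {"code": "29.1", "level": 3, "isPrimary": True},
    {"code": "29", "level": 2, "isPrimary": True},
    {"code": "C", "level": 1, "isPrimary": True},
    {"code": "28.11", "level": 4, "isPrimary": False},
    {"code": "28.1", "level": 3, "isPrimary": False},
    {"code": "28", "level": 2, "isPrimary": False},
]


def most_specific_primary(entries):
    """Return the primary entry with the highest level, or None."""
    primaries = [e for e in entries if e["isPrimary"]]
    return max(primaries, key=lambda e: e["level"], default=None)


def check_ordering(entries):
    """Return a list of rule violations; an empty list means none found."""
    problems = []
    codes = [e["code"] for e in entries]
    if len(codes) != len(set(codes)):
        problems.append("duplicate code")
    per_level = Counter(e["level"] for e in entries if e["isPrimary"])
    if any(count > 1 for count in per_level.values()):
        problems.append("more than one primary code at a level")
    flags = [e["isPrimary"] for e in entries]
    # With reverse=True, True sorts before False, so a valid list is unchanged.
    if flags != sorted(flags, reverse=True):
        problems.append("primary codes not listed first")
    return problems


print(most_specific_primary(sample)["code"])
print(check_ordering(sample))
```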

Diffbot on Postman

by Jerome Choo

You can now find us on Postman! We're starting with Extract API, and moving quickly to get the rest of our APIs on Postman as well.

Postman is an API testing platform that eliminates the need to manually write cURL. The API testing UI is quite similar to what we have in the docs, with even more features to set up your environment, testing scripts, and more.

Note that our primary documentation platform will continue to live on docs.diffbot.com. Postman is an extension of our docs presence to make it easier for Postman users to test Diffbot APIs on their preferred platform.

Fork and watch our Diffbot API collection on Postman!

Investment Transactions are now searchable on LeadGraph! This makes it possible to:
⁃ Stay on top of recent funding rounds
⁃ Find investors that have invested in companies with particular industries, keywords, company size, etc.
⁃ See funding insights for investors, industries, funding rounds, and more