DF API - submit matching job

2021, Jun 14    

To submit a matching job with the DF API, you can use the web app at http://chris.de.oracle.com:8765/ (Oracle office network or VPN required).

In the upper part of the app's page you can submit a file for matching.

The file has a few requirements:

  1. The file must be in JSON Lines format (one JSON object per line).
  2. The file must be gzipped and its name must end with the .gz extension.
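Before uploading, the two requirements can be checked locally. The following is a minimal Python sketch using only the standard library. The helper names check_name, check_lines and check_input_file are hypothetical, and the sketch assumes the file is UTF-8 encoded. It only checks the documented format requirements, not any further validation the service may do.

```python
import gzip
import json


def check_name(path):
    """Requirement 2 (name part): the file name must end with .gz."""
    return [] if str(path).endswith(".gz") else ["file name does not end with .gz"]


def check_lines(lines):
    """Requirement 1: every line must hold exactly one JSON object (JSON Lines)."""
    problems = []
    for lineno, line in enumerate(lines, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            # blank lines also end up here ("Expecting value")
            problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(record, dict):
            problems.append(f"line {lineno}: not a JSON object")
    return problems


def check_input_file(path):
    """Run both checks on a file on disk; an empty list means no problems were found."""
    problems = check_name(path)
    try:
        # gzip.open raises an OSError subclass while reading if the content is not gzipped
        with gzip.open(path, "rt", encoding="utf-8") as f:
            problems += check_lines(f)
    except (OSError, EOFError, UnicodeDecodeError) as exc:
        problems.append(f"cannot read as gzipped UTF-8 text: {exc}")
    return problems
```

For example, check_input_file("de_test_210614.jsonl.gz") returns an empty list when no problems are found.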

To generate the JSON records you can use the Oracle function json_object(), for example:

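SELECT json_object('id' VALUE a.cxd_contact_id,
                   'name' VALUE a.company_given,
                   --'url' VALUE b.email_domain,   -- disabled: would need a join to a table aliased b
                   'city' VALUE a.city,
                   'street1' VALUE a.ADDRESS_LINE1,
                   'zipcode' VALUE a.zipcode,
                   'country' VALUE a.country absent on null)  -- ABSENT ON NULL omits keys whose value is NULL
from cxd_contacts a
where a.country = 'GERMANY'
and a.company_given is not null
and a.datafox_id is null      -- only contacts that do not have a datafox_id yet
and rownum <= 10;             -- small sample of 10 rows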

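This query generates the following content (the second record has no city, street1 or zipcode keys because those columns were NULL and ABSENT ON NULL drops them):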

{"id":776087140,"name":"Plebeijer","city":"Bielefeld","street1":"Amnke 13a","zipcode":"89111","country":"GERMANY"}
{"id":773173178,"name":"aaaaaaaaaaaaaa-pchi","country":"GERMANY"}
{"id":776094556,"name":"INFO@ITSERVICE-AM.DE-162985","city":"Lüneburg","street1":"Blümchensaal 1b","zipcode":"21337","country":"GERMANY"}
{"id":775594050,"name":"Al Mouselli","city":"Berlin","street1":"Dröpkeweg","zipcode":"12353","country":"GERMANY"}
{"id":776072638,"name":"UNITYMEDIA NRW GMBH","city":"Ratingen","street1":"Reinaldstr 8","zipcode":"40882","country":"GERMANY"}
{"id":773186583,"name":"TH Koeln","city":"Gummersbach","street1":"Steinmulleralee 1","zipcode":"51643","country":"GERMANY"}
{"id":776086964,"name":"Uwe Konrad Fahrzeug Zentrum","city":"Löhne","street1":"Lübbeckerstr. 35-39","zipcode":"32584","country":"GERMANY"}
{"id":776144334,"name":"Trifacta","city":"Berlin","street1":"Neue Gruenstrasse 18","zipcode":"12101","country":"GERMANY"}
{"id":776185251,"name":"bialog GmbH","city":"Rodgau","street1":"Rubens Str. 11a","zipcode":"63110","country":"GERMANY"}
{"id":776238322,"name":"Deutsche Bahn AG","city":"Berlin","street1":"Potsdamer Platz 2","zipcode":"10785","country":"GERMANY"}
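If the records do not come from Oracle, the same one-object-per-line format can be produced in Python. This is a minimal sketch that mimics ABSENT ON NULL by dropping None values. The helper name to_json_line is hypothetical, and each row is assumed to already be a dict keyed by the JSON field names used above (id, name, city, street1, zipcode, country).

```python
import json


def to_json_line(row):
    """Serialize one record as a JSON Lines entry, leaving out keys whose value is None.

    This mirrors ABSENT ON NULL in the SQL example: missing values are omitted
    instead of being written as null.
    """
    record = {key: value for key, value in row.items() if value is not None}
    # ensure_ascii=False writes characters such as "ü" as-is, like the sample above;
    # compact separators match the sample's formatting
    return json.dumps(record, ensure_ascii=False, separators=(",", ":"))
```

For example, a row for id 773173178 with city, street1 and zipcode set to None produces the same line as the second record above.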

Save this content to a text file and gzip it. In this example, the resulting file is called de_test_210614.jsonl.gz.
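Any gzip tool works for this step. As a sketch using only the Python standard library, assuming the query output was saved as de_test_210614.jsonl (a hypothetical name derived from the example above), the hypothetical helper gzip_file below writes the compressed copy next to the original.

```python
import gzip
import shutil


def gzip_file(src):
    """Compress `src` into `src + '.gz'` and return the new path (the original file is kept)."""
    dst = src + ".gz"
    # stream the bytes through so large exports are not loaded into memory at once
    with open(src, "rb") as f_in, gzip.open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dst
```

Calling gzip_file("de_test_210614.jsonl") returns "de_test_210614.jsonl.gz", the file to upload.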

This file can now be uploaded at http://chris.de.oracle.com:8765/

01.png

After the job is submitted, the page should refresh and show the job as processing:

02.png

You can refresh the page manually from time to time to check the progress. Once the job is ready, the div for your job turns green and a link to the output file becomes available:

03.png

Clicking the link starts generating the output file, which is then delivered under the following link:

04.png

The content of the file will be similar to this:

{"id":"776087140","datafox_id":"5130f0508989846a3601c77c","status":"Irregular","bucket":"not_matched","score":0.16798161486607555}
{"id":"773173178","datafox_id":"52af8c1df13b70f31000656c","status":"Irregular","bucket":"not_matched","score":0.07585818002124355}
{"id":"776094556","datafox_id":"5f08755421e3570100263276","status":"Irregular","bucket":"bad_input","score":0.16798161486607555}
{"id":"775594050","datafox_id":"59117c2472e040075ec4ab97","status":"Irregular","bucket":"not_matched","score":0.3543436937742046}
{"id":"776072638","datafox_id":"5d1ed730cb5af9e906594950","status":"Irregular","bucket":"not_matched","score":0.3543436937742046}
{"id":"773186583","datafox_id":"5d710f370160d6c75f079f7a","status":"Irregular","bucket":"not_matched","score":0.3543436937742046}
{"id":"776086964","datafox_id":"5d8142f611aaa12c046b7124","status":"Irregular","bucket":"not_matched","score":0.18242552380635635}
{"id":"776144334","datafox_id":"52109db9bdfe6c4d78187bbb","status":"Irregular","bucket":"not_matched","score":0.16798161486607555}
{"id":"776185251","datafox_id":"5d8639df242ccb1504aabf3e","status":"Irregular","bucket":"not_matched","score":0.16798161486607555}
{"id":"776238322","datafox_id":"52109db9bdfe6c4d78187bbb","status":"Verified","bucket":"matched","score":0.9945137011005495}

In the example above, only 1 record was matched (status Verified, bucket matched, score of about 0.99). The rest were not_matched or bad_input.
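To review a larger output file, the results can be counted per bucket. Note that the sample output contains a datafox_id even for not_matched and bad_input rows, so this minimal Python sketch only collects datafox_id values from the matched bucket. The helper names are hypothetical, and whether the downloaded file is gzipped is an assumption, so summarize_file handles both cases.

```python
import gzip
import json
from collections import Counter


def summarize(lines):
    """Count result lines per bucket and map id -> datafox_id for matched records only."""
    buckets = Counter()
    matched = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines, if any
        result = json.loads(line)
        buckets[result["bucket"]] += 1
        if result["bucket"] == "matched":
            # output ids are strings ("776238322") although the input ids were numbers
            matched[result["id"]] = result["datafox_id"]
    return buckets, matched


def summarize_file(path):
    """Same as summarize(), reading the output file from disk (gzipped or plain text)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as f:
        return summarize(f)
```

Run on the ten result lines above, this would count 1 matched, 8 not_matched and 1 bad_input. When joining back to the input records, convert the input id to a string first.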

The source code is in this git repo: https://orahub.oci.oraclecorp.com/krzysztof.cierpisz/dfjobs