With support for the phone number formats of 228 countries and regions, libphonenumber from Google Code is a valuable addition to Talend Open Studio.
libphonenumber
libphonenumber is hosted on Google Code, under Downloads. I built version 3.0 from source, but you can download a JAR file if you don't have a JDK and Maven handy. Note the download location; you will need it when configuring the tLibraryLoad component.
libphonenumber takes data like the following variations of a Maryland phone number:
(301) 555-5555|3015555555|301.555.5555
and produces three standardized entries:
(301) 555-5555|(301) 555-5555|(301) 555-5555
Talend Job
To demonstrate libphonenumber, the following job uses a tRowGenerator to send a record to a tJavaRow component. tLogRow components print the output, and a tLibraryLoad component loads libphonenumber-3.0.jar and declares the imports.
libphonenumber Test Job
tLibraryLoad
Configure the Basic settings tab by browsing to the JAR file downloaded from Google Code.
In the Advanced settings tab, add the following import statements. The class structure (nested classes and enums) can be a little tricky if you're not used to Java, so double-check this if there's a problem.
import com.google.i18n.phonenumbers.PhoneNumberUtil;
import static com.google.i18n.phonenumbers.Phonenumber.PhoneNumber;
import static com.google.i18n.phonenumbers.PhoneNumberUtil.PhoneNumberFormat;
tRowGenerator
The tRowGenerator component generates a single three-column record based on this schema.
tRowGenerator Schema
tRowGenerator Generated Values
tJavaRow
The tJavaRow component uses the libphonenumber classes. It builds several PhoneNumber objects with the PhoneNumberUtil.parse() method. Note the region code ("US") passed along with each phone number; it tells the parser which country's conventions to assume when a number carries no international prefix. The parse() calls are followed by format() calls on the resulting PhoneNumber objects, which return Strings for the output_row columns.
Enter this code in the tJavaRow Basic settings tab. "INTERNATIONAL" or "E164" can be substituted for "NATIONAL" to render a different format.
try {
    PhoneNumberUtil phoneUtil = PhoneNumberUtil.getInstance();
    // Parse each column, treating numbers without a country prefix as US numbers
    PhoneNumber col1_pn = phoneUtil.parse(input_row.usPhoneCol1, "US");
    PhoneNumber col2_pn = phoneUtil.parse(input_row.usPhoneCol2, "US");
    PhoneNumber col3_pn = phoneUtil.parse(input_row.usPhoneCol3, "US");
    // Render each parsed number in national format, e.g. (301) 555-5555
    output_row.usPhoneCol1 = phoneUtil.format(col1_pn, PhoneNumberFormat.NATIONAL);
    output_row.usPhoneCol2 = phoneUtil.format(col2_pn, PhoneNumberFormat.NATIONAL);
    output_row.usPhoneCol3 = phoneUtil.format(col3_pn, PhoneNumberFormat.NATIONAL);
}
catch (com.google.i18n.phonenumbers.NumberParseException exc) {
    // parse() throws NumberParseException when the input is not a recognizable phone number
    exc.printStackTrace();
}
PhoneNumber objects are useful for more than formatting. They expose parts of the phone number, such as the country code and the national number, from which pieces like the area code and local exchange can be derived.
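To make those parts concrete for a US number, here is a minimal plain-Java sketch that splits a ten-digit number into area code, exchange, and line number. It deliberately does not use libphonenumber; the class name UsNumberParts and its fields are hypothetical, and it assumes North American numbering (3-3-4 digits, with an optional leading 1).

```java
// Plain-Java illustration only; this is not libphonenumber's API.
// Assumes a 10-digit North American number, optionally prefixed with country code 1.
class UsNumberParts {
    final String areaCode;
    final String exchange;
    final String lineNumber;

    UsNumberParts(String areaCode, String exchange, String lineNumber) {
        this.areaCode = areaCode;
        this.exchange = exchange;
        this.lineNumber = lineNumber;
    }

    // Strips punctuation, drops a leading country code 1, then slices 3-3-4.
    static UsNumberParts split(String raw) {
        String digits = raw.replaceAll("\\D", "");
        if (digits.length() == 11 && digits.startsWith("1")) {
            digits = digits.substring(1);
        }
        if (digits.length() != 10) {
            throw new IllegalArgumentException("Expected 10 digits: " + raw);
        }
        return new UsNumberParts(
            digits.substring(0, 3), digits.substring(3, 6), digits.substring(6));
    }

    public static void main(String[] args) {
        UsNumberParts p = split("301.555.5555");
        System.out.println(p.areaCode + " / " + p.exchange + " / " + p.lineNumber);
    }
}
```

Fixed slicing like this only works because US numbers have a uniform layout; in other countries the area code length varies, which is where a metadata-driven library earns its keep.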
228 Countries and Regions
You can work with Java regular expressions in Talend Open Studio, but covering this much functionality would require quite a lot of them. This example could use some tweaking around parsing errors, possibly rejecting the record or providing a default value. As a starting point, though, this is a great way to get phone numbers in shape.
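For comparison, a regular-expression approach that handles only the US layouts shown earlier might look like the sketch below. It is plain Java with no libphonenumber involved; the class name UsPhoneRegex and the "UNKNOWN" default are hypothetical. Returning an Optional leaves the choice between rejecting the record and substituting a default to the caller.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-Java sketch; covers only common US layouts, unlike libphonenumber.
class UsPhoneRegex {
    // Optional +1 or 1 prefix, then 3-3-4 digits with optional parentheses,
    // spaces, dots, or dashes between the groups.
    private static final Pattern US = Pattern.compile(
        "^\\s*(?:\\+?1[\\s.-]*)?\\(?(\\d{3})\\)?[\\s.-]*(\\d{3})[\\s.-]*(\\d{4})\\s*$");

    // Returns the number as (AAA) EEE-LLLL, or empty when the input does not match.
    static Optional<String> normalize(String raw) {
        if (raw == null) {
            return Optional.empty();
        }
        Matcher m = US.matcher(raw);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of("(" + m.group(1) + ") " + m.group(2) + "-" + m.group(3));
    }

    public static void main(String[] args) {
        String[] samples = {"(301) 555-5555", "3015555555", "301.555.5555", "not a phone"};
        for (String s : samples) {
            // An empty result could instead route the record to a reject flow.
            System.out.println(s + " -> " + normalize(s).orElse("UNKNOWN"));
        }
    }
}
```

One pattern covers one country's conventions; each additional region would need its own pattern and output format.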
Thanks Carl for great post! Put me on the right way. One comment, though. I believe you were referring to tLibraryLoad component, not tLoadLibrary one.
Thanks for the feedback. I corrected the section heading.
Hey there. The tRowGenerator "values" image is the same as the "schema" image...