Extracting Information from Facebook
ExtractFace Documentation
Using ExtractFace
- Start Mozilla Firefox;
- Start the MozRepl add-on if you have not set the Activate on startup option;
- When using ExtractFace, make sure that MozRepl is started.
- You can check the Activate on startup option so that MozRepl starts automatically whenever you use Firefox.
- MozRepl listens for connections on the default port 4242. Make sure that no firewall blocks the connection.
- Log in to your profile;
- Go to your target profile;
- Right-click the ExtractFace taskbar icon to open the menu.
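If ExtractFace cannot talk to Firefox, a quick way to rule out a firewall or a stopped MozRepl is to test whether anything accepts TCP connections on port 4242. The following is a minimal sketch, not part of ExtractFace; the function name and the assumption that MozRepl listens on the local machine (127.0.0.1) are illustrative.

```python
import socket


def mozrepl_reachable(host="127.0.0.1", port=4242, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened.

    Port 4242 is the MozRepl default named in this documentation;
    the host value is an assumption (MozRepl running locally).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Calling `mozrepl_reachable()` while Firefox is running should return True when MozRepl is started and the port is not blocked.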
Scroll and Expand
Facebook, like many other websites, uses JavaScript/Ajax to display content to the user. The page stays the same, but the content changes as the user clicks on things or performs other actions. Often, to see the whole page, you have to scroll and click on many things, which can be very time-consuming if you want all of it.
To automate that process, ExtractFace provides the Scroll and Expand functions. Both work on many pages of a Facebook profile, such as the Timeline page or a page with a photo and comments. Here are some more explanations about these functions:
- Scroll: This function will scroll down the page until the end, except if you set a maximum of pages to scroll or a maximum date. See Settings.
- Expand: This function will click on See more and other similar links. The Expand function can be customized. See Settings for details.
- Scroll and Expand: This function alternates between the Scroll and the Expand function so it scrolls one time, expands, scrolls again, expands, and so on...
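The alternation described above can be pictured as a simple loop. This is only a sketch of the idea, not ExtractFace's actual code: `scroll_once` and `expand_once` are hypothetical callbacks standing in for the real browser actions, and `max_pages=0` follows the "no limit" convention described in Settings.

```python
def scroll_and_expand(scroll_once, expand_once, max_pages=0):
    """Alternate one scroll and one expand until the page stops growing.

    scroll_once() -> bool : hypothetical callback, True if new content loaded.
    expand_once() -> None : hypothetical callback, clicks "See more"-style links.
    max_pages             : maximum number of scrolls, 0 means no limit.
    Returns the number of scroll/expand rounds performed.
    """
    rounds = 0
    while max_pages == 0 or rounds < max_pages:
        loaded_more = scroll_once()
        expand_once()
        rounds += 1
        if not loaded_more:
            # Nothing new appeared, so the end of the page was reached.
            break
    return rounds
```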
These functions don't check whether you are on the right page, so you have to navigate to it manually. To use these functions on a page with a photo and comments, you must open the page in normal view, not in a popup window. To do this:
- Right-click on the photo and select Open Link in New Tab
- If it opens a popup, close it.
Dump Albums
This function can be used to gather all (or selected) photo albums. To use it, you must be somewhere in the target profile.
When you select this function, ExtractFace:
- Displays the Dump albums window (shown above)
- Checks if you are already in the photos_albums page
- If not, goes to the right page
- Gathers the username or the profile ID to propose as filename
- Shows available albums
As with the other "Dump" functions, you have to select a directory where albums and photos will be saved. Options are:
- Include publication date: If you select this option, ExtractFace will gather the date on which the photo or video was published. Because ExtractFace has to open each page to get that information, using this option takes more time.
- Open output HTML file: At the end of the process, the HTML album page will be opened in your default browser.
- Open album folder: At the end of the process, the folder where the album has been saved will be opened in Windows Explorer.
- Picture size: Choose the size of the pictures. The small picture is the one shown in the album page, so it is the fastest to retrieve. For the medium and large sizes, ExtractFace has to open each page and wait until it is fully loaded. This wait corresponds to the Time for loading value multiplied by 5 (see Settings).
When ready, you can click on Dump to start the process. ExtractFace will create a directory for each album selected and download all the images from that album.
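To get a feel for how much slower the medium and large sizes can be, the waiting time can be estimated from the rule above. This is a rough sketch that assumes the wait (Time for loading multiplied by 5) is applied once per photo page and ignores download time; the function name is illustrative. The default of 2 seconds comes from the Settings section.

```python
def estimated_wait_seconds(photo_count, time_for_loading=2.0, size="small"):
    """Estimate the total page-loading wait for an album dump.

    Assumption: one wait of time_for_loading * 5 per photo for the
    medium and large sizes, and no per-photo wait for the small size.
    """
    if size == "small":
        return 0.0
    return photo_count * time_for_loading * 5


# Example: 120 photos in large size with the default 2-second setting.
print(estimated_wait_seconds(120, size="large") / 60, "minutes")
```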
Dump Friends
This function can be used to gather all (or selected) friend lists of a profile. It produces a report that contains details about the profile's friends. To use it, you must be somewhere in the target profile.
When you select this function, ExtractFace:
- Displays the Dump friends window (shown above)
- Checks if you are already in the friends page
- If not, goes to the right page
- Gathers the username or the profile ID to propose as filename
- Shows available friend list categories
If the process crashes or you were not on the right page, you can use the "refresh" button.
You have to select a directory where the report will be saved. You also have to choose the format for the report: XLSX (see sample below), HTML or TXT (TSV format).
- Include profile icons: Profile images are included in the report. In the HTML format, profile icons are always included, but if you check this option, the images are saved locally instead of being linked. As in Dump Chat, if you run into problems, you can use the HTML format without checking this option. Then, when you open the report in Firefox, you can use "Save as" to save the profile icons locally. You will have to do this for each page. The "Include profile icons" option is not available for the TXT format.
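For readers who want to post-process the TXT report, TSV simply means one record per line with fields separated by tab characters. The sketch below writes such a file with Python's standard `csv` module; the column names `name` and `profile_url` are hypothetical, since the actual columns of the ExtractFace report are not listed here.

```python
import csv


def write_tsv(rows, fieldnames, stream):
    """Write dictionaries as tab-separated values with a header line."""
    writer = csv.DictWriter(
        stream, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)


# Usage with hypothetical columns:
# with open("friends.txt", "w", encoding="utf-8", newline="") as fh:
#     write_tsv([{"name": "Alice", "profile_url": "https://www.facebook.com/alice"}],
#               ["name", "profile_url"], fh)
```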
When ready, you can click on Dump to start the process. The report will look like this:
Dump Event Members
This function can be used to gather the guest lists associated with an event. It produces an XLSX file that contains some details about the event members. To use it, you must be on the main page of the event.
When you select this function, ExtractFace:
- Displays the Dump event members window (shown above)
- Checks if you are in the events page
- If not, displays an error message
- If yes, gathers details about the event like the event ID, the profile ID of the author and the data URL. It also gathers details about the guest lists. There are four guest lists for each event:
- Going (or went);
- Maybe;
- Invited;
- Declined (Can't Go);
- Proposes a name for the XLSX file.
You have to select a directory where the XLSX file will be saved. You may also select the Include profile icons option to get all the images associated with the profiles of the event members (this is much slower).
When ready, you can click on Dump to start the process. When you press Dump, ExtractFace:
- Gets the first data set which contains the four guest list members (max. 500 members for each list).
- Parses the data to gather event member details.
- If you select the Include profile icons option:
- Downloads profile icons: Unlike in other functions of ExtractFace, profile icons are not included in the downloaded data, so they have to be downloaded separately. That is why this option takes much more time. During the process, ExtractFace may crash a few times. This is normal, but it may result in corrupted image files. So at the end, ExtractFace checks the integrity of every image file and replaces the corrupted ones by downloading them again.
- Creates the XLSX file, with a spreadsheet for each guest list.
At the end of the process, ExtractFace displays the XLSX file if the Open output XLSX file option is checked. The report will look like the one produced by Dump Friends.
Important: For each guest list, you will get a maximum of 500 members. If you think that not all event members have been gathered (yes, it may happen!), you can restart the process, but don't delete the old files. Some will be replaced, but profile icons that were already downloaded will not be downloaded again, so the new run will be faster.
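The "check integrity, then download again" step can be illustrated as follows. How ExtractFace actually validates images is not documented here; this sketch assumes a simple check on the start and end markers of JPEG and PNG files, and `download` is a hypothetical callback that fetches an icon for a profile ID.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
PNG_TRAILER = b"IEND\xaeB`\x82"  # IEND chunk type followed by its fixed CRC


def looks_intact(data: bytes) -> bool:
    """Cheap integrity check based on file start and end markers."""
    if data.startswith(b"\xff\xd8"):          # JPEG start-of-image
        return data.endswith(b"\xff\xd9")     # JPEG end-of-image
    if data.startswith(PNG_SIGNATURE):
        return data.endswith(PNG_TRAILER)
    return False


def repair_icons(icons, download, max_attempts=3):
    """Re-download icons that fail the check.

    icons    : dict mapping profile ID -> image bytes (updated in place).
    download : hypothetical callback, download(profile_id) -> bytes.
    Returns the profile IDs whose icon is still corrupted afterwards.
    """
    still_bad = []
    for profile_id, data in icons.items():
        attempts = 0
        while not looks_intact(data) and attempts < max_attempts:
            data = download(profile_id)
            attempts += 1
        icons[profile_id] = data
        if not looks_intact(data):
            still_bad.append(profile_id)
    return still_bad
```

A truncated download typically loses the end marker, which is why checking both ends catches the common case of an interrupted transfer.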
Dump Contributors
This function can be used to list all the people who contribute to a particular page. Like the Scroll and Expand functions, it can be used on any page that contains comments and/or Likes. Before using this function, you have to choose the page and use the Scroll and Expand function if necessary.
The available options are the same as in some other functions, such as Include profile icons and Open output XLSX file. There is also the Don't scroll Visitor Posts option, which you can use if you want to scroll the Visitor Posts manually before dumping.
This function produces an XLSX file like Dump Friends and Dump Event Members, but with an additional column that contains the number of times a contributor was found. The supported types of contributors are:
- Comments: Everyone who posted a comment on the page, including replies. As comments are all on the main page, this type is the fastest.
- Likes: Any type of Like. This type should be the slowest, because ExtractFace has to open a page for every group of Likes.
- Visitor Posts: You can find Visitor Posts on the left-hand side of some profiles. When you select this type, a popup appears and is scrolled automatically (unless you select the Don't scroll Visitor Posts option).
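The extra column is a plain occurrence count per contributor. As a sketch of the idea (not ExtractFace's code), assuming each contribution has been reduced to a pair of profile ID and contribution type, both hypothetical names:

```python
from collections import Counter


def count_contributors(contributions):
    """Count how many times each profile appears among the contributions.

    contributions: iterable of (profile_id, kind) pairs, where kind is
    for example "comment", "like" or "visitor_post" (illustrative values).
    """
    return Counter(profile_id for profile_id, _kind in contributions)


counts = count_contributors(
    [("alice", "comment"), ("bob", "like"), ("alice", "like")]
)
print(counts.most_common())
```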
Chat Functions
ExtractFace provides some functions to help you extract conversations from your Facebook profile. To use these functions, you must select See All from the chat menu, then select any of the conversations on the left side. In the address bar of your browser, you should have an address like "https://www.facebook.com/messages/[id of the other person]".
The Scroll Chat function scrolls until it reaches the beginning of the conversation, unless you set a maximum number of pages to scroll or a maximum date (see Settings). If you only want a part of the conversation, you can also run a search with the Search messages in the conversation function provided by Facebook. When you find the message, you can use the Load Older Messages and/or Load Newer Messages functions in ExtractFace to show older or newer messages around the one you searched for.
When you are ready, you can use the "Dump Chat" function. When you select it, ExtractFace:
- Displays the Dump Chat window (shown above)
- Checks if you are in the messages page
- If not, shows an error message
- If yes, gathers the interlocutor's username or profile ID to propose as filename
ExtractFace provides different options:
- Normal Mode: In normal mode, the page and all of its resources are saved at the beginning. When ExtractFace needs to keep a local copy of an image, for example, it just has to look there, so it is faster. With this mode, you can download files attached to a chat, such as images, videos, documents, etc. (see below). Also, ExtractFace uses Firefox to parse the page and find the elements.
- Safe Mode: Some resources associated with a page can cause the saving process to hang. In safe mode, only the HTML page is saved. ExtractFace then parses this page without using Firefox to find the elements, so there is less chance of crashing and this mode should be more reliable. In any case, if you run into problems, you should try both modes, because they work differently.
- Searched part only: This option must be checked if you ran a search with the Search messages in the conversation function provided by Facebook. Otherwise, the dumped chat will contain the most recent messages that were displayed before you ran the search. If ExtractFace detects that you ran a search, the option is checked automatically.
- Download:
- Pictures: Pictures are images that have been included in the chat by the users, as opposed to images provided by Facebook such as emoticons or stickers. If you don't select this option, pictures will be included in the dumped chat, but not saved locally. When the output HTML file is opened in Firefox, you can use the "Save as" function to save the result. Firefox will then save the images locally.
- Full size: This option is related to the previous one. If you select it, the full-size pictures will be downloaded. You should see a popup opening during the process. A link will be added to the output HTML file. Be aware that the full-size image sometimes has the same filename as the smaller one, in which case it replaces the smaller one in the output HTML file.
- Videos: There are two types of video: video as a link and video as attached document. Both are supported by ExtractFace. With this option, ExtractFace will load the page of the video (as a link) or the popup (as attached), gather the video image and the video file. A clickable image will be inserted in the output HTML file. If you don't select the option, a link to the video page will be inserted.
- Attached document: This can be any file that was attached to a message in a chat. If you don't select the option, a link to the document will be inserted.
- Vocal messages: This element is particular because there is no link to the file in the conversation. To be able to gather vocal messages, ExtractFace must use the Facebook Mobile Site. It uses the timestamp of the message to find the correct file. If you don't select the option, ExtractFace will suggest a filename to associate the file to the correct message, but you will have to download it by yourself.
- Dates: Here you can select a date range to be dumped.
- Hide me: If you select this option, your details (profile icon, profile URL and profile Name) will be blacked out.
Some other elements (not mentioned above) can be found in a Facebook conversation. Here are some notes about how ExtractFace handles them:
- Emoticons and stickers: These images are provided by Facebook, but chosen by users. An image sprite is used for each of them. Because of the nature of this "technology", ExtractFace will download the required image sprite, even in safe mode.
- SVG images: SVG (Scalable Vector Graphics) is a vector image format that uses XML. It cannot be shown if the file is not stored locally. That is why ExtractFace must download it, even in safe mode. So far, the only image of this type I have seen is this one: https://www.facebook.com/rsrc.php/ya/r/FwHVs2eE5cr.svg.
- GPS coordinates: When available, ExtractFace will gather GPS coordinates and will add a Google map link in the output HTML file.
- Notifications: For example "You missed a call", "Seen", "Sent from Mobile", etc. Notifications can be in a message, in the conversation or even "outside". ExtractFace should support them all, but it currently does not save the small image that may be associated with them.
Directories that could be created by ExtractFace (if needed):
- images_[page title]: For the image files
- videos_[page title]: For the video files
- pj_[page title]: For attached documents
- vm_[page title]: For vocal messages
The output HTML file should look like this:
Dumping huge conversation
If you have to dump a huge conversation (thousands of messages) and you cannot display all of it in Firefox because regular scrolling gets stuck before reaching the beginning of the conversation, here is a solution:
- Use regular scrolling and dump the most recent messages;
- Reload the page in Firefox (to free some memory space);
- Until you reach the beginning of the conversation:
- Use the Search messages in the conversation function in Facebook to search for the oldest message displayed in the part you already dumped;
- Use the Load Older Messages function in ExtractFace. To avoid crashing Firefox, you can set a limit on the scrolling (see Settings);
- Dump the displayed part (don't forget to check the Searched part only option);
- Reload the page in Firefox before doing another search.
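The procedure above amounts to walking backwards through the conversation in bounded chunks, each one anchored on the oldest message already dumped. The simulation below illustrates that logic on a plain list. It assumes that the searched (anchor) message is displayed again in the next chunk and that messages carry unique IDs; `chunk_size` stands in for the scrolling limit, and all names are illustrative.

```python
def dump_in_chunks(messages, chunk_size):
    """Split a conversation (oldest first) into overlapping chunks,
    newest chunk first, as the manual procedure would produce them."""
    if chunk_size < 2:
        raise ValueError("chunk_size must be at least 2 to make progress")
    chunks = []
    end = len(messages)  # exclusive upper bound of the next chunk
    while end > 0:
        start = max(0, end - chunk_size)
        chunks.append(messages[start:end])
        if start == 0:
            break  # beginning of the conversation reached
        # The oldest dumped message becomes the anchor of the next search,
        # so it appears again at the end of the next chunk.
        end = start + 1
    return chunks


def merge_chunks(chunks):
    """Rebuild the full conversation, dropping the repeated anchors."""
    merged = []
    for chunk in reversed(chunks):
        for message_id in chunk:
            if not merged or merged[-1] != message_id:
                merged.append(message_id)
    return merged
```

With seven messages and a chunk size of three, the chunks are `[5, 6, 7]`, `[3, 4, 5]` and `[1, 2, 3]`, and merging them restores messages 1 to 7 without duplicates.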
Settings
There are a few parameters that can be set in ExtractFace.
General options
In the Tool section, you have the following functions and options:
- Export Lang.ini: Use this function to translate ExtractFace GUI. See Translation for help about this functionality.
- Check Update: Check on le-tools.com if a tool update is available.
- Check for update at startup: When ExtractFace starts, check on website for available update of the tool.
In Functions section, you have:
- Remember position of all windows: By default, every window is centered on your main screen. If you use multiple monitors, it can be useful to have ExtractFace on the same screen as Firefox. Use this option to remember the position of any window. The position is saved when the window is closed. The progress window is centered on top of the window of the function that called it. For the scrolling and expanding functions, it is centered on top of the main window.
- Time for loading: Time to wait when a page (or new content) must be loaded in Firefox. The default value is 2 seconds. As ExtractFace must often gather content from the internet, it can be affected by network latency. When ExtractFace tries to access data in Firefox too quickly, the process may crash. ExtractFace has been designed to recover from crashes like this, but they may slow down the whole process. If it happens too often, you can try increasing this value.
- Number of resumes: Number of times that ExtractFace will restart a crashed process before giving up. Default value is 10.
- Delete temp files when finished: Applies to most functions. When ExtractFace downloads a page, it saves the data in a temp directory. If this option is checked, the temp directory is deleted when the function ends.
- Enable debug logging: If you check this option, a debug.log will be created in the program folder. Every crash error will be logged in the file. This could be useful for troubleshooting.
- Charset: ExtractFace supports UTF-8 (used on Facebook) internally, but the interface (Win32-GUI) does not. So in some cases (e.g. friend categories, album names, etc.), this may be a problem. Sometimes the charset depends on the language of your profile, and sometimes it depends on the language of the target profile. To deal with that, ExtractFace supports different charsets. The default is cp1252.
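The interplay between Number of resumes and Time for loading can be pictured as a retry loop. This is a sketch under assumptions, not ExtractFace's implementation: `step` is a hypothetical function that performs one crash-prone operation, and pausing for the loading time before each restart is an illustrative choice. The defaults (10 resumes, 2 seconds) are the ones given above.

```python
import time


def run_with_resumes(step, max_resumes=10, wait_seconds=2.0, sleep=time.sleep):
    """Run step(), restarting it after a failure up to max_resumes times.

    Raises the last error once the allowed number of resumes is used up.
    The sleep parameter exists so the pause can be replaced in tests.
    """
    resumes = 0
    while True:
        try:
            return step()
        except Exception:
            if resumes >= max_resumes:
                raise  # give up, as described for Number of resumes
            resumes += 1
            sleep(wait_seconds)  # give the page more time before retrying
```

Raising the loading time reduces how often the loop has to restart, which is why increasing that value can make the whole process faster even though each wait is longer.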
Scroll options
The following options are related to the available scrolling functions within ExtractFace.
- Maximum scrolling (chat): When you use the scroll chat functions (there are three), ExtractFace scrolls up or down, waits for the additional content to display, scrolls again, and so on until it reaches the beginning or the end of the conversation. By page is the maximum number of times ExtractFace will scroll (not the number of messages displayed). The default value is 0, which means no limit. If you set a date using the By date option, ExtractFace stops scrolling when the given date is visible in the conversation, but this does not mean it stops exactly at that date. Also be aware that a date set here is not saved when ExtractFace is closed, unlike the maximum set using By page.
- Maximum scrolling (other): Like the previous option, but associated with the general Scroll function.
- When page loaded, scroll back to top: When a page is fully loaded (after scrolling to the bottom), scroll back to top. Default is checked. You may uncheck this option if you want to easily see if scrolling fails to load all.
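The two limits for chat scrolling can be summarized as a stop condition evaluated after each scroll. This is a sketch of the rule as described above, with illustrative names; it assumes scrolling upwards towards older messages, so the date limit is reached once the oldest visible message is on or before the given date.

```python
from datetime import date


def should_stop(pages_scrolled, oldest_visible, max_pages=0, max_date=None):
    """Return True when one of the scrolling limits has been reached.

    max_pages : By page limit, 0 means no limit.
    max_date  : By date limit, None means no limit.
    """
    if max_pages and pages_scrolled >= max_pages:
        return True
    if max_date is not None and oldest_visible <= max_date:
        return True
    return False


# The scroll that makes 1 April visible may also load older messages,
# which is why scrolling does not stop exactly at the requested date.
print(should_stop(4, date(2015, 3, 28), max_date=date(2015, 4, 1)))
```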
Expand options
These options are related to the Expand function and the combined Scroll and Expand function. Each option lets you expand a particular type of content: additional text, comments or posts.