WEB DATA COLLECTION
SPIDA Suite Web Crawlers by Point Duty are a set of tools that collect unstructured data from the clear, deep and dark web. SPIDA comes in three configurations that enable investigators to acquire and collate the material in the most appropriate form for any task.
Huntsman SPIDA by Point Duty is designed specifically for investigators and analysts using IBM’s i2 Analyst’s Notebook charts. Huntsman helps break down documents and websites, extracting data into entities, links and attributes that can be added seamlessly to IBM i2 Analyst’s Notebook charts for further investigation.
- Huntsman saves time – Huntsman extracts data straight from text and image sources into IBM i2 Analyst’s Notebook (ANB) charts as entities, links and attributes whilst maintaining cards.
- Huntsman is efficient – Huntsman will extract images, text and website data, and create an archive of the site or capture a screenshot, all from the right-click menu.
- Huntsman is discreet – Huntsman features a built-in TOR Browser for private data extraction from anywhere on the clear, deep or dark web.
- Huntsman is smart – Huntsman prevents duplication and keeps links between items intact and accurate. Huntsman will scan charts for duplicates and act on any it finds.
Huntsman can be used to manually extract text and images. Extracted data can be imported directly into i2 Analyst’s Notebook as entities, links and attributes whilst maintaining cards. Huntsman extracts data from webpages, forums, bulletin boards and social networks, and accepts source material as PDF, Word, HTML or txt files.
Huntsman can extract HTML data using the inbuilt TOR Browser, allowing discreet extraction from the clear, deep or dark web. Huntsman captures data from the entire website. Collected data is logged and maintained in i2 Cards as a screenshot and as text, images and scripts. All items are linked to their original sources for archival and evidence purposes. An audit trail is created, logging every extraction a user performs.
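To make the extraction workflow above more concrete, the following is a minimal Python sketch, not Huntsman's actual implementation or API. It assumes a very simple model: email addresses are pulled from page text with a regular expression, each one becomes an entity linked back to its source, duplicates are collapsed, and every extraction is written to an audit log with a hash of the source text. The names `Entity`, `Chart` and `extract_into_chart` are hypothetical and exist only for illustration.

```python
import hashlib
import re
from dataclasses import dataclass, field

# Simplified email pattern, used here only as an example of an extractable item.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


@dataclass(frozen=True)
class Entity:
    """Hypothetical chart entity: a kind ("source", "email") and a value."""
    kind: str
    value: str


@dataclass
class Chart:
    """Hypothetical in-memory chart; sets prevent duplicate entities and links."""
    entities: set = field(default_factory=set)
    links: set = field(default_factory=set)
    audit_log: list = field(default_factory=list)


def extract_into_chart(chart, source_url, text, user):
    """Extract emails from text, link them to the source, and log the action."""
    source = Entity("source", source_url)
    chart.entities.add(source)

    # Lower-case before de-duplicating so "A@x.com" and "a@x.com" merge.
    found = sorted({match.lower() for match in EMAIL_RE.findall(text)})
    for value in found:
        entity = Entity("email", value)
        chart.entities.add(entity)
        chart.links.add((source, entity))

    # Audit entry: who extracted what, from where, plus a hash of the source text.
    chart.audit_log.append({
        "user": user,
        "source": source_url,
        "sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
        "extracted": found,
    })
    return found
```

Running the function twice on the same page would leave the entity and link sets unchanged while adding a second audit entry, which mirrors the idea of preventing duplicates while still recording every extraction.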
Wolf SPIDA by Point Duty is our automated site capture program that captures a site in its entirety for further analysis. Wolf SPIDA has a crawl function that searches by keyword or URL to extract all available data relating to a targeted term.
Wolf SPIDA is unique: our heuristic learning engine enables Wolf to learn the layouts of website types such as forums, bulletin boards and social networks, along with the variety of formatting and layout conventions used to present their data. Wolf SPIDA learns date formats, name conventions, post configurations and reply formats.
- Wolf SPIDA is adaptive – learns formatting and layouts of bulletins and forums.
- Wolf SPIDA is efficient – fully automated keyword search and site capture.
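As a rough illustration of what learning a site's date format can involve, the sketch below tries a fixed list of candidate formats against sample date strings and keeps the one that parses the most samples. This is a deliberately simple stand-in and makes no claim about how Wolf SPIDA's heuristic learning engine works. The candidate list and the function name `infer_date_format` are assumptions made for this example.

```python
from datetime import datetime

# Assumed candidate formats; a real system would need a far larger set.
CANDIDATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%m/%d/%Y"]


def infer_date_format(samples):
    """Return the candidate format that parses the most samples, or None."""
    best_format, best_hits = None, 0
    for fmt in CANDIDATE_FORMATS:
        hits = 0
        for sample in samples:
            try:
                datetime.strptime(sample.strip(), fmt)
                hits += 1
            except ValueError:
                pass  # This sample does not match the candidate format.
        # Strict ">" means earlier candidates win ties.
        if hits > best_hits:
            best_format, best_hits = fmt, hits
    return best_format
```

Looking at several posts at once matters: a single date such as 03/04/2021 is ambiguous, but a sample like 25/12/2020 from the same forum rules out the month-first reading.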
Funnelweb SPIDA by Point Duty is our fully automatic program for extracting complete website structures based on keywords, queries or URLs. Funnelweb allows multiple search and capture tasks to run in parallel, enabling scalable and extensive data collection. Funnelweb offers discreet collection options using the inbuilt TOR Browser, allowing anonymous collection of web data.
- Funnelweb SPIDA is scalable – many searches can be done simultaneously.
- Funnelweb SPIDA is thorough – it captures everything.
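Running many search and capture tasks at the same time can be sketched with a thread pool, as below. This is an illustrative pattern only and does not describe Funnelweb's internals. The function `run_capture_tasks` and the placeholder `fake_capture` are hypothetical, and `fake_capture` performs no network access.

```python
from concurrent.futures import ThreadPoolExecutor


def run_capture_tasks(tasks, capture, max_workers=4):
    """Run capture(task) for every task concurrently; return {task: result}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with tasks.
        results = list(pool.map(capture, tasks))
    return dict(zip(tasks, results))


def fake_capture(task):
    """Placeholder for a real capture routine (keyword search or URL crawl)."""
    return f"captured:{task}"


if __name__ == "__main__":
    print(run_capture_tasks(["keyword one", "http://example.com"], fake_capture))
```

Threads suit this kind of work because capture tasks spend most of their time waiting on the network, so raising `max_workers` lets more tasks overlap.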